public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
* [RFC PATCH] Support pre and post v4.19 ksymtabs for Linux kernel modules
@ 2019-01-01  0:00 Jessica Yu
  2019-01-01  0:00 ` Matthias Maennich via libabigail
  2019-01-01  0:00 ` Jessica Yu
  0 siblings, 2 replies; 12+ messages in thread
From: Jessica Yu @ 2019-01-01  0:00 UTC (permalink / raw)
  To: libabigail; +Cc: Dodji Seketeli, Matthias Maennich, Jessica Yu

As described in commit ad8c2531fb9, the format of the Linux kernel
ksymtab changed in v4.19 to use relative references instead of absolute
references. This changes the type of relocations emitted for ksymtab
sections to be place-relative 32-bit relocations instead of absolute
relocations. One side-effect of this is that libdwfl will not relocate
the ksymtab sections due to the PC-relative relocations. This breaks
load_kernel_symbol_table() for kernel modules because it only reads in
zeros from the unrelocated ksymtab section and is subsequently unable to
determine what exported symbols it refers to. Since a vmlinux binary is
already fully linked and relocated (and therefore we can read its
ksymtab section just fine), this problem is only relevant to Linux
kernel modules.

To work around this, we utilize the ksymtab relocation sections to
determine which symbols the ksymtab entries refer to. We do this by
inspecting each relocation's r_info field for the symbol table index and
from there we are able to read each symbol's value and subsequently add
that to the set of exported symbols.

In addition, for Linux kernel modules, we can utilize relocation types
to implement a new heuristic to determine the ksymtab format we have.
The presence of PC-relative relocations suggest the new v4.19 format,
and absolute relocation types suggest the old pre v4.19 format.

         * include/abg-ir.h (elf_symbol::{elf_symbol, create}): Take new
         symbol value and shndx parameters.
         (elf_symbol::{get_value, get_shndx}): Declare new accessors.
         * src/abg-ir.cc (elf_symbol::priv::{value_, shndx_}): New data members.
         (elf_symbol::priv::priv): Adjust.
         (elf_symbol::elf_symbol): Take new value and shndx parameters.
         (elf_symbol::create): Likewise.
         (elf_symbol::{get_value, get_shndx}): Define new accessors.
         * src/abg-reader.cc (build_elf_symbol): Adjust.
         * src/abg-dwarf-reader.cc (lookup_symbol_from_sysv_hash_tab)
         (lookup_symbol_from_gnu_hash_tab)
         (lookup_symbol_from_symtab): Adjust.
         (read_context::{ksymtab_reloc_section_,
         ksymtab_gpl_reloc_section_,
         ksymtab_strings_section_}): New data members.
         (read_context::read_context): Initialize ksymtab_reloc_section_,
         ksymtab_gpl_reloc_section_, ksymtab_strings_section_.
         (read_context::{find_ksymtab_reloc_section,
         find_ksymtab_gpl_reloc_section, find_ksymtab_strings_section,
         find_any_ksymtab_reloc_section, get_ksymtab_format_module,
         populate_symbol_map_from_ksymtab,
         populate_symbol_map_from_ksymtab_reloc,
         is_linux_kernel_module}): New member functions.
         (read_context::load_kernel_symbol_table): Adjust to call either
         populate_symbol_map_from_ksymtab{_reloc,} depending on ksymtab
         format.
         (read_context::get_ksymtab_format): Adjust to call
         get_ksymtab_format_module for linux kernel modules.
         (read_context::lookup_elf_symbol_from_index): Adjust.
         (create_default_var_sym, create_default_fn_sym): Adjust.

Signed-off-by: Jessica Yu <jeyu@kernel.org>
---
 include/abg-ir.h        |  10 ++
 src/abg-dwarf-reader.cc | 391 +++++++++++++++++++++++++++++++++++++++++-------
 src/abg-ir.cc           |  38 ++++-
 src/abg-reader.cc       |   2 +
 4 files changed, 388 insertions(+), 53 deletions(-)

diff --git a/include/abg-ir.h b/include/abg-ir.h
index 170a844..d26f472 100644
--- a/include/abg-ir.h
+++ b/include/abg-ir.h
@@ -796,6 +796,8 @@ private:
   elf_symbol(const environment* e,
 	     size_t		i,
 	     size_t		s,
+	     uint64_t		v,
+	     uint32_t		x,
 	     const string&	n,
 	     type		t,
 	     binding		b,
@@ -818,6 +820,8 @@ public:
   create(const environment*	e,
 	 size_t		i,
 	 size_t		s,
+	 uint64_t	v,
+	 uint32_t	x,
 	 const string&		n,
 	 type			t,
 	 binding		b,
@@ -838,6 +842,12 @@ public:
   void
   set_index(size_t);
 
+  uint64_t
+  get_value() const;
+
+  uint32_t
+  get_shndx() const;
+
   const string&
   get_name() const;
 
diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
index 47af7e3..835bf3d 100644
--- a/src/abg-dwarf-reader.cc
+++ b/src/abg-dwarf-reader.cc
@@ -1724,6 +1724,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
   GElf_Sym symbol;
   const char* sym_name_str;
   size_t sym_size;
+  uint64_t sym_value;
+  uint32_t sym_shndx;
   elf_symbol::type sym_type;
   elf_symbol::binding sym_binding;
   elf_symbol::visibility sym_visibility;
@@ -1743,6 +1745,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
 	  sym_visibility =
 	    stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(symbol.st_other));
 	  sym_size = symbol.st_size;
+	  sym_value = symbol.st_value;
+	  sym_shndx = symbol.st_shndx;
 	  elf_symbol::version ver;
 	  if (get_version_for_symbol(elf_handle, symbol_index,
 				     /*get_def_version=*/true, ver))
@@ -1751,6 +1755,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
 	    elf_symbol::create(env,
 			       symbol_index,
 			       sym_size,
+			       sym_value,
+			       sym_shndx,
 			       sym_name_str,
 			       sym_type,
 			       sym_binding,
@@ -2028,7 +2034,8 @@ lookup_symbol_from_gnu_hash_tab(const environment*		env,
 	    ABG_ASSERT(!ver.str().empty());
 
 	  elf_symbol_sptr symbol_found =
-	    elf_symbol::create(env, i, symbol.st_size, sym_name_str,
+	    elf_symbol::create(env, i, symbol.st_size, symbol.st_value,
+			       symbol.st_shndx, sym_name_str,
 			       sym_type, sym_binding,
 			       symbol.st_shndx != SHN_UNDEF,
 			       symbol.st_shndx == SHN_COMMON,
@@ -2181,6 +2188,7 @@ lookup_symbol_from_symtab(const environment*		env,
 	    ABG_ASSERT(!ver.str().empty());
 	  elf_symbol_sptr symbol_found =
 	    elf_symbol::create(env, i, sym->st_size,
+			       sym->st_value, sym->st_shndx,
 			       name_str, sym_type,
 			       sym_binding, sym_is_defined,
 			       sym_is_common, ver, sym_visibility);
@@ -3081,7 +3089,10 @@ public:
   /// to symbols exported using the EXPORT_SYMBOL_GPL macro from the
   /// linux kernel.
   Elf_Scn*			ksymtab_section_;
+  Elf_Scn*			ksymtab_reloc_section_;
   Elf_Scn*			ksymtab_gpl_section_;
+  Elf_Scn*			ksymtab_gpl_reloc_section_;
+  Elf_Scn*			ksymtab_strings_section_;
   Elf_Scn*			versym_section_;
   Elf_Scn*			verdef_section_;
   Elf_Scn*			verneed_section_;
@@ -3277,7 +3288,10 @@ public:
     nb_ksymtab_entries_ = 0;
     nb_ksymtab_gpl_entries_ = 0;
     ksymtab_section_ = 0;
+    ksymtab_reloc_section_ = 0;
     ksymtab_gpl_section_ = 0;
+    ksymtab_gpl_reloc_section_ = 0;
+    ksymtab_strings_section_ = 0;
     versym_section_ = 0;
     verdef_section_ = 0;
     verneed_section_ = 0;
@@ -6079,6 +6093,23 @@ public:
     return ksymtab_section_;
   }
 
+  /// Return the .rel{a,}__ksymtab section of a linux kernel ELF file (either
+  /// a vmlinux binary or a kernel module).
+  ///
+  /// @return the .rel{a,}__ksymtab section if found, nil otherwise.
+  Elf_Scn*
+  find_ksymtab_reloc_section() const
+  {
+    if (!ksymtab_reloc_section_)
+      {
+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab", SHT_RELA);
+        if (!sec)
+          sec = find_section(elf_handle(), ".rel__ksymtab", SHT_REL);
+        const_cast<read_context*>(this)->ksymtab_reloc_section_ = sec;
+      }
+    return ksymtab_reloc_section_;
+  }
+
   /// Return the __ksymtab_gpl section of a linux kernel ELF file
   /// (either a vmlinux binary or a kernel module).
   ///
@@ -6092,6 +6123,36 @@ public:
     return ksymtab_gpl_section_;
   }
 
+  /// Return the .rel{a,}__ksymtab_gpl section of a linux kernel ELF file
+  /// (either a vmlinux binary or a kernel module).
+  ///
+  /// @return the .rel{a,}__ksymtab_gpl section if found, nil otherwise.
+  Elf_Scn*
+  find_ksymtab_gpl_reloc_section() const
+  {
+    if (!ksymtab_gpl_reloc_section_)
+      {
+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab_gpl", SHT_RELA);
+        if (!sec)
+          sec = find_section(elf_handle(), ".rel__ksymtab_gpl", SHT_REL);
+        const_cast<read_context*>(this)->ksymtab_gpl_reloc_section_ = sec;
+      }
+    return ksymtab_gpl_reloc_section_;
+  }
+
+  /// Return the __ksymtab_strings section of a linux kernel ELF file
+  /// (either a vmlinux binary or a kernel module).
+  ///
+  /// @return the __ksymtab_strings section if found, nil otherwise.
+  Elf_Scn*
+  find_ksymtab_strings_section() const
+  {
+    if (!ksymtab_strings_section_)
+      const_cast<read_context*>(this)->ksymtab_strings_section_ =
+        find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);
+    return ksymtab_strings_section_;
+  }
+
   /// Return either a __ksymtab or a __ksymtab_gpl section, in case
   /// only the __ksymtab_gpl exists.
   ///
@@ -6106,6 +6167,19 @@ public:
     return result;
   }
 
+  /// Return either a .rel{a,}__ksymtab or a .rel{a,}__ksymtab_gpl section
+  ///
+  /// @return the .rel{a,}__ksymtab section if it exists, or the
+  /// .rel{a,}__ksymtab_gpl; or NULL if neither is found.
+  Elf_Scn*
+  find_any_ksymtab_reloc_section() const
+  {
+    Elf_Scn *result = find_ksymtab_reloc_section();
+    if (!result)
+      result = find_ksymtab_gpl_reloc_section();
+    return result;
+  }
+
   /// Return the SHT_GNU_versym, SHT_GNU_verdef and SHT_GNU_verneed
   /// sections that are involved in symbol versionning.
   ///
@@ -6283,7 +6357,8 @@ public:
       stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(s->st_other));
 
     elf_symbol_sptr sym =
-      elf_symbol::create(env(), symbol_index, s->st_size, name_str,
+      elf_symbol::create(env(), symbol_index, s->st_size,
+			 s->st_value, s->st_shndx, name_str,
 			 stt_to_elf_symbol_type(GELF_ST_TYPE(s->st_info)),
 			 stb_to_elf_symbol_binding(GELF_ST_BIND(s->st_info)),
 			 sym_is_defined, sym_is_common, ver, vis);
@@ -7435,6 +7510,64 @@ public:
     return symbol;
   }
 
+  /// Determine the format of the __ksymtab and __ksymtab_gpl
+  /// sections of Linux kernel modules.
+  ///
+  /// This is important because we need the know the format of these
+  /// sections to be able to read from them.
+  ///
+  /// @return the format the __ksymtab[_gpl] sections.
+  enum ksymtab_format
+  get_ksymtab_format_module() const
+  {
+    Elf_Scn *section = find_any_ksymtab_reloc_section();
+
+    if (!section)
+      return UNDEFINED_KSYMTAB_FORMAT;
+
+    // Libdwfl has a weird quirk where, in the process of obtaining an Elf
+    // descriptor via dwfl_module_getelf(), it will apply all relocations it
+    // knows how to and it will zero a relocation after applying it. If the
+    // .rela__ksymtab* section contained only simple (absolute) relocations,
+    // they will have been all applied and sh_size will be 0. For arches that
+    // support relative ksymtabs, simple relocations only appear in pre-4.19
+    // kernel modules.
+    GElf_Shdr section_mem;
+    GElf_Shdr *section_shdr = gelf_getshdr(section, &section_mem);
+    if (section_shdr->sh_size == 0)
+      return PRE_V4_19_KSYMTAB_FORMAT;
+    /* SHT_REL sections not supported yet */
+    if (section_shdr->sh_type == SHT_REL)
+      return UNDEFINED_KSYMTAB_FORMAT;
+
+    // If we still have a normal non-zeroed relocation section, we can guess
+    // what format the ksymtab is in depending on what types of relocs it
+    // contains.
+    GElf_Rela rela;
+    Elf_Data *rela_section_data = elf_rawdata(section, 0);
+    gelf_getrela(rela_section_data, 0, &rela);
+
+    // Sigh, I dislike the arch-dependent code here, but this seems to be a
+    // reliable heuristic for kernel modules for now. Relative ksymtabs only
+    // supported on x86 and arm64 as of v4.19.
+    ksymtab_format format;
+    switch (GELF_R_TYPE(rela.r_info))
+      {
+      case R_X86_64_64: // Same as R_386_32
+      case R_AARCH64_ABS64:
+        format = PRE_V4_19_KSYMTAB_FORMAT;
+	break;
+      case R_X86_64_PC32: // Same as R_386_PC32
+      case R_AARCH64_PREL32:
+	format = V4_19_KSYMTAB_FORMAT;
+	break;
+      default:
+        format = UNDEFINED_KSYMTAB_FORMAT;
+	break;
+      }
+    return format;
+  }
+
   /// Determine the format of the __ksymtab and __ksymtab_gpl
   /// sections.
   ///
@@ -7451,6 +7584,18 @@ public:
       {
 	if (ksymtab_format_ == UNDEFINED_KSYMTAB_FORMAT)
 	  {
+            // Since Linux kernel modules are relocatable, we can first try
+            // using a heuristic based on relocations to guess the ksymtab format.
+            if (is_linux_kernel_module())
+              {
+                ksymtab_format_ = get_ksymtab_format_module();
+                if (ksymtab_format_ != UNDEFINED_KSYMTAB_FORMAT)
+                  return ksymtab_format_;
+              }
+
+	    // If it's not a kernel module or we couldn't determine its format
+	    // with relocations, fall back to the heuristic below.
+
 	    // OK this is a dirty little heuristic to determine the
 	    // format of the ksymtab section.
 	    //
@@ -7553,48 +7698,24 @@ public:
     return nb_ksymtab_gpl_entries_;
   }
 
-  /// Load a given kernel symbol table.
+  /// Populate the symbol map by reading exported symbols from the
+  /// ksymtab directly.
   ///
-  /// One can thus retrieve the resulting symbols by using the
-  /// accessors read_context::linux_exported_fn_syms(),
-  /// read_context::linux_exported_var_syms(),
-  /// read_context::linux_exported_gpl_fn_syms(), or
-  /// read_context::linux_exported_gpl_var_syms().
+  /// @param section the ksymtab section to read from
   ///
-  /// @param kind the kind of kernel symbol table to load.
+  /// @param exported_fns_set the set of exported functions
+  ///
+  /// @param exported_vars_set the set of exported variables
+  ///
+  /// @param nb_entries the number of ksymtab entries to read
   ///
   /// @return true upon successful completion, false otherwise.
   bool
-  load_kernel_symbol_table(kernel_symbol_table_kind kind)
+  populate_symbol_map_from_ksymtab(Elf_Scn *section,
+                                   address_set_sptr exported_fns_set,
+                                   address_set_sptr exported_vars_set,
+                                   size_t nb_entries)
   {
-    size_t nb_entries = 0;
-    Elf_Scn *section = 0;
-    address_set_sptr linux_exported_fns_set, linux_exported_vars_set;
-
-    switch (kind)
-      {
-      case KERNEL_SYMBOL_TABLE_KIND_UNDEFINED:
-	break;
-      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB:
-	section = find_ksymtab_section();
-	nb_entries = get_nb_ksymtab_entries();
-	linux_exported_fns_set = create_or_get_linux_exported_fn_syms();
-	linux_exported_vars_set = create_or_get_linux_exported_var_syms();
-	break;
-      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB_GPL:
-	section = find_ksymtab_gpl_section();
-	nb_entries = get_nb_ksymtab_gpl_entries();
-	linux_exported_fns_set = create_or_get_linux_exported_gpl_fn_syms();
-	linux_exported_vars_set = create_or_get_linux_exported_gpl_var_syms();
-	break;
-      }
-
-    if (!linux_exported_vars_set
-	|| !linux_exported_fns_set
-	|| !section
-	|| !nb_entries)
-      return false;
-
     // The data of the section.
     Elf_Data *elf_data = elf_rawdata(section, 0);
 
@@ -7654,28 +7775,30 @@ public:
 	// (aka ELF symbol table).
 	symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
 	if (!symbol)
-	  {
-	    adjusted_symbol_address =
-	      maybe_adjust_var_sym_address(symbol_address);
-	    symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
-	    if (!symbol)
-	      // This must be a symbol that is of type neither FUNC
-	      // (function) nor OBJECT (variable).  There are for intance,
-	      // symbols of type 'NOTYPE' in the ksymtab symbol table.  I
-	      // am not sure what those are.
-	      continue;
-	  }
+          {
+            adjusted_symbol_address =
+              maybe_adjust_var_sym_address(symbol_address);
+            symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
+            if (!symbol)
+              {
+	        // This must be a symbol that is of type neither FUNC
+	        // (function) nor OBJECT (variable).  There are for intance,
+	        // symbols of type 'NOTYPE' in the ksymtab symbol table.  I
+	        // am not sure what those are.
+	        continue;
+	      }
+          }
 
 	address_set_sptr set;
 	if (symbol->is_function())
 	  {
 	    ABG_ASSERT(lookup_elf_fn_symbol_from_address(adjusted_symbol_address));
-	    set = linux_exported_fns_set;
+	    set = exported_fns_set;
 	  }
 	else if (symbol->is_variable())
 	  {
 	    ABG_ASSERT(lookup_elf_var_symbol_from_address(adjusted_symbol_address));
-	    set = linux_exported_vars_set;
+	    set = exported_vars_set;
 	  }
 	else
 	  ABG_ASSERT_NOT_REACHED;
@@ -7684,6 +7807,157 @@ public:
     return true;
   }
 
+  /// Populate the symbol map by extracting the exported symbols from a
+  /// ksymtab rela section.
+  ///
+  /// @param section the ksymtab section to read from
+  ///
+  /// @param strings_section the __ksymtab_strings section
+  ///
+  /// @param exported_fns_set the set of exported functions
+  ///
+  /// @param exported_vars_set the set of exported variables
+  ///
+  /// @return true upon successful completion, false otherwise.
+  bool
+  populate_symbol_map_from_ksymtab_reloc(Elf_Scn *reloc_section, Elf_Scn *strings_section,
+                                         address_set_sptr exported_fns_set,
+                                         address_set_sptr exported_vars_set)
+  {
+    GElf_Shdr reloc_section_mem;
+    GElf_Shdr *reloc_section_shdr = gelf_getshdr(reloc_section, &reloc_section_mem);
+    size_t reloc_count = reloc_section_shdr->sh_size / reloc_section_shdr->sh_entsize;
+
+    Elf_Data *reloc_section_data = elf_rawdata(reloc_section, 0);
+    size_t strings_ndx = elf_ndxscn(strings_section);
+
+    GElf_Rela rela;
+    elf_symbol_sptr symbol;
+    for (unsigned int i = 0; i < reloc_count; i++)
+      {
+        gelf_getrela(reloc_section_data, i, &rela);
+        symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rela.r_info));
+
+        // This shouldn't happen..
+        if (!symbol)
+          return false;
+
+        // If the symbol is from __ksymtab_strings, ignore it.
+        if (symbol->get_shndx() == strings_ndx)
+          continue;
+
+	if (!symbol->is_function() && !symbol->is_variable())
+          {
+            if (do_log())
+              {
+                if (symbol->get_type() == elf_symbol::NOTYPE_TYPE)
+                  cerr << "skipping NOTYPE symbol "
+                       << symbol->get_name()
+                       << " shndx: "
+                       << symbol->get_index()
+                       << " @"
+                       << elf_path()
+                       << "\n";
+                else if (symbol->get_type() == elf_symbol::SECTION_TYPE)
+                  cerr << "skipping SECTION symbol "
+                       << "shndx: "
+                       << symbol->get_index()
+                       << " @"
+                       << elf_path()
+                       << "\n";
+               }
+            continue;
+	  }
+
+	address_set_sptr set;
+	if (symbol->is_function())
+	  {
+	    ABG_ASSERT(lookup_elf_fn_symbol_from_address(symbol->get_value()));
+	    set = exported_fns_set;
+	  }
+	else if (symbol->is_variable())
+	  {
+	    ABG_ASSERT(lookup_elf_var_symbol_from_address(symbol->get_value()));
+	    set = exported_vars_set;
+	  }
+	else
+	  ABG_ASSERT_NOT_REACHED;
+	set->insert(symbol->get_value());
+      }
+    return true;
+  }
+
+  /// Load a given kernel symbol table.
+  ///
+  /// One can thus retrieve the resulting symbols by using the
+  /// accessors read_context::linux_exported_fn_syms(),
+  /// read_context::linux_exported_var_syms(),
+  /// read_context::linux_exported_gpl_fn_syms(), or
+  /// read_context::linux_exported_gpl_var_syms().
+  ///
+  /// @param kind the kind of kernel symbol table to load.
+  ///
+  /// @return true upon successful completion, false otherwise.
+  bool
+  load_kernel_symbol_table(kernel_symbol_table_kind kind)
+  {
+    size_t nb_entries = 0;
+    Elf_Scn *section = 0, *reloc_section = 0, *strings_section = 0;
+    address_set_sptr linux_exported_fns_set, linux_exported_vars_set;
+
+    switch (kind)
+      {
+      case KERNEL_SYMBOL_TABLE_KIND_UNDEFINED:
+	break;
+      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB:
+	section = find_ksymtab_section();
+	reloc_section = find_ksymtab_reloc_section();
+	nb_entries = get_nb_ksymtab_entries();
+	linux_exported_fns_set = create_or_get_linux_exported_fn_syms();
+	linux_exported_vars_set = create_or_get_linux_exported_var_syms();
+	break;
+      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB_GPL:
+	section = find_ksymtab_gpl_section();
+	reloc_section = find_ksymtab_gpl_reloc_section();
+	nb_entries = get_nb_ksymtab_gpl_entries();
+	linux_exported_fns_set = create_or_get_linux_exported_gpl_fn_syms();
+	linux_exported_vars_set = create_or_get_linux_exported_gpl_var_syms();
+	break;
+      }
+
+    strings_section = find_ksymtab_strings_section();
+
+    if (!linux_exported_vars_set
+	|| !linux_exported_fns_set
+	|| !section
+	|| !strings_section
+	|| !nb_entries)
+      return false;
+
+    GElf_Ehdr eh_mem;
+    GElf_Ehdr* elf_header = gelf_getehdr(elf_handle(), &eh_mem);
+    ksymtab_format format = get_ksymtab_format();
+
+    // Although pre-v4.19 kernel modules can have a relocation section for the
+    // __ksymtab section, libdwfl zeroes the rela section after applying
+    // "simple" absolute relocations via dwfl_module_getelf(). For v4.19 and
+    // above, we get PC-relative relocations so dwfl_module_getelf() doesn't
+    // apply those relocations and we're safe to read the relocation section to
+    // determine which exported symbols are in the ksymtab.
+    //
+    // Note: For 32-bit binaries, only populate_symbol_map_from_ksymtab() is
+    // supported as of now.
+    if (!reloc_section
+	|| format == PRE_V4_19_KSYMTAB_FORMAT
+	|| elf_header->e_ident[EI_CLASS] == ELFCLASS32)
+      return populate_symbol_map_from_ksymtab(section, linux_exported_fns_set,
+                                              linux_exported_vars_set, nb_entries);
+    else
+      return populate_symbol_map_from_ksymtab_reloc(reloc_section, strings_section,
+                                                    linux_exported_fns_set,
+                                                    linux_exported_vars_set);
+  }
+
   /// Load the special __ksymtab section. This is for linux kernel
   /// (module) files.
   ///
@@ -8352,6 +8626,17 @@ public:
   is_linux_kernel_binary() const
   {return find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);}
 
+  /// Guess if the current binary is a Linux Kernel module.
+  ///
+  /// To guess that, the function looks for the presence of the
+  /// special ".modinfo" section in the binary.
+  ///
+  bool
+  is_linux_kernel_module() const
+  {
+    return find_section(elf_handle(), ".modinfo", SHT_PROGBITS);
+  }
+
   /// Getter of the "show_stats" flag.
   ///
   /// This flag tells if we should emit statistics about various
@@ -15734,6 +16019,8 @@ create_default_var_sym(const string& sym_name, const environment *env)
     elf_symbol::create(env,
 		       /*symbol index=*/ 0,
 		       /*symbol size=*/ 0,
+		       /*symbol value=*/ 0,
+		       /*symbol shndx=*/ 0,
 		       sym_name,
 		       /*symbol type=*/ elf_symbol::OBJECT_TYPE,
 		       /*symbol binding=*/ elf_symbol::GLOBAL_BINDING,
@@ -16173,6 +16460,8 @@ create_default_fn_sym(const string& sym_name, const environment *env)
     elf_symbol::create(env,
 		       /*symbol index=*/ 0,
 		       /*symbol size=*/ 0,
+		       /*symbol value=*/ 0,
+		       /*symbol shndx=*/ 0,
 		       sym_name,
 		       /*symbol type=*/ elf_symbol::FUNC_TYPE,
 		       /*symbol binding=*/ elf_symbol::GLOBAL_BINDING,
diff --git a/src/abg-ir.cc b/src/abg-ir.cc
index a5243c3..ad1c261 100644
--- a/src/abg-ir.cc
+++ b/src/abg-ir.cc
@@ -1169,6 +1169,8 @@ struct elf_symbol::priv
   const environment*	env_;
   size_t		index_;
   size_t		size_;
+  uint64_t		value_;
+  uint32_t		shndx_;
   string		name_;
   elf_symbol::type	type_;
   elf_symbol::binding	binding_;
@@ -1213,6 +1215,8 @@ struct elf_symbol::priv
     : env_(),
       index_(),
       size_(),
+      value_(),
+      shndx_(),
       type_(elf_symbol::NOTYPE_TYPE),
       binding_(elf_symbol::GLOBAL_BINDING),
       visibility_(elf_symbol::DEFAULT_VISIBILITY),
@@ -1223,6 +1227,8 @@ struct elf_symbol::priv
   priv(const environment*		e,
        size_t				i,
        size_t				s,
+       uint64_t				v,
+       uint32_t				x,
        const string&			n,
        elf_symbol::type		t,
        elf_symbol::binding		b,
@@ -1233,6 +1239,8 @@ struct elf_symbol::priv
     : env_(e),
       index_(i),
       size_(s),
+      value_(v),
+      shndx_(x),
       name_(n),
       type_(t),
       binding_(b),
@@ -1269,6 +1277,10 @@ elf_symbol::elf_symbol()
 ///
 /// @param s the size of the symbol.
 ///
+/// @param v the value of the symbol.
+///
+/// @param x the section header index (shndx) of the symbol.
+///
 /// @param n the name of the symbol.
 ///
 /// @param t the type of the symbol.
@@ -1285,6 +1297,8 @@ elf_symbol::elf_symbol()
 elf_symbol::elf_symbol(const environment*	e,
 		       size_t			i,
 		       size_t			s,
+		       uint64_t			v,
+		       uint32_t			x,
 		       const string&		n,
 		       type			t,
 		       binding			b,
@@ -1292,7 +1306,7 @@ elf_symbol::elf_symbol(const environment*	e,
 		       bool			c,
 		       const version&		ve,
 		       visibility		vi)
-  : priv_(new priv(e, i, s, n, t, b, d, c, ve, vi))
+  : priv_(new priv(e, i, s, v, x, n, t, b, d, c, ve, vi))
 {}
 
 /// Factory of instances of @ref elf_symbol.
@@ -1319,6 +1333,10 @@ elf_symbol::create()
 ///
 /// @param s the size of the symbol.
 ///
+/// @param v the value of the symbol.
+///
+/// @param x the section header index (shndx) of the symbol.
+///
 /// @param n the name of the symbol.
 ///
 /// @param t the type of the symbol.
@@ -1339,6 +1357,8 @@ elf_symbol_sptr
 elf_symbol::create(const environment*	e,
 		   size_t		i,
 		   size_t		s,
+		   uint64_t		v,
+		   uint32_t		x,
 		   const string&	n,
 		   type		t,
 		   binding		b,
@@ -1347,7 +1367,7 @@ elf_symbol::create(const environment*	e,
 		   const version&	ve,
 		   visibility		vi)
 {
-  elf_symbol_sptr sym(new elf_symbol(e, i, s, n, t, b, d, c, ve, vi));
+  elf_symbol_sptr sym(new elf_symbol(e, i, s, v, x, n, t, b, d, c, ve, vi));
   sym->priv_->main_symbol_ = sym;
   return sym;
 }
@@ -1411,6 +1431,20 @@ void
 elf_symbol::set_index(size_t s)
 {priv_->index_ = s;}
 
+/// Getter for the symbol value.
+///
+/// @return the value of the symbol.
+uint64_t
+elf_symbol::get_value() const
+{return priv_->value_;}
+
+/// Getter for the symbol section header index.
+///
+/// @return the section header index of the symbol.
+uint32_t
+elf_symbol::get_shndx() const
+{return priv_->shndx_;}
+
 /// Getter for the name of the @ref elf_symbol.
 ///
 /// @return a reference to the name of the @ref symbol.
diff --git a/src/abg-reader.cc b/src/abg-reader.cc
index 810a65f..3c481ad 100644
--- a/src/abg-reader.cc
+++ b/src/abg-reader.cc
@@ -2659,6 +2659,8 @@ build_elf_symbol(read_context& ctxt, const xmlNodePtr node)
 
   const environment* env = ctxt.get_environment();
   elf_symbol_sptr e = elf_symbol::create(env, /*index=*/0, size,
+					 /*value=*/0,
+					 /*shndx=*/0,
 					 name, type, binding,
 					 is_defined, is_common,
 					 version, visibility);
-- 
2.16.4

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] Support pre and post v4.19 ksymtabs for Linux kernel modules
  2019-01-01  0:00 ` Matthias Maennich via libabigail
  2019-01-01  0:00   ` Jessica Yu
  2019-01-01  0:00   ` Jessica Yu
@ 2019-01-01  0:00   ` Jessica Yu
  2019-01-01  0:00     ` Matthias Maennich via libabigail
  2 siblings, 1 reply; 12+ messages in thread
From: Jessica Yu @ 2019-01-01  0:00 UTC (permalink / raw)
  To: Matthias Maennich; +Cc: libabigail, Dodji Seketeli, kernel-team

+++ Matthias Maennich [19/08/19 09:05 +0100]:
>Hi Jessica!
>
>On Fri, 09 Aug, 17:49, Jessica Yu wrote:
>>As described in commit ad8c2531fb9, the format of the Linux kernel
>>ksymtab changed in v4.19 to use relative references instead of absolute
>>references. This changes the type of relocations emitted for ksymtab
>>sections to be place-relative 32-bit relocations instead of absolute
>>relocations. One side-effect of this is that libdwfl will not relocate
>>the ksymtab sections due to the PC-relative relocations. This breaks
>>load_kernel_symbol_table() for kernel modules because it only reads in
>>zeros from the unrelocated ksymtab section and is subsequently unable to
>>determine what exported symbols it refers to. Since a vmlinux binary is
>>already fully linked and relocated (and therefore we can read its
>>ksymtab section just fine), this problem is only relevant to Linux
>>kernel modules.
>>
>>To work around this, we utilize the ksymtab relocation sections to
>>determine which symbols the ksymtab entries refer to. We do this by
>>inspecting each relocation's r_info field for the symbol table index and
>>from there we are able to read each symbol's value and subsequently add
>>that to the set of exported symbols.
>>
>>In addition, for Linux kernel modules, we can utilize relocation types
>>to implement a new heuristic to determine the ksymtab format we have.
>>The presence of PC-relative relocations suggest the new v4.19 format,
>>and absolute relocation types suggest the old pre v4.19 format.
>>
>>        * include/abg-ir.h (elf_symbol::{elf_symbol, create}): Take new
>>        symbol value and shndx parameters.
>>        (elf_symbol::{get_value, get_shndx}): Declare new accessors.
>>        * src/abg-ir.cc (elf_symbol::priv::{value_, shndx_}): New data members.
>>        (elf_symbol::priv::priv): Adjust.
>>        (elf_symbol::elf_symbol): Take new value and shndx parameters.
>>        (elf_symbol::create): Likewise.
>>        (elf_symbol::{get_value, get_shndx}): Define new accessors.
>>        * src/abg-reader.cc (build_elf_symbol): Adjust.
>>        * src/abg-dwarf-reader.cc (lookup_symbol_from_sysv_hash_tab)
>>        (lookup_symbol_from_gnu_hash_tab)
>>        (lookup_symbol_from_symtab): Adjust.
>>        (read_context::{ksymtab_reloc_section_,
>>        ksymtab_gpl_reloc_section_,
>>        ksymtab_strings_section_}): New data members.
>>        (read_context::read_context): Initialize ksymtab_reloc_section_,
>>        ksymtab_gpl_reloc_section_, ksymtab_strings_section_.
>>        (read_context::{find_ksymtab_reloc_section,
>>        find_ksymtab_gpl_reloc_section, find_ksymtab_strings_section,
>>        find_any_ksymtab_reloc_section, get_ksymtab_format_module,
>>        populate_symbol_map_from_ksymtab,
>>        populate_symbol_map_from_ksymtab_reloc,
>>        is_linux_kernel_module}): New member functions.
>>        (read_context::load_kernel_symbol_table): Adjust to call either
>>        populate_symbol_map_from_ksymtab{_reloc,} depending on ksymtab
>>        format.
>>        (read_context::get_ksymtab_format): Adjust to call
>>        get_ksymtab_format_module for linux kernel modules.
>>        (read_context::lookup_elf_symbol_from_index): Adjust.
>>        (create_default_var_sym, create_default_fn_sym): Adjust.
>>
>>Signed-off-by: Jessica Yu <jeyu@kernel.org>
>>---
>>include/abg-ir.h        |  10 ++
>>src/abg-dwarf-reader.cc | 391 +++++++++++++++++++++++++++++++++++++++++-------
>>src/abg-ir.cc           |  38 ++++-
>>src/abg-reader.cc       |   2 +
>>4 files changed, 388 insertions(+), 53 deletions(-)
>>
>>diff --git a/include/abg-ir.h b/include/abg-ir.h
>>index 170a844..d26f472 100644
>>--- a/include/abg-ir.h
>>+++ b/include/abg-ir.h
>>@@ -796,6 +796,8 @@ private:
>>  elf_symbol(const environment* e,
>>	     size_t		i,
>>	     size_t		s,
>>+	     uint64_t		v,
>>+	     uint32_t		x,
>
>I would like to see some comments here to describe the purpose of the
>struct's members. I know there weren't any before, but maybe that is a
>good chance.
>
>Also, I am not entirely sure why these are all one-character names, but
>it makes it hard to come from x to "section header index". Maybe we
>should give them more meaningful names.
>
>>	     const string&	n,
>>	     type		t,
>>	     binding		b,
>>@@ -818,6 +820,8 @@ public:
>>  create(const environment*	e,
>>	 size_t		i,
>>	 size_t		s,
>>+	 uint64_t	v,
>>+	 uint32_t	x,
>>	 const string&		n,
>>	 type			t,
>>	 binding		b,
>>@@ -838,6 +842,12 @@ public:
>>  void
>>  set_index(size_t);
>>
>>+  uint64_t
>>+  get_value() const;
>>+
>>+  uint32_t
>>+  get_shndx() const;
>>+
>>  const string&
>>  get_name() const;
>>
>>diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>>index 47af7e3..835bf3d 100644
>>--- a/src/abg-dwarf-reader.cc
>>+++ b/src/abg-dwarf-reader.cc
>>@@ -1724,6 +1724,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>  GElf_Sym symbol;
>>  const char* sym_name_str;
>>  size_t sym_size;
>>+  uint64_t sym_value;
>>+  uint32_t sym_shndx;
>>  elf_symbol::type sym_type;
>>  elf_symbol::binding sym_binding;
>>  elf_symbol::visibility sym_visibility;
>>@@ -1743,6 +1745,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>	  sym_visibility =
>>	    stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(symbol.st_other));
>>	  sym_size = symbol.st_size;
>>+	  sym_value = symbol.st_value;
>>+	  sym_shndx = symbol.st_shndx;
>>	  elf_symbol::version ver;
>>	  if (get_version_for_symbol(elf_handle, symbol_index,
>>				     /*get_def_version=*/true, ver))
>>@@ -1751,6 +1755,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>	    elf_symbol::create(env,
>>			       symbol_index,
>>			       sym_size,
>>+			       sym_value,
>>+			       sym_shndx,
>>			       sym_name_str,
>>			       sym_type,
>>			       sym_binding,
>>@@ -2028,7 +2034,8 @@ lookup_symbol_from_gnu_hash_tab(const environment*		env,
>>	    ABG_ASSERT(!ver.str().empty());
>>
>>	  elf_symbol_sptr symbol_found =
>>-	    elf_symbol::create(env, i, symbol.st_size, sym_name_str,
>>+	    elf_symbol::create(env, i, symbol.st_size, symbol.st_value,
>>+			       symbol.st_shndx, sym_name_str,
>>			       sym_type, sym_binding,
>>			       symbol.st_shndx != SHN_UNDEF,
>>			       symbol.st_shndx == SHN_COMMON,
>>@@ -2181,6 +2188,7 @@ lookup_symbol_from_symtab(const environment*		env,
>>	    ABG_ASSERT(!ver.str().empty());
>>	  elf_symbol_sptr symbol_found =
>>	    elf_symbol::create(env, i, sym->st_size,
>>+			       sym->st_value, sym->st_shndx,
>>			       name_str, sym_type,
>>			       sym_binding, sym_is_defined,
>>			       sym_is_common, ver, sym_visibility);
>>@@ -3081,7 +3089,10 @@ public:
>>  /// to symbols exported using the EXPORT_SYMBOL_GPL macro from the
>>  /// linux kernel.
>>  Elf_Scn*			ksymtab_section_;
>>+  Elf_Scn*			ksymtab_reloc_section_;
>>  Elf_Scn*			ksymtab_gpl_section_;
>>+  Elf_Scn*			ksymtab_gpl_reloc_section_;
>>+  Elf_Scn*			ksymtab_strings_section_;
>>  Elf_Scn*			versym_section_;
>>  Elf_Scn*			verdef_section_;
>>  Elf_Scn*			verneed_section_;
>>@@ -3277,7 +3288,10 @@ public:
>>    nb_ksymtab_entries_ = 0;
>>    nb_ksymtab_gpl_entries_ = 0;
>>    ksymtab_section_ = 0;
>>+    ksymtab_reloc_section_ = 0;
>>    ksymtab_gpl_section_ = 0;
>>+    ksymtab_gpl_reloc_section_ = 0;
>>+    ksymtab_strings_section_ = 0;
>>    versym_section_ = 0;
>>    verdef_section_ = 0;
>>    verneed_section_ = 0;
>>@@ -6079,6 +6093,23 @@ public:
>>    return ksymtab_section_;
>>  }
>>
>>+  /// Return the .rel{a,}__ksymtab section of a linux kernel ELF file (either
>>+  /// a vmlinux binary or a kernel module).
>>+  ///
>>+  /// @return the .rel{a,}__ksymtab section if found, nil otherwise.
>>+  Elf_Scn*
>>+  find_ksymtab_reloc_section() const
>>+  {
>>+    if (!ksymtab_reloc_section_)
>>+      {
>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab", SHT_RELA);
>>+        if (!sec)
>>+          sec = find_section(elf_handle(), ".rel__ksymtab", SHT_REL);
>>+        const_cast<read_context*>(this)->ksymtab_reloc_section_ = sec;
>>+      }
>>+    return ksymtab_reloc_section_;
>>+  }
>>+
>>  /// Return the __ksymtab_gpl section of a linux kernel ELF file
>>  /// (either a vmlinux binary or a kernel module).
>>  ///
>>@@ -6092,6 +6123,36 @@ public:
>>    return ksymtab_gpl_section_;
>>  }
>>
>>+  /// Return the .rel{a,}__ksymtab_gpl section of a linux kernel ELF file
>>+  /// (either a vmlinux binary or a kernel module).
>>+  ///
>>+  /// @return the .rel{a,}__ksymtab_gpl section if found, nil otherwise.
>>+  Elf_Scn*
>>+  find_ksymtab_gpl_reloc_section() const
>>+  {
>>+    if (!ksymtab_gpl_reloc_section_)
>>+      {
>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab_gpl", SHT_RELA);
>>+        if (!sec)
>>+          sec = find_section(elf_handle(), ".rel__ksymtab_gpl", SHT_REL);
>>+        const_cast<read_context*>(this)->ksymtab_gpl_reloc_section_ = sec;
>>+      }
>>+    return ksymtab_gpl_reloc_section_;
>>+  }
>>+
>>+  /// Return the __ksymtab_strings section of a linux kernel ELF file
>>+  /// (either a vmlinux binary or a kernel module).
>>+  ///
>>+  /// @return the __ksymtab_strings section if found, nil otherwise.
>>+  Elf_Scn*
>>+  find_ksymtab_strings_section() const
>>+  {
>>+    if (!ksymtab_strings_section_)
>>+      const_cast<read_context*>(this)->ksymtab_strings_section_ =
>>+        find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);
>>+    return ksymtab_strings_section_;
>>+  }
>>+
>
>We should find a way to deduplicate these functions. Not necessarily in
>this patch, but in a followup. Most of them only differ in the section
>name and whether they check for an alternative. Something generic like
>
> struct SectionQuery{ string name, int type };
> Elf_Scn* find_section(const vector<SectionQuery>&);
>
>should be sufficient to forward from all the other variants to one
>implementation.
>
>>  /// Return either a __ksymtab or a __ksymtab_gpl section, in case
>>  /// only the __ksymtab_gpl exists.
>>  ///
>>@@ -6106,6 +6167,19 @@ public:
>>    return result;
>>  }
>>
>>+  /// Return either a .rel{a,}__ksymtab or a .rel{a,}__ksymtab_gpl section
>>+  ///
>>+  /// @return the .rel{a,}__ksymtab section if it exists, or the
>>+  /// .rel{a,}__ksymtab_gpl; or NULL if neither is found.
>>+  Elf_Scn*
>>+  find_any_ksymtab_reloc_section() const
>>+  {
>>+    Elf_Scn *result = find_ksymtab_reloc_section();
>>+    if (!result)
>>+      result = find_ksymtab_gpl_reloc_section();
>>+    return result;
>>+  }
>>+
>>  /// Return the SHT_GNU_versym, SHT_GNU_verdef and SHT_GNU_verneed
>>  /// sections that are involved in symbol versionning.
>>  ///
>>@@ -6283,7 +6357,8 @@ public:
>>      stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(s->st_other));
>>
>>    elf_symbol_sptr sym =
>>-      elf_symbol::create(env(), symbol_index, s->st_size, name_str,
>>+      elf_symbol::create(env(), symbol_index, s->st_size,
>>+			 s->st_value, s->st_shndx, name_str,
>>			 stt_to_elf_symbol_type(GELF_ST_TYPE(s->st_info)),
>>			 stb_to_elf_symbol_binding(GELF_ST_BIND(s->st_info)),
>>			 sym_is_defined, sym_is_common, ver, vis);
>>@@ -7435,6 +7510,64 @@ public:
>>    return symbol;
>>  }
>>
>>+  /// Determine the format of the __ksymtab and __ksymtab_gpl
>>+  /// sections of Linux kernel modules.
>>+  ///
>>+  /// This is important because we need the know the format of these
>
>s/need the/need to/                       ^^^
>
>>+  /// sections to be able to read from them.
>>+  ///
>>+  /// @return the format the __ksymtab[_gpl] sections.
>>+  enum ksymtab_format
>>+  get_ksymtab_format_module() const
>>+  {
>>+    Elf_Scn *section = find_any_ksymtab_reloc_section();
>>+
>>+    if (!section)
>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>+
>>+    // Libdwfl has a weird quirk where, in the process of obtaining an Elf
>>+    // descriptor via dwfl_module_getelf(), it will apply all relocations it
>>+    // knows how to and it will zero a relocation after applying it. If the
>
>will zero the relocation info after
>
>>+    // .rela__ksymtab* section contained only simple (absolute) relocations,
>>+    // they will have been all applied and sh_size will be 0. For arches that
>>+    // support relative ksymtabs, simple relocations only appear in pre-4.19
>>+    // kernel modules.
>>+    GElf_Shdr section_mem;
>>+    GElf_Shdr *section_shdr = gelf_getshdr(section, &section_mem);
>>+    if (section_shdr->sh_size == 0)
>>+      return PRE_V4_19_KSYMTAB_FORMAT;
>>+    /* SHT_REL sections not supported yet */
>>+    if (section_shdr->sh_type == SHT_REL)
>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>+
>>+    // If we still have a normal non-zeroed relocation section, we can guess
>>+    // what format the ksymtab is in depending on what types of relocs it
>>+    // contains.
>>+    GElf_Rela rela;
>>+    Elf_Data *rela_section_data = elf_rawdata(section, 0);
>>+    gelf_getrela(rela_section_data, 0, &rela);
>>+
>>+    // Sigh, I dislike the arch-dependent code here, but this seems to be a
>>+    // reliable heuristic for kernel modules for now. Relative ksymtabs only
>>+    // supported on x86 and arm64 as of v4.19.
>>+    ksymtab_format format;
>>+    switch (GELF_R_TYPE(rela.r_info))
>>+      {
>>+      case R_X86_64_64: // Same as R_386_32
>
>add 'fallthrough' to the comments, else younger compilers will warn
>about the potentially unintentional fallthrough.
>
>>+      case R_AARCH64_ABS64:
>>+        format = PRE_V4_19_KSYMTAB_FORMAT;
>>+	break;
>>+      case R_X86_64_PC32: // Same as R_386_PC32
>>+      case R_AARCH64_PREL32:
>>+	format = V4_19_KSYMTAB_FORMAT;
>>+	break;
>>+      default:
>>+        format = UNDEFINED_KSYMTAB_FORMAT;
>
>We should probably emit a warning in this case. Are there more
>architectures that have PREL32 enabled as of now that are not yet
>covered here?

On second thought, I don't think we should actually warn *here* at
this point, inside get_ksymtab_format_module(), because then we would
have to include/check for reloc types for all arches that libabigail
supports.

For instance if we have a ppc64 module, we would hit the default case
and warn immediately, but instead we should be falling back to
try_reading_first_ksymtab_entry_using_pre_v4_19_format(), which
gets called when get_ksymtab_format_module() returns
UNDEFINED_KSYMTAB_FORMAT. We should warn when we have already
exhausted all methods of determining the ksymtab format, which is what
the current code does. The current sequence of calls is (1) if it's a
module, see if we recognize any of the special PREL32 reloc types that
we know of. if not, try method (2), call try_reading_first_ksymtab_entry_using_pre_v4_19_format(),
then try method (3), try_reading_first_ksymtab_entry_using_v4_19_format(),
then finally do ABG_ASSERT when all fails. I hope I'm making sense..
Does that sound reasonable?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] Support pre and post v4.19 ksymtabs for Linux kernel modules
  2019-01-01  0:00     ` Matthias Maennich via libabigail
@ 2019-01-01  0:00       ` Jessica Yu
  0 siblings, 0 replies; 12+ messages in thread
From: Jessica Yu @ 2019-01-01  0:00 UTC (permalink / raw)
  To: Matthias Maennich; +Cc: libabigail, Dodji Seketeli, Android Kernel Team

+++ Matthias Maennich [12/09/19 11:01 +0100]:
>On Tue, Sep 03, 2019 at 04:23:37PM +0200, Jessica Yu wrote:
>>+++ Matthias Maennich [19/08/19 09:05 +0100]:
>>>Hi Jessica!
>>>
>>>On Fri, 09 Aug, 17:49, Jessica Yu wrote:
>>>>As described in commit ad8c2531fb9, the format of the Linux kernel
>>>>ksymtab changed in v4.19 to use relative references instead of absolute
>>>>references. This changes the type of relocations emitted for ksymtab
>>>>sections to be place-relative 32-bit relocations instead of absolute
>>>>relocations. One side-effect of this is that libdwfl will not relocate
>>>>the ksymtab sections due to the PC-relative relocations. This breaks
>>>>load_kernel_symbol_table() for kernel modules because it only reads in
>>>>zeros from the unrelocated ksymtab section and is subsequently unable to
>>>>determine what exported symbols it refers to. Since a vmlinux binary is
>>>>already fully linked and relocated (and therefore we can read its
>>>>ksymtab section just fine), this problem is only relevant to Linux
>>>>kernel modules.
>>>>
>>>>To work around this, we utilize the ksymtab relocation sections to
>>>>determine which symbols the ksymtab entries refer to. We do this by
>>>>inspecting each relocation's r_info field for the symbol table index and
>>>>from there we are able to read each symbol's value and subsequently add
>>>>that to the set of exported symbols.
>>>>
>>>>In addition, for Linux kernel modules, we can utilize relocation types
>>>>to implement a new heuristic to determine the ksymtab format we have.
>>>>The presence of PC-relative relocations suggest the new v4.19 format,
>>>>and absolute relocation types suggest the old pre v4.19 format.
>>>>
>>>>      * include/abg-ir.h (elf_symbol::{elf_symbol, create}): Take new
>>>>      symbol value and shndx parameters.
>>>>      (elf_symbol::{get_value, get_shndx}): Declare new accessors.
>>>>      * src/abg-ir.cc (elf_symbol::priv::{value_, shndx_}): New data members.
>>>>      (elf_symbol::priv::priv): Adjust.
>>>>      (elf_symbol::elf_symbol): Take new value and shndx parameters.
>>>>      (elf_symbol::create): Likewise.
>>>>      (elf_symbol::{get_value, get_shndx}): Define new accessors.
>>>>      * src/abg-reader.cc (build_elf_symbol): Adjust.
>>>>      * src/abg-dwarf-reader.cc (lookup_symbol_from_sysv_hash_tab)
>>>>      (lookup_symbol_from_gnu_hash_tab)
>>>>      (lookup_symbol_from_symtab): Adjust.
>>>>      (read_context::{ksymtab_reloc_section_,
>>>>      ksymtab_gpl_reloc_section_,
>>>>      ksymtab_strings_section_}): New data members.
>>>>      (read_context::read_context): Initialize ksymtab_reloc_section_,
>>>>      ksymtab_gpl_reloc_section_, ksymtab_strings_section_.
>>>>      (read_context::{find_ksymtab_reloc_section,
>>>>      find_ksymtab_gpl_reloc_section, find_ksymtab_strings_section,
>>>>      find_any_ksymtab_reloc_section, get_ksymtab_format_module,
>>>>      populate_symbol_map_from_ksymtab,
>>>>      populate_symbol_map_from_ksymtab_reloc,
>>>>      is_linux_kernel_module}): New member functions.
>>>>      (read_context::load_kernel_symbol_table): Adjust to call either
>>>>      populate_symbol_map_from_ksymtab{_reloc,} depending on ksymtab
>>>>      format.
>>>>      (read_context::get_ksymtab_format): Adjust to call
>>>>      get_ksymtab_format_module for linux kernel modules.
>>>>      (read_context::lookup_elf_symbol_from_index): Adjust.
>>>>      (create_default_var_sym, create_default_fn_sym): Adjust.
>>>>
>>>>Signed-off-by: Jessica Yu <jeyu@kernel.org>
>>>>---
>>>>include/abg-ir.h        |  10 ++
>>>>src/abg-dwarf-reader.cc | 391 +++++++++++++++++++++++++++++++++++++++++-------
>>>>src/abg-ir.cc           |  38 ++++-
>>>>src/abg-reader.cc       |   2 +
>>>>4 files changed, 388 insertions(+), 53 deletions(-)
>>>>
>>>>diff --git a/include/abg-ir.h b/include/abg-ir.h
>>>>index 170a844..d26f472 100644
>>>>--- a/include/abg-ir.h
>>>>+++ b/include/abg-ir.h
>>>>@@ -796,6 +796,8 @@ private:
>>>>elf_symbol(const environment* e,
>>>>	     size_t		i,
>>>>	     size_t		s,
>>>>+	     uint64_t		v,
>>>>+	     uint32_t		x,
>>>
>>>I would like to see some comments here to describe the purpose of the
>>>struct's members. I know there weren't any before, but maybe that is a
>>>good chance.
>>>
>>>Also, I am not entirely sure why these are all one-character names, but
>>>it makes it hard to come from x to "section header index". Maybe we
>>>should give them more meaningful names.
>>>
>>>>	     const string&	n,
>>>>	     type		t,
>>>>	     binding		b,
>>>>@@ -818,6 +820,8 @@ public:
>>>>create(const environment*	e,
>>>>	 size_t		i,
>>>>	 size_t		s,
>>>>+	 uint64_t	v,
>>>>+	 uint32_t	x,
>>>>	 const string&		n,
>>>>	 type			t,
>>>>	 binding		b,
>>>>@@ -838,6 +842,12 @@ public:
>>>>void
>>>>set_index(size_t);
>>>>
>>>>+  uint64_t
>>>>+  get_value() const;
>>>>+
>>>>+  uint32_t
>>>>+  get_shndx() const;
>>>>+
>>>>const string&
>>>>get_name() const;
>>>>
>>>>diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>>>>index 47af7e3..835bf3d 100644
>>>>--- a/src/abg-dwarf-reader.cc
>>>>+++ b/src/abg-dwarf-reader.cc
>>>>@@ -1724,6 +1724,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>>GElf_Sym symbol;
>>>>const char* sym_name_str;
>>>>size_t sym_size;
>>>>+  uint64_t sym_value;
>>>>+  uint32_t sym_shndx;
>>>>elf_symbol::type sym_type;
>>>>elf_symbol::binding sym_binding;
>>>>elf_symbol::visibility sym_visibility;
>>>>@@ -1743,6 +1745,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>>	  sym_visibility =
>>>>	    stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(symbol.st_other));
>>>>	  sym_size = symbol.st_size;
>>>>+	  sym_value = symbol.st_value;
>>>>+	  sym_shndx = symbol.st_shndx;
>>>>	  elf_symbol::version ver;
>>>>	  if (get_version_for_symbol(elf_handle, symbol_index,
>>>>				     /*get_def_version=*/true, ver))
>>>>@@ -1751,6 +1755,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>>	    elf_symbol::create(env,
>>>>			       symbol_index,
>>>>			       sym_size,
>>>>+			       sym_value,
>>>>+			       sym_shndx,
>>>>			       sym_name_str,
>>>>			       sym_type,
>>>>			       sym_binding,
>>>>@@ -2028,7 +2034,8 @@ lookup_symbol_from_gnu_hash_tab(const environment*		env,
>>>>	    ABG_ASSERT(!ver.str().empty());
>>>>
>>>>	  elf_symbol_sptr symbol_found =
>>>>-	    elf_symbol::create(env, i, symbol.st_size, sym_name_str,
>>>>+	    elf_symbol::create(env, i, symbol.st_size, symbol.st_value,
>>>>+			       symbol.st_shndx, sym_name_str,
>>>>			       sym_type, sym_binding,
>>>>			       symbol.st_shndx != SHN_UNDEF,
>>>>			       symbol.st_shndx == SHN_COMMON,
>>>>@@ -2181,6 +2188,7 @@ lookup_symbol_from_symtab(const environment*		env,
>>>>	    ABG_ASSERT(!ver.str().empty());
>>>>	  elf_symbol_sptr symbol_found =
>>>>	    elf_symbol::create(env, i, sym->st_size,
>>>>+			       sym->st_value, sym->st_shndx,
>>>>			       name_str, sym_type,
>>>>			       sym_binding, sym_is_defined,
>>>>			       sym_is_common, ver, sym_visibility);
>>>>@@ -3081,7 +3089,10 @@ public:
>>>>/// to symbols exported using the EXPORT_SYMBOL_GPL macro from the
>>>>/// linux kernel.
>>>>Elf_Scn*			ksymtab_section_;
>>>>+  Elf_Scn*			ksymtab_reloc_section_;
>>>>Elf_Scn*			ksymtab_gpl_section_;
>>>>+  Elf_Scn*			ksymtab_gpl_reloc_section_;
>>>>+  Elf_Scn*			ksymtab_strings_section_;
>>>>Elf_Scn*			versym_section_;
>>>>Elf_Scn*			verdef_section_;
>>>>Elf_Scn*			verneed_section_;
>>>>@@ -3277,7 +3288,10 @@ public:
>>>>  nb_ksymtab_entries_ = 0;
>>>>  nb_ksymtab_gpl_entries_ = 0;
>>>>  ksymtab_section_ = 0;
>>>>+    ksymtab_reloc_section_ = 0;
>>>>  ksymtab_gpl_section_ = 0;
>>>>+    ksymtab_gpl_reloc_section_ = 0;
>>>>+    ksymtab_strings_section_ = 0;
>>>>  versym_section_ = 0;
>>>>  verdef_section_ = 0;
>>>>  verneed_section_ = 0;
>>>>@@ -6079,6 +6093,23 @@ public:
>>>>  return ksymtab_section_;
>>>>}
>>>>
>>>>+  /// Return the .rel{a,}__ksymtab section of a linux kernel ELF file (either
>>>>+  /// a vmlinux binary or a kernel module).
>>>>+  ///
>>>>+  /// @return the .rel{a,}__ksymtab section if found, nil otherwise.
>>>>+  Elf_Scn*
>>>>+  find_ksymtab_reloc_section() const
>>>>+  {
>>>>+    if (!ksymtab_reloc_section_)
>>>>+      {
>>>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab", SHT_RELA);
>>>>+        if (!sec)
>>>>+          sec = find_section(elf_handle(), ".rel__ksymtab", SHT_REL);
>>>>+        const_cast<read_context*>(this)->ksymtab_reloc_section_ = sec;
>>>>+      }
>>>>+    return ksymtab_reloc_section_;
>>>>+  }
>>>>+
>>>>/// Return the __ksymtab_gpl section of a linux kernel ELF file
>>>>/// (either a vmlinux binary or a kernel module).
>>>>///
>>>>@@ -6092,6 +6123,36 @@ public:
>>>>  return ksymtab_gpl_section_;
>>>>}
>>>>
>>>>+  /// Return the .rel{a,}__ksymtab_gpl section of a linux kernel ELF file
>>>>+  /// (either a vmlinux binary or a kernel module).
>>>>+  ///
>>>>+  /// @return the .rel{a,}__ksymtab_gpl section if found, nil otherwise.
>>>>+  Elf_Scn*
>>>>+  find_ksymtab_gpl_reloc_section() const
>>>>+  {
>>>>+    if (!ksymtab_gpl_reloc_section_)
>>>>+      {
>>>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab_gpl", SHT_RELA);
>>>>+        if (!sec)
>>>>+          sec = find_section(elf_handle(), ".rel__ksymtab_gpl", SHT_REL);
>>>>+        const_cast<read_context*>(this)->ksymtab_gpl_reloc_section_ = sec;
>>>>+      }
>>>>+    return ksymtab_gpl_reloc_section_;
>>>>+  }
>>>>+
>>>>+  /// Return the __ksymtab_strings section of a linux kernel ELF file
>>>>+  /// (either a vmlinux binary or a kernel module).
>>>>+  ///
>>>>+  /// @return the __ksymtab_strings section if found, nil otherwise.
>>>>+  Elf_Scn*
>>>>+  find_ksymtab_strings_section() const
>>>>+  {
>>>>+    if (!ksymtab_strings_section_)
>>>>+      const_cast<read_context*>(this)->ksymtab_strings_section_ =
>>>>+        find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);
>>>>+    return ksymtab_strings_section_;
>>>>+  }
>>>>+
>>>
>>>We should find a way to deduplicate these functions. Not necessarily in
>>>this patch, but in a followup. Most of them only differ in the section
>>>name and whether they check for an alternative. Something generic like
>>>
>>>struct SectionQuery{ string name, int type };
>>>Elf_Scn* find_section(const vector<SectionQuery>&);
>>>
>>>should be sufficient to forward from all the other variants to one
>>>implementation.
>>>
>>>>/// Return either a __ksymtab or a __ksymtab_gpl section, in case
>>>>/// only the __ksymtab_gpl exists.
>>>>///
>>>>@@ -6106,6 +6167,19 @@ public:
>>>>  return result;
>>>>}
>>>>
>>>>+  /// Return either a .rel{a,}__ksymtab or a .rel{a,}__ksymtab_gpl section
>>>>+  ///
>>>>+  /// @return the .rel{a,}__ksymtab section if it exists, or the
>>>>+  /// .rel{a,}__ksymtab_gpl; or NULL if neither is found.
>>>>+  Elf_Scn*
>>>>+  find_any_ksymtab_reloc_section() const
>>>>+  {
>>>>+    Elf_Scn *result = find_ksymtab_reloc_section();
>>>>+    if (!result)
>>>>+      result = find_ksymtab_gpl_reloc_section();
>>>>+    return result;
>>>>+  }
>>>>+
>>>>/// Return the SHT_GNU_versym, SHT_GNU_verdef and SHT_GNU_verneed
>>>>/// sections that are involved in symbol versionning.
>>>>///
>>>>@@ -6283,7 +6357,8 @@ public:
>>>>    stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(s->st_other));
>>>>
>>>>  elf_symbol_sptr sym =
>>>>-      elf_symbol::create(env(), symbol_index, s->st_size, name_str,
>>>>+      elf_symbol::create(env(), symbol_index, s->st_size,
>>>>+			 s->st_value, s->st_shndx, name_str,
>>>>			 stt_to_elf_symbol_type(GELF_ST_TYPE(s->st_info)),
>>>>			 stb_to_elf_symbol_binding(GELF_ST_BIND(s->st_info)),
>>>>			 sym_is_defined, sym_is_common, ver, vis);
>>>>@@ -7435,6 +7510,64 @@ public:
>>>>  return symbol;
>>>>}
>>>>
>>>>+  /// Determine the format of the __ksymtab and __ksymtab_gpl
>>>>+  /// sections of Linux kernel modules.
>>>>+  ///
>>>>+  /// This is important because we need the know the format of these
>>>
>>>s/need the/need to/                       ^^^
>>>
>>>>+  /// sections to be able to read from them.
>>>>+  ///
>>>>+  /// @return the format the __ksymtab[_gpl] sections.
>>>>+  enum ksymtab_format
>>>>+  get_ksymtab_format_module() const
>>>>+  {
>>>>+    Elf_Scn *section = find_any_ksymtab_reloc_section();
>>>>+
>>>>+    if (!section)
>>>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>>>+
>>>>+    // Libdwfl has a weird quirk where, in the process of obtaining an Elf
>>>>+    // descriptor via dwfl_module_getelf(), it will apply all relocations it
>>>>+    // knows how to and it will zero a relocation after applying it. If the
>>>
>>>will zero the relocation info after
>>>
>>>>+    // .rela__ksymtab* section contained only simple (absolute) relocations,
>>>>+    // they will have been all applied and sh_size will be 0. For arches that
>>>>+    // support relative ksymtabs, simple relocations only appear in pre-4.19
>>>>+    // kernel modules.
>>>>+    GElf_Shdr section_mem;
>>>>+    GElf_Shdr *section_shdr = gelf_getshdr(section, &section_mem);
>>>>+    if (section_shdr->sh_size == 0)
>>>>+      return PRE_V4_19_KSYMTAB_FORMAT;
>>>>+    /* SHT_REL sections not supported yet */
>>>>+    if (section_shdr->sh_type == SHT_REL)
>>>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>>>+
>>>>+    // If we still have a normal non-zeroed relocation section, we can guess
>>>>+    // what format the ksymtab is in depending on what types of relocs it
>>>>+    // contains.
>>>>+    GElf_Rela rela;
>>>>+    Elf_Data *rela_section_data = elf_rawdata(section, 0);
>>>>+    gelf_getrela(rela_section_data, 0, &rela);
>>>>+
>>>>+    // Sigh, I dislike the arch-dependent code here, but this seems to be a
>>>>+    // reliable heuristic for kernel modules for now. Relative ksymtabs only
>>>>+    // supported on x86 and arm64 as of v4.19.
>>>>+    ksymtab_format format;
>>>>+    switch (GELF_R_TYPE(rela.r_info))
>>>>+      {
>>>>+      case R_X86_64_64: // Same as R_386_32
>>>
>>>add 'fallthrough' to the comments, else younger compilers will warn
>>>about the potentially unintentional fallthrough.
>>>
>>>>+      case R_AARCH64_ABS64:
>>>>+        format = PRE_V4_19_KSYMTAB_FORMAT;
>>>>+	break;
>>>>+      case R_X86_64_PC32: // Same as R_386_PC32
>>>>+      case R_AARCH64_PREL32:
>>>>+	format = V4_19_KSYMTAB_FORMAT;
>>>>+	break;
>>>>+      default:
>>>>+        format = UNDEFINED_KSYMTAB_FORMAT;
>>>
>>>We should probably emit a warning in this case. Are there more
>>>architectures that have PREL32 enabled as of now that are not yet
>>>covered here?
>>
>>On second thought, I don't think we should actually warn *here* at
>>this point, inside get_ksymtab_format_module(), because then we would
>>have to include/check for reloc types for all arches that libabigail
>>supports.
>>
>>For instance if we have a ppc64 module, we would hit the default case
>>and warn immediately, but instead we should be falling back to
>>try_reading_first_ksymtab_entry_using_pre_v4_19_format(), which
>>gets called when get_ksymtab_format_module() returns
>>UNDEFINED_KSYMTAB_FORMAT. We should warn when we have already
>>exhausted all methods of determining the ksymtab format, which is what
>>the current code does. The current sequence of calls is (1) if it's a
>>module, see if we recognize any of the special PREL32 reloc types that
>>we know of. if not, try method (2), call try_reading_first_ksymtab_entry_using_pre_v4_19_format(),
>>then try method (3), try_reading_first_ksymtab_entry_using_v4_19_format(),
>>then finally do ABG_ASSERT when all fails. I hope I'm making sense..
>>Does that sound reasonable?
>
>Hi Jessica,
>
>I think we are talking about the same thing. All I want to prevent is a
>misleading fallback. So, yes, we should try all known methods and should
>only warn if we are not sure whether the result we came up is actually
>correct. Which again leads also to the question: Do you think it is
>reasonable to have something in the kernel binaries to declare the
>ksymtab format? In a separate section or a magic symbol? Something to
>make it easier (and safer) for tools like libabigail?

Yes, I think that would be reasonable, although I have not yet thought
of a non-intrusive/kabi-compatible way of expressing that yet. Perhaps
we could stick a magic string in __ksymtab_strings or a separate
section like you suggest, maybe named .ksymtab_format with a magic
value indicating the type. Although with the merge window at our heels
it might have to be for 5.5, but that's plenty of time to come up with
something at least.

Thanks,

Jessica

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] Support pre and post v4.19 ksymtabs for Linux kernel modules
  2019-01-01  0:00 ` Matthias Maennich via libabigail
@ 2019-01-01  0:00   ` Jessica Yu
  2019-01-01  0:00     ` Matthias Maennich via libabigail
  2019-01-01  0:00   ` Jessica Yu
  2019-01-01  0:00   ` Jessica Yu
  2 siblings, 1 reply; 12+ messages in thread
From: Jessica Yu @ 2019-01-01  0:00 UTC (permalink / raw)
  To: Matthias Maennich; +Cc: libabigail, Dodji Seketeli, kernel-team

+++ Matthias Maennich [19/08/19 09:05 +0100]:
>Hi Jessica!
>
>On Fri, 09 Aug, 17:49, Jessica Yu wrote:
>>As described in commit ad8c2531fb9, the format of the Linux kernel
>>ksymtab changed in v4.19 to use relative references instead of absolute
>>references. This changes the type of relocations emitted for ksymtab
>>sections to be place-relative 32-bit relocations instead of absolute
>>relocations. One side-effect of this is that libdwfl will not relocate
>>the ksymtab sections due to the PC-relative relocations. This breaks
>>load_kernel_symbol_table() for kernel modules because it only reads in
>>zeros from the unrelocated ksymtab section and is subsequently unable to
>>determine what exported symbols it refers to. Since a vmlinux binary is
>>already fully linked and relocated (and therefore we can read its
>>ksymtab section just fine), this problem is only relevant to Linux
>>kernel modules.
>>
>>To work around this, we utilize the ksymtab relocation sections to
>>determine which symbols the ksymtab entries refer to. We do this by
>>inspecting each relocation's r_info field for the symbol table index and
>>from there we are able to read each symbol's value and subsequently add
>>that to the set of exported symbols.
>>
>>In addition, for Linux kernel modules, we can utilize relocation types
>>to implement a new heuristic to determine the ksymtab format we have.
>>The presence of PC-relative relocations suggest the new v4.19 format,
>>and absolute relocation types suggest the old pre v4.19 format.
>>
>>        * include/abg-ir.h (elf_symbol::{elf_symbol, create}): Take new
>>        symbol value and shndx parameters.
>>        (elf_symbol::{get_value, get_shndx}): Declare new accessors.
>>        * src/abg-ir.cc (elf_symbol::priv::{value_, shndx_}): New data members.
>>        (elf_symbol::priv::priv): Adjust.
>>        (elf_symbol::elf_symbol): Take new value and shndx parameters.
>>        (elf_symbol::create): Likewise.
>>        (elf_symbol::{get_value, get_shndx}): Define new accessors.
>>        * src/abg-reader.cc (build_elf_symbol): Adjust.
>>        * src/abg-dwarf-reader.cc (lookup_symbol_from_sysv_hash_tab)
>>        (lookup_symbol_from_gnu_hash_tab)
>>        (lookup_symbol_from_symtab): Adjust.
>>        (read_context::{ksymtab_reloc_section_,
>>        ksymtab_gpl_reloc_section_,
>>        ksymtab_strings_section_}): New data members.
>>        (read_context::read_context): Initialize ksymtab_reloc_section_,
>>        ksymtab_gpl_reloc_section_, ksymtab_strings_section_.
>>        (read_context::{find_ksymtab_reloc_section,
>>        find_ksymtab_gpl_reloc_section, find_ksymtab_strings_section,
>>        find_any_ksymtab_reloc_section, get_ksymtab_format_module,
>>        populate_symbol_map_from_ksymtab,
>>        populate_symbol_map_from_ksymtab_reloc,
>>        is_linux_kernel_module}): New member functions.
>>        (read_context::load_kernel_symbol_table): Adjust to call either
>>        populate_symbol_map_from_ksymtab{_reloc,} depending on ksymtab
>>        format.
>>        (read_context::get_ksymtab_format): Adjust to call
>>        get_ksymtab_format_module for linux kernel modules.
>>        (read_context::lookup_elf_symbol_from_index): Adjust.
>>        (create_default_var_sym, create_default_fn_sym): Adjust.
>>
>>Signed-off-by: Jessica Yu <jeyu@kernel.org>
>>---
>>include/abg-ir.h        |  10 ++
>>src/abg-dwarf-reader.cc | 391 +++++++++++++++++++++++++++++++++++++++++-------
>>src/abg-ir.cc           |  38 ++++-
>>src/abg-reader.cc       |   2 +
>>4 files changed, 388 insertions(+), 53 deletions(-)
>>
>>diff --git a/include/abg-ir.h b/include/abg-ir.h
>>index 170a844..d26f472 100644
>>--- a/include/abg-ir.h
>>+++ b/include/abg-ir.h
>>@@ -796,6 +796,8 @@ private:
>>  elf_symbol(const environment* e,
>>	     size_t		i,
>>	     size_t		s,
>>+	     uint64_t		v,
>>+	     uint32_t		x,
>
>I would like to see some comments here to describe the purpose of the
>struct's members. I know there weren't any before, but maybe that is a
>good chance.
>
>Also, I am not entirely sure why these are all one-character names, but
>it makes it hard to come from x to "section header index". Maybe we
>should give them more meaningful names.

Hi Matthias!

The parameters are documented in src/abg-ir.cc, which is why I did not
include it here. Maybe for section header index and value we can use
shndx and val if x and v are not descriptive.

>
>>	     const string&	n,
>>	     type		t,
>>	     binding		b,
>>@@ -818,6 +820,8 @@ public:
>>  create(const environment*	e,
>>	 size_t		i,
>>	 size_t		s,
>>+	 uint64_t	v,
>>+	 uint32_t	x,
>>	 const string&		n,
>>	 type			t,
>>	 binding		b,
>>@@ -838,6 +842,12 @@ public:
>>  void
>>  set_index(size_t);
>>
>>+  uint64_t
>>+  get_value() const;
>>+
>>+  uint32_t
>>+  get_shndx() const;
>>+
>>  const string&
>>  get_name() const;
>>
>>diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>>index 47af7e3..835bf3d 100644
>>--- a/src/abg-dwarf-reader.cc
>>+++ b/src/abg-dwarf-reader.cc
>>@@ -1724,6 +1724,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>  GElf_Sym symbol;
>>  const char* sym_name_str;
>>  size_t sym_size;
>>+  uint64_t sym_value;
>>+  uint32_t sym_shndx;
>>  elf_symbol::type sym_type;
>>  elf_symbol::binding sym_binding;
>>  elf_symbol::visibility sym_visibility;
>>@@ -1743,6 +1745,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>	  sym_visibility =
>>	    stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(symbol.st_other));
>>	  sym_size = symbol.st_size;
>>+	  sym_value = symbol.st_value;
>>+	  sym_shndx = symbol.st_shndx;
>>	  elf_symbol::version ver;
>>	  if (get_version_for_symbol(elf_handle, symbol_index,
>>				     /*get_def_version=*/true, ver))
>>@@ -1751,6 +1755,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>	    elf_symbol::create(env,
>>			       symbol_index,
>>			       sym_size,
>>+			       sym_value,
>>+			       sym_shndx,
>>			       sym_name_str,
>>			       sym_type,
>>			       sym_binding,
>>@@ -2028,7 +2034,8 @@ lookup_symbol_from_gnu_hash_tab(const environment*		env,
>>	    ABG_ASSERT(!ver.str().empty());
>>
>>	  elf_symbol_sptr symbol_found =
>>-	    elf_symbol::create(env, i, symbol.st_size, sym_name_str,
>>+	    elf_symbol::create(env, i, symbol.st_size, symbol.st_value,
>>+			       symbol.st_shndx, sym_name_str,
>>			       sym_type, sym_binding,
>>			       symbol.st_shndx != SHN_UNDEF,
>>			       symbol.st_shndx == SHN_COMMON,
>>@@ -2181,6 +2188,7 @@ lookup_symbol_from_symtab(const environment*		env,
>>	    ABG_ASSERT(!ver.str().empty());
>>	  elf_symbol_sptr symbol_found =
>>	    elf_symbol::create(env, i, sym->st_size,
>>+			       sym->st_value, sym->st_shndx,
>>			       name_str, sym_type,
>>			       sym_binding, sym_is_defined,
>>			       sym_is_common, ver, sym_visibility);
>>@@ -3081,7 +3089,10 @@ public:
>>  /// to symbols exported using the EXPORT_SYMBOL_GPL macro from the
>>  /// linux kernel.
>>  Elf_Scn*			ksymtab_section_;
>>+  Elf_Scn*			ksymtab_reloc_section_;
>>  Elf_Scn*			ksymtab_gpl_section_;
>>+  Elf_Scn*			ksymtab_gpl_reloc_section_;
>>+  Elf_Scn*			ksymtab_strings_section_;
>>  Elf_Scn*			versym_section_;
>>  Elf_Scn*			verdef_section_;
>>  Elf_Scn*			verneed_section_;
>>@@ -3277,7 +3288,10 @@ public:
>>    nb_ksymtab_entries_ = 0;
>>    nb_ksymtab_gpl_entries_ = 0;
>>    ksymtab_section_ = 0;
>>+    ksymtab_reloc_section_ = 0;
>>    ksymtab_gpl_section_ = 0;
>>+    ksymtab_gpl_reloc_section_ = 0;
>>+    ksymtab_strings_section_ = 0;
>>    versym_section_ = 0;
>>    verdef_section_ = 0;
>>    verneed_section_ = 0;
>>@@ -6079,6 +6093,23 @@ public:
>>    return ksymtab_section_;
>>  }
>>
>>+  /// Return the .rel{a,}__ksymtab section of a linux kernel ELF file (either
>>+  /// a vmlinux binary or a kernel module).
>>+  ///
>>+  /// @return the .rel{a,}__ksymtab section if found, nil otherwise.
>>+  Elf_Scn*
>>+  find_ksymtab_reloc_section() const
>>+  {
>>+    if (!ksymtab_reloc_section_)
>>+      {
>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab", SHT_RELA);
>>+        if (!sec)
>>+          sec = find_section(elf_handle(), ".rel__ksymtab", SHT_REL);
>>+        const_cast<read_context*>(this)->ksymtab_reloc_section_ = sec;
>>+      }
>>+    return ksymtab_reloc_section_;
>>+  }
>>+
>>  /// Return the __ksymtab_gpl section of a linux kernel ELF file
>>  /// (either a vmlinux binary or a kernel module).
>>  ///
>>@@ -6092,6 +6123,36 @@ public:
>>    return ksymtab_gpl_section_;
>>  }
>>
>>+  /// Return the .rel{a,}__ksymtab_gpl section of a linux kernel ELF file
>>+  /// (either a vmlinux binary or a kernel module).
>>+  ///
>>+  /// @return the .rel{a,}__ksymtab_gpl section if found, nil otherwise.
>>+  Elf_Scn*
>>+  find_ksymtab_gpl_reloc_section() const
>>+  {
>>+    if (!ksymtab_gpl_reloc_section_)
>>+      {
>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab_gpl", SHT_RELA);
>>+        if (!sec)
>>+          sec = find_section(elf_handle(), ".rel__ksymtab_gpl", SHT_REL);
>>+        const_cast<read_context*>(this)->ksymtab_gpl_reloc_section_ = sec;
>>+      }
>>+    return ksymtab_gpl_reloc_section_;
>>+  }
>>+
>>+  /// Return the __ksymtab_strings section of a linux kernel ELF file
>>+  /// (either a vmlinux binary or a kernel module).
>>+  ///
>>+  /// @return the __ksymtab_strings section if found, nil otherwise.
>>+  Elf_Scn*
>>+  find_ksymtab_strings_section() const
>>+  {
>>+    if (!ksymtab_strings_section_)
>>+      const_cast<read_context*>(this)->ksymtab_strings_section_ =
>>+        find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);
>>+    return ksymtab_strings_section_;
>>+  }
>>+
>
>We should find a way to deduplicate these functions. Not necessarily in
>this patch, but in a followup. Most of them only differ in the section
>name and whether they check for an alternative. Something generic like
>
> struct SectionQuery{ string name, int type };
> Elf_Scn* find_section(const vector<SectionQuery>&);
>
>should be sufficient to forward from all the other variants to one
>implementation.
>
>>  /// Return either a __ksymtab or a __ksymtab_gpl section, in case
>>  /// only the __ksymtab_gpl exists.
>>  ///
>>@@ -6106,6 +6167,19 @@ public:
>>    return result;
>>  }
>>
>>+  /// Return either a .rel{a,}__ksymtab or a .rel{a,}__ksymtab_gpl section
>>+  ///
>>+  /// @return the .rel{a,}__ksymtab section if it exists, or the
>>+  /// .rel{a,}__ksymtab_gpl; or NULL if neither is found.
>>+  Elf_Scn*
>>+  find_any_ksymtab_reloc_section() const
>>+  {
>>+    Elf_Scn *result = find_ksymtab_reloc_section();
>>+    if (!result)
>>+      result = find_ksymtab_gpl_reloc_section();
>>+    return result;
>>+  }
>>+
>>  /// Return the SHT_GNU_versym, SHT_GNU_verdef and SHT_GNU_verneed
>>  /// sections that are involved in symbol versionning.
>>  ///
>>@@ -6283,7 +6357,8 @@ public:
>>      stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(s->st_other));
>>
>>    elf_symbol_sptr sym =
>>-      elf_symbol::create(env(), symbol_index, s->st_size, name_str,
>>+      elf_symbol::create(env(), symbol_index, s->st_size,
>>+			 s->st_value, s->st_shndx, name_str,
>>			 stt_to_elf_symbol_type(GELF_ST_TYPE(s->st_info)),
>>			 stb_to_elf_symbol_binding(GELF_ST_BIND(s->st_info)),
>>			 sym_is_defined, sym_is_common, ver, vis);
>>@@ -7435,6 +7510,64 @@ public:
>>    return symbol;
>>  }
>>
>>+  /// Determine the format of the __ksymtab and __ksymtab_gpl
>>+  /// sections of Linux kernel modules.
>>+  ///
>>+  /// This is important because we need the know the format of these
>
>s/need the/need to/                       ^^^
>
>>+  /// sections to be able to read from them.
>>+  ///
>>+  /// @return the format the __ksymtab[_gpl] sections.
>>+  enum ksymtab_format
>>+  get_ksymtab_format_module() const
>>+  {
>>+    Elf_Scn *section = find_any_ksymtab_reloc_section();
>>+
>>+    if (!section)
>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>+
>>+    // Libdwfl has a weird quirk where, in the process of obtaining an Elf
>>+    // descriptor via dwfl_module_getelf(), it will apply all relocations it
>>+    // knows how to and it will zero a relocation after applying it. If the
>
>will zero the relocation info after
>
>>+    // .rela__ksymtab* section contained only simple (absolute) relocations,
>>+    // they will have been all applied and sh_size will be 0. For arches that
>>+    // support relative ksymtabs, simple relocations only appear in pre-4.19
>>+    // kernel modules.
>>+    GElf_Shdr section_mem;
>>+    GElf_Shdr *section_shdr = gelf_getshdr(section, &section_mem);
>>+    if (section_shdr->sh_size == 0)
>>+      return PRE_V4_19_KSYMTAB_FORMAT;
>>+    /* SHT_REL sections not supported yet */
>>+    if (section_shdr->sh_type == SHT_REL)
>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>+
>>+    // If we still have a normal non-zeroed relocation section, we can guess
>>+    // what format the ksymtab is in depending on what types of relocs it
>>+    // contains.
>>+    GElf_Rela rela;
>>+    Elf_Data *rela_section_data = elf_rawdata(section, 0);
>>+    gelf_getrela(rela_section_data, 0, &rela);
>>+
>>+    // Sigh, I dislike the arch-dependent code here, but this seems to be a
>>+    // reliable heuristic for kernel modules for now. Relative ksymtabs only
>>+    // supported on x86 and arm64 as of v4.19.
>>+    ksymtab_format format;
>>+    switch (GELF_R_TYPE(rela.r_info))
>>+      {
>>+      case R_X86_64_64: // Same as R_386_32
>
>add 'fallthrough' to the comments, else younger compilers will warn
>about the potentially unintentional fallthrough.
>
>>+      case R_AARCH64_ABS64:
>>+        format = PRE_V4_19_KSYMTAB_FORMAT;
>>+	break;
>>+      case R_X86_64_PC32: // Same as R_386_PC32
>>+      case R_AARCH64_PREL32:
>>+	format = V4_19_KSYMTAB_FORMAT;
>>+	break;
>>+      default:
>>+        format = UNDEFINED_KSYMTAB_FORMAT;
>
>We should probably emit a warning in this case. Are there more
>architectures that have PREL32 enabled as of now that are not yet
>covered here?

Just double checked here, as of v5.3-rc6, only x86/x86_64 and arm64
have HAVE_ARCH_PREL32_RELOCATIONS. I can put an ABG_ASSERT there to
warn us of potentially new ksymtab formats.

>>+	break;
>>+      }
>>+    return format;
>>+  }
>>+
>>  /// Determine the format of the __ksymtab and __ksymtab_gpl
>>  /// sections.
>>  ///
>>@@ -7451,6 +7584,18 @@ public:
>>      {
>>	if (ksymtab_format_ == UNDEFINED_KSYMTAB_FORMAT)
>>	  {
>>+            // Since Linux kernel modules are relocatable, we can first try
>>+            // using a heuristic based on relocations to guess the ksymtab format.
>>+            if (is_linux_kernel_module())
>>+              {
>>+                ksymtab_format_ = get_ksymtab_format_module();
>>+                if (ksymtab_format_ != UNDEFINED_KSYMTAB_FORMAT)
>>+                  return ksymtab_format_;
>>+              }
>
>nit: use tabs

Just wondering, is there an official style guide documented anywhere?
I know there is .clang-format, but a lot of the files don't seem to
follow all of it (at least when it comes to tabs and spaces, it seems
inconsistent so I wasn't sure to follow the existing style of the
file, which seemed to favor spaces, or to switch to purely tabs).
Sometimes there is a tab first and then spaces to get to the right
indentation level, sometimes there's just spaces, etc.

>>+
>>+	    // If it's not a kernel module or we couldn't determine its format
>>+	    // with relocations, fall back to the heuristic below.
>>+
>>	    // OK this is a dirty little heuristic to determine the
>>	    // format of the ksymtab section.
>>	    //
>>@@ -7553,48 +7698,24 @@ public:
>>    return nb_ksymtab_gpl_entries_;
>>  }
>>
>>-  /// Load a given kernel symbol table.
>>+  /// Populate the symbol map by reading exported symbols from the
>>+  /// ksymtab directly.
>>  ///
>>-  /// One can thus retrieve the resulting symbols by using the
>>-  /// accessors read_context::linux_exported_fn_syms(),
>>-  /// read_context::linux_exported_var_syms(),
>>-  /// read_context::linux_exported_gpl_fn_syms(), or
>>-  /// read_context::linux_exported_gpl_var_syms().
>>+  /// @param section the ksymtab section to read from
>>  ///
>>-  /// @param kind the kind of kernel symbol table to load.
>>+  /// @param exported_fns_set the set of exported functions
>>+  ///
>>+  /// @param exported_vars_set the set of exported variables
>>+  ///
>>+  /// @param nb_entries the number of ksymtab entries to read
>>  ///
>>  /// @return true upon successful completion, false otherwise.
>>  bool
>>-  load_kernel_symbol_table(kernel_symbol_table_kind kind)
>>+  populate_symbol_map_from_ksymtab(Elf_Scn *section,
>>+                                   address_set_sptr exported_fns_set,
>>+                                   address_set_sptr exported_vars_set,
>>+                                   size_t nb_entries)
>>  {
>>-    size_t nb_entries = 0;
>>-    Elf_Scn *section = 0;
>>-    address_set_sptr linux_exported_fns_set, linux_exported_vars_set;
>>-
>>-    switch (kind)
>>-      {
>>-      case KERNEL_SYMBOL_TABLE_KIND_UNDEFINED:
>>-	break;
>>-      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB:
>>-	section = find_ksymtab_section();
>>-	nb_entries = get_nb_ksymtab_entries();
>>-	linux_exported_fns_set = create_or_get_linux_exported_fn_syms();
>>-	linux_exported_vars_set = create_or_get_linux_exported_var_syms();
>>-	break;
>>-      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB_GPL:
>>-	section = find_ksymtab_gpl_section();
>>-	nb_entries = get_nb_ksymtab_gpl_entries();
>>-	linux_exported_fns_set = create_or_get_linux_exported_gpl_fn_syms();
>>-	linux_exported_vars_set = create_or_get_linux_exported_gpl_var_syms();
>>-	break;
>>-      }
>>-
>>-    if (!linux_exported_vars_set
>>-	|| !linux_exported_fns_set
>>-	|| !section
>>-	|| !nb_entries)
>>-      return false;
>>-
>>    // The data of the section.
>>    Elf_Data *elf_data = elf_rawdata(section, 0);
>>
>>@@ -7654,28 +7775,30 @@ public:
>>	// (aka ELF symbol table).
>>	symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>	if (!symbol)
>>-	  {
>>-	    adjusted_symbol_address =
>>-	      maybe_adjust_var_sym_address(symbol_address);
>>-	    symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>-	    if (!symbol)
>>-	      // This must be a symbol that is of type neither FUNC
>>-	      // (function) nor OBJECT (variable).  There are for intance,
>>-	      // symbols of type 'NOTYPE' in the ksymtab symbol table.  I
>>-	      // am not sure what those are.
>>-	      continue;
>>-	  }
>>+          {
>>+            adjusted_symbol_address =
>>+              maybe_adjust_var_sym_address(symbol_address);
>>+            symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>+            if (!symbol)
>>+              {
>>+	        // This must be a symbol that is of type neither FUNC
>>+	        // (function) nor OBJECT (variable).  There are for intance,
>>+	        // symbols of type 'NOTYPE' in the ksymtab symbol table.  I
>>+	        // am not sure what those are.
>>+	        continue;
>>+	      }
>>+          }
>
>This part of the diff seems to a change purely of the indentation/tabs.
>Maybe you can revert this part.

Yes you are right. Reverted!

>>
>>	address_set_sptr set;
>>	if (symbol->is_function())
>>	  {
>>	    ABG_ASSERT(lookup_elf_fn_symbol_from_address(adjusted_symbol_address));
>>-	    set = linux_exported_fns_set;
>>+	    set = exported_fns_set;
>>	  }
>>	else if (symbol->is_variable())
>>	  {
>>	    ABG_ASSERT(lookup_elf_var_symbol_from_address(adjusted_symbol_address));
>>-	    set = linux_exported_vars_set;
>>+	    set = exported_vars_set;
>>	  }
>>	else
>>	  ABG_ASSERT_NOT_REACHED;
>>@@ -7684,6 +7807,157 @@ public:
>>    return true;
>>  }
>>
>>+  /// Populate the symbol map by extracting the exported symbols from a
>>+  /// ksymtab rela section.
>>+  ///
>>+  /// @param section the ksymtab section to read from
>>+  ///
>>+  /// @param strings_section the __ksymtab_strings section
>>+  ///
>>+  /// @param exported_fns_set the set of exported functions
>>+  ///
>>+  /// @param exported_vars_set the set of exported variables
>>+  ///
>>+  /// @return true upon successful completion, false otherwise.
>>+  bool
>>+  populate_symbol_map_from_ksymtab_reloc(Elf_Scn *reloc_section, Elf_Scn *strings_section,
>>+                                         address_set_sptr exported_fns_set,
>>+                                         address_set_sptr exported_vars_set)
>>+  {
>>+    GElf_Shdr reloc_section_mem;
>>+    GElf_Shdr *reloc_section_shdr = gelf_getshdr(reloc_section, &reloc_section_mem);
>
>gelf_getshdr returns &reloc_section_mem. So you might want to use
>reloc_section_mem directly rather than introducing reloc_section_mem.

Do you mean get rid of the reloc_section_shdr pointer and just use
&reloc_section_mem? I just followed the convention for gelf_getshdr()
invocations in other places to be consistent.

>>+    size_t reloc_count = reloc_section_shdr->sh_size / reloc_section_shdr->sh_entsize;
>>+
>>+    Elf_Data *reloc_section_data = elf_rawdata(reloc_section, 0);
>>+    size_t strings_ndx = elf_ndxscn(strings_section);
>>+
>>+    GElf_Rela rela;
>>+    elf_symbol_sptr symbol;
>>+    for (unsigned int i = 0; i < reloc_count; i++)
>>+      {
>>+        gelf_getrela(reloc_section_data, i, &rela);
>>+        symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rela.r_info));
>>+
>>+        // This shouldn't happen..
>>+        if (!symbol)
>
>Maybe ABG_ASSERT then?

Good idea!

>>+          return false;
>>+
>>+        // If the symbol is from __ksymtab_strings, ignore it.
>>+        if (symbol->get_shndx() == strings_ndx)
>>+          continue;
>>+
>>+	if (!symbol->is_function() && !symbol->is_variable())
>>+          {
>>+            if (do_log())
>>+              {
>>+                if (symbol->get_type() == elf_symbol::NOTYPE_TYPE)
>>+                  cerr << "skipping NOTYPE symbol "
>>+                       << symbol->get_name()
>>+                       << " shndx: "
>>+                       << symbol->get_index()
>>+                       << " @"
>>+                       << elf_path()
>>+                       << "\n";
>>+                else if (symbol->get_type() == elf_symbol::SECTION_TYPE)
>>+                  cerr << "skipping SECTION symbol "
>>+                       << "shndx: "
>>+                       << symbol->get_index()
>>+                       << " @"
>>+                       << elf_path()
>>+                       << "\n";
>>+               }
>>+            continue;
>>+	  }
>>+
>>+	address_set_sptr set;
>>+	if (symbol->is_function())
>>+	  {
>>+	    ABG_ASSERT(lookup_elf_fn_symbol_from_address(symbol->get_value()));
>>+	    set = exported_fns_set;
>>+	  }
>>+	else if (symbol->is_variable())
>>+	  {
>>+	    ABG_ASSERT(lookup_elf_var_symbol_from_address(symbol->get_value()));
>>+	    set = exported_vars_set;
>>+	  }
>>+	else
>>+	  ABG_ASSERT_NOT_REACHED;
>>+	set->insert(symbol->get_value());
>>+      }
>>+    return true;
>>+  }
>>+
>>+  /// Load a given kernel symbol table.
>>+  ///
>>+  /// One can thus retrieve the resulting symbols by using the
>>+  /// accessors read_context::linux_exported_fn_syms(),
>>+  /// read_context::linux_exported_var_syms(),
>>+  /// read_context::linux_exported_gpl_fn_syms(), or
>>+  /// read_context::linux_exported_gpl_var_syms().
>>+  ///
>>+  /// @param kind the kind of kernel symbol table to load.
>>+  ///
>>+  /// @return true upon successful completion, false otherwise.
>>+  bool
>>+  load_kernel_symbol_table(kernel_symbol_table_kind kind)
>>+  {
>>+    size_t nb_entries = 0;
>>+    Elf_Scn *section = 0, *reloc_section = 0, *strings_section = 0;
>>+    address_set_sptr linux_exported_fns_set, linux_exported_vars_set;
>>+
>>+    switch (kind)
>>+      {
>>+      case KERNEL_SYMBOL_TABLE_KIND_UNDEFINED:
>>+	break;
>>+      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB:
>>+	section = find_ksymtab_section();
>>+	reloc_section = find_ksymtab_reloc_section();
>>+	nb_entries = get_nb_ksymtab_entries();
>>+	linux_exported_fns_set = create_or_get_linux_exported_fn_syms();
>>+	linux_exported_vars_set = create_or_get_linux_exported_var_syms();
>>+	break;
>>+      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB_GPL:
>>+	section = find_ksymtab_gpl_section();
>>+	reloc_section = find_ksymtab_gpl_reloc_section();
>>+	nb_entries = get_nb_ksymtab_gpl_entries();
>>+	linux_exported_fns_set = create_or_get_linux_exported_gpl_fn_syms();
>>+	linux_exported_vars_set = create_or_get_linux_exported_gpl_var_syms();
>>+	break;
>>+      }
>>+
>>+    strings_section = find_ksymtab_strings_section();
>>+
>>+    if (!linux_exported_vars_set
>>+	|| !linux_exported_fns_set
>>+	|| !section
>>+	|| !strings_section
>>+	|| !nb_entries)
>>+      return false;
>>+
>>+    GElf_Ehdr eh_mem;
>>+    GElf_Ehdr* elf_header = gelf_getehdr(elf_handle(), &eh_mem);
>>+    ksymtab_format format = get_ksymtab_format();
>>+
>>+    // Although pre-v4.19 kernel modules can have a relocation section for the
>>+    // __ksymtab section, libdwfl zeroes the rela section after applying
>>+    // "simple" absolute relocations via dwfl_module_getelf(). For v4.19 and
>>+    // above, we get PC-relative relocations so dwfl_module_getelf() doesn't
>>+    // apply those relocations and we're safe to read the relocation section to
>>+    // determine which exported symbols are in the ksymtab.
>>+    //
>>+    // Note: For 32-bit binaries, only populate_symbol_map_from_ksymtab() is
>>+    // supported as of now.
>>+    if (!reloc_section
>>+	|| format == PRE_V4_19_KSYMTAB_FORMAT
>>+	|| elf_header->e_ident[EI_CLASS] == ELFCLASS32)
>>+      return populate_symbol_map_from_ksymtab(section, linux_exported_fns_set,
>>+                                              linux_exported_vars_set, nb_entries);
>>+    else
>>+      return populate_symbol_map_from_ksymtab_reloc(reloc_section, strings_section,
>>+                                                    linux_exported_fns_set,
>>+                                                    linux_exported_vars_set);
>>+  }
>>+
>>  /// Load the special __ksymtab section. This is for linux kernel
>>  /// (module) files.
>>  ///
>>@@ -8352,6 +8626,17 @@ public:
>>  is_linux_kernel_binary() const
>>  {return find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);}
>>
>>+  /// Guess if the current binary is a Linux Kernel module.
>>+  ///
>>+  /// To guess that, the function looks for the presence of the
>>+  /// special ".modinfo" section in the binary.
>>+  ///
>>+  bool
>>+  is_linux_kernel_module() const
>>+  {
>>+    return find_section(elf_handle(), ".modinfo", SHT_PROGBITS);
>>+  }
>>+
>
>Also refer to this proposal to restrict checking for a module a bit
>more: https://sourceware.org/ml/libabigail/2019-q3/msg00022.html
>
>>  /// Getter of the "show_stats" flag.
>>  ///
>>  /// This flag tells if we should emit statistics about various
>>@@ -15734,6 +16019,8 @@ create_default_var_sym(const string& sym_name, const environment *env)
>>    elf_symbol::create(env,
>>		       /*symbol index=*/ 0,
>>		       /*symbol size=*/ 0,
>>+		       /*symbol value=*/ 0,
>>+		       /*symbol shndx=*/ 0,
>>		       sym_name,
>>		       /*symbol type=*/ elf_symbol::OBJECT_TYPE,
>>		       /*symbol binding=*/ elf_symbol::GLOBAL_BINDING,
>>@@ -16173,6 +16460,8 @@ create_default_fn_sym(const string& sym_name, const environment *env)
>>    elf_symbol::create(env,
>>		       /*symbol index=*/ 0,
>>		       /*symbol size=*/ 0,
>>+		       /*symbol value=*/ 0,
>>+		       /*symbol shndx=*/ 0,
>>		       sym_name,
>>		       /*symbol type=*/ elf_symbol::FUNC_TYPE,
>>		       /*symbol binding=*/ elf_symbol::GLOBAL_BINDING,
>>diff --git a/src/abg-ir.cc b/src/abg-ir.cc
>>index a5243c3..ad1c261 100644
>>--- a/src/abg-ir.cc
>>+++ b/src/abg-ir.cc
>>@@ -1169,6 +1169,8 @@ struct elf_symbol::priv
>>  const environment*	env_;
>>  size_t		index_;
>>  size_t		size_;
>>+  uint64_t		value_;
>>+  uint32_t		shndx_;
>>  string		name_;
>>  elf_symbol::type	type_;
>>  elf_symbol::binding	binding_;
>>@@ -1213,6 +1215,8 @@ struct elf_symbol::priv
>>    : env_(),
>>      index_(),
>>      size_(),
>>+      value_(),
>>+      shndx_(),
>>      type_(elf_symbol::NOTYPE_TYPE),
>>      binding_(elf_symbol::GLOBAL_BINDING),
>>      visibility_(elf_symbol::DEFAULT_VISIBILITY),
>>@@ -1223,6 +1227,8 @@ struct elf_symbol::priv
>>  priv(const environment*		e,
>>       size_t				i,
>>       size_t				s,
>>+       uint64_t				v,
>>+       uint32_t				x,
>>       const string&			n,
>>       elf_symbol::type		t,
>>       elf_symbol::binding		b,
>>@@ -1233,6 +1239,8 @@ struct elf_symbol::priv
>>    : env_(e),
>>      index_(i),
>>      size_(s),
>>+      value_(v),
>>+      shndx_(x),
>>      name_(n),
>>      type_(t),
>>      binding_(b),
>>@@ -1269,6 +1277,10 @@ elf_symbol::elf_symbol()
>>///
>>/// @param s the size of the symbol.
>>///
>>+/// @param v the value of the symbol.
>>+///
>>+/// @param x the section header index (shndx) of the symbol.
>>+///
>>/// @param n the name of the symbol.
>>///
>>/// @param t the type of the symbol.
>>@@ -1285,6 +1297,8 @@ elf_symbol::elf_symbol()
>>elf_symbol::elf_symbol(const environment*	e,
>>		       size_t			i,
>>		       size_t			s,
>>+		       uint64_t			v,
>>+		       uint32_t			x,
>>		       const string&		n,
>>		       type			t,
>>		       binding			b,
>>@@ -1292,7 +1306,7 @@ elf_symbol::elf_symbol(const environment*	e,
>>		       bool			c,
>>		       const version&		ve,
>>		       visibility		vi)
>>-  : priv_(new priv(e, i, s, n, t, b, d, c, ve, vi))
>>+  : priv_(new priv(e, i, s, v, x, n, t, b, d, c, ve, vi))
>>{}
>>
>>/// Factory of instances of @ref elf_symbol.
>>@@ -1319,6 +1333,10 @@ elf_symbol::create()
>>///
>>/// @param s the size of the symbol.
>>///
>>+/// @param v the value of the symbol.
>>+///
>>+/// @param x the section header index (shndx) of the symbol.
>>+///
>>/// @param n the name of the symbol.
>>///
>>/// @param t the type of the symbol.
>>@@ -1339,6 +1357,8 @@ elf_symbol_sptr
>>elf_symbol::create(const environment*	e,
>>		   size_t		i,
>>		   size_t		s,
>>+		   uint64_t		v,
>>+		   uint32_t		x,
>>		   const string&	n,
>>		   type		t,
>>		   binding		b,
>>@@ -1347,7 +1367,7 @@ elf_symbol::create(const environment*	e,
>>		   const version&	ve,
>>		   visibility		vi)
>>{
>>-  elf_symbol_sptr sym(new elf_symbol(e, i, s, n, t, b, d, c, ve, vi));
>>+  elf_symbol_sptr sym(new elf_symbol(e, i, s, v, x, n, t, b, d, c, ve, vi));
>>  sym->priv_->main_symbol_ = sym;
>>  return sym;
>>}
>>@@ -1411,6 +1431,20 @@ void
>>elf_symbol::set_index(size_t s)
>>{priv_->index_ = s;}
>>
>>+/// Getter for the symbol value.
>>+///
>>+/// @return the value of the symbol.
>>+uint64_t
>>+elf_symbol::get_value() const
>>+{return priv_->value_;}
>>+
>>+/// Getter for the symbol section header index.
>>+///
>>+/// @return the section header index of the symbol.
>>+uint32_t
>>+elf_symbol::get_shndx() const
>>+{return priv_->shndx_;}
>>+
>>/// Getter for the name of the @ref elf_symbol.
>>///
>>/// @return a reference to the name of the @ref symbol.
>>diff --git a/src/abg-reader.cc b/src/abg-reader.cc
>>index 810a65f..3c481ad 100644
>>--- a/src/abg-reader.cc
>>+++ b/src/abg-reader.cc
>>@@ -2659,6 +2659,8 @@ build_elf_symbol(read_context& ctxt, const xmlNodePtr node)
>>
>>  const environment* env = ctxt.get_environment();
>>  elf_symbol_sptr e = elf_symbol::create(env, /*index=*/0, size,
>>+					 /*value=*/0,
>>+					 /*shndx=*/0,
>>					 name, type, binding,
>>					 is_defined, is_common,
>>					 version, visibility);
>>-- 
>>2.16.4
>>
>
>Nice work! Thank you for that!
>The patch looks in general good to me. I am a bit concerned about the
>platform support. There are some corner cases that need to be covered.
>See the testing below.
>
>You might find the .clang-format file useful to match the coding style
>at some points (especially tabs vs spaces). Although when reviewing the
>patch, I had a bit trouble with unchanged but reformatted lines as they
>were confusing my diff. Maybe you can revert the changes that were
>purely for formatting and we take care of formatting in a later patch.
>
>I conducted some testing. I choose a minimal configuration with modules:
>
> $ export ARCH=<arch>
> $ make allnoconfig
> $ ./scripts/config   \
>     -e DEBUG_INFO    \  # dwarf info
>     -e MODULES       \  # module support
>     -e USB_SUPPORT   \
>     -m USB           \
>     -m USB_DUMMY_HCD \  # dummy_hcd.ko is a module without exports
>     -m USB_GADGET       # depends on usb-common.ko, which I test with
>
> $ make olddefconfig
> $ make
> $ abidw ./drivers/usb/common/usb-common.ko | less
> # then check for the existence of the elf-function-symbols section
>
>                    master         patched
>Linux v4.18
>
> x86_64             works            works
> x86                works            works
> arm64              works            works
>
>Linux v4.19
>
> x86_64             broken           works
> x86                broken           STILL BROKEN
> arm64              broken           works
>
>So, for x86_64 and arm64 the situation got better and elf symbols are
>now identified. For x86, this patch did not improve the situation. I did
>not check that the list of symbols are entirely correct and that they
>work downstream with abidiff etc. Just the pure existence is a very good
>indicator for this patch to work.
>
>As I said earlier, I am a bit concerned about the platform support. We
>should probably at emit a warning if we know that we can't handle a
>certain platform or (preferably) avoid the platform specific code. Not
>sure whether there is a better way to extract the information though.

Yeah, I did not like that either too much. It's difficult to determine
the ksymtab format of modules though, especially since for the newer
format ksymtab sections they are not relocated (and appear as just
zeroes), so we cannot tell which symbols are in it and what ksymtab
format they are using :-(. If we think of a platform-agnostic way of
doing it, I'm for it (perhaps a different patch). I'll definitely add
a warning if we come across a platform that we don't support yet.

Thanks!


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] Support pre and post v4.19 ksymtabs for Linux kernel modules
  2019-01-01  0:00     ` Matthias Maennich via libabigail
@ 2019-01-01  0:00       ` Jessica Yu
  2019-01-01  0:00         ` Matthias Maennich via libabigail
  0 siblings, 1 reply; 12+ messages in thread
From: Jessica Yu @ 2019-01-01  0:00 UTC (permalink / raw)
  To: Matthias Maennich; +Cc: libabigail, Dodji Seketeli, kernel-team

+++ Matthias Maennich [22/08/19 19:04 +0100]:
>On Wed, Aug 21, 2019 at 02:34:37PM +0200, Jessica Yu wrote:
>>+++ Matthias Maennich [19/08/19 09:05 +0100]:
>>>Hi Jessica!
>>>
>>>On Fri, 09 Aug, 17:49, Jessica Yu wrote:
>>>>As described in commit ad8c2531fb9, the format of the Linux kernel
>>>>ksymtab changed in v4.19 to use relative references instead of absolute
>>>>references. This changes the type of relocations emitted for ksymtab
>>>>sections to be place-relative 32-bit relocations instead of absolute
>>>>relocations. One side-effect of this is that libdwfl will not relocate
>>>>the ksymtab sections due to the PC-relative relocations. This breaks
>>>>load_kernel_symbol_table() for kernel modules because it only reads in
>>>>zeros from the unrelocated ksymtab section and is subsequently unable to
>>>>determine what exported symbols it refers to. Since a vmlinux binary is
>>>>already fully linked and relocated (and therefore we can read its
>>>>ksymtab section just fine), this problem is only relevant to Linux
>>>>kernel modules.
>>>>
>>>>To work around this, we utilize the ksymtab relocation sections to
>>>>determine which symbols the ksymtab entries refer to. We do this by
>>>>inspecting each relocation's r_info field for the symbol table index and
>>>>from there we are able to read each symbol's value and subsequently add
>>>>that to the set of exported symbols.
>>>>
>>>>In addition, for Linux kernel modules, we can utilize relocation types
>>>>to implement a new heuristic to determine the ksymtab format we have.
>>>>The presence of PC-relative relocations suggest the new v4.19 format,
>>>>and absolute relocation types suggest the old pre v4.19 format.
>>>>
>>>>      * include/abg-ir.h (elf_symbol::{elf_symbol, create}): Take new
>>>>      symbol value and shndx parameters.
>>>>      (elf_symbol::{get_value, get_shndx}): Declare new accessors.
>>>>      * src/abg-ir.cc (elf_symbol::priv::{value_, shndx_}): New data members.
>>>>      (elf_symbol::priv::priv): Adjust.
>>>>      (elf_symbol::elf_symbol): Take new value and shndx parameters.
>>>>      (elf_symbol::create): Likewise.
>>>>      (elf_symbol::{get_value, get_shndx}): Define new accessors.
>>>>      * src/abg-reader.cc (build_elf_symbol): Adjust.
>>>>      * src/abg-dwarf-reader.cc (lookup_symbol_from_sysv_hash_tab)
>>>>      (lookup_symbol_from_gnu_hash_tab)
>>>>      (lookup_symbol_from_symtab): Adjust.
>>>>      (read_context::{ksymtab_reloc_section_,
>>>>      ksymtab_gpl_reloc_section_,
>>>>      ksymtab_strings_section_}): New data members.
>>>>      (read_context::read_context): Initialize ksymtab_reloc_section_,
>>>>      ksymtab_gpl_reloc_section_, ksymtab_strings_section_.
>>>>      (read_context::{find_ksymtab_reloc_section,
>>>>      find_ksymtab_gpl_reloc_section, find_ksymtab_strings_section,
>>>>      find_any_ksymtab_reloc_section, get_ksymtab_format_module,
>>>>      populate_symbol_map_from_ksymtab,
>>>>      populate_symbol_map_from_ksymtab_reloc,
>>>>      is_linux_kernel_module}): New member functions.
>>>>      (read_context::load_kernel_symbol_table): Adjust to call either
>>>>      populate_symbol_map_from_ksymtab{_reloc,} depending on ksymtab
>>>>      format.
>>>>      (read_context::get_ksymtab_format): Adjust to call
>>>>      get_ksymtab_format_module for linux kernel modules.
>>>>      (read_context::lookup_elf_symbol_from_index): Adjust.
>>>>      (create_default_var_sym, create_default_fn_sym): Adjust.
>>>>
>>>>Signed-off-by: Jessica Yu <jeyu@kernel.org>
>>>>---
>>>>include/abg-ir.h        |  10 ++
>>>>src/abg-dwarf-reader.cc | 391 +++++++++++++++++++++++++++++++++++++++++-------
>>>>src/abg-ir.cc           |  38 ++++-
>>>>src/abg-reader.cc       |   2 +
>>>>4 files changed, 388 insertions(+), 53 deletions(-)
>>>>
>>>>diff --git a/include/abg-ir.h b/include/abg-ir.h
>>>>index 170a844..d26f472 100644
>>>>--- a/include/abg-ir.h
>>>>+++ b/include/abg-ir.h
>>>>@@ -796,6 +796,8 @@ private:
>>>>elf_symbol(const environment* e,
>>>>	     size_t		i,
>>>>	     size_t		s,
>>>>+	     uint64_t		v,
>>>>+	     uint32_t		x,
>>>
>>>I would like to see some comments here to describe the purpose of the
>>>struct's members. I know there weren't any before, but maybe that is a
>>>good chance.
>>>
>>>Also, I am not entirely sure why these are all one-character names, but
>>>it makes it hard to come from x to "section header index". Maybe we
>>>should give them more meaningful names.
>>>
>>>>	     const string&	n,
>>>>	     type		t,
>>>>	     binding		b,
>>>>@@ -818,6 +820,8 @@ public:
>>>>create(const environment*	e,
>>>>	 size_t		i,
>>>>	 size_t		s,
>>>>+	 uint64_t	v,
>>>>+	 uint32_t	x,
>>>>	 const string&		n,
>>>>	 type			t,
>>>>	 binding		b,
>>>>@@ -838,6 +842,12 @@ public:
>>>>void
>>>>set_index(size_t);
>>>>
>>>>+  uint64_t
>>>>+  get_value() const;
>>>>+
>>>>+  uint32_t
>>>>+  get_shndx() const;
>>>>+
>>>>const string&
>>>>get_name() const;
>>>>
>>>>diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>>>>index 47af7e3..835bf3d 100644
>>>>--- a/src/abg-dwarf-reader.cc
>>>>+++ b/src/abg-dwarf-reader.cc
>>>>@@ -1724,6 +1724,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>>GElf_Sym symbol;
>>>>const char* sym_name_str;
>>>>size_t sym_size;
>>>>+  uint64_t sym_value;
>>>>+  uint32_t sym_shndx;
>>>>elf_symbol::type sym_type;
>>>>elf_symbol::binding sym_binding;
>>>>elf_symbol::visibility sym_visibility;
>>>>@@ -1743,6 +1745,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>>	  sym_visibility =
>>>>	    stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(symbol.st_other));
>>>>	  sym_size = symbol.st_size;
>>>>+	  sym_value = symbol.st_value;
>>>>+	  sym_shndx = symbol.st_shndx;
>>>>	  elf_symbol::version ver;
>>>>	  if (get_version_for_symbol(elf_handle, symbol_index,
>>>>				     /*get_def_version=*/true, ver))
>>>>@@ -1751,6 +1755,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>>	    elf_symbol::create(env,
>>>>			       symbol_index,
>>>>			       sym_size,
>>>>+			       sym_value,
>>>>+			       sym_shndx,
>>>>			       sym_name_str,
>>>>			       sym_type,
>>>>			       sym_binding,
>>>>@@ -2028,7 +2034,8 @@ lookup_symbol_from_gnu_hash_tab(const environment*		env,
>>>>	    ABG_ASSERT(!ver.str().empty());
>>>>
>>>>	  elf_symbol_sptr symbol_found =
>>>>-	    elf_symbol::create(env, i, symbol.st_size, sym_name_str,
>>>>+	    elf_symbol::create(env, i, symbol.st_size, symbol.st_value,
>>>>+			       symbol.st_shndx, sym_name_str,
>>>>			       sym_type, sym_binding,
>>>>			       symbol.st_shndx != SHN_UNDEF,
>>>>			       symbol.st_shndx == SHN_COMMON,
>>>>@@ -2181,6 +2188,7 @@ lookup_symbol_from_symtab(const environment*		env,
>>>>	    ABG_ASSERT(!ver.str().empty());
>>>>	  elf_symbol_sptr symbol_found =
>>>>	    elf_symbol::create(env, i, sym->st_size,
>>>>+			       sym->st_value, sym->st_shndx,
>>>>			       name_str, sym_type,
>>>>			       sym_binding, sym_is_defined,
>>>>			       sym_is_common, ver, sym_visibility);
>>>>@@ -3081,7 +3089,10 @@ public:
>>>>/// to symbols exported using the EXPORT_SYMBOL_GPL macro from the
>>>>/// linux kernel.
>>>>Elf_Scn*			ksymtab_section_;
>>>>+  Elf_Scn*			ksymtab_reloc_section_;
>>>>Elf_Scn*			ksymtab_gpl_section_;
>>>>+  Elf_Scn*			ksymtab_gpl_reloc_section_;
>>>>+  Elf_Scn*			ksymtab_strings_section_;
>>>>Elf_Scn*			versym_section_;
>>>>Elf_Scn*			verdef_section_;
>>>>Elf_Scn*			verneed_section_;
>>>>@@ -3277,7 +3288,10 @@ public:
>>>>  nb_ksymtab_entries_ = 0;
>>>>  nb_ksymtab_gpl_entries_ = 0;
>>>>  ksymtab_section_ = 0;
>>>>+    ksymtab_reloc_section_ = 0;
>>>>  ksymtab_gpl_section_ = 0;
>>>>+    ksymtab_gpl_reloc_section_ = 0;
>>>>+    ksymtab_strings_section_ = 0;
>>>>  versym_section_ = 0;
>>>>  verdef_section_ = 0;
>>>>  verneed_section_ = 0;
>>>>@@ -6079,6 +6093,23 @@ public:
>>>>  return ksymtab_section_;
>>>>}
>>>>
>>>>+  /// Return the .rel{a,}__ksymtab section of a linux kernel ELF file (either
>>>>+  /// a vmlinux binary or a kernel module).
>>>>+  ///
>>>>+  /// @return the .rel{a,}__ksymtab section if found, nil otherwise.
>>>>+  Elf_Scn*
>>>>+  find_ksymtab_reloc_section() const
>>>>+  {
>>>>+    if (!ksymtab_reloc_section_)
>>>>+      {
>>>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab", SHT_RELA);
>>>>+        if (!sec)
>>>>+          sec = find_section(elf_handle(), ".rel__ksymtab", SHT_REL);
>>>>+        const_cast<read_context*>(this)->ksymtab_reloc_section_ = sec;
>>>>+      }
>>>>+    return ksymtab_reloc_section_;
>>>>+  }
>>>>+
>>>>/// Return the __ksymtab_gpl section of a linux kernel ELF file
>>>>/// (either a vmlinux binary or a kernel module).
>>>>///
>>>>@@ -6092,6 +6123,36 @@ public:
>>>>  return ksymtab_gpl_section_;
>>>>}
>>>>
>>>>+  /// Return the .rel{a,}__ksymtab_gpl section of a linux kernel ELF file
>>>>+  /// (either a vmlinux binary or a kernel module).
>>>>+  ///
>>>>+  /// @return the .rel{a,}__ksymtab_gpl section if found, nil otherwise.
>>>>+  Elf_Scn*
>>>>+  find_ksymtab_gpl_reloc_section() const
>>>>+  {
>>>>+    if (!ksymtab_gpl_reloc_section_)
>>>>+      {
>>>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab_gpl", SHT_RELA);
>>>>+        if (!sec)
>>>>+          sec = find_section(elf_handle(), ".rel__ksymtab_gpl", SHT_REL);
>>>>+        const_cast<read_context*>(this)->ksymtab_gpl_reloc_section_ = sec;
>>>>+      }
>>>>+    return ksymtab_gpl_reloc_section_;
>>>>+  }
>>>>+
>>>>+  /// Return the __ksymtab_strings section of a linux kernel ELF file
>>>>+  /// (either a vmlinux binary or a kernel module).
>>>>+  ///
>>>>+  /// @return the __ksymtab_strings section if found, nil otherwise.
>>>>+  Elf_Scn*
>>>>+  find_ksymtab_strings_section() const
>>>>+  {
>>>>+    if (!ksymtab_strings_section_)
>>>>+      const_cast<read_context*>(this)->ksymtab_strings_section_ =
>>>>+        find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);
>>>>+    return ksymtab_strings_section_;
>>>>+  }
>>>>+
>>>
>>>We should find a way to deduplicate these functions. Not necessarily in
>>>this patch, but in a followup. Most of them only differ in the section
>>>name and whether they check for an alternative. Something generic like
>>>
>>>struct SectionQuery{ string name, int type };
>>>Elf_Scn* find_section(const vector<SectionQuery>&);
>>>
>>>should be sufficient to forward from all the other variants to one
>>>implementation.
>>>
>>>>/// Return either a __ksymtab or a __ksymtab_gpl section, in case
>>>>/// only the __ksymtab_gpl exists.
>>>>///
>>>>@@ -6106,6 +6167,19 @@ public:
>>>>  return result;
>>>>}
>>>>
>>>>+  /// Return either a .rel{a,}__ksymtab or a .rel{a,}__ksymtab_gpl section
>>>>+  ///
>>>>+  /// @return the .rel{a,}__ksymtab section if it exists, or the
>>>>+  /// .rel{a,}__ksymtab_gpl; or NULL if neither is found.
>>>>+  Elf_Scn*
>>>>+  find_any_ksymtab_reloc_section() const
>>>>+  {
>>>>+    Elf_Scn *result = find_ksymtab_reloc_section();
>>>>+    if (!result)
>>>>+      result = find_ksymtab_gpl_reloc_section();
>>>>+    return result;
>>>>+  }
>>>>+
>>>>/// Return the SHT_GNU_versym, SHT_GNU_verdef and SHT_GNU_verneed
>>>>/// sections that are involved in symbol versionning.
>>>>///
>>>>@@ -6283,7 +6357,8 @@ public:
>>>>    stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(s->st_other));
>>>>
>>>>  elf_symbol_sptr sym =
>>>>-      elf_symbol::create(env(), symbol_index, s->st_size, name_str,
>>>>+      elf_symbol::create(env(), symbol_index, s->st_size,
>>>>+			 s->st_value, s->st_shndx, name_str,
>>>>			 stt_to_elf_symbol_type(GELF_ST_TYPE(s->st_info)),
>>>>			 stb_to_elf_symbol_binding(GELF_ST_BIND(s->st_info)),
>>>>			 sym_is_defined, sym_is_common, ver, vis);
>>>>@@ -7435,6 +7510,64 @@ public:
>>>>  return symbol;
>>>>}
>>>>
>>>>+  /// Determine the format of the __ksymtab and __ksymtab_gpl
>>>>+  /// sections of Linux kernel modules.
>>>>+  ///
>>>>+  /// This is important because we need the know the format of these
>>>
>>>s/need the/need to/                       ^^^
>>>
>>>>+  /// sections to be able to read from them.
>>>>+  ///
>>>>+  /// @return the format the __ksymtab[_gpl] sections.
>>>>+  enum ksymtab_format
>>>>+  get_ksymtab_format_module() const
>>>>+  {
>>>>+    Elf_Scn *section = find_any_ksymtab_reloc_section();
>>>>+
>>>>+    if (!section)
>>>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>>>+
>>>>+    // Libdwfl has a weird quirk where, in the process of obtaining an Elf
>>>>+    // descriptor via dwfl_module_getelf(), it will apply all relocations it
>>>>+    // knows how to and it will zero a relocation after applying it. If the
>>>
>>>will zero the relocation info after
>>>
>>>>+    // .rela__ksymtab* section contained only simple (absolute) relocations,
>>>>+    // they will have been all applied and sh_size will be 0. For arches that
>>>>+    // support relative ksymtabs, simple relocations only appear in pre-4.19
>>>>+    // kernel modules.
>>>>+    GElf_Shdr section_mem;
>>>>+    GElf_Shdr *section_shdr = gelf_getshdr(section, &section_mem);
>>>>+    if (section_shdr->sh_size == 0)
>>>>+      return PRE_V4_19_KSYMTAB_FORMAT;
>>>>+    /* SHT_REL sections not supported yet */
>>>>+    if (section_shdr->sh_type == SHT_REL)
>>>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>>>+
>>>>+    // If we still have a normal non-zeroed relocation section, we can guess
>>>>+    // what format the ksymtab is in depending on what types of relocs it
>>>>+    // contains.
>>>>+    GElf_Rela rela;
>>>>+    Elf_Data *rela_section_data = elf_rawdata(section, 0);
>>>>+    gelf_getrela(rela_section_data, 0, &rela);
>>>>+
>>>>+    // Sigh, I dislike the arch-dependent code here, but this seems to be a
>>>>+    // reliable heuristic for kernel modules for now. Relative ksymtabs only
>>>>+    // supported on x86 and arm64 as of v4.19.
>>>>+    ksymtab_format format;
>>>>+    switch (GELF_R_TYPE(rela.r_info))
>>>>+      {
>>>>+      case R_X86_64_64: // Same as R_386_32
>>>
>>>add 'fallthrough' to the comments, else younger compilers will warn
>>>about the potentially unintentional fallthrough.
>>>
>>>>+      case R_AARCH64_ABS64:
>>>>+        format = PRE_V4_19_KSYMTAB_FORMAT;
>>>>+	break;
>>>>+      case R_X86_64_PC32: // Same as R_386_PC32
>>>>+      case R_AARCH64_PREL32:
>>>>+	format = V4_19_KSYMTAB_FORMAT;
>>>>+	break;
>>>>+      default:
>>>>+        format = UNDEFINED_KSYMTAB_FORMAT;
>>>
>>>We should probably emit a warning in this case. Are there more
>>>architectures that have PREL32 enabled as of now that are not yet
>>>covered here?
>>>
>>>>+	break;
>>>>+      }
>>>>+    return format;
>>>>+  }
>>>>+
>>>>/// Determine the format of the __ksymtab and __ksymtab_gpl
>>>>/// sections.
>>>>///
>>>>@@ -7451,6 +7584,18 @@ public:
>>>>    {
>>>>	if (ksymtab_format_ == UNDEFINED_KSYMTAB_FORMAT)
>>>>	  {
>>>>+            // Since Linux kernel modules are relocatable, we can first try
>>>>+            // using a heuristic based on relocations to guess the ksymtab format.
>>>>+            if (is_linux_kernel_module())
>>>>+              {
>>>>+                ksymtab_format_ = get_ksymtab_format_module();
>>>>+                if (ksymtab_format_ != UNDEFINED_KSYMTAB_FORMAT)
>>>>+                  return ksymtab_format_;
>>>>+              }
>>>
>>>nit: use tabs
>>>
>>>>+
>>>>+	    // If it's not a kernel module or we couldn't determine its format
>>>>+	    // with relocations, fall back to the heuristic below.
>>>>+
>>>>	    // OK this is a dirty little heuristic to determine the
>>>>	    // format of the ksymtab section.
>>>>	    //
>>>>@@ -7553,48 +7698,24 @@ public:
>>>>  return nb_ksymtab_gpl_entries_;
>>>>}
>>>>
>>>>-  /// Load a given kernel symbol table.
>>>>+  /// Populate the symbol map by reading exported symbols from the
>>>>+  /// ksymtab directly.
>>>>///
>>>>-  /// One can thus retrieve the resulting symbols by using the
>>>>-  /// accessors read_context::linux_exported_fn_syms(),
>>>>-  /// read_context::linux_exported_var_syms(),
>>>>-  /// read_context::linux_exported_gpl_fn_syms(), or
>>>>-  /// read_context::linux_exported_gpl_var_syms().
>>>>+  /// @param section the ksymtab section to read from
>>>>///
>>>>-  /// @param kind the kind of kernel symbol table to load.
>>>>+  /// @param exported_fns_set the set of exported functions
>>>>+  ///
>>>>+  /// @param exported_vars_set the set of exported variables
>>>>+  ///
>>>>+  /// @param nb_entries the number of ksymtab entries to read
>>>>///
>>>>/// @return true upon successful completion, false otherwise.
>>>>bool
>>>>-  load_kernel_symbol_table(kernel_symbol_table_kind kind)
>>>>+  populate_symbol_map_from_ksymtab(Elf_Scn *section,
>>>>+                                   address_set_sptr exported_fns_set,
>>>>+                                   address_set_sptr exported_vars_set,
>>>>+                                   size_t nb_entries)
>>>>{
>>>>-    size_t nb_entries = 0;
>>>>-    Elf_Scn *section = 0;
>>>>-    address_set_sptr linux_exported_fns_set, linux_exported_vars_set;
>>>>-
>>>>-    switch (kind)
>>>>-      {
>>>>-      case KERNEL_SYMBOL_TABLE_KIND_UNDEFINED:
>>>>-	break;
>>>>-      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB:
>>>>-	section = find_ksymtab_section();
>>>>-	nb_entries = get_nb_ksymtab_entries();
>>>>-	linux_exported_fns_set = create_or_get_linux_exported_fn_syms();
>>>>-	linux_exported_vars_set = create_or_get_linux_exported_var_syms();
>>>>-	break;
>>>>-      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB_GPL:
>>>>-	section = find_ksymtab_gpl_section();
>>>>-	nb_entries = get_nb_ksymtab_gpl_entries();
>>>>-	linux_exported_fns_set = create_or_get_linux_exported_gpl_fn_syms();
>>>>-	linux_exported_vars_set = create_or_get_linux_exported_gpl_var_syms();
>>>>-	break;
>>>>-      }
>>>>-
>>>>-    if (!linux_exported_vars_set
>>>>-	|| !linux_exported_fns_set
>>>>-	|| !section
>>>>-	|| !nb_entries)
>>>>-      return false;
>>>>-
>>>>  // The data of the section.
>>>>  Elf_Data *elf_data = elf_rawdata(section, 0);
>>>>
>>>>@@ -7654,28 +7775,30 @@ public:
>>>>	// (aka ELF symbol table).
>>>>	symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>>>	if (!symbol)
>>>>-	  {
>>>>-	    adjusted_symbol_address =
>>>>-	      maybe_adjust_var_sym_address(symbol_address);
>>>>-	    symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>>>-	    if (!symbol)
>>>>-	      // This must be a symbol that is of type neither FUNC
>>>>-	      // (function) nor OBJECT (variable).  There are for intance,
>>>>-	      // symbols of type 'NOTYPE' in the ksymtab symbol table.  I
>>>>-	      // am not sure what those are.
>>>>-	      continue;
>>>>-	  }
>>>>+          {
>>>>+            adjusted_symbol_address =
>>>>+              maybe_adjust_var_sym_address(symbol_address);
>>>>+            symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>>>+            if (!symbol)
>>>>+              {
>>>>+	        // This must be a symbol that is of type neither FUNC
>>>>+	        // (function) nor OBJECT (variable).  There are for intance,
>>>>+	        // symbols of type 'NOTYPE' in the ksymtab symbol table.  I
>>>>+	        // am not sure what those are.
>>>>+	        continue;
>>>>+	      }
>>>>+          }
>>>
>>>This part of the diff seems to a change purely of the indentation/tabs.
>>>Maybe you can revert this part.
>>>
>>>>
>>>>	address_set_sptr set;
>>>>	if (symbol->is_function())
>>>>	  {
>>>>	    ABG_ASSERT(lookup_elf_fn_symbol_from_address(adjusted_symbol_address));
>>>>-	    set = linux_exported_fns_set;
>>>>+	    set = exported_fns_set;
>>>>	  }
>>>>	else if (symbol->is_variable())
>>>>	  {
>>>>	    ABG_ASSERT(lookup_elf_var_symbol_from_address(adjusted_symbol_address));
>>>>-	    set = linux_exported_vars_set;
>>>>+	    set = exported_vars_set;
>>>>	  }
>>>>	else
>>>>	  ABG_ASSERT_NOT_REACHED;
>>>>@@ -7684,6 +7807,157 @@ public:
>>>>  return true;
>>>>}
>>>>
>>>>+  /// Populate the symbol map by extracting the exported symbols from a
>>>>+  /// ksymtab rela section.
>>>>+  ///
>>>>+  /// @param section the ksymtab section to read from
>>>>+  ///
>>>>+  /// @param strings_section the __ksymtab_strings section
>>>>+  ///
>>>>+  /// @param exported_fns_set the set of exported functions
>>>>+  ///
>>>>+  /// @param exported_vars_set the set of exported variables
>>>>+  ///
>>>>+  /// @return true upon successful completion, false otherwise.
>>>>+  bool
>>>>+  populate_symbol_map_from_ksymtab_reloc(Elf_Scn *reloc_section, Elf_Scn *strings_section,
>>>>+                                         address_set_sptr exported_fns_set,
>>>>+                                         address_set_sptr exported_vars_set)
>>>>+  {
>>>>+    GElf_Shdr reloc_section_mem;
>>>>+    GElf_Shdr *reloc_section_shdr = gelf_getshdr(reloc_section, &reloc_section_mem);
>>>
>>>gelf_getshdr returns &reloc_section_mem. So you might want to use
>>>reloc_section_mem directly rather than introducing reloc_section_mem.
>>>
>>>>+    size_t reloc_count = reloc_section_shdr->sh_size / reloc_section_shdr->sh_entsize;
>>>>+
>>>>+    Elf_Data *reloc_section_data = elf_rawdata(reloc_section, 0);
>>>>+    size_t strings_ndx = elf_ndxscn(strings_section);
>>>>+
>>>>+    GElf_Rela rela;
>>>>+    elf_symbol_sptr symbol;
>>>>+    for (unsigned int i = 0; i < reloc_count; i++)
>>>>+      {
>>>>+        gelf_getrela(reloc_section_data, i, &rela);
>>>>+        symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rela.r_info));
>>>>+
>>>>+        // This shouldn't happen..
>>>>+        if (!symbol)
>>>
>>>Maybe ABG_ASSERT then?
>>>
>>>>+          return false;
>>>>+
>>>>+        // If the symbol is from __ksymtab_strings, ignore it.
>>>>+        if (symbol->get_shndx() == strings_ndx)
>>>>+          continue;
>>>>+
>>>>+	if (!symbol->is_function() && !symbol->is_variable())
>>>>+          {
>>>>+            if (do_log())
>>>>+              {
>>>>+                if (symbol->get_type() == elf_symbol::NOTYPE_TYPE)
>>>>+                  cerr << "skipping NOTYPE symbol "
>>>>+                       << symbol->get_name()
>>>>+                       << " shndx: "
>>>>+                       << symbol->get_index()
>>>>+                       << " @"
>>>>+                       << elf_path()
>>>>+                       << "\n";
>>>>+                else if (symbol->get_type() == elf_symbol::SECTION_TYPE)
>>>>+                  cerr << "skipping SECTION symbol "
>>>>+                       << "shndx: "
>>>>+                       << symbol->get_index()
>>>>+                       << " @"
>>>>+                       << elf_path()
>>>>+                       << "\n";
>>>>+               }
>>>>+            continue;
>>>>+	  }
>>>>+
>>>>+	address_set_sptr set;
>>>>+	if (symbol->is_function())
>>>>+	  {
>>>>+	    ABG_ASSERT(lookup_elf_fn_symbol_from_address(symbol->get_value()));
>>>>+	    set = exported_fns_set;
>>>>+	  }
>>>>+	else if (symbol->is_variable())
>>>>+	  {
>>>>+	    ABG_ASSERT(lookup_elf_var_symbol_from_address(symbol->get_value()));
>>>>+	    set = exported_vars_set;
>>>>+	  }
>>>>+	else
>>>>+	  ABG_ASSERT_NOT_REACHED;
>>>>+	set->insert(symbol->get_value());
>>>>+      }
>>>>+    return true;
>>>>+  }
>>>>+
>>>>+  /// Load a given kernel symbol table.
>>>>+  ///
>>>>+  /// One can thus retrieve the resulting symbols by using the
>>>>+  /// accessors read_context::linux_exported_fn_syms(),
>>>>+  /// read_context::linux_exported_var_syms(),
>>>>+  /// read_context::linux_exported_gpl_fn_syms(), or
>>>>+  /// read_context::linux_exported_gpl_var_syms().
>>>>+  ///
>>>>+  /// @param kind the kind of kernel symbol table to load.
>>>>+  ///
>>>>+  /// @return true upon successful completion, false otherwise.
>>>>+  bool
>>>>+  load_kernel_symbol_table(kernel_symbol_table_kind kind)
>>>>+  {
>>>>+    size_t nb_entries = 0;
>>>>+    Elf_Scn *section = 0, *reloc_section = 0, *strings_section = 0;
>>>>+    address_set_sptr linux_exported_fns_set, linux_exported_vars_set;
>>>>+
>>>>+    switch (kind)
>>>>+      {
>>>>+      case KERNEL_SYMBOL_TABLE_KIND_UNDEFINED:
>>>>+	break;
>>>>+      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB:
>>>>+	section = find_ksymtab_section();
>>>>+	reloc_section = find_ksymtab_reloc_section();
>>>>+	nb_entries = get_nb_ksymtab_entries();
>>>>+	linux_exported_fns_set = create_or_get_linux_exported_fn_syms();
>>>>+	linux_exported_vars_set = create_or_get_linux_exported_var_syms();
>>>>+	break;
>>>>+      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB_GPL:
>>>>+	section = find_ksymtab_gpl_section();
>>>>+	reloc_section = find_ksymtab_gpl_reloc_section();
>>>>+	nb_entries = get_nb_ksymtab_gpl_entries();
>>>>+	linux_exported_fns_set = create_or_get_linux_exported_gpl_fn_syms();
>>>>+	linux_exported_vars_set = create_or_get_linux_exported_gpl_var_syms();
>>>>+	break;
>>>>+      }
>>>>+
>>>>+    strings_section = find_ksymtab_strings_section();
>>>>+
>>>>+    if (!linux_exported_vars_set
>>>>+	|| !linux_exported_fns_set
>>>>+	|| !section
>>>>+	|| !strings_section
>>>>+	|| !nb_entries)
>>>>+      return false;
>>>>+
>>>>+    GElf_Ehdr eh_mem;
>>>>+    GElf_Ehdr* elf_header = gelf_getehdr(elf_handle(), &eh_mem);
>>>>+    ksymtab_format format = get_ksymtab_format();
>>>>+
>>>>+    // Although pre-v4.19 kernel modules can have a relocation section for the
>>>>+    // __ksymtab section, libdwfl zeroes the rela section after applying
>>>>+    // "simple" absolute relocations via dwfl_module_getelf(). For v4.19 and
>>>>+    // above, we get PC-relative relocations so dwfl_module_getelf() doesn't
>>>>+    // apply those relocations and we're safe to read the relocation section to
>>>>+    // determine which exported symbols are in the ksymtab.
>>>>+    //
>>>>+    // Note: For 32-bit binaries, only populate_symbol_map_from_ksymtab() is
>>>>+    // supported as of now.
>>>>+    if (!reloc_section
>>>>+	|| format == PRE_V4_19_KSYMTAB_FORMAT
>>>>+	|| elf_header->e_ident[EI_CLASS] == ELFCLASS32)
>>>>+      return populate_symbol_map_from_ksymtab(section, linux_exported_fns_set,
>>>>+                                              linux_exported_vars_set, nb_entries);
>>>>+    else
>>>>+      return populate_symbol_map_from_ksymtab_reloc(reloc_section, strings_section,
>>>>+                                                    linux_exported_fns_set,
>>>>+                                                    linux_exported_vars_set);
>>>>+  }
>>>>+
>>>>/// Load the special __ksymtab section. This is for linux kernel
>>>>/// (module) files.
>>>>///
>>>>@@ -8352,6 +8626,17 @@ public:
>>>>is_linux_kernel_binary() const
>>>>{return find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);}
>>>>
>>>>+  /// Guess if the current binary is a Linux Kernel module.
>>>>+  ///
>>>>+  /// To guess that, the function looks for the presence of the
>>>>+  /// special ".modinfo" section in the binary.
>>>>+  ///
>>>>+  bool
>>>>+  is_linux_kernel_module() const
>>>>+  {
>>>>+    return find_section(elf_handle(), ".modinfo", SHT_PROGBITS);
>>>>+  }
>>>>+
>>>
>>>Also refer to this proposal to restrict checking for a module a bit
>>>more: https://sourceware.org/ml/libabigail/2019-q3/msg00022.html
>>>
>>>>/// Getter of the "show_stats" flag.
>>>>///
>>>>/// This flag tells if we should emit statistics about various
>>>>@@ -15734,6 +16019,8 @@ create_default_var_sym(const string& sym_name, const environment *env)
>>>>  elf_symbol::create(env,
>>>>		       /*symbol index=*/ 0,
>>>>		       /*symbol size=*/ 0,
>>>>+		       /*symbol value=*/ 0,
>>>>+		       /*symbol shndx=*/ 0,
>>>>		       sym_name,
>>>>		       /*symbol type=*/ elf_symbol::OBJECT_TYPE,
>>>>		       /*symbol binding=*/ elf_symbol::GLOBAL_BINDING,
>>>>@@ -16173,6 +16460,8 @@ create_default_fn_sym(const string& sym_name, const environment *env)
>>>>  elf_symbol::create(env,
>>>>		       /*symbol index=*/ 0,
>>>>		       /*symbol size=*/ 0,
>>>>+		       /*symbol value=*/ 0,
>>>>+		       /*symbol shndx=*/ 0,
>>>>		       sym_name,
>>>>		       /*symbol type=*/ elf_symbol::FUNC_TYPE,
>>>>		       /*symbol binding=*/ elf_symbol::GLOBAL_BINDING,
>>>>diff --git a/src/abg-ir.cc b/src/abg-ir.cc
>>>>index a5243c3..ad1c261 100644
>>>>--- a/src/abg-ir.cc
>>>>+++ b/src/abg-ir.cc
>>>>@@ -1169,6 +1169,8 @@ struct elf_symbol::priv
>>>>const environment*	env_;
>>>>size_t		index_;
>>>>size_t		size_;
>>>>+  uint64_t		value_;
>>>>+  uint32_t		shndx_;
>>>>string		name_;
>>>>elf_symbol::type	type_;
>>>>elf_symbol::binding	binding_;
>>>>@@ -1213,6 +1215,8 @@ struct elf_symbol::priv
>>>>  : env_(),
>>>>    index_(),
>>>>    size_(),
>>>>+      value_(),
>>>>+      shndx_(),
>>>>    type_(elf_symbol::NOTYPE_TYPE),
>>>>    binding_(elf_symbol::GLOBAL_BINDING),
>>>>    visibility_(elf_symbol::DEFAULT_VISIBILITY),
>>>>@@ -1223,6 +1227,8 @@ struct elf_symbol::priv
>>>>priv(const environment*		e,
>>>>     size_t				i,
>>>>     size_t				s,
>>>>+       uint64_t				v,
>>>>+       uint32_t				x,
>>>>     const string&			n,
>>>>     elf_symbol::type		t,
>>>>     elf_symbol::binding		b,
>>>>@@ -1233,6 +1239,8 @@ struct elf_symbol::priv
>>>>  : env_(e),
>>>>    index_(i),
>>>>    size_(s),
>>>>+      value_(v),
>>>>+      shndx_(x),
>>>>    name_(n),
>>>>    type_(t),
>>>>    binding_(b),
>>>>@@ -1269,6 +1277,10 @@ elf_symbol::elf_symbol()
>>>>///
>>>>/// @param s the size of the symbol.
>>>>///
>>>>+/// @param v the value of the symbol.
>>>>+///
>>>>+/// @param x the section header index (shndx) of the symbol.
>>>>+///
>>>>/// @param n the name of the symbol.
>>>>///
>>>>/// @param t the type of the symbol.
>>>>@@ -1285,6 +1297,8 @@ elf_symbol::elf_symbol()
>>>>elf_symbol::elf_symbol(const environment*	e,
>>>>		       size_t			i,
>>>>		       size_t			s,
>>>>+		       uint64_t			v,
>>>>+		       uint32_t			x,
>>>>		       const string&		n,
>>>>		       type			t,
>>>>		       binding			b,
>>>>@@ -1292,7 +1306,7 @@ elf_symbol::elf_symbol(const environment*	e,
>>>>		       bool			c,
>>>>		       const version&		ve,
>>>>		       visibility		vi)
>>>>-  : priv_(new priv(e, i, s, n, t, b, d, c, ve, vi))
>>>>+  : priv_(new priv(e, i, s, v, x, n, t, b, d, c, ve, vi))
>>>>{}
>>>>
>>>>/// Factory of instances of @ref elf_symbol.
>>>>@@ -1319,6 +1333,10 @@ elf_symbol::create()
>>>>///
>>>>/// @param s the size of the symbol.
>>>>///
>>>>+/// @param v the value of the symbol.
>>>>+///
>>>>+/// @param x the section header index (shndx) of the symbol.
>>>>+///
>>>>/// @param n the name of the symbol.
>>>>///
>>>>/// @param t the type of the symbol.
>>>>@@ -1339,6 +1357,8 @@ elf_symbol_sptr
>>>>elf_symbol::create(const environment*	e,
>>>>		   size_t		i,
>>>>		   size_t		s,
>>>>+		   uint64_t		v,
>>>>+		   uint32_t		x,
>>>>		   const string&	n,
>>>>		   type		t,
>>>>		   binding		b,
>>>>@@ -1347,7 +1367,7 @@ elf_symbol::create(const environment*	e,
>>>>		   const version&	ve,
>>>>		   visibility		vi)
>>>>{
>>>>-  elf_symbol_sptr sym(new elf_symbol(e, i, s, n, t, b, d, c, ve, vi));
>>>>+  elf_symbol_sptr sym(new elf_symbol(e, i, s, v, x, n, t, b, d, c, ve, vi));
>>>>sym->priv_->main_symbol_ = sym;
>>>>return sym;
>>>>}
>>>>@@ -1411,6 +1431,20 @@ void
>>>>elf_symbol::set_index(size_t s)
>>>>{priv_->index_ = s;}
>>>>
>>>>+/// Getter for the symbol value.
>>>>+///
>>>>+/// @return the value of the symbol.
>>>>+uint64_t
>>>>+elf_symbol::get_value() const
>>>>+{return priv_->value_;}
>>>>+
>>>>+/// Getter for the symbol section header index.
>>>>+///
>>>>+/// @return the section header index of the symbol.
>>>>+uint32_t
>>>>+elf_symbol::get_shndx() const
>>>>+{return priv_->shndx_;}
>>>>+
>>>>/// Getter for the name of the @ref elf_symbol.
>>>>///
>>>>/// @return a reference to the name of the @ref symbol.
>>>>diff --git a/src/abg-reader.cc b/src/abg-reader.cc
>>>>index 810a65f..3c481ad 100644
>>>>--- a/src/abg-reader.cc
>>>>+++ b/src/abg-reader.cc
>>>>@@ -2659,6 +2659,8 @@ build_elf_symbol(read_context& ctxt, const xmlNodePtr node)
>>>>
>>>>const environment* env = ctxt.get_environment();
>>>>elf_symbol_sptr e = elf_symbol::create(env, /*index=*/0, size,
>>>>+					 /*value=*/0,
>>>>+					 /*shndx=*/0,
>>>>					 name, type, binding,
>>>>					 is_defined, is_common,
>>>>					 version, visibility);
>>>>-- 
>>>>2.16.4
>>>>
>>>
>>>Nice work! Thank you for that!
>>>The patch looks in general good to me. I am a bit concerned about the
>>>platform support. There are some corner cases that need to be covered.
>>>See the testing below.
>>>
>>>You might find the .clang-format file useful to match the coding style
>>>at some points (especially tabs vs spaces). Although when reviewing the
>>>patch, I had a bit trouble with unchanged but reformatted lines as they
>>>were confusing my diff. Maybe you can revert the changes that were
>>>purely for formatting and we take care of formatting in a later patch.
>>>
>>>I conducted some testing. I choose a minimal configuration with modules:
>>>
>>>$ export ARCH=<arch>
>>>$ make allnoconfig
>>>$ ./scripts/config   \
>>>   -e DEBUG_INFO    \  # dwarf info
>>>   -e MODULES       \  # module support
>>>   -e USB_SUPPORT   \
>>>   -m USB           \
>>>   -m USB_DUMMY_HCD \  # dummy_hcd.ko is a module without exports
>>>   -m USB_GADGET       # depends on usb-common.ko, which I test with
>>>
>>>$ make olddefconfig
>>>$ make
>>>$ abidw ./drivers/usb/common/usb-common.ko | less
>>># then check for the existence of the elf-function-symbols section
>>>
>>>                  master         patched
>>>Linux v4.18
>>>
>>>x86_64             works            works
>>>x86                works            works
>>>arm64              works            works
>>>
>>>Linux v4.19
>>>
>>>x86_64             broken           works
>>>x86                broken           STILL BROKEN
>>>arm64              broken           works
>>>
>>>So, for x86_64 and arm64 the situation got better and elf symbols are
>>>now identified. For x86, this patch did not improve the situation. I did
>>>not check that the list of symbols are entirely correct and that they
>>>work downstream with abidiff etc. Just the pure existence is a very good
>>>indicator for this patch to work.
>>
>>Hi Matthias! Thanks a bunch for your review! :-)
>>
>>I'm aware of the x86 issue and I think the diff below should fix it.
>>Could you please let me know if it passes your x86 testing? (note, I have
>>yet to address your other comments, so this is to be continued!)
>
>Hi Jessica!
>
>I confirm this patch is fixing the x86 testing as lined out above. I
>also retested the other platforms successfully.
>
>Thanks! Great work!
>
>Cheers,
>Matthias
>
>>
>>diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>>index 835bf3d..ed8a8db 100644
>>--- a/src/abg-dwarf-reader.cc
>>+++ b/src/abg-dwarf-reader.cc
>>@@ -7536,22 +7536,33 @@ public:
>>   GElf_Shdr *section_shdr = gelf_getshdr(section, &section_mem);
>>   if (section_shdr->sh_size == 0)
>>     return PRE_V4_19_KSYMTAB_FORMAT;
>>-    /* SHT_REL sections not supported yet */
>>-    if (section_shdr->sh_type == SHT_REL)
>>-      return UNDEFINED_KSYMTAB_FORMAT;
>>+
>>+    bool is_relasec = (section_shdr->sh_type == SHT_RELA);
>>
>>   // If we still have a normal non-zeroed relocation section, we can guess
>>   // what format the ksymtab is in depending on what types of relocs it
>>   // contains.
>>-    GElf_Rela rela;
>>-    Elf_Data *rela_section_data = elf_rawdata(section, 0);
>>-    gelf_getrela(rela_section_data, 0, &rela);
>>+
>>+    int type;
>>+    Elf_Data *section_data = elf_rawdata(section, 0);
>>+    if (is_relasec)
>>+      {
>>+        GElf_Rela rela;
>>+        gelf_getrela(section_data, 0, &rela);
>>+        type = GELF_R_TYPE(rela.r_info);
>>+      }
>>+    else
>>+      {
>>+        GElf_Rel rel;
>>+        gelf_getrel(section_data, 0, &rel);
>>+        type = GELF_R_TYPE(rel.r_info);
>>+      }
>
>Maybe we can deduplicate this a bit. Also further down.
>Missing out that additional 'a' might be a bug source at a later time.

Yeah, I wasn't sure how to deduplicate this since rels and relas are
different data types and we have to use gelf_getrel+GElf_Rel and
gelf_getrela+GElf_Rela on rels and relas respectively..

>>   // Sigh, I dislike the arch-dependent code here, but this seems to be a
>>   // reliable heuristic for kernel modules for now. Relative ksymtabs only
>>   // supported on x86 and arm64 as of v4.19.
>>   ksymtab_format format;
>>-    switch (GELF_R_TYPE(rela.r_info))
>>+    switch (type)
>>     {
>>     case R_X86_64_64: // Same as R_386_32
>>     case R_AARCH64_ABS64:
>>@@ -7831,12 +7842,22 @@ public:
>>   Elf_Data *reloc_section_data = elf_rawdata(reloc_section, 0);
>>   size_t strings_ndx = elf_ndxscn(strings_section);
>>
>>-    GElf_Rela rela;
>>+    bool is_relasec = (reloc_section_shdr->sh_type == SHT_RELA);
>>   elf_symbol_sptr symbol;
>>   for (unsigned int i = 0; i < reloc_count; i++)
>>     {
>>-        gelf_getrela(reloc_section_data, i, &rela);
>>-        symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rela.r_info));
>>+        if (is_relasec)
>>+          {
>>+            GElf_Rela rela;
>>+            gelf_getrela(reloc_section_data, i, &rela);
>>+            symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rela.r_info));
>>+          }
>>+        else
>>+          {
>>+            GElf_Rel rel;
>>+            gelf_getrel(reloc_section_data, i, &rel);
>>+            symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rel.r_info));
>>+          }
>>
>>       // This shouldn't happen..
>>       if (!symbol)
>>@@ -7934,8 +7955,6 @@ public:
>>	|| !nb_entries)
>>     return false;
>>
>>-    GElf_Ehdr eh_mem;
>>-    GElf_Ehdr* elf_header = gelf_getehdr(elf_handle(), &eh_mem);
>>   ksymtab_format format = get_ksymtab_format();
>>
>>   // Although pre-v4.19 kernel modules can have a relocation section for the
>>@@ -7947,9 +7966,7 @@ public:
>>   //
>>   // Note: For 32-bit binaries, only populate_symbol_map_from_ksymtab() is
>>   // supported as of now.
>
>The comment might need updating.

Thanks for catching that!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] Support pre and post v4.19 ksymtabs for Linux kernel modules
  2019-01-01  0:00 [RFC PATCH] Support pre and post v4.19 ksymtabs for Linux kernel modules Jessica Yu
  2019-01-01  0:00 ` Matthias Maennich via libabigail
@ 2019-01-01  0:00 ` Jessica Yu
  1 sibling, 0 replies; 12+ messages in thread
From: Jessica Yu @ 2019-01-01  0:00 UTC (permalink / raw)
  To: libabigail; +Cc: Dodji Seketeli, Matthias Maennich

+++ Jessica Yu [09/08/19 17:49 +0200]:
>As described in commit ad8c2531fb9, the format of the Linux kernel
>ksymtab changed in v4.19 to use relative references instead of absolute
>references. This changes the type of relocations emitted for ksymtab
>sections to be place-relative 32-bit relocations instead of absolute
>relocations. One side-effect of this is that libdwfl will not relocate
>the ksymtab sections due to the PC-relative relocations. This breaks
>load_kernel_symbol_table() for kernel modules because it only reads in
>zeros from the unrelocated ksymtab section and is subsequently unable to
>determine what exported symbols it refers to. Since a vmlinux binary is
>already fully linked and relocated (and therefore we can read its
>ksymtab section just fine), this problem is only relevant to Linux
>kernel modules.
>
>To work around this, we utilize the ksymtab relocation sections to
>determine which symbols the ksymtab entries refer to. We do this by
>inspecting each relocation's r_info field for the symbol table index and
>from there we are able to read each symbol's value and subsequently add
>that to the set of exported symbols.
>
>In addition, for Linux kernel modules, we can utilize relocation types
>to implement a new heuristic to determine the ksymtab format we have.
>The presence of PC-relative relocations suggest the new v4.19 format,
>and absolute relocation types suggest the old pre v4.19 format.

Hello! Just a small addendum:

This is my first patch to libabigail so things might be a bit rough
around the edges. I'm sending this first as an RFC because not
everyone might agree on the general approach and it may need to be
reworked. Testing is greatly appreciated! I will also be on vacation
next week with no access to email so I won't be able to respond to
comments until the following week.

Thanks!

Jessica

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] Support pre and post v4.19 ksymtabs for Linux kernel modules
  2019-01-01  0:00       ` Jessica Yu
@ 2019-01-01  0:00         ` Matthias Maennich via libabigail
  0 siblings, 0 replies; 12+ messages in thread
From: Matthias Maennich via libabigail @ 2019-01-01  0:00 UTC (permalink / raw)
  To: Jessica Yu; +Cc: libabigail, Dodji Seketeli, kernel-team

On Mon, Sep 02, 2019 at 06:49:27PM +0200, Jessica Yu wrote:
>+++ Matthias Maennich [22/08/19 19:04 +0100]:
>>On Wed, Aug 21, 2019 at 02:34:37PM +0200, Jessica Yu wrote:
>>>+++ Matthias Maennich [19/08/19 09:05 +0100]:
>>>>Hi Jessica!
>>>>
>>>>On Fri, 09 Aug, 17:49, Jessica Yu wrote:
>>>>>As described in commit ad8c2531fb9, the format of the Linux kernel
>>>>>ksymtab changed in v4.19 to use relative references instead of absolute
>>>>>references. This changes the type of relocations emitted for ksymtab
>>>>>sections to be place-relative 32-bit relocations instead of absolute
>>>>>relocations. One side-effect of this is that libdwfl will not relocate
>>>>>the ksymtab sections due to the PC-relative relocations. This breaks
>>>>>load_kernel_symbol_table() for kernel modules because it only reads in
>>>>>zeros from the unrelocated ksymtab section and is subsequently unable to
>>>>>determine what exported symbols it refers to. Since a vmlinux binary is
>>>>>already fully linked and relocated (and therefore we can read its
>>>>>ksymtab section just fine), this problem is only relevant to Linux
>>>>>kernel modules.
>>>>>
>>>>>To work around this, we utilize the ksymtab relocation sections to
>>>>>determine which symbols the ksymtab entries refer to. We do this by
>>>>>inspecting each relocation's r_info field for the symbol table index and
>>>>>from there we are able to read each symbol's value and subsequently add
>>>>>that to the set of exported symbols.
>>>>>
>>>>>In addition, for Linux kernel modules, we can utilize relocation types
>>>>>to implement a new heuristic to determine the ksymtab format we have.
>>>>>The presence of PC-relative relocations suggest the new v4.19 format,
>>>>>and absolute relocation types suggest the old pre v4.19 format.
>>>>>
>>>>>     * include/abg-ir.h (elf_symbol::{elf_symbol, create}): Take new
>>>>>     symbol value and shndx parameters.
>>>>>     (elf_symbol::{get_value, get_shndx}): Declare new accessors.
>>>>>     * src/abg-ir.cc (elf_symbol::priv::{value_, shndx_}): New data members.
>>>>>     (elf_symbol::priv::priv): Adjust.
>>>>>     (elf_symbol::elf_symbol): Take new value and shndx parameters.
>>>>>     (elf_symbol::create): Likewise.
>>>>>     (elf_symbol::{get_value, get_shndx}): Define new accessors.
>>>>>     * src/abg-reader.cc (build_elf_symbol): Adjust.
>>>>>     * src/abg-dwarf-reader.cc (lookup_symbol_from_sysv_hash_tab)
>>>>>     (lookup_symbol_from_gnu_hash_tab)
>>>>>     (lookup_symbol_from_symtab): Adjust.
>>>>>     (read_context::{ksymtab_reloc_section_,
>>>>>     ksymtab_gpl_reloc_section_,
>>>>>     ksymtab_strings_section_}): New data members.
>>>>>     (read_context::read_context): Initialize ksymtab_reloc_section_,
>>>>>     ksymtab_gpl_reloc_section_, ksymtab_strings_section_.
>>>>>     (read_context::{find_ksymtab_reloc_section,
>>>>>     find_ksymtab_gpl_reloc_section, find_ksymtab_strings_section,
>>>>>     find_any_ksymtab_reloc_section, get_ksymtab_format_module,
>>>>>     populate_symbol_map_from_ksymtab,
>>>>>     populate_symbol_map_from_ksymtab_reloc,
>>>>>     is_linux_kernel_module}): New member functions.
>>>>>     (read_context::load_kernel_symbol_table): Adjust to call either
>>>>>     populate_symbol_map_from_ksymtab{_reloc,} depending on ksymtab
>>>>>     format.
>>>>>     (read_context::get_ksymtab_format): Adjust to call
>>>>>     get_ksymtab_format_module for linux kernel modules.
>>>>>     (read_context::lookup_elf_symbol_from_index): Adjust.
>>>>>     (create_default_var_sym, create_default_fn_sym): Adjust.
>>>>>
>>>>>Signed-off-by: Jessica Yu <jeyu@kernel.org>
>>>>>---
>>>>>include/abg-ir.h        |  10 ++
>>>>>src/abg-dwarf-reader.cc | 391 +++++++++++++++++++++++++++++++++++++++++-------
>>>>>src/abg-ir.cc           |  38 ++++-
>>>>>src/abg-reader.cc       |   2 +
>>>>>4 files changed, 388 insertions(+), 53 deletions(-)
>>>>>
>>>>>diff --git a/include/abg-ir.h b/include/abg-ir.h
>>>>>index 170a844..d26f472 100644
>>>>>--- a/include/abg-ir.h
>>>>>+++ b/include/abg-ir.h
>>>>>@@ -796,6 +796,8 @@ private:
>>>>>elf_symbol(const environment* e,
>>>>>	     size_t		i,
>>>>>	     size_t		s,
>>>>>+	     uint64_t		v,
>>>>>+	     uint32_t		x,
>>>>
>>>>I would like to see some comments here to describe the purpose of the
>>>>struct's members. I know there weren't any before, but maybe that is a
>>>>good chance.
>>>>
>>>>Also, I am not entirely sure why these are all one-character names, but
>>>>it makes it hard to come from x to "section header index". Maybe we
>>>>should give them more meaningful names.
>>>>
>>>>>	     const string&	n,
>>>>>	     type		t,
>>>>>	     binding		b,
>>>>>@@ -818,6 +820,8 @@ public:
>>>>>create(const environment*	e,
>>>>>	 size_t		i,
>>>>>	 size_t		s,
>>>>>+	 uint64_t	v,
>>>>>+	 uint32_t	x,
>>>>>	 const string&		n,
>>>>>	 type			t,
>>>>>	 binding		b,
>>>>>@@ -838,6 +842,12 @@ public:
>>>>>void
>>>>>set_index(size_t);
>>>>>
>>>>>+  uint64_t
>>>>>+  get_value() const;
>>>>>+
>>>>>+  uint32_t
>>>>>+  get_shndx() const;
>>>>>+
>>>>>const string&
>>>>>get_name() const;
>>>>>
>>>>>diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>>>>>index 47af7e3..835bf3d 100644
>>>>>--- a/src/abg-dwarf-reader.cc
>>>>>+++ b/src/abg-dwarf-reader.cc
>>>>>@@ -1724,6 +1724,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>>>GElf_Sym symbol;
>>>>>const char* sym_name_str;
>>>>>size_t sym_size;
>>>>>+  uint64_t sym_value;
>>>>>+  uint32_t sym_shndx;
>>>>>elf_symbol::type sym_type;
>>>>>elf_symbol::binding sym_binding;
>>>>>elf_symbol::visibility sym_visibility;
>>>>>@@ -1743,6 +1745,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>>>	  sym_visibility =
>>>>>	    stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(symbol.st_other));
>>>>>	  sym_size = symbol.st_size;
>>>>>+	  sym_value = symbol.st_value;
>>>>>+	  sym_shndx = symbol.st_shndx;
>>>>>	  elf_symbol::version ver;
>>>>>	  if (get_version_for_symbol(elf_handle, symbol_index,
>>>>>				     /*get_def_version=*/true, ver))
>>>>>@@ -1751,6 +1755,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>>>	    elf_symbol::create(env,
>>>>>			       symbol_index,
>>>>>			       sym_size,
>>>>>+			       sym_value,
>>>>>+			       sym_shndx,
>>>>>			       sym_name_str,
>>>>>			       sym_type,
>>>>>			       sym_binding,
>>>>>@@ -2028,7 +2034,8 @@ lookup_symbol_from_gnu_hash_tab(const environment*		env,
>>>>>	    ABG_ASSERT(!ver.str().empty());
>>>>>
>>>>>	  elf_symbol_sptr symbol_found =
>>>>>-	    elf_symbol::create(env, i, symbol.st_size, sym_name_str,
>>>>>+	    elf_symbol::create(env, i, symbol.st_size, symbol.st_value,
>>>>>+			       symbol.st_shndx, sym_name_str,
>>>>>			       sym_type, sym_binding,
>>>>>			       symbol.st_shndx != SHN_UNDEF,
>>>>>			       symbol.st_shndx == SHN_COMMON,
>>>>>@@ -2181,6 +2188,7 @@ lookup_symbol_from_symtab(const environment*		env,
>>>>>	    ABG_ASSERT(!ver.str().empty());
>>>>>	  elf_symbol_sptr symbol_found =
>>>>>	    elf_symbol::create(env, i, sym->st_size,
>>>>>+			       sym->st_value, sym->st_shndx,
>>>>>			       name_str, sym_type,
>>>>>			       sym_binding, sym_is_defined,
>>>>>			       sym_is_common, ver, sym_visibility);
>>>>>@@ -3081,7 +3089,10 @@ public:
>>>>>/// to symbols exported using the EXPORT_SYMBOL_GPL macro from the
>>>>>/// linux kernel.
>>>>>Elf_Scn*			ksymtab_section_;
>>>>>+  Elf_Scn*			ksymtab_reloc_section_;
>>>>>Elf_Scn*			ksymtab_gpl_section_;
>>>>>+  Elf_Scn*			ksymtab_gpl_reloc_section_;
>>>>>+  Elf_Scn*			ksymtab_strings_section_;
>>>>>Elf_Scn*			versym_section_;
>>>>>Elf_Scn*			verdef_section_;
>>>>>Elf_Scn*			verneed_section_;
>>>>>@@ -3277,7 +3288,10 @@ public:
>>>>> nb_ksymtab_entries_ = 0;
>>>>> nb_ksymtab_gpl_entries_ = 0;
>>>>> ksymtab_section_ = 0;
>>>>>+    ksymtab_reloc_section_ = 0;
>>>>> ksymtab_gpl_section_ = 0;
>>>>>+    ksymtab_gpl_reloc_section_ = 0;
>>>>>+    ksymtab_strings_section_ = 0;
>>>>> versym_section_ = 0;
>>>>> verdef_section_ = 0;
>>>>> verneed_section_ = 0;
>>>>>@@ -6079,6 +6093,23 @@ public:
>>>>> return ksymtab_section_;
>>>>>}
>>>>>
>>>>>+  /// Return the .rel{a,}__ksymtab section of a linux kernel ELF file (either
>>>>>+  /// a vmlinux binary or a kernel module).
>>>>>+  ///
>>>>>+  /// @return the .rel{a,}__ksymtab section if found, nil otherwise.
>>>>>+  Elf_Scn*
>>>>>+  find_ksymtab_reloc_section() const
>>>>>+  {
>>>>>+    if (!ksymtab_reloc_section_)
>>>>>+      {
>>>>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab", SHT_RELA);
>>>>>+        if (!sec)
>>>>>+          sec = find_section(elf_handle(), ".rel__ksymtab", SHT_REL);
>>>>>+        const_cast<read_context*>(this)->ksymtab_reloc_section_ = sec;
>>>>>+      }
>>>>>+    return ksymtab_reloc_section_;
>>>>>+  }
>>>>>+
>>>>>/// Return the __ksymtab_gpl section of a linux kernel ELF file
>>>>>/// (either a vmlinux binary or a kernel module).
>>>>>///
>>>>>@@ -6092,6 +6123,36 @@ public:
>>>>> return ksymtab_gpl_section_;
>>>>>}
>>>>>
>>>>>+  /// Return the .rel{a,}__ksymtab_gpl section of a linux kernel ELF file
>>>>>+  /// (either a vmlinux binary or a kernel module).
>>>>>+  ///
>>>>>+  /// @return the .rel{a,}__ksymtab_gpl section if found, nil otherwise.
>>>>>+  Elf_Scn*
>>>>>+  find_ksymtab_gpl_reloc_section() const
>>>>>+  {
>>>>>+    if (!ksymtab_gpl_reloc_section_)
>>>>>+      {
>>>>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab_gpl", SHT_RELA);
>>>>>+        if (!sec)
>>>>>+          sec = find_section(elf_handle(), ".rel__ksymtab_gpl", SHT_REL);
>>>>>+        const_cast<read_context*>(this)->ksymtab_gpl_reloc_section_ = sec;
>>>>>+      }
>>>>>+    return ksymtab_gpl_reloc_section_;
>>>>>+  }
>>>>>+
>>>>>+  /// Return the __ksymtab_strings section of a linux kernel ELF file
>>>>>+  /// (either a vmlinux binary or a kernel module).
>>>>>+  ///
>>>>>+  /// @return the __ksymtab_strings section if found, nil otherwise.
>>>>>+  Elf_Scn*
>>>>>+  find_ksymtab_strings_section() const
>>>>>+  {
>>>>>+    if (!ksymtab_strings_section_)
>>>>>+      const_cast<read_context*>(this)->ksymtab_strings_section_ =
>>>>>+        find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);
>>>>>+    return ksymtab_strings_section_;
>>>>>+  }
>>>>>+
>>>>
>>>>We should find a way to deduplicate these functions. Not necessarily in
>>>>this patch, but in a followup. Most of them only differ in the section
>>>>name and whether they check for an alternative. Something generic like
>>>>
>>>>struct SectionQuery{ string name, int type };
>>>>Elf_Scn* find_section(const vector<SectionQuery>&);
>>>>
>>>>should be sufficient to forward from all the other variants to one
>>>>implementation.
>>>>
>>>>>/// Return either a __ksymtab or a __ksymtab_gpl section, in case
>>>>>/// only the __ksymtab_gpl exists.
>>>>>///
>>>>>@@ -6106,6 +6167,19 @@ public:
>>>>> return result;
>>>>>}
>>>>>
>>>>>+  /// Return either a .rel{a,}__ksymtab or a .rel{a,}__ksymtab_gpl section
>>>>>+  ///
>>>>>+  /// @return the .rel{a,}__ksymtab section if it exists, or the
>>>>>+  /// .rel{a,}__ksymtab_gpl; or NULL if neither is found.
>>>>>+  Elf_Scn*
>>>>>+  find_any_ksymtab_reloc_section() const
>>>>>+  {
>>>>>+    Elf_Scn *result = find_ksymtab_reloc_section();
>>>>>+    if (!result)
>>>>>+      result = find_ksymtab_gpl_reloc_section();
>>>>>+    return result;
>>>>>+  }
>>>>>+
>>>>>/// Return the SHT_GNU_versym, SHT_GNU_verdef and SHT_GNU_verneed
>>>>>/// sections that are involved in symbol versionning.
>>>>>///
>>>>>@@ -6283,7 +6357,8 @@ public:
>>>>>   stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(s->st_other));
>>>>>
>>>>> elf_symbol_sptr sym =
>>>>>-      elf_symbol::create(env(), symbol_index, s->st_size, name_str,
>>>>>+      elf_symbol::create(env(), symbol_index, s->st_size,
>>>>>+			 s->st_value, s->st_shndx, name_str,
>>>>>			 stt_to_elf_symbol_type(GELF_ST_TYPE(s->st_info)),
>>>>>			 stb_to_elf_symbol_binding(GELF_ST_BIND(s->st_info)),
>>>>>			 sym_is_defined, sym_is_common, ver, vis);
>>>>>@@ -7435,6 +7510,64 @@ public:
>>>>> return symbol;
>>>>>}
>>>>>
>>>>>+  /// Determine the format of the __ksymtab and __ksymtab_gpl
>>>>>+  /// sections of Linux kernel modules.
>>>>>+  ///
>>>>>+  /// This is important because we need the know the format of these
>>>>
>>>>s/need the/need to/                       ^^^
>>>>
>>>>>+  /// sections to be able to read from them.
>>>>>+  ///
>>>>>+  /// @return the format the __ksymtab[_gpl] sections.
>>>>>+  enum ksymtab_format
>>>>>+  get_ksymtab_format_module() const
>>>>>+  {
>>>>>+    Elf_Scn *section = find_any_ksymtab_reloc_section();
>>>>>+
>>>>>+    if (!section)
>>>>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>>>>+
>>>>>+    // Libdwfl has a weird quirk where, in the process of obtaining an Elf
>>>>>+    // descriptor via dwfl_module_getelf(), it will apply all relocations it
>>>>>+    // knows how to and it will zero a relocation after applying it. If the
>>>>
>>>>will zero the relocation info after
>>>>
>>>>>+    // .rela__ksymtab* section contained only simple (absolute) relocations,
>>>>>+    // they will have been all applied and sh_size will be 0. For arches that
>>>>>+    // support relative ksymtabs, simple relocations only appear in pre-4.19
>>>>>+    // kernel modules.
>>>>>+    GElf_Shdr section_mem;
>>>>>+    GElf_Shdr *section_shdr = gelf_getshdr(section, &section_mem);
>>>>>+    if (section_shdr->sh_size == 0)
>>>>>+      return PRE_V4_19_KSYMTAB_FORMAT;
>>>>>+    /* SHT_REL sections not supported yet */
>>>>>+    if (section_shdr->sh_type == SHT_REL)
>>>>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>>>>+
>>>>>+    // If we still have a normal non-zeroed relocation section, we can guess
>>>>>+    // what format the ksymtab is in depending on what types of relocs it
>>>>>+    // contains.
>>>>>+    GElf_Rela rela;
>>>>>+    Elf_Data *rela_section_data = elf_rawdata(section, 0);
>>>>>+    gelf_getrela(rela_section_data, 0, &rela);
>>>>>+
>>>>>+    // Sigh, I dislike the arch-dependent code here, but this seems to be a
>>>>>+    // reliable heuristic for kernel modules for now. Relative ksymtabs only
>>>>>+    // supported on x86 and arm64 as of v4.19.
>>>>>+    ksymtab_format format;
>>>>>+    switch (GELF_R_TYPE(rela.r_info))
>>>>>+      {
>>>>>+      case R_X86_64_64: // Same as R_386_32
>>>>
>>>>add 'fallthrough' to the comments, else younger compilers will warn
>>>>about the potentially unintentional fallthrough.
>>>>
>>>>>+      case R_AARCH64_ABS64:
>>>>>+        format = PRE_V4_19_KSYMTAB_FORMAT;
>>>>>+	break;
>>>>>+      case R_X86_64_PC32: // Same as R_386_PC32
>>>>>+      case R_AARCH64_PREL32:
>>>>>+	format = V4_19_KSYMTAB_FORMAT;
>>>>>+	break;
>>>>>+      default:
>>>>>+        format = UNDEFINED_KSYMTAB_FORMAT;
>>>>
>>>>We should probably emit a warning in this case. Are there more
>>>>architectures that have PREL32 enabled as of now that are not yet
>>>>covered here?
>>>>
>>>>>+	break;
>>>>>+      }
>>>>>+    return format;
>>>>>+  }
>>>>>+
>>>>>/// Determine the format of the __ksymtab and __ksymtab_gpl
>>>>>/// sections.
>>>>>///
>>>>>@@ -7451,6 +7584,18 @@ public:
>>>>>   {
>>>>>	if (ksymtab_format_ == UNDEFINED_KSYMTAB_FORMAT)
>>>>>	  {
>>>>>+            // Since Linux kernel modules are relocatable, we can first try
>>>>>+            // using a heuristic based on relocations to guess the ksymtab format.
>>>>>+            if (is_linux_kernel_module())
>>>>>+              {
>>>>>+                ksymtab_format_ = get_ksymtab_format_module();
>>>>>+                if (ksymtab_format_ != UNDEFINED_KSYMTAB_FORMAT)
>>>>>+                  return ksymtab_format_;
>>>>>+              }
>>>>
>>>>nit: use tabs
>>>>
>>>>>+
>>>>>+	    // If it's not a kernel module or we couldn't determine its format
>>>>>+	    // with relocations, fall back to the heuristic below.
>>>>>+
>>>>>	    // OK this is a dirty little heuristic to determine the
>>>>>	    // format of the ksymtab section.
>>>>>	    //
>>>>>@@ -7553,48 +7698,24 @@ public:
>>>>> return nb_ksymtab_gpl_entries_;
>>>>>}
>>>>>
>>>>>-  /// Load a given kernel symbol table.
>>>>>+  /// Populate the symbol map by reading exported symbols from the
>>>>>+  /// ksymtab directly.
>>>>>///
>>>>>-  /// One can thus retrieve the resulting symbols by using the
>>>>>-  /// accessors read_context::linux_exported_fn_syms(),
>>>>>-  /// read_context::linux_exported_var_syms(),
>>>>>-  /// read_context::linux_exported_gpl_fn_syms(), or
>>>>>-  /// read_context::linux_exported_gpl_var_syms().
>>>>>+  /// @param section the ksymtab section to read from
>>>>>///
>>>>>-  /// @param kind the kind of kernel symbol table to load.
>>>>>+  /// @param exported_fns_set the set of exported functions
>>>>>+  ///
>>>>>+  /// @param exported_vars_set the set of exported variables
>>>>>+  ///
>>>>>+  /// @param nb_entries the number of ksymtab entries to read
>>>>>///
>>>>>/// @return true upon successful completion, false otherwise.
>>>>>bool
>>>>>-  load_kernel_symbol_table(kernel_symbol_table_kind kind)
>>>>>+  populate_symbol_map_from_ksymtab(Elf_Scn *section,
>>>>>+                                   address_set_sptr exported_fns_set,
>>>>>+                                   address_set_sptr exported_vars_set,
>>>>>+                                   size_t nb_entries)
>>>>>{
>>>>>-    size_t nb_entries = 0;
>>>>>-    Elf_Scn *section = 0;
>>>>>-    address_set_sptr linux_exported_fns_set, linux_exported_vars_set;
>>>>>-
>>>>>-    switch (kind)
>>>>>-      {
>>>>>-      case KERNEL_SYMBOL_TABLE_KIND_UNDEFINED:
>>>>>-	break;
>>>>>-      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB:
>>>>>-	section = find_ksymtab_section();
>>>>>-	nb_entries = get_nb_ksymtab_entries();
>>>>>-	linux_exported_fns_set = create_or_get_linux_exported_fn_syms();
>>>>>-	linux_exported_vars_set = create_or_get_linux_exported_var_syms();
>>>>>-	break;
>>>>>-      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB_GPL:
>>>>>-	section = find_ksymtab_gpl_section();
>>>>>-	nb_entries = get_nb_ksymtab_gpl_entries();
>>>>>-	linux_exported_fns_set = create_or_get_linux_exported_gpl_fn_syms();
>>>>>-	linux_exported_vars_set = create_or_get_linux_exported_gpl_var_syms();
>>>>>-	break;
>>>>>-      }
>>>>>-
>>>>>-    if (!linux_exported_vars_set
>>>>>-	|| !linux_exported_fns_set
>>>>>-	|| !section
>>>>>-	|| !nb_entries)
>>>>>-      return false;
>>>>>-
>>>>> // The data of the section.
>>>>> Elf_Data *elf_data = elf_rawdata(section, 0);
>>>>>
>>>>>@@ -7654,28 +7775,30 @@ public:
>>>>>	// (aka ELF symbol table).
>>>>>	symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>>>>	if (!symbol)
>>>>>-	  {
>>>>>-	    adjusted_symbol_address =
>>>>>-	      maybe_adjust_var_sym_address(symbol_address);
>>>>>-	    symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>>>>-	    if (!symbol)
>>>>>-	      // This must be a symbol that is of type neither FUNC
>>>>>-	      // (function) nor OBJECT (variable).  There are for intance,
>>>>>-	      // symbols of type 'NOTYPE' in the ksymtab symbol table.  I
>>>>>-	      // am not sure what those are.
>>>>>-	      continue;
>>>>>-	  }
>>>>>+          {
>>>>>+            adjusted_symbol_address =
>>>>>+              maybe_adjust_var_sym_address(symbol_address);
>>>>>+            symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>>>>+            if (!symbol)
>>>>>+              {
>>>>>+	        // This must be a symbol that is of type neither FUNC
>>>>>+	        // (function) nor OBJECT (variable).  There are for intance,
>>>>>+	        // symbols of type 'NOTYPE' in the ksymtab symbol table.  I
>>>>>+	        // am not sure what those are.
>>>>>+	        continue;
>>>>>+	      }
>>>>>+          }
>>>>
>>>>This part of the diff seems to a change purely of the indentation/tabs.
>>>>Maybe you can revert this part.
>>>>
>>>>>
>>>>>	address_set_sptr set;
>>>>>	if (symbol->is_function())
>>>>>	  {
>>>>>	    ABG_ASSERT(lookup_elf_fn_symbol_from_address(adjusted_symbol_address));
>>>>>-	    set = linux_exported_fns_set;
>>>>>+	    set = exported_fns_set;
>>>>>	  }
>>>>>	else if (symbol->is_variable())
>>>>>	  {
>>>>>	    ABG_ASSERT(lookup_elf_var_symbol_from_address(adjusted_symbol_address));
>>>>>-	    set = linux_exported_vars_set;
>>>>>+	    set = exported_vars_set;
>>>>>	  }
>>>>>	else
>>>>>	  ABG_ASSERT_NOT_REACHED;
>>>>>@@ -7684,6 +7807,157 @@ public:
>>>>> return true;
>>>>>}
>>>>>
>>>>>+  /// Populate the symbol map by extracting the exported symbols from a
>>>>>+  /// ksymtab rela section.
>>>>>+  ///
>>>>>+  /// @param section the ksymtab section to read from
>>>>>+  ///
>>>>>+  /// @param strings_section the __ksymtab_strings section
>>>>>+  ///
>>>>>+  /// @param exported_fns_set the set of exported functions
>>>>>+  ///
>>>>>+  /// @param exported_vars_set the set of exported variables
>>>>>+  ///
>>>>>+  /// @return true upon successful completion, false otherwise.
>>>>>+  bool
>>>>>+  populate_symbol_map_from_ksymtab_reloc(Elf_Scn *reloc_section, Elf_Scn *strings_section,
>>>>>+                                         address_set_sptr exported_fns_set,
>>>>>+                                         address_set_sptr exported_vars_set)
>>>>>+  {
>>>>>+    GElf_Shdr reloc_section_mem;
>>>>>+    GElf_Shdr *reloc_section_shdr = gelf_getshdr(reloc_section, &reloc_section_mem);
>>>>
>>>>gelf_getshdr returns &reloc_section_mem. So you might want to use
>>>>reloc_section_mem directly rather than introducing reloc_section_mem.
>>>>
>>>>>+    size_t reloc_count = reloc_section_shdr->sh_size / reloc_section_shdr->sh_entsize;
>>>>>+
>>>>>+    Elf_Data *reloc_section_data = elf_rawdata(reloc_section, 0);
>>>>>+    size_t strings_ndx = elf_ndxscn(strings_section);
>>>>>+
>>>>>+    GElf_Rela rela;
>>>>>+    elf_symbol_sptr symbol;
>>>>>+    for (unsigned int i = 0; i < reloc_count; i++)
>>>>>+      {
>>>>>+        gelf_getrela(reloc_section_data, i, &rela);
>>>>>+        symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rela.r_info));
>>>>>+
>>>>>+        // This shouldn't happen..
>>>>>+        if (!symbol)
>>>>
>>>>Maybe ABG_ASSERT then?
>>>>
>>>>>+          return false;
>>>>>+
>>>>>+        // If the symbol is from __ksymtab_strings, ignore it.
>>>>>+        if (symbol->get_shndx() == strings_ndx)
>>>>>+          continue;
>>>>>+
>>>>>+	if (!symbol->is_function() && !symbol->is_variable())
>>>>>+          {
>>>>>+            if (do_log())
>>>>>+              {
>>>>>+                if (symbol->get_type() == elf_symbol::NOTYPE_TYPE)
>>>>>+                  cerr << "skipping NOTYPE symbol "
>>>>>+                       << symbol->get_name()
>>>>>+                       << " shndx: "
>>>>>+                       << symbol->get_index()
>>>>>+                       << " @"
>>>>>+                       << elf_path()
>>>>>+                       << "\n";
>>>>>+                else if (symbol->get_type() == elf_symbol::SECTION_TYPE)
>>>>>+                  cerr << "skipping SECTION symbol "
>>>>>+                       << "shndx: "
>>>>>+                       << symbol->get_index()
>>>>>+                       << " @"
>>>>>+                       << elf_path()
>>>>>+                       << "\n";
>>>>>+               }
>>>>>+            continue;
>>>>>+	  }
>>>>>+
>>>>>+	address_set_sptr set;
>>>>>+	if (symbol->is_function())
>>>>>+	  {
>>>>>+	    ABG_ASSERT(lookup_elf_fn_symbol_from_address(symbol->get_value()));
>>>>>+	    set = exported_fns_set;
>>>>>+	  }
>>>>>+	else if (symbol->is_variable())
>>>>>+	  {
>>>>>+	    ABG_ASSERT(lookup_elf_var_symbol_from_address(symbol->get_value()));
>>>>>+	    set = exported_vars_set;
>>>>>+	  }
>>>>>+	else
>>>>>+	  ABG_ASSERT_NOT_REACHED;
>>>>>+	set->insert(symbol->get_value());
>>>>>+      }
>>>>>+    return true;
>>>>>+  }
>>>>>+
>>>>>+  /// Load a given kernel symbol table.
>>>>>+  ///
>>>>>+  /// One can thus retrieve the resulting symbols by using the
>>>>>+  /// accessors read_context::linux_exported_fn_syms(),
>>>>>+  /// read_context::linux_exported_var_syms(),
>>>>>+  /// read_context::linux_exported_gpl_fn_syms(), or
>>>>>+  /// read_context::linux_exported_gpl_var_syms().
>>>>>+  ///
>>>>>+  /// @param kind the kind of kernel symbol table to load.
>>>>>+  ///
>>>>>+  /// @return true upon successful completion, false otherwise.
>>>>>+  bool
>>>>>+  load_kernel_symbol_table(kernel_symbol_table_kind kind)
>>>>>+  {
>>>>>+    size_t nb_entries = 0;
>>>>>+    Elf_Scn *section = 0, *reloc_section = 0, *strings_section = 0;
>>>>>+    address_set_sptr linux_exported_fns_set, linux_exported_vars_set;
>>>>>+
>>>>>+    switch (kind)
>>>>>+      {
>>>>>+      case KERNEL_SYMBOL_TABLE_KIND_UNDEFINED:
>>>>>+	break;
>>>>>+      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB:
>>>>>+	section = find_ksymtab_section();
>>>>>+	reloc_section = find_ksymtab_reloc_section();
>>>>>+	nb_entries = get_nb_ksymtab_entries();
>>>>>+	linux_exported_fns_set = create_or_get_linux_exported_fn_syms();
>>>>>+	linux_exported_vars_set = create_or_get_linux_exported_var_syms();
>>>>>+	break;
>>>>>+      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB_GPL:
>>>>>+	section = find_ksymtab_gpl_section();
>>>>>+	reloc_section = find_ksymtab_gpl_reloc_section();
>>>>>+	nb_entries = get_nb_ksymtab_gpl_entries();
>>>>>+	linux_exported_fns_set = create_or_get_linux_exported_gpl_fn_syms();
>>>>>+	linux_exported_vars_set = create_or_get_linux_exported_gpl_var_syms();
>>>>>+	break;
>>>>>+      }
>>>>>+
>>>>>+    strings_section = find_ksymtab_strings_section();
>>>>>+
>>>>>+    if (!linux_exported_vars_set
>>>>>+	|| !linux_exported_fns_set
>>>>>+	|| !section
>>>>>+	|| !strings_section
>>>>>+	|| !nb_entries)
>>>>>+      return false;
>>>>>+
>>>>>+    GElf_Ehdr eh_mem;
>>>>>+    GElf_Ehdr* elf_header = gelf_getehdr(elf_handle(), &eh_mem);
>>>>>+    ksymtab_format format = get_ksymtab_format();
>>>>>+
>>>>>+    // Although pre-v4.19 kernel modules can have a relocation section for the
>>>>>+    // __ksymtab section, libdwfl zeroes the rela section after applying
>>>>>+    // "simple" absolute relocations via dwfl_module_getelf(). For v4.19 and
>>>>>+    // above, we get PC-relative relocations so dwfl_module_getelf() doesn't
>>>>>+    // apply those relocations and we're safe to read the relocation section to
>>>>>+    // determine which exported symbols are in the ksymtab.
>>>>>+    //
>>>>>+    // Note: For 32-bit binaries, only populate_symbol_map_from_ksymtab() is
>>>>>+    // supported as of now.
>>>>>+    if (!reloc_section
>>>>>+	|| format == PRE_V4_19_KSYMTAB_FORMAT
>>>>>+	|| elf_header->e_ident[EI_CLASS] == ELFCLASS32)
>>>>>+      return populate_symbol_map_from_ksymtab(section, linux_exported_fns_set,
>>>>>+                                              linux_exported_vars_set, nb_entries);
>>>>>+    else
>>>>>+      return populate_symbol_map_from_ksymtab_reloc(reloc_section, strings_section,
>>>>>+                                                    linux_exported_fns_set,
>>>>>+                                                    linux_exported_vars_set);
>>>>>+  }
>>>>>+
>>>>>/// Load the special __ksymtab section. This is for linux kernel
>>>>>/// (module) files.
>>>>>///
>>>>>@@ -8352,6 +8626,17 @@ public:
>>>>>is_linux_kernel_binary() const
>>>>>{return find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);}
>>>>>
>>>>>+  /// Guess if the current binary is a Linux Kernel module.
>>>>>+  ///
>>>>>+  /// To guess that, the function looks for the presence of the
>>>>>+  /// special ".modinfo" section in the binary.
>>>>>+  ///
>>>>>+  bool
>>>>>+  is_linux_kernel_module() const
>>>>>+  {
>>>>>+    return find_section(elf_handle(), ".modinfo", SHT_PROGBITS);
>>>>>+  }
>>>>>+
>>>>
>>>>Also refer to this proposal to restrict checking for a module a bit
>>>>more: https://sourceware.org/ml/libabigail/2019-q3/msg00022.html
>>>>
>>>>>/// Getter of the "show_stats" flag.
>>>>>///
>>>>>/// This flag tells if we should emit statistics about various
>>>>>@@ -15734,6 +16019,8 @@ create_default_var_sym(const string& sym_name, const environment *env)
>>>>> elf_symbol::create(env,
>>>>>		       /*symbol index=*/ 0,
>>>>>		       /*symbol size=*/ 0,
>>>>>+		       /*symbol value=*/ 0,
>>>>>+		       /*symbol shndx=*/ 0,
>>>>>		       sym_name,
>>>>>		       /*symbol type=*/ elf_symbol::OBJECT_TYPE,
>>>>>		       /*symbol binding=*/ elf_symbol::GLOBAL_BINDING,
>>>>>@@ -16173,6 +16460,8 @@ create_default_fn_sym(const string& sym_name, const environment *env)
>>>>> elf_symbol::create(env,
>>>>>		       /*symbol index=*/ 0,
>>>>>		       /*symbol size=*/ 0,
>>>>>+		       /*symbol value=*/ 0,
>>>>>+		       /*symbol shndx=*/ 0,
>>>>>		       sym_name,
>>>>>		       /*symbol type=*/ elf_symbol::FUNC_TYPE,
>>>>>		       /*symbol binding=*/ elf_symbol::GLOBAL_BINDING,
>>>>>diff --git a/src/abg-ir.cc b/src/abg-ir.cc
>>>>>index a5243c3..ad1c261 100644
>>>>>--- a/src/abg-ir.cc
>>>>>+++ b/src/abg-ir.cc
>>>>>@@ -1169,6 +1169,8 @@ struct elf_symbol::priv
>>>>>const environment*	env_;
>>>>>size_t		index_;
>>>>>size_t		size_;
>>>>>+  uint64_t		value_;
>>>>>+  uint32_t		shndx_;
>>>>>string		name_;
>>>>>elf_symbol::type	type_;
>>>>>elf_symbol::binding	binding_;
>>>>>@@ -1213,6 +1215,8 @@ struct elf_symbol::priv
>>>>> : env_(),
>>>>>   index_(),
>>>>>   size_(),
>>>>>+      value_(),
>>>>>+      shndx_(),
>>>>>   type_(elf_symbol::NOTYPE_TYPE),
>>>>>   binding_(elf_symbol::GLOBAL_BINDING),
>>>>>   visibility_(elf_symbol::DEFAULT_VISIBILITY),
>>>>>@@ -1223,6 +1227,8 @@ struct elf_symbol::priv
>>>>>priv(const environment*		e,
>>>>>    size_t				i,
>>>>>    size_t				s,
>>>>>+       uint64_t				v,
>>>>>+       uint32_t				x,
>>>>>    const string&			n,
>>>>>    elf_symbol::type		t,
>>>>>    elf_symbol::binding		b,
>>>>>@@ -1233,6 +1239,8 @@ struct elf_symbol::priv
>>>>> : env_(e),
>>>>>   index_(i),
>>>>>   size_(s),
>>>>>+      value_(v),
>>>>>+      shndx_(x),
>>>>>   name_(n),
>>>>>   type_(t),
>>>>>   binding_(b),
>>>>>@@ -1269,6 +1277,10 @@ elf_symbol::elf_symbol()
>>>>>///
>>>>>/// @param s the size of the symbol.
>>>>>///
>>>>>+/// @param v the value of the symbol.
>>>>>+///
>>>>>+/// @param x the section header index (shndx) of the symbol.
>>>>>+///
>>>>>/// @param n the name of the symbol.
>>>>>///
>>>>>/// @param t the type of the symbol.
>>>>>@@ -1285,6 +1297,8 @@ elf_symbol::elf_symbol()
>>>>>elf_symbol::elf_symbol(const environment*	e,
>>>>>		       size_t			i,
>>>>>		       size_t			s,
>>>>>+		       uint64_t			v,
>>>>>+		       uint32_t			x,
>>>>>		       const string&		n,
>>>>>		       type			t,
>>>>>		       binding			b,
>>>>>@@ -1292,7 +1306,7 @@ elf_symbol::elf_symbol(const environment*	e,
>>>>>		       bool			c,
>>>>>		       const version&		ve,
>>>>>		       visibility		vi)
>>>>>-  : priv_(new priv(e, i, s, n, t, b, d, c, ve, vi))
>>>>>+  : priv_(new priv(e, i, s, v, x, n, t, b, d, c, ve, vi))
>>>>>{}
>>>>>
>>>>>/// Factory of instances of @ref elf_symbol.
>>>>>@@ -1319,6 +1333,10 @@ elf_symbol::create()
>>>>>///
>>>>>/// @param s the size of the symbol.
>>>>>///
>>>>>+/// @param v the value of the symbol.
>>>>>+///
>>>>>+/// @param x the section header index (shndx) of the symbol.
>>>>>+///
>>>>>/// @param n the name of the symbol.
>>>>>///
>>>>>/// @param t the type of the symbol.
>>>>>@@ -1339,6 +1357,8 @@ elf_symbol_sptr
>>>>>elf_symbol::create(const environment*	e,
>>>>>		   size_t		i,
>>>>>		   size_t		s,
>>>>>+		   uint64_t		v,
>>>>>+		   uint32_t		x,
>>>>>		   const string&	n,
>>>>>		   type		t,
>>>>>		   binding		b,
>>>>>@@ -1347,7 +1367,7 @@ elf_symbol::create(const environment*	e,
>>>>>		   const version&	ve,
>>>>>		   visibility		vi)
>>>>>{
>>>>>-  elf_symbol_sptr sym(new elf_symbol(e, i, s, n, t, b, d, c, ve, vi));
>>>>>+  elf_symbol_sptr sym(new elf_symbol(e, i, s, v, x, n, t, b, d, c, ve, vi));
>>>>>sym->priv_->main_symbol_ = sym;
>>>>>return sym;
>>>>>}
>>>>>@@ -1411,6 +1431,20 @@ void
>>>>>elf_symbol::set_index(size_t s)
>>>>>{priv_->index_ = s;}
>>>>>
>>>>>+/// Getter for the symbol value.
>>>>>+///
>>>>>+/// @return the value of the symbol.
>>>>>+uint64_t
>>>>>+elf_symbol::get_value() const
>>>>>+{return priv_->value_;}
>>>>>+
>>>>>+/// Getter for the symbol section header index.
>>>>>+///
>>>>>+/// @return the section header index of the symbol.
>>>>>+uint32_t
>>>>>+elf_symbol::get_shndx() const
>>>>>+{return priv_->shndx_;}
>>>>>+
>>>>>/// Getter for the name of the @ref elf_symbol.
>>>>>///
>>>>>/// @return a reference to the name of the @ref symbol.
>>>>>diff --git a/src/abg-reader.cc b/src/abg-reader.cc
>>>>>index 810a65f..3c481ad 100644
>>>>>--- a/src/abg-reader.cc
>>>>>+++ b/src/abg-reader.cc
>>>>>@@ -2659,6 +2659,8 @@ build_elf_symbol(read_context& ctxt, const xmlNodePtr node)
>>>>>
>>>>>const environment* env = ctxt.get_environment();
>>>>>elf_symbol_sptr e = elf_symbol::create(env, /*index=*/0, size,
>>>>>+					 /*value=*/0,
>>>>>+					 /*shndx=*/0,
>>>>>					 name, type, binding,
>>>>>					 is_defined, is_common,
>>>>>					 version, visibility);
>>>>>-- 
>>>>>2.16.4
>>>>>
>>>>
>>>>Nice work! Thank you for that!
>>>>The patch looks in general good to me. I am a bit concerned about the
>>>>platform support. There are some corner cases that need to be covered.
>>>>See the testing below.
>>>>
>>>>You might find the .clang-format file useful to match the coding style
>>>>at some points (especially tabs vs spaces). Although when reviewing the
>>>>patch, I had a bit trouble with unchanged but reformatted lines as they
>>>>were confusing my diff. Maybe you can revert the changes that were
>>>>purely for formatting and we take care of formatting in a later patch.
>>>>
>>>>I conducted some testing. I choose a minimal configuration with modules:
>>>>
>>>>$ export ARCH=<arch>
>>>>$ make allnoconfig
>>>>$ ./scripts/config   \
>>>>  -e DEBUG_INFO    \  # dwarf info
>>>>  -e MODULES       \  # module support
>>>>  -e USB_SUPPORT   \
>>>>  -m USB           \
>>>>  -m USB_DUMMY_HCD \  # dummy_hcd.ko is a module without exports
>>>>  -m USB_GADGET       # depends on usb-common.ko, which I test with
>>>>
>>>>$ make olddefconfig
>>>>$ make
>>>>$ abidw ./drivers/usb/common/usb-common.ko | less
>>>># then check for the existence of the elf-function-symbols section
>>>>
>>>>                 master         patched
>>>>Linux v4.18
>>>>
>>>>x86_64             works            works
>>>>x86                works            works
>>>>arm64              works            works
>>>>
>>>>Linux v4.19
>>>>
>>>>x86_64             broken           works
>>>>x86                broken           STILL BROKEN
>>>>arm64              broken           works
>>>>
>>>>So, for x86_64 and arm64 the situation got better and elf symbols are
>>>>now identified. For x86, this patch did not improve the situation. I did
>>>>not check that the list of symbols are entirely correct and that they
>>>>work downstream with abidiff etc. Just the pure existence is a very good
>>>>indicator for this patch to work.
>>>
>>>Hi Matthias! Thanks a bunch for your review! :-)
>>>
>>>I'm aware of the x86 issue and I think the diff below should fix it.
>>>Could you please let me know if it passes your x86 testing? (note, I have
>>>yet to address your other comments, so this is to be continued!)
>>
>>Hi Jessica!
>>
>>I confirm this patch is fixing the x86 testing as lined out above. I
>>also retested the other platforms successfully.
>>
>>Thanks! Great work!
>>
>>Cheers,
>>Matthias
>>
>>>
>>>diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>>>index 835bf3d..ed8a8db 100644
>>>--- a/src/abg-dwarf-reader.cc
>>>+++ b/src/abg-dwarf-reader.cc
>>>@@ -7536,22 +7536,33 @@ public:
>>>  GElf_Shdr *section_shdr = gelf_getshdr(section, &section_mem);
>>>  if (section_shdr->sh_size == 0)
>>>    return PRE_V4_19_KSYMTAB_FORMAT;
>>>-    /* SHT_REL sections not supported yet */
>>>-    if (section_shdr->sh_type == SHT_REL)
>>>-      return UNDEFINED_KSYMTAB_FORMAT;
>>>+
>>>+    bool is_relasec = (section_shdr->sh_type == SHT_RELA);
>>>
>>>  // If we still have a normal non-zeroed relocation section, we can guess
>>>  // what format the ksymtab is in depending on what types of relocs it
>>>  // contains.
>>>-    GElf_Rela rela;
>>>-    Elf_Data *rela_section_data = elf_rawdata(section, 0);
>>>-    gelf_getrela(rela_section_data, 0, &rela);
>>>+
>>>+    int type;
>>>+    Elf_Data *section_data = elf_rawdata(section, 0);
>>>+    if (is_relasec)
>>>+      {
>>>+        GElf_Rela rela;
>>>+        gelf_getrela(section_data, 0, &rela);
>>>+        type = GELF_R_TYPE(rela.r_info);
>>>+      }
>>>+    else
>>>+      {
>>>+        GElf_Rel rel;
>>>+        gelf_getrel(section_data, 0, &rel);
>>>+        type = GELF_R_TYPE(rel.r_info);
>>>+      }
>>
>>Maybe we can deduplicate this a bit. Also further down.
>>Missing out that additional 'a' might be a bug source at a later time.
>
>Yeah, I wasn't sure how to deduplicate this since rels and relas are
>different data types and we have to use gelf_getrel+GElf_Rel and
>gelf_getrela+GElf_Rela on rels and relas respectively..

I believe, we could use templates here and specialize sh_type. Not sure
it will make the code shorter and immediately more readable though. How
about we go with your suggestion for now and see whether we can
deduplicate that in a later patch.

>>>  // Sigh, I dislike the arch-dependent code here, but this seems to be a
>>>  // reliable heuristic for kernel modules for now. Relative ksymtabs only
>>>  // supported on x86 and arm64 as of v4.19.
>>>  ksymtab_format format;
>>>-    switch (GELF_R_TYPE(rela.r_info))
>>>+    switch (type)
>>>    {
>>>    case R_X86_64_64: // Same as R_386_32
>>>    case R_AARCH64_ABS64:
>>>@@ -7831,12 +7842,22 @@ public:
>>>  Elf_Data *reloc_section_data = elf_rawdata(reloc_section, 0);
>>>  size_t strings_ndx = elf_ndxscn(strings_section);
>>>
>>>-    GElf_Rela rela;
>>>+    bool is_relasec = (reloc_section_shdr->sh_type == SHT_RELA);
>>>  elf_symbol_sptr symbol;
>>>  for (unsigned int i = 0; i < reloc_count; i++)
>>>    {
>>>-        gelf_getrela(reloc_section_data, i, &rela);
>>>-        symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rela.r_info));
>>>+        if (is_relasec)
>>>+          {
>>>+            GElf_Rela rela;
>>>+            gelf_getrela(reloc_section_data, i, &rela);
>>>+            symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rela.r_info));
>>>+          }
>>>+        else
>>>+          {
>>>+            GElf_Rel rel;
>>>+            gelf_getrel(reloc_section_data, i, &rel);
>>>+            symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rel.r_info));
>>>+          }
>>>
>>>      // This shouldn't happen..
>>>      if (!symbol)
>>>@@ -7934,8 +7955,6 @@ public:
>>>	|| !nb_entries)
>>>    return false;
>>>
>>>-    GElf_Ehdr eh_mem;
>>>-    GElf_Ehdr* elf_header = gelf_getehdr(elf_handle(), &eh_mem);
>>>  ksymtab_format format = get_ksymtab_format();
>>>
>>>  // Although pre-v4.19 kernel modules can have a relocation section for the
>>>@@ -7947,9 +7966,7 @@ public:
>>>  //
>>>  // Note: For 32-bit binaries, only populate_symbol_map_from_ksymtab() is
>>>  // supported as of now.
>>
>>The comment might need updating.
>
>Thanks for catching that!
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] Support pre and post v4.19 ksymtabs for Linux kernel modules
  2019-01-01  0:00 ` Matthias Maennich via libabigail
  2019-01-01  0:00   ` Jessica Yu
@ 2019-01-01  0:00   ` Jessica Yu
  2019-01-01  0:00     ` Matthias Maennich via libabigail
  2019-01-01  0:00   ` Jessica Yu
  2 siblings, 1 reply; 12+ messages in thread
From: Jessica Yu @ 2019-01-01  0:00 UTC (permalink / raw)
  To: Matthias Maennich; +Cc: libabigail, Dodji Seketeli, kernel-team

+++ Matthias Maennich [19/08/19 09:05 +0100]:
>Hi Jessica!
>
>On Fri, 09 Aug, 17:49, Jessica Yu wrote:
>>As described in commit ad8c2531fb9, the format of the Linux kernel
>>ksymtab changed in v4.19 to use relative references instead of absolute
>>references. This changes the type of relocations emitted for ksymtab
>>sections to be place-relative 32-bit relocations instead of absolute
>>relocations. One side-effect of this is that libdwfl will not relocate
>>the ksymtab sections due to the PC-relative relocations. This breaks
>>load_kernel_symbol_table() for kernel modules because it only reads in
>>zeros from the unrelocated ksymtab section and is subsequently unable to
>>determine what exported symbols it refers to. Since a vmlinux binary is
>>already fully linked and relocated (and therefore we can read its
>>ksymtab section just fine), this problem is only relevant to Linux
>>kernel modules.
>>
>>To work around this, we utilize the ksymtab relocation sections to
>>determine which symbols the ksymtab entries refer to. We do this by
>>inspecting each relocation's r_info field for the symbol table index and
>>from there we are able to read each symbol's value and subsequently add
>>that to the set of exported symbols.
>>
>>In addition, for Linux kernel modules, we can utilize relocation types
>>to implement a new heuristic to determine the ksymtab format we have.
>>The presence of PC-relative relocations suggest the new v4.19 format,
>>and absolute relocation types suggest the old pre v4.19 format.
>>
>>        * include/abg-ir.h (elf_symbol::{elf_symbol, create}): Take new
>>        symbol value and shndx parameters.
>>        (elf_symbol::{get_value, get_shndx}): Declare new accessors.
>>        * src/abg-ir.cc (elf_symbol::priv::{value_, shndx_}): New data members.
>>        (elf_symbol::priv::priv): Adjust.
>>        (elf_symbol::elf_symbol): Take new value and shndx parameters.
>>        (elf_symbol::create): Likewise.
>>        (elf_symbol::{get_value, get_shndx}): Define new accessors.
>>        * src/abg-reader.cc (build_elf_symbol): Adjust.
>>        * src/abg-dwarf-reader.cc (lookup_symbol_from_sysv_hash_tab)
>>        (lookup_symbol_from_gnu_hash_tab)
>>        (lookup_symbol_from_symtab): Adjust.
>>        (read_context::{ksymtab_reloc_section_,
>>        ksymtab_gpl_reloc_section_,
>>        ksymtab_strings_section_}): New data members.
>>        (read_context::read_context): Initialize ksymtab_reloc_section_,
>>        ksymtab_gpl_reloc_section_, ksymtab_strings_section_.
>>        (read_context::{find_ksymtab_reloc_section,
>>        find_ksymtab_gpl_reloc_section, find_ksymtab_strings_section,
>>        find_any_ksymtab_reloc_section, get_ksymtab_format_module,
>>        populate_symbol_map_from_ksymtab,
>>        populate_symbol_map_from_ksymtab_reloc,
>>        is_linux_kernel_module}): New member functions.
>>        (read_context::load_kernel_symbol_table): Adjust to call either
>>        populate_symbol_map_from_ksymtab{_reloc,} depending on ksymtab
>>        format.
>>        (read_context::get_ksymtab_format): Adjust to call
>>        get_ksymtab_format_module for linux kernel modules.
>>        (read_context::lookup_elf_symbol_from_index): Adjust.
>>        (create_default_var_sym, create_default_fn_sym): Adjust.
>>
>>Signed-off-by: Jessica Yu <jeyu@kernel.org>
>>---
>>include/abg-ir.h        |  10 ++
>>src/abg-dwarf-reader.cc | 391 +++++++++++++++++++++++++++++++++++++++++-------
>>src/abg-ir.cc           |  38 ++++-
>>src/abg-reader.cc       |   2 +
>>4 files changed, 388 insertions(+), 53 deletions(-)
>>
>>diff --git a/include/abg-ir.h b/include/abg-ir.h
>>index 170a844..d26f472 100644
>>--- a/include/abg-ir.h
>>+++ b/include/abg-ir.h
>>@@ -796,6 +796,8 @@ private:
>>  elf_symbol(const environment* e,
>>	     size_t		i,
>>	     size_t		s,
>>+	     uint64_t		v,
>>+	     uint32_t		x,
>
>I would like to see some comments here to describe the purpose of the
>struct's members. I know there weren't any before, but maybe that is a
>good chance.
>
>Also, I am not entirely sure why these are all one-character names, but
>it makes it hard to come from x to "section header index". Maybe we
>should give them more meaningful names.
>
>>	     const string&	n,
>>	     type		t,
>>	     binding		b,
>>@@ -818,6 +820,8 @@ public:
>>  create(const environment*	e,
>>	 size_t		i,
>>	 size_t		s,
>>+	 uint64_t	v,
>>+	 uint32_t	x,
>>	 const string&		n,
>>	 type			t,
>>	 binding		b,
>>@@ -838,6 +842,12 @@ public:
>>  void
>>  set_index(size_t);
>>
>>+  uint64_t
>>+  get_value() const;
>>+
>>+  uint32_t
>>+  get_shndx() const;
>>+
>>  const string&
>>  get_name() const;
>>
>>diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>>index 47af7e3..835bf3d 100644
>>--- a/src/abg-dwarf-reader.cc
>>+++ b/src/abg-dwarf-reader.cc
>>@@ -1724,6 +1724,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>  GElf_Sym symbol;
>>  const char* sym_name_str;
>>  size_t sym_size;
>>+  uint64_t sym_value;
>>+  uint32_t sym_shndx;
>>  elf_symbol::type sym_type;
>>  elf_symbol::binding sym_binding;
>>  elf_symbol::visibility sym_visibility;
>>@@ -1743,6 +1745,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>	  sym_visibility =
>>	    stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(symbol.st_other));
>>	  sym_size = symbol.st_size;
>>+	  sym_value = symbol.st_value;
>>+	  sym_shndx = symbol.st_shndx;
>>	  elf_symbol::version ver;
>>	  if (get_version_for_symbol(elf_handle, symbol_index,
>>				     /*get_def_version=*/true, ver))
>>@@ -1751,6 +1755,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>	    elf_symbol::create(env,
>>			       symbol_index,
>>			       sym_size,
>>+			       sym_value,
>>+			       sym_shndx,
>>			       sym_name_str,
>>			       sym_type,
>>			       sym_binding,
>>@@ -2028,7 +2034,8 @@ lookup_symbol_from_gnu_hash_tab(const environment*		env,
>>	    ABG_ASSERT(!ver.str().empty());
>>
>>	  elf_symbol_sptr symbol_found =
>>-	    elf_symbol::create(env, i, symbol.st_size, sym_name_str,
>>+	    elf_symbol::create(env, i, symbol.st_size, symbol.st_value,
>>+			       symbol.st_shndx, sym_name_str,
>>			       sym_type, sym_binding,
>>			       symbol.st_shndx != SHN_UNDEF,
>>			       symbol.st_shndx == SHN_COMMON,
>>@@ -2181,6 +2188,7 @@ lookup_symbol_from_symtab(const environment*		env,
>>	    ABG_ASSERT(!ver.str().empty());
>>	  elf_symbol_sptr symbol_found =
>>	    elf_symbol::create(env, i, sym->st_size,
>>+			       sym->st_value, sym->st_shndx,
>>			       name_str, sym_type,
>>			       sym_binding, sym_is_defined,
>>			       sym_is_common, ver, sym_visibility);
>>@@ -3081,7 +3089,10 @@ public:
>>  /// to symbols exported using the EXPORT_SYMBOL_GPL macro from the
>>  /// linux kernel.
>>  Elf_Scn*			ksymtab_section_;
>>+  Elf_Scn*			ksymtab_reloc_section_;
>>  Elf_Scn*			ksymtab_gpl_section_;
>>+  Elf_Scn*			ksymtab_gpl_reloc_section_;
>>+  Elf_Scn*			ksymtab_strings_section_;
>>  Elf_Scn*			versym_section_;
>>  Elf_Scn*			verdef_section_;
>>  Elf_Scn*			verneed_section_;
>>@@ -3277,7 +3288,10 @@ public:
>>    nb_ksymtab_entries_ = 0;
>>    nb_ksymtab_gpl_entries_ = 0;
>>    ksymtab_section_ = 0;
>>+    ksymtab_reloc_section_ = 0;
>>    ksymtab_gpl_section_ = 0;
>>+    ksymtab_gpl_reloc_section_ = 0;
>>+    ksymtab_strings_section_ = 0;
>>    versym_section_ = 0;
>>    verdef_section_ = 0;
>>    verneed_section_ = 0;
>>@@ -6079,6 +6093,23 @@ public:
>>    return ksymtab_section_;
>>  }
>>
>>+  /// Return the .rel{a,}__ksymtab section of a linux kernel ELF file (either
>>+  /// a vmlinux binary or a kernel module).
>>+  ///
>>+  /// @return the .rel{a,}__ksymtab section if found, nil otherwise.
>>+  Elf_Scn*
>>+  find_ksymtab_reloc_section() const
>>+  {
>>+    if (!ksymtab_reloc_section_)
>>+      {
>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab", SHT_RELA);
>>+        if (!sec)
>>+          sec = find_section(elf_handle(), ".rel__ksymtab", SHT_REL);
>>+        const_cast<read_context*>(this)->ksymtab_reloc_section_ = sec;
>>+      }
>>+    return ksymtab_reloc_section_;
>>+  }
>>+
>>  /// Return the __ksymtab_gpl section of a linux kernel ELF file
>>  /// (either a vmlinux binary or a kernel module).
>>  ///
>>@@ -6092,6 +6123,36 @@ public:
>>    return ksymtab_gpl_section_;
>>  }
>>
>>+  /// Return the .rel{a,}__ksymtab_gpl section of a linux kernel ELF file
>>+  /// (either a vmlinux binary or a kernel module).
>>+  ///
>>+  /// @return the .rel{a,}__ksymtab_gpl section if found, nil otherwise.
>>+  Elf_Scn*
>>+  find_ksymtab_gpl_reloc_section() const
>>+  {
>>+    if (!ksymtab_gpl_reloc_section_)
>>+      {
>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab_gpl", SHT_RELA);
>>+        if (!sec)
>>+          sec = find_section(elf_handle(), ".rel__ksymtab_gpl", SHT_REL);
>>+        const_cast<read_context*>(this)->ksymtab_gpl_reloc_section_ = sec;
>>+      }
>>+    return ksymtab_gpl_reloc_section_;
>>+  }
>>+
>>+  /// Return the __ksymtab_strings section of a linux kernel ELF file
>>+  /// (either a vmlinux binary or a kernel module).
>>+  ///
>>+  /// @return the __ksymtab_strings section if found, nil otherwise.
>>+  Elf_Scn*
>>+  find_ksymtab_strings_section() const
>>+  {
>>+    if (!ksymtab_strings_section_)
>>+      const_cast<read_context*>(this)->ksymtab_strings_section_ =
>>+        find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);
>>+    return ksymtab_strings_section_;
>>+  }
>>+
>
>We should find a way to deduplicate these functions. Not necessarily in
>this patch, but in a followup. Most of them only differ in the section
>name and whether they check for an alternative. Something generic like
>
> struct SectionQuery{ string name, int type };
> Elf_Scn* find_section(const vector<SectionQuery>&);
>
>should be sufficient to forward from all the other variants to one
>implementation.
>
>>  /// Return either a __ksymtab or a __ksymtab_gpl section, in case
>>  /// only the __ksymtab_gpl exists.
>>  ///
>>@@ -6106,6 +6167,19 @@ public:
>>    return result;
>>  }
>>
>>+  /// Return either a .rel{a,}__ksymtab or a .rel{a,}__ksymtab_gpl section
>>+  ///
>>+  /// @return the .rel{a,}__ksymtab section if it exists, or the
>>+  /// .rel{a,}__ksymtab_gpl; or NULL if neither is found.
>>+  Elf_Scn*
>>+  find_any_ksymtab_reloc_section() const
>>+  {
>>+    Elf_Scn *result = find_ksymtab_reloc_section();
>>+    if (!result)
>>+      result = find_ksymtab_gpl_reloc_section();
>>+    return result;
>>+  }
>>+
>>  /// Return the SHT_GNU_versym, SHT_GNU_verdef and SHT_GNU_verneed
>>  /// sections that are involved in symbol versionning.
>>  ///
>>@@ -6283,7 +6357,8 @@ public:
>>      stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(s->st_other));
>>
>>    elf_symbol_sptr sym =
>>-      elf_symbol::create(env(), symbol_index, s->st_size, name_str,
>>+      elf_symbol::create(env(), symbol_index, s->st_size,
>>+			 s->st_value, s->st_shndx, name_str,
>>			 stt_to_elf_symbol_type(GELF_ST_TYPE(s->st_info)),
>>			 stb_to_elf_symbol_binding(GELF_ST_BIND(s->st_info)),
>>			 sym_is_defined, sym_is_common, ver, vis);
>>@@ -7435,6 +7510,64 @@ public:
>>    return symbol;
>>  }
>>
>>+  /// Determine the format of the __ksymtab and __ksymtab_gpl
>>+  /// sections of Linux kernel modules.
>>+  ///
>>+  /// This is important because we need the know the format of these
>
>s/need the/need to/                       ^^^
>
>>+  /// sections to be able to read from them.
>>+  ///
>>+  /// @return the format the __ksymtab[_gpl] sections.
>>+  enum ksymtab_format
>>+  get_ksymtab_format_module() const
>>+  {
>>+    Elf_Scn *section = find_any_ksymtab_reloc_section();
>>+
>>+    if (!section)
>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>+
>>+    // Libdwfl has a weird quirk where, in the process of obtaining an Elf
>>+    // descriptor via dwfl_module_getelf(), it will apply all relocations it
>>+    // knows how to and it will zero a relocation after applying it. If the
>
>will zero the relocation info after
>
>>+    // .rela__ksymtab* section contained only simple (absolute) relocations,
>>+    // they will have been all applied and sh_size will be 0. For arches that
>>+    // support relative ksymtabs, simple relocations only appear in pre-4.19
>>+    // kernel modules.
>>+    GElf_Shdr section_mem;
>>+    GElf_Shdr *section_shdr = gelf_getshdr(section, &section_mem);
>>+    if (section_shdr->sh_size == 0)
>>+      return PRE_V4_19_KSYMTAB_FORMAT;
>>+    /* SHT_REL sections not supported yet */
>>+    if (section_shdr->sh_type == SHT_REL)
>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>+
>>+    // If we still have a normal non-zeroed relocation section, we can guess
>>+    // what format the ksymtab is in depending on what types of relocs it
>>+    // contains.
>>+    GElf_Rela rela;
>>+    Elf_Data *rela_section_data = elf_rawdata(section, 0);
>>+    gelf_getrela(rela_section_data, 0, &rela);
>>+
>>+    // Sigh, I dislike the arch-dependent code here, but this seems to be a
>>+    // reliable heuristic for kernel modules for now. Relative ksymtabs only
>>+    // supported on x86 and arm64 as of v4.19.
>>+    ksymtab_format format;
>>+    switch (GELF_R_TYPE(rela.r_info))
>>+      {
>>+      case R_X86_64_64: // Same as R_386_32
>
>add 'fallthrough' to the comments, else younger compilers will warn
>about the potentially unintentional fallthrough.
>
>>+      case R_AARCH64_ABS64:
>>+        format = PRE_V4_19_KSYMTAB_FORMAT;
>>+	break;
>>+      case R_X86_64_PC32: // Same as R_386_PC32
>>+      case R_AARCH64_PREL32:
>>+	format = V4_19_KSYMTAB_FORMAT;
>>+	break;
>>+      default:
>>+        format = UNDEFINED_KSYMTAB_FORMAT;
>
>We should probably emit a warning in this case. Are there more
>architectures that have PREL32 enabled as of now that are not yet
>covered here?
>
>>+	break;
>>+      }
>>+    return format;
>>+  }
>>+
>>  /// Determine the format of the __ksymtab and __ksymtab_gpl
>>  /// sections.
>>  ///
>>@@ -7451,6 +7584,18 @@ public:
>>      {
>>	if (ksymtab_format_ == UNDEFINED_KSYMTAB_FORMAT)
>>	  {
>>+            // Since Linux kernel modules are relocatable, we can first try
>>+            // using a heuristic based on relocations to guess the ksymtab format.
>>+            if (is_linux_kernel_module())
>>+              {
>>+                ksymtab_format_ = get_ksymtab_format_module();
>>+                if (ksymtab_format_ != UNDEFINED_KSYMTAB_FORMAT)
>>+                  return ksymtab_format_;
>>+              }
>
>nit: use tabs
>
>>+
>>+	    // If it's not a kernel module or we couldn't determine its format
>>+	    // with relocations, fall back to the heuristic below.
>>+
>>	    // OK this is a dirty little heuristic to determine the
>>	    // format of the ksymtab section.
>>	    //
>>@@ -7553,48 +7698,24 @@ public:
>>    return nb_ksymtab_gpl_entries_;
>>  }
>>
>>-  /// Load a given kernel symbol table.
>>+  /// Populate the symbol map by reading exported symbols from the
>>+  /// ksymtab directly.
>>  ///
>>-  /// One can thus retrieve the resulting symbols by using the
>>-  /// accessors read_context::linux_exported_fn_syms(),
>>-  /// read_context::linux_exported_var_syms(),
>>-  /// read_context::linux_exported_gpl_fn_syms(), or
>>-  /// read_context::linux_exported_gpl_var_syms().
>>+  /// @param section the ksymtab section to read from
>>  ///
>>-  /// @param kind the kind of kernel symbol table to load.
>>+  /// @param exported_fns_set the set of exported functions
>>+  ///
>>+  /// @param exported_vars_set the set of exported variables
>>+  ///
>>+  /// @param nb_entries the number of ksymtab entries to read
>>  ///
>>  /// @return true upon successful completion, false otherwise.
>>  bool
>>-  load_kernel_symbol_table(kernel_symbol_table_kind kind)
>>+  populate_symbol_map_from_ksymtab(Elf_Scn *section,
>>+                                   address_set_sptr exported_fns_set,
>>+                                   address_set_sptr exported_vars_set,
>>+                                   size_t nb_entries)
>>  {
>>-    size_t nb_entries = 0;
>>-    Elf_Scn *section = 0;
>>-    address_set_sptr linux_exported_fns_set, linux_exported_vars_set;
>>-
>>-    switch (kind)
>>-      {
>>-      case KERNEL_SYMBOL_TABLE_KIND_UNDEFINED:
>>-	break;
>>-      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB:
>>-	section = find_ksymtab_section();
>>-	nb_entries = get_nb_ksymtab_entries();
>>-	linux_exported_fns_set = create_or_get_linux_exported_fn_syms();
>>-	linux_exported_vars_set = create_or_get_linux_exported_var_syms();
>>-	break;
>>-      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB_GPL:
>>-	section = find_ksymtab_gpl_section();
>>-	nb_entries = get_nb_ksymtab_gpl_entries();
>>-	linux_exported_fns_set = create_or_get_linux_exported_gpl_fn_syms();
>>-	linux_exported_vars_set = create_or_get_linux_exported_gpl_var_syms();
>>-	break;
>>-      }
>>-
>>-    if (!linux_exported_vars_set
>>-	|| !linux_exported_fns_set
>>-	|| !section
>>-	|| !nb_entries)
>>-      return false;
>>-
>>    // The data of the section.
>>    Elf_Data *elf_data = elf_rawdata(section, 0);
>>
>>@@ -7654,28 +7775,30 @@ public:
>>	// (aka ELF symbol table).
>>	symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>	if (!symbol)
>>-	  {
>>-	    adjusted_symbol_address =
>>-	      maybe_adjust_var_sym_address(symbol_address);
>>-	    symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>-	    if (!symbol)
>>-	      // This must be a symbol that is of type neither FUNC
>>-	      // (function) nor OBJECT (variable).  There are for intance,
>>-	      // symbols of type 'NOTYPE' in the ksymtab symbol table.  I
>>-	      // am not sure what those are.
>>-	      continue;
>>-	  }
>>+          {
>>+            adjusted_symbol_address =
>>+              maybe_adjust_var_sym_address(symbol_address);
>>+            symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>+            if (!symbol)
>>+              {
>>+	        // This must be a symbol that is of type neither FUNC
>>+	        // (function) nor OBJECT (variable).  There are for intance,
>>+	        // symbols of type 'NOTYPE' in the ksymtab symbol table.  I
>>+	        // am not sure what those are.
>>+	        continue;
>>+	      }
>>+          }
>
>This part of the diff seems to a change purely of the indentation/tabs.
>Maybe you can revert this part.
>
>>
>>	address_set_sptr set;
>>	if (symbol->is_function())
>>	  {
>>	    ABG_ASSERT(lookup_elf_fn_symbol_from_address(adjusted_symbol_address));
>>-	    set = linux_exported_fns_set;
>>+	    set = exported_fns_set;
>>	  }
>>	else if (symbol->is_variable())
>>	  {
>>	    ABG_ASSERT(lookup_elf_var_symbol_from_address(adjusted_symbol_address));
>>-	    set = linux_exported_vars_set;
>>+	    set = exported_vars_set;
>>	  }
>>	else
>>	  ABG_ASSERT_NOT_REACHED;
>>@@ -7684,6 +7807,157 @@ public:
>>    return true;
>>  }
>>
>>+  /// Populate the symbol map by extracting the exported symbols from a
>>+  /// ksymtab rela section.
>>+  ///
>>+  /// @param section the ksymtab section to read from
>>+  ///
>>+  /// @param strings_section the __ksymtab_strings section
>>+  ///
>>+  /// @param exported_fns_set the set of exported functions
>>+  ///
>>+  /// @param exported_vars_set the set of exported variables
>>+  ///
>>+  /// @return true upon successful completion, false otherwise.
>>+  bool
>>+  populate_symbol_map_from_ksymtab_reloc(Elf_Scn *reloc_section, Elf_Scn *strings_section,
>>+                                         address_set_sptr exported_fns_set,
>>+                                         address_set_sptr exported_vars_set)
>>+  {
>>+    GElf_Shdr reloc_section_mem;
>>+    GElf_Shdr *reloc_section_shdr = gelf_getshdr(reloc_section, &reloc_section_mem);
>
>gelf_getshdr returns &reloc_section_mem. So you might want to use
>reloc_section_mem directly rather than introducing reloc_section_mem.
>
>>+    size_t reloc_count = reloc_section_shdr->sh_size / reloc_section_shdr->sh_entsize;
>>+
>>+    Elf_Data *reloc_section_data = elf_rawdata(reloc_section, 0);
>>+    size_t strings_ndx = elf_ndxscn(strings_section);
>>+
>>+    GElf_Rela rela;
>>+    elf_symbol_sptr symbol;
>>+    for (unsigned int i = 0; i < reloc_count; i++)
>>+      {
>>+        gelf_getrela(reloc_section_data, i, &rela);
>>+        symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rela.r_info));
>>+
>>+        // This shouldn't happen..
>>+        if (!symbol)
>
>Maybe ABG_ASSERT then?
>
>>+          return false;
>>+
>>+        // If the symbol is from __ksymtab_strings, ignore it.
>>+        if (symbol->get_shndx() == strings_ndx)
>>+          continue;
>>+
>>+	if (!symbol->is_function() && !symbol->is_variable())
>>+          {
>>+            if (do_log())
>>+              {
>>+                if (symbol->get_type() == elf_symbol::NOTYPE_TYPE)
>>+                  cerr << "skipping NOTYPE symbol "
>>+                       << symbol->get_name()
>>+                       << " shndx: "
>>+                       << symbol->get_index()
>>+                       << " @"
>>+                       << elf_path()
>>+                       << "\n";
>>+                else if (symbol->get_type() == elf_symbol::SECTION_TYPE)
>>+                  cerr << "skipping SECTION symbol "
>>+                       << "shndx: "
>>+                       << symbol->get_index()
>>+                       << " @"
>>+                       << elf_path()
>>+                       << "\n";
>>+               }
>>+            continue;
>>+	  }
>>+
>>+	address_set_sptr set;
>>+	if (symbol->is_function())
>>+	  {
>>+	    ABG_ASSERT(lookup_elf_fn_symbol_from_address(symbol->get_value()));
>>+	    set = exported_fns_set;
>>+	  }
>>+	else if (symbol->is_variable())
>>+	  {
>>+	    ABG_ASSERT(lookup_elf_var_symbol_from_address(symbol->get_value()));
>>+	    set = exported_vars_set;
>>+	  }
>>+	else
>>+	  ABG_ASSERT_NOT_REACHED;
>>+	set->insert(symbol->get_value());
>>+      }
>>+    return true;
>>+  }
>>+
>>+  /// Load a given kernel symbol table.
>>+  ///
>>+  /// One can thus retrieve the resulting symbols by using the
>>+  /// accessors read_context::linux_exported_fn_syms(),
>>+  /// read_context::linux_exported_var_syms(),
>>+  /// read_context::linux_exported_gpl_fn_syms(), or
>>+  /// read_context::linux_exported_gpl_var_syms().
>>+  ///
>>+  /// @param kind the kind of kernel symbol table to load.
>>+  ///
>>+  /// @return true upon successful completion, false otherwise.
>>+  bool
>>+  load_kernel_symbol_table(kernel_symbol_table_kind kind)
>>+  {
>>+    size_t nb_entries = 0;
>>+    Elf_Scn *section = 0, *reloc_section = 0, *strings_section = 0;
>>+    address_set_sptr linux_exported_fns_set, linux_exported_vars_set;
>>+
>>+    switch (kind)
>>+      {
>>+      case KERNEL_SYMBOL_TABLE_KIND_UNDEFINED:
>>+	break;
>>+      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB:
>>+	section = find_ksymtab_section();
>>+	reloc_section = find_ksymtab_reloc_section();
>>+	nb_entries = get_nb_ksymtab_entries();
>>+	linux_exported_fns_set = create_or_get_linux_exported_fn_syms();
>>+	linux_exported_vars_set = create_or_get_linux_exported_var_syms();
>>+	break;
>>+      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB_GPL:
>>+	section = find_ksymtab_gpl_section();
>>+	reloc_section = find_ksymtab_gpl_reloc_section();
>>+	nb_entries = get_nb_ksymtab_gpl_entries();
>>+	linux_exported_fns_set = create_or_get_linux_exported_gpl_fn_syms();
>>+	linux_exported_vars_set = create_or_get_linux_exported_gpl_var_syms();
>>+	break;
>>+      }
>>+
>>+    strings_section = find_ksymtab_strings_section();
>>+
>>+    if (!linux_exported_vars_set
>>+	|| !linux_exported_fns_set
>>+	|| !section
>>+	|| !strings_section
>>+	|| !nb_entries)
>>+      return false;
>>+
>>+    GElf_Ehdr eh_mem;
>>+    GElf_Ehdr* elf_header = gelf_getehdr(elf_handle(), &eh_mem);
>>+    ksymtab_format format = get_ksymtab_format();
>>+
>>+    // Although pre-v4.19 kernel modules can have a relocation section for the
>>+    // __ksymtab section, libdwfl zeroes the rela section after applying
>>+    // "simple" absolute relocations via dwfl_module_getelf(). For v4.19 and
>>+    // above, we get PC-relative relocations so dwfl_module_getelf() doesn't
>>+    // apply those relocations and we're safe to read the relocation section to
>>+    // determine which exported symbols are in the ksymtab.
>>+    //
>>+    // Note: For 32-bit binaries, only populate_symbol_map_from_ksymtab() is
>>+    // supported as of now.
>>+    if (!reloc_section
>>+	|| format == PRE_V4_19_KSYMTAB_FORMAT
>>+	|| elf_header->e_ident[EI_CLASS] == ELFCLASS32)
>>+      return populate_symbol_map_from_ksymtab(section, linux_exported_fns_set,
>>+                                              linux_exported_vars_set, nb_entries);
>>+    else
>>+      return populate_symbol_map_from_ksymtab_reloc(reloc_section, strings_section,
>>+                                                    linux_exported_fns_set,
>>+                                                    linux_exported_vars_set);
>>+  }
>>+
>>  /// Load the special __ksymtab section. This is for linux kernel
>>  /// (module) files.
>>  ///
>>@@ -8352,6 +8626,17 @@ public:
>>  is_linux_kernel_binary() const
>>  {return find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);}
>>
>>+  /// Guess if the current binary is a Linux Kernel module.
>>+  ///
>>+  /// To guess that, the function looks for the presence of the
>>+  /// special ".modinfo" section in the binary.
>>+  ///
>>+  bool
>>+  is_linux_kernel_module() const
>>+  {
>>+    return find_section(elf_handle(), ".modinfo", SHT_PROGBITS);
>>+  }
>>+
>
>Also refer to this proposal to restrict checking for a module a bit
>more: https://sourceware.org/ml/libabigail/2019-q3/msg00022.html
>
>>  /// Getter of the "show_stats" flag.
>>  ///
>>  /// This flag tells if we should emit statistics about various
>>@@ -15734,6 +16019,8 @@ create_default_var_sym(const string& sym_name, const environment *env)
>>    elf_symbol::create(env,
>>		       /*symbol index=*/ 0,
>>		       /*symbol size=*/ 0,
>>+		       /*symbol value=*/ 0,
>>+		       /*symbol shndx=*/ 0,
>>		       sym_name,
>>		       /*symbol type=*/ elf_symbol::OBJECT_TYPE,
>>		       /*symbol binding=*/ elf_symbol::GLOBAL_BINDING,
>>@@ -16173,6 +16460,8 @@ create_default_fn_sym(const string& sym_name, const environment *env)
>>    elf_symbol::create(env,
>>		       /*symbol index=*/ 0,
>>		       /*symbol size=*/ 0,
>>+		       /*symbol value=*/ 0,
>>+		       /*symbol shndx=*/ 0,
>>		       sym_name,
>>		       /*symbol type=*/ elf_symbol::FUNC_TYPE,
>>		       /*symbol binding=*/ elf_symbol::GLOBAL_BINDING,
>>diff --git a/src/abg-ir.cc b/src/abg-ir.cc
>>index a5243c3..ad1c261 100644
>>--- a/src/abg-ir.cc
>>+++ b/src/abg-ir.cc
>>@@ -1169,6 +1169,8 @@ struct elf_symbol::priv
>>  const environment*	env_;
>>  size_t		index_;
>>  size_t		size_;
>>+  uint64_t		value_;
>>+  uint32_t		shndx_;
>>  string		name_;
>>  elf_symbol::type	type_;
>>  elf_symbol::binding	binding_;
>>@@ -1213,6 +1215,8 @@ struct elf_symbol::priv
>>    : env_(),
>>      index_(),
>>      size_(),
>>+      value_(),
>>+      shndx_(),
>>      type_(elf_symbol::NOTYPE_TYPE),
>>      binding_(elf_symbol::GLOBAL_BINDING),
>>      visibility_(elf_symbol::DEFAULT_VISIBILITY),
>>@@ -1223,6 +1227,8 @@ struct elf_symbol::priv
>>  priv(const environment*		e,
>>       size_t				i,
>>       size_t				s,
>>+       uint64_t				v,
>>+       uint32_t				x,
>>       const string&			n,
>>       elf_symbol::type		t,
>>       elf_symbol::binding		b,
>>@@ -1233,6 +1239,8 @@ struct elf_symbol::priv
>>    : env_(e),
>>      index_(i),
>>      size_(s),
>>+      value_(v),
>>+      shndx_(x),
>>      name_(n),
>>      type_(t),
>>      binding_(b),
>>@@ -1269,6 +1277,10 @@ elf_symbol::elf_symbol()
>>///
>>/// @param s the size of the symbol.
>>///
>>+/// @param v the value of the symbol.
>>+///
>>+/// @param x the section header index (shndx) of the symbol.
>>+///
>>/// @param n the name of the symbol.
>>///
>>/// @param t the type of the symbol.
>>@@ -1285,6 +1297,8 @@ elf_symbol::elf_symbol()
>>elf_symbol::elf_symbol(const environment*	e,
>>		       size_t			i,
>>		       size_t			s,
>>+		       uint64_t			v,
>>+		       uint32_t			x,
>>		       const string&		n,
>>		       type			t,
>>		       binding			b,
>>@@ -1292,7 +1306,7 @@ elf_symbol::elf_symbol(const environment*	e,
>>		       bool			c,
>>		       const version&		ve,
>>		       visibility		vi)
>>-  : priv_(new priv(e, i, s, n, t, b, d, c, ve, vi))
>>+  : priv_(new priv(e, i, s, v, x, n, t, b, d, c, ve, vi))
>>{}
>>
>>/// Factory of instances of @ref elf_symbol.
>>@@ -1319,6 +1333,10 @@ elf_symbol::create()
>>///
>>/// @param s the size of the symbol.
>>///
>>+/// @param v the value of the symbol.
>>+///
>>+/// @param x the section header index (shndx) of the symbol.
>>+///
>>/// @param n the name of the symbol.
>>///
>>/// @param t the type of the symbol.
>>@@ -1339,6 +1357,8 @@ elf_symbol_sptr
>>elf_symbol::create(const environment*	e,
>>		   size_t		i,
>>		   size_t		s,
>>+		   uint64_t		v,
>>+		   uint32_t		x,
>>		   const string&	n,
>>		   type		t,
>>		   binding		b,
>>@@ -1347,7 +1367,7 @@ elf_symbol::create(const environment*	e,
>>		   const version&	ve,
>>		   visibility		vi)
>>{
>>-  elf_symbol_sptr sym(new elf_symbol(e, i, s, n, t, b, d, c, ve, vi));
>>+  elf_symbol_sptr sym(new elf_symbol(e, i, s, v, x, n, t, b, d, c, ve, vi));
>>  sym->priv_->main_symbol_ = sym;
>>  return sym;
>>}
>>@@ -1411,6 +1431,20 @@ void
>>elf_symbol::set_index(size_t s)
>>{priv_->index_ = s;}
>>
>>+/// Getter for the symbol value.
>>+///
>>+/// @return the value of the symbol.
>>+uint64_t
>>+elf_symbol::get_value() const
>>+{return priv_->value_;}
>>+
>>+/// Getter for the symbol section header index.
>>+///
>>+/// @return the section header index of the symbol.
>>+uint32_t
>>+elf_symbol::get_shndx() const
>>+{return priv_->shndx_;}
>>+
>>/// Getter for the name of the @ref elf_symbol.
>>///
>>/// @return a reference to the name of the @ref symbol.
>>diff --git a/src/abg-reader.cc b/src/abg-reader.cc
>>index 810a65f..3c481ad 100644
>>--- a/src/abg-reader.cc
>>+++ b/src/abg-reader.cc
>>@@ -2659,6 +2659,8 @@ build_elf_symbol(read_context& ctxt, const xmlNodePtr node)
>>
>>  const environment* env = ctxt.get_environment();
>>  elf_symbol_sptr e = elf_symbol::create(env, /*index=*/0, size,
>>+					 /*value=*/0,
>>+					 /*shndx=*/0,
>>					 name, type, binding,
>>					 is_defined, is_common,
>>					 version, visibility);
>>-- 
>>2.16.4
>>
>
>Nice work! Thank you for that!
>The patch looks in general good to me. I am a bit concerned about the
>platform support. There are some corner cases that need to be covered.
>See the testing below.
>
>You might find the .clang-format file useful to match the coding style
>at some points (especially tabs vs spaces). Although when reviewing the
>patch, I had a bit trouble with unchanged but reformatted lines as they
>were confusing my diff. Maybe you can revert the changes that were
>purely for formatting and we take care of formatting in a later patch.
>
>I conducted some testing. I choose a minimal configuration with modules:
>
> $ export ARCH=<arch>
> $ make allnoconfig
> $ ./scripts/config   \
>     -e DEBUG_INFO    \  # dwarf info
>     -e MODULES       \  # module support
>     -e USB_SUPPORT   \
>     -m USB           \
>     -m USB_DUMMY_HCD \  # dummy_hcd.ko is a module without exports
>     -m USB_GADGET       # depends on usb-common.ko, which I test with
>
> $ make olddefconfig
> $ make
> $ abidw ./drivers/usb/common/usb-common.ko | less
> # then check for the existence of the elf-function-symbols section
>
>                    master         patched
>Linux v4.18
>
> x86_64             works            works
> x86                works            works
> arm64              works            works
>
>Linux v4.19
>
> x86_64             broken           works
> x86                broken           STILL BROKEN
> arm64              broken           works
>
>So, for x86_64 and arm64 the situation got better and elf symbols are
>now identified. For x86, this patch did not improve the situation. I did
>not check that the list of symbols are entirely correct and that they
>work downstream with abidiff etc. Just the pure existence is a very good
>indicator for this patch to work.

Hi Matthias! Thanks a bunch for your review! :-)

I'm aware of the x86 issue and I think the diff below should fix it.
Could you please let me know if it passes your x86 testing? (note, I have
yet to address your other comments, so this is to be continued!)

diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
index 835bf3d..ed8a8db 100644
--- a/src/abg-dwarf-reader.cc
+++ b/src/abg-dwarf-reader.cc
@@ -7536,22 +7536,33 @@ public:
     GElf_Shdr *section_shdr = gelf_getshdr(section, &section_mem);
     if (section_shdr->sh_size == 0)
       return PRE_V4_19_KSYMTAB_FORMAT;
-    /* SHT_REL sections not supported yet */
-    if (section_shdr->sh_type == SHT_REL)
-      return UNDEFINED_KSYMTAB_FORMAT;
+
+    bool is_relasec = (section_shdr->sh_type == SHT_RELA);
 
     // If we still have a normal non-zeroed relocation section, we can guess
     // what format the ksymtab is in depending on what types of relocs it
     // contains.
-    GElf_Rela rela;
-    Elf_Data *rela_section_data = elf_rawdata(section, 0);
-    gelf_getrela(rela_section_data, 0, &rela);
+
+    int type;
+    Elf_Data *section_data = elf_rawdata(section, 0);
+    if (is_relasec)
+      {
+        GElf_Rela rela;
+        gelf_getrela(section_data, 0, &rela);
+        type = GELF_R_TYPE(rela.r_info);
+      }
+    else
+      {
+        GElf_Rel rel;
+        gelf_getrel(section_data, 0, &rel);
+        type = GELF_R_TYPE(rel.r_info);
+      }
 
     // Sigh, I dislike the arch-dependent code here, but this seems to be a
     // reliable heuristic for kernel modules for now. Relative ksymtabs only
     // supported on x86 and arm64 as of v4.19.
     ksymtab_format format;
-    switch (GELF_R_TYPE(rela.r_info))
+    switch (type)
       {
       case R_X86_64_64: // Same as R_386_32
       case R_AARCH64_ABS64:
@@ -7831,12 +7842,22 @@ public:
     Elf_Data *reloc_section_data = elf_rawdata(reloc_section, 0);
     size_t strings_ndx = elf_ndxscn(strings_section);
 
-    GElf_Rela rela;
+    bool is_relasec = (reloc_section_shdr->sh_type == SHT_RELA);
     elf_symbol_sptr symbol;
     for (unsigned int i = 0; i < reloc_count; i++)
       {
-        gelf_getrela(reloc_section_data, i, &rela);
-        symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rela.r_info));
+        if (is_relasec)
+          {
+            GElf_Rela rela;
+            gelf_getrela(reloc_section_data, i, &rela);
+            symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rela.r_info));
+          }
+        else
+          {
+            GElf_Rel rel;
+            gelf_getrel(reloc_section_data, i, &rel);
+            symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rel.r_info));
+          }
 
         // This shouldn't happen..
         if (!symbol)
@@ -7934,8 +7955,6 @@ public:
 	|| !nb_entries)
       return false;
 
-    GElf_Ehdr eh_mem;
-    GElf_Ehdr* elf_header = gelf_getehdr(elf_handle(), &eh_mem);
     ksymtab_format format = get_ksymtab_format();
 
     // Although pre-v4.19 kernel modules can have a relocation section for the
@@ -7947,9 +7966,7 @@ public:
     //
     // Note: For 32-bit binaries, only populate_symbol_map_from_ksymtab() is
     // supported as of now.
-    if (!reloc_section
-	|| format == PRE_V4_19_KSYMTAB_FORMAT
-	|| elf_header->e_ident[EI_CLASS] == ELFCLASS32)
+    if (!reloc_section || format == PRE_V4_19_KSYMTAB_FORMAT)
       return populate_symbol_map_from_ksymtab(section, linux_exported_fns_set,
                                               linux_exported_vars_set, nb_entries);
     else

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] Support pre and post v4.19 ksymtabs for Linux kernel modules
  2019-01-01  0:00   ` Jessica Yu
@ 2019-01-01  0:00     ` Matthias Maennich via libabigail
  2019-01-01  0:00       ` Jessica Yu
  0 siblings, 1 reply; 12+ messages in thread
From: Matthias Maennich via libabigail @ 2019-01-01  0:00 UTC (permalink / raw)
  To: Jessica Yu; +Cc: libabigail, Dodji Seketeli, Android Kernel Team

On Tue, Sep 03, 2019 at 04:23:37PM +0200, Jessica Yu wrote:
>+++ Matthias Maennich [19/08/19 09:05 +0100]:
>>Hi Jessica!
>>
>>On Fri, 09 Aug, 17:49, Jessica Yu wrote:
>>>As described in commit ad8c2531fb9, the format of the Linux kernel
>>>ksymtab changed in v4.19 to use relative references instead of absolute
>>>references. This changes the type of relocations emitted for ksymtab
>>>sections to be place-relative 32-bit relocations instead of absolute
>>>relocations. One side-effect of this is that libdwfl will not relocate
>>>the ksymtab sections due to the PC-relative relocations. This breaks
>>>load_kernel_symbol_table() for kernel modules because it only reads in
>>>zeros from the unrelocated ksymtab section and is subsequently unable to
>>>determine what exported symbols it refers to. Since a vmlinux binary is
>>>already fully linked and relocated (and therefore we can read its
>>>ksymtab section just fine), this problem is only relevant to Linux
>>>kernel modules.
>>>
>>>To work around this, we utilize the ksymtab relocation sections to
>>>determine which symbols the ksymtab entries refer to. We do this by
>>>inspecting each relocation's r_info field for the symbol table index and
>>>from there we are able to read each symbol's value and subsequently add
>>>that to the set of exported symbols.
>>>
>>>In addition, for Linux kernel modules, we can utilize relocation types
>>>to implement a new heuristic to determine the ksymtab format we have.
>>>The presence of PC-relative relocations suggest the new v4.19 format,
>>>and absolute relocation types suggest the old pre v4.19 format.
>>>
>>>       * include/abg-ir.h (elf_symbol::{elf_symbol, create}): Take new
>>>       symbol value and shndx parameters.
>>>       (elf_symbol::{get_value, get_shndx}): Declare new accessors.
>>>       * src/abg-ir.cc (elf_symbol::priv::{value_, shndx_}): New data members.
>>>       (elf_symbol::priv::priv): Adjust.
>>>       (elf_symbol::elf_symbol): Take new value and shndx parameters.
>>>       (elf_symbol::create): Likewise.
>>>       (elf_symbol::{get_value, get_shndx}): Define new accessors.
>>>       * src/abg-reader.cc (build_elf_symbol): Adjust.
>>>       * src/abg-dwarf-reader.cc (lookup_symbol_from_sysv_hash_tab)
>>>       (lookup_symbol_from_gnu_hash_tab)
>>>       (lookup_symbol_from_symtab): Adjust.
>>>       (read_context::{ksymtab_reloc_section_,
>>>       ksymtab_gpl_reloc_section_,
>>>       ksymtab_strings_section_}): New data members.
>>>       (read_context::read_context): Initialize ksymtab_reloc_section_,
>>>       ksymtab_gpl_reloc_section_, ksymtab_strings_section_.
>>>       (read_context::{find_ksymtab_reloc_section,
>>>       find_ksymtab_gpl_reloc_section, find_ksymtab_strings_section,
>>>       find_any_ksymtab_reloc_section, get_ksymtab_format_module,
>>>       populate_symbol_map_from_ksymtab,
>>>       populate_symbol_map_from_ksymtab_reloc,
>>>       is_linux_kernel_module}): New member functions.
>>>       (read_context::load_kernel_symbol_table): Adjust to call either
>>>       populate_symbol_map_from_ksymtab{_reloc,} depending on ksymtab
>>>       format.
>>>       (read_context::get_ksymtab_format): Adjust to call
>>>       get_ksymtab_format_module for linux kernel modules.
>>>       (read_context::lookup_elf_symbol_from_index): Adjust.
>>>       (create_default_var_sym, create_default_fn_sym): Adjust.
>>>
>>>Signed-off-by: Jessica Yu <jeyu@kernel.org>
>>>---
>>>include/abg-ir.h        |  10 ++
>>>src/abg-dwarf-reader.cc | 391 +++++++++++++++++++++++++++++++++++++++++-------
>>>src/abg-ir.cc           |  38 ++++-
>>>src/abg-reader.cc       |   2 +
>>>4 files changed, 388 insertions(+), 53 deletions(-)
>>>
>>>diff --git a/include/abg-ir.h b/include/abg-ir.h
>>>index 170a844..d26f472 100644
>>>--- a/include/abg-ir.h
>>>+++ b/include/abg-ir.h
>>>@@ -796,6 +796,8 @@ private:
>>> elf_symbol(const environment* e,
>>>	     size_t		i,
>>>	     size_t		s,
>>>+	     uint64_t		v,
>>>+	     uint32_t		x,
>>
>>I would like to see some comments here to describe the purpose of the
>>struct's members. I know there weren't any before, but maybe that is a
>>good chance.
>>
>>Also, I am not entirely sure why these are all one-character names, but
>>it makes it hard to come from x to "section header index". Maybe we
>>should give them more meaningful names.
>>
>>>	     const string&	n,
>>>	     type		t,
>>>	     binding		b,
>>>@@ -818,6 +820,8 @@ public:
>>> create(const environment*	e,
>>>	 size_t		i,
>>>	 size_t		s,
>>>+	 uint64_t	v,
>>>+	 uint32_t	x,
>>>	 const string&		n,
>>>	 type			t,
>>>	 binding		b,
>>>@@ -838,6 +842,12 @@ public:
>>> void
>>> set_index(size_t);
>>>
>>>+  uint64_t
>>>+  get_value() const;
>>>+
>>>+  uint32_t
>>>+  get_shndx() const;
>>>+
>>> const string&
>>> get_name() const;
>>>
>>>diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>>>index 47af7e3..835bf3d 100644
>>>--- a/src/abg-dwarf-reader.cc
>>>+++ b/src/abg-dwarf-reader.cc
>>>@@ -1724,6 +1724,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>> GElf_Sym symbol;
>>> const char* sym_name_str;
>>> size_t sym_size;
>>>+  uint64_t sym_value;
>>>+  uint32_t sym_shndx;
>>> elf_symbol::type sym_type;
>>> elf_symbol::binding sym_binding;
>>> elf_symbol::visibility sym_visibility;
>>>@@ -1743,6 +1745,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>	  sym_visibility =
>>>	    stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(symbol.st_other));
>>>	  sym_size = symbol.st_size;
>>>+	  sym_value = symbol.st_value;
>>>+	  sym_shndx = symbol.st_shndx;
>>>	  elf_symbol::version ver;
>>>	  if (get_version_for_symbol(elf_handle, symbol_index,
>>>				     /*get_def_version=*/true, ver))
>>>@@ -1751,6 +1755,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>	    elf_symbol::create(env,
>>>			       symbol_index,
>>>			       sym_size,
>>>+			       sym_value,
>>>+			       sym_shndx,
>>>			       sym_name_str,
>>>			       sym_type,
>>>			       sym_binding,
>>>@@ -2028,7 +2034,8 @@ lookup_symbol_from_gnu_hash_tab(const environment*		env,
>>>	    ABG_ASSERT(!ver.str().empty());
>>>
>>>	  elf_symbol_sptr symbol_found =
>>>-	    elf_symbol::create(env, i, symbol.st_size, sym_name_str,
>>>+	    elf_symbol::create(env, i, symbol.st_size, symbol.st_value,
>>>+			       symbol.st_shndx, sym_name_str,
>>>			       sym_type, sym_binding,
>>>			       symbol.st_shndx != SHN_UNDEF,
>>>			       symbol.st_shndx == SHN_COMMON,
>>>@@ -2181,6 +2188,7 @@ lookup_symbol_from_symtab(const environment*		env,
>>>	    ABG_ASSERT(!ver.str().empty());
>>>	  elf_symbol_sptr symbol_found =
>>>	    elf_symbol::create(env, i, sym->st_size,
>>>+			       sym->st_value, sym->st_shndx,
>>>			       name_str, sym_type,
>>>			       sym_binding, sym_is_defined,
>>>			       sym_is_common, ver, sym_visibility);
>>>@@ -3081,7 +3089,10 @@ public:
>>> /// to symbols exported using the EXPORT_SYMBOL_GPL macro from the
>>> /// linux kernel.
>>> Elf_Scn*			ksymtab_section_;
>>>+  Elf_Scn*			ksymtab_reloc_section_;
>>> Elf_Scn*			ksymtab_gpl_section_;
>>>+  Elf_Scn*			ksymtab_gpl_reloc_section_;
>>>+  Elf_Scn*			ksymtab_strings_section_;
>>> Elf_Scn*			versym_section_;
>>> Elf_Scn*			verdef_section_;
>>> Elf_Scn*			verneed_section_;
>>>@@ -3277,7 +3288,10 @@ public:
>>>   nb_ksymtab_entries_ = 0;
>>>   nb_ksymtab_gpl_entries_ = 0;
>>>   ksymtab_section_ = 0;
>>>+    ksymtab_reloc_section_ = 0;
>>>   ksymtab_gpl_section_ = 0;
>>>+    ksymtab_gpl_reloc_section_ = 0;
>>>+    ksymtab_strings_section_ = 0;
>>>   versym_section_ = 0;
>>>   verdef_section_ = 0;
>>>   verneed_section_ = 0;
>>>@@ -6079,6 +6093,23 @@ public:
>>>   return ksymtab_section_;
>>> }
>>>
>>>+  /// Return the .rel{a,}__ksymtab section of a linux kernel ELF file (either
>>>+  /// a vmlinux binary or a kernel module).
>>>+  ///
>>>+  /// @return the .rel{a,}__ksymtab section if found, nil otherwise.
>>>+  Elf_Scn*
>>>+  find_ksymtab_reloc_section() const
>>>+  {
>>>+    if (!ksymtab_reloc_section_)
>>>+      {
>>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab", SHT_RELA);
>>>+        if (!sec)
>>>+          sec = find_section(elf_handle(), ".rel__ksymtab", SHT_REL);
>>>+        const_cast<read_context*>(this)->ksymtab_reloc_section_ = sec;
>>>+      }
>>>+    return ksymtab_reloc_section_;
>>>+  }
>>>+
>>> /// Return the __ksymtab_gpl section of a linux kernel ELF file
>>> /// (either a vmlinux binary or a kernel module).
>>> ///
>>>@@ -6092,6 +6123,36 @@ public:
>>>   return ksymtab_gpl_section_;
>>> }
>>>
>>>+  /// Return the .rel{a,}__ksymtab_gpl section of a linux kernel ELF file
>>>+  /// (either a vmlinux binary or a kernel module).
>>>+  ///
>>>+  /// @return the .rel{a,}__ksymtab_gpl section if found, nil otherwise.
>>>+  Elf_Scn*
>>>+  find_ksymtab_gpl_reloc_section() const
>>>+  {
>>>+    if (!ksymtab_gpl_reloc_section_)
>>>+      {
>>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab_gpl", SHT_RELA);
>>>+        if (!sec)
>>>+          sec = find_section(elf_handle(), ".rel__ksymtab_gpl", SHT_REL);
>>>+        const_cast<read_context*>(this)->ksymtab_gpl_reloc_section_ = sec;
>>>+      }
>>>+    return ksymtab_gpl_reloc_section_;
>>>+  }
>>>+
>>>+  /// Return the __ksymtab_strings section of a linux kernel ELF file
>>>+  /// (either a vmlinux binary or a kernel module).
>>>+  ///
>>>+  /// @return the __ksymtab_strings section if found, nil otherwise.
>>>+  Elf_Scn*
>>>+  find_ksymtab_strings_section() const
>>>+  {
>>>+    if (!ksymtab_strings_section_)
>>>+      const_cast<read_context*>(this)->ksymtab_strings_section_ =
>>>+        find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);
>>>+    return ksymtab_strings_section_;
>>>+  }
>>>+
>>
>>We should find a way to deduplicate these functions. Not necessarily in
>>this patch, but in a followup. Most of them only differ in the section
>>name and whether they check for an alternative. Something generic like
>>
>>struct SectionQuery{ string name, int type };
>>Elf_Scn* find_section(const vector<SectionQuery>&);
>>
>>should be sufficient to forward from all the other variants to one
>>implementation.
>>
>>> /// Return either a __ksymtab or a __ksymtab_gpl section, in case
>>> /// only the __ksymtab_gpl exists.
>>> ///
>>>@@ -6106,6 +6167,19 @@ public:
>>>   return result;
>>> }
>>>
>>>+  /// Return either a .rel{a,}__ksymtab or a .rel{a,}__ksymtab_gpl section
>>>+  ///
>>>+  /// @return the .rel{a,}__ksymtab section if it exists, or the
>>>+  /// .rel{a,}__ksymtab_gpl; or NULL if neither is found.
>>>+  Elf_Scn*
>>>+  find_any_ksymtab_reloc_section() const
>>>+  {
>>>+    Elf_Scn *result = find_ksymtab_reloc_section();
>>>+    if (!result)
>>>+      result = find_ksymtab_gpl_reloc_section();
>>>+    return result;
>>>+  }
>>>+
>>> /// Return the SHT_GNU_versym, SHT_GNU_verdef and SHT_GNU_verneed
>>> /// sections that are involved in symbol versionning.
>>> ///
>>>@@ -6283,7 +6357,8 @@ public:
>>>     stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(s->st_other));
>>>
>>>   elf_symbol_sptr sym =
>>>-      elf_symbol::create(env(), symbol_index, s->st_size, name_str,
>>>+      elf_symbol::create(env(), symbol_index, s->st_size,
>>>+			 s->st_value, s->st_shndx, name_str,
>>>			 stt_to_elf_symbol_type(GELF_ST_TYPE(s->st_info)),
>>>			 stb_to_elf_symbol_binding(GELF_ST_BIND(s->st_info)),
>>>			 sym_is_defined, sym_is_common, ver, vis);
>>>@@ -7435,6 +7510,64 @@ public:
>>>   return symbol;
>>> }
>>>
>>>+  /// Determine the format of the __ksymtab and __ksymtab_gpl
>>>+  /// sections of Linux kernel modules.
>>>+  ///
>>>+  /// This is important because we need the know the format of these
>>
>>s/need the/need to/                       ^^^
>>
>>>+  /// sections to be able to read from them.
>>>+  ///
>>>+  /// @return the format the __ksymtab[_gpl] sections.
>>>+  enum ksymtab_format
>>>+  get_ksymtab_format_module() const
>>>+  {
>>>+    Elf_Scn *section = find_any_ksymtab_reloc_section();
>>>+
>>>+    if (!section)
>>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>>+
>>>+    // Libdwfl has a weird quirk where, in the process of obtaining an Elf
>>>+    // descriptor via dwfl_module_getelf(), it will apply all relocations it
>>>+    // knows how to and it will zero a relocation after applying it. If the
>>
>>will zero the relocation info after
>>
>>>+    // .rela__ksymtab* section contained only simple (absolute) relocations,
>>>+    // they will have been all applied and sh_size will be 0. For arches that
>>>+    // support relative ksymtabs, simple relocations only appear in pre-4.19
>>>+    // kernel modules.
>>>+    GElf_Shdr section_mem;
>>>+    GElf_Shdr *section_shdr = gelf_getshdr(section, &section_mem);
>>>+    if (section_shdr->sh_size == 0)
>>>+      return PRE_V4_19_KSYMTAB_FORMAT;
>>>+    /* SHT_REL sections not supported yet */
>>>+    if (section_shdr->sh_type == SHT_REL)
>>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>>+
>>>+    // If we still have a normal non-zeroed relocation section, we can guess
>>>+    // what format the ksymtab is in depending on what types of relocs it
>>>+    // contains.
>>>+    GElf_Rela rela;
>>>+    Elf_Data *rela_section_data = elf_rawdata(section, 0);
>>>+    gelf_getrela(rela_section_data, 0, &rela);
>>>+
>>>+    // Sigh, I dislike the arch-dependent code here, but this seems to be a
>>>+    // reliable heuristic for kernel modules for now. Relative ksymtabs only
>>>+    // supported on x86 and arm64 as of v4.19.
>>>+    ksymtab_format format;
>>>+    switch (GELF_R_TYPE(rela.r_info))
>>>+      {
>>>+      case R_X86_64_64: // Same as R_386_32
>>
>>add 'fallthrough' to the comments, else younger compilers will warn
>>about the potentially unintentional fallthrough.
>>
>>>+      case R_AARCH64_ABS64:
>>>+        format = PRE_V4_19_KSYMTAB_FORMAT;
>>>+	break;
>>>+      case R_X86_64_PC32: // Same as R_386_PC32
>>>+      case R_AARCH64_PREL32:
>>>+	format = V4_19_KSYMTAB_FORMAT;
>>>+	break;
>>>+      default:
>>>+        format = UNDEFINED_KSYMTAB_FORMAT;
>>
>>We should probably emit a warning in this case. Are there more
>>architectures that have PREL32 enabled as of now that are not yet
>>covered here?
>
>On second thought, I don't think we should actually warn *here* at
>this point, inside get_ksymtab_format_module(), because then we would
>have to include/check for reloc types for all arches that libabigail
>supports.
>
>For instance if we have a ppc64 module, we would hit the default case
>and warn immediately, but instead we should be falling back to
>try_reading_first_ksymtab_entry_using_pre_v4_19_format(), which
>gets called when get_ksymtab_format_module() returns
>UNDEFINED_KSYMTAB_FORMAT. We should warn when we have already
>exhausted all methods of determining the ksymtab format, which is what
>the current code does. The current sequence of calls is (1) if it's a
>module, see if we recognize any of the special PREL32 reloc types that
>we know of. if not, try method (2), call try_reading_first_ksymtab_entry_using_pre_v4_19_format(),
>then try method (3), try_reading_first_ksymtab_entry_using_v4_19_format(),
>then finally do ABG_ASSERT when all fails. I hope I'm making sense..
>Does that sound reasonable?

Hi Jessica,

I think we are talking about the same thing. All I want to prevent is a
misleading fallback. So, yes, we should try all known methods and should
only warn if we are not sure whether the result we came up is actually
correct. Which again leads also to the question: Do you think it is
reasonable to have something in the kernel binaries to declare the
ksymtab format? In a separate section or a magic symbol? Something to
make it easier (and safer) for tools like libabigail?

I will have a look at your v2 of this now.

Cheers,
Matthias

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] Support pre and post v4.19 ksymtabs for Linux kernel modules
  2019-01-01  0:00   ` Jessica Yu
@ 2019-01-01  0:00     ` Matthias Maennich via libabigail
  2019-01-01  0:00       ` Jessica Yu
  0 siblings, 1 reply; 12+ messages in thread
From: Matthias Maennich via libabigail @ 2019-01-01  0:00 UTC (permalink / raw)
  To: Jessica Yu; +Cc: libabigail, Dodji Seketeli, kernel-team

On Wed, Aug 21, 2019 at 02:34:37PM +0200, Jessica Yu wrote:
>+++ Matthias Maennich [19/08/19 09:05 +0100]:
>>Hi Jessica!
>>
>>On Fri, 09 Aug, 17:49, Jessica Yu wrote:
>>>As described in commit ad8c2531fb9, the format of the Linux kernel
>>>ksymtab changed in v4.19 to use relative references instead of absolute
>>>references. This changes the type of relocations emitted for ksymtab
>>>sections to be place-relative 32-bit relocations instead of absolute
>>>relocations. One side-effect of this is that libdwfl will not relocate
>>>the ksymtab sections due to the PC-relative relocations. This breaks
>>>load_kernel_symbol_table() for kernel modules because it only reads in
>>>zeros from the unrelocated ksymtab section and is subsequently unable to
>>>determine what exported symbols it refers to. Since a vmlinux binary is
>>>already fully linked and relocated (and therefore we can read its
>>>ksymtab section just fine), this problem is only relevant to Linux
>>>kernel modules.
>>>
>>>To work around this, we utilize the ksymtab relocation sections to
>>>determine which symbols the ksymtab entries refer to. We do this by
>>>inspecting each relocation's r_info field for the symbol table index and
>>>from there we are able to read each symbol's value and subsequently add
>>>that to the set of exported symbols.
>>>
>>>In addition, for Linux kernel modules, we can utilize relocation types
>>>to implement a new heuristic to determine the ksymtab format we have.
>>>The presence of PC-relative relocations suggest the new v4.19 format,
>>>and absolute relocation types suggest the old pre v4.19 format.
>>>
>>>       * include/abg-ir.h (elf_symbol::{elf_symbol, create}): Take new
>>>       symbol value and shndx parameters.
>>>       (elf_symbol::{get_value, get_shndx}): Declare new accessors.
>>>       * src/abg-ir.cc (elf_symbol::priv::{value_, shndx_}): New data members.
>>>       (elf_symbol::priv::priv): Adjust.
>>>       (elf_symbol::elf_symbol): Take new value and shndx parameters.
>>>       (elf_symbol::create): Likewise.
>>>       (elf_symbol::{get_value, get_shndx}): Define new accessors.
>>>       * src/abg-reader.cc (build_elf_symbol): Adjust.
>>>       * src/abg-dwarf-reader.cc (lookup_symbol_from_sysv_hash_tab)
>>>       (lookup_symbol_from_gnu_hash_tab)
>>>       (lookup_symbol_from_symtab): Adjust.
>>>       (read_context::{ksymtab_reloc_section_,
>>>       ksymtab_gpl_reloc_section_,
>>>       ksymtab_strings_section_}): New data members.
>>>       (read_context::read_context): Initialize ksymtab_reloc_section_,
>>>       ksymtab_gpl_reloc_section_, ksymtab_strings_section_.
>>>       (read_context::{find_ksymtab_reloc_section,
>>>       find_ksymtab_gpl_reloc_section, find_ksymtab_strings_section,
>>>       find_any_ksymtab_reloc_section, get_ksymtab_format_module,
>>>       populate_symbol_map_from_ksymtab,
>>>       populate_symbol_map_from_ksymtab_reloc,
>>>       is_linux_kernel_module}): New member functions.
>>>       (read_context::load_kernel_symbol_table): Adjust to call either
>>>       populate_symbol_map_from_ksymtab{_reloc,} depending on ksymtab
>>>       format.
>>>       (read_context::get_ksymtab_format): Adjust to call
>>>       get_ksymtab_format_module for linux kernel modules.
>>>       (read_context::lookup_elf_symbol_from_index): Adjust.
>>>       (create_default_var_sym, create_default_fn_sym): Adjust.
>>>
>>>Signed-off-by: Jessica Yu <jeyu@kernel.org>
>>>---
>>>include/abg-ir.h        |  10 ++
>>>src/abg-dwarf-reader.cc | 391 +++++++++++++++++++++++++++++++++++++++++-------
>>>src/abg-ir.cc           |  38 ++++-
>>>src/abg-reader.cc       |   2 +
>>>4 files changed, 388 insertions(+), 53 deletions(-)
>>>
>>>diff --git a/include/abg-ir.h b/include/abg-ir.h
>>>index 170a844..d26f472 100644
>>>--- a/include/abg-ir.h
>>>+++ b/include/abg-ir.h
>>>@@ -796,6 +796,8 @@ private:
>>> elf_symbol(const environment* e,
>>>	     size_t		i,
>>>	     size_t		s,
>>>+	     uint64_t		v,
>>>+	     uint32_t		x,
>>
>>I would like to see some comments here to describe the purpose of the
>>struct's members. I know there weren't any before, but maybe that is a
>>good chance.
>>
>>Also, I am not entirely sure why these are all one-character names, but
>>it makes it hard to come from x to "section header index". Maybe we
>>should give them more meaningful names.
>>
>>>	     const string&	n,
>>>	     type		t,
>>>	     binding		b,
>>>@@ -818,6 +820,8 @@ public:
>>> create(const environment*	e,
>>>	 size_t		i,
>>>	 size_t		s,
>>>+	 uint64_t	v,
>>>+	 uint32_t	x,
>>>	 const string&		n,
>>>	 type			t,
>>>	 binding		b,
>>>@@ -838,6 +842,12 @@ public:
>>> void
>>> set_index(size_t);
>>>
>>>+  uint64_t
>>>+  get_value() const;
>>>+
>>>+  uint32_t
>>>+  get_shndx() const;
>>>+
>>> const string&
>>> get_name() const;
>>>
>>>diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>>>index 47af7e3..835bf3d 100644
>>>--- a/src/abg-dwarf-reader.cc
>>>+++ b/src/abg-dwarf-reader.cc
>>>@@ -1724,6 +1724,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>> GElf_Sym symbol;
>>> const char* sym_name_str;
>>> size_t sym_size;
>>>+  uint64_t sym_value;
>>>+  uint32_t sym_shndx;
>>> elf_symbol::type sym_type;
>>> elf_symbol::binding sym_binding;
>>> elf_symbol::visibility sym_visibility;
>>>@@ -1743,6 +1745,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>	  sym_visibility =
>>>	    stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(symbol.st_other));
>>>	  sym_size = symbol.st_size;
>>>+	  sym_value = symbol.st_value;
>>>+	  sym_shndx = symbol.st_shndx;
>>>	  elf_symbol::version ver;
>>>	  if (get_version_for_symbol(elf_handle, symbol_index,
>>>				     /*get_def_version=*/true, ver))
>>>@@ -1751,6 +1755,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>	    elf_symbol::create(env,
>>>			       symbol_index,
>>>			       sym_size,
>>>+			       sym_value,
>>>+			       sym_shndx,
>>>			       sym_name_str,
>>>			       sym_type,
>>>			       sym_binding,
>>>@@ -2028,7 +2034,8 @@ lookup_symbol_from_gnu_hash_tab(const environment*		env,
>>>	    ABG_ASSERT(!ver.str().empty());
>>>
>>>	  elf_symbol_sptr symbol_found =
>>>-	    elf_symbol::create(env, i, symbol.st_size, sym_name_str,
>>>+	    elf_symbol::create(env, i, symbol.st_size, symbol.st_value,
>>>+			       symbol.st_shndx, sym_name_str,
>>>			       sym_type, sym_binding,
>>>			       symbol.st_shndx != SHN_UNDEF,
>>>			       symbol.st_shndx == SHN_COMMON,
>>>@@ -2181,6 +2188,7 @@ lookup_symbol_from_symtab(const environment*		env,
>>>	    ABG_ASSERT(!ver.str().empty());
>>>	  elf_symbol_sptr symbol_found =
>>>	    elf_symbol::create(env, i, sym->st_size,
>>>+			       sym->st_value, sym->st_shndx,
>>>			       name_str, sym_type,
>>>			       sym_binding, sym_is_defined,
>>>			       sym_is_common, ver, sym_visibility);
>>>@@ -3081,7 +3089,10 @@ public:
>>> /// to symbols exported using the EXPORT_SYMBOL_GPL macro from the
>>> /// linux kernel.
>>> Elf_Scn*			ksymtab_section_;
>>>+  Elf_Scn*			ksymtab_reloc_section_;
>>> Elf_Scn*			ksymtab_gpl_section_;
>>>+  Elf_Scn*			ksymtab_gpl_reloc_section_;
>>>+  Elf_Scn*			ksymtab_strings_section_;
>>> Elf_Scn*			versym_section_;
>>> Elf_Scn*			verdef_section_;
>>> Elf_Scn*			verneed_section_;
>>>@@ -3277,7 +3288,10 @@ public:
>>>   nb_ksymtab_entries_ = 0;
>>>   nb_ksymtab_gpl_entries_ = 0;
>>>   ksymtab_section_ = 0;
>>>+    ksymtab_reloc_section_ = 0;
>>>   ksymtab_gpl_section_ = 0;
>>>+    ksymtab_gpl_reloc_section_ = 0;
>>>+    ksymtab_strings_section_ = 0;
>>>   versym_section_ = 0;
>>>   verdef_section_ = 0;
>>>   verneed_section_ = 0;
>>>@@ -6079,6 +6093,23 @@ public:
>>>   return ksymtab_section_;
>>> }
>>>
>>>+  /// Return the .rel{a,}__ksymtab section of a linux kernel ELF file (either
>>>+  /// a vmlinux binary or a kernel module).
>>>+  ///
>>>+  /// @return the .rel{a,}__ksymtab section if found, nil otherwise.
>>>+  Elf_Scn*
>>>+  find_ksymtab_reloc_section() const
>>>+  {
>>>+    if (!ksymtab_reloc_section_)
>>>+      {
>>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab", SHT_RELA);
>>>+        if (!sec)
>>>+          sec = find_section(elf_handle(), ".rel__ksymtab", SHT_REL);
>>>+        const_cast<read_context*>(this)->ksymtab_reloc_section_ = sec;
>>>+      }
>>>+    return ksymtab_reloc_section_;
>>>+  }
>>>+
>>> /// Return the __ksymtab_gpl section of a linux kernel ELF file
>>> /// (either a vmlinux binary or a kernel module).
>>> ///
>>>@@ -6092,6 +6123,36 @@ public:
>>>   return ksymtab_gpl_section_;
>>> }
>>>
>>>+  /// Return the .rel{a,}__ksymtab_gpl section of a linux kernel ELF file
>>>+  /// (either a vmlinux binary or a kernel module).
>>>+  ///
>>>+  /// @return the .rel{a,}__ksymtab_gpl section if found, nil otherwise.
>>>+  Elf_Scn*
>>>+  find_ksymtab_gpl_reloc_section() const
>>>+  {
>>>+    if (!ksymtab_gpl_reloc_section_)
>>>+      {
>>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab_gpl", SHT_RELA);
>>>+        if (!sec)
>>>+          sec = find_section(elf_handle(), ".rel__ksymtab_gpl", SHT_REL);
>>>+        const_cast<read_context*>(this)->ksymtab_gpl_reloc_section_ = sec;
>>>+      }
>>>+    return ksymtab_gpl_reloc_section_;
>>>+  }
>>>+
>>>+  /// Return the __ksymtab_strings section of a linux kernel ELF file
>>>+  /// (either a vmlinux binary or a kernel module).
>>>+  ///
>>>+  /// @return the __ksymtab_strings section if found, nil otherwise.
>>>+  Elf_Scn*
>>>+  find_ksymtab_strings_section() const
>>>+  {
>>>+    if (!ksymtab_strings_section_)
>>>+      const_cast<read_context*>(this)->ksymtab_strings_section_ =
>>>+        find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);
>>>+    return ksymtab_strings_section_;
>>>+  }
>>>+
>>
>>We should find a way to deduplicate these functions. Not necessarily in
>>this patch, but in a followup. Most of them only differ in the section
>>name and whether they check for an alternative. Something generic like
>>
>>struct SectionQuery{ string name, int type };
>>Elf_Scn* find_section(const vector<SectionQuery>&);
>>
>>should be sufficient to forward from all the other variants to one
>>implementation.
>>
>>> /// Return either a __ksymtab or a __ksymtab_gpl section, in case
>>> /// only the __ksymtab_gpl exists.
>>> ///
>>>@@ -6106,6 +6167,19 @@ public:
>>>   return result;
>>> }
>>>
>>>+  /// Return either a .rel{a,}__ksymtab or a .rel{a,}__ksymtab_gpl section
>>>+  ///
>>>+  /// @return the .rel{a,}__ksymtab section if it exists, or the
>>>+  /// .rel{a,}__ksymtab_gpl; or NULL if neither is found.
>>>+  Elf_Scn*
>>>+  find_any_ksymtab_reloc_section() const
>>>+  {
>>>+    Elf_Scn *result = find_ksymtab_reloc_section();
>>>+    if (!result)
>>>+      result = find_ksymtab_gpl_reloc_section();
>>>+    return result;
>>>+  }
>>>+
>>> /// Return the SHT_GNU_versym, SHT_GNU_verdef and SHT_GNU_verneed
>>> /// sections that are involved in symbol versionning.
>>> ///
>>>@@ -6283,7 +6357,8 @@ public:
>>>     stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(s->st_other));
>>>
>>>   elf_symbol_sptr sym =
>>>-      elf_symbol::create(env(), symbol_index, s->st_size, name_str,
>>>+      elf_symbol::create(env(), symbol_index, s->st_size,
>>>+			 s->st_value, s->st_shndx, name_str,
>>>			 stt_to_elf_symbol_type(GELF_ST_TYPE(s->st_info)),
>>>			 stb_to_elf_symbol_binding(GELF_ST_BIND(s->st_info)),
>>>			 sym_is_defined, sym_is_common, ver, vis);
>>>@@ -7435,6 +7510,64 @@ public:
>>>   return symbol;
>>> }
>>>
>>>+  /// Determine the format of the __ksymtab and __ksymtab_gpl
>>>+  /// sections of Linux kernel modules.
>>>+  ///
>>>+  /// This is important because we need the know the format of these
>>
>>s/need the/need to/                       ^^^
>>
>>>+  /// sections to be able to read from them.
>>>+  ///
>>>+  /// @return the format the __ksymtab[_gpl] sections.
>>>+  enum ksymtab_format
>>>+  get_ksymtab_format_module() const
>>>+  {
>>>+    Elf_Scn *section = find_any_ksymtab_reloc_section();
>>>+
>>>+    if (!section)
>>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>>+
>>>+    // Libdwfl has a weird quirk where, in the process of obtaining an Elf
>>>+    // descriptor via dwfl_module_getelf(), it will apply all relocations it
>>>+    // knows how to and it will zero a relocation after applying it. If the
>>
>>will zero the relocation info after
>>
>>>+    // .rela__ksymtab* section contained only simple (absolute) relocations,
>>>+    // they will have been all applied and sh_size will be 0. For arches that
>>>+    // support relative ksymtabs, simple relocations only appear in pre-4.19
>>>+    // kernel modules.
>>>+    GElf_Shdr section_mem;
>>>+    GElf_Shdr *section_shdr = gelf_getshdr(section, &section_mem);
>>>+    if (section_shdr->sh_size == 0)
>>>+      return PRE_V4_19_KSYMTAB_FORMAT;
>>>+    /* SHT_REL sections not supported yet */
>>>+    if (section_shdr->sh_type == SHT_REL)
>>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>>+
>>>+    // If we still have a normal non-zeroed relocation section, we can guess
>>>+    // what format the ksymtab is in depending on what types of relocs it
>>>+    // contains.
>>>+    GElf_Rela rela;
>>>+    Elf_Data *rela_section_data = elf_rawdata(section, 0);
>>>+    gelf_getrela(rela_section_data, 0, &rela);
>>>+
>>>+    // Sigh, I dislike the arch-dependent code here, but this seems to be a
>>>+    // reliable heuristic for kernel modules for now. Relative ksymtabs only
>>>+    // supported on x86 and arm64 as of v4.19.
>>>+    ksymtab_format format;
>>>+    switch (GELF_R_TYPE(rela.r_info))
>>>+      {
>>>+      case R_X86_64_64: // Same as R_386_32
>>
>>add 'fallthrough' to the comments, else younger compilers will warn
>>about the potentially unintentional fallthrough.
>>
>>>+      case R_AARCH64_ABS64:
>>>+        format = PRE_V4_19_KSYMTAB_FORMAT;
>>>+	break;
>>>+      case R_X86_64_PC32: // Same as R_386_PC32
>>>+      case R_AARCH64_PREL32:
>>>+	format = V4_19_KSYMTAB_FORMAT;
>>>+	break;
>>>+      default:
>>>+        format = UNDEFINED_KSYMTAB_FORMAT;
>>
>>We should probably emit a warning in this case. Are there more
>>architectures that have PREL32 enabled as of now that are not yet
>>covered here?
>>
>>>+	break;
>>>+      }
>>>+    return format;
>>>+  }
>>>+
>>> /// Determine the format of the __ksymtab and __ksymtab_gpl
>>> /// sections.
>>> ///
>>>@@ -7451,6 +7584,18 @@ public:
>>>     {
>>>	if (ksymtab_format_ == UNDEFINED_KSYMTAB_FORMAT)
>>>	  {
>>>+            // Since Linux kernel modules are relocatable, we can first try
>>>+            // using a heuristic based on relocations to guess the ksymtab format.
>>>+            if (is_linux_kernel_module())
>>>+              {
>>>+                ksymtab_format_ = get_ksymtab_format_module();
>>>+                if (ksymtab_format_ != UNDEFINED_KSYMTAB_FORMAT)
>>>+                  return ksymtab_format_;
>>>+              }
>>
>>nit: use tabs
>>
>>>+
>>>+	    // If it's not a kernel module or we couldn't determine its format
>>>+	    // with relocations, fall back to the heuristic below.
>>>+
>>>	    // OK this is a dirty little heuristic to determine the
>>>	    // format of the ksymtab section.
>>>	    //
>>>@@ -7553,48 +7698,24 @@ public:
>>>   return nb_ksymtab_gpl_entries_;
>>> }
>>>
>>>-  /// Load a given kernel symbol table.
>>>+  /// Populate the symbol map by reading exported symbols from the
>>>+  /// ksymtab directly.
>>> ///
>>>-  /// One can thus retrieve the resulting symbols by using the
>>>-  /// accessors read_context::linux_exported_fn_syms(),
>>>-  /// read_context::linux_exported_var_syms(),
>>>-  /// read_context::linux_exported_gpl_fn_syms(), or
>>>-  /// read_context::linux_exported_gpl_var_syms().
>>>+  /// @param section the ksymtab section to read from
>>> ///
>>>-  /// @param kind the kind of kernel symbol table to load.
>>>+  /// @param exported_fns_set the set of exported functions
>>>+  ///
>>>+  /// @param exported_vars_set the set of exported variables
>>>+  ///
>>>+  /// @param nb_entries the number of ksymtab entries to read
>>> ///
>>> /// @return true upon successful completion, false otherwise.
>>> bool
>>>-  load_kernel_symbol_table(kernel_symbol_table_kind kind)
>>>+  populate_symbol_map_from_ksymtab(Elf_Scn *section,
>>>+                                   address_set_sptr exported_fns_set,
>>>+                                   address_set_sptr exported_vars_set,
>>>+                                   size_t nb_entries)
>>> {
>>>-    size_t nb_entries = 0;
>>>-    Elf_Scn *section = 0;
>>>-    address_set_sptr linux_exported_fns_set, linux_exported_vars_set;
>>>-
>>>-    switch (kind)
>>>-      {
>>>-      case KERNEL_SYMBOL_TABLE_KIND_UNDEFINED:
>>>-	break;
>>>-      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB:
>>>-	section = find_ksymtab_section();
>>>-	nb_entries = get_nb_ksymtab_entries();
>>>-	linux_exported_fns_set = create_or_get_linux_exported_fn_syms();
>>>-	linux_exported_vars_set = create_or_get_linux_exported_var_syms();
>>>-	break;
>>>-      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB_GPL:
>>>-	section = find_ksymtab_gpl_section();
>>>-	nb_entries = get_nb_ksymtab_gpl_entries();
>>>-	linux_exported_fns_set = create_or_get_linux_exported_gpl_fn_syms();
>>>-	linux_exported_vars_set = create_or_get_linux_exported_gpl_var_syms();
>>>-	break;
>>>-      }
>>>-
>>>-    if (!linux_exported_vars_set
>>>-	|| !linux_exported_fns_set
>>>-	|| !section
>>>-	|| !nb_entries)
>>>-      return false;
>>>-
>>>   // The data of the section.
>>>   Elf_Data *elf_data = elf_rawdata(section, 0);
>>>
>>>@@ -7654,28 +7775,30 @@ public:
>>>	// (aka ELF symbol table).
>>>	symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>>	if (!symbol)
>>>-	  {
>>>-	    adjusted_symbol_address =
>>>-	      maybe_adjust_var_sym_address(symbol_address);
>>>-	    symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>>-	    if (!symbol)
>>>-	      // This must be a symbol that is of type neither FUNC
>>>-	      // (function) nor OBJECT (variable).  There are for intance,
>>>-	      // symbols of type 'NOTYPE' in the ksymtab symbol table.  I
>>>-	      // am not sure what those are.
>>>-	      continue;
>>>-	  }
>>>+          {
>>>+            adjusted_symbol_address =
>>>+              maybe_adjust_var_sym_address(symbol_address);
>>>+            symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>>+            if (!symbol)
>>>+              {
>>>+	        // This must be a symbol that is of type neither FUNC
>>>+	        // (function) nor OBJECT (variable).  There are for intance,
>>>+	        // symbols of type 'NOTYPE' in the ksymtab symbol table.  I
>>>+	        // am not sure what those are.
>>>+	        continue;
>>>+	      }
>>>+          }
>>
>>This part of the diff seems to a change purely of the indentation/tabs.
>>Maybe you can revert this part.
>>
>>>
>>>	address_set_sptr set;
>>>	if (symbol->is_function())
>>>	  {
>>>	    ABG_ASSERT(lookup_elf_fn_symbol_from_address(adjusted_symbol_address));
>>>-	    set = linux_exported_fns_set;
>>>+	    set = exported_fns_set;
>>>	  }
>>>	else if (symbol->is_variable())
>>>	  {
>>>	    ABG_ASSERT(lookup_elf_var_symbol_from_address(adjusted_symbol_address));
>>>-	    set = linux_exported_vars_set;
>>>+	    set = exported_vars_set;
>>>	  }
>>>	else
>>>	  ABG_ASSERT_NOT_REACHED;
>>>@@ -7684,6 +7807,157 @@ public:
>>>   return true;
>>> }
>>>
>>>+  /// Populate the symbol map by extracting the exported symbols from a
>>>+  /// ksymtab rela section.
>>>+  ///
>>>+  /// @param section the ksymtab section to read from
>>>+  ///
>>>+  /// @param strings_section the __ksymtab_strings section
>>>+  ///
>>>+  /// @param exported_fns_set the set of exported functions
>>>+  ///
>>>+  /// @param exported_vars_set the set of exported variables
>>>+  ///
>>>+  /// @return true upon successful completion, false otherwise.
>>>+  bool
>>>+  populate_symbol_map_from_ksymtab_reloc(Elf_Scn *reloc_section, Elf_Scn *strings_section,
>>>+                                         address_set_sptr exported_fns_set,
>>>+                                         address_set_sptr exported_vars_set)
>>>+  {
>>>+    GElf_Shdr reloc_section_mem;
>>>+    GElf_Shdr *reloc_section_shdr = gelf_getshdr(reloc_section, &reloc_section_mem);
>>
>>gelf_getshdr returns &reloc_section_mem. So you might want to use
>>reloc_section_mem directly rather than introducing reloc_section_mem.
>>
>>>+    size_t reloc_count = reloc_section_shdr->sh_size / reloc_section_shdr->sh_entsize;
>>>+
>>>+    Elf_Data *reloc_section_data = elf_rawdata(reloc_section, 0);
>>>+    size_t strings_ndx = elf_ndxscn(strings_section);
>>>+
>>>+    GElf_Rela rela;
>>>+    elf_symbol_sptr symbol;
>>>+    for (unsigned int i = 0; i < reloc_count; i++)
>>>+      {
>>>+        gelf_getrela(reloc_section_data, i, &rela);
>>>+        symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rela.r_info));
>>>+
>>>+        // This shouldn't happen..
>>>+        if (!symbol)
>>
>>Maybe ABG_ASSERT then?
>>
>>>+          return false;
>>>+
>>>+        // If the symbol is from __ksymtab_strings, ignore it.
>>>+        if (symbol->get_shndx() == strings_ndx)
>>>+          continue;
>>>+
>>>+	if (!symbol->is_function() && !symbol->is_variable())
>>>+          {
>>>+            if (do_log())
>>>+              {
>>>+                if (symbol->get_type() == elf_symbol::NOTYPE_TYPE)
>>>+                  cerr << "skipping NOTYPE symbol "
>>>+                       << symbol->get_name()
>>>+                       << " shndx: "
>>>+                       << symbol->get_index()
>>>+                       << " @"
>>>+                       << elf_path()
>>>+                       << "\n";
>>>+                else if (symbol->get_type() == elf_symbol::SECTION_TYPE)
>>>+                  cerr << "skipping SECTION symbol "
>>>+                       << "shndx: "
>>>+                       << symbol->get_index()
>>>+                       << " @"
>>>+                       << elf_path()
>>>+                       << "\n";
>>>+               }
>>>+            continue;
>>>+	  }
>>>+
>>>+	address_set_sptr set;
>>>+	if (symbol->is_function())
>>>+	  {
>>>+	    ABG_ASSERT(lookup_elf_fn_symbol_from_address(symbol->get_value()));
>>>+	    set = exported_fns_set;
>>>+	  }
>>>+	else if (symbol->is_variable())
>>>+	  {
>>>+	    ABG_ASSERT(lookup_elf_var_symbol_from_address(symbol->get_value()));
>>>+	    set = exported_vars_set;
>>>+	  }
>>>+	else
>>>+	  ABG_ASSERT_NOT_REACHED;
>>>+	set->insert(symbol->get_value());
>>>+      }
>>>+    return true;
>>>+  }
>>>+
>>>+  /// Load a given kernel symbol table.
>>>+  ///
>>>+  /// One can thus retrieve the resulting symbols by using the
>>>+  /// accessors read_context::linux_exported_fn_syms(),
>>>+  /// read_context::linux_exported_var_syms(),
>>>+  /// read_context::linux_exported_gpl_fn_syms(), or
>>>+  /// read_context::linux_exported_gpl_var_syms().
>>>+  ///
>>>+  /// @param kind the kind of kernel symbol table to load.
>>>+  ///
>>>+  /// @return true upon successful completion, false otherwise.
>>>+  bool
>>>+  load_kernel_symbol_table(kernel_symbol_table_kind kind)
>>>+  {
>>>+    size_t nb_entries = 0;
>>>+    Elf_Scn *section = 0, *reloc_section = 0, *strings_section = 0;
>>>+    address_set_sptr linux_exported_fns_set, linux_exported_vars_set;
>>>+
>>>+    switch (kind)
>>>+      {
>>>+      case KERNEL_SYMBOL_TABLE_KIND_UNDEFINED:
>>>+	break;
>>>+      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB:
>>>+	section = find_ksymtab_section();
>>>+	reloc_section = find_ksymtab_reloc_section();
>>>+	nb_entries = get_nb_ksymtab_entries();
>>>+	linux_exported_fns_set = create_or_get_linux_exported_fn_syms();
>>>+	linux_exported_vars_set = create_or_get_linux_exported_var_syms();
>>>+	break;
>>>+      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB_GPL:
>>>+	section = find_ksymtab_gpl_section();
>>>+	reloc_section = find_ksymtab_gpl_reloc_section();
>>>+	nb_entries = get_nb_ksymtab_gpl_entries();
>>>+	linux_exported_fns_set = create_or_get_linux_exported_gpl_fn_syms();
>>>+	linux_exported_vars_set = create_or_get_linux_exported_gpl_var_syms();
>>>+	break;
>>>+      }
>>>+
>>>+    strings_section = find_ksymtab_strings_section();
>>>+
>>>+    if (!linux_exported_vars_set
>>>+	|| !linux_exported_fns_set
>>>+	|| !section
>>>+	|| !strings_section
>>>+	|| !nb_entries)
>>>+      return false;
>>>+
>>>+    GElf_Ehdr eh_mem;
>>>+    GElf_Ehdr* elf_header = gelf_getehdr(elf_handle(), &eh_mem);
>>>+    ksymtab_format format = get_ksymtab_format();
>>>+
>>>+    // Although pre-v4.19 kernel modules can have a relocation section for the
>>>+    // __ksymtab section, libdwfl zeroes the rela section after applying
>>>+    // "simple" absolute relocations via dwfl_module_getelf(). For v4.19 and
>>>+    // above, we get PC-relative relocations so dwfl_module_getelf() doesn't
>>>+    // apply those relocations and we're safe to read the relocation section to
>>>+    // determine which exported symbols are in the ksymtab.
>>>+    //
>>>+    // Note: For 32-bit binaries, only populate_symbol_map_from_ksymtab() is
>>>+    // supported as of now.
>>>+    if (!reloc_section
>>>+	|| format == PRE_V4_19_KSYMTAB_FORMAT
>>>+	|| elf_header->e_ident[EI_CLASS] == ELFCLASS32)
>>>+      return populate_symbol_map_from_ksymtab(section, linux_exported_fns_set,
>>>+                                              linux_exported_vars_set, nb_entries);
>>>+    else
>>>+      return populate_symbol_map_from_ksymtab_reloc(reloc_section, strings_section,
>>>+                                                    linux_exported_fns_set,
>>>+                                                    linux_exported_vars_set);
>>>+  }
>>>+
>>> /// Load the special __ksymtab section. This is for linux kernel
>>> /// (module) files.
>>> ///
>>>@@ -8352,6 +8626,17 @@ public:
>>> is_linux_kernel_binary() const
>>> {return find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);}
>>>
>>>+  /// Guess if the current binary is a Linux Kernel module.
>>>+  ///
>>>+  /// To guess that, the function looks for the presence of the
>>>+  /// special ".modinfo" section in the binary.
>>>+  ///
>>>+  bool
>>>+  is_linux_kernel_module() const
>>>+  {
>>>+    return find_section(elf_handle(), ".modinfo", SHT_PROGBITS);
>>>+  }
>>>+
>>
>>Also refer to this proposal to restrict checking for a module a bit
>>more: https://sourceware.org/ml/libabigail/2019-q3/msg00022.html
>>
>>> /// Getter of the "show_stats" flag.
>>> ///
>>> /// This flag tells if we should emit statistics about various
>>>@@ -15734,6 +16019,8 @@ create_default_var_sym(const string& sym_name, const environment *env)
>>>   elf_symbol::create(env,
>>>		       /*symbol index=*/ 0,
>>>		       /*symbol size=*/ 0,
>>>+		       /*symbol value=*/ 0,
>>>+		       /*symbol shndx=*/ 0,
>>>		       sym_name,
>>>		       /*symbol type=*/ elf_symbol::OBJECT_TYPE,
>>>		       /*symbol binding=*/ elf_symbol::GLOBAL_BINDING,
>>>@@ -16173,6 +16460,8 @@ create_default_fn_sym(const string& sym_name, const environment *env)
>>>   elf_symbol::create(env,
>>>		       /*symbol index=*/ 0,
>>>		       /*symbol size=*/ 0,
>>>+		       /*symbol value=*/ 0,
>>>+		       /*symbol shndx=*/ 0,
>>>		       sym_name,
>>>		       /*symbol type=*/ elf_symbol::FUNC_TYPE,
>>>		       /*symbol binding=*/ elf_symbol::GLOBAL_BINDING,
>>>diff --git a/src/abg-ir.cc b/src/abg-ir.cc
>>>index a5243c3..ad1c261 100644
>>>--- a/src/abg-ir.cc
>>>+++ b/src/abg-ir.cc
>>>@@ -1169,6 +1169,8 @@ struct elf_symbol::priv
>>> const environment*	env_;
>>> size_t		index_;
>>> size_t		size_;
>>>+  uint64_t		value_;
>>>+  uint32_t		shndx_;
>>> string		name_;
>>> elf_symbol::type	type_;
>>> elf_symbol::binding	binding_;
>>>@@ -1213,6 +1215,8 @@ struct elf_symbol::priv
>>>   : env_(),
>>>     index_(),
>>>     size_(),
>>>+      value_(),
>>>+      shndx_(),
>>>     type_(elf_symbol::NOTYPE_TYPE),
>>>     binding_(elf_symbol::GLOBAL_BINDING),
>>>     visibility_(elf_symbol::DEFAULT_VISIBILITY),
>>>@@ -1223,6 +1227,8 @@ struct elf_symbol::priv
>>> priv(const environment*		e,
>>>      size_t				i,
>>>      size_t				s,
>>>+       uint64_t				v,
>>>+       uint32_t				x,
>>>      const string&			n,
>>>      elf_symbol::type		t,
>>>      elf_symbol::binding		b,
>>>@@ -1233,6 +1239,8 @@ struct elf_symbol::priv
>>>   : env_(e),
>>>     index_(i),
>>>     size_(s),
>>>+      value_(v),
>>>+      shndx_(x),
>>>     name_(n),
>>>     type_(t),
>>>     binding_(b),
>>>@@ -1269,6 +1277,10 @@ elf_symbol::elf_symbol()
>>>///
>>>/// @param s the size of the symbol.
>>>///
>>>+/// @param v the value of the symbol.
>>>+///
>>>+/// @param x the section header index (shndx) of the symbol.
>>>+///
>>>/// @param n the name of the symbol.
>>>///
>>>/// @param t the type of the symbol.
>>>@@ -1285,6 +1297,8 @@ elf_symbol::elf_symbol()
>>>elf_symbol::elf_symbol(const environment*	e,
>>>		       size_t			i,
>>>		       size_t			s,
>>>+		       uint64_t			v,
>>>+		       uint32_t			x,
>>>		       const string&		n,
>>>		       type			t,
>>>		       binding			b,
>>>@@ -1292,7 +1306,7 @@ elf_symbol::elf_symbol(const environment*	e,
>>>		       bool			c,
>>>		       const version&		ve,
>>>		       visibility		vi)
>>>-  : priv_(new priv(e, i, s, n, t, b, d, c, ve, vi))
>>>+  : priv_(new priv(e, i, s, v, x, n, t, b, d, c, ve, vi))
>>>{}
>>>
>>>/// Factory of instances of @ref elf_symbol.
>>>@@ -1319,6 +1333,10 @@ elf_symbol::create()
>>>///
>>>/// @param s the size of the symbol.
>>>///
>>>+/// @param v the value of the symbol.
>>>+///
>>>+/// @param x the section header index (shndx) of the symbol.
>>>+///
>>>/// @param n the name of the symbol.
>>>///
>>>/// @param t the type of the symbol.
>>>@@ -1339,6 +1357,8 @@ elf_symbol_sptr
>>>elf_symbol::create(const environment*	e,
>>>		   size_t		i,
>>>		   size_t		s,
>>>+		   uint64_t		v,
>>>+		   uint32_t		x,
>>>		   const string&	n,
>>>		   type		t,
>>>		   binding		b,
>>>@@ -1347,7 +1367,7 @@ elf_symbol::create(const environment*	e,
>>>		   const version&	ve,
>>>		   visibility		vi)
>>>{
>>>-  elf_symbol_sptr sym(new elf_symbol(e, i, s, n, t, b, d, c, ve, vi));
>>>+  elf_symbol_sptr sym(new elf_symbol(e, i, s, v, x, n, t, b, d, c, ve, vi));
>>> sym->priv_->main_symbol_ = sym;
>>> return sym;
>>>}
>>>@@ -1411,6 +1431,20 @@ void
>>>elf_symbol::set_index(size_t s)
>>>{priv_->index_ = s;}
>>>
>>>+/// Getter for the symbol value.
>>>+///
>>>+/// @return the value of the symbol.
>>>+uint64_t
>>>+elf_symbol::get_value() const
>>>+{return priv_->value_;}
>>>+
>>>+/// Getter for the symbol section header index.
>>>+///
>>>+/// @return the section header index of the symbol.
>>>+uint32_t
>>>+elf_symbol::get_shndx() const
>>>+{return priv_->shndx_;}
>>>+
>>>/// Getter for the name of the @ref elf_symbol.
>>>///
>>>/// @return a reference to the name of the @ref symbol.
>>>diff --git a/src/abg-reader.cc b/src/abg-reader.cc
>>>index 810a65f..3c481ad 100644
>>>--- a/src/abg-reader.cc
>>>+++ b/src/abg-reader.cc
>>>@@ -2659,6 +2659,8 @@ build_elf_symbol(read_context& ctxt, const xmlNodePtr node)
>>>
>>> const environment* env = ctxt.get_environment();
>>> elf_symbol_sptr e = elf_symbol::create(env, /*index=*/0, size,
>>>+					 /*value=*/0,
>>>+					 /*shndx=*/0,
>>>					 name, type, binding,
>>>					 is_defined, is_common,
>>>					 version, visibility);
>>>-- 
>>>2.16.4
>>>
>>
>>Nice work! Thank you for that!
>>The patch looks in general good to me. I am a bit concerned about the
>>platform support. There are some corner cases that need to be covered.
>>See the testing below.
>>
>>You might find the .clang-format file useful to match the coding style
>>at some points (especially tabs vs spaces). Although when reviewing the
>>patch, I had a bit trouble with unchanged but reformatted lines as they
>>were confusing my diff. Maybe you can revert the changes that were
>>purely for formatting and we take care of formatting in a later patch.
>>
>>I conducted some testing. I choose a minimal configuration with modules:
>>
>>$ export ARCH=<arch>
>>$ make allnoconfig
>>$ ./scripts/config   \
>>    -e DEBUG_INFO    \  # dwarf info
>>    -e MODULES       \  # module support
>>    -e USB_SUPPORT   \
>>    -m USB           \
>>    -m USB_DUMMY_HCD \  # dummy_hcd.ko is a module without exports
>>    -m USB_GADGET       # depends on usb-common.ko, which I test with
>>
>>$ make olddefconfig
>>$ make
>>$ abidw ./drivers/usb/common/usb-common.ko | less
>># then check for the existence of the elf-function-symbols section
>>
>>                   master         patched
>>Linux v4.18
>>
>>x86_64             works            works
>>x86                works            works
>>arm64              works            works
>>
>>Linux v4.19
>>
>>x86_64             broken           works
>>x86                broken           STILL BROKEN
>>arm64              broken           works
>>
>>So, for x86_64 and arm64 the situation got better and elf symbols are
>>now identified. For x86, this patch did not improve the situation. I did
>>not check that the list of symbols are entirely correct and that they
>>work downstream with abidiff etc. Just the pure existence is a very good
>>indicator for this patch to work.
>
>Hi Matthias! Thanks a bunch for your review! :-)
>
>I'm aware of the x86 issue and I think the diff below should fix it.
>Could you please let me know if it passes your x86 testing? (note, I have
>yet to address your other comments, so this is to be continued!)

Hi Jessica!

I confirm this patch is fixing the x86 testing as lined out above. I
also retested the other platforms successfully.

Thanks! Great work!

Cheers,
Matthias

>
>diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>index 835bf3d..ed8a8db 100644
>--- a/src/abg-dwarf-reader.cc
>+++ b/src/abg-dwarf-reader.cc
>@@ -7536,22 +7536,33 @@ public:
>    GElf_Shdr *section_shdr = gelf_getshdr(section, &section_mem);
>    if (section_shdr->sh_size == 0)
>      return PRE_V4_19_KSYMTAB_FORMAT;
>-    /* SHT_REL sections not supported yet */
>-    if (section_shdr->sh_type == SHT_REL)
>-      return UNDEFINED_KSYMTAB_FORMAT;
>+
>+    bool is_relasec = (section_shdr->sh_type == SHT_RELA);
>
>    // If we still have a normal non-zeroed relocation section, we can guess
>    // what format the ksymtab is in depending on what types of relocs it
>    // contains.
>-    GElf_Rela rela;
>-    Elf_Data *rela_section_data = elf_rawdata(section, 0);
>-    gelf_getrela(rela_section_data, 0, &rela);
>+
>+    int type;
>+    Elf_Data *section_data = elf_rawdata(section, 0);
>+    if (is_relasec)
>+      {
>+        GElf_Rela rela;
>+        gelf_getrela(section_data, 0, &rela);
>+        type = GELF_R_TYPE(rela.r_info);
>+      }
>+    else
>+      {
>+        GElf_Rel rel;
>+        gelf_getrel(section_data, 0, &rel);
>+        type = GELF_R_TYPE(rel.r_info);
>+      }

Maybe we can deduplicate this a bit. Also further down.
Missing out that additional 'a' might be a bug source at a later time.

>    // Sigh, I dislike the arch-dependent code here, but this seems to be a
>    // reliable heuristic for kernel modules for now. Relative ksymtabs only
>    // supported on x86 and arm64 as of v4.19.
>    ksymtab_format format;
>-    switch (GELF_R_TYPE(rela.r_info))
>+    switch (type)
>      {
>      case R_X86_64_64: // Same as R_386_32
>      case R_AARCH64_ABS64:
>@@ -7831,12 +7842,22 @@ public:
>    Elf_Data *reloc_section_data = elf_rawdata(reloc_section, 0);
>    size_t strings_ndx = elf_ndxscn(strings_section);
>
>-    GElf_Rela rela;
>+    bool is_relasec = (reloc_section_shdr->sh_type == SHT_RELA);
>    elf_symbol_sptr symbol;
>    for (unsigned int i = 0; i < reloc_count; i++)
>      {
>-        gelf_getrela(reloc_section_data, i, &rela);
>-        symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rela.r_info));
>+        if (is_relasec)
>+          {
>+            GElf_Rela rela;
>+            gelf_getrela(reloc_section_data, i, &rela);
>+            symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rela.r_info));
>+          }
>+        else
>+          {
>+            GElf_Rel rel;
>+            gelf_getrel(reloc_section_data, i, &rel);
>+            symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rel.r_info));
>+          }
>
>        // This shouldn't happen..
>        if (!symbol)
>@@ -7934,8 +7955,6 @@ public:
>	|| !nb_entries)
>      return false;
>
>-    GElf_Ehdr eh_mem;
>-    GElf_Ehdr* elf_header = gelf_getehdr(elf_handle(), &eh_mem);
>    ksymtab_format format = get_ksymtab_format();
>
>    // Although pre-v4.19 kernel modules can have a relocation section for the
>@@ -7947,9 +7966,7 @@ public:
>    //
>    // Note: For 32-bit binaries, only populate_symbol_map_from_ksymtab() is
>    // supported as of now.

The comment might need updating.

>-    if (!reloc_section
>-	|| format == PRE_V4_19_KSYMTAB_FORMAT
>-	|| elf_header->e_ident[EI_CLASS] == ELFCLASS32)
>+    if (!reloc_section || format == PRE_V4_19_KSYMTAB_FORMAT)
>      return populate_symbol_map_from_ksymtab(section, linux_exported_fns_set,
>                                              linux_exported_vars_set, nb_entries);
>    else

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] Support pre and post v4.19 ksymtabs for Linux kernel modules
  2019-01-01  0:00 [RFC PATCH] Support pre and post v4.19 ksymtabs for Linux kernel modules Jessica Yu
@ 2019-01-01  0:00 ` Matthias Maennich via libabigail
  2019-01-01  0:00   ` Jessica Yu
                     ` (2 more replies)
  2019-01-01  0:00 ` Jessica Yu
  1 sibling, 3 replies; 12+ messages in thread
From: Matthias Maennich via libabigail @ 2019-01-01  0:00 UTC (permalink / raw)
  To: Jessica Yu; +Cc: libabigail, Dodji Seketeli, kernel-team

Hi Jessica!

On Fri, 09 Aug, 17:49, Jessica Yu wrote:
>As described in commit ad8c2531fb9, the format of the Linux kernel
>ksymtab changed in v4.19 to use relative references instead of absolute
>references. This changes the type of relocations emitted for ksymtab
>sections to be place-relative 32-bit relocations instead of absolute
>relocations. One side-effect of this is that libdwfl will not relocate
>the ksymtab sections due to the PC-relative relocations. This breaks
>load_kernel_symbol_table() for kernel modules because it only reads in
>zeros from the unrelocated ksymtab section and is subsequently unable to
>determine what exported symbols it refers to. Since a vmlinux binary is
>already fully linked and relocated (and therefore we can read its
>ksymtab section just fine), this problem is only relevant to Linux
>kernel modules.
>
>To work around this, we utilize the ksymtab relocation sections to
>determine which symbols the ksymtab entries refer to. We do this by
>inspecting each relocation's r_info field for the symbol table index and
>from there we are able to read each symbol's value and subsequently add
>that to the set of exported symbols.
>
>In addition, for Linux kernel modules, we can utilize relocation types
>to implement a new heuristic to determine the ksymtab format we have.
>The presence of PC-relative relocations suggest the new v4.19 format,
>and absolute relocation types suggest the old pre v4.19 format.
>
>         * include/abg-ir.h (elf_symbol::{elf_symbol, create}): Take new
>         symbol value and shndx parameters.
>         (elf_symbol::{get_value, get_shndx}): Declare new accessors.
>         * src/abg-ir.cc (elf_symbol::priv::{value_, shndx_}): New data members.
>         (elf_symbol::priv::priv): Adjust.
>         (elf_symbol::elf_symbol): Take new value and shndx parameters.
>         (elf_symbol::create): Likewise.
>         (elf_symbol::{get_value, get_shndx}): Define new accessors.
>         * src/abg-reader.cc (build_elf_symbol): Adjust.
>         * src/abg-dwarf-reader.cc (lookup_symbol_from_sysv_hash_tab)
>         (lookup_symbol_from_gnu_hash_tab)
>         (lookup_symbol_from_symtab): Adjust.
>         (read_context::{ksymtab_reloc_section_,
>         ksymtab_gpl_reloc_section_,
>         ksymtab_strings_section_}): New data members.
>         (read_context::read_context): Initialize ksymtab_reloc_section_,
>         ksymtab_gpl_reloc_section_, ksymtab_strings_section_.
>         (read_context::{find_ksymtab_reloc_section,
>         find_ksymtab_gpl_reloc_section, find_ksymtab_strings_section,
>         find_any_ksymtab_reloc_section, get_ksymtab_format_module,
>         populate_symbol_map_from_ksymtab,
>         populate_symbol_map_from_ksymtab_reloc,
>         is_linux_kernel_module}): New member functions.
>         (read_context::load_kernel_symbol_table): Adjust to call either
>         populate_symbol_map_from_ksymtab{_reloc,} depending on ksymtab
>         format.
>         (read_context::get_ksymtab_format): Adjust to call
>         get_ksymtab_format_module for linux kernel modules.
>         (read_context::lookup_elf_symbol_from_index): Adjust.
>         (create_default_var_sym, create_default_fn_sym): Adjust.
>
>Signed-off-by: Jessica Yu <jeyu@kernel.org>
>---
> include/abg-ir.h        |  10 ++
> src/abg-dwarf-reader.cc | 391 +++++++++++++++++++++++++++++++++++++++++-------
> src/abg-ir.cc           |  38 ++++-
> src/abg-reader.cc       |   2 +
> 4 files changed, 388 insertions(+), 53 deletions(-)
>
>diff --git a/include/abg-ir.h b/include/abg-ir.h
>index 170a844..d26f472 100644
>--- a/include/abg-ir.h
>+++ b/include/abg-ir.h
>@@ -796,6 +796,8 @@ private:
>   elf_symbol(const environment* e,
> 	     size_t		i,
> 	     size_t		s,
>+	     uint64_t		v,
>+	     uint32_t		x,

I would like to see some comments here to describe the purpose of the
struct's members. I know there weren't any before, but maybe that is a
good chance.

Also, I am not entirely sure why these are all one-character names, but
it makes it hard to come from x to "section header index". Maybe we
should give them more meaningful names.

> 	     const string&	n,
> 	     type		t,
> 	     binding		b,
>@@ -818,6 +820,8 @@ public:
>   create(const environment*	e,
> 	 size_t		i,
> 	 size_t		s,
>+	 uint64_t	v,
>+	 uint32_t	x,
> 	 const string&		n,
> 	 type			t,
> 	 binding		b,
>@@ -838,6 +842,12 @@ public:
>   void
>   set_index(size_t);
>
>+  uint64_t
>+  get_value() const;
>+
>+  uint32_t
>+  get_shndx() const;
>+
>   const string&
>   get_name() const;
>
>diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>index 47af7e3..835bf3d 100644
>--- a/src/abg-dwarf-reader.cc
>+++ b/src/abg-dwarf-reader.cc
>@@ -1724,6 +1724,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>   GElf_Sym symbol;
>   const char* sym_name_str;
>   size_t sym_size;
>+  uint64_t sym_value;
>+  uint32_t sym_shndx;
>   elf_symbol::type sym_type;
>   elf_symbol::binding sym_binding;
>   elf_symbol::visibility sym_visibility;
>@@ -1743,6 +1745,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
> 	  sym_visibility =
> 	    stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(symbol.st_other));
> 	  sym_size = symbol.st_size;
>+	  sym_value = symbol.st_value;
>+	  sym_shndx = symbol.st_shndx;
> 	  elf_symbol::version ver;
> 	  if (get_version_for_symbol(elf_handle, symbol_index,
> 				     /*get_def_version=*/true, ver))
>@@ -1751,6 +1755,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
> 	    elf_symbol::create(env,
> 			       symbol_index,
> 			       sym_size,
>+			       sym_value,
>+			       sym_shndx,
> 			       sym_name_str,
> 			       sym_type,
> 			       sym_binding,
>@@ -2028,7 +2034,8 @@ lookup_symbol_from_gnu_hash_tab(const environment*		env,
> 	    ABG_ASSERT(!ver.str().empty());
>
> 	  elf_symbol_sptr symbol_found =
>-	    elf_symbol::create(env, i, symbol.st_size, sym_name_str,
>+	    elf_symbol::create(env, i, symbol.st_size, symbol.st_value,
>+			       symbol.st_shndx, sym_name_str,
> 			       sym_type, sym_binding,
> 			       symbol.st_shndx != SHN_UNDEF,
> 			       symbol.st_shndx == SHN_COMMON,
>@@ -2181,6 +2188,7 @@ lookup_symbol_from_symtab(const environment*		env,
> 	    ABG_ASSERT(!ver.str().empty());
> 	  elf_symbol_sptr symbol_found =
> 	    elf_symbol::create(env, i, sym->st_size,
>+			       sym->st_value, sym->st_shndx,
> 			       name_str, sym_type,
> 			       sym_binding, sym_is_defined,
> 			       sym_is_common, ver, sym_visibility);
>@@ -3081,7 +3089,10 @@ public:
>   /// to symbols exported using the EXPORT_SYMBOL_GPL macro from the
>   /// linux kernel.
>   Elf_Scn*			ksymtab_section_;
>+  Elf_Scn*			ksymtab_reloc_section_;
>   Elf_Scn*			ksymtab_gpl_section_;
>+  Elf_Scn*			ksymtab_gpl_reloc_section_;
>+  Elf_Scn*			ksymtab_strings_section_;
>   Elf_Scn*			versym_section_;
>   Elf_Scn*			verdef_section_;
>   Elf_Scn*			verneed_section_;
>@@ -3277,7 +3288,10 @@ public:
>     nb_ksymtab_entries_ = 0;
>     nb_ksymtab_gpl_entries_ = 0;
>     ksymtab_section_ = 0;
>+    ksymtab_reloc_section_ = 0;
>     ksymtab_gpl_section_ = 0;
>+    ksymtab_gpl_reloc_section_ = 0;
>+    ksymtab_strings_section_ = 0;
>     versym_section_ = 0;
>     verdef_section_ = 0;
>     verneed_section_ = 0;
>@@ -6079,6 +6093,23 @@ public:
>     return ksymtab_section_;
>   }
>
>+  /// Return the .rel{a,}__ksymtab section of a linux kernel ELF file (either
>+  /// a vmlinux binary or a kernel module).
>+  ///
>+  /// @return the .rel{a,}__ksymtab section if found, nil otherwise.
>+  Elf_Scn*
>+  find_ksymtab_reloc_section() const
>+  {
>+    if (!ksymtab_reloc_section_)
>+      {
>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab", SHT_RELA);
>+        if (!sec)
>+          sec = find_section(elf_handle(), ".rel__ksymtab", SHT_REL);
>+        const_cast<read_context*>(this)->ksymtab_reloc_section_ = sec;
>+      }
>+    return ksymtab_reloc_section_;
>+  }
>+
>   /// Return the __ksymtab_gpl section of a linux kernel ELF file
>   /// (either a vmlinux binary or a kernel module).
>   ///
>@@ -6092,6 +6123,36 @@ public:
>     return ksymtab_gpl_section_;
>   }
>
>+  /// Return the .rel{a,}__ksymtab_gpl section of a linux kernel ELF file
>+  /// (either a vmlinux binary or a kernel module).
>+  ///
>+  /// @return the .rel{a,}__ksymtab_gpl section if found, nil otherwise.
>+  Elf_Scn*
>+  find_ksymtab_gpl_reloc_section() const
>+  {
>+    if (!ksymtab_gpl_reloc_section_)
>+      {
>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab_gpl", SHT_RELA);
>+        if (!sec)
>+          sec = find_section(elf_handle(), ".rel__ksymtab_gpl", SHT_REL);
>+        const_cast<read_context*>(this)->ksymtab_gpl_reloc_section_ = sec;
>+      }
>+    return ksymtab_gpl_reloc_section_;
>+  }
>+
>+  /// Return the __ksymtab_strings section of a linux kernel ELF file
>+  /// (either a vmlinux binary or a kernel module).
>+  ///
>+  /// @return the __ksymtab_strings section if found, nil otherwise.
>+  Elf_Scn*
>+  find_ksymtab_strings_section() const
>+  {
>+    if (!ksymtab_strings_section_)
>+      const_cast<read_context*>(this)->ksymtab_strings_section_ =
>+        find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);
>+    return ksymtab_strings_section_;
>+  }
>+

We should find a way to deduplicate these functions. Not necessarily in
this patch, but in a followup. Most of them only differ in the section
name and whether they check for an alternative. Something generic like

  struct SectionQuery{ string name, int type };
  Elf_Scn* find_section(const vector<SectionQuery>&);

should be sufficient to forward from all the other variants to one
implementation.

>   /// Return either a __ksymtab or a __ksymtab_gpl section, in case
>   /// only the __ksymtab_gpl exists.
>   ///
>@@ -6106,6 +6167,19 @@ public:
>     return result;
>   }
>
>+  /// Return either a .rel{a,}__ksymtab or a .rel{a,}__ksymtab_gpl section
>+  ///
>+  /// @return the .rel{a,}__ksymtab section if it exists, or the
>+  /// .rel{a,}__ksymtab_gpl; or NULL if neither is found.
>+  Elf_Scn*
>+  find_any_ksymtab_reloc_section() const
>+  {
>+    Elf_Scn *result = find_ksymtab_reloc_section();
>+    if (!result)
>+      result = find_ksymtab_gpl_reloc_section();
>+    return result;
>+  }
>+
>   /// Return the SHT_GNU_versym, SHT_GNU_verdef and SHT_GNU_verneed
>   /// sections that are involved in symbol versionning.
>   ///
>@@ -6283,7 +6357,8 @@ public:
>       stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(s->st_other));
>
>     elf_symbol_sptr sym =
>-      elf_symbol::create(env(), symbol_index, s->st_size, name_str,
>+      elf_symbol::create(env(), symbol_index, s->st_size,
>+			 s->st_value, s->st_shndx, name_str,
> 			 stt_to_elf_symbol_type(GELF_ST_TYPE(s->st_info)),
> 			 stb_to_elf_symbol_binding(GELF_ST_BIND(s->st_info)),
> 			 sym_is_defined, sym_is_common, ver, vis);
>@@ -7435,6 +7510,64 @@ public:
>     return symbol;
>   }
>
>+  /// Determine the format of the __ksymtab and __ksymtab_gpl
>+  /// sections of Linux kernel modules.
>+  ///
>+  /// This is important because we need the know the format of these

s/need the/need to/                       ^^^

>+  /// sections to be able to read from them.
>+  ///
>+  /// @return the format the __ksymtab[_gpl] sections.
>+  enum ksymtab_format
>+  get_ksymtab_format_module() const
>+  {
>+    Elf_Scn *section = find_any_ksymtab_reloc_section();
>+
>+    if (!section)
>+      return UNDEFINED_KSYMTAB_FORMAT;
>+
>+    // Libdwfl has a weird quirk where, in the process of obtaining an Elf
>+    // descriptor via dwfl_module_getelf(), it will apply all relocations it
>+    // knows how to and it will zero a relocation after applying it. If the

will zero the relocation info after

>+    // .rela__ksymtab* section contained only simple (absolute) relocations,
>+    // they will have been all applied and sh_size will be 0. For arches that
>+    // support relative ksymtabs, simple relocations only appear in pre-4.19
>+    // kernel modules.
>+    GElf_Shdr section_mem;
>+    GElf_Shdr *section_shdr = gelf_getshdr(section, &section_mem);
>+    if (section_shdr->sh_size == 0)
>+      return PRE_V4_19_KSYMTAB_FORMAT;
>+    /* SHT_REL sections not supported yet */
>+    if (section_shdr->sh_type == SHT_REL)
>+      return UNDEFINED_KSYMTAB_FORMAT;
>+
>+    // If we still have a normal non-zeroed relocation section, we can guess
>+    // what format the ksymtab is in depending on what types of relocs it
>+    // contains.
>+    GElf_Rela rela;
>+    Elf_Data *rela_section_data = elf_rawdata(section, 0);
>+    gelf_getrela(rela_section_data, 0, &rela);
>+
>+    // Sigh, I dislike the arch-dependent code here, but this seems to be a
>+    // reliable heuristic for kernel modules for now. Relative ksymtabs only
>+    // supported on x86 and arm64 as of v4.19.
>+    ksymtab_format format;
>+    switch (GELF_R_TYPE(rela.r_info))
>+      {
>+      case R_X86_64_64: // Same as R_386_32

add 'fallthrough' to the comments, else younger compilers will warn
about the potentially unintentional fallthrough.

>+      case R_AARCH64_ABS64:
>+        format = PRE_V4_19_KSYMTAB_FORMAT;
>+	break;
>+      case R_X86_64_PC32: // Same as R_386_PC32
>+      case R_AARCH64_PREL32:
>+	format = V4_19_KSYMTAB_FORMAT;
>+	break;
>+      default:
>+        format = UNDEFINED_KSYMTAB_FORMAT;

We should probably emit a warning in this case. Are there more
architectures that have PREL32 enabled as of now that are not yet
covered here?

>+	break;
>+      }
>+    return format;
>+  }
>+
>   /// Determine the format of the __ksymtab and __ksymtab_gpl
>   /// sections.
>   ///
>@@ -7451,6 +7584,18 @@ public:
>       {
> 	if (ksymtab_format_ == UNDEFINED_KSYMTAB_FORMAT)
> 	  {
>+            // Since Linux kernel modules are relocatable, we can first try
>+            // using a heuristic based on relocations to guess the ksymtab format.
>+            if (is_linux_kernel_module())
>+              {
>+                ksymtab_format_ = get_ksymtab_format_module();
>+                if (ksymtab_format_ != UNDEFINED_KSYMTAB_FORMAT)
>+                  return ksymtab_format_;
>+              }

nit: use tabs

>+
>+	    // If it's not a kernel module or we couldn't determine its format
>+	    // with relocations, fall back to the heuristic below.
>+
> 	    // OK this is a dirty little heuristic to determine the
> 	    // format of the ksymtab section.
> 	    //
>@@ -7553,48 +7698,24 @@ public:
>     return nb_ksymtab_gpl_entries_;
>   }
>
>-  /// Load a given kernel symbol table.
>+  /// Populate the symbol map by reading exported symbols from the
>+  /// ksymtab directly.
>   ///
>-  /// One can thus retrieve the resulting symbols by using the
>-  /// accessors read_context::linux_exported_fn_syms(),
>-  /// read_context::linux_exported_var_syms(),
>-  /// read_context::linux_exported_gpl_fn_syms(), or
>-  /// read_context::linux_exported_gpl_var_syms().
>+  /// @param section the ksymtab section to read from
>   ///
>-  /// @param kind the kind of kernel symbol table to load.
>+  /// @param exported_fns_set the set of exported functions
>+  ///
>+  /// @param exported_vars_set the set of exported variables
>+  ///
>+  /// @param nb_entries the number of ksymtab entries to read
>   ///
>   /// @return true upon successful completion, false otherwise.
>   bool
>-  load_kernel_symbol_table(kernel_symbol_table_kind kind)
>+  populate_symbol_map_from_ksymtab(Elf_Scn *section,
>+                                   address_set_sptr exported_fns_set,
>+                                   address_set_sptr exported_vars_set,
>+                                   size_t nb_entries)
>   {
>-    size_t nb_entries = 0;
>-    Elf_Scn *section = 0;
>-    address_set_sptr linux_exported_fns_set, linux_exported_vars_set;
>-
>-    switch (kind)
>-      {
>-      case KERNEL_SYMBOL_TABLE_KIND_UNDEFINED:
>-	break;
>-      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB:
>-	section = find_ksymtab_section();
>-	nb_entries = get_nb_ksymtab_entries();
>-	linux_exported_fns_set = create_or_get_linux_exported_fn_syms();
>-	linux_exported_vars_set = create_or_get_linux_exported_var_syms();
>-	break;
>-      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB_GPL:
>-	section = find_ksymtab_gpl_section();
>-	nb_entries = get_nb_ksymtab_gpl_entries();
>-	linux_exported_fns_set = create_or_get_linux_exported_gpl_fn_syms();
>-	linux_exported_vars_set = create_or_get_linux_exported_gpl_var_syms();
>-	break;
>-      }
>-
>-    if (!linux_exported_vars_set
>-	|| !linux_exported_fns_set
>-	|| !section
>-	|| !nb_entries)
>-      return false;
>-
>     // The data of the section.
>     Elf_Data *elf_data = elf_rawdata(section, 0);
>
>@@ -7654,28 +7775,30 @@ public:
> 	// (aka ELF symbol table).
> 	symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
> 	if (!symbol)
>-	  {
>-	    adjusted_symbol_address =
>-	      maybe_adjust_var_sym_address(symbol_address);
>-	    symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>-	    if (!symbol)
>-	      // This must be a symbol that is of type neither FUNC
>-	      // (function) nor OBJECT (variable).  There are for intance,
>-	      // symbols of type 'NOTYPE' in the ksymtab symbol table.  I
>-	      // am not sure what those are.
>-	      continue;
>-	  }
>+          {
>+            adjusted_symbol_address =
>+              maybe_adjust_var_sym_address(symbol_address);
>+            symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>+            if (!symbol)
>+              {
>+	        // This must be a symbol that is of type neither FUNC
>+	        // (function) nor OBJECT (variable).  There are for intance,
>+	        // symbols of type 'NOTYPE' in the ksymtab symbol table.  I
>+	        // am not sure what those are.
>+	        continue;
>+	      }
>+          }

This part of the diff seems to a change purely of the indentation/tabs.
Maybe you can revert this part.

>
> 	address_set_sptr set;
> 	if (symbol->is_function())
> 	  {
> 	    ABG_ASSERT(lookup_elf_fn_symbol_from_address(adjusted_symbol_address));
>-	    set = linux_exported_fns_set;
>+	    set = exported_fns_set;
> 	  }
> 	else if (symbol->is_variable())
> 	  {
> 	    ABG_ASSERT(lookup_elf_var_symbol_from_address(adjusted_symbol_address));
>-	    set = linux_exported_vars_set;
>+	    set = exported_vars_set;
> 	  }
> 	else
> 	  ABG_ASSERT_NOT_REACHED;
>@@ -7684,6 +7807,157 @@ public:
>     return true;
>   }
>
>+  /// Populate the symbol map by extracting the exported symbols from a
>+  /// ksymtab rela section.
>+  ///
>+  /// @param section the ksymtab section to read from
>+  ///
>+  /// @param strings_section the __ksymtab_strings section
>+  ///
>+  /// @param exported_fns_set the set of exported functions
>+  ///
>+  /// @param exported_vars_set the set of exported variables
>+  ///
>+  /// @return true upon successful completion, false otherwise.
>+  bool
>+  populate_symbol_map_from_ksymtab_reloc(Elf_Scn *reloc_section, Elf_Scn *strings_section,
>+                                         address_set_sptr exported_fns_set,
>+                                         address_set_sptr exported_vars_set)
>+  {
>+    GElf_Shdr reloc_section_mem;
>+    GElf_Shdr *reloc_section_shdr = gelf_getshdr(reloc_section, &reloc_section_mem);

gelf_getshdr returns &reloc_section_mem. So you might want to use
reloc_section_mem directly rather than introducing reloc_section_mem.

>+    size_t reloc_count = reloc_section_shdr->sh_size / reloc_section_shdr->sh_entsize;
>+
>+    Elf_Data *reloc_section_data = elf_rawdata(reloc_section, 0);
>+    size_t strings_ndx = elf_ndxscn(strings_section);
>+
>+    GElf_Rela rela;
>+    elf_symbol_sptr symbol;
>+    for (unsigned int i = 0; i < reloc_count; i++)
>+      {
>+        gelf_getrela(reloc_section_data, i, &rela);
>+        symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rela.r_info));
>+
>+        // This shouldn't happen..
>+        if (!symbol)

Maybe ABG_ASSERT then?

>+          return false;
>+
>+        // If the symbol is from __ksymtab_strings, ignore it.
>+        if (symbol->get_shndx() == strings_ndx)
>+          continue;
>+
>+	if (!symbol->is_function() && !symbol->is_variable())
>+          {
>+            if (do_log())
>+              {
>+                if (symbol->get_type() == elf_symbol::NOTYPE_TYPE)
>+                  cerr << "skipping NOTYPE symbol "
>+                       << symbol->get_name()
>+                       << " shndx: "
>+                       << symbol->get_index()
>+                       << " @"
>+                       << elf_path()
>+                       << "\n";
>+                else if (symbol->get_type() == elf_symbol::SECTION_TYPE)
>+                  cerr << "skipping SECTION symbol "
>+                       << "shndx: "
>+                       << symbol->get_index()
>+                       << " @"
>+                       << elf_path()
>+                       << "\n";
>+               }
>+            continue;
>+	  }
>+
>+	address_set_sptr set;
>+	if (symbol->is_function())
>+	  {
>+	    ABG_ASSERT(lookup_elf_fn_symbol_from_address(symbol->get_value()));
>+	    set = exported_fns_set;
>+	  }
>+	else if (symbol->is_variable())
>+	  {
>+	    ABG_ASSERT(lookup_elf_var_symbol_from_address(symbol->get_value()));
>+	    set = exported_vars_set;
>+	  }
>+	else
>+	  ABG_ASSERT_NOT_REACHED;
>+	set->insert(symbol->get_value());
>+      }
>+    return true;
>+  }
>+
>+  /// Load a given kernel symbol table.
>+  ///
>+  /// One can thus retrieve the resulting symbols by using the
>+  /// accessors read_context::linux_exported_fn_syms(),
>+  /// read_context::linux_exported_var_syms(),
>+  /// read_context::linux_exported_gpl_fn_syms(), or
>+  /// read_context::linux_exported_gpl_var_syms().
>+  ///
>+  /// @param kind the kind of kernel symbol table to load.
>+  ///
>+  /// @return true upon successful completion, false otherwise.
>+  bool
>+  load_kernel_symbol_table(kernel_symbol_table_kind kind)
>+  {
>+    size_t nb_entries = 0;
>+    Elf_Scn *section = 0, *reloc_section = 0, *strings_section = 0;
>+    address_set_sptr linux_exported_fns_set, linux_exported_vars_set;
>+
>+    switch (kind)
>+      {
>+      case KERNEL_SYMBOL_TABLE_KIND_UNDEFINED:
>+	break;
>+      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB:
>+	section = find_ksymtab_section();
>+	reloc_section = find_ksymtab_reloc_section();
>+	nb_entries = get_nb_ksymtab_entries();
>+	linux_exported_fns_set = create_or_get_linux_exported_fn_syms();
>+	linux_exported_vars_set = create_or_get_linux_exported_var_syms();
>+	break;
>+      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB_GPL:
>+	section = find_ksymtab_gpl_section();
>+	reloc_section = find_ksymtab_gpl_reloc_section();
>+	nb_entries = get_nb_ksymtab_gpl_entries();
>+	linux_exported_fns_set = create_or_get_linux_exported_gpl_fn_syms();
>+	linux_exported_vars_set = create_or_get_linux_exported_gpl_var_syms();
>+	break;
>+      }
>+
>+    strings_section = find_ksymtab_strings_section();
>+
>+    if (!linux_exported_vars_set
>+	|| !linux_exported_fns_set
>+	|| !section
>+	|| !strings_section
>+	|| !nb_entries)
>+      return false;
>+
>+    GElf_Ehdr eh_mem;
>+    GElf_Ehdr* elf_header = gelf_getehdr(elf_handle(), &eh_mem);
>+    ksymtab_format format = get_ksymtab_format();
>+
>+    // Although pre-v4.19 kernel modules can have a relocation section for the
>+    // __ksymtab section, libdwfl zeroes the rela section after applying
>+    // "simple" absolute relocations via dwfl_module_getelf(). For v4.19 and
>+    // above, we get PC-relative relocations so dwfl_module_getelf() doesn't
>+    // apply those relocations and we're safe to read the relocation section to
>+    // determine which exported symbols are in the ksymtab.
>+    //
>+    // Note: For 32-bit binaries, only populate_symbol_map_from_ksymtab() is
>+    // supported as of now.
>+    if (!reloc_section
>+	|| format == PRE_V4_19_KSYMTAB_FORMAT
>+	|| elf_header->e_ident[EI_CLASS] == ELFCLASS32)
>+      return populate_symbol_map_from_ksymtab(section, linux_exported_fns_set,
>+                                              linux_exported_vars_set, nb_entries);
>+    else
>+      return populate_symbol_map_from_ksymtab_reloc(reloc_section, strings_section,
>+                                                    linux_exported_fns_set,
>+                                                    linux_exported_vars_set);
>+  }
>+
>   /// Load the special __ksymtab section. This is for linux kernel
>   /// (module) files.
>   ///
>@@ -8352,6 +8626,17 @@ public:
>   is_linux_kernel_binary() const
>   {return find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);}
>
>+  /// Guess if the current binary is a Linux Kernel module.
>+  ///
>+  /// To guess that, the function looks for the presence of the
>+  /// special ".modinfo" section in the binary.
>+  ///
>+  bool
>+  is_linux_kernel_module() const
>+  {
>+    return find_section(elf_handle(), ".modinfo", SHT_PROGBITS);
>+  }
>+

Also refer to this proposal to restrict checking for a module a bit
more: https://sourceware.org/ml/libabigail/2019-q3/msg00022.html

>   /// Getter of the "show_stats" flag.
>   ///
>   /// This flag tells if we should emit statistics about various
>@@ -15734,6 +16019,8 @@ create_default_var_sym(const string& sym_name, const environment *env)
>     elf_symbol::create(env,
> 		       /*symbol index=*/ 0,
> 		       /*symbol size=*/ 0,
>+		       /*symbol value=*/ 0,
>+		       /*symbol shndx=*/ 0,
> 		       sym_name,
> 		       /*symbol type=*/ elf_symbol::OBJECT_TYPE,
> 		       /*symbol binding=*/ elf_symbol::GLOBAL_BINDING,
>@@ -16173,6 +16460,8 @@ create_default_fn_sym(const string& sym_name, const environment *env)
>     elf_symbol::create(env,
> 		       /*symbol index=*/ 0,
> 		       /*symbol size=*/ 0,
>+		       /*symbol value=*/ 0,
>+		       /*symbol shndx=*/ 0,
> 		       sym_name,
> 		       /*symbol type=*/ elf_symbol::FUNC_TYPE,
> 		       /*symbol binding=*/ elf_symbol::GLOBAL_BINDING,
>diff --git a/src/abg-ir.cc b/src/abg-ir.cc
>index a5243c3..ad1c261 100644
>--- a/src/abg-ir.cc
>+++ b/src/abg-ir.cc
>@@ -1169,6 +1169,8 @@ struct elf_symbol::priv
>   const environment*	env_;
>   size_t		index_;
>   size_t		size_;
>+  uint64_t		value_;
>+  uint32_t		shndx_;
>   string		name_;
>   elf_symbol::type	type_;
>   elf_symbol::binding	binding_;
>@@ -1213,6 +1215,8 @@ struct elf_symbol::priv
>     : env_(),
>       index_(),
>       size_(),
>+      value_(),
>+      shndx_(),
>       type_(elf_symbol::NOTYPE_TYPE),
>       binding_(elf_symbol::GLOBAL_BINDING),
>       visibility_(elf_symbol::DEFAULT_VISIBILITY),
>@@ -1223,6 +1227,8 @@ struct elf_symbol::priv
>   priv(const environment*		e,
>        size_t				i,
>        size_t				s,
>+       uint64_t				v,
>+       uint32_t				x,
>        const string&			n,
>        elf_symbol::type		t,
>        elf_symbol::binding		b,
>@@ -1233,6 +1239,8 @@ struct elf_symbol::priv
>     : env_(e),
>       index_(i),
>       size_(s),
>+      value_(v),
>+      shndx_(x),
>       name_(n),
>       type_(t),
>       binding_(b),
>@@ -1269,6 +1277,10 @@ elf_symbol::elf_symbol()
> ///
> /// @param s the size of the symbol.
> ///
>+/// @param v the value of the symbol.
>+///
>+/// @param x the section header index (shndx) of the symbol.
>+///
> /// @param n the name of the symbol.
> ///
> /// @param t the type of the symbol.
>@@ -1285,6 +1297,8 @@ elf_symbol::elf_symbol()
> elf_symbol::elf_symbol(const environment*	e,
> 		       size_t			i,
> 		       size_t			s,
>+		       uint64_t			v,
>+		       uint32_t			x,
> 		       const string&		n,
> 		       type			t,
> 		       binding			b,
>@@ -1292,7 +1306,7 @@ elf_symbol::elf_symbol(const environment*	e,
> 		       bool			c,
> 		       const version&		ve,
> 		       visibility		vi)
>-  : priv_(new priv(e, i, s, n, t, b, d, c, ve, vi))
>+  : priv_(new priv(e, i, s, v, x, n, t, b, d, c, ve, vi))
> {}
>
> /// Factory of instances of @ref elf_symbol.
>@@ -1319,6 +1333,10 @@ elf_symbol::create()
> ///
> /// @param s the size of the symbol.
> ///
>+/// @param v the value of the symbol.
>+///
>+/// @param x the section header index (shndx) of the symbol.
>+///
> /// @param n the name of the symbol.
> ///
> /// @param t the type of the symbol.
>@@ -1339,6 +1357,8 @@ elf_symbol_sptr
> elf_symbol::create(const environment*	e,
> 		   size_t		i,
> 		   size_t		s,
>+		   uint64_t		v,
>+		   uint32_t		x,
> 		   const string&	n,
> 		   type		t,
> 		   binding		b,
>@@ -1347,7 +1367,7 @@ elf_symbol::create(const environment*	e,
> 		   const version&	ve,
> 		   visibility		vi)
> {
>-  elf_symbol_sptr sym(new elf_symbol(e, i, s, n, t, b, d, c, ve, vi));
>+  elf_symbol_sptr sym(new elf_symbol(e, i, s, v, x, n, t, b, d, c, ve, vi));
>   sym->priv_->main_symbol_ = sym;
>   return sym;
> }
>@@ -1411,6 +1431,20 @@ void
> elf_symbol::set_index(size_t s)
> {priv_->index_ = s;}
>
>+/// Getter for the symbol value.
>+///
>+/// @return the value of the symbol.
>+uint64_t
>+elf_symbol::get_value() const
>+{return priv_->value_;}
>+
>+/// Getter for the symbol section header index.
>+///
>+/// @return the section header index of the symbol.
>+uint32_t
>+elf_symbol::get_shndx() const
>+{return priv_->shndx_;}
>+
> /// Getter for the name of the @ref elf_symbol.
> ///
> /// @return a reference to the name of the @ref symbol.
>diff --git a/src/abg-reader.cc b/src/abg-reader.cc
>index 810a65f..3c481ad 100644
>--- a/src/abg-reader.cc
>+++ b/src/abg-reader.cc
>@@ -2659,6 +2659,8 @@ build_elf_symbol(read_context& ctxt, const xmlNodePtr node)
>
>   const environment* env = ctxt.get_environment();
>   elf_symbol_sptr e = elf_symbol::create(env, /*index=*/0, size,
>+					 /*value=*/0,
>+					 /*shndx=*/0,
> 					 name, type, binding,
> 					 is_defined, is_common,
> 					 version, visibility);
>-- 
>2.16.4
>

Nice work! Thank you for that!
The patch looks in general good to me. I am a bit concerned about the
platform support. There are some corner cases that need to be covered.
See the testing below.

You might find the .clang-format file useful to match the coding style
at some points (especially tabs vs spaces). Although when reviewing the
patch, I had a bit trouble with unchanged but reformatted lines as they
were confusing my diff. Maybe you can revert the changes that were
purely for formatting and we take care of formatting in a later patch.

I conducted some testing. I choose a minimal configuration with modules:

  $ export ARCH=<arch>
  $ make allnoconfig
  $ ./scripts/config   \
      -e DEBUG_INFO    \  # dwarf info
      -e MODULES       \  # module support
      -e USB_SUPPORT   \
      -m USB           \
      -m USB_DUMMY_HCD \  # dummy_hcd.ko is a module without exports
      -m USB_GADGET       # depends on usb-common.ko, which I test with

  $ make olddefconfig
  $ make
  $ abidw ./drivers/usb/common/usb-common.ko | less
  # then check for the existence of the elf-function-symbols section

                     master         patched
Linux v4.18

  x86_64             works            works
  x86                works            works
  arm64              works            works

Linux v4.19

  x86_64             broken           works
  x86                broken           STILL BROKEN
  arm64              broken           works

So, for x86_64 and arm64 the situation got better and elf symbols are
now identified. For x86, this patch did not improve the situation. I did
not check that the list of symbols are entirely correct and that they
work downstream with abidiff etc. Just the pure existence is a very good
indicator for this patch to work.

As I said earlier, I am a bit concerned about the platform support. We
should probably at emit a warning if we know that we can't handle a
certain platform or (preferably) avoid the platform specific code. Not
sure whether there is a better way to extract the information though.

Cheers,
Matthias

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH] Support pre and post v4.19 ksymtabs for Linux kernel modules
  2019-01-01  0:00   ` Jessica Yu
@ 2019-01-01  0:00     ` Matthias Maennich via libabigail
  0 siblings, 0 replies; 12+ messages in thread
From: Matthias Maennich via libabigail @ 2019-01-01  0:00 UTC (permalink / raw)
  To: Jessica Yu; +Cc: libabigail, Dodji Seketeli, kernel-team

On Mon, Sep 02, 2019 at 06:26:44PM +0200, Jessica Yu wrote:
>+++ Matthias Maennich [19/08/19 09:05 +0100]:
>>Hi Jessica!
>>
>>On Fri, 09 Aug, 17:49, Jessica Yu wrote:
>>>As described in commit ad8c2531fb9, the format of the Linux kernel
>>>ksymtab changed in v4.19 to use relative references instead of absolute
>>>references. This changes the type of relocations emitted for ksymtab
>>>sections to be place-relative 32-bit relocations instead of absolute
>>>relocations. One side-effect of this is that libdwfl will not relocate
>>>the ksymtab sections due to the PC-relative relocations. This breaks
>>>load_kernel_symbol_table() for kernel modules because it only reads in
>>>zeros from the unrelocated ksymtab section and is subsequently unable to
>>>determine what exported symbols it refers to. Since a vmlinux binary is
>>>already fully linked and relocated (and therefore we can read its
>>>ksymtab section just fine), this problem is only relevant to Linux
>>>kernel modules.
>>>
>>>To work around this, we utilize the ksymtab relocation sections to
>>>determine which symbols the ksymtab entries refer to. We do this by
>>>inspecting each relocation's r_info field for the symbol table index and
>>>from there we are able to read each symbol's value and subsequently add
>>>that to the set of exported symbols.
>>>
>>>In addition, for Linux kernel modules, we can utilize relocation types
>>>to implement a new heuristic to determine the ksymtab format we have.
>>>The presence of PC-relative relocations suggest the new v4.19 format,
>>>and absolute relocation types suggest the old pre v4.19 format.
>>>
>>>       * include/abg-ir.h (elf_symbol::{elf_symbol, create}): Take new
>>>       symbol value and shndx parameters.
>>>       (elf_symbol::{get_value, get_shndx}): Declare new accessors.
>>>       * src/abg-ir.cc (elf_symbol::priv::{value_, shndx_}): New data members.
>>>       (elf_symbol::priv::priv): Adjust.
>>>       (elf_symbol::elf_symbol): Take new value and shndx parameters.
>>>       (elf_symbol::create): Likewise.
>>>       (elf_symbol::{get_value, get_shndx}): Define new accessors.
>>>       * src/abg-reader.cc (build_elf_symbol): Adjust.
>>>       * src/abg-dwarf-reader.cc (lookup_symbol_from_sysv_hash_tab)
>>>       (lookup_symbol_from_gnu_hash_tab)
>>>       (lookup_symbol_from_symtab): Adjust.
>>>       (read_context::{ksymtab_reloc_section_,
>>>       ksymtab_gpl_reloc_section_,
>>>       ksymtab_strings_section_}): New data members.
>>>       (read_context::read_context): Initialize ksymtab_reloc_section_,
>>>       ksymtab_gpl_reloc_section_, ksymtab_strings_section_.
>>>       (read_context::{find_ksymtab_reloc_section,
>>>       find_ksymtab_gpl_reloc_section, find_ksymtab_strings_section,
>>>       find_any_ksymtab_reloc_section, get_ksymtab_format_module,
>>>       populate_symbol_map_from_ksymtab,
>>>       populate_symbol_map_from_ksymtab_reloc,
>>>       is_linux_kernel_module}): New member functions.
>>>       (read_context::load_kernel_symbol_table): Adjust to call either
>>>       populate_symbol_map_from_ksymtab{_reloc,} depending on ksymtab
>>>       format.
>>>       (read_context::get_ksymtab_format): Adjust to call
>>>       get_ksymtab_format_module for linux kernel modules.
>>>       (read_context::lookup_elf_symbol_from_index): Adjust.
>>>       (create_default_var_sym, create_default_fn_sym): Adjust.
>>>
>>>Signed-off-by: Jessica Yu <jeyu@kernel.org>
>>>---
>>>include/abg-ir.h        |  10 ++
>>>src/abg-dwarf-reader.cc | 391 +++++++++++++++++++++++++++++++++++++++++-------
>>>src/abg-ir.cc           |  38 ++++-
>>>src/abg-reader.cc       |   2 +
>>>4 files changed, 388 insertions(+), 53 deletions(-)
>>>
>>>diff --git a/include/abg-ir.h b/include/abg-ir.h
>>>index 170a844..d26f472 100644
>>>--- a/include/abg-ir.h
>>>+++ b/include/abg-ir.h
>>>@@ -796,6 +796,8 @@ private:
>>> elf_symbol(const environment* e,
>>>	     size_t		i,
>>>	     size_t		s,
>>>+	     uint64_t		v,
>>>+	     uint32_t		x,
>>
>>I would like to see some comments here to describe the purpose of the
>>struct's members. I know there weren't any before, but maybe that is a
>>good chance.
>>
>>Also, I am not entirely sure why these are all one-character names, but
>>it makes it hard to come from x to "section header index". Maybe we
>>should give them more meaningful names.
>
>Hi Matthias!
>
>The parameters are documented in src/abg-ir.cc, which is why I did not
>include it here. Maybe for section header index and value we can use
>shndx and val if x and v are not descriptive.

Hi Jessica!

Ah, right. Their documentation is there. Well, maybe we can improve the
readability there with a later patch.

>>
>>>	     const string&	n,
>>>	     type		t,
>>>	     binding		b,
>>>@@ -818,6 +820,8 @@ public:
>>> create(const environment*	e,
>>>	 size_t		i,
>>>	 size_t		s,
>>>+	 uint64_t	v,
>>>+	 uint32_t	x,
>>>	 const string&		n,
>>>	 type			t,
>>>	 binding		b,
>>>@@ -838,6 +842,12 @@ public:
>>> void
>>> set_index(size_t);
>>>
>>>+  uint64_t
>>>+  get_value() const;
>>>+
>>>+  uint32_t
>>>+  get_shndx() const;
>>>+
>>> const string&
>>> get_name() const;
>>>
>>>diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
>>>index 47af7e3..835bf3d 100644
>>>--- a/src/abg-dwarf-reader.cc
>>>+++ b/src/abg-dwarf-reader.cc
>>>@@ -1724,6 +1724,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>> GElf_Sym symbol;
>>> const char* sym_name_str;
>>> size_t sym_size;
>>>+  uint64_t sym_value;
>>>+  uint32_t sym_shndx;
>>> elf_symbol::type sym_type;
>>> elf_symbol::binding sym_binding;
>>> elf_symbol::visibility sym_visibility;
>>>@@ -1743,6 +1745,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>	  sym_visibility =
>>>	    stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(symbol.st_other));
>>>	  sym_size = symbol.st_size;
>>>+	  sym_value = symbol.st_value;
>>>+	  sym_shndx = symbol.st_shndx;
>>>	  elf_symbol::version ver;
>>>	  if (get_version_for_symbol(elf_handle, symbol_index,
>>>				     /*get_def_version=*/true, ver))
>>>@@ -1751,6 +1755,8 @@ lookup_symbol_from_sysv_hash_tab(const environment*		env,
>>>	    elf_symbol::create(env,
>>>			       symbol_index,
>>>			       sym_size,
>>>+			       sym_value,
>>>+			       sym_shndx,
>>>			       sym_name_str,
>>>			       sym_type,
>>>			       sym_binding,
>>>@@ -2028,7 +2034,8 @@ lookup_symbol_from_gnu_hash_tab(const environment*		env,
>>>	    ABG_ASSERT(!ver.str().empty());
>>>
>>>	  elf_symbol_sptr symbol_found =
>>>-	    elf_symbol::create(env, i, symbol.st_size, sym_name_str,
>>>+	    elf_symbol::create(env, i, symbol.st_size, symbol.st_value,
>>>+			       symbol.st_shndx, sym_name_str,
>>>			       sym_type, sym_binding,
>>>			       symbol.st_shndx != SHN_UNDEF,
>>>			       symbol.st_shndx == SHN_COMMON,
>>>@@ -2181,6 +2188,7 @@ lookup_symbol_from_symtab(const environment*		env,
>>>	    ABG_ASSERT(!ver.str().empty());
>>>	  elf_symbol_sptr symbol_found =
>>>	    elf_symbol::create(env, i, sym->st_size,
>>>+			       sym->st_value, sym->st_shndx,
>>>			       name_str, sym_type,
>>>			       sym_binding, sym_is_defined,
>>>			       sym_is_common, ver, sym_visibility);
>>>@@ -3081,7 +3089,10 @@ public:
>>> /// to symbols exported using the EXPORT_SYMBOL_GPL macro from the
>>> /// linux kernel.
>>> Elf_Scn*			ksymtab_section_;
>>>+  Elf_Scn*			ksymtab_reloc_section_;
>>> Elf_Scn*			ksymtab_gpl_section_;
>>>+  Elf_Scn*			ksymtab_gpl_reloc_section_;
>>>+  Elf_Scn*			ksymtab_strings_section_;
>>> Elf_Scn*			versym_section_;
>>> Elf_Scn*			verdef_section_;
>>> Elf_Scn*			verneed_section_;
>>>@@ -3277,7 +3288,10 @@ public:
>>>   nb_ksymtab_entries_ = 0;
>>>   nb_ksymtab_gpl_entries_ = 0;
>>>   ksymtab_section_ = 0;
>>>+    ksymtab_reloc_section_ = 0;
>>>   ksymtab_gpl_section_ = 0;
>>>+    ksymtab_gpl_reloc_section_ = 0;
>>>+    ksymtab_strings_section_ = 0;
>>>   versym_section_ = 0;
>>>   verdef_section_ = 0;
>>>   verneed_section_ = 0;
>>>@@ -6079,6 +6093,23 @@ public:
>>>   return ksymtab_section_;
>>> }
>>>
>>>+  /// Return the .rel{a,}__ksymtab section of a linux kernel ELF file (either
>>>+  /// a vmlinux binary or a kernel module).
>>>+  ///
>>>+  /// @return the .rel{a,}__ksymtab section if found, nil otherwise.
>>>+  Elf_Scn*
>>>+  find_ksymtab_reloc_section() const
>>>+  {
>>>+    if (!ksymtab_reloc_section_)
>>>+      {
>>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab", SHT_RELA);
>>>+        if (!sec)
>>>+          sec = find_section(elf_handle(), ".rel__ksymtab", SHT_REL);
>>>+        const_cast<read_context*>(this)->ksymtab_reloc_section_ = sec;
>>>+      }
>>>+    return ksymtab_reloc_section_;
>>>+  }
>>>+
>>> /// Return the __ksymtab_gpl section of a linux kernel ELF file
>>> /// (either a vmlinux binary or a kernel module).
>>> ///
>>>@@ -6092,6 +6123,36 @@ public:
>>>   return ksymtab_gpl_section_;
>>> }
>>>
>>>+  /// Return the .rel{a,}__ksymtab_gpl section of a linux kernel ELF file
>>>+  /// (either a vmlinux binary or a kernel module).
>>>+  ///
>>>+  /// @return the .rel{a,}__ksymtab_gpl section if found, nil otherwise.
>>>+  Elf_Scn*
>>>+  find_ksymtab_gpl_reloc_section() const
>>>+  {
>>>+    if (!ksymtab_gpl_reloc_section_)
>>>+      {
>>>+        Elf_Scn *sec = find_section(elf_handle(), ".rela__ksymtab_gpl", SHT_RELA);
>>>+        if (!sec)
>>>+          sec = find_section(elf_handle(), ".rel__ksymtab_gpl", SHT_REL);
>>>+        const_cast<read_context*>(this)->ksymtab_gpl_reloc_section_ = sec;
>>>+      }
>>>+    return ksymtab_gpl_reloc_section_;
>>>+  }
>>>+
>>>+  /// Return the __ksymtab_strings section of a linux kernel ELF file
>>>+  /// (either a vmlinux binary or a kernel module).
>>>+  ///
>>>+  /// @return the __ksymtab_strings section if found, nil otherwise.
>>>+  Elf_Scn*
>>>+  find_ksymtab_strings_section() const
>>>+  {
>>>+    if (!ksymtab_strings_section_)
>>>+      const_cast<read_context*>(this)->ksymtab_strings_section_ =
>>>+        find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);
>>>+    return ksymtab_strings_section_;
>>>+  }
>>>+
>>
>>We should find a way to deduplicate these functions. Not necessarily in
>>this patch, but in a followup. Most of them only differ in the section
>>name and whether they check for an alternative. Something generic like
>>
>>struct SectionQuery{ string name, int type };
>>Elf_Scn* find_section(const vector<SectionQuery>&);
>>
>>should be sufficient to forward from all the other variants to one
>>implementation.
>>
>>> /// Return either a __ksymtab or a __ksymtab_gpl section, in case
>>> /// only the __ksymtab_gpl exists.
>>> ///
>>>@@ -6106,6 +6167,19 @@ public:
>>>   return result;
>>> }
>>>
>>>+  /// Return either a .rel{a,}__ksymtab or a .rel{a,}__ksymtab_gpl section
>>>+  ///
>>>+  /// @return the .rel{a,}__ksymtab section if it exists, or the
>>>+  /// .rel{a,}__ksymtab_gpl; or NULL if neither is found.
>>>+  Elf_Scn*
>>>+  find_any_ksymtab_reloc_section() const
>>>+  {
>>>+    Elf_Scn *result = find_ksymtab_reloc_section();
>>>+    if (!result)
>>>+      result = find_ksymtab_gpl_reloc_section();
>>>+    return result;
>>>+  }
>>>+
>>> /// Return the SHT_GNU_versym, SHT_GNU_verdef and SHT_GNU_verneed
>>> /// sections that are involved in symbol versionning.
>>> ///
>>>@@ -6283,7 +6357,8 @@ public:
>>>     stv_to_elf_symbol_visibility(GELF_ST_VISIBILITY(s->st_other));
>>>
>>>   elf_symbol_sptr sym =
>>>-      elf_symbol::create(env(), symbol_index, s->st_size, name_str,
>>>+      elf_symbol::create(env(), symbol_index, s->st_size,
>>>+			 s->st_value, s->st_shndx, name_str,
>>>			 stt_to_elf_symbol_type(GELF_ST_TYPE(s->st_info)),
>>>			 stb_to_elf_symbol_binding(GELF_ST_BIND(s->st_info)),
>>>			 sym_is_defined, sym_is_common, ver, vis);
>>>@@ -7435,6 +7510,64 @@ public:
>>>   return symbol;
>>> }
>>>
>>>+  /// Determine the format of the __ksymtab and __ksymtab_gpl
>>>+  /// sections of Linux kernel modules.
>>>+  ///
>>>+  /// This is important because we need the know the format of these
>>
>>s/need the/need to/                       ^^^
>>
>>>+  /// sections to be able to read from them.
>>>+  ///
>>>+  /// @return the format the __ksymtab[_gpl] sections.
>>>+  enum ksymtab_format
>>>+  get_ksymtab_format_module() const
>>>+  {
>>>+    Elf_Scn *section = find_any_ksymtab_reloc_section();
>>>+
>>>+    if (!section)
>>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>>+
>>>+    // Libdwfl has a weird quirk where, in the process of obtaining an Elf
>>>+    // descriptor via dwfl_module_getelf(), it will apply all relocations it
>>>+    // knows how to and it will zero a relocation after applying it. If the
>>
>>will zero the relocation info after
>>
>>>+    // .rela__ksymtab* section contained only simple (absolute) relocations,
>>>+    // they will have been all applied and sh_size will be 0. For arches that
>>>+    // support relative ksymtabs, simple relocations only appear in pre-4.19
>>>+    // kernel modules.
>>>+    GElf_Shdr section_mem;
>>>+    GElf_Shdr *section_shdr = gelf_getshdr(section, &section_mem);
>>>+    if (section_shdr->sh_size == 0)
>>>+      return PRE_V4_19_KSYMTAB_FORMAT;
>>>+    /* SHT_REL sections not supported yet */
>>>+    if (section_shdr->sh_type == SHT_REL)
>>>+      return UNDEFINED_KSYMTAB_FORMAT;
>>>+
>>>+    // If we still have a normal non-zeroed relocation section, we can guess
>>>+    // what format the ksymtab is in depending on what types of relocs it
>>>+    // contains.
>>>+    GElf_Rela rela;
>>>+    Elf_Data *rela_section_data = elf_rawdata(section, 0);
>>>+    gelf_getrela(rela_section_data, 0, &rela);
>>>+
>>>+    // Sigh, I dislike the arch-dependent code here, but this seems to be a
>>>+    // reliable heuristic for kernel modules for now. Relative ksymtabs only
>>>+    // supported on x86 and arm64 as of v4.19.
>>>+    ksymtab_format format;
>>>+    switch (GELF_R_TYPE(rela.r_info))
>>>+      {
>>>+      case R_X86_64_64: // Same as R_386_32
>>
>>add 'fallthrough' to the comments, else younger compilers will warn
>>about the potentially unintentional fallthrough.
>>
>>>+      case R_AARCH64_ABS64:
>>>+        format = PRE_V4_19_KSYMTAB_FORMAT;
>>>+	break;
>>>+      case R_X86_64_PC32: // Same as R_386_PC32
>>>+      case R_AARCH64_PREL32:
>>>+	format = V4_19_KSYMTAB_FORMAT;
>>>+	break;
>>>+      default:
>>>+        format = UNDEFINED_KSYMTAB_FORMAT;
>>
>>We should probably emit a warning in this case. Are there more
>>architectures that have PREL32 enabled as of now that are not yet
>>covered here?
>
>Just double checked here, as of v5.3-rc6, only x86/x86_64 and arm64
>have HAVE_ARCH_PREL32_RELOCATIONS. I can put an ABG_ASSERT there to
>warn us of potentially new ksymtab formats.
>
>>>+	break;
>>>+      }
>>>+    return format;
>>>+  }
>>>+
>>> /// Determine the format of the __ksymtab and __ksymtab_gpl
>>> /// sections.
>>> ///
>>>@@ -7451,6 +7584,18 @@ public:
>>>     {
>>>	if (ksymtab_format_ == UNDEFINED_KSYMTAB_FORMAT)
>>>	  {
>>>+            // Since Linux kernel modules are relocatable, we can first try
>>>+            // using a heuristic based on relocations to guess the ksymtab format.
>>>+            if (is_linux_kernel_module())
>>>+              {
>>>+                ksymtab_format_ = get_ksymtab_format_module();
>>>+                if (ksymtab_format_ != UNDEFINED_KSYMTAB_FORMAT)
>>>+                  return ksymtab_format_;
>>>+              }
>>
>>nit: use tabs
>
>Just wondering, is there an official style guide documented anywhere?
>I know there is .clang-format, but a lot of the files don't seem to
>follow all of it (at least when it comes to tabs and spaces, it seems
>inconsistent so I wasn't sure to follow the existing style of the
>file, which seemed to favor spaces, or to switch to purely tabs).
>Sometimes there is a tab first and then spaces to get to the right
>indentation level, sometimes there's just spaces, etc.
>

As far as I understand, the style is GNU style (I am not sure there is a
doc saying that) and can be matched best with whatever Emacs thinks GNU
style should look like :-) In case of the tabs, it appears to me that
every full 8 spaces are supposed to be replaced by a tab in this code
base.

I myself am not an Emacs user and added the .clang-format some time ago
as an approximation. It is useful to format the changed lines and in
most cases it matches the existing style.

Dodji, maybe we can work on getting the .clang-format closer to the
project style and format all the files in one single commit? What do you
think?

>>>+
>>>+	    // If it's not a kernel module or we couldn't determine its format
>>>+	    // with relocations, fall back to the heuristic below.
>>>+
>>>	    // OK this is a dirty little heuristic to determine the
>>>	    // format of the ksymtab section.
>>>	    //
>>>@@ -7553,48 +7698,24 @@ public:
>>>   return nb_ksymtab_gpl_entries_;
>>> }
>>>
>>>-  /// Load a given kernel symbol table.
>>>+  /// Populate the symbol map by reading exported symbols from the
>>>+  /// ksymtab directly.
>>> ///
>>>-  /// One can thus retrieve the resulting symbols by using the
>>>-  /// accessors read_context::linux_exported_fn_syms(),
>>>-  /// read_context::linux_exported_var_syms(),
>>>-  /// read_context::linux_exported_gpl_fn_syms(), or
>>>-  /// read_context::linux_exported_gpl_var_syms().
>>>+  /// @param section the ksymtab section to read from
>>> ///
>>>-  /// @param kind the kind of kernel symbol table to load.
>>>+  /// @param exported_fns_set the set of exported functions
>>>+  ///
>>>+  /// @param exported_vars_set the set of exported variables
>>>+  ///
>>>+  /// @param nb_entries the number of ksymtab entries to read
>>> ///
>>> /// @return true upon successful completion, false otherwise.
>>> bool
>>>-  load_kernel_symbol_table(kernel_symbol_table_kind kind)
>>>+  populate_symbol_map_from_ksymtab(Elf_Scn *section,
>>>+                                   address_set_sptr exported_fns_set,
>>>+                                   address_set_sptr exported_vars_set,
>>>+                                   size_t nb_entries)
>>> {
>>>-    size_t nb_entries = 0;
>>>-    Elf_Scn *section = 0;
>>>-    address_set_sptr linux_exported_fns_set, linux_exported_vars_set;
>>>-
>>>-    switch (kind)
>>>-      {
>>>-      case KERNEL_SYMBOL_TABLE_KIND_UNDEFINED:
>>>-	break;
>>>-      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB:
>>>-	section = find_ksymtab_section();
>>>-	nb_entries = get_nb_ksymtab_entries();
>>>-	linux_exported_fns_set = create_or_get_linux_exported_fn_syms();
>>>-	linux_exported_vars_set = create_or_get_linux_exported_var_syms();
>>>-	break;
>>>-      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB_GPL:
>>>-	section = find_ksymtab_gpl_section();
>>>-	nb_entries = get_nb_ksymtab_gpl_entries();
>>>-	linux_exported_fns_set = create_or_get_linux_exported_gpl_fn_syms();
>>>-	linux_exported_vars_set = create_or_get_linux_exported_gpl_var_syms();
>>>-	break;
>>>-      }
>>>-
>>>-    if (!linux_exported_vars_set
>>>-	|| !linux_exported_fns_set
>>>-	|| !section
>>>-	|| !nb_entries)
>>>-      return false;
>>>-
>>>   // The data of the section.
>>>   Elf_Data *elf_data = elf_rawdata(section, 0);
>>>
>>>@@ -7654,28 +7775,30 @@ public:
>>>	// (aka ELF symbol table).
>>>	symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>>	if (!symbol)
>>>-	  {
>>>-	    adjusted_symbol_address =
>>>-	      maybe_adjust_var_sym_address(symbol_address);
>>>-	    symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>>-	    if (!symbol)
>>>-	      // This must be a symbol that is of type neither FUNC
>>>-	      // (function) nor OBJECT (variable).  There are for intance,
>>>-	      // symbols of type 'NOTYPE' in the ksymtab symbol table.  I
>>>-	      // am not sure what those are.
>>>-	      continue;
>>>-	  }
>>>+          {
>>>+            adjusted_symbol_address =
>>>+              maybe_adjust_var_sym_address(symbol_address);
>>>+            symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
>>>+            if (!symbol)
>>>+              {
>>>+	        // This must be a symbol that is of type neither FUNC
>>>+	        // (function) nor OBJECT (variable).  There are for intance,
>>>+	        // symbols of type 'NOTYPE' in the ksymtab symbol table.  I
>>>+	        // am not sure what those are.
>>>+	        continue;
>>>+	      }
>>>+          }
>>
>>This part of the diff seems to a change purely of the indentation/tabs.
>>Maybe you can revert this part.
>
>Yes you are right. Reverted!
>
>>>
>>>	address_set_sptr set;
>>>	if (symbol->is_function())
>>>	  {
>>>	    ABG_ASSERT(lookup_elf_fn_symbol_from_address(adjusted_symbol_address));
>>>-	    set = linux_exported_fns_set;
>>>+	    set = exported_fns_set;
>>>	  }
>>>	else if (symbol->is_variable())
>>>	  {
>>>	    ABG_ASSERT(lookup_elf_var_symbol_from_address(adjusted_symbol_address));
>>>-	    set = linux_exported_vars_set;
>>>+	    set = exported_vars_set;
>>>	  }
>>>	else
>>>	  ABG_ASSERT_NOT_REACHED;
>>>@@ -7684,6 +7807,157 @@ public:
>>>   return true;
>>> }
>>>
>>>+  /// Populate the symbol map by extracting the exported symbols from a
>>>+  /// ksymtab rela section.
>>>+  ///
>>>+  /// @param section the ksymtab section to read from
>>>+  ///
>>>+  /// @param strings_section the __ksymtab_strings section
>>>+  ///
>>>+  /// @param exported_fns_set the set of exported functions
>>>+  ///
>>>+  /// @param exported_vars_set the set of exported variables
>>>+  ///
>>>+  /// @return true upon successful completion, false otherwise.
>>>+  bool
>>>+  populate_symbol_map_from_ksymtab_reloc(Elf_Scn *reloc_section, Elf_Scn *strings_section,
>>>+                                         address_set_sptr exported_fns_set,
>>>+                                         address_set_sptr exported_vars_set)
>>>+  {
>>>+    GElf_Shdr reloc_section_mem;
>>>+    GElf_Shdr *reloc_section_shdr = gelf_getshdr(reloc_section, &reloc_section_mem);
>>
>>gelf_getshdr returns &reloc_section_mem. So you might want to use
>>reloc_section_mem directly rather than introducing reloc_section_mem.
>
>Do you mean get rid of the reloc_section_shdr pointer and just use
>&reloc_section_mem? I just followed the convention for gelf_getshdr()
>invocations in other places to be consistent.

Maybe you are right to stay consistent. On the other hand side, it is
quite redundant and we might as well fix it in the other locations
fairly quickly.

>>>+    size_t reloc_count = reloc_section_shdr->sh_size / reloc_section_shdr->sh_entsize;
>>>+
>>>+    Elf_Data *reloc_section_data = elf_rawdata(reloc_section, 0);
>>>+    size_t strings_ndx = elf_ndxscn(strings_section);
>>>+
>>>+    GElf_Rela rela;
>>>+    elf_symbol_sptr symbol;
>>>+    for (unsigned int i = 0; i < reloc_count; i++)
>>>+      {
>>>+        gelf_getrela(reloc_section_data, i, &rela);
>>>+        symbol = lookup_elf_symbol_from_index(GELF_R_SYM(rela.r_info));
>>>+
>>>+        // This shouldn't happen..
>>>+        if (!symbol)
>>
>>Maybe ABG_ASSERT then?
>
>Good idea!
>
>>>+          return false;
>>>+
>>>+        // If the symbol is from __ksymtab_strings, ignore it.
>>>+        if (symbol->get_shndx() == strings_ndx)
>>>+          continue;
>>>+
>>>+	if (!symbol->is_function() && !symbol->is_variable())
>>>+          {
>>>+            if (do_log())
>>>+              {
>>>+                if (symbol->get_type() == elf_symbol::NOTYPE_TYPE)
>>>+                  cerr << "skipping NOTYPE symbol "
>>>+                       << symbol->get_name()
>>>+                       << " shndx: "
>>>+                       << symbol->get_index()
>>>+                       << " @"
>>>+                       << elf_path()
>>>+                       << "\n";
>>>+                else if (symbol->get_type() == elf_symbol::SECTION_TYPE)
>>>+                  cerr << "skipping SECTION symbol "
>>>+                       << "shndx: "
>>>+                       << symbol->get_index()
>>>+                       << " @"
>>>+                       << elf_path()
>>>+                       << "\n";
>>>+               }
>>>+            continue;
>>>+	  }
>>>+
>>>+	address_set_sptr set;
>>>+	if (symbol->is_function())
>>>+	  {
>>>+	    ABG_ASSERT(lookup_elf_fn_symbol_from_address(symbol->get_value()));
>>>+	    set = exported_fns_set;
>>>+	  }
>>>+	else if (symbol->is_variable())
>>>+	  {
>>>+	    ABG_ASSERT(lookup_elf_var_symbol_from_address(symbol->get_value()));
>>>+	    set = exported_vars_set;
>>>+	  }
>>>+	else
>>>+	  ABG_ASSERT_NOT_REACHED;
>>>+	set->insert(symbol->get_value());
>>>+      }
>>>+    return true;
>>>+  }
>>>+
>>>+  /// Load a given kernel symbol table.
>>>+  ///
>>>+  /// One can thus retrieve the resulting symbols by using the
>>>+  /// accessors read_context::linux_exported_fn_syms(),
>>>+  /// read_context::linux_exported_var_syms(),
>>>+  /// read_context::linux_exported_gpl_fn_syms(), or
>>>+  /// read_context::linux_exported_gpl_var_syms().
>>>+  ///
>>>+  /// @param kind the kind of kernel symbol table to load.
>>>+  ///
>>>+  /// @return true upon successful completion, false otherwise.
>>>+  bool
>>>+  load_kernel_symbol_table(kernel_symbol_table_kind kind)
>>>+  {
>>>+    size_t nb_entries = 0;
>>>+    Elf_Scn *section = 0, *reloc_section = 0, *strings_section = 0;
>>>+    address_set_sptr linux_exported_fns_set, linux_exported_vars_set;
>>>+
>>>+    switch (kind)
>>>+      {
>>>+      case KERNEL_SYMBOL_TABLE_KIND_UNDEFINED:
>>>+	break;
>>>+      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB:
>>>+	section = find_ksymtab_section();
>>>+	reloc_section = find_ksymtab_reloc_section();
>>>+	nb_entries = get_nb_ksymtab_entries();
>>>+	linux_exported_fns_set = create_or_get_linux_exported_fn_syms();
>>>+	linux_exported_vars_set = create_or_get_linux_exported_var_syms();
>>>+	break;
>>>+      case KERNEL_SYMBOL_TABLE_KIND_KSYMTAB_GPL:
>>>+	section = find_ksymtab_gpl_section();
>>>+	reloc_section = find_ksymtab_gpl_reloc_section();
>>>+	nb_entries = get_nb_ksymtab_gpl_entries();
>>>+	linux_exported_fns_set = create_or_get_linux_exported_gpl_fn_syms();
>>>+	linux_exported_vars_set = create_or_get_linux_exported_gpl_var_syms();
>>>+	break;
>>>+      }
>>>+
>>>+    strings_section = find_ksymtab_strings_section();
>>>+
>>>+    if (!linux_exported_vars_set
>>>+	|| !linux_exported_fns_set
>>>+	|| !section
>>>+	|| !strings_section
>>>+	|| !nb_entries)
>>>+      return false;
>>>+
>>>+    GElf_Ehdr eh_mem;
>>>+    GElf_Ehdr* elf_header = gelf_getehdr(elf_handle(), &eh_mem);
>>>+    ksymtab_format format = get_ksymtab_format();
>>>+
>>>+    // Although pre-v4.19 kernel modules can have a relocation section for the
>>>+    // __ksymtab section, libdwfl zeroes the rela section after applying
>>>+    // "simple" absolute relocations via dwfl_module_getelf(). For v4.19 and
>>>+    // above, we get PC-relative relocations so dwfl_module_getelf() doesn't
>>>+    // apply those relocations and we're safe to read the relocation section to
>>>+    // determine which exported symbols are in the ksymtab.
>>>+    //
>>>+    // Note: For 32-bit binaries, only populate_symbol_map_from_ksymtab() is
>>>+    // supported as of now.
>>>+    if (!reloc_section
>>>+	|| format == PRE_V4_19_KSYMTAB_FORMAT
>>>+	|| elf_header->e_ident[EI_CLASS] == ELFCLASS32)
>>>+      return populate_symbol_map_from_ksymtab(section, linux_exported_fns_set,
>>>+                                              linux_exported_vars_set, nb_entries);
>>>+    else
>>>+      return populate_symbol_map_from_ksymtab_reloc(reloc_section, strings_section,
>>>+                                                    linux_exported_fns_set,
>>>+                                                    linux_exported_vars_set);
>>>+  }
>>>+
>>> /// Load the special __ksymtab section. This is for linux kernel
>>> /// (module) files.
>>> ///
>>>@@ -8352,6 +8626,17 @@ public:
>>> is_linux_kernel_binary() const
>>> {return find_section(elf_handle(), "__ksymtab_strings", SHT_PROGBITS);}
>>>
>>>+  /// Guess if the current binary is a Linux Kernel module.
>>>+  ///
>>>+  /// To guess that, the function looks for the presence of the
>>>+  /// special ".modinfo" section in the binary.
>>>+  ///
>>>+  bool
>>>+  is_linux_kernel_module() const
>>>+  {
>>>+    return find_section(elf_handle(), ".modinfo", SHT_PROGBITS);
>>>+  }
>>>+
>>
>>Also refer to this proposal to restrict checking for a module a bit
>>more: https://sourceware.org/ml/libabigail/2019-q3/msg00022.html
>>
>>> /// Getter of the "show_stats" flag.
>>> ///
>>> /// This flag tells if we should emit statistics about various
>>>@@ -15734,6 +16019,8 @@ create_default_var_sym(const string& sym_name, const environment *env)
>>>   elf_symbol::create(env,
>>>		       /*symbol index=*/ 0,
>>>		       /*symbol size=*/ 0,
>>>+		       /*symbol value=*/ 0,
>>>+		       /*symbol shndx=*/ 0,
>>>		       sym_name,
>>>		       /*symbol type=*/ elf_symbol::OBJECT_TYPE,
>>>		       /*symbol binding=*/ elf_symbol::GLOBAL_BINDING,
>>>@@ -16173,6 +16460,8 @@ create_default_fn_sym(const string& sym_name, const environment *env)
>>>   elf_symbol::create(env,
>>>		       /*symbol index=*/ 0,
>>>		       /*symbol size=*/ 0,
>>>+		       /*symbol value=*/ 0,
>>>+		       /*symbol shndx=*/ 0,
>>>		       sym_name,
>>>		       /*symbol type=*/ elf_symbol::FUNC_TYPE,
>>>		       /*symbol binding=*/ elf_symbol::GLOBAL_BINDING,
>>>diff --git a/src/abg-ir.cc b/src/abg-ir.cc
>>>index a5243c3..ad1c261 100644
>>>--- a/src/abg-ir.cc
>>>+++ b/src/abg-ir.cc
>>>@@ -1169,6 +1169,8 @@ struct elf_symbol::priv
>>> const environment*	env_;
>>> size_t		index_;
>>> size_t		size_;
>>>+  uint64_t		value_;
>>>+  uint32_t		shndx_;
>>> string		name_;
>>> elf_symbol::type	type_;
>>> elf_symbol::binding	binding_;
>>>@@ -1213,6 +1215,8 @@ struct elf_symbol::priv
>>>   : env_(),
>>>     index_(),
>>>     size_(),
>>>+      value_(),
>>>+      shndx_(),
>>>     type_(elf_symbol::NOTYPE_TYPE),
>>>     binding_(elf_symbol::GLOBAL_BINDING),
>>>     visibility_(elf_symbol::DEFAULT_VISIBILITY),
>>>@@ -1223,6 +1227,8 @@ struct elf_symbol::priv
>>> priv(const environment*		e,
>>>      size_t				i,
>>>      size_t				s,
>>>+       uint64_t				v,
>>>+       uint32_t				x,
>>>      const string&			n,
>>>      elf_symbol::type		t,
>>>      elf_symbol::binding		b,
>>>@@ -1233,6 +1239,8 @@ struct elf_symbol::priv
>>>   : env_(e),
>>>     index_(i),
>>>     size_(s),
>>>+      value_(v),
>>>+      shndx_(x),
>>>     name_(n),
>>>     type_(t),
>>>     binding_(b),
>>>@@ -1269,6 +1277,10 @@ elf_symbol::elf_symbol()
>>>///
>>>/// @param s the size of the symbol.
>>>///
>>>+/// @param v the value of the symbol.
>>>+///
>>>+/// @param x the section header index (shndx) of the symbol.
>>>+///
>>>/// @param n the name of the symbol.
>>>///
>>>/// @param t the type of the symbol.
>>>@@ -1285,6 +1297,8 @@ elf_symbol::elf_symbol()
>>>elf_symbol::elf_symbol(const environment*	e,
>>>		       size_t			i,
>>>		       size_t			s,
>>>+		       uint64_t			v,
>>>+		       uint32_t			x,
>>>		       const string&		n,
>>>		       type			t,
>>>		       binding			b,
>>>@@ -1292,7 +1306,7 @@ elf_symbol::elf_symbol(const environment*	e,
>>>		       bool			c,
>>>		       const version&		ve,
>>>		       visibility		vi)
>>>-  : priv_(new priv(e, i, s, n, t, b, d, c, ve, vi))
>>>+  : priv_(new priv(e, i, s, v, x, n, t, b, d, c, ve, vi))
>>>{}
>>>
>>>/// Factory of instances of @ref elf_symbol.
>>>@@ -1319,6 +1333,10 @@ elf_symbol::create()
>>>///
>>>/// @param s the size of the symbol.
>>>///
>>>+/// @param v the value of the symbol.
>>>+///
>>>+/// @param x the section header index (shndx) of the symbol.
>>>+///
>>>/// @param n the name of the symbol.
>>>///
>>>/// @param t the type of the symbol.
>>>@@ -1339,6 +1357,8 @@ elf_symbol_sptr
>>>elf_symbol::create(const environment*	e,
>>>		   size_t		i,
>>>		   size_t		s,
>>>+		   uint64_t		v,
>>>+		   uint32_t		x,
>>>		   const string&	n,
>>>		   type		t,
>>>		   binding		b,
>>>@@ -1347,7 +1367,7 @@ elf_symbol::create(const environment*	e,
>>>		   const version&	ve,
>>>		   visibility		vi)
>>>{
>>>-  elf_symbol_sptr sym(new elf_symbol(e, i, s, n, t, b, d, c, ve, vi));
>>>+  elf_symbol_sptr sym(new elf_symbol(e, i, s, v, x, n, t, b, d, c, ve, vi));
>>> sym->priv_->main_symbol_ = sym;
>>> return sym;
>>>}
>>>@@ -1411,6 +1431,20 @@ void
>>>elf_symbol::set_index(size_t s)
>>>{priv_->index_ = s;}
>>>
>>>+/// Getter for the symbol value.
>>>+///
>>>+/// @return the value of the symbol.
>>>+uint64_t
>>>+elf_symbol::get_value() const
>>>+{return priv_->value_;}
>>>+
>>>+/// Getter for the symbol section header index.
>>>+///
>>>+/// @return the section header index of the symbol.
>>>+uint32_t
>>>+elf_symbol::get_shndx() const
>>>+{return priv_->shndx_;}
>>>+
>>>/// Getter for the name of the @ref elf_symbol.
>>>///
>>>/// @return a reference to the name of the @ref symbol.
>>>diff --git a/src/abg-reader.cc b/src/abg-reader.cc
>>>index 810a65f..3c481ad 100644
>>>--- a/src/abg-reader.cc
>>>+++ b/src/abg-reader.cc
>>>@@ -2659,6 +2659,8 @@ build_elf_symbol(read_context& ctxt, const xmlNodePtr node)
>>>
>>> const environment* env = ctxt.get_environment();
>>> elf_symbol_sptr e = elf_symbol::create(env, /*index=*/0, size,
>>>+					 /*value=*/0,
>>>+					 /*shndx=*/0,
>>>					 name, type, binding,
>>>					 is_defined, is_common,
>>>					 version, visibility);
>>>-- 
>>>2.16.4
>>>
>>
>>Nice work! Thank you for that!
>>The patch looks in general good to me. I am a bit concerned about the
>>platform support. There are some corner cases that need to be covered.
>>See the testing below.
>>
>>You might find the .clang-format file useful to match the coding style
>>at some points (especially tabs vs spaces). Although when reviewing the
>>patch, I had a bit trouble with unchanged but reformatted lines as they
>>were confusing my diff. Maybe you can revert the changes that were
>>purely for formatting and we take care of formatting in a later patch.
>>
>>I conducted some testing. I choose a minimal configuration with modules:
>>
>>$ export ARCH=<arch>
>>$ make allnoconfig
>>$ ./scripts/config   \
>>    -e DEBUG_INFO    \  # dwarf info
>>    -e MODULES       \  # module support
>>    -e USB_SUPPORT   \
>>    -m USB           \
>>    -m USB_DUMMY_HCD \  # dummy_hcd.ko is a module without exports
>>    -m USB_GADGET       # depends on usb-common.ko, which I test with
>>
>>$ make olddefconfig
>>$ make
>>$ abidw ./drivers/usb/common/usb-common.ko | less
>># then check for the existence of the elf-function-symbols section
>>
>>                   master         patched
>>Linux v4.18
>>
>>x86_64             works            works
>>x86                works            works
>>arm64              works            works
>>
>>Linux v4.19
>>
>>x86_64             broken           works
>>x86                broken           STILL BROKEN
>>arm64              broken           works
>>
>>So, for x86_64 and arm64 the situation got better and elf symbols are
>>now identified. For x86, this patch did not improve the situation. I did
>>not check that the list of symbols are entirely correct and that they
>>work downstream with abidiff etc. Just the pure existence is a very good
>>indicator for this patch to work.
>>
>>As I said earlier, I am a bit concerned about the platform support. We
>>should probably at emit a warning if we know that we can't handle a
>>certain platform or (preferably) avoid the platform specific code. Not
>>sure whether there is a better way to extract the information though.
>
>Yeah, I did not like that either too much. It's difficult to determine
>the ksymtab format of modules though, especially since for the newer
>format ksymtab sections they are not relocated (and appear as just
>zeroes), so we cannot tell which symbols are in it and what ksymtab
>format they are using :-(. If we think of a platform-agnostic way of
>doing it, I'm for it (perhaps a different patch). I'll definitely add
>a warning if we come across a platform that we don't support yet.

I would be ok with the platform specific code as long as we make sure
warnings are emitted in case there is something unhandled. As you
suggested.

Do you think it makes sense to have kind of a versioning in the Kernel
and its Modules to somehow announce the format? Haven't thought this
through entirely, but something like a magic ksymtab entry (the first
one?) or modinfo, that tells about the ksymtab format? I know we can't
do this for already released versions, but maybe for future kernel
versions?

Thanks!

Cheers,
Matthias

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-09-12 13:44 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-01  0:00 [RFC PATCH] Support pre and post v4.19 ksymtabs for Linux kernel modules Jessica Yu
2019-01-01  0:00 ` Matthias Maennich via libabigail
2019-01-01  0:00   ` Jessica Yu
2019-01-01  0:00     ` Matthias Maennich via libabigail
2019-01-01  0:00   ` Jessica Yu
2019-01-01  0:00     ` Matthias Maennich via libabigail
2019-01-01  0:00       ` Jessica Yu
2019-01-01  0:00         ` Matthias Maennich via libabigail
2019-01-01  0:00   ` Jessica Yu
2019-01-01  0:00     ` Matthias Maennich via libabigail
2019-01-01  0:00       ` Jessica Yu
2019-01-01  0:00 ` Jessica Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).