public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
* [RFC PATCH] dwarf-reader: support ksymtab symbol lookup by name
@ 2019-01-01  0:00 Matthias Maennich via libabigail
  2020-01-01  0:00 ` Dodji Seketeli
  0 siblings, 1 reply; 4+ messages in thread
From: Matthias Maennich via libabigail @ 2019-01-01  0:00 UTC (permalink / raw)
  To: libabigail; +Cc: dodji, kernel-team, maennich

The ksymtab address entry might not be actually pointing to a GLOBAL
visible symbol in .symtab of the same binary. E.g. that happens when
compiling the kernel with clang's -fsanitize-cfi* and friends. In
contrast, the ksymtab.name entry has to match an entry in .symtab. It
would otherwise upset linkers very much. So, rather than relying on the
address being resolvable in any way, we might as well look up the symbol
name in the .symtab directly as this is the symbol we later want to
analyze anyway and ksymtab.name has to match a visible symbol. In this
patch we fall back to a name lookup if anything we tried so far was not
successful. That makes the patch a bit more reviewable.

On the other hand, we should be able to entirely rely on the name based
lookup and drop the address based one. That would require some
refactoring that I did not start at this point.

I created some local test cases to validate my work. As of now,
tests/test-read-dwarf.cc has no explicit support for the
linux_kernel_mode, hence is not fully capable of asserting this
implementation.

Note, there is one additional case missing: If we have CFI binaries with
position relative relocations, we also need to touch how
populate_symbol_map_from_ksymtab_reloc works to do a similar method
there.

Long story short: if we decide to go this approach, I will refactor the
code and add tests accordingly and fill the gap for the missing cases.

Yes, this patch is a bit hacky at this point, but is supposed ot proof
that name based lookup can work here.

	* src/abg-dwarf-reader.cc (populate_symbol_map_from_ksymtab):
	add support for symtab lookup by ksymtab.name entries.
	(try_reading_first_ksymtab_entry): Likewise.

Signed-off-by: Matthias Maennich <maennich@google.com>
---
 src/abg-dwarf-reader.cc | 135 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 135 insertions(+)

diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
index 85b8e2461721..d1a220643945 100644
--- a/src/abg-dwarf-reader.cc
+++ b/src/abg-dwarf-reader.cc
@@ -7681,6 +7681,66 @@ public:
 
     symbol_address = maybe_adjust_fn_sym_address(symbol_address);
     symbol = lookup_elf_symbol_from_address(symbol_address);
+
+    // the ksymtab.addr might not be a match in .symtab.addr (e.g. if CFI is
+    // enabled. We can still lookup the symbol by name in the symtab.
+    if (!symbol)
+      {
+	bytes += symbol_value_size;  // advance now to the ksymtab.name entry
+	if (position_relative_relocations)
+	  {
+	    int32_t offset = 0;
+	    ABG_ASSERT(read_int_from_array_of_bytes(bytes, symbol_value_size,
+						    is_big_endian, offset));
+	    GElf_Shdr section_header;
+	    gelf_getshdr(section, &section_header);
+	    // the actual symbol address is relative to its position. Since we
+	    // do not know the position, we take the beginning of the section,
+	    // add the read_offset that we might have and finally apply the
+	    // offset we read from the section.
+	    symbol_address = section_header.sh_addr + read_offset + offset;
+	  }
+	else
+	  ABG_ASSERT(read_int_from_array_of_bytes(
+	      bytes, symbol_value_size, is_big_endian, symbol_address));
+
+	// the ksymtab.name entry has a corresponding entry in ksymtab_strings
+	// that we now discover.
+
+	Elf_Data* ksymtab_strings
+	    = elf_getdata(find_ksymtab_strings_section(), 0);
+	char*	  strings = reinterpret_cast<char*>(ksymtab_strings->d_buf);
+	GElf_Shdr section_header;
+	gelf_getshdr(find_ksymtab_strings_section(), &section_header);
+
+	GElf_Addr ksymtab_strings_offset
+	    = symbol_address - section_header.sh_addr;
+
+	if (ksymtab_strings_offset < section_header.sh_size)
+	  {
+	    const std::string& name
+		= strings + symbol_address - section_header.sh_addr;
+
+	    // now we got the name, let's find a match in the function or
+	    // variable symbols
+
+	    string_elf_symbols_map_type::const_iterator it
+		= fun_syms().find(name);
+	    if (it != fun_syms().end())
+	      {
+		symbol = it->second[0];
+	      }
+
+	    if (!symbol)
+	      {
+		it = var_syms().find(name);
+		if (it != var_syms().end())
+		  {
+		    symbol = it->second[0];
+		  }
+	      }
+	  }
+      }
     return symbol;
   }
 
@@ -8067,6 +8127,81 @@ public:
 	    adjusted_symbol_address =
 	      maybe_adjust_var_sym_address(symbol_address);
 	    symbol = lookup_elf_symbol_from_address(adjusted_symbol_address);
+
+	    // the ksymtab.addr might not be a match in .symtab.addr (e.g. if
+	    // CFI is enabled. We can still lookup the symbol by name in the
+	    // symtab.
+	    if (!symbol)
+	      {
+		ABG_ASSERT(read_int_from_array_of_bytes(
+		    &bytes[entry_offset
+			   + symbol_value_size], // advance to ksymtab.name
+		    symbol_value_size, is_big_endian, symbol_address));
+
+		// the ksymtab.name entry has a corresponding entry in
+		// ksymtab_strings that we now discover.
+
+		Elf_Data* ksymtab_strings
+		    = elf_getdata(find_ksymtab_strings_section(), 0);
+		char* strings
+		    = reinterpret_cast<char*>(ksymtab_strings->d_buf);
+		GElf_Shdr section_header;
+		gelf_getshdr(find_ksymtab_strings_section(), &section_header);
+
+		GElf_Addr ksymtab_strings_offset
+		    = symbol_address - section_header.sh_addr;
+		if (ksymtab_strings_offset < section_header.sh_size)
+		  {
+		    const std::string& name
+			= strings + symbol_address - section_header.sh_addr;
+
+		    // now we got the name, let's find a match in the function
+		    // or variable symbols. We also need to discover the actual
+		    // symbol address in .symtab to discover the dwarf
+		    // information later. For that, we iterate
+		    // (fun|var)_addr_sym maps to find a match. That is overly
+		    // expensive and can be optimized: TODO: use a proper
+		    // lookup map.
+
+		    addr_elf_symbol_sptr_map_type::const_iterator I, E;
+		    string_elf_symbols_map_type::const_iterator it
+			= fun_syms().find(name);
+		    if (it != fun_syms().end())
+		      {
+			symbol = it->second[0];
+			for (I = fun_addr_sym_map().begin(),
+			    E = fun_addr_sym_map().end();
+			     I != E; ++I)
+			  {
+			    if (I->second->get_name() == name)
+			      {
+				adjusted_symbol_address = I->first;
+			      }
+			  }
+		      }
+
+		    // we continue our search in the variable maps ...
+
+		    if (!symbol)
+		      {
+			it = var_syms().find(name);
+			if (it != var_syms().end())
+			  {
+			    symbol = it->second[0];
+			    for (I = var_addr_sym_map().begin(),
+				E = var_addr_sym_map().end();
+				 I != E; ++I)
+			      {
+				if (I->second->get_name() == name)
+				  {
+				    adjusted_symbol_address = I->first;
+				  }
+			      }
+			  }
+		      }
+		  }
+	      }
+
 	    if (!symbol)
 	      // This must be a symbol that is of type neither FUNC
 	      // (function) nor OBJECT (variable).  There are for intance,
-- 
2.24.0.393.g34dc348eaf-goog

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] dwarf-reader: support ksymtab symbol lookup by name
  2019-01-01  0:00 [RFC PATCH] dwarf-reader: support ksymtab symbol lookup by name Matthias Maennich via libabigail
@ 2020-01-01  0:00 ` Dodji Seketeli
  2020-01-01  0:00   ` Matthias Maennich via libabigail
  0 siblings, 1 reply; 4+ messages in thread
From: Dodji Seketeli @ 2020-01-01  0:00 UTC (permalink / raw)
  To: Matthias Maennich; +Cc: libabigail, kernel-team

Hello,

Thank you for working on this.

Before going deep in the patch itself, I'd have some questions about
basic understanding of what is going on, if you don't mind.

Matthias Maennich <maennich@google.com> a écrit:

> The ksymtab address entry might not be actually pointing to a GLOBAL
> visible symbol in .symtab of the same binary. E.g. that happens when
> compiling the kernel with clang's -fsanitize-cfi* and friends.

Just so that I understand better; in cases where the address part of a
ksymtab entry cointains a number that is not the address of a GLOBAL
visible symbol in .symtab, what does that number represent?

I am thinking maybe that number is the address of an artificial "stub"
symbol that would have been generated by the sanitizer, but I am not
sure at this point.

> In contrast, the ksymtab.name entry has to match an entry in
> .symtab. It would otherwise upset linkers very much.

It does seem odd to me that the address part of a ksymtab entry does not
point to a symbol in .symtab that we can use, and yet the name part of
that same entry points to the name of a symbol in .symtab that we can
use.  Doesn't it?  I guess the information I am missing is related to my
earlier question.

Thanks,

-- 
		Dodji

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] dwarf-reader: support ksymtab symbol lookup by name
  2020-01-01  0:00   ` Matthias Maennich via libabigail
@ 2020-01-01  0:00     ` Dodji Seketeli
  0 siblings, 0 replies; 4+ messages in thread
From: Dodji Seketeli @ 2020-01-01  0:00 UTC (permalink / raw)
  To: Matthias Maennich; +Cc: libabigail, kernel-team

Hello Matthias,

[...]

Matthias Maennich <maennich@google.com> a écrit:

>>> The ksymtab address entry might not be actually pointing to a GLOBAL
>>> visible symbol in .symtab of the same binary. E.g. that happens when
>>> compiling the kernel with clang's -fsanitize-cfi* and friends.

[...]

On Tue, Jan 07, 2020 at 06:07:49PM +0100, Dodji Seketeli wrote:

>>Just so that I understand better; in cases where the address part of a
>>ksymtab entry cointains a number that is not the address of a GLOBAL
>>visible symbol in .symtab, what does that number represent?
>>
>>I am thinking maybe that number is the address of an artificial "stub"
>>symbol that would have been generated by the sanitizer, but I am not
>>sure at this point.

Matthias Maennich <maennich@google.com> a écrit:


> The issue is caused by the modifications happening to .symtab and
> __ksymtab when using clang's -fsanitize-cfi* and friends.
>
> __ksymtab contains entries like (ignoring namespaces and PREL32 for simplicity):
>
> ksymtab {
>   addr  -> referring to an entry in .symtab
>   name  -> referring to an entry in __ksymtab_strings
> }
>
> Before CFI, .symtab would contain an entry for the symbol referring to
> the relevant .text.<symbol> section and ksymtab.addr would contain that
> particular addr that is referred to by .symtab.
>
> ksymtab.addr -> .symtab -> .text.<symbol>
>
> Now, with CFI, the .symtab contains two entries per symbol:
>
>  <symbol> points to the .text.<symbol> as before
>
>  <symbol>.cfi_jt points to an entry in the cfi jump table that is used
>  to reach the relevant .text.<symbol> section at runtime
>
> The newly introduced symbol is a HIDDEN entry in the .symtab.
>
> Unfortunately, ksymtab.addr refers now to the new entry, but since it is
> hidden, will not be discovered by the usual lookups in libabigail. Even
> if we would find the hidden symbol, the debug info associated with it is
> not sufficient for ABI analysis.

Thank you for the information.

In my understanding, it's true that a ksymtab.addr referring to the
hidden entry of the cfi jump is not useful to us and in any case,
looking that up in the map read_context::fun_entry_addr_sym_map() would
yield an empty result.

But then, if you want to look up the symbol from its name, my
understanding is that we already have a map, in libabigail, that
associates symbol names to the symbols.  That map is returns by the
function read_context::fun_syms().  Here is it's documentation:

  /// Getter for the map of function symbols (name -> sym).
  ///
  /// @return a reference to the map of function symbols.
  const string_elf_symbols_map_type&
  fun_syms() const

Is there a reason why looking up the symbol from that map, once you've
got its name wouldn't work?

Thanks.

-- 
		Dodji

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] dwarf-reader: support ksymtab symbol lookup by name
  2020-01-01  0:00 ` Dodji Seketeli
@ 2020-01-01  0:00   ` Matthias Maennich via libabigail
  2020-01-01  0:00     ` Dodji Seketeli
  0 siblings, 1 reply; 4+ messages in thread
From: Matthias Maennich via libabigail @ 2020-01-01  0:00 UTC (permalink / raw)
  To: Dodji Seketeli; +Cc: libabigail, kernel-team

Hi Dodji!

On Tue, Jan 07, 2020 at 06:07:49PM +0100, Dodji Seketeli wrote:
>Hello,
>
>Thank you for working on this.
>
>Before going deep in the patch itself, I'd have some questions about
>basic understanding of what is going on, if you don't mind.
>
>Matthias Maennich <maennich@google.com> a écrit:
>
>> The ksymtab address entry might not be actually pointing to a GLOBAL
>> visible symbol in .symtab of the same binary. E.g. that happens when
>> compiling the kernel with clang's -fsanitize-cfi* and friends.
>
>Just so that I understand better; in cases where the address part of a
>ksymtab entry cointains a number that is not the address of a GLOBAL
>visible symbol in .symtab, what does that number represent?
>
>I am thinking maybe that number is the address of an artificial "stub"
>symbol that would have been generated by the sanitizer, but I am not
>sure at this point.
>

The issue is caused by the modifications happening to .symtab and
__ksymtab when using clang's -fsanitize-cfi* and friends.

__ksymtab contains entries like (ignoring namespaces and PREL32 for simplicity):

ksymtab {
   addr  -> referring to an entry in .symtab
   name  -> referring to an entry in __ksymtab_strings
}

Before CFI, .symtab would contain an entry for the symbol referring to
the relevant .text.<symbol> section and ksymtab.addr would contain that
particular addr that is referred to by .symtab.

ksymtab.addr -> .symtab -> .text.<symbol>

Now, with CFI, the .symtab contains two entries per symbol:

  <symbol> points to the .text.<symbol> as before

  <symbol>.cfi_jt points to an entry in the cfi jump table that is used
  to reach the relevant .text.<symbol> section at runtime

The newly introduced symbol is a HIDDEN entry in the .symtab.

Unfortunately, ksymtab.addr refers now to the new entry, but since it is
hidden, will not be discovered by the usual lookups in libabigail. Even
if we would find the hidden symbol, the debug info associated with it is
not sufficient for ABI analysis.

One way to get out of this without breaking other binary versions (which
we have) is to make this lookup by name rather by address. That is what
this change attempts.

>> In contrast, the ksymtab.name entry has to match an entry in
>> .symtab. It would otherwise upset linkers very much.
>
>It does seem odd to me that the address part of a ksymtab entry does not
>point to a symbol in .symtab that we can use, and yet the name part of
>that same entry points to the name of a symbol in .symtab that we can
>use.  Doesn't it?  I guess the information I am missing is related to my
>earlier question.

Let me know if you need more clarifications than the above. Took me a
bit to get my head around that as well :-)

Anyway, it also made me thinking of doing _all_ ksymtab lookups by name.
Then we could simplify the code at this point. We would just need to
make the by-name access a bit more convenient.

Cheers,
Matthias

>
>Thanks,
>
>-- 
>		Dodji

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-01-22  9:55 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-01  0:00 [RFC PATCH] dwarf-reader: support ksymtab symbol lookup by name Matthias Maennich via libabigail
2020-01-01  0:00 ` Dodji Seketeli
2020-01-01  0:00   ` Matthias Maennich via libabigail
2020-01-01  0:00     ` Dodji Seketeli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).