public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
From: Matthias Maennich <maennich@google.com>
To: Giuliano Procida <gprocida@google.com>
Cc: Dodji Seketeli <dodji@seketeli.org>,
	Giuliano Procida via Libabigail <libabigail@sourceware.org>,
	kernel-team@android.com
Subject: Re: [PATCH 20/20] symtab: Add support for MODVERSIONS (CRC checksums)
Date: Thu, 18 Mar 2021 22:10:54 +0000	[thread overview]
Message-ID: <YFPP7kG6ADa3YL4Z@google.com> (raw)
In-Reply-To: <CAGvU0HkEmKFawJguW+D5+MTcRAopG3yPmpaM9OmFTuJWNqGmiA@mail.gmail.com>

Hi!

On Wed, Mar 17, 2021 at 11:29:13PM +0000, Giuliano Procida wrote:
>Hi Dodji.
>
>Not sure if Matthias would say exactly the same things.
>
>On Wed, 17 Mar 2021 at 17:13, Dodji Seketeli <dodji@seketeli.org> wrote:
>
>> Matthias Maennich <maennich@google.com> a écrit:
>>
>> > The Linux Kernel has a mechanism (MODVERSIONS) to checksum symbols based
>> > on their type. In a way similar to what libabigail does, but different.
>> > The CRC values for symbols can be extracted from the symtab either by
>> > following the __kcrctab_<symbol> entry or by using the __crc_<symbol>
>> > value directly.
>> >
>> > This patch adds support for extracting those CRC values and storing them
>> > as a property of elf_symbol. Subsequently, 'crc' gets emitted as an
>> > attribute of 'elf-symbol' in the XML representation.
>>
>> This is pretty cool btw.
>>
>> Personally, I would have tied the CRC to the decl instead of the ELF
>> symbol.  I know in the kernel land, symbols and decls are usually
>> thought of interchangeably but conceptually, they are not the same
>> thing.
>>
>> The linux kernel CRCs are computed from the types of the declarations
>> that are associated to the ELF symbol.  I am talking about the genksyms
>> machinery.  Se we are really talking about the declarations and types
>> here.  They are storing in ELF because there is no concept of decls and
>> types in ELF.  But we do have that in libabigail.  So I really think it
>> makes more sense to have this tied to decls instead of ELF symbols.
>>
>>
>I can think of a few arguments in the opposite direction:
>
>The link between ELF symbols and declarations isn't completely reliable.
>Examples here include things like strlen and friends which are declared in
>.h
>but defined in .S files. Arguably this is a bug somewhere needing a fix.
>
>I opened a libabigail bug in the last few days regarding a missing parameter
>diff. Having CRCs meant we didn't miss this and checking at the symbol
>level protects us from issues in the much more complex type difference code.
>
>CRCs are part of the module link-loader interface which is very much about
>symbol-level compatibility rather than type-level compatibility. We risk
>diverging from the kernel's notion of module compatibility if we interpret
>CRCs as anything else.
>
>A CRC can change even if the decl type does not. Internal dependencies on

That was my first thought. In fact, I can construct a case where the
same underlying decl is referenced by two elf symbols with different CRC
values: with symbol aliases.

Consider this patch to the existing modversions test case.

   | diff --git a/tests/data/test-symtab/kernel/one_of_each.c b/tests/data/test-symtab/kernel/one_of_each.c
   | index 9d461fbd0053..488dbf38d3d9 100644
   | --- a/tests/data/test-symtab/kernel/one_of_each.c
   | +++ b/tests/data/test-symtab/kernel/one_of_each.c
   | @@ -6,6 +6,9 @@ EXPORT_SYMBOL(exported_function);
   |  void exported_function_gpl(void) {}
   |  EXPORT_SYMBOL_GPL(exported_function_gpl);
   |
   | +void aliased_function(void) __attribute__ ((alias("exported_function")));
   | +EXPORT_SYMBOL_GPL(aliased_function);
   | +

It creates a symbol table layout like this

  | $ readelf -sW one_of_each.ko | egrep "__(ksy|cr).*function"
  |   0000000000000000     0 NOTYPE  LOCAL  DEFAULT    3 __ksymtab_exported_function
  |   000000000000000c     0 NOTYPE  LOCAL  DEFAULT    5 __ksymtab_exported_function_gpl
  |   0000000000000000     0 NOTYPE  LOCAL  DEFAULT    5 __ksymtab_aliased_function
  |   00000000e52d5bcf     0 NOTYPE  GLOBAL DEFAULT  ABS __crc_exported_function
  |   00000000f9cc3f69     0 NOTYPE  GLOBAL DEFAULT  ABS __crc_aliased_function
  |   00000000fda43846     0 NOTYPE  GLOBAL DEFAULT  ABS __crc_exported_function_gpl

In particular, the aliased function obviously maps to the same function
location, but received a different CRC value (probably due to having a
different name).

This creates (simplified)

   <elf-symbol name='aliased_function' type='func-type' ...  crc='0xf9cc3f69'/>
   <elf-symbol name='exported_function' type='func-type' ... alias='aliased_function' crc='0xe52d5bcf'/>
   <elf-symbol name='exported_function_gpl' type='func-type' ... crc='0xfda43846'/>

and 'aliased_function' is not directly connected to any own decl. It
delegates that to 'exported_function'. So, connecting it to a decl
would make us lose information. Without debug information we could not
even surely tell which one is the alias and which one is the original,
and the CRC would be taken rather randomly from one of them.

But we can think this further through. I usually associate the decl and
type information with DWARF and the ELF symbol information with what we
can deduct from the symbol table. For stripped binaries that do not have
debug information, we can only work with symbol information. Now, since
the CRC is part of the symbol table, it is actually not stripped off and
still retained. Hence, libabigail is able at abidw or abidiff time to
extract the CRC from stripped kernel binaries and is able to do somewhat
reasonable ABI analysis based on CRC values and other properties encoded
in the symbol table. That is actually a big win and we would likely give
up on this if we would push this to the decls. (We still could create
the decls just for this property, but this does sound like a hack and
would still not be accurate for aliases.)

>such a decl (admittedly not possible in pure C, I think) should not be
>reported
>as having changed indirectly, just because the CRC has.
>
>It's less and simpler code.

Besides everything I said above, this is the main reason I would argue
to keep this attached to the ELF symbol. Based on the new symtab reader
implementation, this is a fairly simple piece of code. ~100 lines of
code.

>
>If CRCs are ever replaced by something better, it will likely be DWARF or
>perhaps even BTF-based. At this point the argument that it's type info
>and not symbol info will be stronger. BTF would be less sensitive to changes
>than DWARF in odd cases (multidimensional arrays are flattened, for
>example).
>
>If tomorrow we build or get a hash for decls and types we'll have to
>> re-do this again, at the decl level.
>>
>> As a matter of fact, the feature already exists for type units in DWARF
>> for instance.  We just don't support that yet.  But we might have to
>> support it in the future.
>>
>> So the more I think about this, the more I think we should add an
>> "artifact_hash" property to type_or_decl_base instead of putting this
>> into the ELF symbol.
>>
>> What do you think?

I think it belongs to the elf symbol and it makes things much simpler :-)

Cheers,
Matthias

>>
>>
>It's an interesting one.
>
>
>> [...]
>>
>> Cheers,
>>
>> --
>>                 Dodji
>>
>
>Regards,
>Giuliano.

  reply	other threads:[~2021-03-18 22:10 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-19 21:42 [PATCH v1 00/16] Refactor (k)symtab reader Matthias Maennich
2020-06-19 21:42 ` [PATCH v1 01/16] abg-cxx-compat: add simplified version of std::optional Matthias Maennich
2020-06-19 21:42 ` [PATCH v1 02/16] abg-cxx-compat: more <functional> support: std::bind and friends Matthias Maennich
2020-06-19 21:42 ` [PATCH v1 03/16] abg-ir: elf_symbol: add is_in_ksymtab field Matthias Maennich
2020-06-19 21:42 ` [PATCH v1 04/16] abg-ir: elf_symbol: add is_suppressed field Matthias Maennich
2020-06-22  9:46   ` Giuliano Procida
2020-06-19 21:42 ` [PATCH v1 05/16] dwarf-reader split: create abg-symtab-reader.{h, cc} and test case Matthias Maennich
2020-06-19 21:42 ` [PATCH v1 06/16] Refactor ELF symbol table reading by adding a new symtab reader Matthias Maennich
2020-06-19 21:42 ` [PATCH v1 07/16] Integrate new symtab reader into corpus and read_context Matthias Maennich
2020-06-19 21:42 ` [PATCH v1 08/16] corpus: make get_(undefined_)?_(var|fun)_symbols use the new symtab Matthias Maennich
2020-06-22  9:53   ` Giuliano Procida
2020-06-19 21:42 ` [PATCH v1 09/16] corpus: make get_unreferenced_(function|variable)_symbols " Matthias Maennich
2020-06-19 21:42 ` [PATCH v1 10/16] abg-reader: avoid using the (var|function)_symbol_map Matthias Maennich
2020-06-19 21:43 ` [PATCH v1 11/16] dwarf-reader: read_context: use new symtab in *_symbols_is_exported Matthias Maennich
2020-06-19 21:43 ` [PATCH v1 12/16] Switch kernel stuff over to new symtab and drop unused code Matthias Maennich
2020-06-19 21:43 ` [PATCH v1 13/16] abg-elf-helpers: migrate ppc64 specific helpers Matthias Maennich
2020-06-19 21:43 ` [PATCH v1 14/16] symtab_reader: add support for ppc64 ELFv1 binaries Matthias Maennich
2020-06-19 21:43 ` [PATCH v1 15/16] abg-corpus: remove symbol maps and their setters Matthias Maennich
2020-06-19 21:43 ` [PATCH v1 16/16] dwarf reader: drop (now) unused code related symbol table reading Matthias Maennich
2020-06-22  9:56   ` Giuliano Procida
2020-07-03 16:46 ` [PATCH v2 00/21] Refactor (k)symtab reader Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 01/21] abg-cxx-compat: add simplified version of std::optional Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 02/21] abg-cxx-compat: more <functional> support: std::bind and friends Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 03/21] abg-ir: elf_symbol: add is_in_ksymtab field Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 04/21] abg-ir: elf_symbol: add is_suppressed field Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 05/21] dwarf-reader split: create abg-symtab-reader.{h, cc} and test case Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 06/21] Refactor ELF symbol table reading by adding a new symtab reader Matthias Maennich
2020-07-20 15:39     ` Dodji Seketeli
2020-07-03 16:46   ` [PATCH v2 07/21] Integrate new symtab reader into corpus and read_context Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 08/21] corpus: make get_(undefined_)?_(var|fun)_symbols use the new symtab Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 09/21] corpus: make get_unreferenced_(function|variable)_symbols " Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 10/21] abg-reader: avoid using the (var|function)_symbol_map Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 11/21] dwarf-reader: read_context: use new symtab in *_symbols_is_exported Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 12/21] Switch kernel stuff over to new symtab and drop unused code Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 13/21] abg-elf-helpers: migrate ppc64 specific helpers Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 14/21] symtab_reader: add support for ppc64 ELFv1 binaries Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 15/21] abg-corpus: remove symbol maps and their setters Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 16/21] dwarf reader: drop now-unused code related to symbol table reading Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 17/21] test-symtab: add tests for whitelisted functions Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 18/21] symtab/dwarf-reader: allow hinting of main symbols for aliases Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 19/21] dwarf-reader/writer: consider aliases when dealing with suppressions Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 20/21] symtab: Add support for MODVERSIONS (CRC checksums) Matthias Maennich
2020-07-03 16:46   ` [PATCH v2 21/21] reader/symtab: Improve handling for suppressed aliases Matthias Maennich
2020-07-20 14:27   ` [PATCH v2 00/21] Refactor (k)symtab reader Dodji Seketeli
2021-01-27 12:58 ` [PATCH 00/20] " Matthias Maennich
2021-01-27 12:58   ` [PATCH 01/20] abg-cxx-compat: add simplified version of std::optional Matthias Maennich
2021-03-09  9:43     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 02/20] abg-ir: elf_symbol: add is_in_ksymtab field Matthias Maennich
2021-03-09 14:05     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 03/20] abg-ir: elf_symbol: add is_suppressed field Matthias Maennich
2021-03-09 18:03     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 04/20] dwarf-reader split: create abg-symtab-reader.{h, cc} and test case Matthias Maennich
2021-03-10 18:00     ` [PATCH 04/20] dwarf-reader split: create abg-symtab-reader.{h,cc} " Dodji Seketeli
2021-01-27 12:58   ` [PATCH 05/20] Refactor ELF symbol table reading by adding a new symtab reader Matthias Maennich
2021-03-12 11:18     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 06/20] Integrate new symtab reader into corpus and read_context Matthias Maennich
2021-03-12 15:04     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 07/20] corpus: make get_(undefined_)?_(var|fun)_symbols use the new symtab Matthias Maennich
2021-03-15 10:05     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 08/20] corpus: make get_unreferenced_(function|variable)_symbols " Matthias Maennich
2021-03-15 12:06     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 09/20] abg-reader: avoid using the (var|function)_symbol_map Matthias Maennich
2021-03-15 14:23     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 10/20] dwarf-reader: read_context: use new symtab in *_symbols_is_exported Matthias Maennich
2021-03-15 18:13     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 11/20] Switch kernel stuff over to new symtab and drop unused code Matthias Maennich
2021-03-16 10:38     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 12/20] abg-elf-helpers: migrate ppc64 specific helpers Matthias Maennich
2021-03-16 10:59     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 13/20] symtab_reader: add support for ppc64 ELFv1 binaries Matthias Maennich
2021-03-16 11:39     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 14/20] abg-corpus: remove symbol maps and their setters Matthias Maennich
2021-03-16 18:08     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 15/20] dwarf reader: drop (now) unused code related to symbol table reading Matthias Maennich
2021-03-16 18:42     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 16/20] test-symtab: add tests for whitelisted functions Matthias Maennich
2021-03-17 11:07     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 17/20] symtab/dwarf-reader: allow hinting of main symbols for aliases Matthias Maennich
2021-03-17 13:40     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 18/20] dwarf-reader/writer: consider aliases when dealing with suppressions Matthias Maennich
2021-03-17 15:44     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 19/20] abg-writer.cc: fix write_elf_symbol_reference loop Matthias Maennich
2021-03-17 16:11     ` Dodji Seketeli
2021-01-27 12:58   ` [PATCH 20/20] symtab: Add support for MODVERSIONS (CRC checksums) Matthias Maennich
2021-03-17 17:13     ` Dodji Seketeli
2021-03-17 23:29       ` Giuliano Procida
2021-03-18 22:10         ` Matthias Maennich [this message]
2021-03-19 16:55           ` Dodji Seketeli
2021-03-19 18:15     ` Dodji Seketeli
2021-03-29 13:19   ` [GIT PULL] Refactor (k)symtab reader Matthias Maennich
2021-04-02 14:28     ` Dodji Seketeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YFPP7kG6ADa3YL4Z@google.com \
    --to=maennich@google.com \
    --cc=dodji@seketeli.org \
    --cc=gprocida@google.com \
    --cc=kernel-team@android.com \
    --cc=libabigail@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).