From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-x32e.google.com (mail-wm1-x32e.google.com [IPv6:2a00:1450:4864:20::32e]) by sourceware.org (Postfix) with ESMTPS id 7E031386EC4D for ; Thu, 18 Mar 2021 22:10:58 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 7E031386EC4D Received: by mail-wm1-x32e.google.com with SMTP id z6-20020a1c4c060000b029010f13694ba2so4206977wmf.5 for ; Thu, 18 Mar 2021 15:10:58 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=nxiwOcjo/ROZHwQ8z7IH6aGbkYd34BuRrlFuSImqAfA=; b=AnzLxaf+ybRtPTF9i1WquTgc4tbRIIjCoYig4/JDTRBybAS9fq+Y0ZM1lSlBHtdhQU LUAy5Rdq+vq7Hgwznw6rO7z3lNvGGDxFGTSBh536MhmWf4n6vpRJHsr3fZTBPlgfiDKn VqmU0BFwMjbQWynVqDFW2jIun/7I68Vaep9WYp7aqKFc1q2BqhuNIB+CnROvvRjvAlqc +joOAe+o96prgtgeWeymrthTPcfxb/1pVZ2VdFSp51Lf41CJ2cQDRf0t0u9laqQVMTNJ 5ft1LaRcpYQ+gvjeDZaH5oU/2ynLYR0YdAT+liQ4zEGRgpcH9wX5Ij3rC3RAHvkhXzmz nRoQ== X-Gm-Message-State: AOAM5319ZYJrD8MHMuZPMZfIrMoCEbqobXxj5UTgJHKvgnJhATo8UQfh D7vr9PX/CF1XfjKdqEj2mp0I8w== X-Google-Smtp-Source: ABdhPJw4zStL5iEckZvy34mv0/CDNG3G62U0Yip3ilKOEzyYO+YAQseSjnkDrJjCKcpZe3kcHwNqDQ== X-Received: by 2002:a05:600c:4f14:: with SMTP id l20mr1003729wmq.71.1616105457351; Thu, 18 Mar 2021 15:10:57 -0700 (PDT) Received: from google.com ([2a00:79e0:d:210:119f:8adc:54d8:6aea]) by smtp.gmail.com with ESMTPSA id s3sm3577902wmd.21.2021.03.18.15.10.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 Mar 2021 15:10:55 -0700 (PDT) Date: Thu, 18 Mar 2021 22:10:54 +0000 From: Matthias Maennich To: Giuliano Procida Cc: Dodji Seketeli , Giuliano Procida via Libabigail , kernel-team@android.com Subject: Re: [PATCH 20/20] symtab: Add support for MODVERSIONS (CRC checksums) Message-ID: References: <20200619214305.562-1-maennich@google.com> <20210127125853.886677-1-maennich@google.com> <20210127125853.886677-21-maennich@google.com> <87mtv1wvnb.fsf@seketeli.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Spam-Status: No, score=-30.1 required=5.0 tests=BAYES_00, DKIMWL_WL_MED, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, USER_IN_DEF_DKIM_WL, USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libabigail@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Mailing list of the Libabigail project List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Mar 2021 22:11:00 -0000 Hi! On Wed, Mar 17, 2021 at 11:29:13PM +0000, Giuliano Procida wrote: >Hi Dodji. > >Not sure if Matthias would say exactly the same things. > >On Wed, 17 Mar 2021 at 17:13, Dodji Seketeli wrote: > >> Matthias Maennich a écrit: >> >> > The Linux Kernel has a mechanism (MODVERSIONS) to checksum symbols based >> > on their type. In a way similar to what libabigail does, but different. >> > The CRC values for symbols can be extracted from the symtab either by >> > following the __kcrctab_ entry or by using the __crc_ >> > value directly. >> > >> > This patch adds support for extracting those CRC values and storing them >> > as a property of elf_symbol. Subsequently, 'crc' gets emitted as an >> > attribute of 'elf-symbol' in the XML representation. >> >> This is pretty cool btw. >> >> Personally, I would have tied the CRC to the decl instead of the ELF >> symbol. I know in the kernel land, symbols and decls are usually >> thought of interchangeably but conceptually, they are not the same >> thing. >> >> The linux kernel CRCs are computed from the types of the declarations >> that are associated to the ELF symbol. I am talking about the genksyms >> machinery. Se we are really talking about the declarations and types >> here. They are storing in ELF because there is no concept of decls and >> types in ELF. But we do have that in libabigail. So I really think it >> makes more sense to have this tied to decls instead of ELF symbols. >> >> >I can think of a few arguments in the opposite direction: > >The link between ELF symbols and declarations isn't completely reliable. >Examples here include things like strlen and friends which are declared in >.h >but defined in .S files. Arguably this is a bug somewhere needing a fix. > >I opened a libabigail bug in the last few days regarding a missing parameter >diff. Having CRCs meant we didn't miss this and checking at the symbol >level protects us from issues in the much more complex type difference code. > >CRCs are part of the module link-loader interface which is very much about >symbol-level compatibility rather than type-level compatibility. We risk >diverging from the kernel's notion of module compatibility if we interpret >CRCs as anything else. > >A CRC can change even if the decl type does not. Internal dependencies on That was my first thought. In fact, I can construct a case where the same underlying decl is referenced by two elf symbols with different CRC values: with symbol aliases. Consider this patch to the existing modversions test case. | diff --git a/tests/data/test-symtab/kernel/one_of_each.c b/tests/data/test-symtab/kernel/one_of_each.c | index 9d461fbd0053..488dbf38d3d9 100644 | --- a/tests/data/test-symtab/kernel/one_of_each.c | +++ b/tests/data/test-symtab/kernel/one_of_each.c | @@ -6,6 +6,9 @@ EXPORT_SYMBOL(exported_function); | void exported_function_gpl(void) {} | EXPORT_SYMBOL_GPL(exported_function_gpl); | | +void aliased_function(void) __attribute__ ((alias("exported_function"))); | +EXPORT_SYMBOL_GPL(aliased_function); | + It creates a symbol table layout like this | $ readelf -sW one_of_each.ko | egrep "__(ksy|cr).*function" | 0000000000000000 0 NOTYPE LOCAL DEFAULT 3 __ksymtab_exported_function | 000000000000000c 0 NOTYPE LOCAL DEFAULT 5 __ksymtab_exported_function_gpl | 0000000000000000 0 NOTYPE LOCAL DEFAULT 5 __ksymtab_aliased_function | 00000000e52d5bcf 0 NOTYPE GLOBAL DEFAULT ABS __crc_exported_function | 00000000f9cc3f69 0 NOTYPE GLOBAL DEFAULT ABS __crc_aliased_function | 00000000fda43846 0 NOTYPE GLOBAL DEFAULT ABS __crc_exported_function_gpl In particular, the aliased function obviously maps to the same function location, but received a different CRC value (probably due to having a different name). This creates (simplified) and 'aliased_function' is not directly connected to any own decl. It delegates that to 'exported_function'. So, connecting it to a decl would make us lose information. Without debug information we could not even surely tell which one is the alias and which one is the original, and the CRC would be taken rather randomly from one of them. But we can think this further through. I usually associate the decl and type information with DWARF and the ELF symbol information with what we can deduct from the symbol table. For stripped binaries that do not have debug information, we can only work with symbol information. Now, since the CRC is part of the symbol table, it is actually not stripped off and still retained. Hence, libabigail is able at abidw or abidiff time to extract the CRC from stripped kernel binaries and is able to do somewhat reasonable ABI analysis based on CRC values and other properties encoded in the symbol table. That is actually a big win and we would likely give up on this if we would push this to the decls. (We still could create the decls just for this property, but this does sound like a hack and would still not be accurate for aliases.) >such a decl (admittedly not possible in pure C, I think) should not be >reported >as having changed indirectly, just because the CRC has. > >It's less and simpler code. Besides everything I said above, this is the main reason I would argue to keep this attached to the ELF symbol. Based on the new symtab reader implementation, this is a fairly simple piece of code. ~100 lines of code. > >If CRCs are ever replaced by something better, it will likely be DWARF or >perhaps even BTF-based. At this point the argument that it's type info >and not symbol info will be stronger. BTF would be less sensitive to changes >than DWARF in odd cases (multidimensional arrays are flattened, for >example). > >If tomorrow we build or get a hash for decls and types we'll have to >> re-do this again, at the decl level. >> >> As a matter of fact, the feature already exists for type units in DWARF >> for instance. We just don't support that yet. But we might have to >> support it in the future. >> >> So the more I think about this, the more I think we should add an >> "artifact_hash" property to type_or_decl_base instead of putting this >> into the ELF symbol. >> >> What do you think? I think it belongs to the elf symbol and it makes things much simpler :-) Cheers, Matthias >> >> >It's an interesting one. > > >> [...] >> >> Cheers, >> >> -- >> Dodji >> > >Regards, >Giuliano.