From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vk1-xa30.google.com (mail-vk1-xa30.google.com [IPv6:2607:f8b0:4864:20::a30]) by sourceware.org (Postfix) with ESMTPS id B2EFC385782A for ; Wed, 17 Mar 2021 23:29:51 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org B2EFC385782A Received: by mail-vk1-xa30.google.com with SMTP id s136so886986vks.4 for ; Wed, 17 Mar 2021 16:29:51 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=dpI3SuWr+SANiHe1XJxiAPm0cwylApdV7RLnyFWvv3U=; b=h9zggWuUetM73EnMM8nmox83FECV8A8oYToGBwbPrvB5ALYnuKXsYUMqDPdwO04FQq uOnCXHNYw1EFREAPLK+wuhxFHgnmq/ur48xm8kqtLShcq1wVLNXzdXGDsW8bSGHB4YNQ zykL1zw5X2KRNREkwr+mhzkcB7YPDOk6Dba9AsbeaMEK2IIK4d9GDs0q2vWOUXEOxxcc PsY2ESrxIPla1QMzGSmHf7uJIagkFNa5ShGNus23nsRmSMxyNxMqKlvPcR0UyFU/E5Z6 yl2tQIoAJDJkuOOIFyjYSQmeKBA0Y4ANvE31jztkRZXtCeS8FuinsBuLlCSt2twwh2R1 v7Pg== X-Gm-Message-State: AOAM530/9ZICi4aPMKWnP6wrCTLF346Qq9DpUGf/M/TByVjp52L6Jhul JCL++mR6M450/yrqsY+QPlTfW4hoWcKfmWvcZTAIrA== X-Google-Smtp-Source: ABdhPJykkclaelx4+8uLwfO3/y9ISUMlKpZUOD9jjrB3g36cqR6ls9+dFrMOLHb8X/7j+gRADmL9HRQgxHuLu6meIB4= X-Received: by 2002:a1f:7889:: with SMTP id t131mr4947188vkc.2.1616023791139; Wed, 17 Mar 2021 16:29:51 -0700 (PDT) MIME-Version: 1.0 References: <20200619214305.562-1-maennich@google.com> <20210127125853.886677-1-maennich@google.com> <20210127125853.886677-21-maennich@google.com> <87mtv1wvnb.fsf@seketeli.org> In-Reply-To: <87mtv1wvnb.fsf@seketeli.org> From: Giuliano Procida Date: Wed, 17 Mar 2021 23:29:13 +0000 Message-ID: Subject: Re: [PATCH 20/20] symtab: Add support for MODVERSIONS (CRC checksums) To: Dodji Seketeli Cc: Matthias Maennich , Giuliano Procida via Libabigail , kernel-team@android.com X-Spam-Status: No, score=-24.2 required=5.0 tests=BAYES_00, DKIMWL_WL_MED, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH, HTML_MESSAGE, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, USER_IN_DEF_DKIM_WL, USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: libabigail@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Mailing list of the Libabigail project List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Mar 2021 23:29:53 -0000 Hi Dodji. Not sure if Matthias would say exactly the same things. On Wed, 17 Mar 2021 at 17:13, Dodji Seketeli wrote: > Matthias Maennich a =C3=A9crit: > > > The Linux Kernel has a mechanism (MODVERSIONS) to checksum symbols base= d > > on their type. In a way similar to what libabigail does, but different. > > The CRC values for symbols can be extracted from the symtab either by > > following the __kcrctab_ entry or by using the __crc_ > > value directly. > > > > This patch adds support for extracting those CRC values and storing the= m > > as a property of elf_symbol. Subsequently, 'crc' gets emitted as an > > attribute of 'elf-symbol' in the XML representation. > > This is pretty cool btw. > > Personally, I would have tied the CRC to the decl instead of the ELF > symbol. I know in the kernel land, symbols and decls are usually > thought of interchangeably but conceptually, they are not the same > thing. > > The linux kernel CRCs are computed from the types of the declarations > that are associated to the ELF symbol. I am talking about the genksyms > machinery. Se we are really talking about the declarations and types > here. They are storing in ELF because there is no concept of decls and > types in ELF. But we do have that in libabigail. So I really think it > makes more sense to have this tied to decls instead of ELF symbols. > > I can think of a few arguments in the opposite direction: The link between ELF symbols and declarations isn't completely reliable. Examples here include things like strlen and friends which are declared in .h but defined in .S files. Arguably this is a bug somewhere needing a fix. I opened a libabigail bug in the last few days regarding a missing paramete= r diff. Having CRCs meant we didn't miss this and checking at the symbol level protects us from issues in the much more complex type difference code= . CRCs are part of the module link-loader interface which is very much about symbol-level compatibility rather than type-level compatibility. We risk diverging from the kernel's notion of module compatibility if we interpret CRCs as anything else. A CRC can change even if the decl type does not. Internal dependencies on such a decl (admittedly not possible in pure C, I think) should not be reported as having changed indirectly, just because the CRC has. It's less and simpler code. If CRCs are ever replaced by something better, it will likely be DWARF or perhaps even BTF-based. At this point the argument that it's type info and not symbol info will be stronger. BTF would be less sensitive to change= s than DWARF in odd cases (multidimensional arrays are flattened, for example). If tomorrow we build or get a hash for decls and types we'll have to > re-do this again, at the decl level. > > As a matter of fact, the feature already exists for type units in DWARF > for instance. We just don't support that yet. But we might have to > support it in the future. > > So the more I think about this, the more I think we should add an > "artifact_hash" property to type_or_decl_base instead of putting this > into the ELF symbol. > > What do you think? > > It's an interesting one. > [...] > > Cheers, > > -- > Dodji > Regards, Giuliano.