From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from relay8-d.mail.gandi.net (relay8-d.mail.gandi.net [IPv6:2001:4b98:dc4:8::228]) by sourceware.org (Postfix) with ESMTPS id 576313853543 for ; Thu, 6 Oct 2022 07:42:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 576313853543 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=seketeli.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=seketeli.org Received: (Authenticated sender: dodj@seketeli.org) by mail.gandi.net (Postfix) with ESMTPSA id 5AEE71BF208; Thu, 6 Oct 2022 07:42:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=seketeli.org; s=gm1; t=1665042134; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=G0ydOk9gd45EaicgjWGsS1UuQHY8dp/DiagzhMFISfU=; b=gq/mEmDs6LXqGVu2P4cWccTxNiXhoZ4XQVVmAbcviUkONXfW4gO1bee2fyY0XF9sC4C6Kd 6RzdOKmwfIRggxN+IyxV1DUOBHdUsgl6nwdBcFQ7l9dYFBuOyjQDDUkSTrWmf4xxZcQuqE k4FgKLCT1JNavLtbQYjN6Zpjy7K1CnpMrv/dNIoBabhUWAfVeYR9Px1N5La91dMi5snmGz PdgclSpfVHDETOBJA1TlOegZZ5wSMwvxdpxk1MYPiRuIs9cF5pty2Utj8Y/HYJCDeV8lxF J2EbpZ+cGb14GmxrNAWbZYx3dO6t09f1RumpYSa7NlHtuYO45k0o9PXo2OLYoA== Received: by localhost (Postfix, from userid 1001) id 92BB61A0546; Thu, 6 Oct 2022 09:42:13 +0200 (CEST) From: Dodji Seketeli To: "Guillermo E. Martinez" Cc: "Guillermo E. Martinez via Libabigail" Subject: Re: [PATCH] CTF as a fallback when no DWARF debug info is present Organization: Me, myself and I References: <20221001001544.210234-1-guillermo.e.martinez@oracle.com> <877d1g2c52.fsf@seketeli.org> <568fe730-3bb9-0267-00bc-2873e94e502f@oracle.com> X-Operating-System: Red Hat Enterprise Linux Server 7.9 X-URL: http://www.seketeli.net/~dodji Date: Thu, 06 Oct 2022 09:42:13 +0200 In-Reply-To: <568fe730-3bb9-0267-00bc-2873e94e502f@oracle.com> (Guillermo E. Martinez's message of "Tue, 4 Oct 2022 18:13:53 -0500") Message-ID: <86mta9bdpm.fsf@seketeli.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-3.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,JMQ_SPF_NEUTRAL,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hello Guillermo, "Guillermo E. Martinez" a =C3=A9crit: [...] >> I have also introduced a new function called >> tools_utils::dir_contains_ctf_archive to look for a file that ends with >> ".ctfa". This abstracts away the search for "vmlinux.ctfa" as I wasn't >> sure if those archives could exist for normal (non-kernel) binaries as >> well: > > Ohh, perfect!, I'll use it in CTF reader to located the Linux archive fil= e. ACK. [...] >> @@ -2525,8 +2542,12 @@ get_binary_paths_from_kernel_dist(const stri= ng& dist_root, >> /// @param t time to trace time spent in each step. >> /// >> /// @param env the environment to create the corpus_group in. >> -static void >> -maybe_load_vmlinux_dwarf_corpus(corpus::origin origin, >> +/// >> +/// @return the status of the loading. If it's >> +/// abigail::elf_reader::STATUS_UNKNOWN, then it means nothing was >> +/// done, meaning the function got out early. >> +static abigail::elf_reader::status >> +maybe_load_vmlinux_dwarf_corpus(corpus::origin& origin, >> corpus_group_sptr& group, >> const string& vmlinux, >> vector& modules, >> @@ -2539,10 +2560,11 @@ maybe_load_vmlinux_dwarf_corpus(corpus::ori= gin origin, >> timer& t, >> environment_sptr& env) >> { >> + abigail::elf_reader::status status =3D abigail::elf_reader::STAT= US_UNKNOWN; >> + >> if (!(origin & corpus::DWARF_ORIGIN)) >> - return; >> + return status; >> >> - abigail::elf_reader::status status =3D abigail::elf_reader::STAT= US_OK; >> dwarf_reader::read_context_sptr ctxt; >> ctxt =3D >> dwarf_reader::create_read_context(vmlinux, di_roots, env.get(), >> @@ -2569,6 +2591,7 @@ maybe_load_vmlinux_dwarf_corpus(corpus::origi= n origin, >> << vmlinux << "' ...\n" << std::flush; >> >> // Read the vmlinux corpus and add it to the group. >> + status =3D abigail::elf_reader::STATUS_OK; >> t.start(); >> read_and_add_corpus_to_group_from_elf(*ctxt, *group, status); >> t.stop(); >> @@ -2579,7 +2602,7 @@ maybe_load_vmlinux_dwarf_corpus(corpus::origi= n origin, >> << t << "\n"; >> > > At this point if `vmlinux' file doesn't have DWARF information, the `stat= us' > returned by `maybe_load_vmlinux_dwarf_corpus' will set the bit field > `STATUS_DEBUG_INFO_NOT_FOUND', but it is not verified here, and since vml= inux > corpus was already added into the group in `read_debug_info_into_corpus' > function, it continues processing modules without the main corpus informa= tion, I see. You are right. Yes, the debug info is not found in vmlinux and yet= the whole thing continues, collecting just information from the ELF symbol table, basically, and from the modules. Pretty useless, I guess. > Is this the expected behaviour? Hehe, no :-) I guess maybe the caller should look for the .debug_info section in the vmlinux section (or for split debug info), prior to even calling maybe_load_vmlinux_dwarf_corpus. If there is no debug info, then the function should proceed directly to calling maybe_load_vmlinux_ctf_corpus? What do you think? [...] >> I have also introduced a new function called >> tools_utils::dir_contains_ctf_archive to look for a file that ends with >> ".ctfa". This abstracts away the search for "vmlinux.ctfa" as I wasn't >> sure if those archives could exist for normal (non-kernel) binaries as >> well: > > Ohh, perfect!, I'll use it in CTF reader to located the Linux archive fil= e. > No. there is no `.ctfa' file for non-kernel binaries intead they have `.c= tf' > section, I could implement a similary function to looks for `.ctf' section > using elf helpers Right, abg-elf-helpers.h does have find_section_by_name. That can be used to look for the debug info, I guess. However, we also need to support finding the debug info when it's split out into a different place, like when it's packaged in a separate debug-info package. Today, abg-dwarf-reader.cc uses dwfl (dwarf front-end library, I believe) to do this, as dwfl knows how to find the DWARF debug info, wherever it is. You can see how this is done in read_context::load_debug_info(), in abg-dwarf-reader.cc, around line 2654. Look for the comment "Look for split debuginfo files". Basically, dwfl_module_getdwarf returns a pointer to the debug info it's found, if it has found one. I think we should split this logic out to make it re-usable somehow. If you think this is worthwhile, I can think of splitting it out and stick it into elf-helpers, maybe? > and it can be used in `load_corpus_and_write_abixml' > implementing a similar algorithm as with when we are processing the Kerne= l, > looking for DWARF information, and if it is not present then, test if > `.ctf' section is in ELF file then extract it using CTF reader, > to avoid duplication use of: > > abigail::ctf_reader::read_context_sptr ctxt > =3D abigail::ctf_reader::create_read_context(opts.in_file_path, > opts.prepared_di_root_paths, > env.get()); > > One for `opts.use_ctf' and other one when `STATUS_DEBUG_INFO_NOT_FOUND' i= s returned. > WDYT? Yes, along with the testing for the presence of DWARF debug info, that might be useful, indeed. [...] >> But then, it's here that we are going to inspect c1_status to see if >> loading DWARF failed. If it failed, then we'll try to load CTF. So, >> here is the change I am adding to the process of loading the corpus c1: >> >> >> @@ -1205,6 +1208,36 @@ main(int argc, char* argv[]) >> set_suppressions(*ctxt, opts); >> abigail::dwarf_reader::set_do_log(*ctxt, opts.do_log); >> c1 =3D abigail::dwarf_reader::read_corpus_from_elf(*ct= xt, c1_status); >> + >> +#ifdef WITH_CTF >> + if (// We were not instructed to use CTF ... >> + !opts.use_ctf > > This is always true, because we are in the else block of `opts.use_ctf'. Right. > >> + // ... and yet, no debug info was found ... >> + && (c1_status & STATUS_DEBUG_INFO_NOT_FOUND) >> + // ... but we found ELF symbols ... >> + && !(c1_status & STATUS_NO_SYMBOLS_FOUND)) >> + { >> + string basenam, dirnam; >> + base_name(opts.file1, basenam); >> + dir_name(opts.file1, dirnam); >> + // ... the input file seems to contain CTF >> + // archive, so let's try to see if the file >> + // contains a CTF archive, who knows ... >> + if (dir_contains_ctf_archive(dirnam, basenam)) > > Non-kernel binaries contains `.ctf' section instead of `ctfa' file, > so I can implement a `file_contains_ctf_section' function to test if > it is a valid input file for CTF reader. Great, thanks. OK, I'll look into trying to put together some facility to look for the presence of DWARF debug info, so tools can decide ahead of time what front-end to use. [...] Cheers, --=20 Dodji