public inbox for libabigail@sourceware.org
From: "Guillermo E. Martinez" <guillermo.e.martinez@oracle.com>
To: Dodji Seketeli <dodji@seketeli.org>
Cc: "Guillermo E. Martinez via Libabigail" <libabigail@sourceware.org>
Subject: Re: [PATCH] CTF as a fallback when no DWARF debug info is present
Date: Thu, 6 Oct 2022 14:50:22 -0500
Message-ID: <7dbdea77-dbc9-7bdd-94ee-8e4cf46ce886@oracle.com>
In-Reply-To: <86mta9bdpm.fsf@seketeli.org>



On 10/6/22 02:42, Dodji Seketeli wrote:
> Hello Guillermo,
> 

Hello Dodji,

Thanks for your comments!

> "Guillermo E. Martinez" <guillermo.e.martinez@oracle.com> a écrit:
> 
> [...]
> 
>>> I have also introduced a new function called
>>> tools_utils::dir_contains_ctf_archive to look for a file that ends with
>>> ".ctfa".  This abstracts away the search for "vmlinux.ctfa" as I wasn't
>>> sure if those archives could exist for normal (non-kernel) binaries as
>>> well:
>>
>> Ohh, perfect! I'll use it in the CTF reader to locate the Linux archive file.
> 
> ACK.
> 
> [...]
> 
>>>       @@ -2525,8 +2542,12 @@ get_binary_paths_from_kernel_dist(const string&	dist_root,
>>>        /// @param t time to trace time spent in each step.
>>>        ///
>>>        /// @param env the environment to create the corpus_group in.
>>>       -static void
>>>       -maybe_load_vmlinux_dwarf_corpus(corpus::origin      origin,
>>>       +///
>>>       +/// @return the status of the loading.  If it's
>>>       +/// abigail::elf_reader::STATUS_UNKNOWN, then it means nothing was
>>>       +/// done, meaning the function got out early.
>>>       +static abigail::elf_reader::status
>>>       +maybe_load_vmlinux_dwarf_corpus(corpus::origin&     origin,
>>>                                        corpus_group_sptr&  group,
>>>                                        const string&       vmlinux,
>>>                                        vector<string>&     modules,
>>>       @@ -2539,10 +2560,11 @@ maybe_load_vmlinux_dwarf_corpus(corpus::origin      origin,
>>>                                        timer&              t,
>>>                                        environment_sptr&   env)
>>>        {
>>>       +  abigail::elf_reader::status status = abigail::elf_reader::STATUS_UNKNOWN;
>>>       +
>>>          if (!(origin & corpus::DWARF_ORIGIN))
>>>       -    return;
>>>       +    return status;
>>>
>>>       -  abigail::elf_reader::status status = abigail::elf_reader::STATUS_OK;
>>>          dwarf_reader::read_context_sptr ctxt;
>>>          ctxt =
>>>           dwarf_reader::create_read_context(vmlinux, di_roots, env.get(),
>>>       @@ -2569,6 +2591,7 @@ maybe_load_vmlinux_dwarf_corpus(corpus::origin      origin,
>>>             << vmlinux << "' ...\n" << std::flush;
>>>
>>>          // Read the vmlinux corpus and add it to the group.
>>>       +  status = abigail::elf_reader::STATUS_OK;
>>>          t.start();
>>>          read_and_add_corpus_to_group_from_elf(*ctxt, *group, status);
>>>          t.stop();
>>>       @@ -2579,7 +2602,7 @@ maybe_load_vmlinux_dwarf_corpus(corpus::origin      origin,
>>>             << t << "\n";
>>>
>>
>> At this point, if the `vmlinux' file doesn't have DWARF information, the
>> `status' returned by `maybe_load_vmlinux_dwarf_corpus' will have the
>> `STATUS_DEBUG_INFO_NOT_FOUND' bit set, but that is not checked here.  Since
>> the vmlinux corpus was already added to the group in the
>> `read_debug_info_into_corpus' function, processing continues with the
>> modules without the main corpus information.
> I see.  You are right.  Yes, the debug info is not found in vmlinux and yet the
> whole thing continues, collecting just information from the ELF symbol
> table, basically, and from the modules.  Pretty useless, I guess.
> 
>> Is this the expected behaviour?
> 
> Hehe, no :-)
> 
> I guess maybe the caller should look for the .debug_info section in the
> vmlinux binary (or for split debug info) before even calling
> maybe_load_vmlinux_dwarf_corpus.  If there is no debug info, then the
> function should proceed directly to calling
> maybe_load_vmlinux_ctf_corpus?  What do you think?
> 

Yes, that sounds good! I would just like to know your opinion about what
should happen when neither DWARF nor CTF debug information is found. The
current behavior is to extract the symbol information and compare it, so
which symbol information should I use: DWARF::symtab or CTF::symtab?
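To make the question concrete, here is a rough sketch of the fallback order
being discussed: try DWARF, retry with CTF when DWARF reports no debug info
but ELF symbols were found (the same test as in the `c1_status' change quoted
further below), and otherwise keep a symbol-table-only result. It is only a
model, not libabigail code: the `status' bits merely reuse the names from the
patch with made-up values, and `load_with_fallback', `front_end' and the
reader callbacks are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical status bits.  The names mirror those used in the patch
// (abigail::elf_reader::status), but the values here are made up.
enum status : unsigned
{
  STATUS_UNKNOWN = 0,
  STATUS_OK = 1u << 0,
  STATUS_DEBUG_INFO_NOT_FOUND = 1u << 1,
  STATUS_NO_SYMBOLS_FOUND = 1u << 2,
};

// Which front-end the corpus ended up coming from.
enum class front_end { dwarf, ctf, symtab_only, none };

// A reader loads PATH and returns a bitmask of status flags.
using reader_fn = std::function<unsigned(const std::string&)>;

front_end
load_with_fallback(const std::string& path,
                   const reader_fn& read_dwarf,
                   const reader_fn& read_ctf)
{
  unsigned s = read_dwarf(path);
  // DWARF debug info was found: nothing else to do.
  if (!(s & STATUS_DEBUG_INFO_NOT_FOUND))
    return front_end::dwarf;
  // No debug info and no ELF symbols either: give up.
  if (s & STATUS_NO_SYMBOLS_FOUND)
    return front_end::none;
  // No DWARF, but ELF symbols exist: try CTF.
  unsigned c = read_ctf(path);
  if (!(c & STATUS_DEBUG_INFO_NOT_FOUND))
    return front_end::ctf;
  // Neither DWARF nor CTF: only ELF symbol information is left.
  return front_end::symtab_only;
}
```

With a DWARF reader that reports STATUS_DEBUG_INFO_NOT_FOUND and a CTF reader
that succeeds, this returns front_end::ctf. The symtab_only branch is exactly
where the DWARF::symtab vs CTF::symtab choice would have to be made.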

An additional use case is whether the tools `kmidiff' and `abidiff' could
compare a DWARF IR with a CTF IR. I exercised this with some libraries and
binaries using `abidiff'. It found a couple of problems in the CTF reader
(already fixed) and three possible issues in the DWARF reader (I will submit
information and test cases for those), but in general it seems to work!
Before continuing, though, I would like to know your thoughts.

> [...]
> 
>>> I have also introduced a new function called
>>> tools_utils::dir_contains_ctf_archive to look for a file that ends with
>>> ".ctfa".  This abstracts away the search for "vmlinux.ctfa" as I wasn't
>>> sure if those archives could exist for normal (non-kernel) binaries as
>>> well:
>>
>> Ohh, perfect! I'll use it in the CTF reader to locate the Linux archive file.
>> No, there is no `.ctfa' file for non-kernel binaries; instead they have a
>> `.ctf' section.  I could implement a similar function that looks for the
>> `.ctf' section using the ELF helpers.
> 
> Right, abg-elf-helpers.h does have find_section_by_name.  That can be
> used to look for the debug info, I guess.  However, we also need to
> support finding the debug info when it's split out into a different
> place, like when it's packaged in a separate debug-info package.  Today,
> abg-dwarf-reader.cc uses dwfl (dwarf front-end library, I believe) to do
> this, as dwfl knows how to find the DWARF debug info, wherever it is.
> 

Ohh, you are right. I saw `find_alt_debug_info'. In the CTF front-end we don't
use dwfl; that is done by `find_alt_debuginfo', which reads the content of the
`.gnu_debuglink' section directly. I'm not sure the CTF reader can use
`find_alt_debug_info', because it calls dwfl_* functions that seem to be
DWARF-specific, so I'll investigate.
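For reference, the `.gnu_debuglink' section that `find_alt_debuginfo' reads
has a simple layout: the NUL-terminated name of the debug file, zero padding
up to a 4-byte boundary, then a 4-byte CRC32 of that file. The sketch below
parses only that layout. `debuglink' and `parse_gnu_debuglink' are
hypothetical names, the CRC is assumed to be in host byte order, and it does
not search the usual debug directories for the file, which is the part dwfl
takes care of for the DWARF reader.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// Contents of a .gnu_debuglink section.
struct debuglink
{
  std::string file_name;  // name of the separate debug file
  std::uint32_t crc;      // CRC32 of that debug file
};

// Parse the raw bytes of a .gnu_debuglink section.  Layout: NUL-terminated
// file name, zero padding to the next 4-byte boundary, 4-byte CRC32.
// The CRC is stored in the target's byte order; this sketch assumes it
// matches the host's.
std::optional<debuglink>
parse_gnu_debuglink(const std::vector<unsigned char>& sec)
{
  if (sec.empty())
    return std::nullopt;
  const void* nul = std::memchr(sec.data(), '\0', sec.size());
  if (!nul)
    return std::nullopt;
  std::size_t name_len = static_cast<std::size_t>(
    static_cast<const unsigned char*>(nul) - sec.data());
  // The CRC starts after the name and its NUL, rounded up to 4 bytes.
  std::size_t crc_off = (name_len + 1 + 3) & ~std::size_t(3);
  if (crc_off + 4 > sec.size())
    return std::nullopt;
  debuglink link;
  link.file_name.assign(reinterpret_cast<const char*>(sec.data()), name_len);
  std::memcpy(&link.crc, sec.data() + crc_off, sizeof link.crc);
  return link;
}
```

For a name such as "libfoo.debug" (12 characters), the NUL lands at offset 12
and the CRC starts at offset 16.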

> You can see how this is done in read_context::load_debug_info(), in
> abg-dwarf-reader.cc, around line 2654.  Look for the comment "Look for
> split debuginfo files".  Basically, dwfl_module_getdwarf returns a
> pointer to the debug info it's found, if it has found one.  I think we
> should split this logic out to make it re-usable somehow.
> 
> If you think this is worthwhile, I can think about splitting it out and
> sticking it into elf-helpers, maybe?
> 

It would be really useful! But I'm not sure `dwfl_module_getdwarf' can
operate on an ELF file without `.debug_*' sections.

> 
>> and it can be used in `load_corpus_and_write_abixml', implementing an
>> algorithm similar to the one we use when processing the kernel: look for
>> DWARF information and, if it is not present, test whether the `.ctf'
>> section is in the ELF file and, if so, extract it using the CTF reader.
>> To avoid duplicating this call:
>>
>> abigail::ctf_reader::read_context_sptr ctxt
>> 		= abigail::ctf_reader::create_read_context(opts.in_file_path,
>> 							   opts.prepared_di_root_paths,
>> 							   env.get());
>>
>> (one for `opts.use_ctf' and another when `STATUS_DEBUG_INFO_NOT_FOUND' is returned).
>> WDYT?
> 
> Yes, along with the testing for the presence of DWARF debug info, that
> might be useful, indeed.
> 

Agreed.
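A rough illustration of deciding the front-end ahead of time, by checking for
a `.debug_info' section first and a `.ctf' section second. To keep it
self-contained it walks the section header table by hand with the <elf.h>
types instead of using the elf-helpers functions. `file_has_section' and
`pick_front_end' are hypothetical names; it only handles 64-bit ELF files in
host byte order, and it ignores split debug info, so a binary whose DWARF
lives in a separate debug-info package would be sent to CTF or to the
symtab-only path.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <elf.h>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Return true if the 64-bit ELF file at PATH has a section named NAME.
// Sketch only: host byte order is assumed, and extended section
// numbering (SHN_XINDEX) is not supported.
bool
file_has_section(const std::string& path, const std::string& name)
{
  std::ifstream in(path, std::ios::binary);
  std::vector<char> buf((std::istreambuf_iterator<char>(in)),
                        std::istreambuf_iterator<char>());
  if (buf.size() < sizeof(Elf64_Ehdr)
      || std::memcmp(buf.data(), ELFMAG, SELFMAG) != 0
      || buf[EI_CLASS] != ELFCLASS64)
    return false;

  Elf64_Ehdr eh;
  std::memcpy(&eh, buf.data(), sizeof eh);
  if (eh.e_shoff == 0 || eh.e_shstrndx >= eh.e_shnum
      || eh.e_shoff + std::uint64_t(eh.e_shnum) * sizeof(Elf64_Shdr)
           > buf.size())
    return false;

  // Copy the I-th section header out of the buffer.
  auto shdr = [&](std::size_t i) {
    Elf64_Shdr s;
    std::memcpy(&s, buf.data() + eh.e_shoff + i * sizeof s, sizeof s);
    return s;
  };

  // Section names live in the section-header string table.
  const Elf64_Shdr names = shdr(eh.e_shstrndx);
  for (std::size_t i = 0; i < eh.e_shnum; ++i)
    {
      std::uint64_t off = names.sh_offset + shdr(i).sh_name;
      // Compare NAME including its terminating NUL, staying in bounds.
      if (off + name.size() + 1 <= buf.size()
          && std::memcmp(buf.data() + off, name.c_str(), name.size() + 1) == 0)
        return true;
    }
  return false;
}

enum class debug_format { dwarf, ctf, symtab_only };

// Prefer DWARF, then CTF, then fall back to ELF symbols only.
debug_format
pick_front_end(const std::string& path)
{
  if (file_has_section(path, ".debug_info"))
    return debug_format::dwarf;
  if (file_has_section(path, ".ctf"))
    return debug_format::ctf;
  return debug_format::symtab_only;
}
```

A real implementation would more likely open the file once and reuse the
handle for both lookups; this version rereads it per query for simplicity.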

> [...]
> 
>>> But then, it's here that we are going to inspect c1_status to see if
>>> loading DWARF failed.  If it failed, then we'll try to load CTF.  So,
>>> here is the change I am adding to the process of loading the corpus c1:
>>>
>>>
>>> @@ -1205,6 +1208,36 @@ main(int argc, char* argv[])
>>>                    set_suppressions(*ctxt, opts);
>>>                    abigail::dwarf_reader::set_do_log(*ctxt, opts.do_log);
>>>                    c1 = abigail::dwarf_reader::read_corpus_from_elf(*ctxt, c1_status);
>>> +
>>> +#ifdef WITH_CTF
>>> +		if (// We were not instructed to use CTF ...
>>> +		    !opts.use_ctf
>>
>> This is always true, because we are in the else block of `opts.use_ctf'.
> 
> Right.
> 
>>
>>> +		    // ... and yet, no debug info was found ...
>>> +		    && (c1_status & STATUS_DEBUG_INFO_NOT_FOUND)
>>> +		    // ... but we found ELF symbols ...
>>> +		    && !(c1_status & STATUS_NO_SYMBOLS_FOUND))
>>> +		  {
>>> +		    string basenam, dirnam;
>>> +		    base_name(opts.file1, basenam);
>>> +		    dir_name(opts.file1, dirnam);
>>> +		    // ... the input file seems to contain CTF
>>> +		    // archive, so let's try to see if the file
>>> +		    // contains a CTF archive, who knows ...
>>> +		    if (dir_contains_ctf_archive(dirnam, basenam))
>>
>> Non-kernel binaries contain a `.ctf' section instead of a `.ctfa' file,
>> so I can implement a `file_contains_ctf_section' function to test whether
>> a file is a valid input for the CTF reader.
> 
> Great, thanks.
> 
> OK, I'll look into trying to put together some facility to look for the
> presence of DWARF debug info, so tools can decide ahead of time what
> front-end to use.
> 

Really nice!
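On the kernel side, the `.ctfa' lookup done by
`dir_contains_ctf_archive(dirnam, basenam)' in the patch could be expressed
with std::filesystem roughly as follows. This is a hypothetical stand-in:
`dir_has_ctf_archive' is not the function from the patch, and treating a
non-empty basename as "look only for <basename>.ctfa" is an assumption about
what the second parameter is for. For non-kernel binaries, the `.ctf' section
check discussed above applies instead.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Return true if DIR contains a regular file whose name ends in ".ctfa"
// (e.g. "vmlinux.ctfa").  If BASENAME is non-empty, only the file
// BASENAME + ".ctfa" is accepted.  Errors (missing or unreadable
// directory) are reported as "not found".
bool
dir_has_ctf_archive(const fs::path& dir, const std::string& basename = "")
{
  std::error_code ec;
  if (!basename.empty())
    return fs::is_regular_file(dir / (basename + ".ctfa"), ec);

  for (fs::directory_iterator it(dir, ec), end; !ec && it != end;
       it.increment(ec))
    if (it->is_regular_file(ec) && it->path().extension() == ".ctfa")
      return true;
  return false;
}
```

Using the std::error_code overloads keeps a missing kernel build directory
from throwing; the caller then simply falls back to the other front-ends.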

> [...]
> 
> Cheers,
> 

Regards,
guillermo
