Re: [PATCH v2] Add regression tests for ctf reading

public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed

From: Ben Woodard <woodard@redhat.com>
To: "Jose E. Marchesi" <jose.marchesi@oracle.com>
Cc: Ben Woodard via Libabigail <libabigail@sourceware.org>,
	Dodji Seketeli <dodji@seketeli.org>
Subject: Re: [PATCH v2] Add regression tests for ctf reading
Date: Tue, 30 Nov 2021 19:18:25 -0800	[thread overview]
Message-ID: <E464076F-8D44-49FD-A985-1101196255A6@redhat.com> (raw)
In-Reply-To: <87bl28sp9j.fsf@oracle.com>

sorry it is taking me so long to get back to you. American Thanksgiving holiday.

> On Nov 24, 2021, at 10:50 PM, Jose E. Marchesi <jose.marchesi@oracle.com> wrote:
> 
>>> 
>>> On Nov 24, 2021, at 8:36 AM, Dodji Seketeli <dodji@seketeli.org> wrote:
>>> 
>>> Hello,
>>> 
>>> [...]
>>> 
>>> Thanks for working on this.  Nice patch, by the way!  I like its
>>> direction.
>>> 
>>> I have a few comments and I believe that when they are addressed, we'll
>>> be able to apply the patch.
>>> 
>>> [...]
>>> 
>>>> Dependencies/limitations:
>>>> 
>>>> * It was worked on the top of the following patches:
>>>> https://sourceware.org/pipermail/libabigail/2021q4/003853.html
>>>> 
>>>> * Some CTF tests were *disabled* because it generates the XML ABI
>>>> corpus with *same information* but the XML nodes *are not* always
>>>> in the *same order*, so using diff command fails. Details here:
>>>> https://sourceware.org/pipermail/libabigail/2021q4/003824.html
>>> 
>>> In those cases where the abixml file generated from CTF is different
>>> from the one generated from DWARF, I think we should have two
>>> different reference abixml files to diff against.  No CTF test should
>>> be disabled, I think.
>> 
>> Here is a somewhat deeper question that I think needs to be
>> considered. I’ve generally referred to it as “DWARF Idioms” but this
>> email makes me think that it is even larger than that.
>> 
>> For my work, I need libabigail to generate an abstract notion of the
>> ABI corpus. How it constructs that abstract notion of the ABI needs to
>> be independent of the producer. Think of it this way, say we take the
>> same compiler and have it compile the same library producing both CTF
>> and DWARF, the ABI of the library doesn’t change. Since it is
>> literally the same object, the program text is same. Therefore the ABI
>> is unquestionably the the same. Any difference reported by libabigail
>> therefore is a problem with libabigail. It is not taking the source
>> material and abstracting it enough into the ABI artifacts to separate
>> the artifacts from the implementation.
>> 
>> So I kind of believe that we need to look more deeply into WHY the CTF
>> and DWARF are not comparing as equivalent and begin the process of
>> filing the compiler bugs when we need to, and doing what is necessary
>> to abstract the ABI from the source material that libabigail used to
>> construct its IR of the ABI corpus from.
>> 
>> So, I must say that I disagree with both dodji’s approach here and to
>> a lesser extent Guillermo’s approach of disabling the tests. I think
>> that the tests where the CTF doesn’t match the DWARF should be
>> investigated and when necessary marked “xfail” with a note citing
>> their individual cause.
>> 
>> I think that what we need to work towards is:
>> abidw produces the same output (or more precisely libabigail produces
>> the same IR) whether you compile with -gdwarf-3 -gdwarf-4 -gdwarf-5
>> -gsplit-dwarf -gctf
>> also for the most part the ABI should not change between compiler
>> versions. There may be a few cases that we need to look into where the
>> compiler actually breaks ABI and of course libabigail should flag
>> those but I would assert that libabigail needs to abstract its IR of
>> the ABI enough that compiler version changes that don’t actually
>> change the ABI of the ELF object are not reported as ABI breaks. It
>> currently does pretty well at this at the moment.
>> Then once that foundation is built, being able to abstract the ABI IR
>> enough that differences in toolchains e.g. LLVM vs GCC are not flagged
>> as changes in the object’s ABI. This is important to provide people
>> with a tool that will allow them to mix toolchains within a project to
>> achieve optimal code.
> 
> I'm so glad you are raising these concerns.
> 
> I agree with you that ideally abidw should produce "same output" (read
> equivalent output ABI-wise) regardless of the origin of the ABI
> corpus... but then:

I’m not sure it should be the same output. I think that textual diffing of the output probably won’t work especially for the stuff that I have been working on. However, I do believe that the comparison of the corpora once they are loaded into the IR should be the same. Even if anonymous types do not come out in the same order or one debug format is read vs another, I think that they should compare the same in the IR.

> 
> Is the relative order of the <*-decl > nodes in the ABI XML part of the
> ABI?

Not really worried about that as long as the IR’s compare.
> 
> Is the location information (present in DWARF, not present in CTF nor
> BTF) part of the ABI?

Yeah there is ancillary information that is useful to have when printing differences but it should not be part of the ABI artifact comparison. For example. Say I add a comment to a source file. The decl line will change but the ABI of the ELF will not. I argue that the comparison of the ABI should be that they are the same. 
> 
> Are the names assigned to internally-generated types (such as the
> underlying type of an enumerator) part of the ABI?

To me that seems like an implementation detail of the library. It is something of course that an ABI checker needs to get correct even though it is difficult. If someone adds a new enum type but it is not exported, it may still get assigned an anonymous enum in the type system and that may jiggle the type numbering but the difference engine shouldn’t flag it as an ABI change. Dodji tracked down and fixed a huge number of these bugs for me over the years.
> 
> Are the newlines and blank spaces after XML marks part of the ABI?
Comparison needs to be done once the ABIXML is loaded into the IR.

> 
> I am no expert in ABI analysis, but I would say the answers in all these
> cases is `no'.  The internal libabigail ABI comparison algorithms would
> not be impacted by these things, right?
> 
> It seems to me that the main problems here are that ABI XML, as it is,
> conveys way too much information (useful, no doubt, but superfluous
> ABI-wise) and that `diff' is used to perform a textual comparison of
> these files.

I agree. I would not do textual comparison.
One thing that I think makes this task well suited to XML rather than something more regularly structured is that the XML is allows additional attributes such as the aforementioned decl file and decl line which then can be ignored when doing ABI comparison. 
> 
> Textual comparison would work if we defined a "canonicalized" version of
> ABI XML.  Translating from ABI XML to canonical ABI XML would involve:
> 
> 1) Strip _everything_ that is not strictly part of the ABI (such as
>   location information, anonymous type names, etc).
> 
> 2) Sort the XML elements in a predefined way.  Whatever sort criteria is
>   used, it would need to be defined only by ABI-constituent elements,
>   such as the names of non-anonymous types.  And of course the criteria
>   itself must not change.
> 
> The only reason you could operate with ABI XML files heavily annotated
> with non-ABI determining information (such as the loc info and
> superfluous and arbitrary names) is having an unique source for the ABI
> info.  That's no longer the case: we are using CTF, the Google chaps are
> using BTF.
> 
> The above considered, I think we have three options now:
> 
> a) To define that canonical ABI XML and write an abistrip tool that does
>   the translation, maybe add --canonical options to the tools that
>   generate ABI XML.  Then use `diff' in the testsuite.
> 
>   or
> 
> b) To use an external ABI comparison tool in the testsuite, non-textual,
>   that consumes ABI XML.
> 
This would be my preference.
All comparisons should be done at the level of the IR.
>   or
> 
> c) To use the "internal" approach I already suggested in another email,
>   having two different sets of tests: one set to test the front-ends
>   that assume the libabigail comparison algorithm works properly,
>   another set to test the comparison algorithm that uses either
>   hand-written input files or assumes the front-ends work properly.

I kind of agree in principle however, I don’t want to write those tests and I am pretty certain Dodji doesn’t want to write tests that way plus if he started refactoring that way, I would begin nagging him about all the other problems that I have found that need attention first. ;-) I generally feel like I’m breaking his back as is.

> 
> I personally think (c) would be best.

next prev parent reply	other threads:[~2021-12-01  3:18 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-03 22:41 Regression tests for ctf reader: Avoid duplicating files Guillermo Martinez
2021-11-04  8:27 ` Jose E. Marchesi
2021-11-09  2:52   ` Guillermo Martinez
2021-11-09 14:47     ` Dodji Seketeli
2021-11-10 12:06       ` Jose E. Marchesi
2021-11-11 15:16       ` Guillermo Martinez
2021-11-15 13:49 ` [PATCH] Add regression tests for ctf reading Guillermo E. Martinez
2021-11-17  8:33   ` Dodji Seketeli
2021-11-18  4:02     ` Guillermo Martinez
2021-11-18  4:16     ` Guillermo E. Martinez
2021-11-18 13:52       ` Jose E. Marchesi
2021-11-18 15:14         ` Guillermo Martinez
2021-11-22 21:33       ` [PATCH v2] " Guillermo E. Martinez
2021-11-23 15:48         ` Jose E. Marchesi
2021-11-23 18:54           ` Guillermo Martinez
2021-11-25 10:40             ` Dodji Seketeli
2021-11-25 21:03               ` Guillermo Martinez
2021-11-26 10:02                 ` Dodji Seketeli
2021-11-24 16:36         ` Dodji Seketeli
2021-11-24 18:52           ` Guillermo Martinez
2021-11-26 11:23             ` Dodji Seketeli
2021-11-26 13:01               ` Jose E. Marchesi
2021-11-26 13:37               ` Guillermo Martinez
2021-11-24 19:09           ` Ben Woodard
2021-11-25  0:13             ` Ben Woodard
2021-11-25  6:50             ` Jose E. Marchesi
2021-11-25  9:47               ` Giuliano Procida
2021-12-01  3:18               ` Ben Woodard [this message]
2021-11-25  9:34             ` Giuliano Procida
2021-11-25 21:56               ` Ben Woodard
2021-11-26 10:27             ` Dodji Seketeli
2021-12-01  2:13               ` Ben Woodard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=E464076F-8D44-49FD-A985-1101196255A6@redhat.com \
    --to=woodard@redhat.com \
    --cc=dodji@seketeli.org \
    --cc=jose.marchesi@oracle.com \
    --cc=libabigail@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).