From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gprocida@google.com>
Received: from mail-ua1-x92c.google.com (mail-ua1-x92c.google.com
 [IPv6:2607:f8b0:4864:20::92c])
 by sourceware.org (Postfix) with ESMTPS id 89A3B3858C39
 for <libabigail@sourceware.org>; Thu, 25 Nov 2021 09:47:58 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 89A3B3858C39
Received: by mail-ua1-x92c.google.com with SMTP id o1so11072070uap.4
 for <libabigail@sourceware.org>; Thu, 25 Nov 2021 01:47:58 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc:content-transfer-encoding;
 bh=SadvC10er3Y3XkcTvPpiIwdLAwGdz2u7O2wb9o41Gzo=;
 b=BOA/h5ZweyRTEA0LqFHjxN+pUTJTHKZHQlMyPmIs8EIA38reKA5XgixAED1X2tnqBN
 l2CFtQuVmlxxvTH8YfO2IcI58J0g7nLQj4BUEoh/pJ4PEvT6VVyZ4zz7YTFuqFqor0pu
 6tL+RBw1aXrgLXUO3CThxyVE7je+XkEoYCKQoyoTnRIEq+5a8aX+c6rdb2GSv6hIbPbp
 hGNQeUOUWJPZSL4IASyx9IzRvgmwjQTzI7ctbbU3oWkXxN/zZ3vOA+o2Y5NkWtkmtz5H
 z3txgU7L3zi5CgrzV20gJ3NUwG055eZdPcrgeGj4eYAw8BTagAENXxB5yFc8NXy2/2ts
 Z+wA==
X-Gm-Message-State: AOAM5317qKhCYmSDTCzcKA3zImQKV/vr7zOkGSBUJqcyk81iI2rfraMV
 Yy9g4dENWEqGfJp2PKeV8xzYXMyYxrYJfKn1yd8htGFLpa52Hw==
X-Google-Smtp-Source: ABdhPJy0nRqMo9K5N5V9A4YV411RHX8hh7vuuedVqYREB9VF7IeON+c6fsumMXC00hQScbOitf8eKR3FKngjxF2LKgw=
X-Received: by 2002:a05:6102:390c:: with SMTP id
 e12mr7306797vsu.10.1637833677299; 
 Thu, 25 Nov 2021 01:47:57 -0800 (PST)
MIME-Version: 1.0
References: <20211118041625.622972-1-guillermo.e.martinez@oracle.com>
 <20211122213353.2456208-1-guillermo.e.martinez@oracle.com>
 <87ee75tsst.fsf@seketeli.org>
 <C42EAD6E-B2E3-4373-80AE-D665D6B94C43@redhat.com>
 <87bl28sp9j.fsf@oracle.com>
In-Reply-To: <87bl28sp9j.fsf@oracle.com>
From: Giuliano Procida <gprocida@google.com>
Date: Thu, 25 Nov 2021 09:47:20 +0000
Message-ID: <CAGvU0Hns4_exmqjrQEJOjt45_RL1dTMqoiVO9wgNRbhCR1DbCg@mail.gmail.com>
Subject: Re: [PATCH v2] Add regression tests for ctf reading
To: "Jose E. Marchesi" <jose.marchesi@oracle.com>
Cc: Ben Woodard via Libabigail <libabigail@sourceware.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-21.9 required=5.0 tests=BAYES_00, DKIMWL_WL_MED,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,
 RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, USER_IN_DEF_DKIM_WL,
 USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: libabigail@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Mailing list of the Libabigail project <libabigail.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libabigail>,
 <mailto:libabigail-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libabigail/>
List-Help: <mailto:libabigail-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libabigail>,
 <mailto:libabigail-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Thu, 25 Nov 2021 09:48:00 -0000

Hi Jose.

On Thu, 25 Nov 2021 at 06:50, Jose E. Marchesi via Libabigail
<libabigail@sourceware.org> wrote:
>
>
> >> On Nov 24, 2021, at 8:36 AM, Dodji Seketeli <dodji@seketeli.org> wrote=
:
> >>
> >> Hello,
> >>
> >> [...]
> >>
> >> Thanks for working on this.  Nice patch, by the way!  I like its
> >> direction.
> >>
> >> I have a few comments and I believe that when they are addressed, we'l=
l
> >> be able to apply the patch.
> >>
> >> [...]
> >>
> >>> Dependencies/limitations:
> >>>
> >>> * It was worked on the top of the following patches:
> >>> https://sourceware.org/pipermail/libabigail/2021q4/003853.html
> >>>
> >>> * Some CTF tests were *disabled* because it generates the XML ABI
> >>> corpus with *same information* but the XML nodes *are not* always
> >>> in the *same order*, so using diff command fails. Details here:
> >>> https://sourceware.org/pipermail/libabigail/2021q4/003824.html
> >>
> >> In those cases where the abixml file generated from CTF is different
> >> from the one generated from DWARF, I think we should have two
> >> different reference abixml files to diff against.  No CTF test should
> >> be disabled, I think.
> >
> > Here is a somewhat deeper question that I think needs to be
> > considered. I=E2=80=99ve generally referred to it as =E2=80=9CDWARF Idi=
oms=E2=80=9D but this
> > email makes me think that it is even larger than that.
> >
> > For my work, I need libabigail to generate an abstract notion of the
> > ABI corpus. How it constructs that abstract notion of the ABI needs to
> > be independent of the producer. Think of it this way, say we take the
> > same compiler and have it compile the same library producing both CTF
> > and DWARF, the ABI of the library doesn=E2=80=99t change. Since it is
> > literally the same object, the program text is same. Therefore the ABI
> > is unquestionably the the same. Any difference reported by libabigail
> > therefore is a problem with libabigail. It is not taking the source
> > material and abstracting it enough into the ABI artifacts to separate
> > the artifacts from the implementation.
> >
> > So I kind of believe that we need to look more deeply into WHY the CTF
> > and DWARF are not comparing as equivalent and begin the process of
> > filing the compiler bugs when we need to, and doing what is necessary
> > to abstract the ABI from the source material that libabigail used to
> > construct its IR of the ABI corpus from.
> >
> > So, I must say that I disagree with both dodji=E2=80=99s approach here =
and to
> > a lesser extent Guillermo=E2=80=99s approach of disabling the tests. I =
think
> > that the tests where the CTF doesn=E2=80=99t match the DWARF should be
> > investigated and when necessary marked =E2=80=9Cxfail=E2=80=9D with a n=
ote citing
> > their individual cause.
> >
> > I think that what we need to work towards is:
> > abidw produces the same output (or more precisely libabigail produces
> > the same IR) whether you compile with -gdwarf-3 -gdwarf-4 -gdwarf-5
> > -gsplit-dwarf -gctf
> > also for the most part the ABI should not change between compiler
> > versions. There may be a few cases that we need to look into where the
> > compiler actually breaks ABI and of course libabigail should flag
> > those but I would assert that libabigail needs to abstract its IR of
> > the ABI enough that compiler version changes that don=E2=80=99t actuall=
y
> > change the ABI of the ELF object are not reported as ABI breaks. It
> > currently does pretty well at this at the moment.
> > Then once that foundation is built, being able to abstract the ABI IR
> > enough that differences in toolchains e.g. LLVM vs GCC are not flagged
> > as changes in the object=E2=80=99s ABI. This is important to provide pe=
ople
> > with a tool that will allow them to mix toolchains within a project to
> > achieve optimal code.
>
> I'm so glad you are raising these concerns.
>
> I agree with you that ideally abidw should produce "same output" (read
> equivalent output ABI-wise) regardless of the origin of the ABI
> corpus... but then:
>
> Is the relative order of the <*-decl > nodes in the ABI XML part of the
> ABI?
>

No. And abidiff really doesn't care, unless there are duplicates in
the data or bugs in the code.

> Is the location information (present in DWARF, not present in CTF nor
> BTF) part of the ABI?
>

No. I'm pretty sure abidiff doesn't care.

There are some subtle source-level issues here, but not ABI ones.

struct {} x;
struct {} y;

x and y have different (unique) types. I spent a while, a long while
ago, worrying about this, but I think it's a non-issue. Just treat
them as having the same type.

> Are the names assigned to internally-generated types (such as the
> underlying type of an enumerator) part of the ABI?
>

They should not be, but can result in ABI diff reports. abidiff does
distinguish two levels of difference and has different exit codes. So,
at the very least, this should not be an incompatible diff.

More significantly, for C++, the size, signedness and identity of the
enum-underlying are both under user control and (I believe) can
influence which overloads of functions get called via implicit
conversion. This is an API (compile-time) rather than an ABI
(run-time) issue.

> Are the newlines and blank spaces after XML marks part of the ABI?
>

No.

> I am no expert in ABI analysis, but I would say the answers in all these
> cases is `no'.  The internal libabigail ABI comparison algorithms would
> not be impacted by these things, right?
>

Ideally not.

> It seems to me that the main problems here are that ABI XML, as it is,
> conveys way too much information (useful, no doubt, but superfluous
> ABI-wise) and that `diff' is used to perform a textual comparison of
> these files.
>
> Textual comparison would work if we defined a "canonicalized" version of
> ABI XML.  Translating from ABI XML to canonical ABI XML would involve:
>

0) First having completely stable type ids. --type-id-style hash goes
some way towards this.

> 1) Strip _everything_ that is not strictly part of the ABI (such as
>    location information, anonymous type names, etc).
>

and function parameter names.

You also missed out:

- remove all elements unreachable from the symbol "roots"
- resolve duplicate elements (we tend to get quite a lot)

But if *really* you only care about ABIs and not APIs, we should be
stripping member names and just leaving offsets. I don't think we
should do this!

abidw --no-show-locs strips out source location information.

> 2) Sort the XML elements in a predefined way.  Whatever sort criteria is
>    used, it would need to be defined only by ABI-constituent elements,
>    such as the names of non-anonymous types.  And of course the criteria
>    itself must not change.
>

I've written a tool that does 2) as best as it can and does some of 1).

As yet, it cannot strip function names (we think we get spurious XML
diffs with them due to Clang LTO), but that would be trivial to add.
It can strip column, line or all location information, but we haven't
yet enabled this.

> The only reason you could operate with ABI XML files heavily annotated
> with non-ABI determining information (such as the loc info and
> superfluous and arbitrary names) is having an unique source for the ABI
> info.  That's no longer the case: we are using CTF, the Google chaps are
> using BTF.
>

As mentioned, we haven't yet dared strip source information
completely, it is probably worth keeping at least the file path to
assist in tracking down ABI breaks.

We are not really using BTF, but the code to read it is working and
tested. Unfortunately, BTF format / pahole -J:

- don't do C++
- don't handle global variables properly (implementation)
- model arithmetic types poorly (has improved in the last year)
- have fixed (32-bit mostly) size limits for various things
- flatten multidimensional arrays (implementation choice, not format limita=
tion)

So even on our test suite of small cases, BTF and ABI XML diff reports
for the same code can be very different.

> The above considered, I think we have three options now:
>
> a) To define that canonical ABI XML and write an abistrip tool that does
>    the translation, maybe add --canonical options to the tools that
>    generate ABI XML.  Then use `diff' in the testsuite.
>

There are other abidw options that can sometimes affect XML output
(load all types, etc.). "Canonical" would have to be defined to
include the "right" set of options. However, the very existence of
these options suggests there may not be a unique "right" set.

It's also more than an external tool can do alone, due to type ids, as
mentioned above. And you might spend more time than you like looking
at IR vs XML issues.

One other thing I have considered to help make ABI representations
"more" canonical, is to resolve all typedefs during post-processing.
This has various advantages and disadvantages and I've not yet looked
at the idea in more detail.

>    or
>
> b) To use an external ABI comparison tool in the testsuite, non-textual,
>    that consumes ABI XML.
>

This is the current approach for abidw round-trip and fixed-input
tests within the existing libabigail test suite.

There is no standard for ABI diff output and there's not even
agreement as to what constitutes an ABI difference. abidiff supports
different opinions with arguments like --harmless and different output
formats with arguments like --leaf-changes-only.

We have a separate tool that can diff ABI XML, but it only understands
the C subset. This may be useful for cross-checking where CTF is
involved. There are no knobs (yet?) to control whether something is a
difference or not.

>    or
>
> c) To use the "internal" approach I already suggested in another email,
>    having two different sets of tests: one set to test the front-ends
>    that assume the libabigail comparison algorithm works properly,
>    another set to test the comparison algorithm that uses either
>    hand-written input files or assumes the front-ends work properly.
>

libabigail's IR and XML are not in 1-1 correspondence, perhaps mostly
due to historical reasons, and this affects our confidence w.r.t.
testing. The XML writer does quite a bit of work in determining what
to output and when to output it and the XML reader knows how to merge
definitions of structs that are spread across multiple elements.

Matthias and I discussed this over a year ago when we put together the
tests for the (then) BTF reader and diff tool our intern was working
on. We decided the best thing to do was to take small pairs of files
and compile them with a fixed compiler version and default options to
produce object files.

For XML: Extract ABIs. Then ABI-compare all 4 ways ({.o,.xml}^2).
Verify all outputs (XML and reports) against stored copies (a single
ABI and diff).
For BTF: Extract ABIs. Dump ABI as text. ABI-compare 1 way. Verify all
text outputs against stored copies.

At the time we noticed that using Clang instead of GCC gave different
results for some cases. Since then, we have found bugs in Clang, lack
of DWARF 5 support in various places etc.

> I personally think (c) would be best.

I agree, but it may not be easy to do "properly".

Doing this with multiple compilers, optimisation levels, LTO flavours,
DWARF versions will result in an explosion of cases. Then there is
varying target architecture to properly exercise the toolchains. We
haven't tried any of this, except when debugging problems.

If you would like a nice example that breaks even the 4-way abidiff
comparisons, see
https://sourceware.org/pipermail/libabigail/2020q4/002992.html.

I'm going to spend some time today tidying up our test cases (raw .c
and .cc inputs) for external publication.

A big advantage of small test cases is that it is much easier to spot
when the XML is wrong vs the diff is wrong.

Regards,
Giuliano.