From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id E989E3858C60 for ; Wed, 1 Dec 2021 03:18:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E989E3858C60 Received: from mail-pg1-f200.google.com (mail-pg1-f200.google.com [209.85.215.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-329-LlLao-TLOzKFeLaKqpPN1A-1; Tue, 30 Nov 2021 22:18:32 -0500 X-MC-Unique: LlLao-TLOzKFeLaKqpPN1A-1 Received: by mail-pg1-f200.google.com with SMTP id t18-20020a632252000000b003252b088f26so7835117pgm.7 for ; Tue, 30 Nov 2021 19:18:32 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:message-id:mime-version:subject:date :in-reply-to:cc:to:references; bh=vrugtSEtO2ByEsbs8piJQnStolH+LF2KfmAEcUeUM7k=; b=XBzUIchT8cd4eXcUdUtXuE7uWCHpl8BllFdtU84ePMFuc9jggN/UYVArT2Tw4has8I ueHBevvrjleNTAlFUk68gDYrS944rx8kRHAB7667PEPEOzgQzzo3zCrYbV22gyvqkQSa 0J8rvX1ZXOVPm3hKQokO5uvffNO/rHe9yxh+xfdc/dpV8B6h/QPLXWsgLWebwblwVVSo eOSeLCKcEwjTZwbhNaaS/EfMpxRDB4LynC2CoWML+tkQQBXtofFY0WZedvfX6Z4u9jGn dfASMVPdQsFkWOH1mSdDaChj/Z4uhQnMH5aObOmhx2C6HLMvLrBoUMmzt8bNZYaLlvfF oMAQ== X-Gm-Message-State: AOAM533a2xNTBgsUL/WOOq5jLwyeSajvBwX6qtP3r+/kgcUmmLja0v50 leUJooS/H8KsNe3hhCJvCk3AZ5nwR0D24JQRRvnClR0U9FnmzAQWKPHFvjDJB3ysRD2Uzyx4d4Q taSTC6DPKKBdubhf9NtEJ X-Received: by 2002:a17:902:ea10:b0:142:112d:c0b9 with SMTP id s16-20020a170902ea1000b00142112dc0b9mr4109520plg.35.1638328711632; Tue, 30 Nov 2021 19:18:31 -0800 (PST) X-Google-Smtp-Source: ABdhPJzXTqaUxOby3v9o754XpSyHgyAvo6mpeAkq8gpOs7h+nebQfiq1Ku2wzH/geYP+0aFnYI/7pg== X-Received: by 2002:a17:902:ea10:b0:142:112d:c0b9 with SMTP id s16-20020a170902ea1000b00142112dc0b9mr4109465plg.35.1638328711116; Tue, 30 Nov 2021 19:18:31 -0800 (PST) Received: from smtpclient.apple (47-208-193-143.trckcmtc01.res.dyn.suddenlink.net. [47.208.193.143]) by smtp.gmail.com with ESMTPSA id l21sm3818854pjt.24.2021.11.30.19.18.28 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 30 Nov 2021 19:18:30 -0800 (PST) From: Ben Woodard Message-Id: Mime-Version: 1.0 (Mac OS X Mail 15.0 \(3693.20.0.1.32\)) Subject: Re: [PATCH v2] Add regression tests for ctf reading Date: Tue, 30 Nov 2021 19:18:25 -0800 In-Reply-To: <87bl28sp9j.fsf@oracle.com> Cc: Ben Woodard via Libabigail , Dodji Seketeli To: "Jose E. Marchesi" References: <20211118041625.622972-1-guillermo.e.martinez@oracle.com> <20211122213353.2456208-1-guillermo.e.martinez@oracle.com> <87ee75tsst.fsf@seketeli.org> <87bl28sp9j.fsf@oracle.com> X-Mailer: Apple Mail (2.3693.20.0.1.32) X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Spam-Status: No, score=-6.4 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, HTML_MESSAGE, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: libabigail@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Mailing list of the Libabigail project List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Dec 2021 03:18:39 -0000 sorry it is taking me so long to get back to you. American Thanksgiving hol= iday. > On Nov 24, 2021, at 10:50 PM, Jose E. Marchesi = wrote: >=20 >>>=20 >>> On Nov 24, 2021, at 8:36 AM, Dodji Seketeli wrote: >>>=20 >>> Hello, >>>=20 >>> [...] >>>=20 >>> Thanks for working on this. Nice patch, by the way! I like its >>> direction. >>>=20 >>> I have a few comments and I believe that when they are addressed, we'll >>> be able to apply the patch. >>>=20 >>> [...] >>>=20 >>>> Dependencies/limitations: >>>>=20 >>>> * It was worked on the top of the following patches: >>>> https://sourceware.org/pipermail/libabigail/2021q4/003853.html >>>>=20 >>>> * Some CTF tests were *disabled* because it generates the XML ABI >>>> corpus with *same information* but the XML nodes *are not* always >>>> in the *same order*, so using diff command fails. Details here: >>>> https://sourceware.org/pipermail/libabigail/2021q4/003824.html >>>=20 >>> In those cases where the abixml file generated from CTF is different >>> from the one generated from DWARF, I think we should have two >>> different reference abixml files to diff against. No CTF test should >>> be disabled, I think. >>=20 >> Here is a somewhat deeper question that I think needs to be >> considered. I=E2=80=99ve generally referred to it as =E2=80=9CDWARF Idio= ms=E2=80=9D but this >> email makes me think that it is even larger than that. >>=20 >> For my work, I need libabigail to generate an abstract notion of the >> ABI corpus. How it constructs that abstract notion of the ABI needs to >> be independent of the producer. Think of it this way, say we take the >> same compiler and have it compile the same library producing both CTF >> and DWARF, the ABI of the library doesn=E2=80=99t change. Since it is >> literally the same object, the program text is same. Therefore the ABI >> is unquestionably the the same. Any difference reported by libabigail >> therefore is a problem with libabigail. It is not taking the source >> material and abstracting it enough into the ABI artifacts to separate >> the artifacts from the implementation. >>=20 >> So I kind of believe that we need to look more deeply into WHY the CTF >> and DWARF are not comparing as equivalent and begin the process of >> filing the compiler bugs when we need to, and doing what is necessary >> to abstract the ABI from the source material that libabigail used to >> construct its IR of the ABI corpus from. >>=20 >> So, I must say that I disagree with both dodji=E2=80=99s approach here a= nd to >> a lesser extent Guillermo=E2=80=99s approach of disabling the tests. I t= hink >> that the tests where the CTF doesn=E2=80=99t match the DWARF should be >> investigated and when necessary marked =E2=80=9Cxfail=E2=80=9D with a no= te citing >> their individual cause. >>=20 >> I think that what we need to work towards is: >> abidw produces the same output (or more precisely libabigail produces >> the same IR) whether you compile with -gdwarf-3 -gdwarf-4 -gdwarf-5 >> -gsplit-dwarf -gctf >> also for the most part the ABI should not change between compiler >> versions. There may be a few cases that we need to look into where the >> compiler actually breaks ABI and of course libabigail should flag >> those but I would assert that libabigail needs to abstract its IR of >> the ABI enough that compiler version changes that don=E2=80=99t actually >> change the ABI of the ELF object are not reported as ABI breaks. It >> currently does pretty well at this at the moment. >> Then once that foundation is built, being able to abstract the ABI IR >> enough that differences in toolchains e.g. LLVM vs GCC are not flagged >> as changes in the object=E2=80=99s ABI. This is important to provide peo= ple >> with a tool that will allow them to mix toolchains within a project to >> achieve optimal code. >=20 > I'm so glad you are raising these concerns. >=20 > I agree with you that ideally abidw should produce "same output" (read > equivalent output ABI-wise) regardless of the origin of the ABI > corpus... but then: I=E2=80=99m not sure it should be the same output. I think that textual dif= fing of the output probably won=E2=80=99t work especially for the stuff tha= t I have been working on. However, I do believe that the comparison of the = corpora once they are loaded into the IR should be the same. Even if anonym= ous types do not come out in the same order or one debug format is read vs = another, I think that they should compare the same in the IR. >=20 > Is the relative order of the <*-decl > nodes in the ABI XML part of the > ABI? Not really worried about that as long as the IR=E2=80=99s compare. >=20 > Is the location information (present in DWARF, not present in CTF nor > BTF) part of the ABI? Yeah there is ancillary information that is useful to have when printing di= fferences but it should not be part of the ABI artifact comparison. For exa= mple. Say I add a comment to a source file. The decl line will change but t= he ABI of the ELF will not. I argue that the comparison of the ABI should b= e that they are the same.=20 >=20 > Are the names assigned to internally-generated types (such as the > underlying type of an enumerator) part of the ABI? To me that seems like an implementation detail of the library. It is someth= ing of course that an ABI checker needs to get correct even though it is di= fficult. If someone adds a new enum type but it is not exported, it may sti= ll get assigned an anonymous enum in the type system and that may jiggle th= e type numbering but the difference engine shouldn=E2=80=99t flag it as an = ABI change. Dodji tracked down and fixed a huge number of these bugs for me= over the years. >=20 > Are the newlines and blank spaces after XML marks part of the ABI? Comparison needs to be done once the ABIXML is loaded into the IR. >=20 > I am no expert in ABI analysis, but I would say the answers in all these > cases is `no'. The internal libabigail ABI comparison algorithms would > not be impacted by these things, right? >=20 > It seems to me that the main problems here are that ABI XML, as it is, > conveys way too much information (useful, no doubt, but superfluous > ABI-wise) and that `diff' is used to perform a textual comparison of > these files. I agree. I would not do textual comparison. One thing that I think makes this task well suited to XML rather than somet= hing more regularly structured is that the XML is allows additional attribu= tes such as the aforementioned decl file and decl line which then can be ig= nored when doing ABI comparison.=20 >=20 > Textual comparison would work if we defined a "canonicalized" version of > ABI XML. Translating from ABI XML to canonical ABI XML would involve: >=20 > 1) Strip _everything_ that is not strictly part of the ABI (such as > location information, anonymous type names, etc). >=20 > 2) Sort the XML elements in a predefined way. Whatever sort criteria is > used, it would need to be defined only by ABI-constituent elements, > such as the names of non-anonymous types. And of course the criteria > itself must not change. >=20 > The only reason you could operate with ABI XML files heavily annotated > with non-ABI determining information (such as the loc info and > superfluous and arbitrary names) is having an unique source for the ABI > info. That's no longer the case: we are using CTF, the Google chaps are > using BTF. >=20 > The above considered, I think we have three options now: >=20 > a) To define that canonical ABI XML and write an abistrip tool that does > the translation, maybe add --canonical options to the tools that > generate ABI XML. Then use `diff' in the testsuite. >=20 > or >=20 > b) To use an external ABI comparison tool in the testsuite, non-textual, > that consumes ABI XML. >=20 This would be my preference. All comparisons should be done at the level of the IR. > or >=20 > c) To use the "internal" approach I already suggested in another email, > having two different sets of tests: one set to test the front-ends > that assume the libabigail comparison algorithm works properly, > another set to test the comparison algorithm that uses either > hand-written input files or assumes the front-ends work properly. I kind of agree in principle however, I don=E2=80=99t want to write those t= ests and I am pretty certain Dodji doesn=E2=80=99t want to write tests that= way plus if he started refactoring that way, I would begin nagging him abo= ut all the other problems that I have found that need attention first. ;-) = I generally feel like I=E2=80=99m breaking his back as is. >=20 > I personally think (c) would be best.