From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from relay7-d.mail.gandi.net (relay7-d.mail.gandi.net [217.70.183.200]) by sourceware.org (Postfix) with ESMTPS id 5E4063850402 for ; Thu, 15 Apr 2021 14:12:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 5E4063850402 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=seketeli.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=dodji@seketeli.org X-Originating-IP: 88.120.130.27 Received: from localhost (unknown [88.120.130.27]) (Authenticated sender: dodji@seketeli.org) by relay7-d.mail.gandi.net (Postfix) with ESMTPSA id 83FE920007; Thu, 15 Apr 2021 14:12:13 +0000 (UTC) Received: by localhost (Postfix, from userid 1000) id 99C4B58000E; Thu, 15 Apr 2021 16:12:12 +0200 (CEST) From: Dodji Seketeli To: Giuliano Procida Cc: libabigail@sourceware.org, kernel-team@android.com, maennich@google.com Subject: Re: [PATCH v2, v4] XML reader: improve XML node traversal Organization: Me, myself and I References: <20210331092352.619148-1-gprocida@google.com> <20210331170454.951900-1-gprocida@google.com> <8735w3vdkv.fsf@seketeli.org> X-Operating-System: Fedora 34 X-URL: http://www.seketeli.net/~dodji Date: Thu, 15 Apr 2021 16:12:12 +0200 In-Reply-To: (Giuliano Procida's message of "Tue, 6 Apr 2021 15:35:52 +0100") Message-ID: <87o8eftz5v.fsf_-_@seketeli.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-10.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_DMARC_STATUS, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libabigail@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Mailing list of the Libabigail project List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Apr 2021 14:12:20 -0000 Giuliano Procida a =C3=A9crit: > Hi. > > Thanks for reviewing. > > On Tue, 6 Apr 2021 at 12:48, Dodji Seketeli wrote: > >> Hello Giuliano, >> >> Giuliano Procida a =C3=A9crit: >> >> [...] >> >> Thanks for looking into this. >> >> Sadly, as several other patches got in before I got to review this, this >> patch doesn't apply to the current state of master anymore. So overall, >> I think it would need a rebasing. Sorry about that :-( >> >> > No problem. So long as the public master is the same as your working > master, I can easily rebase and resend. My v2 posting reflects the > current public master. Thanks. I have seen the v4 posting too. This message is also a reply to that post. [...] >> > The XML reader uses libxml's Reader API for parsing XML and traverses >> > high-level XML nodes using reader cursor advancement. Low-level nodes >> > are read into complete node trees and are traversed following >> > children, next and (in one instance) parent pointers. Some >> > intermediate-level nodes can be parsed both ways. >> > >> > A lot of code cares about node types, typically skipping over >> > non-element nodes. Unfortunately, in a few places, the tracked "corpus >> > node" needs to land on text (whitespace) within elements for the XML >> > reader to work; stripping text nodes out causes some elements to be >> > skipped. >> >> Can you please give an example of this? It doesn't seem obvious to me. >> > > Of course. The contributed test case serves as an example (it just has all > unnecessary space stripped out). It wasn't obvious to me either! Sorry, I wasn't clear enough. I guess my main point is that in this introductory text, I find it important to be precise about what the actual issue is. Yes I can dig into the information around and understand it, but having it clearly stated in the commit log does have real value for people trying to understand what is going on. No problem. I dug into it and I think I have understood what is going on. I'll update the commit log accordingly. > I'm sure there is a much smaller fix that could be made to address each > case of sensitivity to whitespace. However, the main contribution here are > the new primitives which make, I hope, the parsing code easier to read and > understand and harder to hide bugs in. After digging into all this, I think the issue can indeed be addressed by a smaller fix overall. I don't think an almost re-write like this is necessary. By default I tend to be reluctant of re-writes in general, unless the original code really doesn't cut if for what we want to do. I've accepted re-writes, so I it's really not a religious position. In any case, I'd rather try to understand and address the root cause of the particular issue at hand. I do appreciate all the time and effort you did put into this, and I thank you for that. I honestly am not comfortable doing this, but I really think it's better to keep what's in there and fix the root cause of issues instead. What is there has been tested and is consistent. When we find issues, if the foundations appear to not be adequate to serve what we want to do, then I am fine with coming up with something different and new, as long as it integrates well (in its spirit and form) with the rest. Otherwise, I'd rather make what we have evolve. So I am posting a candidate patch that builds on your work and that should address the issues your raised in https://sourceware.org/bugzilla/show_bug.cgi?id=3D27616. [...] > Please see the test case. We'd like to have confidence in the semantics of > libabigail XML, independent of the current reader and writer > implementations. Transformations that do not affect the XML element > structure should not change XML meaning, but do at present. Well, I don't disagree with the intent. There are issues in the current interpretation of the abixml and they ought to be fixed. And this is hopefully happening right now. [...] I am proposing this evolutive fix that should address the issue you raised. >From 94d0194819a34a4972a1575da93a77fe5e19a010 Mon Sep 17 00:00:00 2001 From: Dodji Seketeli Date: Wed, 14 Apr 2021 15:04:35 +0200 Subject: [PATCH] reader: Handle 'abi-corpus' element being possibly empty The abixml reader wrongly assumes that the 'abi-corpus' element is always non-empty. Note that until now, the only emitter of abixml consumed in practice was abg-writer.cc and it only emits non-empty 'abi-corpus' elements. So the issue wasn't exposed. So, the reader assumes that an 'abi-corpus' element has at least a text node. For instance, consider this minimal input file named test-v0.abi: $cat test-v0.abi $ Now, compare it to this file where the abi-corpus element is an empty element (doesn't even contain any text): $cat test-v0.abi $ comparing the two files with abidiff (wrongly) reports: $ abidiff test-v0.abi test-v1.abi ELF architecture changed Functions changes summary: 0 Removed, 0 Changed, 0 Added function Variables changes summary: 0 Removed, 0 Changed, 0 Added variable architecture changed from 'elf-arm-aarch64' to '' $ What's happening is that read_corpus_from_input is getting out early when it sees that the node is empty. This is at: xmlNodePtr node =3D ctxt.get_corpus_node(); @@ -1907,10 +1925,14 @@ read_corpus_from_input(read_context& ctxt) corp.set_soname(reinterpret_cast(soname_str.get())); } if (!node->children) // <---- we get out early here and we return nil; // forget about the properties of // the current empty corpus element node So, at its core, fixing the issue at hand involves avoiding the early return there. But then, it turns out that's not enough. In the current setting, the different abixml processing entry points are designed to be used in a semi "streaming" mode. So for instance, read_translation_unit_from_input can be invoked repeatedly to "stream in" the next translation unit at each invocation. Alternatively, the lower level xmlTextReaderNext can be used to iterate over XML node until we reach the translation unit XML element we are interested in. At that point xmlTextReaderExpand can be used to expand the XML node, then we let the context know that this is the current node of the corpus that needs to be processed, using read_context::get_corpus_node. Once we've done that, read_translation_unit_from_input can be called to process that particular corpus node. Note that the corpus node at hand, that needs to be processed will be retrieved by read_context::get_corpus_node. These two modes of operation are also available for read_corpus_from_input, read_symbol_db_from_input, read_elf_needed_from_input etc. Today, these functions all assume that the current node returned by read_context::get_corpus_node is the node /before/ the node of the corpus to be processed. So they all start looking at the /next sibling/ of the node returned by read_context::get_corpus_node. So the code was implicitly assuming that read_context::get_corpus_node was pointing to a text node that was before the node of the corpus that we want to process. This is wrong. read_context::get_corpus_node should just return the current node of the corpus that needs to be processed and voila. And so read_context::set_corpus_node should be used to set the current node of the corpus to the current element node that needs to be processed. That's the spirit of the change done by this patch. As its name suggests, the existing xml::advance_to_next_sibling_element is used to skip non element xml nodes (including text nodes) and move to the next element node to process, which is set to the context using read_context::set_corpus_node. Then the actual processing functions like read_corpus_from_input get the node to process, using read_context::get_corpus_node and process it rather than processing the sibling node that comes after it. The other changes are either to prevent related crashes that I noticed while doing various tests, update the abilint tool used to read and debug abixml input files and add better documentation. * src/abg-reader.cc (read_context::get_corpus_node): Add comment to this member function. (read_translation_unit_from_input, read_symbol_db_from_input) (read_elf_needed_from_input): Start processing the current node of the corpus that needs to be processed rather than its next sibling. Once the processing is done, set the new "current node of the corpus to be processed" properly by skipping to the next element node to be processed. (read_corpus_from_input): Don't get out early when the 'abi-corpus' element is empty. If, however, it has children node, skip to the first child element and flag it -- using read_context::set_corpus_node -- as being the element node to be processed by the processing facilities of the reader. If we are in a mode where we called xmlTextReaderExpand ourselves to get the node to process, then it means we need to free that node indirectly by calling xmlTextReaderNext. In that case, that node should not be flagged by read_context::set_corpus_node. Add more comments. * src/abg-corpus.cc (corpus::is_empty): Do not crash when no symtab is around. * src/abg-libxml-utils.cc (go_to_next_sibling_element_or_stay): Fix typo in comment. (advance_to_next_sibling_element): Don't crash when given a nil node. * tests/data/test-abidiff/test-PR27616-squished-v0.abi: Add new test input. * tests/data/test-abidiff/test-PR27616-squished-v1.abi: Likewise. * tests/data/test-abidiff/test-PR27616-v0.xml: Likewise. * tests/data/test-abidiff/test-PR27616-v1.xml: Likewise. * tests/data/Makefile.am: Add the new test inputs above to source distribution. * tests/test-abidiff.cc (specs): Add the new tests inputs above to this harness. * tools/abilint.cc (main): Support writing corpus groups. Signed-off-by: Dodji Seketeli --- src/abg-corpus.cc | 2 +- src/abg-libxml-utils.cc | 6 +- src/abg-reader.cc | 65 +++++++++++++------ tests/data/Makefile.am | 4 ++ .../test-abidiff/test-PR27616-squished-v0.abi | 43 ++++++++++++ .../test-abidiff/test-PR27616-squished-v1.abi | 1 + tests/data/test-abidiff/test-PR27616-v0.xml | 4 ++ tests/data/test-abidiff/test-PR27616-v1.xml | 3 + tests/test-abidiff.cc | 12 ++++ tools/abilint.cc | 9 ++- 10 files changed, 126 insertions(+), 23 deletions(-) create mode 100644 tests/data/test-abidiff/test-PR27616-squished-v0.abi create mode 100644 tests/data/test-abidiff/test-PR27616-squished-v1.abi create mode 100644 tests/data/test-abidiff/test-PR27616-v0.xml create mode 100644 tests/data/test-abidiff/test-PR27616-v1.xml diff --git a/src/abg-corpus.cc b/src/abg-corpus.cc index c9f5c56b..0c684a93 100644 --- a/src/abg-corpus.cc +++ b/src/abg-corpus.cc @@ -1009,7 +1009,7 @@ corpus::is_empty() const } } return (members_empty - && !get_symtab()->has_symbols() + && (!get_symtab() || !get_symtab()->has_symbols()) && priv_->soname.empty() && priv_->needed.empty()); } diff --git a/src/abg-libxml-utils.cc b/src/abg-libxml-utils.cc index a1acff1f..6b55d3ed 100644 --- a/src/abg-libxml-utils.cc +++ b/src/abg-libxml-utils.cc @@ -413,7 +413,8 @@ unescape_xml_comment(const std::string& str) return result; } =20 -/// Maybe get the next sibling element node of an XML node, or stay to the= sam +/// Maybe get the next sibling element node of an XML node, or stay to +/// the same. /// /// If there is no next sibling xml element node, the function returns /// the initial node. @@ -443,6 +444,9 @@ go_to_next_sibling_element_or_stay(xmlNodePtr node) xmlNodePtr advance_to_next_sibling_element(xmlNodePtr node) { + if (!node) + return 0; + xmlNodePtr n =3D go_to_next_sibling_element_or_stay(node->next); if (n =3D=3D 0 || n->type !=3D XML_ELEMENT_NODE) return 0; diff --git a/src/abg-reader.cc b/src/abg-reader.cc index f40e3d22..d2c76a38 100644 --- a/src/abg-reader.cc +++ b/src/abg-reader.cc @@ -196,10 +196,20 @@ public: get_reader() const {return m_reader;} =20 + /// Getter of the current XML node in the corpus element sub-tree + /// that needs to be processed. + /// + /// @return the current XML node in the corpus element sub-tree that + /// needs to be processed. xmlNodePtr get_corpus_node() const {return m_corp_node;} =20 + /// Setter of the current XML node in the corpus element sub-tree + /// that needs to be processed. + /// + /// @param node set the current XML node in the corpus element + /// sub-tree that needs to be processed. void set_corpus_node(xmlNodePtr node) {m_corp_node =3D node;} @@ -1485,7 +1495,7 @@ read_translation_unit_from_input(read_context& ctxt) else { node =3D 0; - for (xmlNodePtr n =3D ctxt.get_corpus_node()->next; n; n =3D n->next) + for (xmlNodePtr n =3D ctxt.get_corpus_node(); n; n =3D n->next) { if (!n || n->type !=3D XML_ELEMENT_NODE) @@ -1501,15 +1511,17 @@ read_translation_unit_from_input(read_context& ctxt) return nil; =20 tu =3D get_or_read_and_add_translation_unit(ctxt, node); - // So read_translation_unit() can trigger (under the hood) reading - // from several translation units just because - // read_context::get_scope_for_node() has been called. In that - // case, after that unexpected call to read_translation_unit(), the - // current corpus node of the context is going to point to that - // translation unit that has been read under the hood. Let's set - // the corpus node to the one we initially called - // read_translation_unit() on here. - ctxt.set_corpus_node(node); + + if (ctxt.get_corpus_node()) + { + // We are not in the mode where the current corpus node came + // from a local invocation of xmlTextReaderExpand. So let's set + // ctxt.get_corpus_node to the next child element node of the + // corpus that needs to be processed. + node =3D xml::advance_to_next_sibling_element(node); + ctxt.set_corpus_node(node); + } + return tu; } =20 @@ -1583,7 +1595,7 @@ read_symbol_db_from_input(read_context& ctxt, xmlTextReaderNext(reader.get()); } else - for (xmlNodePtr n =3D ctxt.get_corpus_node()->next; n; n =3D n->next) + for (xmlNodePtr n =3D ctxt.get_corpus_node(); n; n =3D n->next) { if (!n || n->type !=3D XML_ELEMENT_NODE) continue; @@ -1594,8 +1606,11 @@ read_symbol_db_from_input(read_context& ctxt, else if (xmlStrEqual(n->name, BAD_CAST("elf-variable-symbols"))) has_var_syms =3D true; else - break; - ctxt.set_corpus_node(n); + { + ctxt.set_corpus_node(n); + break; + } + if (has_fn_syms) { fn_symdb =3D build_elf_symbol_db(ctxt, n, true); @@ -1688,7 +1703,7 @@ read_elf_needed_from_input(read_context& ctxt, } else { - for (xmlNodePtr n =3D ctxt.get_corpus_node()->next; n; n =3D n->next) + for (xmlNodePtr n =3D ctxt.get_corpus_node(); n; n =3D n->next) { if (!n || n->type !=3D XML_ELEMENT_NODE) continue; @@ -1703,6 +1718,7 @@ read_elf_needed_from_input(read_context& ctxt, if (node) { result =3D build_needed(node, needed); + node =3D xml::advance_to_next_sibling_element(node); ctxt.set_corpus_node(node); } =20 @@ -1806,6 +1822,8 @@ read_corpus_from_input(read_context& ctxt) if (!reader) return nil; =20 + // This is to remember to call xmlTextReaderNext if we ever call + // xmlTextReaderExpand. bool call_reader_next =3D false; =20 xmlNodePtr node =3D ctxt.get_corpus_node(); @@ -1907,10 +1925,14 @@ read_corpus_from_input(read_context& ctxt) corp.set_soname(reinterpret_cast(soname_str.get())); } =20 - if (!node->children) - return nil; - - ctxt.set_corpus_node(node->children); + // If the corpus element node has children nodes, make + // ctxt.get_corpus_node() returns the first child element node of + // the corpus element that *needs* to be processed. + if (node->children) + { + xmlNodePtr n =3D xml::advance_to_next_sibling_element(node->children= ); + ctxt.set_corpus_node(n); + } =20 corpus& corp =3D *ctxt.get_corpus(); =20 @@ -1966,6 +1988,10 @@ read_corpus_from_input(read_context& ctxt) // This is the necessary counter-part of the xmlTextReaderExpand() // call at the beginning of the function. xmlTextReaderNext(reader.get()); + // The call above invalidates the xml node returned by + // xmlTextReaderExpand, which is can still be accessed via + // ctxt.set_corpus_node. + ctxt.set_corpus_node(0); } else { @@ -1974,7 +2000,8 @@ read_corpus_from_input(read_context& ctxt) if (!node) { node =3D ctxt.get_corpus_node(); - node =3D xml::advance_to_next_sibling_element(node->parent); + if (node) + node =3D xml::advance_to_next_sibling_element(node->parent); } ctxt.set_corpus_node(node); } diff --git a/tests/data/Makefile.am b/tests/data/Makefile.am index 576309e8..0edfe26c 100644 --- a/tests/data/Makefile.am +++ b/tests/data/Makefile.am @@ -206,6 +206,10 @@ test-abidiff-exit/test-crc-v1.abi \ test-abidiff-exit/test-missing-alias-report.txt \ test-abidiff-exit/test-missing-alias.abi \ test-abidiff-exit/test-missing-alias.suppr \ +test-abidiff/test-PR27616-squished-v0.abi \ +test-abidiff/test-PR27616-squished-v1.abi \ +test-abidiff/test-PR27616-v0.xml \ +test-abidiff/test-PR27616-v1.xml \ \ test-diff-dwarf/test0-v0.cc \ test-diff-dwarf/test0-v0.o \ diff --git a/tests/data/test-abidiff/test-PR27616-squished-v0.abi b/tests/d= ata/test-abidiff/test-PR27616-squished-v0.abi new file mode 100644 index 00000000..6b3d0460 --- /dev/null +++ b/tests/data/test-abidiff/test-PR27616-squished-v0.abi @@ -0,0 +1,43 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/tests/data/test-abidiff/test-PR27616-squished-v1.abi b/tests/d= ata/test-abidiff/test-PR27616-squished-v1.abi new file mode 100644 index 00000000..20495f00 --- /dev/null +++ b/tests/data/test-abidiff/test-PR27616-squished-v1.abi @@ -0,0 +1 @@ +<= dependency name=3D'libgcc_s.so.1'/><= /elf-function-symbols>= diff --git a/tests/data/test-abidiff/test-PR27616-v0.xml b/tests/data/test-= abidiff/test-PR27616-v0.xml new file mode 100644 index 00000000..5e025ff6 --- /dev/null +++ b/tests/data/test-abidiff/test-PR27616-v0.xml @@ -0,0 +1,4 @@ + + + + diff --git a/tests/data/test-abidiff/test-PR27616-v1.xml b/tests/data/test-= abidiff/test-PR27616-v1.xml new file mode 100644 index 00000000..aa8059bf --- /dev/null +++ b/tests/data/test-abidiff/test-PR27616-v1.xml @@ -0,0 +1,3 @@ + + + diff --git a/tests/test-abidiff.cc b/tests/test-abidiff.cc index 3452b7ee..6ef18654 100644 --- a/tests/test-abidiff.cc +++ b/tests/test-abidiff.cc @@ -128,6 +128,18 @@ static InOutSpec specs[] =3D "data/test-abidiff/test-crc-report.txt", "output/test-abidiff/test-crc-report-1-2.txt" }, + { + "data/test-abidiff/test-PR27616-v0.xml", + "data/test-abidiff/test-PR27616-v1.xml", + "data/test-abidiff/empty-report.txt", + "output/test-abidiff/empty-report.txt" + }, + { + "data/test-abidiff/test-PR27616-squished-v0.abi", + "data/test-abidiff/test-PR27616-squished-v1.abi", + "data/test-abidiff/empty-report.txt", + "output/test-abidiff/empty-report.txt" + }, // This should be the last entry. {0, 0, 0, 0} }; diff --git a/tools/abilint.cc b/tools/abilint.cc index c8598465..856f935d 100644 --- a/tools/abilint.cc +++ b/tools/abilint.cc @@ -440,11 +440,16 @@ main(int argc, char* argv[]) else { if (type =3D=3D abigail::tools_utils::FILE_TYPE_XML_CORPUS - ||type =3D=3D abigail::tools_utils::FILE_TYPE_XML_CORPUS_GROUP + || type =3D=3D abigail::tools_utils::FILE_TYPE_XML_CORPUS_GROUP || type =3D=3D abigail::tools_utils::FILE_TYPE_ELF) { if (!opts.noout) - is_ok =3D write_corpus(*ctxt, corp, 0); + { + if (corp) + is_ok =3D write_corpus(*ctxt, corp, 0); + else if (group) + is_ok =3D write_corpus_group(*ctxt, group, 0); + } } } =20 --=20 2.30.0 --=20 Dodji