From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from relay9-d.mail.gandi.net (relay9-d.mail.gandi.net [217.70.183.199]) by sourceware.org (Postfix) with ESMTPS id 30D8E38930C1 for ; Tue, 6 Apr 2021 11:48:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 30D8E38930C1 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=seketeli.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=dodji@seketeli.org X-Originating-IP: 88.120.130.27 Received: from localhost (unknown [88.120.130.27]) (Authenticated sender: dodji@seketeli.org) by relay9-d.mail.gandi.net (Postfix) with ESMTPSA id 65A63FF806; Tue, 6 Apr 2021 11:48:17 +0000 (UTC) Received: by localhost (Postfix, from userid 1000) id B6673580100; Tue, 6 Apr 2021 13:48:16 +0200 (CEST) From: Dodji Seketeli To: Giuliano Procida Cc: libabigail@sourceware.org, kernel-team@android.com, maennich@google.com Subject: Re: [PATCH v2] XML reader: improve XML node traversal Organization: Me, myself and I References: <20210331092352.619148-1-gprocida@google.com> <20210331170454.951900-1-gprocida@google.com> X-Operating-System: Fedora 34 X-URL: http://www.seketeli.net/~dodji Date: Tue, 06 Apr 2021 13:48:16 +0200 In-Reply-To: <20210331170454.951900-1-gprocida@google.com> (Giuliano Procida's message of "Wed, 31 Mar 2021 18:04:54 +0100") Message-ID: <8735w3vdkv.fsf@seketeli.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-10.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, JMQ_SPF_NEUTRAL, KAM_DMARC_STATUS, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libabigail@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Mailing list of the Libabigail project List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Apr 2021 11:48:21 -0000 Hello Giuliano, Giuliano Procida a =C3=A9crit: [...] Thanks for looking into this. Sadly, as several other patches got in before I got to review this, this patch doesn't apply to the current state of master anymore. So overall, I think it would need a rebasing. Sorry about that :-( [...] > The XML reader uses libxml's Reader API for parsing XML and traverses > high-level XML nodes using reader cursor advancement. Low-level nodes > are read into complete node trees and are traversed following > children, next and (in one instance) parent pointers. Some > intermediate-level nodes can be parsed both ways. > > A lot of code cares about node types, typically skipping over > non-element nodes. Unfortunately, in a few places, the tracked "corpus > node" needs to land on text (whitespace) within elements for the XML > reader to work; stripping text nodes out causes some elements to be > skipped. Can you please give an example of this? It doesn't seem obvious to me. > Almost all node tree traversals are very straightforward and > can be expressed using simpler primitves than at present. > > This commit changes all the node tree traversals to use two node > iteration primitives consistently. These are "first_child" and > "next_child", both of which automatically skip text nodes. They > replace the existing "advance_to_next_sibling_element" and *all* > direct references to "children" and "next" nodes. The unwanted > dependency on text within elements has been fixed. I think providing an example of the initial problem would help to understand how this fixes the initial issue. [...] > For-loops constructed with the new primitives present only element > nodes to the loop body. This allows a lot of redundant checking for > null / non-element nodes to be removed. > > These changes have been tested by using abidiff to comparing all > current test XML files with copies where all text has been removed. > One such test case is included in this commit. > > * include/abg-libxml-utils.h > (advance_to_next_sibling_element): Remove this function. > (first_child): Add new function. (next_child): Likewise. This Please note how the "(next_child)" should go on the next line. There are several instances of this issue in the ChangeLog part of the commit log. [...] > is simpler than but functionally identical to > advance_to_next_sibling_element. > * src/abg-libxml-utils.cc (skip_non_elements): New function to > skip non-element nodes in a sibling node list. (first_child): > New function to find the first child element of a node. > (next_child): New function to find the next sibling element of > a node. > * src/abg-reader.cc (walk_xml_node_to_map_type_ids): Remove > redundant node checks. Iterate over children with first_child > and next_child helpers. (read_translation_unit): Likewise. > (read_translation_unit_from_input): Remove a no-longer-needed > ->next and redundant node checks. Advance "corpus node" after > reading the translation unit. (read_symbol_db_from_input): > Remove a no-longer-needed ->next and advance "corpus node" at > the end of the for-loop. Remove redundant node checks. > (build_needed): Remove redundant node checks. Iterate over > children with first_child and next_child. > (read_elf_needed_from_input): Remove redundant node checks and > simplify logic. Move to next element node after reading > elf-needed node. (read_corpus_from_input): Remove no-children > early return (which was inhibited by text nodes anyway). Set > initial "corpus node" using first_child instead of children > pointer. Remove redundant "corpus node" updates at end of > function as the caller takes care of this and the node can > become null with the new primitives. > (read_corpus_group_from_input): Iterate over children using > first_child and next child. Set "corpus node" for each corpus, > not just the first. Incidentally keep going even if one corpus > parse fails. (build_namespace_decl): Remove redundant node > checks. Iterate over children using first_child and > next_child. (build_elf_symbol): Remove redundant node checks. > (build_elf_symbol_db): Iterate over children using first_child > and next_child. (build_function_decl): Remove redundant node > checks. Iterate over children using first_child and > next_child. (build_function_type): Likewise. > (build_array_type_def): Likewise. (build_enum_type_decl): > Likewise. (build_class_decl): Remove redundant node > checks. Iterate over children using first_child and > next_child. Add comment about skipping children of > declaration-only types (build_union_decl): Likewise. > (build_function_tdecl): Remove redundant node checks. > Iterate over children using first_child and next_child. > (build_type_composition): Likewise. > (build_template_tparameter): Likewise. > * tests/data/Makefile.am: Add new test files. > * tests/data/test-abidiff-exit/test-squished-report.txt: Add > empty diff report. > * tests/data/test-abidiff-exit/test-squished-v0.abi: Add test > ABI file, containing all ELF elements. > * tests/data/test-abidiff-exit/test-squished-v1.abi: Likewise > and identical, except that all text has been stripped. > * tests/test-abidiff-exit.cc: Add new test case. > > Reviewed-by: Matthias Maennich > Signed-off-by: Giuliano Procida > --- > include/abg-libxml-utils.h | 5 +- > src/abg-libxml-utils.cc | 47 ++-- > src/abg-reader.cc | 260 ++++++------------ > tests/data/Makefile.am | 3 + > .../test-squished-report.txt | 0 > .../test-abidiff-exit/test-squished-v0.abi | 43 +++ > .../test-abidiff-exit/test-squished-v1.abi | 1 + > tests/test-abidiff-exit.cc | 11 + > 8 files changed, 167 insertions(+), 203 deletions(-) > create mode 100644 tests/data/test-abidiff-exit/test-squished-report.txt > create mode 100644 tests/data/test-abidiff-exit/test-squished-v0.abi > create mode 100644 tests/data/test-abidiff-exit/test-squished-v1.abi > > diff --git a/include/abg-libxml-utils.h b/include/abg-libxml-utils.h > index 69e940f6..f87ee5a0 100644 > --- a/include/abg-libxml-utils.h > +++ b/include/abg-libxml-utils.h > @@ -82,7 +82,10 @@ int get_xml_node_depth(xmlNodePtr); > reinterpret_cast(xml_char_str.get()) >=20=20 > xmlNodePtr > -advance_to_next_sibling_element(xmlNodePtr node); > +first_child(xmlNodePtr node); > + > +xmlNodePtr > +next_child(xmlNodePtr node); >=20=20 > void > escape_xml_string(const std::string& str, > diff --git a/src/abg-libxml-utils.cc b/src/abg-libxml-utils.cc > index a1acff1f..baea25ac 100644 > --- a/src/abg-libxml-utils.cc > +++ b/src/abg-libxml-utils.cc > @@ -413,40 +413,39 @@ unescape_xml_comment(const std::string& str) > return result; > } >=20=20 > -/// Maybe get the next sibling element node of an XML node, or stay to t= he sam > +/// Skip until we reach an XML element node or run out of (child) nodes. > /// > -/// If there is no next sibling xml element node, the function returns > -/// the initial node. > +/// @param a pointer to a node. > /// > -/// @param node the initial node to consider. > -/// > -/// @return the next sibling node or the initial node @p node. > -static xmlNodePtr > -go_to_next_sibling_element_or_stay(xmlNodePtr node) Removal of this is not documented in the ChangeLog part of the commit log. > +/// @return a pointer to an XML element node or nil. > +xmlNodePtr > +skip_non_elements(xmlNodePtr node) This does exactly the same as the removed go_to_next_sibling_element_or_stay or am I missing something? > { > - xmlNodePtr n; > - for (n =3D node; n; n =3D n->next) > - { > - if (n->type =3D=3D XML_ELEMENT_NODE) > - break; > - } > - return n ? n : node; > + while (node && node->type !=3D XML_ELEMENT_NODE) > + node =3D node->next; > + return node; > } >=20=20 > -/// Get the next sibling element node of an XML node. > +/// Get the first child element of an XML node. > /// > -/// If there is no next sibling xml element node, the function returns n= il. > +/// @param a pointer to the node whose children are to be perused. > /// > -/// @param node the XML node to consider. > +/// @return a pointer to the first XML element child of @p node or nil. > +xmlNodePtr > +first_child(xmlNodePtr node) Please call this first_element_child. Otherwise, it can be misleading, I think. > +{ > + return skip_non_elements(node->children); > +} > + > +/// Get the next sibling element of an XML node. > /// > -/// @return the next sibling element node or nil. > +/// @param a pointer to the XML node whose following siblings are to be = perused. > +/// > +/// @return a pointer to the next sibling element of @p node or nil. > xmlNodePtr > -advance_to_next_sibling_element(xmlNodePtr node) > +next_child(xmlNodePtr node) Looking at the code of this function I think it should rather be called next_element_sibling. Or what am I missing? > { > - xmlNodePtr n =3D go_to_next_sibling_element_or_stay(node->next); > - if (n =3D=3D 0 || n->type !=3D XML_ELEMENT_NODE) > - return 0; > - return n; > + return skip_non_elements(node->next); > } [...] Cheers, --=20 Dodji