From: Dodji Seketeli <dodji@seketeli.org>
To: Giuliano Procida <gprocida@google.com>
Cc: libabigail@sourceware.org, kernel-team@android.com,
maennich@google.com
Subject: Re: [PATCH v2] XML reader: improve XML node traversal
Date: Tue, 06 Apr 2021 13:48:16 +0200 [thread overview]
Message-ID: <8735w3vdkv.fsf@seketeli.org> (raw)
In-Reply-To: <20210331170454.951900-1-gprocida@google.com> (Giuliano Procida's message of "Wed, 31 Mar 2021 18:04:54 +0100")
Hello Giuliano,
Giuliano Procida <gprocida@google.com> a écrit:
[...]
Thanks for looking into this.
Sadly, as several other patches got in before I got to review this, this
patch doesn't apply to the current state of master anymore. So overall,
I think it would need a rebasing. Sorry about that :-(
[...]
> The XML reader uses libxml's Reader API for parsing XML and traverses
> high-level XML nodes using reader cursor advancement. Low-level nodes
> are read into complete node trees and are traversed following
> children, next and (in one instance) parent pointers. Some
> intermediate-level nodes can be parsed both ways.
>
> A lot of code cares about node types, typically skipping over
> non-element nodes. Unfortunately, in a few places, the tracked "corpus
> node" needs to land on text (whitespace) within elements for the XML
> reader to work; stripping text nodes out causes some elements to be
> skipped.
Can you please give an example of this? It doesn't seem obvious to me.
> Almost all node tree traversals are very straightforward and
> can be expressed using simpler primitves than at present.
>
> This commit changes all the node tree traversals to use two node
> iteration primitives consistently. These are "first_child" and
> "next_child", both of which automatically skip text nodes. They
> replace the existing "advance_to_next_sibling_element" and *all*
> direct references to "children" and "next" nodes. The unwanted
> dependency on text within elements has been fixed.
I think providing an example of the initial problem would help to
understand how this fixes the initial issue.
[...]
> For-loops constructed with the new primitives present only element
> nodes to the loop body. This allows a lot of redundant checking for
> null / non-element nodes to be removed.
>
> These changes have been tested by using abidiff to comparing all
> current test XML files with copies where all text has been removed.
> One such test case is included in this commit.
>
> * include/abg-libxml-utils.h
> (advance_to_next_sibling_element): Remove this function.
> (first_child): Add new function. (next_child): Likewise. This
Please note how the "(next_child)" should go on the next line.
There are several instances of this issue in the ChangeLog part of the
commit log.
[...]
> is simpler than but functionally identical to
> advance_to_next_sibling_element.
> * src/abg-libxml-utils.cc (skip_non_elements): New function to
> skip non-element nodes in a sibling node list. (first_child):
> New function to find the first child element of a node.
> (next_child): New function to find the next sibling element of
> a node.
> * src/abg-reader.cc (walk_xml_node_to_map_type_ids): Remove
> redundant node checks. Iterate over children with first_child
> and next_child helpers. (read_translation_unit): Likewise.
> (read_translation_unit_from_input): Remove a no-longer-needed
> ->next and redundant node checks. Advance "corpus node" after
> reading the translation unit. (read_symbol_db_from_input):
> Remove a no-longer-needed ->next and advance "corpus node" at
> the end of the for-loop. Remove redundant node checks.
> (build_needed): Remove redundant node checks. Iterate over
> children with first_child and next_child.
> (read_elf_needed_from_input): Remove redundant node checks and
> simplify logic. Move to next element node after reading
> elf-needed node. (read_corpus_from_input): Remove no-children
> early return (which was inhibited by text nodes anyway). Set
> initial "corpus node" using first_child instead of children
> pointer. Remove redundant "corpus node" updates at end of
> function as the caller takes care of this and the node can
> become null with the new primitives.
> (read_corpus_group_from_input): Iterate over children using
> first_child and next child. Set "corpus node" for each corpus,
> not just the first. Incidentally keep going even if one corpus
> parse fails. (build_namespace_decl): Remove redundant node
> checks. Iterate over children using first_child and
> next_child. (build_elf_symbol): Remove redundant node checks.
> (build_elf_symbol_db): Iterate over children using first_child
> and next_child. (build_function_decl): Remove redundant node
> checks. Iterate over children using first_child and
> next_child. (build_function_type): Likewise.
> (build_array_type_def): Likewise. (build_enum_type_decl):
> Likewise. (build_class_decl): Remove redundant node
> checks. Iterate over children using first_child and
> next_child. Add comment about skipping children of
> declaration-only types (build_union_decl): Likewise.
> (build_function_tdecl): Remove redundant node checks.
> Iterate over children using first_child and next_child.
> (build_type_composition): Likewise.
> (build_template_tparameter): Likewise.
> * tests/data/Makefile.am: Add new test files.
> * tests/data/test-abidiff-exit/test-squished-report.txt: Add
> empty diff report.
> * tests/data/test-abidiff-exit/test-squished-v0.abi: Add test
> ABI file, containing all ELF elements.
> * tests/data/test-abidiff-exit/test-squished-v1.abi: Likewise
> and identical, except that all text has been stripped.
> * tests/test-abidiff-exit.cc: Add new test case.
>
> Reviewed-by: Matthias Maennich <maennich@google.com>
> Signed-off-by: Giuliano Procida <gprocida@google.com>
> ---
> include/abg-libxml-utils.h | 5 +-
> src/abg-libxml-utils.cc | 47 ++--
> src/abg-reader.cc | 260 ++++++------------
> tests/data/Makefile.am | 3 +
> .../test-squished-report.txt | 0
> .../test-abidiff-exit/test-squished-v0.abi | 43 +++
> .../test-abidiff-exit/test-squished-v1.abi | 1 +
> tests/test-abidiff-exit.cc | 11 +
> 8 files changed, 167 insertions(+), 203 deletions(-)
> create mode 100644 tests/data/test-abidiff-exit/test-squished-report.txt
> create mode 100644 tests/data/test-abidiff-exit/test-squished-v0.abi
> create mode 100644 tests/data/test-abidiff-exit/test-squished-v1.abi
>
> diff --git a/include/abg-libxml-utils.h b/include/abg-libxml-utils.h
> index 69e940f6..f87ee5a0 100644
> --- a/include/abg-libxml-utils.h
> +++ b/include/abg-libxml-utils.h
> @@ -82,7 +82,10 @@ int get_xml_node_depth(xmlNodePtr);
> reinterpret_cast<char*>(xml_char_str.get())
>
> xmlNodePtr
> -advance_to_next_sibling_element(xmlNodePtr node);
> +first_child(xmlNodePtr node);
> +
> +xmlNodePtr
> +next_child(xmlNodePtr node);
>
> void
> escape_xml_string(const std::string& str,
> diff --git a/src/abg-libxml-utils.cc b/src/abg-libxml-utils.cc
> index a1acff1f..baea25ac 100644
> --- a/src/abg-libxml-utils.cc
> +++ b/src/abg-libxml-utils.cc
> @@ -413,40 +413,39 @@ unescape_xml_comment(const std::string& str)
> return result;
> }
>
> -/// Maybe get the next sibling element node of an XML node, or stay to the sam
> +/// Skip until we reach an XML element node or run out of (child) nodes.
> ///
> -/// If there is no next sibling xml element node, the function returns
> -/// the initial node.
> +/// @param a pointer to a node.
> ///
> -/// @param node the initial node to consider.
> -///
> -/// @return the next sibling node or the initial node @p node.
> -static xmlNodePtr
> -go_to_next_sibling_element_or_stay(xmlNodePtr node)
Removal of this is not documented in the ChangeLog part of the commit log.
> +/// @return a pointer to an XML element node or nil.
> +xmlNodePtr
> +skip_non_elements(xmlNodePtr node)
This does exactly the same as the removed
go_to_next_sibling_element_or_stay or am I missing something?
> {
> - xmlNodePtr n;
> - for (n = node; n; n = n->next)
> - {
> - if (n->type == XML_ELEMENT_NODE)
> - break;
> - }
> - return n ? n : node;
> + while (node && node->type != XML_ELEMENT_NODE)
> + node = node->next;
> + return node;
> }
>
> -/// Get the next sibling element node of an XML node.
> +/// Get the first child element of an XML node.
> ///
> -/// If there is no next sibling xml element node, the function returns nil.
> +/// @param a pointer to the node whose children are to be perused.
> ///
> -/// @param node the XML node to consider.
> +/// @return a pointer to the first XML element child of @p node or nil.
> +xmlNodePtr
> +first_child(xmlNodePtr node)
Please call this first_element_child. Otherwise, it can be misleading,
I think.
> +{
> + return skip_non_elements(node->children);
> +}
> +
> +/// Get the next sibling element of an XML node.
> ///
> -/// @return the next sibling element node or nil.
> +/// @param a pointer to the XML node whose following siblings are to be perused.
> +///
> +/// @return a pointer to the next sibling element of @p node or nil.
> xmlNodePtr
> -advance_to_next_sibling_element(xmlNodePtr node)
> +next_child(xmlNodePtr node)
Looking at the code of this function I think it should rather be called
next_element_sibling. Or what am I missing?
> {
> - xmlNodePtr n = go_to_next_sibling_element_or_stay(node->next);
> - if (n == 0 || n->type != XML_ELEMENT_NODE)
> - return 0;
> - return n;
> + return skip_non_elements(node->next);
> }
[...]
Cheers,
--
Dodji
next prev parent reply other threads:[~2021-04-06 11:48 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-31 9:23 [PATCH] " Giuliano Procida
2021-03-31 14:24 ` Matthias Maennich
2021-03-31 16:58 ` Giuliano Procida
2021-03-31 17:04 ` [PATCH v2] " Giuliano Procida
2021-04-06 11:48 ` Dodji Seketeli [this message]
2021-04-06 14:35 ` Giuliano Procida
2021-04-06 16:22 ` [PATCH v3] " Giuliano Procida
2021-04-06 17:39 ` [PATCH v4] " Giuliano Procida
2021-04-15 14:12 ` [PATCH v2, " Dodji Seketeli
2021-04-16 12:03 ` Giuliano Procida
2021-04-19 13:40 ` Dodji Seketeli
2021-04-19 14:00 ` Subject: [PATCH 1/3] reader: Handle 'abi-corpus' element being possibly empty Dodji Seketeli
2021-04-19 14:01 ` [PATCH 2/3] reader: Use xmlFirstElementChild and xmlNextElementSibling rather than xml::advance_to_next_sibling_element Dodji Seketeli
2021-04-19 14:03 ` [PATCH 3/3] reader: Use xmlFirstElementChild/xmlNextElementSibling to iterate over children elements Dodji Seketeli
2021-04-20 6:52 ` Giuliano Procida
2021-04-20 9:46 ` Dodji Seketeli
2021-05-03 15:18 ` Dodji Seketeli
2021-05-04 11:13 ` Giuliano Procida
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8735w3vdkv.fsf@seketeli.org \
--to=dodji@seketeli.org \
--cc=gprocida@google.com \
--cc=kernel-team@android.com \
--cc=libabigail@sourceware.org \
--cc=maennich@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).