public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
From: Dodji Seketeli <dodji@seketeli.org>
To: Giuliano Procida <gprocida@google.com>
Cc: libabigail@sourceware.org,  kernel-team@android.com,
	 maennich@google.com
Subject: Re: [PATCH v2] XML reader: improve XML node traversal
Date: Tue, 06 Apr 2021 13:48:16 +0200	[thread overview]
Message-ID: <8735w3vdkv.fsf@seketeli.org> (raw)
In-Reply-To: <20210331170454.951900-1-gprocida@google.com> (Giuliano Procida's message of "Wed, 31 Mar 2021 18:04:54 +0100")

Hello Giuliano,

Giuliano Procida <gprocida@google.com> a écrit:

[...]

Thanks for looking into this.

Sadly, as several other patches got in before I got to review this, this
patch doesn't apply to the current state of master anymore.  So overall,
I think it would need a rebasing.  Sorry about that :-(

[...]

> The XML reader uses libxml's Reader API for parsing XML and traverses
> high-level XML nodes using reader cursor advancement. Low-level nodes
> are read into complete node trees and are traversed following
> children, next and (in one instance) parent pointers. Some
> intermediate-level nodes can be parsed both ways.
>
> A lot of code cares about node types, typically skipping over
> non-element nodes. Unfortunately, in a few places, the tracked "corpus
> node" needs to land on text (whitespace) within elements for the XML
> reader to work; stripping text nodes out causes some elements to be
> skipped.

Can you please give an example of this?  It doesn't seem obvious to me.

> Almost all node tree traversals are very straightforward and
> can be expressed using simpler primitves than at present.
>
> This commit changes all the node tree traversals to use two node
> iteration primitives consistently. These are "first_child" and
> "next_child", both of which automatically skip text nodes. They
> replace the existing "advance_to_next_sibling_element" and *all*
> direct references to "children" and "next" nodes. The unwanted
> dependency on text within elements has been fixed.

I think providing an example of the initial problem would help to
understand how this fixes the initial issue.

[...]

> For-loops constructed with the new primitives present only element
> nodes to the loop body. This allows a lot of redundant checking for
> null / non-element nodes to be removed.
>
> These changes have been tested by using abidiff to comparing all
> current test XML files with copies where all text has been removed.
> One such test case is included in this commit.
>
> 	* include/abg-libxml-utils.h
> 	(advance_to_next_sibling_element): Remove this function.
> 	(first_child): Add new function. (next_child): Likewise. This

Please note how the "(next_child)" should go on the next line.

There are several instances of this issue in the ChangeLog part of the
commit log.

[...]

> 	is simpler than but functionally identical to
> 	advance_to_next_sibling_element.
> 	* src/abg-libxml-utils.cc (skip_non_elements): New function to
> 	skip non-element nodes in a sibling node list. (first_child):
> 	New function to find the first child element of a node.
> 	(next_child): New function to find the next sibling element of
> 	a node.
> 	* src/abg-reader.cc (walk_xml_node_to_map_type_ids): Remove
> 	redundant node checks. Iterate over children with first_child
> 	and next_child helpers. (read_translation_unit): Likewise.
> 	(read_translation_unit_from_input): Remove a no-longer-needed
> 	->next and redundant node checks. Advance "corpus node" after
> 	reading the translation unit. (read_symbol_db_from_input):
> 	Remove a no-longer-needed ->next and advance "corpus node" at
> 	the end of the for-loop.  Remove redundant node checks.
> 	(build_needed): Remove redundant node checks. Iterate over
> 	children with first_child and next_child.
> 	(read_elf_needed_from_input): Remove redundant node checks and
> 	simplify logic. Move to next element node after reading
> 	elf-needed node. (read_corpus_from_input): Remove no-children
> 	early return (which was inhibited by text nodes anyway). Set
> 	initial "corpus node" using first_child instead of children
> 	pointer. Remove redundant "corpus node" updates at end of
> 	function as the caller takes care of this and the node can
> 	become null with the new primitives.
> 	(read_corpus_group_from_input): Iterate over children using
> 	first_child and next child. Set "corpus node" for each corpus,
> 	not just the first. Incidentally keep going even if one corpus
> 	parse fails. (build_namespace_decl): Remove redundant node
> 	checks. Iterate over children using first_child and
> 	next_child. (build_elf_symbol): Remove redundant node checks.
> 	(build_elf_symbol_db): Iterate over children using first_child
> 	and next_child. (build_function_decl): Remove redundant node
> 	checks. Iterate over children using first_child and
> 	next_child. (build_function_type): Likewise.
> 	(build_array_type_def): Likewise. (build_enum_type_decl):
> 	Likewise. (build_class_decl): Remove redundant node
> 	checks. Iterate over children using first_child and
> 	next_child. Add comment about skipping children of
> 	declaration-only types (build_union_decl): Likewise.
> 	(build_function_tdecl): Remove redundant node checks.
> 	Iterate over children using first_child and next_child.
> 	(build_type_composition): Likewise.
> 	(build_template_tparameter): Likewise.
> 	* tests/data/Makefile.am: Add new test files.
> 	* tests/data/test-abidiff-exit/test-squished-report.txt: Add
> 	empty diff report.
> 	* tests/data/test-abidiff-exit/test-squished-v0.abi: Add test
> 	ABI file, containing all ELF elements.
> 	* tests/data/test-abidiff-exit/test-squished-v1.abi: Likewise
> 	and identical, except that all text has been stripped.
> 	* tests/test-abidiff-exit.cc: Add new test case.
>
> Reviewed-by: Matthias Maennich <maennich@google.com>
> Signed-off-by: Giuliano Procida <gprocida@google.com>
> ---
>  include/abg-libxml-utils.h                    |   5 +-
>  src/abg-libxml-utils.cc                       |  47 ++--
>  src/abg-reader.cc                             | 260 ++++++------------
>  tests/data/Makefile.am                        |   3 +
>  .../test-squished-report.txt                  |   0
>  .../test-abidiff-exit/test-squished-v0.abi    |  43 +++
>  .../test-abidiff-exit/test-squished-v1.abi    |   1 +
>  tests/test-abidiff-exit.cc                    |  11 +
>  8 files changed, 167 insertions(+), 203 deletions(-)
>  create mode 100644 tests/data/test-abidiff-exit/test-squished-report.txt
>  create mode 100644 tests/data/test-abidiff-exit/test-squished-v0.abi
>  create mode 100644 tests/data/test-abidiff-exit/test-squished-v1.abi
>
> diff --git a/include/abg-libxml-utils.h b/include/abg-libxml-utils.h
> index 69e940f6..f87ee5a0 100644
> --- a/include/abg-libxml-utils.h
> +++ b/include/abg-libxml-utils.h
> @@ -82,7 +82,10 @@ int get_xml_node_depth(xmlNodePtr);
>    reinterpret_cast<char*>(xml_char_str.get())
>  
>  xmlNodePtr
> -advance_to_next_sibling_element(xmlNodePtr node);
> +first_child(xmlNodePtr node);
> +
> +xmlNodePtr
> +next_child(xmlNodePtr node);
>  
>  void
>  escape_xml_string(const std::string& str,
> diff --git a/src/abg-libxml-utils.cc b/src/abg-libxml-utils.cc
> index a1acff1f..baea25ac 100644
> --- a/src/abg-libxml-utils.cc
> +++ b/src/abg-libxml-utils.cc
> @@ -413,40 +413,39 @@ unescape_xml_comment(const std::string& str)
>    return result;
>  }
>  
> -/// Maybe get the next sibling element node of an XML node, or stay to the sam
> +/// Skip until we reach an XML element node or run out of (child) nodes.
>  ///
> -/// If there is no next sibling xml element node, the function returns
> -/// the initial node.
> +/// @param a pointer to a node.
>  ///
> -/// @param node the initial node to consider.
> -///
> -/// @return the next sibling node or the initial node @p node.
> -static xmlNodePtr
> -go_to_next_sibling_element_or_stay(xmlNodePtr node)

Removal of this is not documented in the ChangeLog part of the commit log.

> +/// @return a pointer to an XML element node or nil.
> +xmlNodePtr
> +skip_non_elements(xmlNodePtr node)

This does exactly the same as the removed
go_to_next_sibling_element_or_stay or am I missing something?


>  {
> -  xmlNodePtr n;
> -  for (n = node; n; n = n->next)
> -    {
> -      if (n->type == XML_ELEMENT_NODE)
> -	break;
> -    }
> -  return n ? n : node;
> +  while (node && node->type != XML_ELEMENT_NODE)
> +    node = node->next;
> +  return node;
>  }
>  
> -/// Get the next sibling element node of an XML node.
> +/// Get the first child element of an XML node.
>  ///
> -/// If there is no next sibling xml element node, the function returns nil.
> +/// @param a pointer to the node whose children are to be perused.
>  ///
> -/// @param node the XML node to consider.
> +/// @return a pointer to the first XML element child of @p node or nil.
> +xmlNodePtr
> +first_child(xmlNodePtr node)

Please call this first_element_child.  Otherwise, it can be misleading,
I think.

> +{
> +  return skip_non_elements(node->children);
> +}
> +
> +/// Get the next sibling element of an XML node.
>  ///
> -/// @return the next sibling element node or nil.
> +/// @param a pointer to the XML node whose following siblings are to be perused.
> +///
> +/// @return a pointer to the next sibling element of @p node or nil.
>  xmlNodePtr
> -advance_to_next_sibling_element(xmlNodePtr node)
> +next_child(xmlNodePtr node)

Looking at the code of this function I think it should rather be called
next_element_sibling.  Or what am I missing?

>  {
> -  xmlNodePtr n = go_to_next_sibling_element_or_stay(node->next);
> -  if (n == 0 || n->type != XML_ELEMENT_NODE)
> -    return 0;
> -  return n;
> +  return skip_non_elements(node->next);
>  }

[...]

Cheers,


-- 
		Dodji

  reply	other threads:[~2021-04-06 11:48 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-31  9:23 [PATCH] " Giuliano Procida
2021-03-31 14:24 ` Matthias Maennich
2021-03-31 16:58   ` Giuliano Procida
2021-03-31 17:04 ` [PATCH v2] " Giuliano Procida
2021-04-06 11:48   ` Dodji Seketeli [this message]
2021-04-06 14:35     ` Giuliano Procida
2021-04-06 16:22       ` [PATCH v3] " Giuliano Procida
2021-04-06 17:39         ` [PATCH v4] " Giuliano Procida
2021-04-15 14:12       ` [PATCH v2, " Dodji Seketeli
2021-04-16 12:03         ` Giuliano Procida
2021-04-19 13:40           ` Dodji Seketeli
2021-04-19 14:00             ` Subject: [PATCH 1/3] reader: Handle 'abi-corpus' element being possibly empty Dodji Seketeli
2021-04-19 14:01             ` [PATCH 2/3] reader: Use xmlFirstElementChild and xmlNextElementSibling rather than xml::advance_to_next_sibling_element Dodji Seketeli
2021-04-19 14:03             ` [PATCH 3/3] reader: Use xmlFirstElementChild/xmlNextElementSibling to iterate over children elements Dodji Seketeli
2021-04-20  6:52               ` Giuliano Procida
2021-04-20  9:46                 ` Dodji Seketeli
2021-05-03 15:18                   ` Dodji Seketeli
2021-05-04 11:13                     ` Giuliano Procida

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8735w3vdkv.fsf@seketeli.org \
    --to=dodji@seketeli.org \
    --cc=gprocida@google.com \
    --cc=kernel-team@android.com \
    --cc=libabigail@sourceware.org \
    --cc=maennich@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).