public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
* [PATCH, applied] Better support for golang programs
@ 2022-09-22 14:49 Dodji Seketeli
  0 siblings, 0 replies; only message in thread
From: Dodji Seketeli @ 2022-09-22 14:49 UTC (permalink / raw)
  To: libabigail; +Cc: woodard

Hello,

When analyzing the bettercap program written in Golang, the DWARF
reader goes into an infinite loop due to this recursive DWARF
construct:

 [ 8bbf9]    subroutine_type      abbrev: 40
             name                 (string) "gopkg.in/sourcemap%2ev1.fn"
             byte_size            (udata) 4
             lo_user+0x900        (data1) 19
             lo_user+0x904        (addr) +0000000000
 [ 8bc1b]      formal_parameter     abbrev: 41
               type                 (ref_addr) [ 8ba8b]
 [ 8bc20]      formal_parameter     abbrev: 41
               type                 (ref_addr) [ 8bc4b]
 [ 8bc25]      formal_parameter     abbrev: 41
               type                 (ref_addr) [ 6d43e]
 [ 8bc2b]    typedef              abbrev: 39
             name                 (string) "gopkg.in/sourcemap%2ev1.fn"
             type                 (ref_addr) [ 8bbf9]
 [ 8bc4b]    pointer_type         abbrev: 43
             name                 (string) "*gopkg.in/sourcemap%2ev1.fn"
             type                 (ref_addr) [ 8bc2b]
             lo_user+0x900        (data1) 0
             lo_user+0x904        (addr) +0000000000

Note how the typedef DIE at offset [ 8bc2b] references the function
type DIE at offset [ 8bbf9] which second parameter DIE at offset
[8bc20] has a pointer type described by the DIE [ 8bc4b].  This last
pointer type is a pointer to the typedef type which DIE has the offset
[ 8bc2b], which started this paragraph.  This is a recursive
construct.

First, there is die_qualified_type_name in the DWARF reader that goes
look unnecessarily into the underlying type of a typedef.  This makes
that function end-up in an infinite loop.  That is especially
unfortunate because we do not need to do that to construct the name of
the typedef.  This looks like an old relic of ancient unrelated code
that needs to go.  This patch lets it go.

Second, when building the IR for function type, build_function_type
also ends up in a infinite loop because it's written naively.  To fix
that, this patch does what we do to handle recursively defined
classes.  The function type IR for that function type DIE is
"forward-declared" as being "Work In Progress" aka WIP; then when a
construct references that same DIE, the WIP IR is returned.  When we
are done constructing the function type IR for that DIE, the IR is no
longer marked WIP.  That way, the infinite recursion is avoided.

Now that all function types can be represented in the IR,
function_decl::get_pretty_representation_of_declarator is crashing
because it wrongly forgets that a parameter can have a function type.
The patch fixes that.

Last but not least, it appears that the name of elf symbols and
functions can contain characters that need to be escaped (to respect
the lexical rules of XML) in the emitted ABIXML.  The patch fixes
that.

Together, this patch makes it so that running fedabipkgdiff to compare
packages against themselves now succeeds on the f36 distribution, for
the following Golang packages:

    $ fedabipkgdiff  --self-compare --from fc36 {containerd, bettercap,
    apptainer, rclone, singularity}

	* src/abg-dwarf-reader.cc (die_qualified_type_name): Don't look at
	the underlying type unnecessarily.
	(build_function_type): Look for the WIP type first to avoid
	infinite recursion.
	* src/abg-ir.cc
	(function_decl::get_pretty_representation_of_declarator): A
	parameter can have a function type.
	* src/abg-writer.cc (write_elf_symbol_reference)
	(write_function_decl): Escape symbol names, function names and
	symbol references.

Signed-off-by: Dodji Seketeli <dodji@redhat.com>
---
 src/abg-dwarf-reader.cc | 26 +++++++++++---------------
 src/abg-ir.cc           |  3 +--
 src/abg-writer.cc       |  8 +++++---
 3 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
index c6ba838c..21d2e11d 100644
--- a/src/abg-dwarf-reader.cc
+++ b/src/abg-dwarf-reader.cc
@@ -9780,21 +9780,6 @@ die_qualified_type_name(const read_context& ctxt,
     case DW_TAG_class_type:
     case DW_TAG_union_type:
       {
-	if (tag == DW_TAG_typedef)
-	  {
-	    // If the underlying type of the typedef is unspecified,
-	    // bail out as we don't support that yet.
-	    Dwarf_Die underlying_type_die;
-	    if (die_die_attribute(die, DW_AT_type, underlying_type_die))
-	      {
-		string n = die_qualified_type_name(ctxt, &underlying_type_die,
-						   where_offset);
-		if (die_is_unspecified(&underlying_type_die)
-		    || n.empty())
-		  break;
-	      }
-	  }
-
 	if (name.empty())
 	  // TODO: handle cases where there are more than one
 	  // anonymous type of the same kind in the same scope.  In
@@ -14566,6 +14551,17 @@ build_function_type(read_context&	ctxt,
 
   const die_source source = ctxt.get_die_source(die);
 
+  {
+    size_t off = dwarf_dieoffset(die);
+    auto i = ctxt.die_wip_function_types_map(source).find(off);
+    if (i != ctxt.die_wip_function_types_map(source).end())
+      {
+	function_type_sptr fn_type = is_function_type(i->second);
+	ABG_ASSERT(fn_type);
+	return fn_type;
+      }
+  }
+
   decl_base_sptr type_decl;
 
   translation_unit_sptr tu = ctxt.cur_transl_unit();
diff --git a/src/abg-ir.cc b/src/abg-ir.cc
index 8ad870af..1cd2a219 100644
--- a/src/abg-ir.cc
+++ b/src/abg-ir.cc
@@ -20483,8 +20483,7 @@ function_decl::get_pretty_representation_of_declarator (bool internal) const
 	  type_base_sptr type = parm->get_type();
 	  if (internal)
 	    type = peel_typedef_type(type);
-	  decl_base_sptr type_decl = get_type_declaration(type);
-	  result += type_decl->get_qualified_name(internal);
+	  result += get_type_name(type, /*qualified=*/true, internal);
 	}
     }
   result += ")";
diff --git a/src/abg-writer.cc b/src/abg-writer.cc
index a6166f5a..dff8813a 100644
--- a/src/abg-writer.cc
+++ b/src/abg-writer.cc
@@ -1758,7 +1758,9 @@ write_elf_symbol_reference(const elf_symbol& sym, ostream& o)
   // If all aliases are suppressed, just stick with the main symbol.
   if (!found)
     alias = main;
-  o << " elf-symbol-id='" << alias->get_id_string() << "'";
+  o << " elf-symbol-id='"
+    << xml::escape_xml_string(alias->get_id_string())
+    << "'";
   return true;
 }
 
@@ -3101,7 +3103,7 @@ write_elf_symbol(const elf_symbol_sptr&	sym,
 
   annotate(sym, ctxt, indent);
   do_indent(o, indent);
-  o << "<elf-symbol name='" << sym->get_name() << "'";
+  o << "<elf-symbol name='" << xml::escape_xml_string(sym->get_name()) << "'";
   if (sym->is_variable() && sym->get_size())
   o << " size='" << sym->get_size() << "'";
 
@@ -3400,7 +3402,7 @@ write_function_decl(const function_decl_sptr& decl, write_context& ctxt,
 	  ctxt.record_type_as_referenced(parm_type);
 
 	  if (ctxt.get_write_parameter_names() && !(*pi)->get_name().empty())
-	    o << " name='" << (*pi)->get_name() << "'";
+	    o << " name='" << xml::escape_xml_string((*pi)->get_name()) << "'";
 	}
       write_is_artificial(*pi, o);
       write_location((*pi)->get_location(), ctxt);
-- 
2.37.2


-- 
		Dodji


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2022-09-22 14:49 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-22 14:49 [PATCH, applied] Better support for golang programs Dodji Seketeli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).