Re: [PATCH 1/4] Add named-types-ids to use name ids after the type name instead of numbers.

public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed

From: Mark Wielaard <mark@klomp.org>
To: Dodji Seketeli <dodji@seketeli.org>
Cc: libabigail@sourceware.org
Subject: Re: [PATCH 1/4] Add named-types-ids to use name ids after the type name instead of numbers.
Date: Fri, 24 Apr 2020 16:26:46 +0200	[thread overview]
Message-ID: <6e92ff1acae37114b60e9fbc0d4949a5aeab1330.camel@klomp.org> (raw)
In-Reply-To: <87v9lqayar.fsf@seketeli.org>

Hi Dodji,

On Thu, 2020-04-23 at 19:48 +0200, Dodji Seketeli wrote:
> "Mark J. Wielaard" <mark@klomp.org> a écrit:
> 
> > To make the XML output more readable and a bit more stable generate
> > type ids based on the underlying type name. When types are inserted,
> > removed or rearranged the type ids will (mostly) be the same so that
> > references can stay the same. This also makes it easier to read and
> > compare the XML corpus representation.
> 
> Note that abidw has the option --annotate that makes it easy to read the
> abixml file, for cases where people want to read it.

Yes, but that does make the file bigger and I found the xml-escaped
names in the annotation a bit harder to read (see also below).

> Nevertheless, this new --named-type-ids is definitely a welcomed
> improvement for the stability of abixml files.
> 
> I made some light changes to this patch and so I have attached the
> resulting patch to this one, so that you can test it in your environment
> and see if it still suits you.

The get_pretty_representation change makes things less ideal for my use
case, see below.

> I am thinking that we should add some basic testing for this option as
> well, I am about to modify test-read-dwarf.cc to make it support this.
> I haven't done it yet as I didn't want to delay this review any further.

Yes, sorry. I wanted to see first if the representation was acceptable.
But I guess we should at least add some "round-tripping" tests with
abidw --named-types-ids --abidiff to make sure things are at least
internally consistent.

> [...]
> 
> > diff --git a/src/abg-ir.cc b/src/abg-ir.cc
> > index 27831352..f4fee60a 100644
> > --- a/src/abg-ir.cc
> > +++ b/src/abg-ir.cc
> > @@ -2884,6 +2884,13 @@ interned_string
> >  environment::intern(const string& s) const
> >  {return const_cast<environment*>(this)->priv_-
> > >string_pool_.create_string(s);}
> >  
> > +bool
> > +environment::is_interned_string(const string& s) const
> > +{
> 
> I have added a comment for this function.

Thanks, looks good.

> > diff --git a/src/abg-libxml-utils.cc b/src/abg-libxml-utils.cc
> > index 2c46aad8..e8c1c1b8 100644
> > --- a/src/abg-libxml-utils.cc
> > +++ b/src/abg-libxml-utils.cc
> > @@ -249,6 +249,64 @@ escape_xml_string(const std::string& str)
> >    return result;
> >  }
> >  
> > +/// Replace the various special characters in a type string as used in
> > +/// a type-id attribute.
> > +///
> > +/// The characters '<', '>', ''', '"' and ' ' are all replaced by '_'.
> > +/// The characters '&' and '*' at the end of a string are simply dropped,
> > +/// otherwise they are also replaced by an '_'.
> > +///
> > +/// The result is not reversible.
> > +///
> > +//// @param str the input string to read to search for the characters
> > +//// to replace.
> > +////
> > +//// @param replaced the output string where to write the resulting
> > +//// string that contains the replaced characters,
> > +void
> > +replace_xml_type_string(const std::string& str, std::string& replaced)
> > +{
> 
> I thinking you could have just used the xml::escape_xml_string function
> that is in this abg-libxml-utils.cc file.  I understand that this new
> one that you wrote makes the result more pleasant to read, though.  So I
> am not arguing against your choice.

I indeed found the replacements using &[a-z]+;... not very readable.
It is a bit of a shame that we have to escape '<' inside attributes, so
it seemed right to also replace '>' (which doesn't need escaping inside
an attribute text). And some chars, like '&' and '*' at the end of an
identifier were redundant and so could be removed (but see below).

> I have however changed the name of this function to
> escape_xml_string_as_named_type_id to make the name a bit more
> "self-documented" and more in-line with escape_xml_string that exists
> already.

Yes, good idea.

> --- a/src/abg-writer.cc
> > +++ b/src/abg-writer.cc
> > @@ -127,6 +127,54 @@ public:
> >      ABG_ASSERT(env);
> >      return env->intern(o.str());
> >    }
> > +
> > +  /// Return a unique string representing a name, prefixed by a type
> > +  /// string. The returned string will be made unique by postfixing
> > +  /// underscores if necessary.
> > +  ///
> > +  /// @param type to create an unique id string for
> > +  interned_string
> > +  get_id_for_type(const type_base* type) const
> > +  {
> > +    ostringstream o;
> > +    /* Try to find an appropriate prefix. */
> > +    if (is_type_decl(type))
> > +      o << "type-";
> > +    else if (is_class_type(type))
> > +      o << "class-";
> > +    else if (is_union_type(type))
> > +      o << "union-";
> > +    else if (is_enum_type(type))
> > +      o << "enum-";
> > +    else if (is_typedef(type))
> > +      o << "typedef-";
> > +    else if (is_qualified_type(type))
> > +      o << "qual-";
> > +    else if (is_pointer_type(type))
> > +      o << "ptr-";
> > +    else if (is_reference_type(type))
> > +      o << "ref-";
> > +    else if (is_array_type(type))
> > +      o << "array-";
> > +    else if (is_subrange_type(type))
> > +      o << "subrng-";
> > +    else if (is_function_type(type))
> > +      o << "func-";
> > +    else
> > +      ABG_ASSERT_NOT_REACHED;
> 
> Here, have replaced the above by a call to
> abigail::ir::get_pretty_representation().  It does what you think it
> does for all the types supported in the internal representation (IR).
> Whenever a new type representation is added to the IR, this should
> hopefully just works as well.  I have also renamed this function as
> get_named_type_id() to make it a bit more self-documented.

I am not sure using get_pretty_representation() gives the same
guarantees as the above code. In particular it seems to accept both
types and decls and it will accept a "NULL" type (as "void"). does that
really make sure we are only handling real types at this point?

It might be a good idea to use get_pretty_representation() instead of
get_type_name() here, because it makes the type representation
"richer". But it seems it is not enough to make them "unique".

The idea of creating an unique type-prefix was to make clear what type
we are dealing with. e.g. both a base type, struct, union, enum,
typedef, qualified type, etc. can have the same "name". Which is
especially confusing with typedefs, pointers, references and functions.
That is why this function prefixes each type with "type-" for base
types, "class-" for structs, "typedef-" for typedefs. That way it is
easy to see if you are dealing with an enum, typedef or reference, etc.

So even if we replace get_type_name(), with
get_pretty_representation(), I would like to keep the prefixing.

> So please, find my amended patch below and I hope I haven't botched
> it too much.  Please let me know if it works for you.

I'll look at Matthias review next and will sent a V2 based on your
version.

Cheers,

Mark

next prev parent reply	other threads:[~2020-04-24 14:26 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-21 12:28 More simplifications of abi XML file Mark J. Wielaard
2020-04-21 12:28 ` [PATCH 1/4] Add named-types-ids to use name ids after the type name instead of numbers Mark J. Wielaard
2020-04-23 17:48   ` Dodji Seketeli
2020-04-24 12:57     ` Matthias Maennich
2020-04-24 15:26       ` Mark Wielaard
2020-04-24 17:11         ` Matthias Maennich
2020-04-24 14:26     ` Mark Wielaard [this message]
2020-04-21 12:28 ` [PATCH 2/4] Add no-parameter-names to drop function parameter names Mark J. Wielaard
2020-04-24 13:35   ` Dodji Seketeli
2020-04-21 12:28 ` [PATCH 3/4] Add --no-elf-needed option to drop DT_NEEDED list from corpus Mark J. Wielaard
2020-04-24 14:17   ` Dodji Seketeli
2020-04-21 12:28 ` [PATCH 4/4] Add --merge-translation-units option Mark J. Wielaard
2020-05-07 12:27   ` Matthias Maennich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6e92ff1acae37114b60e9fbc0d4949a5aeab1330.camel@klomp.org \
    --to=mark@klomp.org \
    --cc=dodji@seketeli.org \
    --cc=libabigail@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).