public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <richard.guenther@gmail.com>
To: David Malcolm <dmalcolm@redhat.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: PING: Re: [PATCH] json: preserve key-insertion order [PR109163]
Date: Fri, 24 Mar 2023 16:27:51 +0100	[thread overview]
Message-ID: <4B6552FF-73A2-46D0-ADA3-7CB273FC7E91@gmail.com> (raw)
In-Reply-To: <464928109702cd42ea5e1de4b03bc0f4d736297b.camel@redhat.com>



> Am 24.03.2023 um 15:44 schrieb David Malcolm via Gcc-patches <gcc-patches@gcc.gnu.org>:
> 
> I'd like to ping the following patch for review:
>  https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614165.html

Ok.

Richard 

> Thanks
> Dave
> 
>> On Fri, 2023-03-17 at 16:53 -0400, David Malcolm wrote:
>> PR other/109163 notes that when we write out JSON files, we traverse
>> the keys within each object via hash_map iteration, and thus the
>> ordering is non-deterministic - it can arbitrarily vary from run to
>> run and from different machines, making it harder for users to
>> compare
>> results and determine if anything has "really" changed.
>> 
>> I'm running into this issue with SARIF output, but there are several
>> places where we're currently emitting JSON:
>> 
>>   * -fsave-optimization-record emits SRCFILE.opt-record.json.gz
>>         "This option is experimental and the format of the data
>> within
>>         the compressed JSON file is subject to change."; see
>>         optinfo-emit-json.{h,cc}, dumpfile.cc, etc
>>   * -fdiagnostics-format= with the various "sarif" and "json" options
>>   * -fdump-analyzer-json is a developer option in the analyzer
>>   * gcov has:
>>      "-j, --json-format: Output JSON intermediate format into
>>      .gcov.json.gz file"
>> 
>> This patch adds an auto_vec to class json::object to preserve
>> key-insertion order, and use it when writing out objects. 
>> Potentially
>> this slightly slows down JSON output, but I believe that this isn't
>> normally a bottleneck, and that the benefits to the user of
>> deterministic output are worth it.
>> 
>> I had first attempted to use ordered_hash_map.h for this, but ran
>> into
>> impenetrable template errors, so this patch uses a simpler approach
>> of
>> just adding an auto_vec to json::object.
>> 
>> Testing showed a failure of diagnostic-format-json-5.c, which was
>> using
>> a convoluted set of regexps to consume the output; I believe that
>> this
>> was brittle, and was intermittently failing for some of the random
>> orderings of output.  I rewrote these regexps to work with the
>> expected
>> output order.  The other such tests seem to pass with the
>> now-deterministic orderings.
>> 
>> Lightly tested with valgrind.
>> I manually verified that the SARIF output is now deterministic.
>> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>> 
>> OK for trunk?
>> 
>> gcc/ChangeLog:
>>         PR other/109163
>>         * json.cc: Update comments to indicate that we now preserve
>>         insertion order of keys within objects.
>>         (object::print): Traverse keys in insertion order.
>>         (object::set): Preserve insertion order of keys.
>>         (selftest::test_writing_objects): Add an additional key to
>> verify
>>         that we preserve insertion order.
>>         * json.h (object::m_keys): New field.
>> 
>> gcc/testsuite/ChangeLog:
>>         PR other/109163
>>         * c-c++-common/diagnostic-format-json-1.c: Update comment.
>>         * c-c++-common/diagnostic-format-json-2.c: Likewise.
>>         * c-c++-common/diagnostic-format-json-3.c: Likewise.
>>         * c-c++-common/diagnostic-format-json-4.c: Likewise.
>>         * c-c++-common/diagnostic-format-json-5.c: Rewrite regexps.
>>         * c-c++-common/diagnostic-format-json-stderr-1.c: Update
>> comment.
>> 
>> Signed-off-by: David Malcolm <dmalcolm@redhat.com>
>> ---
>>  gcc/json.cc                                   |  40 ++++---
>>  gcc/json.h                                    |  10 +-
>>  .../c-c++-common/diagnostic-format-json-1.c   |   3 +-
>>  .../c-c++-common/diagnostic-format-json-2.c   |   3 +-
>>  .../c-c++-common/diagnostic-format-json-3.c   |   3 +-
>>  .../c-c++-common/diagnostic-format-json-4.c   |   3 +-
>>  .../c-c++-common/diagnostic-format-json-5.c   | 100 ++++++++++------
>> --
>>  .../diagnostic-format-json-stderr-1.c         |   3 +-
>>  8 files changed, 95 insertions(+), 70 deletions(-)
>> 
>> diff --git a/gcc/json.cc b/gcc/json.cc
>> index 01879c9c466..741e97b20e5 100644
>> --- a/gcc/json.cc
>> +++ b/gcc/json.cc
>> @@ -31,8 +31,11 @@ using namespace json;
>>  /* class json::value.  */
>>  
>>  /* Dump this json::value tree to OUTF.
>> -   No formatting is done.  There are no guarantees about the order
>> -   in which the key/value pairs of json::objects are printed.  */
>> +
>> +   No formatting is done.
>> +
>> +   The key/value pairs of json::objects are printed in the order
>> +   in which the keys were originally inserted.  */
>>  
>>  void
>>  value::dump (FILE *outf) const
>> @@ -44,7 +47,7 @@ value::dump (FILE *outf) const
>>  }
>>  
>>  /* class json::object, a subclass of json::value, representing
>> -   an unordered collection of key/value pairs.  */
>> +   an ordered collection of key/value pairs.  */
>>  
>>  /* json:object's dtor.  */
>>  
>> @@ -62,14 +65,17 @@ object::~object ()
>>  void
>>  object::print (pretty_printer *pp) const
>>  {
>> -  /* Note that the order is not guaranteed.  */
>>    pp_character (pp, '{');
>> -  for (map_t::iterator it = m_map.begin (); it != m_map.end ();
>> ++it)
>> +
>> +  /* Iterate in the order that the keys were inserted.  */
>> +  unsigned i;
>> +  const char *key;
>> +  FOR_EACH_VEC_ELT (m_keys, i, key)
>>      {
>> -      if (it != m_map.begin ())
>> +      if (i > 0)
>>         pp_string (pp, ", ");
>> -      const char *key = const_cast <char *>((*it).first);
>> -      value *value = (*it).second;
>> +      map_t &mut_map = const_cast<map_t &> (m_map);
>> +      value *value = *mut_map.get (key);
>>        pp_doublequote (pp);
>>        pp_string (pp, key); // FIXME: escaping?
>>        pp_doublequote (pp);
>> @@ -97,9 +103,13 @@ object::set (const char *key, value *v)
>>        *ptr = v;
>>      }
>>    else
>> -    /* If the key wasn't already present, take a copy of the key,
>> -       and store the value.  */
>> -    m_map.put (xstrdup (key), v);
>> +    {
>> +      /* If the key wasn't already present, take a copy of the key,
>> +        and store the value.  */
>> +      char *owned_key = xstrdup (key);
>> +      m_map.put (owned_key, v);
>> +      m_keys.safe_push (owned_key);
>> +    }
>>  }
>>  
>>  /* Get the json::value * for KEY.
>> @@ -295,15 +305,17 @@ test_object_get ()
>>    ASSERT_EQ (obj.get ("not-present"), NULL);
>>  }
>>  
>> -/* Verify that JSON objects are written correctly.  We can't test
>> more than
>> -   one key/value pair, as we don't impose a guaranteed ordering.  */
>> +/* Verify that JSON objects are written correctly.  */
>>  
>>  static void
>>  test_writing_objects ()
>>  {
>>    object obj;
>>    obj.set ("foo", new json::string ("bar"));
>> -  assert_print_eq (obj, "{\"foo\": \"bar\"}");
>> +  obj.set ("baz", new json::string ("quux"));
>> +  /* This test relies on json::object writing out key/value pairs
>> +     in key-insertion order.  */
>> +  assert_print_eq (obj, "{\"foo\": \"bar\", \"baz\": \"quux\"}");
>>  }
>>  
>>  /* Verify that JSON arrays are written correctly.  */
>> diff --git a/gcc/json.h b/gcc/json.h
>> index aa52ba2951a..057119db277 100644
>> --- a/gcc/json.h
>> +++ b/gcc/json.h
>> @@ -82,8 +82,11 @@ class value
>>    void dump (FILE *) const;
>>  };
>>  
>> -/* Subclass of value for objects: an unordered collection of
>> -   key/value pairs.  */
>> +/* Subclass of value for objects: a collection of key/value pairs
>> +   preserving the ordering in which keys were inserted.
>> +
>> +   Preserving the order eliminates non-determinism in the output,
>> +   making it easier for the user to compare repeated invocations. 
>> */
>>  
>>  class object : public value
>>  {
>> @@ -100,6 +103,9 @@ class object : public value
>>    typedef hash_map <char *, value *,
>>      simple_hashmap_traits<nofree_string_hash, value *> > map_t;
>>    map_t m_map;
>> +
>> +  /* Keep track of order in which keys were inserted.  */
>> +  auto_vec <const char *> m_keys;
>>  };
>>  
>>  /* Subclass of value for arrays.  */
>> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c
>> b/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c
>> index af57eb636d5..6bab30e3e6c 100644
>> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c
>> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c
>> @@ -4,8 +4,7 @@
>>  #error message
>>  
>>  /* Use dg-regexp to consume the JSON output starting with
>> -   the innermost values, and working outwards.
>> -   We can't rely on any ordering of the keys.  */
>> +   the innermost values, and working outwards.  */
>>  
>>  /* { dg-regexp "\"kind\": \"error\"" } */
>>  /* { dg-regexp "\"column-origin\": 1" } */
>> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c
>> b/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c
>> index edb802efb8d..3c12103c9f8 100644
>> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c
>> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c
>> @@ -4,8 +4,7 @@
>>  #warning message
>>  
>>  /* Use dg-regexp to consume the JSON output starting with
>> -   the innermost values, and working outwards.
>> -   We can't rely on any ordering of the keys.  */
>> +   the innermost values, and working outwards.  */
>>  
>>  /* { dg-regexp "\"kind\": \"warning\"" } */
>>  /* { dg-regexp "\"column-origin\": 1" } */
>> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c
>> b/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c
>> index bb7b8dc5d16..11d74624ff1 100644
>> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c
>> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c
>> @@ -4,8 +4,7 @@
>>  #warning message
>>  
>>  /* Use dg-regexp to consume the JSON output starting with
>> -   the innermost values, and working outwards.
>> -   We can't rely on any ordering of the keys.  */
>> +   the innermost values, and working outwards.  */
>>  
>>  /* { dg-regexp "\"kind\": \"error\"" } */
>>  /* { dg-regexp "\"column-origin\": 1" } */
>> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c
>> b/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c
>> index 8ac90723cbd..cec1cf924b4 100644
>> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c
>> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c
>> @@ -10,8 +10,7 @@ int test (void)
>>  }
>>  
>>  /* Use dg-regexp to consume the JSON output starting with
>> -   the innermost values, and working outwards.
>> -   We can't rely on any ordering of the keys.  */
>> +   the innermost values, and working outwards.  */
>>  
>>  /* Verify nested diagnostics.  */
>>  
>> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c
>> b/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c
>> index 8d2eb0c5089..86f8c5fb374 100644
>> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c
>> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c
>> @@ -8,49 +8,61 @@ int test (struct s *ptr)
>>    return ptr->colour;
>>  }
>>  
>> -/* Use dg-regexp to consume the JSON output starting with
>> -   the innermost values, and working outwards.
>> -   We can't rely on any ordering of the keys.  */
>> +/* Verify fix-it hints.
>>  
>> -/* { dg-regexp "\"kind\": \"error\"" } */
>> -/* { dg-regexp "\"column-origin\": 1" } */
>> -/* { dg-regexp "\"escape-source\": false" } */
>> -/* { dg-regexp "\"message\": \".*\"" } */
>> +   Use dg-regexp to consume the JSON output from start to
>> +   finish, relying on the ordering of the keys.
>> +   The following uses indentation to visualize the structure
>> +   of the JSON (although the actual output is all on one line).
>>  
>> -/* Verify fix-it hints.  */
>> -
>> -/* { dg-regexp "\"string\": \"color\"" } */
>> -
>> -/* { dg-regexp "\"start\": \{" } */
>> -/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json-
>> 5.c\"" } */
>> -/* { dg-regexp "\"line\": 8" } */
>> -/* { dg-regexp "\"column\": 15" } */
>> -/* { dg-regexp "\"display-column\": 15" } */
>> -/* { dg-regexp "\"byte-column\": 15" } */
>> -
>> -/* { dg-regexp "\"next\": \{" } */
>> -/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json-
>> 5.c\"" } */
>> -/* { dg-regexp "\"line\": 8" } */
>> -/* { dg-regexp "\"column\": 21" } */
>> -/* { dg-regexp "\"display-column\": 21" } */
>> -/* { dg-regexp "\"byte-column\": 21" } */
>> -
>> -/* { dg-regexp "\"fixits\": \[\[\{\}, \]*\]" } */
>> -
>> -/* { dg-regexp "\"caret\": \{" } */
>> -/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json-
>> 5.c\"" } */
>> -/* { dg-regexp "\"line\": 8" } */
>> -/* { dg-regexp "\"column\": 15" } */
>> -/* { dg-regexp "\"display-column\": 15" } */
>> -/* { dg-regexp "\"byte-column\": 15" } */
>> -
>> -/* { dg-regexp "\"finish\": \{" } */
>> -/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json-
>> 5.c\"" } */
>> -/* { dg-regexp "\"line\": 8" } */
>> -/* { dg-regexp "\"column\": 20" } */
>> -/* { dg-regexp "\"display-column\": 20" } */
>> -/* { dg-regexp "\"byte-column\": 20" } */
>> -
>> -/* { dg-regexp "\"locations\": \[\[\{\}, \]*\]" } */
>> -/* { dg-regexp "\"children\": \[\[\]\[\]\]" } */
>> -/* { dg-regexp "\[\[\{\}, \]*\]" } */
>> +  { dg-regexp {\[} }
>> +    { dg-regexp {\{} }
>> +     { dg-regexp {"kind": "error"} }
>> +     { dg-regexp {, "message": "'struct s' has no member named
>> 'colour'; did you mean 'color'\?"} }
>> +     { dg-regexp {, "children": \[\]} }
>> +     { dg-regexp {, "column-origin": 1} }
>> +     { dg-regexp {, "locations": } }
>> +       { dg-regexp {\[} }
>> +         { dg-regexp {\{} }
>> +           { dg-regexp {"caret": } }
>> +             { dg-regexp {\{} }
>> +               { dg-regexp {"file": "[^\n\r"]*diagnostic-format-
>> json-5.c"} }
>> +               { dg-regexp {, "line": 8} }
>> +               { dg-regexp {, "display-column": 15} }
>> +               { dg-regexp {, "byte-column": 15} }
>> +               { dg-regexp {, "column": 15} }
>> +             { dg-regexp {\}} }
>> +           { dg-regexp {, "finish": } }
>> +             { dg-regexp {\{} }
>> +               { dg-regexp {"file": "[^\n\r"]*diagnostic-format-
>> json-5.c"} }
>> +               { dg-regexp {, "line": 8} }
>> +               { dg-regexp {, "display-column": 20} }
>> +               { dg-regexp {, "byte-column": 20} }
>> +               { dg-regexp {, "column": 20} }
>> +             { dg-regexp {\}} }
>> +           { dg-regexp {\}} }
>> +         { dg-regexp {\]} }
>> +       { dg-regexp {, "fixits": } }
>> +       { dg-regexp {\[} }
>> +         { dg-regexp {\{} }
>> +           { dg-regexp {"start": } }
>> +             { dg-regexp {\{} }
>> +               { dg-regexp {"file": "[^\n\r"]*diagnostic-format-
>> json-5.c"} }
>> +               { dg-regexp {, "line": 8} }
>> +               { dg-regexp {, "display-column": 15} }
>> +               { dg-regexp {, "byte-column": 15} }
>> +               { dg-regexp {, "column": 15} }
>> +             { dg-regexp {\}} }
>> +           { dg-regexp {, "next": } }
>> +             { dg-regexp {\{} }
>> +               { dg-regexp {"file": "[^\n\r"]*diagnostic-format-
>> json-5.c"} }
>> +               { dg-regexp {, "line": 8} }
>> +               { dg-regexp {, "display-column": 21} }
>> +               { dg-regexp {, "byte-column": 21} }
>> +               { dg-regexp {, "column": 21} }
>> +             { dg-regexp {\}} }
>> +           { dg-regexp {, "string": "color"} }
>> +         { dg-regexp {\}} }
>> +       { dg-regexp {\]} }
>> +     { dg-regexp {, "escape-source": false\}} }
>> +   { dg-regexp {\]} }  */
>> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-
>> stderr-1.c b/gcc/testsuite/c-c++-common/diagnostic-format-json-
>> stderr-1.c
>> index 02f780bce10..bcfa92110f5 100644
>> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-stderr-1.c
>> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-stderr-1.c
>> @@ -6,8 +6,7 @@
>>  #error message
>>  
>>  /* Use dg-regexp to consume the JSON output starting with
>> -   the innermost values, and working outwards.
>> -   We can't rely on any ordering of the keys.  */
>> +   the innermost values, and working outwards.  */
>>  
>>  /* { dg-regexp "\"kind\": \"error\"" } */
>>  /* { dg-regexp "\"column-origin\": 1" } */
> 

      reply	other threads:[~2023-03-24 15:53 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-17 20:53 David Malcolm
2023-03-24 14:12 ` PING: " David Malcolm
2023-03-24 15:27   ` Richard Biener [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B6552FF-73A2-46D0-ADA3-7CB273FC7E91@gmail.com \
    --to=richard.guenther@gmail.com \
    --cc=dmalcolm@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).