From: David Malcolm <dmalcolm@redhat.com>
To: gcc-patches@gcc.gnu.org
Subject: PING: Re: [PATCH] json: preserve key-insertion order [PR109163]
Date: Fri, 24 Mar 2023 10:12:37 -0400 [thread overview]
Message-ID: <464928109702cd42ea5e1de4b03bc0f4d736297b.camel@redhat.com> (raw)
In-Reply-To: <20230317205349.3635562-1-dmalcolm@redhat.com>
I'd like to ping the following patch for review:
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614165.html
Thanks
Dave
On Fri, 2023-03-17 at 16:53 -0400, David Malcolm wrote:
> PR other/109163 notes that when we write out JSON files, we traverse
> the keys within each object via hash_map iteration, and thus the
> ordering is non-deterministic - it can arbitrarily vary from run to
> run and from different machines, making it harder for users to
> compare
> results and determine if anything has "really" changed.
>
> I'm running into this issue with SARIF output, but there are several
> places where we're currently emitting JSON:
>
> * -fsave-optimization-record emits SRCFILE.opt-record.json.gz
> "This option is experimental and the format of the data
> within
> the compressed JSON file is subject to change."; see
> optinfo-emit-json.{h,cc}, dumpfile.cc, etc
> * -fdiagnostics-format= with the various "sarif" and "json" options
> * -fdump-analyzer-json is a developer option in the analyzer
> * gcov has:
> "-j, --json-format: Output JSON intermediate format into
> .gcov.json.gz file"
>
> This patch adds an auto_vec to class json::object to preserve
> key-insertion order, and use it when writing out objects.
> Potentially
> this slightly slows down JSON output, but I believe that this isn't
> normally a bottleneck, and that the benefits to the user of
> deterministic output are worth it.
>
> I had first attempted to use ordered_hash_map.h for this, but ran
> into
> impenetrable template errors, so this patch uses a simpler approach
> of
> just adding an auto_vec to json::object.
>
> Testing showed a failure of diagnostic-format-json-5.c, which was
> using
> a convoluted set of regexps to consume the output; I believe that
> this
> was brittle, and was intermittently failing for some of the random
> orderings of output. I rewrote these regexps to work with the
> expected
> output order. The other such tests seem to pass with the
> now-deterministic orderings.
>
> Lightly tested with valgrind.
> I manually verified that the SARIF output is now deterministic.
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>
> OK for trunk?
>
> gcc/ChangeLog:
> PR other/109163
> * json.cc: Update comments to indicate that we now preserve
> insertion order of keys within objects.
> (object::print): Traverse keys in insertion order.
> (object::set): Preserve insertion order of keys.
> (selftest::test_writing_objects): Add an additional key to
> verify
> that we preserve insertion order.
> * json.h (object::m_keys): New field.
>
> gcc/testsuite/ChangeLog:
> PR other/109163
> * c-c++-common/diagnostic-format-json-1.c: Update comment.
> * c-c++-common/diagnostic-format-json-2.c: Likewise.
> * c-c++-common/diagnostic-format-json-3.c: Likewise.
> * c-c++-common/diagnostic-format-json-4.c: Likewise.
> * c-c++-common/diagnostic-format-json-5.c: Rewrite regexps.
> * c-c++-common/diagnostic-format-json-stderr-1.c: Update
> comment.
>
> Signed-off-by: David Malcolm <dmalcolm@redhat.com>
> ---
> gcc/json.cc | 40 ++++---
> gcc/json.h | 10 +-
> .../c-c++-common/diagnostic-format-json-1.c | 3 +-
> .../c-c++-common/diagnostic-format-json-2.c | 3 +-
> .../c-c++-common/diagnostic-format-json-3.c | 3 +-
> .../c-c++-common/diagnostic-format-json-4.c | 3 +-
> .../c-c++-common/diagnostic-format-json-5.c | 100 ++++++++++------
> --
> .../diagnostic-format-json-stderr-1.c | 3 +-
> 8 files changed, 95 insertions(+), 70 deletions(-)
>
> diff --git a/gcc/json.cc b/gcc/json.cc
> index 01879c9c466..741e97b20e5 100644
> --- a/gcc/json.cc
> +++ b/gcc/json.cc
> @@ -31,8 +31,11 @@ using namespace json;
> /* class json::value. */
>
> /* Dump this json::value tree to OUTF.
> - No formatting is done. There are no guarantees about the order
> - in which the key/value pairs of json::objects are printed. */
> +
> + No formatting is done.
> +
> + The key/value pairs of json::objects are printed in the order
> + in which the keys were originally inserted. */
>
> void
> value::dump (FILE *outf) const
> @@ -44,7 +47,7 @@ value::dump (FILE *outf) const
> }
>
> /* class json::object, a subclass of json::value, representing
> - an unordered collection of key/value pairs. */
> + an ordered collection of key/value pairs. */
>
> /* json:object's dtor. */
>
> @@ -62,14 +65,17 @@ object::~object ()
> void
> object::print (pretty_printer *pp) const
> {
> - /* Note that the order is not guaranteed. */
> pp_character (pp, '{');
> - for (map_t::iterator it = m_map.begin (); it != m_map.end ();
> ++it)
> +
> + /* Iterate in the order that the keys were inserted. */
> + unsigned i;
> + const char *key;
> + FOR_EACH_VEC_ELT (m_keys, i, key)
> {
> - if (it != m_map.begin ())
> + if (i > 0)
> pp_string (pp, ", ");
> - const char *key = const_cast <char *>((*it).first);
> - value *value = (*it).second;
> + map_t &mut_map = const_cast<map_t &> (m_map);
> + value *value = *mut_map.get (key);
> pp_doublequote (pp);
> pp_string (pp, key); // FIXME: escaping?
> pp_doublequote (pp);
> @@ -97,9 +103,13 @@ object::set (const char *key, value *v)
> *ptr = v;
> }
> else
> - /* If the key wasn't already present, take a copy of the key,
> - and store the value. */
> - m_map.put (xstrdup (key), v);
> + {
> + /* If the key wasn't already present, take a copy of the key,
> + and store the value. */
> + char *owned_key = xstrdup (key);
> + m_map.put (owned_key, v);
> + m_keys.safe_push (owned_key);
> + }
> }
>
> /* Get the json::value * for KEY.
> @@ -295,15 +305,17 @@ test_object_get ()
> ASSERT_EQ (obj.get ("not-present"), NULL);
> }
>
> -/* Verify that JSON objects are written correctly. We can't test
> more than
> - one key/value pair, as we don't impose a guaranteed ordering. */
> +/* Verify that JSON objects are written correctly. */
>
> static void
> test_writing_objects ()
> {
> object obj;
> obj.set ("foo", new json::string ("bar"));
> - assert_print_eq (obj, "{\"foo\": \"bar\"}");
> + obj.set ("baz", new json::string ("quux"));
> + /* This test relies on json::object writing out key/value pairs
> + in key-insertion order. */
> + assert_print_eq (obj, "{\"foo\": \"bar\", \"baz\": \"quux\"}");
> }
>
> /* Verify that JSON arrays are written correctly. */
> diff --git a/gcc/json.h b/gcc/json.h
> index aa52ba2951a..057119db277 100644
> --- a/gcc/json.h
> +++ b/gcc/json.h
> @@ -82,8 +82,11 @@ class value
> void dump (FILE *) const;
> };
>
> -/* Subclass of value for objects: an unordered collection of
> - key/value pairs. */
> +/* Subclass of value for objects: a collection of key/value pairs
> + preserving the ordering in which keys were inserted.
> +
> + Preserving the order eliminates non-determinism in the output,
> + making it easier for the user to compare repeated invocations.
> */
>
> class object : public value
> {
> @@ -100,6 +103,9 @@ class object : public value
> typedef hash_map <char *, value *,
> simple_hashmap_traits<nofree_string_hash, value *> > map_t;
> map_t m_map;
> +
> + /* Keep track of order in which keys were inserted. */
> + auto_vec <const char *> m_keys;
> };
>
> /* Subclass of value for arrays. */
> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c
> b/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c
> index af57eb636d5..6bab30e3e6c 100644
> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c
> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c
> @@ -4,8 +4,7 @@
> #error message
>
> /* Use dg-regexp to consume the JSON output starting with
> - the innermost values, and working outwards.
> - We can't rely on any ordering of the keys. */
> + the innermost values, and working outwards. */
>
> /* { dg-regexp "\"kind\": \"error\"" } */
> /* { dg-regexp "\"column-origin\": 1" } */
> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c
> b/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c
> index edb802efb8d..3c12103c9f8 100644
> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c
> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c
> @@ -4,8 +4,7 @@
> #warning message
>
> /* Use dg-regexp to consume the JSON output starting with
> - the innermost values, and working outwards.
> - We can't rely on any ordering of the keys. */
> + the innermost values, and working outwards. */
>
> /* { dg-regexp "\"kind\": \"warning\"" } */
> /* { dg-regexp "\"column-origin\": 1" } */
> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c
> b/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c
> index bb7b8dc5d16..11d74624ff1 100644
> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c
> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c
> @@ -4,8 +4,7 @@
> #warning message
>
> /* Use dg-regexp to consume the JSON output starting with
> - the innermost values, and working outwards.
> - We can't rely on any ordering of the keys. */
> + the innermost values, and working outwards. */
>
> /* { dg-regexp "\"kind\": \"error\"" } */
> /* { dg-regexp "\"column-origin\": 1" } */
> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c
> b/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c
> index 8ac90723cbd..cec1cf924b4 100644
> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c
> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c
> @@ -10,8 +10,7 @@ int test (void)
> }
>
> /* Use dg-regexp to consume the JSON output starting with
> - the innermost values, and working outwards.
> - We can't rely on any ordering of the keys. */
> + the innermost values, and working outwards. */
>
> /* Verify nested diagnostics. */
>
> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c
> b/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c
> index 8d2eb0c5089..86f8c5fb374 100644
> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c
> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c
> @@ -8,49 +8,61 @@ int test (struct s *ptr)
> return ptr->colour;
> }
>
> -/* Use dg-regexp to consume the JSON output starting with
> - the innermost values, and working outwards.
> - We can't rely on any ordering of the keys. */
> +/* Verify fix-it hints.
>
> -/* { dg-regexp "\"kind\": \"error\"" } */
> -/* { dg-regexp "\"column-origin\": 1" } */
> -/* { dg-regexp "\"escape-source\": false" } */
> -/* { dg-regexp "\"message\": \".*\"" } */
> + Use dg-regexp to consume the JSON output from start to
> + finish, relying on the ordering of the keys.
> + The following uses indentation to visualize the structure
> + of the JSON (although the actual output is all on one line).
>
> -/* Verify fix-it hints. */
> -
> -/* { dg-regexp "\"string\": \"color\"" } */
> -
> -/* { dg-regexp "\"start\": \{" } */
> -/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json-
> 5.c\"" } */
> -/* { dg-regexp "\"line\": 8" } */
> -/* { dg-regexp "\"column\": 15" } */
> -/* { dg-regexp "\"display-column\": 15" } */
> -/* { dg-regexp "\"byte-column\": 15" } */
> -
> -/* { dg-regexp "\"next\": \{" } */
> -/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json-
> 5.c\"" } */
> -/* { dg-regexp "\"line\": 8" } */
> -/* { dg-regexp "\"column\": 21" } */
> -/* { dg-regexp "\"display-column\": 21" } */
> -/* { dg-regexp "\"byte-column\": 21" } */
> -
> -/* { dg-regexp "\"fixits\": \[\[\{\}, \]*\]" } */
> -
> -/* { dg-regexp "\"caret\": \{" } */
> -/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json-
> 5.c\"" } */
> -/* { dg-regexp "\"line\": 8" } */
> -/* { dg-regexp "\"column\": 15" } */
> -/* { dg-regexp "\"display-column\": 15" } */
> -/* { dg-regexp "\"byte-column\": 15" } */
> -
> -/* { dg-regexp "\"finish\": \{" } */
> -/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json-
> 5.c\"" } */
> -/* { dg-regexp "\"line\": 8" } */
> -/* { dg-regexp "\"column\": 20" } */
> -/* { dg-regexp "\"display-column\": 20" } */
> -/* { dg-regexp "\"byte-column\": 20" } */
> -
> -/* { dg-regexp "\"locations\": \[\[\{\}, \]*\]" } */
> -/* { dg-regexp "\"children\": \[\[\]\[\]\]" } */
> -/* { dg-regexp "\[\[\{\}, \]*\]" } */
> + { dg-regexp {\[} }
> + { dg-regexp {\{} }
> + { dg-regexp {"kind": "error"} }
> + { dg-regexp {, "message": "'struct s' has no member named
> 'colour'; did you mean 'color'\?"} }
> + { dg-regexp {, "children": \[\]} }
> + { dg-regexp {, "column-origin": 1} }
> + { dg-regexp {, "locations": } }
> + { dg-regexp {\[} }
> + { dg-regexp {\{} }
> + { dg-regexp {"caret": } }
> + { dg-regexp {\{} }
> + { dg-regexp {"file": "[^\n\r"]*diagnostic-format-
> json-5.c"} }
> + { dg-regexp {, "line": 8} }
> + { dg-regexp {, "display-column": 15} }
> + { dg-regexp {, "byte-column": 15} }
> + { dg-regexp {, "column": 15} }
> + { dg-regexp {\}} }
> + { dg-regexp {, "finish": } }
> + { dg-regexp {\{} }
> + { dg-regexp {"file": "[^\n\r"]*diagnostic-format-
> json-5.c"} }
> + { dg-regexp {, "line": 8} }
> + { dg-regexp {, "display-column": 20} }
> + { dg-regexp {, "byte-column": 20} }
> + { dg-regexp {, "column": 20} }
> + { dg-regexp {\}} }
> + { dg-regexp {\}} }
> + { dg-regexp {\]} }
> + { dg-regexp {, "fixits": } }
> + { dg-regexp {\[} }
> + { dg-regexp {\{} }
> + { dg-regexp {"start": } }
> + { dg-regexp {\{} }
> + { dg-regexp {"file": "[^\n\r"]*diagnostic-format-
> json-5.c"} }
> + { dg-regexp {, "line": 8} }
> + { dg-regexp {, "display-column": 15} }
> + { dg-regexp {, "byte-column": 15} }
> + { dg-regexp {, "column": 15} }
> + { dg-regexp {\}} }
> + { dg-regexp {, "next": } }
> + { dg-regexp {\{} }
> + { dg-regexp {"file": "[^\n\r"]*diagnostic-format-
> json-5.c"} }
> + { dg-regexp {, "line": 8} }
> + { dg-regexp {, "display-column": 21} }
> + { dg-regexp {, "byte-column": 21} }
> + { dg-regexp {, "column": 21} }
> + { dg-regexp {\}} }
> + { dg-regexp {, "string": "color"} }
> + { dg-regexp {\}} }
> + { dg-regexp {\]} }
> + { dg-regexp {, "escape-source": false\}} }
> + { dg-regexp {\]} } */
> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-
> stderr-1.c b/gcc/testsuite/c-c++-common/diagnostic-format-json-
> stderr-1.c
> index 02f780bce10..bcfa92110f5 100644
> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-stderr-1.c
> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-stderr-1.c
> @@ -6,8 +6,7 @@
> #error message
>
> /* Use dg-regexp to consume the JSON output starting with
> - the innermost values, and working outwards.
> - We can't rely on any ordering of the keys. */
> + the innermost values, and working outwards. */
>
> /* { dg-regexp "\"kind\": \"error\"" } */
> /* { dg-regexp "\"column-origin\": 1" } */
next prev parent reply other threads:[~2023-03-24 14:43 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-17 20:53 David Malcolm
2023-03-24 14:12 ` David Malcolm [this message]
2023-03-24 15:27 ` PING: " Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=464928109702cd42ea5e1de4b03bc0f4d736297b.camel@redhat.com \
--to=dmalcolm@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).