From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lj1-x236.google.com (mail-lj1-x236.google.com [IPv6:2a00:1450:4864:20::236]) by sourceware.org (Postfix) with ESMTPS id 1D50A3882174 for ; Fri, 24 Mar 2023 15:53:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1D50A3882174 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-lj1-x236.google.com with SMTP id q14so2196909ljm.11 for ; Fri, 24 Mar 2023 08:53:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1679673205; h=to:in-reply-to:cc:references:message-id:date:subject:mime-version :from:content-transfer-encoding:from:to:cc:subject:date:message-id :reply-to; bh=HlvYQx5nGXaQIAb4whaCXgivvlDPRR+jtK9Puyzr8IM=; b=S9C3fA0efWbs0BAbyhZqzqbynTiG6kaM4iatZkdRZBKpVsCvQta4XbDzGEOaJe5FZ8 hgk16nHtVmKMwZtMNkfM07Bz+HecP9cxdjL5i50CrKbzhOawtenkstK6hJBisvO1j2jN M3zTu0CGA/Q6yrV9ai4YcrxgcnPlo0PHbvBY3UK1zQunuJvK+hdWf1biRk280ZPWWDpx GfEJUodv7K8fYYjVHtKz8sMOO2E/O/R4YRJgH1uD5UIbcW+++UeAxL1502DUuMCdqPHP QlzE+G3ASRdI0XYcjHTKaAcyOrqHfPA70gKfXnAyXLEtOr+9zrPeCsjwhTRR73O47LKS 5v1g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679673205; h=to:in-reply-to:cc:references:message-id:date:subject:mime-version :from:content-transfer-encoding:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=HlvYQx5nGXaQIAb4whaCXgivvlDPRR+jtK9Puyzr8IM=; b=Oz2CXU2orthvhwIw6E3BpiJNhCfuE0cag/ODuz4/z5Ja1Y2N+Icg93AGdxvaqesslp h+tJOWZmhnuKfE70MDUklye/G//4VY49E6PY79Zwdz5jIaehrRi7sGd7JrnZxFep8ONE P9ULgmLZWPjwRrmISDkfwamwbkJXfXo2EYtJEzyNxzdniUw4uE0T2Sd96CB2IhMgv9Zq 1R4CAPST/oWaO6S6DlDZc1P2/y2wSbAqFlG5RghWvhiL6feCjSRwyEXs0ciQtoU3srKi A0To+3EQudFyvVZfQX/xFvnKmaIM3g3Zp7ybLwPAfH92De0oJh00LSq9XE8H7xkzridC Y/0A== X-Gm-Message-State: AAQBX9czbQLNwofBp0Z3vyYP/iYf1sWxZv6yVEEE42l7WuYiv1aId42R Y8eftvYKnPoAm19LF2UKyirqyjCChRk= X-Google-Smtp-Source: AKy350YsNhzXpaguALCfSFhW5oZnGcpjqewl6NnppTSUnFK2vMMl17vDdCgyEbZU2kDubx15gcs9SA== X-Received: by 2002:aa7:c78c:0:b0:4fc:3cd1:37c8 with SMTP id n12-20020aa7c78c000000b004fc3cd137c8mr3052846eds.22.1679671683335; Fri, 24 Mar 2023 08:28:03 -0700 (PDT) Received: from smtpclient.apple ([2a02:3037:401:660a:a0d3:92a4:4619:7d51]) by smtp.gmail.com with ESMTPSA id h5-20020a50c385000000b004f9e6495f94sm10992884edf.50.2023.03.24.08.28.02 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 24 Mar 2023 08:28:02 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable From: Richard Biener Mime-Version: 1.0 (1.0) Subject: Re: PING: Re: [PATCH] json: preserve key-insertion order [PR109163] Date: Fri, 24 Mar 2023 16:27:51 +0100 Message-Id: <4B6552FF-73A2-46D0-ADA3-7CB273FC7E91@gmail.com> References: <464928109702cd42ea5e1de4b03bc0f4d736297b.camel@redhat.com> Cc: gcc-patches@gcc.gnu.org In-Reply-To: <464928109702cd42ea5e1de4b03bc0f4d736297b.camel@redhat.com> To: David Malcolm X-Mailer: iPhone Mail (20D67) X-Spam-Status: No, score=-10.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: > Am 24.03.2023 um 15:44 schrieb David Malcolm via Gcc-patches : >=20 > =EF=BB=BFI'd like to ping the following patch for review: > https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614165.html Ok. Richard=20 > Thanks > Dave >=20 >> On Fri, 2023-03-17 at 16:53 -0400, David Malcolm wrote: >> PR other/109163 notes that when we write out JSON files, we traverse >> the keys within each object via hash_map iteration, and thus the >> ordering is non-deterministic - it can arbitrarily vary from run to >> run and from different machines, making it harder for users to >> compare >> results and determine if anything has "really" changed. >>=20 >> I'm running into this issue with SARIF output, but there are several >> places where we're currently emitting JSON: >>=20 >> * -fsave-optimization-record emits SRCFILE.opt-record.json.gz >> "This option is experimental and the format of the data >> within >> the compressed JSON file is subject to change."; see >> optinfo-emit-json.{h,cc}, dumpfile.cc, etc >> * -fdiagnostics-format=3D with the various "sarif" and "json" options >> * -fdump-analyzer-json is a developer option in the analyzer >> * gcov has: >> "-j, --json-format: Output JSON intermediate format into >> .gcov.json.gz file" >>=20 >> This patch adds an auto_vec to class json::object to preserve >> key-insertion order, and use it when writing out objects.=20 >> Potentially >> this slightly slows down JSON output, but I believe that this isn't >> normally a bottleneck, and that the benefits to the user of >> deterministic output are worth it. >>=20 >> I had first attempted to use ordered_hash_map.h for this, but ran >> into >> impenetrable template errors, so this patch uses a simpler approach >> of >> just adding an auto_vec to json::object. >>=20 >> Testing showed a failure of diagnostic-format-json-5.c, which was >> using >> a convoluted set of regexps to consume the output; I believe that >> this >> was brittle, and was intermittently failing for some of the random >> orderings of output. I rewrote these regexps to work with the >> expected >> output order. The other such tests seem to pass with the >> now-deterministic orderings. >>=20 >> Lightly tested with valgrind. >> I manually verified that the SARIF output is now deterministic. >> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. >>=20 >> OK for trunk? >>=20 >> gcc/ChangeLog: >> PR other/109163 >> * json.cc: Update comments to indicate that we now preserve >> insertion order of keys within objects. >> (object::print): Traverse keys in insertion order. >> (object::set): Preserve insertion order of keys. >> (selftest::test_writing_objects): Add an additional key to >> verify >> that we preserve insertion order. >> * json.h (object::m_keys): New field. >>=20 >> gcc/testsuite/ChangeLog: >> PR other/109163 >> * c-c++-common/diagnostic-format-json-1.c: Update comment. >> * c-c++-common/diagnostic-format-json-2.c: Likewise. >> * c-c++-common/diagnostic-format-json-3.c: Likewise. >> * c-c++-common/diagnostic-format-json-4.c: Likewise. >> * c-c++-common/diagnostic-format-json-5.c: Rewrite regexps. >> * c-c++-common/diagnostic-format-json-stderr-1.c: Update >> comment. >>=20 >> Signed-off-by: David Malcolm >> --- >> gcc/json.cc | 40 ++++--- >> gcc/json.h | 10 +- >> .../c-c++-common/diagnostic-format-json-1.c | 3 +- >> .../c-c++-common/diagnostic-format-json-2.c | 3 +- >> .../c-c++-common/diagnostic-format-json-3.c | 3 +- >> .../c-c++-common/diagnostic-format-json-4.c | 3 +- >> .../c-c++-common/diagnostic-format-json-5.c | 100 ++++++++++------ >> -- >> .../diagnostic-format-json-stderr-1.c | 3 +- >> 8 files changed, 95 insertions(+), 70 deletions(-) >>=20 >> diff --git a/gcc/json.cc b/gcc/json.cc >> index 01879c9c466..741e97b20e5 100644 >> --- a/gcc/json.cc >> +++ b/gcc/json.cc >> @@ -31,8 +31,11 @@ using namespace json; >> /* class json::value. */ >> =20 >> /* Dump this json::value tree to OUTF. >> - No formatting is done. There are no guarantees about the order >> - in which the key/value pairs of json::objects are printed. */ >> + >> + No formatting is done. >> + >> + The key/value pairs of json::objects are printed in the order >> + in which the keys were originally inserted. */ >> =20 >> void >> value::dump (FILE *outf) const >> @@ -44,7 +47,7 @@ value::dump (FILE *outf) const >> } >> =20 >> /* class json::object, a subclass of json::value, representing >> - an unordered collection of key/value pairs. */ >> + an ordered collection of key/value pairs. */ >> =20 >> /* json:object's dtor. */ >> =20 >> @@ -62,14 +65,17 @@ object::~object () >> void >> object::print (pretty_printer *pp) const >> { >> - /* Note that the order is not guaranteed. */ >> pp_character (pp, '{'); >> - for (map_t::iterator it =3D m_map.begin (); it !=3D m_map.end (); >> ++it) >> + >> + /* Iterate in the order that the keys were inserted. */ >> + unsigned i; >> + const char *key; >> + FOR_EACH_VEC_ELT (m_keys, i, key) >> { >> - if (it !=3D m_map.begin ()) >> + if (i > 0) >> pp_string (pp, ", "); >> - const char *key =3D const_cast ((*it).first); >> - value *value =3D (*it).second; >> + map_t &mut_map =3D const_cast (m_map); >> + value *value =3D *mut_map.get (key); >> pp_doublequote (pp); >> pp_string (pp, key); // FIXME: escaping? >> pp_doublequote (pp); >> @@ -97,9 +103,13 @@ object::set (const char *key, value *v) >> *ptr =3D v; >> } >> else >> - /* If the key wasn't already present, take a copy of the key, >> - and store the value. */ >> - m_map.put (xstrdup (key), v); >> + { >> + /* If the key wasn't already present, take a copy of the key, >> + and store the value. */ >> + char *owned_key =3D xstrdup (key); >> + m_map.put (owned_key, v); >> + m_keys.safe_push (owned_key); >> + } >> } >> =20 >> /* Get the json::value * for KEY. >> @@ -295,15 +305,17 @@ test_object_get () >> ASSERT_EQ (obj.get ("not-present"), NULL); >> } >> =20 >> -/* Verify that JSON objects are written correctly. We can't test >> more than >> - one key/value pair, as we don't impose a guaranteed ordering. */ >> +/* Verify that JSON objects are written correctly. */ >> =20 >> static void >> test_writing_objects () >> { >> object obj; >> obj.set ("foo", new json::string ("bar")); >> - assert_print_eq (obj, "{\"foo\": \"bar\"}"); >> + obj.set ("baz", new json::string ("quux")); >> + /* This test relies on json::object writing out key/value pairs >> + in key-insertion order. */ >> + assert_print_eq (obj, "{\"foo\": \"bar\", \"baz\": \"quux\"}"); >> } >> =20 >> /* Verify that JSON arrays are written correctly. */ >> diff --git a/gcc/json.h b/gcc/json.h >> index aa52ba2951a..057119db277 100644 >> --- a/gcc/json.h >> +++ b/gcc/json.h >> @@ -82,8 +82,11 @@ class value >> void dump (FILE *) const; >> }; >> =20 >> -/* Subclass of value for objects: an unordered collection of >> - key/value pairs. */ >> +/* Subclass of value for objects: a collection of key/value pairs >> + preserving the ordering in which keys were inserted. >> + >> + Preserving the order eliminates non-determinism in the output, >> + making it easier for the user to compare repeated invocations.=20 >> */ >> =20 >> class object : public value >> { >> @@ -100,6 +103,9 @@ class object : public value >> typedef hash_map > simple_hashmap_traits > map_t; >> map_t m_map; >> + >> + /* Keep track of order in which keys were inserted. */ >> + auto_vec m_keys; >> }; >> =20 >> /* Subclass of value for arrays. */ >> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c >> b/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c >> index af57eb636d5..6bab30e3e6c 100644 >> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c >> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-1.c >> @@ -4,8 +4,7 @@ >> #error message >> =20 >> /* Use dg-regexp to consume the JSON output starting with >> - the innermost values, and working outwards. >> - We can't rely on any ordering of the keys. */ >> + the innermost values, and working outwards. */ >> =20 >> /* { dg-regexp "\"kind\": \"error\"" } */ >> /* { dg-regexp "\"column-origin\": 1" } */ >> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c >> b/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c >> index edb802efb8d..3c12103c9f8 100644 >> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c >> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-2.c >> @@ -4,8 +4,7 @@ >> #warning message >> =20 >> /* Use dg-regexp to consume the JSON output starting with >> - the innermost values, and working outwards. >> - We can't rely on any ordering of the keys. */ >> + the innermost values, and working outwards. */ >> =20 >> /* { dg-regexp "\"kind\": \"warning\"" } */ >> /* { dg-regexp "\"column-origin\": 1" } */ >> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c >> b/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c >> index bb7b8dc5d16..11d74624ff1 100644 >> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c >> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-3.c >> @@ -4,8 +4,7 @@ >> #warning message >> =20 >> /* Use dg-regexp to consume the JSON output starting with >> - the innermost values, and working outwards. >> - We can't rely on any ordering of the keys. */ >> + the innermost values, and working outwards. */ >> =20 >> /* { dg-regexp "\"kind\": \"error\"" } */ >> /* { dg-regexp "\"column-origin\": 1" } */ >> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c >> b/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c >> index 8ac90723cbd..cec1cf924b4 100644 >> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c >> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-4.c >> @@ -10,8 +10,7 @@ int test (void) >> } >> =20 >> /* Use dg-regexp to consume the JSON output starting with >> - the innermost values, and working outwards. >> - We can't rely on any ordering of the keys. */ >> + the innermost values, and working outwards. */ >> =20 >> /* Verify nested diagnostics. */ >> =20 >> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c >> b/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c >> index 8d2eb0c5089..86f8c5fb374 100644 >> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c >> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-5.c >> @@ -8,49 +8,61 @@ int test (struct s *ptr) >> return ptr->colour; >> } >> =20 >> -/* Use dg-regexp to consume the JSON output starting with >> - the innermost values, and working outwards. >> - We can't rely on any ordering of the keys. */ >> +/* Verify fix-it hints. >> =20 >> -/* { dg-regexp "\"kind\": \"error\"" } */ >> -/* { dg-regexp "\"column-origin\": 1" } */ >> -/* { dg-regexp "\"escape-source\": false" } */ >> -/* { dg-regexp "\"message\": \".*\"" } */ >> + Use dg-regexp to consume the JSON output from start to >> + finish, relying on the ordering of the keys. >> + The following uses indentation to visualize the structure >> + of the JSON (although the actual output is all on one line). >> =20 >> -/* Verify fix-it hints. */ >> - >> -/* { dg-regexp "\"string\": \"color\"" } */ >> - >> -/* { dg-regexp "\"start\": \{" } */ >> -/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json- >> 5.c\"" } */ >> -/* { dg-regexp "\"line\": 8" } */ >> -/* { dg-regexp "\"column\": 15" } */ >> -/* { dg-regexp "\"display-column\": 15" } */ >> -/* { dg-regexp "\"byte-column\": 15" } */ >> - >> -/* { dg-regexp "\"next\": \{" } */ >> -/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json- >> 5.c\"" } */ >> -/* { dg-regexp "\"line\": 8" } */ >> -/* { dg-regexp "\"column\": 21" } */ >> -/* { dg-regexp "\"display-column\": 21" } */ >> -/* { dg-regexp "\"byte-column\": 21" } */ >> - >> -/* { dg-regexp "\"fixits\": \[\[\{\}, \]*\]" } */ >> - >> -/* { dg-regexp "\"caret\": \{" } */ >> -/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json- >> 5.c\"" } */ >> -/* { dg-regexp "\"line\": 8" } */ >> -/* { dg-regexp "\"column\": 15" } */ >> -/* { dg-regexp "\"display-column\": 15" } */ >> -/* { dg-regexp "\"byte-column\": 15" } */ >> - >> -/* { dg-regexp "\"finish\": \{" } */ >> -/* { dg-regexp "\"file\": \"\[^\n\r\"\]*diagnostic-format-json- >> 5.c\"" } */ >> -/* { dg-regexp "\"line\": 8" } */ >> -/* { dg-regexp "\"column\": 20" } */ >> -/* { dg-regexp "\"display-column\": 20" } */ >> -/* { dg-regexp "\"byte-column\": 20" } */ >> - >> -/* { dg-regexp "\"locations\": \[\[\{\}, \]*\]" } */ >> -/* { dg-regexp "\"children\": \[\[\]\[\]\]" } */ >> -/* { dg-regexp "\[\[\{\}, \]*\]" } */ >> + { dg-regexp {\[} } >> + { dg-regexp {\{} } >> + { dg-regexp {"kind": "error"} } >> + { dg-regexp {, "message": "'struct s' has no member named >> 'colour'; did you mean 'color'\?"} } >> + { dg-regexp {, "children": \[\]} } >> + { dg-regexp {, "column-origin": 1} } >> + { dg-regexp {, "locations": } } >> + { dg-regexp {\[} } >> + { dg-regexp {\{} } >> + { dg-regexp {"caret": } } >> + { dg-regexp {\{} } >> + { dg-regexp {"file": "[^\n\r"]*diagnostic-format- >> json-5.c"} } >> + { dg-regexp {, "line": 8} } >> + { dg-regexp {, "display-column": 15} } >> + { dg-regexp {, "byte-column": 15} } >> + { dg-regexp {, "column": 15} } >> + { dg-regexp {\}} } >> + { dg-regexp {, "finish": } } >> + { dg-regexp {\{} } >> + { dg-regexp {"file": "[^\n\r"]*diagnostic-format- >> json-5.c"} } >> + { dg-regexp {, "line": 8} } >> + { dg-regexp {, "display-column": 20} } >> + { dg-regexp {, "byte-column": 20} } >> + { dg-regexp {, "column": 20} } >> + { dg-regexp {\}} } >> + { dg-regexp {\}} } >> + { dg-regexp {\]} } >> + { dg-regexp {, "fixits": } } >> + { dg-regexp {\[} } >> + { dg-regexp {\{} } >> + { dg-regexp {"start": } } >> + { dg-regexp {\{} } >> + { dg-regexp {"file": "[^\n\r"]*diagnostic-format- >> json-5.c"} } >> + { dg-regexp {, "line": 8} } >> + { dg-regexp {, "display-column": 15} } >> + { dg-regexp {, "byte-column": 15} } >> + { dg-regexp {, "column": 15} } >> + { dg-regexp {\}} } >> + { dg-regexp {, "next": } } >> + { dg-regexp {\{} } >> + { dg-regexp {"file": "[^\n\r"]*diagnostic-format- >> json-5.c"} } >> + { dg-regexp {, "line": 8} } >> + { dg-regexp {, "display-column": 21} } >> + { dg-regexp {, "byte-column": 21} } >> + { dg-regexp {, "column": 21} } >> + { dg-regexp {\}} } >> + { dg-regexp {, "string": "color"} } >> + { dg-regexp {\}} } >> + { dg-regexp {\]} } >> + { dg-regexp {, "escape-source": false\}} } >> + { dg-regexp {\]} } */ >> diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-json- >> stderr-1.c b/gcc/testsuite/c-c++-common/diagnostic-format-json- >> stderr-1.c >> index 02f780bce10..bcfa92110f5 100644 >> --- a/gcc/testsuite/c-c++-common/diagnostic-format-json-stderr-1.c >> +++ b/gcc/testsuite/c-c++-common/diagnostic-format-json-stderr-1.c >> @@ -6,8 +6,7 @@ >> #error message >> =20 >> /* Use dg-regexp to consume the JSON output starting with >> - the innermost values, and working outwards. >> - We can't rely on any ordering of the keys. */ >> + the innermost values, and working outwards. */ >> =20 >> /* { dg-regexp "\"kind\": \"error\"" } */ >> /* { dg-regexp "\"column-origin\": 1" } */ >=20