public inbox for gcc-bugs@sourceware.org
* [Bug other/109163] New: SARIF (and other JSON) output files are non-deterministic
@ 2023-03-16 22:05 dmalcolm at gcc dot gnu.org
2023-03-16 22:10 ` [Bug other/109163] " dmalcolm at gcc dot gnu.org
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: dmalcolm at gcc dot gnu.org @ 2023-03-16 22:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109163
Bug ID: 109163
Summary: SARIF (and other JSON) output files are non-deterministic
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: dmalcolm at gcc dot gnu.org
Target Milestone: ---
gcc/json.cc's json::object uses a hash_map for tracking the key/value pairs,
and object::print iterates through them in arbitrary order, so the JSON files
we emit can vary from one run to the next, which makes it much harder to
compare them from run to run (see e.g. PR 105959).
It would probably be much more user-friendly to use an ordered_hash_map here to
preserve insertion order and thus have deterministic output. I don't know if
this would have a noticeable performance hit.
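To illustrate why key order matters when comparing output, here is a minimal C++ sketch. It is not GCC's code: `serialize` and `same_object` are hypothetical helpers, and the sample keys and values are made up. Two orderings of the same key/value pairs are equal as JSON objects, yet their serialized text differs, which is what a textual diff between two runs would see.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: key/value pairs in the order they will be emitted.
using pairs = std::vector<std::pair<std::string, std::string>>;

// Hypothetical helper (not part of gcc/json.cc): write the pairs as a
// JSON-like object in exactly the order given.  No string escaping is done.
std::string serialize(const pairs &kvs)
{
  std::string out = "{";
  bool first = true;
  for (const auto &kv : kvs)
    {
      if (!first)
        out += ", ";
      first = false;
      out += "\"" + kv.first + "\": \"" + kv.second + "\"";
    }
  return out + "}";
}

// Compare two pair lists as objects, i.e. ignoring the order of the keys.
bool same_object(const pairs &a, const pairs &b)
{
  return std::map<std::string, std::string>(a.begin(), a.end())
         == std::map<std::string, std::string>(b.begin(), b.end());
}
```

Given `run1 = {{"version", "2.1.0"}, {"$schema", "x"}}` and the same pairs in the opposite order as `run2`, `same_object(run1, run2)` is true while `serialize(run1)` and `serialize(run2)` are different strings.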
* [Bug other/109163] SARIF (and other JSON) output files are non-deterministic
From: dmalcolm at gcc dot gnu.org @ 2023-03-16 22:10 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109163
--- Comment #1 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
This would also help with one of the requests from a SARIF expert's review of
GCC's output:
https://github.com/oasis-tcs/sarif-spec/issues/531#issuecomment-1181191100
which is that the "version" property should occur first in the file.
It also might make testcases easier to write.
* [Bug other/109163] SARIF (and other JSON) output files are non-deterministic
From: dmalcolm at gcc dot gnu.org @ 2023-03-16 22:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109163
David Malcolm <dmalcolm at gcc dot gnu.org> changed:
What            |Removed                      |Added
----------------------------------------------------------------------------
Status          |UNCONFIRMED                  |ASSIGNED
Last reconfirmed|                             |2023-03-16
Assignee        |unassigned at gcc dot gnu.org|dmalcolm at gcc dot gnu.org
Ever confirmed  |0                            |1
--- Comment #2 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
I'm working on a fix for this.
* [Bug other/109163] SARIF (and other JSON) output files are non-deterministic
From: dmalcolm at gcc dot gnu.org @ 2023-03-17 20:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109163
David Malcolm <dmalcolm at gcc dot gnu.org> changed:
What    |Removed |Added
----------------------------------------------------------------------------
URL     |        |https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614165.html
Keywords|        |patch
Status  |ASSIGNED|WAITING
--- Comment #3 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
Patch posted for review:
[PATCH] json: preserve key-insertion order [PR109163]
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614165.html
* [Bug other/109163] SARIF (and other JSON) output files are non-deterministic
From: cvs-commit at gcc dot gnu.org @ 2023-03-24 15:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109163
--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by David Malcolm <dmalcolm@gcc.gnu.org>:
https://gcc.gnu.org/g:7f1e15f743357e037d7c4f6f6000863c26f3dfc3
commit r13-6851-g7f1e15f743357e037d7c4f6f6000863c26f3dfc3
Author: David Malcolm <dmalcolm@redhat.com>
Date: Fri Mar 24 11:38:14 2023 -0400
json: preserve key-insertion order [PR109163]
PR other/109163 notes that when we write out JSON files, we traverse
the keys within each object via hash_map iteration, and thus the
ordering is non-deterministic: it can vary arbitrarily from run to
run and between machines, making it harder for users to compare
results and determine if anything has "really" changed.
I'm running into this issue with SARIF output, but there are several
places where we're currently emitting JSON:
* -fsave-optimization-record emits SRCFILE.opt-record.json.gz
"This option is experimental and the format of the data within
the compressed JSON file is subject to change."; see
optinfo-emit-json.{h,cc}, dumpfile.cc, etc
* -fdiagnostics-format= with the various "sarif" and "json" options
* -fdump-analyzer-json is a developer option in the analyzer
* gcov has:
"-j, --json-format: Output JSON intermediate format into
.gcov.json.gz file"
This patch adds an auto_vec to class json::object to preserve
key-insertion order, and uses it when writing out objects. This
potentially slows down JSON output slightly, but I believe that this isn't
normally a bottleneck, and that the benefits to the user of
deterministic output are worth it.
I had first attempted to use ordered_hash_map.h for this, but ran into
impenetrable template errors, so this patch uses a simpler approach of
just adding an auto_vec to json::object.
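The approach described above can be sketched roughly as follows. This is a simplified stand-in, not the actual gcc/json.cc code: `std::unordered_map` and `std::vector` are assumptions standing in for GCC's hash_map and auto_vec, values are plain strings rather than json::value pointers, and the handling of a repeated key (keep its original position, replace the value) is an assumption too. Only the field name `m_keys` comes from the ChangeLog.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for json::object; no string escaping is performed.
class object
{
public:
  // Set KEY to VALUE.  A key is appended to m_keys only the first time it
  // is seen, so overwriting a value keeps the key's original position
  // (assumed behavior for this sketch).
  void set(const std::string &key, const std::string &value)
  {
    if (m_map.find(key) == m_map.end())
      m_keys.push_back(key);
    m_map[key] = value;
  }

  // Emit the pairs by walking m_keys, not the map, so the output order is
  // the insertion order rather than the map's arbitrary iteration order.
  std::string print() const
  {
    std::string out = "{";
    for (std::size_t i = 0; i < m_keys.size(); ++i)
      {
        if (i > 0)
          out += ", ";
        auto it = m_map.find(m_keys[i]);
        assert(it != m_map.end()); // every recorded key must be in the map
        out += "\"" + m_keys[i] + "\": \"" + it->second + "\"";
      }
    return out + "}";
  }

private:
  std::unordered_map<std::string, std::string> m_map; // key -> value lookup
  std::vector<std::string> m_keys;                    // keys, insertion order
};
```

With this layout, calling `set("version", ...)` before any other key means "version" is printed first on every run. The cost is one extra vector entry per key and a map lookup per key when printing.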
Testing showed a failure of diagnostic-format-json-5.c, which was using
a convoluted set of regexps to consume the output; I believe that this
was brittle, and was intermittently failing for some of the random
orderings of output. I rewrote these regexps to work with the expected
output order. The other such tests seem to pass with the
now-deterministic orderings.
gcc/ChangeLog:
PR other/109163
* json.cc: Update comments to indicate that we now preserve
insertion order of keys within objects.
(object::print): Traverse keys in insertion order.
(object::set): Preserve insertion order of keys.
(selftest::test_writing_objects): Add an additional key to verify
that we preserve insertion order.
* json.h (object::m_keys): New field.
gcc/testsuite/ChangeLog:
PR other/109163
* c-c++-common/diagnostic-format-json-1.c: Update comment.
* c-c++-common/diagnostic-format-json-2.c: Likewise.
* c-c++-common/diagnostic-format-json-3.c: Likewise.
* c-c++-common/diagnostic-format-json-4.c: Likewise.
* c-c++-common/diagnostic-format-json-5.c: Rewrite regexps.
* c-c++-common/diagnostic-format-json-stderr-1.c: Update comment.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
* [Bug other/109163] SARIF (and other JSON) output files are non-deterministic
From: dmalcolm at gcc dot gnu.org @ 2023-03-31 13:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109163
David Malcolm <dmalcolm at gcc dot gnu.org> changed:
What      |Removed|Added
----------------------------------------------------------------------------
Resolution|---    |FIXED
Status    |WAITING|RESOLVED
--- Comment #5 from David Malcolm <dmalcolm at gcc dot gnu.org> ---
Should be fixed by the above patch on trunk for GCC 13.