From: David Malcolm <dmalcolm@redhat.com>
To: gcc-patches@gcc.gnu.org
Cc: David Malcolm <dmalcolm@redhat.com>
Subject: [PATCH 0/3] Add diagram support to gcc diagnostics
Date: Wed, 31 May 2023 14:06:27 -0400
Message-ID: <20230531180630.3127108-1-dmalcolm@redhat.com>
Existing diagnostic text output in GCC has to be implemented by writing
sequentially to a pretty_printer instance. This makes it hard to
implement some kinds of diagnostic output (see e.g.
diagnostic-show-locus.cc, which is reaching the limits of
maintainability).
I've posted various experimental patches over the years that add other
kinds of output to GCC, such as ASCII art:
- "rich vectorization hints":
  - https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg01576.html
- visualizations of -Wformat-overflow:
  - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77696 comment 9 onwards
  - https://gcc.gnu.org/legacy-ml/gcc-patches/2018-09/msg00771.html
This patch kit combines the above ideas. It:
- adds more flexible ways to create diagnostic output:
  - a canvas class, which can be "painted" to via random-access (rather
    than sequentially), and then printed when the painting is complete.
    A formatted pretty_printer can be roundtripped to a canvas and back,
    preserving formatting data (colors and URLs)
  - a table class for 2D grid layout, supporting items that span multiple
    rows/columns
  - a widget class for organizing diagrams hierarchically and painting
    them to a canvas
- expands GCC's diagnostics subsystem so that diagnostics can have
  "text art" diagrams - think ASCII art, but potentially including some
  Unicode characters, such as box-drawing chars (by using the canvas
  class)
- uses this to implement visualizations of -Wanalyzer-out-of-bounds so
  that, where possible, it will emit a text art diagram visualizing the
  spatial relationship between (a) the memory region that the analyzer
  predicts would be accessed and (b) the range of memory that is valid
  to access: whether they overlap, touch, or are close or far apart;
  which one comes first in memory; the relative sizes involved; the
  direction of the access (read vs write); and, in some cases, the
  values of the data involved.
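To illustrate what "random-access" painting means here, the following is
a minimal sketch in C++. It is not the canvas class from the patch:
toy_canvas, paint, paint_text, to_string and demo_box are hypothetical
names made up for this illustration, and the sketch ignores styling,
URLs and wide characters entirely. It only shows that cells can be
written in any order before the result is serialized in one go.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy canvas for illustration only; this is NOT the
// text_art::canvas API added by the patch kit.
class toy_canvas
{
public:
  toy_canvas (int width, int height)
  : m_width (width), m_rows (height, std::string (width, ' '))
  {}

  // Paint one character at column X, row Y.  Calls may come in any order.
  void paint (int x, int y, char ch)
  {
    assert (x >= 0 && x < m_width);
    assert (y >= 0 && y < (int) m_rows.size ());
    m_rows[y][x] = ch;
  }

  // Paint TEXT left-to-right starting at column X, row Y.
  void paint_text (int x, int y, const std::string &text)
  {
    for (std::string::size_type i = 0; i < text.size (); i++)
      paint (x + (int) i, y, text[i]);
  }

  // Serialize the canvas once painting is complete, one line per row,
  // dropping trailing spaces.
  std::string to_string () const
  {
    std::string result;
    for (const std::string &row : m_rows)
      {
        std::string::size_type end = row.find_last_not_of (' ');
        if (end != std::string::npos)
          result += row.substr (0, end + 1);
        result += '\n';
      }
    return result;
  }

private:
  int m_width;
  std::vector<std::string> m_rows;
};

// Draw a small labelled box, deliberately painting the bottom edge
// first and the top edge last.
std::string demo_box ()
{
  toy_canvas canvas (7, 3);
  canvas.paint_text (0, 2, "+-----+");
  canvas.paint_text (1, 1, "arr");
  canvas.paint (0, 1, '|');
  canvas.paint (6, 1, '|');
  canvas.paint_text (0, 0, "+-----+");
  return canvas.to_string ();
}
```

Calling demo_box () yields a three-line box with "arr" inside it, even
though the rows were painted out of order; a purely sequential printer
would have had to emit the top edge first.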
The new code is in a new "gcc/text-art" subdirectory and "text_art"
namespace.
Many examples of the visualizations can be seen in patch 3 of the kit.
Here are two of them; given:
#include <stdint.h>

int32_t arr[10];

int32_t int_arr_read_element_before_start_far(void)
{
  return arr[-100];
}
it emits:
demo-1.c: In function ‘int_arr_read_element_before_start_far’:
demo-1.c:7:13: warning: buffer under-read [CWE-127] [-Wanalyzer-out-of-bounds]
    7 |   return arr[-100];
      |          ~~~^~~~~~
  ‘int_arr_read_element_before_start_far’: event 1
    |
    |    7 |   return arr[-100];
    |      |          ~~~^~~~~~
    |      |             |
    |      |             (1) out-of-bounds read from byte -400 till byte -397 but ‘arr’ starts at byte 0
    |
demo-1.c:7:13: note: valid subscripts for ‘arr’ are ‘[0]’ to ‘[9]’

  ┌───────────────────────────┐
  │read of ‘int32_t’ (4 bytes)│
  └───────────────────────────┘
                ^
                │
                │
  ┌───────────────────────────┐              ┌────────┬────────┬─────────┐
  │                           │              │  [0]   │  ...   │   [9]   │
  │    before valid range     │              ├────────┴────────┴─────────┤
  │                           │              │‘arr’ (type: ‘int32_t[10]’)│
  └───────────────────────────┘              └───────────────────────────┘
  ├─────────────┬─────────────┤├─────┬──────┤├─────────────┬─────────────┤
                │                    │                     │
   ╭────────────┴───────────╮   ╭────┴────╮        ╭───────┴──────╮
   │⚠️ under-read of 4 bytes│   │396 bytes│        │size: 40 bytes│
   ╰────────────────────────╯   ╰─────────╯        ╰──────────────╯

and given:
#include <string.h>

void
test_non_ascii ()
{
  char buf[5];
  strcpy (buf, "文字化け");
}
it emits:
demo-2.c: In function ‘test_non_ascii’:
demo-2.c:7:3: warning: stack-based buffer overflow [CWE-121] [-Wanalyzer-out-of-bounds]
    7 |   strcpy (buf, "文字化け");
      |   ^~~~~~~~~~~~~~~~~~~~~~~~
  ‘test_non_ascii’: events 1-2
    |
    |    6 |   char buf[5];
    |      |        ^~~
    |      |        |
    |      |        (1) capacity: 5 bytes
    |    7 |   strcpy (buf, "文字化け");
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (2) out-of-bounds write from byte 5 till byte 12 but ‘buf’ ends at byte 5
    |
demo-2.c:7:3: note: write of 8 bytes to beyond the end of ‘buf’
    7 |   strcpy (buf, "文字化け");
      |   ^~~~~~~~~~~~~~~~~~~~~~~~
demo-2.c:7:3: note: valid subscripts for ‘buf’ are ‘[0]’ to ‘[4]’

  ┌─────┬─────┬─────┬────┬────┐┌────┬────┬────┬────┬────┬────┬────┬──────┐
  │ [0] │ [1] │ [2] │[3] │[4] ││[5] │[6] │[7] │[8] │[9] │[10]│[11]│ [12] │
  ├─────┼─────┼─────┼────┼────┤├────┼────┼────┼────┼────┼────┼────┼──────┤
  │0xe6 │0x96 │0x87 │0xe5│0xad││0x97│0xe5│0x8c│0x96│0xe3│0x81│0x91│ 0x00 │
  ├─────┴─────┴─────┼────┴────┴┴────┼────┴────┴────┼────┴────┴────┼──────┤
  │     U+6587      │    U+5b57     │    U+5316    │    U+3051    │U+0000│
  ├─────────────────┼───────────────┼──────────────┼──────────────┼──────┤
  │       文        │      字       │      化      │      け      │ NUL  │
  ├─────────────────┴───────────────┴──────────────┴──────────────┴──────┤
  │                  string literal (type: ‘char[13]’)                   │
  └──────────────────────────────────────────────────────────────────────┘
     │     │     │    │    │     │    │    │    │    │    │    │     │
     │     │     │    │    │     │    │    │    │    │    │    │     │
     v     v     v    v    v     v    v    v    v    v    v    v     v
  ┌─────┬────────────────┬────┐┌─────────────────────────────────────────┐
  │ [0] │      ...       │[4] ││                                         │
  ├─────┴────────────────┴────┤│            after valid range            │
  │  ‘buf’ (type: ‘char[5]’)  ││                                         │
  └───────────────────────────┘└─────────────────────────────────────────┘
  ├─────────────┬─────────────┤├────────────────────┬────────────────────┤
                │                                   │
       ╭────────┴────────╮              ╭───────────┴──────────╮
       │capacity: 5 bytes│              │⚠️ overflow of 8 bytes│
       ╰─────────────────╯              ╰──────────────────────╯

showing that the overflow occurs partway through the UTF-8 encoding of
the U+5b57 code point.
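The "partway through" observation can be checked by hand from the byte
row of the diagram. As a rough sketch (not code from the patch; the
helper names is_continuation, decode_at and split_code_point are
hypothetical, and the input is assumed to be valid UTF-8), one can look
at the first byte past the buffer's capacity: if it is a continuation
byte, the buffer boundary falls inside a code point.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// True if BYTE is a UTF-8 continuation byte (bit pattern 10xxxxxx).
static bool is_continuation (unsigned char byte)
{
  return (byte & 0xC0) == 0x80;
}

// Decode the code point whose encoding starts at S[START].
// Assumes S is valid UTF-8; no error handling is attempted.
static uint32_t decode_at (const std::string &s,
                           std::string::size_type start)
{
  assert (start < s.size ());
  unsigned char lead = s[start];
  if (lead < 0x80)
    return lead;
  int extra;
  uint32_t cp;
  if ((lead & 0xE0) == 0xC0)
    { extra = 1; cp = lead & 0x1F; }
  else if ((lead & 0xF0) == 0xE0)
    { extra = 2; cp = lead & 0x0F; }
  else
    { extra = 3; cp = lead & 0x07; }
  for (int i = 1; i <= extra; i++)
    cp = (cp << 6) | ((unsigned char) s[start + i] & 0x3F);
  return cp;
}

// If a buffer holding CAPACITY bytes would end in the middle of the
// encoding of a code point of S, return that code point; otherwise
// return 0.
uint32_t split_code_point (const std::string &s,
                           std::string::size_type capacity)
{
  if (capacity >= s.size () || !is_continuation (s[capacity]))
    return 0;
  // Walk back from the first out-of-bounds byte to the lead byte.
  std::string::size_type start = capacity;
  while (start > 0 && is_continuation (s[start]))
    start--;
  return decode_at (s, start);
}
```

For the string in the example, with a capacity of 5, byte [5] is 0x97, a
continuation byte of the sequence 0xe5 0xad 0x97 that starts at byte [3],
so the function returns 0x5b57.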
It doesn't show up in this email, but the above diagrams are colorized
to contrast the valid and invalid access ranges.
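The byte figures in both diagrams follow from simple offset arithmetic.
The sketch below reproduces them; it is only an illustration under the
assumption of byte offsets measured from the start of the buffer, and
byte_range, accessed_bytes, gap_before_start and overflow_bytes are
hypothetical names, not functions from the analyzer.

```cpp
#include <cassert>
#include <cstdint>

// An inclusive range of byte offsets relative to the start of a buffer.
struct byte_range
{
  int64_t first;
  int64_t last;
};

// Bytes touched by accessing element INDEX of an array whose elements
// are ELEM_SIZE bytes each.
byte_range accessed_bytes (int64_t index, int64_t elem_size)
{
  assert (elem_size > 0);
  int64_t first = index * elem_size;
  return byte_range {first, first + elem_size - 1};
}

// Number of bytes lying strictly between an access that ends before
// byte 0 and the start of the buffer.
int64_t gap_before_start (const byte_range &access)
{
  assert (access.last < 0);
  return -access.last - 1;
}

// Number of bytes written past the end when WRITE_SIZE bytes are
// written to the start of a buffer of CAPACITY bytes.
int64_t overflow_bytes (int64_t write_size, int64_t capacity)
{
  return write_size > capacity ? write_size - capacity : 0;
}
```

With 4-byte elements, accessed_bytes (-100, 4) gives bytes -400 to -397,
and the gap before byte 0 is 396 bytes, matching the first diagram.
Writing the 13-byte string literal into the 5-byte buffer gives
overflow_bytes (13, 5) == 8, matching the second.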
There are lots more examples in the test suites of patches 2 and 3,
including symbolic expressions.
I can self-approve most of this, but:
- patch 1 touches the testsuite's handling of newlines in multiline
  strings in DejaGnu tests
- patches 2 and 3 add string literals containing non-ASCII characters,
  encoded in UTF-8, for use in selftests. Is this OK?
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu, both with
gcc 4.8.5 and with gcc 10.3.1.
Lightly tested with valgrind.
OK for trunk?
David Malcolm (3):
  testsuite: move handle-multiline-outputs to before check for blank
    lines
  diagnostics: add support for "text art" diagrams
  analyzer: add text-art visualizations of out-of-bounds accesses
    [PR106626]
contrib/unicode/gen-box-drawing-chars.py | 94 +
contrib/unicode/gen-combining-chars.py | 75 +
contrib/unicode/gen-printable-chars.py | 77 +
gcc/Makefile.in | 12 +-
gcc/analyzer/access-diagram.cc | 2405 +++++++++++++++++
gcc/analyzer/access-diagram.h | 165 ++
gcc/analyzer/analyzer.h | 30 +
gcc/analyzer/analyzer.opt | 20 +
gcc/analyzer/bounds-checking.cc | 270 +-
gcc/analyzer/diagnostic-manager.cc | 2 +-
gcc/analyzer/engine.cc | 4 +-
gcc/analyzer/infinite-recursion.cc | 2 +-
gcc/analyzer/kf-analyzer.cc | 2 +-
gcc/analyzer/kf.cc | 6 +-
gcc/analyzer/pending-diagnostic.h | 2 +-
gcc/analyzer/region-model-manager.cc | 32 +-
gcc/analyzer/region-model-manager.h | 2 +-
gcc/analyzer/region-model.cc | 52 +-
gcc/analyzer/region-model.h | 4 +
gcc/analyzer/region.cc | 369 ++-
gcc/analyzer/region.h | 1 +
gcc/analyzer/sm-fd.cc | 14 +-
gcc/analyzer/sm-file.cc | 4 +-
gcc/analyzer/sm-malloc.cc | 20 +-
gcc/analyzer/sm-pattern-test.cc | 2 +-
gcc/analyzer/sm-sensitive.cc | 3 +-
gcc/analyzer/sm-signal.cc | 2 +-
gcc/analyzer/sm-taint.cc | 16 +-
gcc/analyzer/store.cc | 11 +-
gcc/analyzer/store.h | 9 +
gcc/analyzer/varargs.cc | 8 +-
gcc/color-macros.h | 16 +
gcc/common.opt | 23 +
gcc/configure | 2 +-
gcc/configure.ac | 2 +-
gcc/diagnostic-diagram.h | 51 +
gcc/diagnostic-format-json.cc | 10 +
gcc/diagnostic-format-sarif.cc | 106 +-
gcc/diagnostic-text-art.h | 49 +
gcc/diagnostic.cc | 72 +
gcc/diagnostic.h | 21 +
gcc/doc/invoke.texi | 40 +-
gcc/gcc.cc | 6 +
gcc/opts-common.cc | 1 +
gcc/opts.cc | 6 +
gcc/pretty-print.cc | 29 +
gcc/pretty-print.h | 1 +
gcc/selftest-run-tests.cc | 3 +
.../c-c++-common/Wlogical-not-parentheses-2.c | 2 +
gcc/testsuite/gcc.dg/analyzer/data-model-1.c | 4 +-
.../analyzer/malloc-macro-inline-events.c | 5 -
.../analyzer/out-of-bounds-diagram-1-ascii.c | 55 +
.../analyzer/out-of-bounds-diagram-1-debug.c | 40 +
.../analyzer/out-of-bounds-diagram-1-emoji.c | 55 +
.../analyzer/out-of-bounds-diagram-1-json.c | 13 +
.../analyzer/out-of-bounds-diagram-1-sarif.c | 24 +
.../out-of-bounds-diagram-1-unicode.c | 55 +
.../analyzer/out-of-bounds-diagram-10.c | 29 +
.../analyzer/out-of-bounds-diagram-11.c | 82 +
.../analyzer/out-of-bounds-diagram-12.c | 54 +
.../analyzer/out-of-bounds-diagram-13.c | 43 +
.../analyzer/out-of-bounds-diagram-14.c | 110 +
.../analyzer/out-of-bounds-diagram-15.c | 42 +
.../gcc.dg/analyzer/out-of-bounds-diagram-2.c | 30 +
.../gcc.dg/analyzer/out-of-bounds-diagram-3.c | 45 +
.../gcc.dg/analyzer/out-of-bounds-diagram-4.c | 45 +
.../analyzer/out-of-bounds-diagram-5-ascii.c | 40 +
.../out-of-bounds-diagram-5-unicode.c | 42 +
.../gcc.dg/analyzer/out-of-bounds-diagram-6.c | 125 +
.../gcc.dg/analyzer/out-of-bounds-diagram-7.c | 36 +
.../gcc.dg/analyzer/out-of-bounds-diagram-8.c | 34 +
.../gcc.dg/analyzer/out-of-bounds-diagram-9.c | 42 +
.../gcc.dg/analyzer/pattern-test-2.c | 4 +-
gcc/testsuite/gcc.dg/missing-header-fixit-5.c | 10 +-
.../gcc.dg/plugin/analyzer_gil_plugin.c | 6 +-
.../diagnostic-test-text-art-ascii-bw.c | 57 +
.../diagnostic-test-text-art-ascii-color.c | 58 +
.../plugin/diagnostic-test-text-art-none.c | 5 +
.../diagnostic-test-text-art-unicode-bw.c | 58 +
.../diagnostic-test-text-art-unicode-color.c | 59 +
.../plugin/diagnostic_plugin_test_text_art.c | 257 ++
gcc/testsuite/gcc.dg/plugin/plugin.exp | 6 +
gcc/testsuite/lib/gcc-dg.exp | 5 +
gcc/testsuite/lib/multiline.exp | 7 +-
gcc/testsuite/lib/prune.exp | 7 -
gcc/text-art/box-drawing-chars.inc | 18 +
gcc/text-art/box-drawing.cc | 72 +
gcc/text-art/box-drawing.h | 32 +
gcc/text-art/canvas.cc | 437 +++
gcc/text-art/canvas.h | 74 +
gcc/text-art/ruler.cc | 723 +++++
gcc/text-art/ruler.h | 125 +
gcc/text-art/selftests.cc | 77 +
gcc/text-art/selftests.h | 60 +
gcc/text-art/style.cc | 632 +++++
gcc/text-art/styled-string.cc | 1107 ++++++++
gcc/text-art/table.cc | 1272 +++++++++
gcc/text-art/table.h | 262 ++
gcc/text-art/theme.cc | 183 ++
gcc/text-art/theme.h | 123 +
gcc/text-art/types.h | 504 ++++
gcc/text-art/widget.cc | 275 ++
gcc/text-art/widget.h | 246 ++
libcpp/charset.cc | 89 +-
libcpp/combining-chars.inc | 68 +
libcpp/include/cpplib.h | 3 +
libcpp/printable-chars.inc | 231 ++
107 files changed, 12163 insertions(+), 194 deletions(-)
create mode 100755 contrib/unicode/gen-box-drawing-chars.py
create mode 100755 contrib/unicode/gen-combining-chars.py
create mode 100755 contrib/unicode/gen-printable-chars.py
create mode 100644 gcc/analyzer/access-diagram.cc
create mode 100644 gcc/analyzer/access-diagram.h
create mode 100644 gcc/diagnostic-diagram.h
create mode 100644 gcc/diagnostic-text-art.h
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-ascii.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-debug.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-json.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-sarif.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-unicode.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-10.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-11.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-12.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-14.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-15.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-2.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-3.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-4.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-5-ascii.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-5-unicode.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-6.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-7.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-8.c
create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-9.c
create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-text-art-ascii-bw.c
create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-text-art-ascii-color.c
create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-text-art-none.c
create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-text-art-unicode-bw.c
create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-text-art-unicode-color.c
create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_text_art.c
create mode 100644 gcc/text-art/box-drawing-chars.inc
create mode 100644 gcc/text-art/box-drawing.cc
create mode 100644 gcc/text-art/box-drawing.h
create mode 100644 gcc/text-art/canvas.cc
create mode 100644 gcc/text-art/canvas.h
create mode 100644 gcc/text-art/ruler.cc
create mode 100644 gcc/text-art/ruler.h
create mode 100644 gcc/text-art/selftests.cc
create mode 100644 gcc/text-art/selftests.h
create mode 100644 gcc/text-art/style.cc
create mode 100644 gcc/text-art/styled-string.cc
create mode 100644 gcc/text-art/table.cc
create mode 100644 gcc/text-art/table.h
create mode 100644 gcc/text-art/theme.cc
create mode 100644 gcc/text-art/theme.h
create mode 100644 gcc/text-art/types.h
create mode 100644 gcc/text-art/widget.cc
create mode 100644 gcc/text-art/widget.h
create mode 100644 libcpp/combining-chars.inc
create mode 100644 libcpp/printable-chars.inc
--
2.26.3
Thread overview: 14+ messages
2023-05-31 18:06 David Malcolm [this message]
2023-05-31 18:06 ` [PATCH 1/3] testsuite: move handle-multiline-outputs to before check for blank lines David Malcolm
2023-06-12 23:11 ` PING: " David Malcolm
2023-06-20 17:21 ` PING^2: " David Malcolm
2023-06-21 16:24 ` Mike Stump
2023-05-31 18:06 ` [PATCH 2/3] diagnostics: add support for "text art" diagrams David Malcolm
2023-06-23 11:52 ` Alex Coplan
2023-06-23 14:36 ` [PATCH] text-art: remove explicit #include of C++ standard library headers David Malcolm
2023-06-23 15:35 ` Alex Coplan
2023-06-24 1:26 ` [pushed: v2] " David Malcolm
2023-05-31 18:06 ` [PATCH 3/3] analyzer: add text-art visualizations of out-of-bounds accesses [PR106626] David Malcolm
2023-06-30 14:40 ` Martin Jambor
2023-07-20 14:47 ` [committed] Document new analyzer parameters Martin Jambor
2023-07-20 14:59 ` David Malcolm