From: David Malcolm
To: gcc-patches@gcc.gnu.org
Cc: David Malcolm
Subject: [PATCH 0/3] Add diagram support to gcc diagnostics
Date: Wed, 31 May 2023 14:06:27 -0400
Message-Id: <20230531180630.3127108-1-dmalcolm@redhat.com>

Existing diagnostic text output in GCC has to be implemented by writing
sequentially to a pretty_printer instance.  This makes it hard to
implement some kinds of diagnostic output (see e.g.
diagnostic-show-locus.cc, which is reaching the limits of
maintainability).

I've posted various experimental patches over the years that add other
kinds of output to GCC, such as ASCII art:
- "rich vectorization hints":
  - https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg01576.html
- visualizations of -Wformat-overflow:
  - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77696 comment 9 onwards
  - https://gcc.gnu.org/legacy-ml/gcc-patches/2018-09/msg00771.html

This patch kit combines the above ideas.  It:
- adds more flexible ways to create diagnostic output:
  - a canvas class, which can be "painted" to via random-access (rather
    than sequentially), and then printed when the painting is complete.
    A formatted pretty_printer can be roundtripped to a canvas and back,
    preserving formatting data (colors and URLs)
  - a table class for 2D grid layout, supporting items that span
    multiple rows/columns
  - a widget class for organizing diagrams hierarchically and painting
    them to a canvas
- expands GCC's diagnostics subsystem so that diagnostics can have
  "text art" diagrams - think ASCII art, but potentially including some
  Unicode characters, such as box-drawing chars (by using the canvas
  class)
- uses this to implement visualizations of -Wanalyzer-out-of-bounds so
  that, where possible, it will emit a text art diagram visualizing the
  spatial relationship between
  (a) the memory region that the analyzer predicts would be accessed,
      versus
  (b) the range of memory that is valid to access
  - whether they overlap, are touching, are close or far apart; which
    one is before or after in memory, the relative sizes involved, the
    direction of the access (read vs write), and, in some cases, the
    values of data involved.

The new code is in a new "gcc/text-art" subdirectory and "text_art"
namespace.

Many examples of the visualizations can be seen in patch 3 of the kit;
here are two of them.  Given:

  int32_t arr[10];

  int32_t int_arr_read_element_before_start_far(void)
  {
    return arr[-100];
  }

it emits:

demo-1.c: In function ‘int_arr_read_element_before_start_far’:
demo-1.c:7:13: warning: buffer under-read [CWE-127] [-Wanalyzer-out-of-bounds]
    7 |   return arr[-100];
      |          ~~~^~~~~~
  ‘int_arr_read_element_before_start_far’: event 1
    |
    |    7 |   return arr[-100];
    |      |          ~~~^~~~~~
    |      |             |
    |      |             (1) out-of-bounds read from byte -400 till byte -397 but ‘arr’ starts at byte 0
    |
demo-1.c:7:13: note: valid subscripts for ‘arr’ are ‘[0]’ to ‘[9]’

  ┌───────────────────────────┐
  │read of ‘int32_t’ (4 bytes)│
  └───────────────────────────┘
                ^
                │
                │
  ┌───────────────────────────┐              ┌────────┬────────┬─────────┐
  │                           │              │  [0]   │  ...   │   [9]   │
  │    before valid range     │              ├────────┴────────┴─────────┤
  │                           │              │‘arr’ (type: ‘int32_t[10]’)│
  └───────────────────────────┘              └───────────────────────────┘
  ├─────────────┬─────────────┤├─────┬──────┤├─────────────┬─────────────┤
                │                    │                     │
   ╭────────────┴───────────╮   ╭────┴────╮        ╭───────┴──────╮
   │⚠️ under-read of 4 bytes│   │396 bytes│        │size: 40 bytes│
   ╰────────────────────────╯   ╰─────────╯        ╰──────────────╯

and given:

  #include <string.h>

  void
  test_non_ascii ()
  {
    char buf[5];
    strcpy (buf, "文字化け");
  }

it emits:

demo-2.c: In function ‘test_non_ascii’:
demo-2.c:7:3: warning: stack-based buffer overflow [CWE-121] [-Wanalyzer-out-of-bounds]
    7 |   strcpy (buf, "文字化け");
      |   ^~~~~~~~~~~~~~~~~~~~~~~~
  ‘test_non_ascii’: events 1-2
    |
    |    6 |   char buf[5];
    |      |        ^~~
    |      |        |
    |      |        (1) capacity: 5 bytes
    |    7 |   strcpy (buf, "文字化け");
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (2) out-of-bounds write from byte 5 till byte 12 but ‘buf’ ends at byte 5
    |
demo-2.c:7:3: note: write of 8 bytes to beyond the end of ‘buf’
    7 |   strcpy (buf, "文字化け");
      |   ^~~~~~~~~~~~~~~~~~~~~~~~
demo-2.c:7:3: note: valid subscripts for ‘buf’ are ‘[0]’ to ‘[4]’

  ┌─────┬─────┬─────┬────┬────┐┌────┬────┬────┬────┬────┬────┬────┬──────┐
  │ [0] │ [1] │ [2] │[3] │[4] ││[5] │[6] │[7] │[8] │[9] │[10]│[11]│ [12] │
  ├─────┼─────┼─────┼────┼────┤├────┼────┼────┼────┼────┼────┼────┼──────┤
  │0xe6 │0x96 │0x87 │0xe5│0xad││0x97│0xe5│0x8c│0x96│0xe3│0x81│0x91│ 0x00 │
  ├─────┴─────┴─────┼────┴────┴┴────┼────┴────┴────┼────┴────┴────┼──────┤
  │     U+6587      │    U+5b57     │    U+5316    │    U+3051    │U+0000│
  ├─────────────────┼───────────────┼──────────────┼──────────────┼──────┤
  │       文        │      字       │      化      │      け      │ NUL  │
  ├─────────────────┴───────────────┴──────────────┴──────────────┴──────┤
  │                  string literal (type: ‘char[13]’)                   │
  └──────────────────────────────────────────────────────────────────────┘
     │     │     │     │    │     │    │    │    │    │    │    │     │
     │     │     │     │    │     │    │    │    │    │    │    │     │
     v     v     v     v    v     v    v    v    v    v    v    v     v
  ┌─────┬────────────────┬────┐┌─────────────────────────────────────────┐
  │ [0] │      ...       │[4] ││                                         │
  ├─────┴────────────────┴────┤│            after valid range            │
  │  ‘buf’ (type: ‘char[5]’)  ││                                         │
  └───────────────────────────┘└─────────────────────────────────────────┘
  ├─────────────┬─────────────┤├────────────────────┬────────────────────┤
                │                                   │
       ╭────────┴────────╮              ╭───────────┴──────────╮
       │capacity: 5 bytes│              │⚠️ overflow of 8 bytes│
       ╰─────────────────╯              ╰──────────────────────╯

showing that the overflow occurs partway through the UTF-8 encoding of
the U+5b57 code point.

It doesn't show up in this email, but the above diagrams are colorized
to contrast the valid and invalid access ranges.

There are lots more examples in the test suites of patches 2 and 3,
including symbolic expressions.

I can self-approve most of this but:
- patch 1 touches the testsuite for handling newlines in multiline
  strings in DejaGnu tests
- patches 2 and 3 add string literals with non-ASCII characters, encoded
  in UTF-8, for use in selftests.  Is this OK?

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu, both with
gcc 4.8.5 and with gcc 10.3.1.
Lightly tested with valgrind.

OK for trunk?

David Malcolm (3):
  testsuite: move handle-multiline-outputs to before check for blank
    lines
  diagnostics: add support for "text art" diagrams
  analyzer: add text-art visualizations of out-of-bounds accesses
    [PR106626]

 contrib/unicode/gen-box-drawing-chars.py      |   94 +
 contrib/unicode/gen-combining-chars.py        |   75 +
 contrib/unicode/gen-printable-chars.py        |   77 +
 gcc/Makefile.in                               |   12 +-
 gcc/analyzer/access-diagram.cc                | 2405 +++++++++++++++++
 gcc/analyzer/access-diagram.h                 |  165 ++
 gcc/analyzer/analyzer.h                       |   30 +
 gcc/analyzer/analyzer.opt                     |   20 +
 gcc/analyzer/bounds-checking.cc               |  270 +-
 gcc/analyzer/diagnostic-manager.cc            |    2 +-
 gcc/analyzer/engine.cc                        |    4 +-
 gcc/analyzer/infinite-recursion.cc            |    2 +-
 gcc/analyzer/kf-analyzer.cc                   |    2 +-
 gcc/analyzer/kf.cc                            |    6 +-
 gcc/analyzer/pending-diagnostic.h             |    2 +-
 gcc/analyzer/region-model-manager.cc          |   32 +-
 gcc/analyzer/region-model-manager.h           |    2 +-
 gcc/analyzer/region-model.cc                  |   52 +-
 gcc/analyzer/region-model.h                   |    4 +
 gcc/analyzer/region.cc                        |  369 ++-
 gcc/analyzer/region.h                         |    1 +
 gcc/analyzer/sm-fd.cc                         |   14 +-
 gcc/analyzer/sm-file.cc                       |    4 +-
 gcc/analyzer/sm-malloc.cc                     |   20 +-
 gcc/analyzer/sm-pattern-test.cc               |    2 +-
 gcc/analyzer/sm-sensitive.cc                  |    3 +-
 gcc/analyzer/sm-signal.cc                     |    2 +-
 gcc/analyzer/sm-taint.cc                      |   16 +-
 gcc/analyzer/store.cc                         |   11 +-
 gcc/analyzer/store.h                          |    9 +
 gcc/analyzer/varargs.cc                       |    8 +-
 gcc/color-macros.h                            |   16 +
 gcc/common.opt                                |   23 +
 gcc/configure                                 |    2 +-
 gcc/configure.ac                              |    2 +-
 gcc/diagnostic-diagram.h                      |   51 +
 gcc/diagnostic-format-json.cc                 |   10 +
 gcc/diagnostic-format-sarif.cc                |  106 +-
 gcc/diagnostic-text-art.h                     |   49 +
 gcc/diagnostic.cc                             |   72 +
 gcc/diagnostic.h                              |   21 +
 gcc/doc/invoke.texi                           |   40 +-
 gcc/gcc.cc                                    |    6 +
 gcc/opts-common.cc                            |    1 +
 gcc/opts.cc                                   |    6 +
 gcc/pretty-print.cc                           |   29 +
 gcc/pretty-print.h                            |    1 +
 gcc/selftest-run-tests.cc                     |    3 +
 .../c-c++-common/Wlogical-not-parentheses-2.c |    2 +
 gcc/testsuite/gcc.dg/analyzer/data-model-1.c  |    4 +-
 .../analyzer/malloc-macro-inline-events.c     |    5 -
 .../analyzer/out-of-bounds-diagram-1-ascii.c  |   55 +
 .../analyzer/out-of-bounds-diagram-1-debug.c  |   40 +
 .../analyzer/out-of-bounds-diagram-1-emoji.c  |   55 +
 .../analyzer/out-of-bounds-diagram-1-json.c   |   13 +
 .../analyzer/out-of-bounds-diagram-1-sarif.c  |   24 +
 .../out-of-bounds-diagram-1-unicode.c         |   55 +
 .../analyzer/out-of-bounds-diagram-10.c       |   29 +
 .../analyzer/out-of-bounds-diagram-11.c       |   82 +
 .../analyzer/out-of-bounds-diagram-12.c       |   54 +
 .../analyzer/out-of-bounds-diagram-13.c       |   43 +
 .../analyzer/out-of-bounds-diagram-14.c       |  110 +
 .../analyzer/out-of-bounds-diagram-15.c       |   42 +
 .../gcc.dg/analyzer/out-of-bounds-diagram-2.c |   30 +
 .../gcc.dg/analyzer/out-of-bounds-diagram-3.c |   45 +
 .../gcc.dg/analyzer/out-of-bounds-diagram-4.c |   45 +
 .../analyzer/out-of-bounds-diagram-5-ascii.c  |   40 +
 .../out-of-bounds-diagram-5-unicode.c         |   42 +
 .../gcc.dg/analyzer/out-of-bounds-diagram-6.c |  125 +
 .../gcc.dg/analyzer/out-of-bounds-diagram-7.c |   36 +
 .../gcc.dg/analyzer/out-of-bounds-diagram-8.c |   34 +
 .../gcc.dg/analyzer/out-of-bounds-diagram-9.c |   42 +
 .../gcc.dg/analyzer/pattern-test-2.c          |    4 +-
 gcc/testsuite/gcc.dg/missing-header-fixit-5.c |   10 +-
 .../gcc.dg/plugin/analyzer_gil_plugin.c       |    6 +-
 .../diagnostic-test-text-art-ascii-bw.c       |   57 +
 .../diagnostic-test-text-art-ascii-color.c    |   58 +
 .../plugin/diagnostic-test-text-art-none.c    |    5 +
 .../diagnostic-test-text-art-unicode-bw.c     |   58 +
 .../diagnostic-test-text-art-unicode-color.c  |   59 +
 .../plugin/diagnostic_plugin_test_text_art.c  |  257 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp        |    6 +
 gcc/testsuite/lib/gcc-dg.exp                  |    5 +
 gcc/testsuite/lib/multiline.exp               |    7 +-
 gcc/testsuite/lib/prune.exp                   |    7 -
 gcc/text-art/box-drawing-chars.inc            |   18 +
 gcc/text-art/box-drawing.cc                   |   72 +
 gcc/text-art/box-drawing.h                    |   32 +
 gcc/text-art/canvas.cc                        |  437 +++
 gcc/text-art/canvas.h                         |   74 +
 gcc/text-art/ruler.cc                         |  723 +++++
 gcc/text-art/ruler.h                          |  125 +
 gcc/text-art/selftests.cc                     |   77 +
 gcc/text-art/selftests.h                      |   60 +
 gcc/text-art/style.cc                         |  632 +++++
 gcc/text-art/styled-string.cc                 | 1107 ++++++++
 gcc/text-art/table.cc                         | 1272 +++++++++
 gcc/text-art/table.h                          |  262 ++
 gcc/text-art/theme.cc                         |  183 ++
 gcc/text-art/theme.h                          |  123 +
 gcc/text-art/types.h                          |  504 ++++
 gcc/text-art/widget.cc                        |  275 ++
 gcc/text-art/widget.h                         |  246 ++
 libcpp/charset.cc                             |   89 +-
 libcpp/combining-chars.inc                    |   68 +
 libcpp/include/cpplib.h                       |    3 +
 libcpp/printable-chars.inc                    |  231 ++
 107 files changed, 12163 insertions(+), 194 deletions(-)
 create mode 100755 contrib/unicode/gen-box-drawing-chars.py
 create mode 100755 contrib/unicode/gen-combining-chars.py
 create mode 100755 contrib/unicode/gen-printable-chars.py
 create mode 100644 gcc/analyzer/access-diagram.cc
 create mode 100644 gcc/analyzer/access-diagram.h
 create mode 100644 gcc/diagnostic-diagram.h
 create mode 100644 gcc/diagnostic-text-art.h
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-ascii.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-debug.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-json.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-sarif.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-unicode.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-10.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-11.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-12.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-13.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-14.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-15.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-5-ascii.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-5-unicode.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-6.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-7.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-8.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-9.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-text-art-ascii-bw.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-text-art-ascii-color.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-text-art-none.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-text-art-unicode-bw.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-text-art-unicode-color.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_text_art.c
 create mode 100644 gcc/text-art/box-drawing-chars.inc
 create mode 100644 gcc/text-art/box-drawing.cc
 create mode 100644 gcc/text-art/box-drawing.h
 create mode 100644 gcc/text-art/canvas.cc
 create mode 100644 gcc/text-art/canvas.h
 create mode 100644 gcc/text-art/ruler.cc
 create mode 100644 gcc/text-art/ruler.h
 create mode 100644 gcc/text-art/selftests.cc
 create mode 100644 gcc/text-art/selftests.h
 create mode 100644 gcc/text-art/style.cc
 create mode 100644 gcc/text-art/styled-string.cc
 create mode 100644 gcc/text-art/table.cc
 create mode 100644 gcc/text-art/table.h
 create mode 100644 gcc/text-art/theme.cc
 create mode 100644 gcc/text-art/theme.h
 create mode 100644 gcc/text-art/types.h
 create mode 100644 gcc/text-art/widget.cc
 create mode 100644 gcc/text-art/widget.h
 create mode 100644 libcpp/combining-chars.inc
 create mode 100644 libcpp/printable-chars.inc

-- 
2.26.3