From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 100134 invoked by alias); 4 Aug 2017 21:30:57 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 99480 invoked by uid 89); 4 Aug 2017 21:30:40 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-6.9 required=5.0 tests=BAYES_00,GIT_PATCH_3,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=burn, policies, Near, Potential X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 04 Aug 2017 21:30:21 +0000 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 686AE14803 for ; Fri, 4 Aug 2017 21:30:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 686AE14803 Authentication-Results: ext-mx02.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx02.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=dmalcolm@redhat.com Received: from c64.redhat.com (ovpn-112-14.phx2.redhat.com [10.3.112.14]) by smtp.corp.redhat.com (Postfix) with ESMTP id 973EC61F20; Fri, 4 Aug 2017 21:30:03 +0000 (UTC) From: David Malcolm To: gcc-patches@gcc.gnu.org Cc: David Malcolm Subject: [PATCH 00/22] RFC: integrated 3rd-party static analysis support Date: Fri, 04 Aug 2017 21:30:00 -0000 Message-Id: <1501884293-9047-1-git-send-email-dmalcolm@redhat.com> X-IsSubscribed: yes X-SW-Source: 2017-08/txt/msg00416.txt.bz2 This patch kit clearly isn't ready yet as-is (see e.g. the "known unknowns" below), but I'm posting it now in the hope of getting early feedback. Summary ======= This patch kit provides an easy way to make integrate 3rd-party static analysis tools into gcc, and have them: (a) report through gcc's diagnostic subsystem, and (b) "watermark" the generated binaries with queryable data on what checkers were run, and what the results were. Here's an example showing gcc running a bank of 3rd-party checkers on this source file: #include void test () { void *ptr_1; void *ptr_2; ptr_1 = malloc (64); if (!ptr_1) return; ptr_2 = malloc (64); if (!ptr_2) return; free (ptr_2); free (ptr_1); } via a simple command-line: $ ./xgcc -B. -c conditional-leak.c -Wrun-analyzers=policy.json conditional-leak.c:13:5: error: Potential leak of memory pointed to by 'ptr_1' [clang-analyzer:Memory leak] return; ^ conditional-leak.c:8:11: note: state 1 of 4: Memory is allocated ptr_1 = malloc (64); ^ conditional-leak.c:9:7: note: state 2 of 4: Assuming 'ptr_1' is non-null if (!ptr_1) ^ conditional-leak.c:12:7: note: state 3 of 4: Assuming 'ptr_2' is null if (!ptr_2) ^ conditional-leak.c:13:5: note: state 4 of 4: Potential leak of memory pointed to by 'ptr_1' return; ^ conditional-leak.c:13:0: error: Memory leak: ptr_1 [cppcheck:memleak] return; Of the checkers, clang's static analyzer and cppcheck both identify the memory leak; the former also identifies the control flow (the other checkers didn't report anything). The idea is to provide a mechanism to make it easy for developers and projects to impose policy on what checkers should be run, and to gate the build if certain tests fail. In this case, the results are treated as hard errors and block the build, but policy could allow them to be warnings. Extensive metadata is captured about what checkers were run, and what they emitted, using the "Firehose" interchange format: http://firehose.readthedocs.io/en/latest/index.html In the case where this doesn't block the build, this can be queried via a contrib/get-static-analysis.py script, so e.g. you can verify that a setuid binary was indeed compiled using all the checkers that you expect it to be. This can also be used to embed data about the code into the watermark. For example, checkers/ianal.py embeds information about "Copyright" lines in the source code into the generated binaries, from where it can be queried (this example is intended as a proof-of-concept rather than as a real license-tracking solution...) Statement of the problem ======================== Static analysis is IMHO done too late, if at all: static analysis tools are run as an optional extra, "on the side", rather than in developers' normal workflow, with some kind of "override the compiler and do extra work" hook, which may preclude running more than one analyzer at once. Analysis results are reviewed (if at all) in some kind of on-the-side tool, rather than when the code is being edited, or patches being prepared. It would be better to have an easy way for developers to run analyzer(s) as they're doing development, as part of their edit-compile-test cycle - analysis problems are reported immediately, and can be acted on immediately (e.g. by treating some checker tests as being hard errors). It would also be good to have a way to run analyzer(s) when packages are built, with a variety of precanned policies for analyzers. For example, setuid binaries and network-facing daemons could each be built with a higher strictness of checking. It would also be good to tag binaries with information on what analyzers were run, what options they were invoked with, etc. Potentially have "dump_file" information from optimization passes stored in the metadata also. Have a tool to query all of this. This way a distribution can perform a query like: "show me all setuid binaries that contain code that wasn't checked with $CHECKER with $TEST set to be a hard error" Can/should we break the build if there are issues? Yes: but have a way to opt-in easily: if the tool is well-integrated with the compiler: e.g. -Wrun-analyzers=/usr/share/analyzers/userspace/network-facing-service then upstream developers and packagers can turn on the setting, and see what breaks, and fix it naturally within an compile-edit-test cycle This gives a relatively painless way to opt-in to increasing levels of strictness (e.g. by an upstream project, or by an individual developer). Does this slow the build down? Yes: but you can choose which analyzers run, and can choose to turn them off. It ought to parallelize well. I believe users will prefer to turn them on, and have builders burn up the extra CPU cycles. This may make much more sense for binary distributions (e.g. Fedora, Debian) that it does for things like Gentoo. Example policy files/options might be: -Wrun-analyzers=/usr/share/analyzers/userspace/network-facing-service -Wrun-analyzers=/usr/share/analyzers/userspace/application -Wrun-analyzers=/usr/share/analyzers/userspace/setuid-binary -Wrun-analyzers=/usr/share/analyzers/userspace/default -Wrun-analyzers=/usr/share/analyzers/kernel or whatnot. Idea is to provide mechanism, and for the distribution to decide on some standard policies. This may also allow us to sandbox a gcc plugin by running the plugin inside another cc1, for plugins that add warnings - if the plugin ICEs, then the main cc1 isn't affected (useful for doing mass rebuilds of code using an experimental plugin). Known unknowns ============== How does one suppress a specific false-positive site? Do we need a pragma for it? (though pragmas ought to already affect some of the underlying checkers...) Do we really want .json for the policy format? If we're expecting users to edit this, we need great error messages, and probably support for comments. Would YAML or somesuch be better? Or have them as individual command-line flags, and the policy files are "@" files for gcc. How to mark which checkers are appropriate for which languages? (etc; see also all the FIXMEs in the code...) Dependencies ============ The "checkers" subdirectory uses Python 2 or 3, and has a few Python dependencies, including "firehose" and "gccinvocation". How it works ============ If enabled, toplev.c starts each of the various checkers from separate threads from near the start of toplev.c, so that the checkers run in parallel with each other, and with the bulk of cc1. Near the end of toplev.c it waits for each thread to finish, and reads the stdout, which is expected to be in Firehose JSON format. This is then sent through the diagnostic subsystem. Each "checker" is a harness script, which "knows" how to invoke the particular 3rd-party tool, and coerce the output from the tool into the common JSON format. Some notes on the data model can be seen here: http://firehose.readthedocs.io/en/latest/data-model.html (though that's expressed as Python objects and XML, rather than the JSON format). Successfully bootstrapped®rtested the combination of the patches on x86_64-pc-linux-gnu (though the only testcases are selftest based unit-tests, rather than DejaGnu tests). Thoughts? Dave David Malcolm (22): Expose assert_loceq outside of input.c; add ASSERT_LOCEQ libcpp: add linemap_position_for_file_line_and_column Add JSON implementation Add firehose.h/cc diagnostic.c/h: add support for external tools Makefile.in: hack in -lpthread Add minimal version of Nick Clifton's annobin code Add GNU_BUILD_ATTRIBUTE_STATIC_ANALYSIS to annobin.h Add selftest::read_file (..., FILE *, ...) Add checkers.h/cc Add checkers/test-sources Add -Wrun-analyzers= to common.opt, toplev.c, and invoke.texi Add checkers/checker.py Add checkers/always_fails.py Add checkers/clang_analyzer.py Add checkers/coverity.py Add checkers/cppcheck.py Add checkers/flawfinder.py Add checkers/ianal.py Add checkers/splint.py Add checkers/Makefile Add contrib/get-static-analysis.py checkers/ChangeLog | 9 + checkers/Makefile | 23 + checkers/always_fails.py | 57 + checkers/checker.py | 367 ++++ checkers/clang_analyzer.py | 145 ++ checkers/coverity.py | 141 ++ checkers/cppcheck.py | 138 ++ checkers/flawfinder.py | 124 ++ checkers/ianal.py | 79 + checkers/splint.py | 77 + checkers/test-sources/conditional-leak.c | 17 + checkers/test-sources/cpychecker-demo.c | 110 ++ checkers/test-sources/divide-by-zero.c | 4 + checkers/test-sources/harmless.c | 9 + checkers/test-sources/multiple-1.c | 6 + checkers/test-sources/multiple-2.c | 9 + checkers/test-sources/out-of-bounds.c | 6 + checkers/test-sources/read-through-null.c | 4 + checkers/test-sources/return-of-stack-address.c | 6 + checkers/test-sources/unconditional-file-leak.c | 10 + contrib/get-static-analysis.py | 47 + gcc/Makefile.in | 7 +- gcc/annobin.cc | 185 ++ gcc/annobin.h | 45 + gcc/checkers.cc | 736 ++++++++ gcc/checkers.h | 26 + gcc/common.opt | 4 + gcc/diagnostic-show-locus.c | 29 +- gcc/diagnostic.c | 85 +- gcc/diagnostic.h | 5 + gcc/doc/invoke.texi | 8 +- gcc/firehose.cc | 709 ++++++++ gcc/firehose.h | 199 ++ gcc/input.c | 71 +- gcc/json.cc | 1914 ++++++++++++++++++++ gcc/json.h | 214 +++ gcc/selftest-diagnostic.h | 62 + gcc/selftest-input.h | 54 + gcc/selftest-run-tests.c | 3 + gcc/selftest.c | 16 +- gcc/selftest.h | 10 + .../checker-output/test-clang-analyzer.json | 122 ++ .../selftests/checker-output/test-cppcheck.json | 50 + .../selftests/checker-output/test-failure.json | 38 + .../selftests/checker-policy/test-policy.json | 7 + gcc/toplev.c | 9 + libcpp/include/line-map.h | 9 + libcpp/line-map.c | 51 + 48 files changed, 6001 insertions(+), 55 deletions(-) create mode 100644 checkers/ChangeLog create mode 100644 checkers/Makefile create mode 100755 checkers/always_fails.py create mode 100755 checkers/checker.py create mode 100755 checkers/clang_analyzer.py create mode 100644 checkers/coverity.py create mode 100755 checkers/cppcheck.py create mode 100755 checkers/flawfinder.py create mode 100755 checkers/ianal.py create mode 100755 checkers/splint.py create mode 100644 checkers/test-sources/conditional-leak.c create mode 100644 checkers/test-sources/cpychecker-demo.c create mode 100644 checkers/test-sources/divide-by-zero.c create mode 100644 checkers/test-sources/harmless.c create mode 100644 checkers/test-sources/multiple-1.c create mode 100644 checkers/test-sources/multiple-2.c create mode 100644 checkers/test-sources/out-of-bounds.c create mode 100644 checkers/test-sources/read-through-null.c create mode 100644 checkers/test-sources/return-of-stack-address.c create mode 100644 checkers/test-sources/unconditional-file-leak.c create mode 100644 contrib/get-static-analysis.py create mode 100644 gcc/annobin.cc create mode 100644 gcc/annobin.h create mode 100644 gcc/checkers.cc create mode 100644 gcc/checkers.h create mode 100644 gcc/firehose.cc create mode 100644 gcc/firehose.h create mode 100644 gcc/json.cc create mode 100644 gcc/json.h create mode 100644 gcc/selftest-diagnostic.h create mode 100644 gcc/selftest-input.h create mode 100644 gcc/testsuite/selftests/checker-output/test-clang-analyzer.json create mode 100644 gcc/testsuite/selftests/checker-output/test-cppcheck.json create mode 100644 gcc/testsuite/selftests/checker-output/test-failure.json create mode 100644 gcc/testsuite/selftests/checker-policy/test-policy.json -- 1.8.5.3