public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Martin Sebor <msebor@gmail.com>
To: David Malcolm <dmalcolm@redhat.com>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 00/22] RFC: integrated 3rd-party static analysis support
Date: Sun, 06 Aug 2017 21:21:00 -0000	[thread overview]
Message-ID: <34cfe1ac-6136-fca7-4da8-de5f025769c3@gmail.com> (raw)
In-Reply-To: <1501884293-9047-1-git-send-email-dmalcolm@redhat.com>

On 08/04/2017 04:04 PM, David Malcolm wrote:
> This patch kit clearly isn't ready yet as-is (see e.g. the
> "known unknowns" below), but I'm posting it now in the hope of
> getting early feedback.
>
> Summary
> =======
>
> This patch kit provides an easy way to make integrate 3rd-party static
> analysis tools into gcc, and have them:
> (a) report through gcc's diagnostic subsystem, and
> (b) "watermark" the generated binaries with queryable data on what checkers
>     were run, and what the results were.

I couldn't agree more that static analysis is often done too late,
and that running it before code is checked in is ideal.  At my last
job, running static analysis was required as part of the commit
criteria for every change.  The tooling made sure that each change
was analyzed and defect-free before it was committed and would
reject changes that failed this test.  (A full run on the whole
product would happen on a much less frequent schedule.)

As for this particular project, I'm somewhat divided.  On the one
hand, implementing these kinds of enhancements is an opportunity
to clean up and generalize the existing (internal) design, and
expose latent bugs in the process.  On the other, adding all this
machinery complicates the already complex code base.  Unless it's
fully exercised as part of everyday GCC development it also runs
the risk of bit rotting and further increasing GCC maintenance.

That said, GCC having its own static analyzer (or some other such
tool) that were used as the default implementation with this
feature to fully exercise it during bootstrap would largely
obviate this concern.  It would still require maintenance but
at least it would be fully tested.

As for adding own implementation of JSON (and other components),
to minimize the maintenance costs, I would far prefer to introduce
a dependency on a well-tested and actively maintained third party
library.

Martin

>
> Here's an example showing gcc running a bank of 3rd-party checkers on this
> source file:
>
>   #include <stdlib.h>
>
>   void test ()
>   {
>     void *ptr_1;
>     void *ptr_2;
>
>     ptr_1 = malloc (64);
>     if (!ptr_1)
>       return;
>     ptr_2 = malloc (64);
>     if (!ptr_2)
>       return;
>
>     free (ptr_2);
>     free (ptr_1);
>   }
>
> via a simple command-line:
>
>   $ ./xgcc -B. -c conditional-leak.c -Wrun-analyzers=policy.json
>   conditional-leak.c:13:5: error: Potential leak of memory pointed to by 'ptr_1' [clang-analyzer:Memory leak]
>        return;
>        ^
>   conditional-leak.c:8:11: note: state 1 of 4: Memory is allocated
>      ptr_1 = malloc (64);
>              ^
>   conditional-leak.c:9:7: note: state 2 of 4: Assuming 'ptr_1' is non-null
>      if (!ptr_1)
>          ^
>   conditional-leak.c:12:7: note: state 3 of 4: Assuming 'ptr_2' is null
>      if (!ptr_2)
>          ^
>   conditional-leak.c:13:5: note: state 4 of 4: Potential leak of memory pointed to by 'ptr_1'
>        return;
>        ^
>   conditional-leak.c:13:0: error: Memory leak: ptr_1 [cppcheck:memleak]
>        return;
>
> Of the checkers, clang's static analyzer and cppcheck both identify the
> memory leak; the former also identifies the control flow (the other
> checkers didn't report anything).
>
> The idea is to provide a mechanism to make it easy for developers and
> projects to impose policy on what checkers should be run, and to gate
> the build if certain tests fail.
>
> In this case, the results are treated as hard errors and block the build,
> but policy could allow them to be warnings.
>
> Extensive metadata is captured about what checkers were run, and what
> they emitted, using the "Firehose" interchange format:
>
>   http://firehose.readthedocs.io/en/latest/index.html
>
> In the case where this doesn't block the build, this can be queried via a
>   contrib/get-static-analysis.py
> script, so e.g. you can verify that a setuid binary was indeed compiled
> using all the checkers that you expect it to be.
>
> This can also be used to embed data about the code into the watermark.
> For example, checkers/ianal.py embeds information about "Copyright"
> lines in the source code into the generated binaries, from where it
> can be queried (this example is intended as a proof-of-concept rather
> than as a real license-tracking solution...)
>
>
> Statement of the problem
> ========================
>
> Static analysis is IMHO done too late, if at all: static analysis tools are run
> as an optional extra, "on the side", rather than in developers' normal
> workflow, with some kind of "override the compiler and do extra work" hook,
> which may preclude running more than one analyzer at once.  Analysis results
> are reviewed (if at all) in some kind of on-the-side tool, rather than when the
> code is being edited, or patches being prepared.
>
> It would be better to have an easy way for developers to run analyzer(s)
> as they're doing development, as part of their edit-compile-test cycle
> - analysis problems are reported immediately, and can be acted on
> immediately (e.g. by treating some checker tests as being hard errors).
>
> It would also be good to have a way to run analyzer(s) when packages are
> built, with a variety of precanned policies for analyzers.  For example,
> setuid binaries and network-facing daemons could each be built with a
> higher strictness of checking.
>
> It would also be good to tag binaries with information on what analyzers
> were run, what options they were invoked with, etc.
> Potentially have "dump_file" information from optimization passes stored
> in the metadata also.   Have a tool to query all of this.
>
> This way a distribution can perform a query like:
>
>   "show me all setuid binaries that contain code that wasn't checked
>    with $CHECKER with $TEST set to be a hard error"
>
> Can/should we break the build if there are issues?
> Yes: but have a way to opt-in easily: if the tool is well-integrated with the
>     compiler: e.g.
>         -Wrun-analyzers=/usr/share/analyzers/userspace/network-facing-service
> then upstream developers and packagers can turn on the setting, and see what
> breaks, and fix it naturally within an compile-edit-test cycle
>
> This gives a relatively painless way to opt-in to increasing levels of
> strictness (e.g. by an upstream project, or by an individual developer).
>
> Does this slow the build down?
> Yes: but you can choose which analyzers run, and can choose to turn them off.
> It ought to parallelize well.  I believe users will prefer to turn them on,
> and have builders burn up the extra CPU cycles.
> This may make much more sense for binary distributions (e.g. Fedora, Debian)
> that it does for things like Gentoo.
>
> Example policy files/options might be:
>   -Wrun-analyzers=/usr/share/analyzers/userspace/network-facing-service
>   -Wrun-analyzers=/usr/share/analyzers/userspace/application
>   -Wrun-analyzers=/usr/share/analyzers/userspace/setuid-binary
>   -Wrun-analyzers=/usr/share/analyzers/userspace/default
>   -Wrun-analyzers=/usr/share/analyzers/kernel
>
> or whatnot.
>
> Idea is to provide mechanism, and for the distribution to decide on some
> standard policies.
>
> This may also allow us to sandbox a gcc plugin by running the plugin inside
> another cc1, for plugins that add warnings - if the plugin ICEs, then the main
> cc1 isn't affected (useful for doing mass rebuilds of code using an
> experimental plugin).
>
>
> Known unknowns
> ==============
>
> How does one suppress a specific false-positive site?
> Do we need a pragma for it?  (though pragmas ought to already affect some of
> the underlying checkers...)
>
> Do we really want .json for the policy format?
> If we're expecting users to edit this, we need great error messages,
> and probably support for comments.  Would YAML or somesuch be better?
> Or have them as individual command-line flags, and the policy files are
> "@" files for gcc.
>
> How to mark which checkers are appropriate for which languages?
>
> (etc; see also all the FIXMEs in the code...)
>
>
> Dependencies
> ============
>
> The "checkers" subdirectory uses Python 2 or 3, and has a few Python
> dependencies, including "firehose" and "gccinvocation".
>
>
> How it works
> ============
>
> If enabled, toplev.c starts each of the various checkers from separate
> threads from near the start of toplev.c, so that the checkers run in
> parallel with each other, and with the bulk of cc1.  Near the end of
> toplev.c it waits for each thread to finish, and reads the stdout,
> which is expected to be in Firehose JSON format.  This is then sent
> through the diagnostic subsystem.
>
> Each "checker" is a harness script, which "knows" how to invoke
> the particular 3rd-party tool, and coerce the output from the tool
> into the common JSON format.
>
> Some notes on the data model can be seen here:
>   http://firehose.readthedocs.io/en/latest/data-model.html
> (though that's expressed as Python objects and XML, rather than
> the JSON format).
>
>
> Successfully bootstrapped&regrtested the combination of the patches
> on x86_64-pc-linux-gnu (though the only testcases are selftest based
> unit-tests, rather than DejaGnu tests).
>
>
> Thoughts?
> Dave
>
>
> David Malcolm (22):
>   Expose assert_loceq outside of input.c; add ASSERT_LOCEQ
>   libcpp: add linemap_position_for_file_line_and_column
>   Add JSON implementation
>   Add firehose.h/cc
>   diagnostic.c/h: add support for external tools
>   Makefile.in: hack in -lpthread
>   Add minimal version of Nick Clifton's annobin code
>   Add GNU_BUILD_ATTRIBUTE_STATIC_ANALYSIS to annobin.h
>   Add selftest::read_file (..., FILE *, ...)
>   Add checkers.h/cc
>   Add checkers/test-sources
>   Add -Wrun-analyzers= to common.opt, toplev.c, and invoke.texi
>   Add checkers/checker.py
>   Add checkers/always_fails.py
>   Add checkers/clang_analyzer.py
>   Add checkers/coverity.py
>   Add checkers/cppcheck.py
>   Add checkers/flawfinder.py
>   Add checkers/ianal.py
>   Add checkers/splint.py
>   Add checkers/Makefile
>   Add contrib/get-static-analysis.py
>
>  checkers/ChangeLog                                 |    9 +
>  checkers/Makefile                                  |   23 +
>  checkers/always_fails.py                           |   57 +
>  checkers/checker.py                                |  367 ++++
>  checkers/clang_analyzer.py                         |  145 ++
>  checkers/coverity.py                               |  141 ++
>  checkers/cppcheck.py                               |  138 ++
>  checkers/flawfinder.py                             |  124 ++
>  checkers/ianal.py                                  |   79 +
>  checkers/splint.py                                 |   77 +
>  checkers/test-sources/conditional-leak.c           |   17 +
>  checkers/test-sources/cpychecker-demo.c            |  110 ++
>  checkers/test-sources/divide-by-zero.c             |    4 +
>  checkers/test-sources/harmless.c                   |    9 +
>  checkers/test-sources/multiple-1.c                 |    6 +
>  checkers/test-sources/multiple-2.c                 |    9 +
>  checkers/test-sources/out-of-bounds.c              |    6 +
>  checkers/test-sources/read-through-null.c          |    4 +
>  checkers/test-sources/return-of-stack-address.c    |    6 +
>  checkers/test-sources/unconditional-file-leak.c    |   10 +
>  contrib/get-static-analysis.py                     |   47 +
>  gcc/Makefile.in                                    |    7 +-
>  gcc/annobin.cc                                     |  185 ++
>  gcc/annobin.h                                      |   45 +
>  gcc/checkers.cc                                    |  736 ++++++++
>  gcc/checkers.h                                     |   26 +
>  gcc/common.opt                                     |    4 +
>  gcc/diagnostic-show-locus.c                        |   29 +-
>  gcc/diagnostic.c                                   |   85 +-
>  gcc/diagnostic.h                                   |    5 +
>  gcc/doc/invoke.texi                                |    8 +-
>  gcc/firehose.cc                                    |  709 ++++++++
>  gcc/firehose.h                                     |  199 ++
>  gcc/input.c                                        |   71 +-
>  gcc/json.cc                                        | 1914 ++++++++++++++++++++
>  gcc/json.h                                         |  214 +++
>  gcc/selftest-diagnostic.h                          |   62 +
>  gcc/selftest-input.h                               |   54 +
>  gcc/selftest-run-tests.c                           |    3 +
>  gcc/selftest.c                                     |   16 +-
>  gcc/selftest.h                                     |   10 +
>  .../checker-output/test-clang-analyzer.json        |  122 ++
>  .../selftests/checker-output/test-cppcheck.json    |   50 +
>  .../selftests/checker-output/test-failure.json     |   38 +
>  .../selftests/checker-policy/test-policy.json      |    7 +
>  gcc/toplev.c                                       |    9 +
>  libcpp/include/line-map.h                          |    9 +
>  libcpp/line-map.c                                  |   51 +
>  48 files changed, 6001 insertions(+), 55 deletions(-)
>  create mode 100644 checkers/ChangeLog
>  create mode 100644 checkers/Makefile
>  create mode 100755 checkers/always_fails.py
>  create mode 100755 checkers/checker.py
>  create mode 100755 checkers/clang_analyzer.py
>  create mode 100644 checkers/coverity.py
>  create mode 100755 checkers/cppcheck.py
>  create mode 100755 checkers/flawfinder.py
>  create mode 100755 checkers/ianal.py
>  create mode 100755 checkers/splint.py
>  create mode 100644 checkers/test-sources/conditional-leak.c
>  create mode 100644 checkers/test-sources/cpychecker-demo.c
>  create mode 100644 checkers/test-sources/divide-by-zero.c
>  create mode 100644 checkers/test-sources/harmless.c
>  create mode 100644 checkers/test-sources/multiple-1.c
>  create mode 100644 checkers/test-sources/multiple-2.c
>  create mode 100644 checkers/test-sources/out-of-bounds.c
>  create mode 100644 checkers/test-sources/read-through-null.c
>  create mode 100644 checkers/test-sources/return-of-stack-address.c
>  create mode 100644 checkers/test-sources/unconditional-file-leak.c
>  create mode 100644 contrib/get-static-analysis.py
>  create mode 100644 gcc/annobin.cc
>  create mode 100644 gcc/annobin.h
>  create mode 100644 gcc/checkers.cc
>  create mode 100644 gcc/checkers.h
>  create mode 100644 gcc/firehose.cc
>  create mode 100644 gcc/firehose.h
>  create mode 100644 gcc/json.cc
>  create mode 100644 gcc/json.h
>  create mode 100644 gcc/selftest-diagnostic.h
>  create mode 100644 gcc/selftest-input.h
>  create mode 100644 gcc/testsuite/selftests/checker-output/test-clang-analyzer.json
>  create mode 100644 gcc/testsuite/selftests/checker-output/test-cppcheck.json
>  create mode 100644 gcc/testsuite/selftests/checker-output/test-failure.json
>  create mode 100644 gcc/testsuite/selftests/checker-policy/test-policy.json
>

  parent reply	other threads:[~2017-08-06 21:21 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-04 21:30 David Malcolm
2017-08-04 21:30 ` [PATCH 01/22] Expose assert_loceq outside of input.c; add ASSERT_LOCEQ David Malcolm
2017-09-01 17:49   ` Jeff Law
2017-08-04 21:30 ` [PATCH 02/22] libcpp: add linemap_position_for_file_line_and_column David Malcolm
2017-09-01 17:50   ` Jeff Law
2017-08-04 21:30 ` [PATCH 03/22] Add JSON implementation David Malcolm
2017-09-01 17:56   ` Jeff Law
2017-08-04 21:36 ` [PATCH 10/22] Add checkers.h/cc David Malcolm
2017-08-04 21:36 ` [PATCH 08/22] Add GNU_BUILD_ATTRIBUTE_STATIC_ANALYSIS to annobin.h David Malcolm
2017-08-04 21:36 ` [PATCH 16/22] Add checkers/coverity.py David Malcolm
2017-08-04 21:36 ` [PATCH 19/22] Add checkers/ianal.py David Malcolm
2017-08-04 21:36 ` [PATCH 22/22] Add contrib/get-static-analysis.py David Malcolm
2017-08-04 21:36 ` [PATCH 09/22] Add selftest::read_file (..., FILE *, ...) David Malcolm
2017-08-04 21:36 ` [PATCH 17/22] Add checkers/cppcheck.py David Malcolm
2017-08-04 21:36 ` [PATCH 07/22] Add minimal version of Nick Clifton's annobin code David Malcolm
2017-09-01 18:17   ` Jeff Law
2017-08-04 21:37 ` [PATCH 11/22] Add checkers/test-sources David Malcolm
2017-08-04 21:37 ` [PATCH 15/22] Add checkers/clang_analyzer.py David Malcolm
2017-08-04 21:37 ` [PATCH 04/22] Add firehose.h/cc David Malcolm
2017-08-04 21:38 ` [PATCH 13/22] Add checkers/checker.py David Malcolm
2017-08-04 21:38 ` [PATCH 06/22] Makefile.in: hack in -lpthread David Malcolm
2017-09-01 18:13   ` Jeff Law
2017-08-04 21:38 ` [PATCH 12/22] Add -Wrun-analyzers= to common.opt, toplev.c, and invoke.texi David Malcolm
2017-08-04 21:38 ` [PATCH 14/22] Add checkers/always_fails.py David Malcolm
2017-08-04 21:38 ` [PATCH 21/22] Add checkers/Makefile David Malcolm
2017-08-04 21:39 ` [PATCH 20/22] Add checkers/splint.py David Malcolm
2017-08-04 21:39 ` [PATCH 05/22] diagnostic.c/h: add support for external tools David Malcolm
2017-09-01 18:18   ` Jeff Law
2017-08-04 21:39 ` [PATCH 18/22] Add checkers/flawfinder.py David Malcolm
2017-08-05  1:00 ` [PATCH 00/22] RFC: integrated 3rd-party static analysis support Eric Gallager
2017-08-08  0:23   ` David Malcolm
2017-08-06 21:21 ` Martin Sebor [this message]
2017-08-08 17:57 ` Richard Sandiford
2017-09-01 17:46 ` Jeff Law
2017-09-02  2:46   ` Trevor Saunders

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=34cfe1ac-6136-fca7-4da8-de5f025769c3@gmail.com \
    --to=msebor@gmail.com \
    --cc=dmalcolm@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).