public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: David Malcolm <dmalcolm@redhat.com>
To: Thomas Schwinge <thomas@codesourcery.com>
Cc: gcc-patches@gcc.gnu.org, Markus Schordan <schordan1@llnl.gov>
Subject: Re: [committed] diagnostics: add SARIF output format
Date: Tue, 07 Jun 2022 17:22:38 -0400	[thread overview]
Message-ID: <a0cbe9e27a267128b08947e4ae3772210adc8fea.camel@redhat.com> (raw)
In-Reply-To: <87ilpcqzlc.fsf@euler.schwinge.homeip.net>

On Tue, 2022-06-07 at 17:10 +0200, Thomas Schwinge wrote:
> Hi David, and Markus!
> 
> On 2022-06-02T15:46:20-0400, David Malcolm via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
> > This patch adds support to gcc's diagnostic subsystem for emitting
> > diagnostics in SARIF, aka the Static Analysis Results Interchange
> > Format:
> >   https://sarifweb.azurewebsites.net/
> > by extending -fdiagnostics-format= to add two new options:
> >   -fdiagnostics-format=sarif-stderr
> > and:
> >   -fdiagnostics-format=sarif-file
> > 
> > The patch targets SARIF v2.1.0
> 
> Now that's "funny": on that very day that you pushed to GCC
> "diagnostics: add SARIF output format", I'd been attending at ISC 2022
> the "Compiler-assisted Correctness Checking and Performance
> Optimization
> for HPC" (C3PO) workshop, <
> https://c3po-workshop.github.io/2022/program>,
> where in his interesting keynote "On the Benefits of Software
> Verification Competitions for HPC", Markus Schordan (in CC just for
> your
> information) had a number of generally positive :-) mentions of GCC's
> Static Analyzer -- just also did comment that it doesn't support the
> standard SARIF output format.  Seems that issue is now resolved.  :-)

Thanks for the heads-up!

> 
> He generally also covered other fundamental aspects, such as the
> difference between "sound" vs. "complete" analysis.  See
> < 
> http://www.pl-enthusiast.net/2017/10/23/what-is-soundness-in-static-analysis/
> >
> "What is soundness (in static analysis)?", or
> < 
> https://bertrandmeyer.com/2019/04/21/soundness-completeness-precision/>
> "Soundness and completeness: with precision", for example.  As I
> remember, it was stated that it's unclear which one GCC's Static
> Analyzer
> strives for; may want to clarify that, in the manual:
> <https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html>, I
> suppose?

Thanks.  I'd thought I'd mentioned it in the official docs, but I think
I've only ever said it in presentations: GCC's -fanalyzer is *neither*
sound nor complete: it can fail to find real problems, and it can
report false positives.

Specifically, the analyzer's program_state classes contain various
approximations, and further, there are various places in the
representation of program state where the analyzer merges "sufficiently
similar" states at a program point, which can lead to false positives. 

Also, the analyzer tracks state in a very coarse-grained way, which can
lead to "state explosions", and so I have various heuristics where the
analysis simply gives up if it's seen too many unmergable states
(either at a given program point, or globally), leading to failing to
fully explore the software-under-test.  Also, the way I check
feasibility of constraints along execution paths is another
approximation, which can falsely reject a path as infeasible (leading
to false negatives).  Plus there are bugs...

I tend to think of -fanalyzer as a family of more expensive GCC
warnings, rather than a formal verification tool; I try to run it on
real-world C code, and try to ensure it generates helpful results with
a decent "signal:noise ratio", but it's not going to be suitable for
formally proving the absence of undefined behaviors.

> 
> 
> Plus, probably a few more things relevant for GCC's Static Analyzer,
> that
> I don't currently remember; I didn't take notes.  Maybe Markus is
> going
> to upload his presentation on <https://people.llnl.gov/schordan1>, or
> would like to make it available to you in another way?

The talk sounds very interesting, so yes, I'm keen in seeing it if if
was recorded (or are there slides?).

> 
> 
> Note that I'm really just relaying information here, but other than
> general interest, I'm myself not too familiar with the details of
> Static
> Analysis.  Just thought that you would appreciate hearing about GCC's
> Static Analyzer "spotted in the wild".


Thanks!
Dave

> 
> 
> Grüße
>  Thomas
> 
> 
> > This is a JSON-based format suited for capturing the results of
> > static
> > analysis tools (like GCC's -fanalyzer), but it can also be used for
> > plain
> > GCC warnings and errors.
> > 
> > SARIF supports per-event metadata in diagnostic paths such as
> > ["acquire", "resource"] and ["release", "lock"] (specifically, the
> > threadFlowLocation "kinds" property: SARIF v2.1.0 section 3.38.8),
> > so
> > the patch extends GCC"s diagnostic_event subclass with a "struct
> > meaning"
> > with similar purpose.  The patch implements this for -fanalyzer so
> > that
> > the various state-machine-based warnings set these in the SARIF
> > output.
> > 
> > The heart of the implementation is in the new file
> > diagnostic-format-sarif.cc.  Much of the rest of the patch is
> > interface
> > classes, isolating the diagnostic subsystem (which has no knowledge
> > of
> > e.g. tree or langhook) from the "client" code in the compiler
> > proper
> > cc1 etc).
> > 
> > The patch adds a langhook for specifying the SARIF v2.1.0
> > "artifact.sourceLanguage" property, based on the list in
> > SARIF v2.1.0 Appendix J.
> > 
> > The patch adds automated DejaGnu tests to our testsuite via new
> > scan-sarif-file and scan-sarif-file-not directives (although these
> > merely use regexps, rather than attempting to use a proper JSON
> > parser).
> > 
> > I've tested the patch by hand using the validator at:
> >   https://sarifweb.azurewebsites.net/Validation
> > and the react-based viewer at:
> >   https://microsoft.github.io/sarif-web-component/
> > which successfully shows most of the information (although not
> > paths,
> > and not CWE IDs), and I've fixed all validation errors I've seen
> > (though
> > bugs no doubt remain).
> > 
> > I've also tested the generated SARIF using the VS Code extension
> > linked
> > to from the SARIF website; I'm a novice with VS Code, but it seems
> > to be
> > able to handle my generated SARIF files (e.g. showing the data in
> > the
> > SARIF tab, and showing squiggly underlines under issues, and when I
> > click on them, it visualizes the events in the path inline within
> > the
> > source window).
> > 
> > Has anyone written an Emacs mode for SARIF files? (pretty please)
> > 
> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > Pushed to trunk as r13-967-g6cf276ddf22066.
> > 
> > [...]
> -----------------
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße
> 201, 80634 München; Gesellschaft mit beschränkter Haftung;
> Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der
> Gesellschaft: München; Registergericht München, HRB 106955
> 



      reply	other threads:[~2022-06-07 21:22 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-02 19:46 David Malcolm
2022-06-07 15:10 ` Thomas Schwinge
2022-06-07 21:22   ` David Malcolm [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a0cbe9e27a267128b08947e4ae3772210adc8fea.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=schordan1@llnl.gov \
    --cc=thomas@codesourcery.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).