public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
From: "dmalcolm at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug analyzer/106626] New: Improvements to wording of -Wanalyzer-out-of-bounds
Date: Mon, 15 Aug 2022 13:08:23 +0000	[thread overview]
Message-ID: <bug-106626-4@http.gcc.gnu.org/bugzilla/> (raw)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106626

            Bug ID: 106626
           Summary: Improvements to wording of -Wanalyzer-out-of-bounds
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: analyzer
          Assignee: dmalcolm at gcc dot gnu.org
          Reporter: dmalcolm at gcc dot gnu.org
                CC: tlange at gcc dot gnu.org
  Target Milestone: ---

During the patch review of -Wanalyzer-out-of-bounds we decided to focus on
getting the feature implemented in trunk first, and defer coming up with the
precise wording, to avoid holding up the feature.

I'm filing this bug to track coming up with the precise wording for
-Wanalyzer-out-of-bounds.

Current status quo:

Given e.g.:

int arr[10];

int
test (void)
{
    return arr[10];
}

https://godbolt.org/z/EPrqGTj8s

we report:

<source>: In function 'test':
<source>:6:15: warning: buffer overread [CWE-126] [-Wanalyzer-out-of-bounds]
    6 |     return arr[10];
      |            ~~~^~~~
  event 1
    |
    |    1 | int arr[10];
    |      |     ^~~
    |      |     |
    |      |     (1) capacity is 40 bytes
    |
    +--> 'test': events 2-3
           |
           |    4 | test (void)
           |      | ^~~~
           |      | |
           |      | (2) entry to 'test'
           |    5 | {
           |    6 |     return arr[10];
           |      |            ~~~~~~~
           |      |               |
           |      |               (3) out-of-bounds read from byte 40 till byte
43 but 'arr' ends at byte 40
           |
<source>:6:15: note: write is 4 bytes past the end of 'arr'
    6 |     return arr[10];
      |            ~~~^~~~

The note erroneously says "write" due to a copy&paste which I plan to fix
shortly.


Goals:

I'd like the diagnostics to (somehow) convey the following information to the
user:

* what was the expression (if available) responsible for the bad access, since
we can't always underline exactly the problematic subexpression in a compound
expression (or macros could be involved, obscuring the user's view of what the
analyzer is "seeing").  e.g. in the above example 'arr[10]'
* direction of the access: read vs write?  e.g. in the above example: read
* boundary being violated: is the access before or after the buffer?  e.g. in
the above example: after the buffer
,* location: where is the invalid access?  heap vs stack vs elsewhere (since
this can affect the impact of a vulnerability): e.g. in the above example:
stack
* magnitude: how far beyond the boundary is the invalid access (consider e.g.
the cases of immediately beyond ("off-by-one"), vs near (a few bytes or
elements), vs far); e.g. in the above example: 0-3 bytes beyond the boundary, 0
elements beyond when expressed as array index
* data size: how much data beyond the boundary is accessed; e.g. in the above
example 4 bytes, or 1 element.

Doing so is likely to avoid a combinatorial explosion (due to the need for
i18n), so to tame this, some of this may need to be split between different
parts of the diagnostic (the initial warning, events in the warning's
diagnostic path, and any notes after the warning).

The above list is taken in part from the the Bug Framework's deprecated Buffer
Overflow (BOF) Class:
  https://samate.nist.gov/BF/Old/BOFClass.html
I'm not sure why the Bug Framework deprecated "BOF" (presumably in favor of the
"BF Memory Model",
  https://samate.nist.gov/BF/Classes/MEM/MEMModel.html
and its "Memory Use Bugs (MUS) Class":
  https://samate.nist.gov/BF/Classes/MEM/MUS.html
), but the BOF attributes seem to me to be pertinent information for the user,
and good things to think about in test cases.

Currently -Wanalyzer-out-of-bounds only warns when the size and offset of the
access are constant (not symbolic), and the capacity of the underlying region
is constant (not symbolic).  I'd like to eventually generalize that (see PR
106625), so ideally whatever scheme we come up with should support that.

Ideally these should be reported in terms of the user's source code.  In the
above example, the messages talk about bytes, but we should probably *also*
talk about array indices (e.g. that indices 0 through 9 are valid, and 10 is
one beyond).

The wording should also be clear about inclusive vs exclusive ranges - I feel
  'arr' ends at byte 40"
is unclear.  Perhaps something like:
  'arr' has 10 elements, so valid indices are '[0]' to '[9]'  (bytes 0-39)
or somesuch.

I'm not sure what a correct solution here is, but am filing this now to try to
capture the things we might try to design for.

Tim: I'm CCing you in case you want to work on this; otherwise I'd like to have
a go at this in a few weeks.

             reply	other threads:[~2022-08-15 13:08 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-15 13:08 dmalcolm at gcc dot gnu.org [this message]
2022-08-15 18:48 ` [Bug analyzer/106626] " cvs-commit at gcc dot gnu.org
2022-12-01  2:31 ` cvs-commit at gcc dot gnu.org
2022-12-01  2:31 ` cvs-commit at gcc dot gnu.org
2022-12-01  2:31 ` cvs-commit at gcc dot gnu.org
2022-12-01  2:31 ` cvs-commit at gcc dot gnu.org
2022-12-01  2:31 ` cvs-commit at gcc dot gnu.org
2023-04-07 12:36 ` dmalcolm at gcc dot gnu.org
2023-06-22  2:06 ` cvs-commit at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-106626-4@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).