From: "dmalcolm at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug analyzer/106626] New: Improvements to wording of -Wanalyzer-out-of-bounds
Date: Mon, 15 Aug 2022 13:08:23 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106626

            Bug ID: 106626
           Summary: Improvements to wording of -Wanalyzer-out-of-bounds
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: analyzer
          Assignee: dmalcolm at gcc dot gnu.org
          Reporter: dmalcolm at gcc dot gnu.org
                CC: tlange at gcc dot gnu.org
  Target Milestone: ---

During the patch review of -Wanalyzer-out-of-bounds we decided to focus on getting the feature implemented in trunk first, and to defer coming up with the precise wording, to avoid holding up the feature.
I'm filing this bug to track coming up with the precise wording for -Wanalyzer-out-of-bounds.

Current status quo:

Given e.g.:

int arr[10];

int
test (void)
{
    return arr[10];
}

https://godbolt.org/z/EPrqGTj8s

we report:

<source>: In function 'test':
<source>:6:15: warning: buffer overread [CWE-126] [-Wanalyzer-out-of-bounds]
    6 |     return arr[10];
      |            ~~~^~~~
  event 1
    |
    |    1 | int arr[10];
    |      |     ^~~
    |      |     |
    |      |     (1) capacity is 40 bytes
    |
    +--> 'test': events 2-3
           |
           |    4 | test (void)
           |      | ^~~~
           |      | |
           |      | (2) entry to 'test'
           |    5 | {
           |    6 |     return arr[10];
           |      |            ~~~~~~~
           |      |               |
           |      |               (3) out-of-bounds read from byte 40 till byte 43 but 'arr' ends at byte 40
           |
<source>:6:15: note: write is 4 bytes past the end of 'arr'
    6 |     return arr[10];
      |            ~~~^~~~

The note erroneously says "write" (this is a read) due to a copy&paste error, which I plan to fix shortly.

Goals:

I'd like the diagnostics to (somehow) convey the following information to the user:

* expression: what was the expression (if available) responsible for the bad access, since we can't always underline exactly the problematic subexpression in a compound expression (or macros could be involved, obscuring the user's view of what the analyzer is "seeing"). e.g. in the above example: 'arr[10]'

* direction of the access: read vs write? e.g. in the above example: read

* boundary being violated: is the access before or after the buffer? e.g. in the above example: after the buffer

* location: where is the invalid access? heap vs stack vs elsewhere (since this can affect the impact of a vulnerability). e.g. in the above example: elsewhere ('arr' is a global, so it lives in static storage rather than on the heap or stack)

* magnitude: how far beyond the boundary is the invalid access (consider e.g. the cases of immediately beyond ("off-by-one"), vs near (a few bytes or elements), vs far). e.g. in the above example: 0-3 bytes beyond the boundary, 0 elements beyond when expressed as array index

* data size: how much data beyond the boundary is accessed. e.g. in the above example: 4 bytes, or 1 element.
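
As a rough illustration of the attributes listed above, here is a minimal sketch of how direction, boundary, magnitude and data size could be derived for an access with constant offset and size. Every name in it (`oob_summary`, `summarize_access`, the enums) is hypothetical and is not taken from the analyzer's sources; the "location" and "expression" attributes are left out because they depend on information the analyzer tracks elsewhere.

```c
#include <assert.h>

/* Hypothetical types for illustration; not the analyzer's data model.  */
enum access_dir { ACCESS_READ, ACCESS_WRITE };
enum boundary { IN_BOUNDS, BEFORE_START, AFTER_END };

struct oob_summary
{
  enum access_dir dir;  /* direction: read vs write */
  enum boundary which;  /* boundary being violated */
  long gap_bytes;       /* magnitude: bytes between the boundary and the
                           nearest out-of-bounds byte accessed */
  long bad_bytes;       /* data size: bytes accessed outside the buffer */
  long bad_elems;       /* data size in elements, rounded up */
};

static long
min_l (long a, long b)
{
  return a < b ? a : b;
}

/* All quantities are byte counts relative to the start of the buffer.
   An access that straddles both ends is reported as BEFORE_START only;
   a fuller version would have to report both violations.  */
struct oob_summary
summarize_access (long capacity, long offset, long size, long elem_size,
                  enum access_dir dir)
{
  assert (capacity >= 0 && size > 0 && elem_size > 0);

  struct oob_summary s = { dir, IN_BOUNDS, 0, 0, 0 };
  long end = offset + size;  /* one past the last byte accessed */

  if (offset < 0)
    {
      s.which = BEFORE_START;
      s.bad_bytes = min_l (size, -offset);
      s.gap_bytes = end < 0 ? -end : 0;
    }
  else if (end > capacity)
    {
      s.which = AFTER_END;
      s.bad_bytes = min_l (size, end - capacity);
      s.gap_bytes = offset > capacity ? offset - capacity : 0;
    }

  s.bad_elems = (s.bad_bytes + elem_size - 1) / elem_size;
  return s;
}
```

For the 'arr[10]' example, `summarize_access (40, 40, 4, 4, ACCESS_READ)` yields AFTER_END with a gap of 0 bytes, 4 bad bytes and 1 bad element, matching the "immediately beyond" case described in the list.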

Conveying all of this is likely to cause a combinatorial explosion of message variants (a concern because of the need for i18n), so to tame this, some of it may need to be split between different parts of the diagnostic (the initial warning, events in the warning's diagnostic path, and any notes after the warning).

The above list is taken in part from the Bug Framework's deprecated Buffer Overflow (BOF) Class:
  https://samate.nist.gov/BF/Old/BOFClass.html

I'm not sure why the Bug Framework deprecated "BOF" (presumably in favor of the "BF Memory Model", https://samate.nist.gov/BF/Classes/MEM/MEMModel.html and its "Memory Use Bugs (MUS) Class": https://samate.nist.gov/BF/Classes/MEM/MUS.html ), but the BOF attributes seem to me to be pertinent information for the user, and good things to think about in test cases.

Currently -Wanalyzer-out-of-bounds only warns when the size and offset of the access are constant (not symbolic), and the capacity of the underlying region is constant (not symbolic). I'd like to eventually generalize that (see PR 106625), so ideally whatever scheme we come up with should support symbolic values too.

Ideally these should be reported in terms of the user's source code. In the above example, the messages talk about bytes, but we should probably *also* talk about array indices (e.g. that indices 0 through 9 are valid, and 10 is one beyond).

The wording should also be clear about inclusive vs exclusive ranges - I feel "'arr' ends at byte 40" is unclear, since byte 39 is the last valid byte. Perhaps something like:

  'arr' has 10 elements, so valid indices are '[0]' to '[9]' (bytes 0-39)

or somesuch.

I'm not sure what a correct solution here is, but am filing this now to try to capture the things we might try to design for.

Tim: I'm CCing you in case you want to work on this; otherwise I'd like to have a go at this in a few weeks.
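
As a rough illustration of the index-based wording suggested above, the following sketch formats an inclusive range in both indices and bytes. The helper name `format_valid_range` and its parameters are hypothetical, and it uses plain `snprintf` for simplicity; it does not attempt to model how GCC's diagnostics are actually built or translated.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only.  Writes a message that
   states the valid range inclusively, as element indices and as bytes.
   Returns the value of snprintf (the length the message needs).  */
int
format_valid_range (char *buf, size_t buflen, const char *name,
                    long num_elems, long elem_size)
{
  assert (num_elems > 0 && elem_size > 0);

  long last_index = num_elems - 1;            /* inclusive upper index */
  long last_byte = num_elems * elem_size - 1; /* inclusive upper byte */

  return snprintf (buf, buflen,
                   "'%s' has %ld elements, so valid indices are "
                   "'[0]' to '[%ld]' (bytes 0-%ld)",
                   name, num_elems, last_index, last_byte);
}
```

Calling `format_valid_range (buf, sizeof buf, "arr", 10, 4)` produces the text "'arr' has 10 elements, so valid indices are '[0]' to '[9]' (bytes 0-39)". Both upper bounds are computed as inclusive values, so the message avoids the ambiguity of "ends at byte 40".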