public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Sandiford <richard.sandiford@arm.com>
To: Martin Sebor via Gcc-help <gcc-help@gcc.gnu.org>
Cc: Dumitru Ceara <dceara@redhat.com>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	Martin Sebor <msebor@gmail.com>
Subject: Re: Potentially false-positive -Wstringop-overflow= warning with gcc >= 11.1
Date: Tue, 01 Feb 2022 05:18:57 +0000	[thread overview]
Message-ID: <mpto83rgo5a.fsf@arm.com> (raw)
In-Reply-To: <ea3c3f08-f05f-aa7b-5510-f77a87c3914f@gmail.com> (Martin Sebor via Gcc-help's message of "Mon, 31 Jan 2022 14:49:43 -0700")

Martin Sebor via Gcc-help <gcc-help@gcc.gnu.org> writes:
> On 1/28/22 09:16, Dumitru Ceara via Gcc-help wrote:
>> On 1/28/22 16:27, Segher Boessenkool wrote:
>>> On Fri, Jan 28, 2022 at 04:01:36PM +0100, Dumitru Ceara via Gcc-help wrote:
>>>>      void *l4data = p.l4_ofs != UINT16_MAX ? (char *) p.base_ + p.l4_ofs : NULL;
>>>>      struct hdr2 *h2 = l4data;
>>>>
>>>>      memcpy(h2 + 1, &somedata[0], 6);
>>>
>>> l4data can be 0, and everything false apart from there on.
>>>
>> 
>> In general, yes, l4data can be 0, if p.l4_ofs == UINT16_MAX, which can
>> only happen if pkt_bar() changed p.base_.
>> 
>> But the compiler can't know that for sure and the warning makes it sound
>> like it's a sure thing:
>> 
>> "warning: ‘memcpy’ writing 6 bytes into a region of size 0 overflows the
>> destination"
>
> The reason why it makes it sound like it's a sure thing is because
> a) the warning sees an invalid statement in the IL, and b) because
> GCC warnings work on one IL statement at a time and without regard
> to the conditions under which they might be executed.
>
> An IL statement is not always the same as a source code statement.
> Some statements are also isolated from the source code even when they
> don't appear there, or appear with different arguments.  That's also
> what happens in this case: GCC replaces the memcpy call and
> the subsequent store with two: one pair is valid  and another pair
> of invalid statements.  Here's the problem in the IL the warning
> sees and points out:
>
>    <bb 4> [count: 0]:
>    memcpy (4B, &somedata[0], 6);        <<< -Warray-bounds
>    MEM[(struct hdr2 *)0B].h21 ={v} 0;   <<< causes path isolation
>    __builtin_trap ();                       and trap to be added
>
> Before the warning is issued GCC notices the branch of the code where
> l4data is null is invalid.  It splits it into two branches: valid and
> invalid and adds a trap after the invalid sequence for good measure.
> The warning (which runs much later) then discovers the isolated code
> is invalid and points it out.  The path isolation can be disabled by
> compiling with -fno-isolate-erroneous-paths-dereference.  That also
> avoids the warning.
>
> This is an example where the invalid code isolation seems to work at
> cross purposes with the warning (or at least without coordination
> with it).  It's also one that illustrates an inconsistency in
> the handling of invalid code (undefined behavior) in GCC.  Some
> instances are eliminated (optimized away), others are replaced with
> a trap, others left in place but followed by a trap (the store at
> address zero), and others still are just left in place (the invalid
> memcpy at address 4).
>
> More to your point, it's also an example where warnings issued for
> code that's reachable only under some conditions make it seem like
> they point out unconditional bugs.
>
> All these are known problems and all of them should be tackled
> somehow.  The last one (warning for conditional code) especially
> has been a sore point because it's in everyone's face.  A number
> of ideas have been thrown out over the years, starting with simply
> rephrasing the warning to be less affirmative.  E.g., instead of
> "memcpy writing..." print:
>
>    ‘memcpy’ may write 6 bytes into a region of size 0
>
> The problem with this suggestion is that because most instances of
> these warnings occur in some conditionally executed code, and because
> all GCC warnings are meant to be taken as possible (as opposed to
> definitive) indication of bugs, changing their phrasing wouldn't
> actually solve anything.  All it would do is replace one with
> the other, without helping you understand why or when the bug might
> happen.

I agree that changing the wording doesn't solve the whole problem,
but I think it does solve something.  At the moment, we're effectively
asking each individual user to be aware of the context above, to know what
meaning is being attached to the present tense.  Making the message more
equivocal would at least suggest that the compiler doesn't “know”.

It's not the pass's “fault” that it can't tell how closely the IL
it sees matches the original code.  But it is GCC's “fault” as a whole,
not the user's, and I think that's what matters here.

> What I think is needed here is to mention the conditions
> under which the invalid statement is executed.  With that, we should
> be able to print the same context as what the static analyzer prints:
>
> warning: dereference of NULL ‘0’ [CWE-476] [-Wanalyzer-null-dereference]
>     43 |     h2->h21 = 0;
>        |     ~~~~~~~~^~~
>    ‘foo’: events 1-3
>      |
>      |   39 |  *l4data = p.l4_ofs != UINT16_MAX ? (char *) p.base_ + 
> p.l4_ofs : NULL;
>      |      | 
>        ^
>      |      | 
>        |
>      |      | 
>        (1) following ‘false’ branch...
>
>      |......
>      |   42 | py(h2 + 1, &somedata[0], 6);
>      |      |    ~~~~~~ 
>
>      |      |       |
>      |      |       (2) ...to here
>
>      |   43 | h21 = 0;
>      |      | ~~~~~~~ 
>
>      |      |     |
>      |      |     (3) dereference of NULL ‘<unknown>’

I agree this would be better, but I don't think it should hold back
tweaks to the error messages in the meantime.  (And if the tracing
was an optional feature, we'd still want wording that makes sense
when the feature is turned off.)

I realise this is rehashing an old discussion, sorry.  But it seems
like it's a discussion that gets rehashed precisely because it's
about an issue that users keep hitting.

Thanks,
Richard

  parent reply	other threads:[~2022-02-01  5:19 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-28 15:01 Dumitru Ceara
2022-01-28 15:27 ` Segher Boessenkool
2022-01-28 16:16   ` Dumitru Ceara
2022-01-28 17:37     ` Segher Boessenkool
2022-01-31 21:49     ` Martin Sebor
2022-01-31 22:01       ` Segher Boessenkool
2022-02-01  8:21         ` Jonathan Wakely
2022-02-01  5:18       ` Richard Sandiford [this message]
2022-02-01 14:42         ` Segher Boessenkool
2022-02-01 23:54         ` Martin Sebor
2022-02-02 10:12           ` Jonathan Wakely
2022-02-02 18:33             ` Segher Boessenkool
2022-02-02 20:44               ` Jonathan Wakely
2022-02-02 18:23           ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mpto83rgo5a.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=dceara@redhat.com \
    --cc=gcc-help@gcc.gnu.org \
    --cc=msebor@gmail.com \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).