public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: David Malcolm <dmalcolm@redhat.com>
To: Siddhesh Poyarekar <siddhesh@gotplt.org>,
	Martin Sebor <msebor@redhat.com>,
	 gcc-patches@gcc.gnu.org
Cc: jakub@redhat.com
Subject: Re: [PATCH v2] middle-end/104854: Limit strncmp overread warnings
Date: Thu, 17 Mar 2022 09:02:51 -0400	[thread overview]
Message-ID: <efb54703967c9a7c56ba19613acd827b582f743d.camel@redhat.com> (raw)
In-Reply-To: <01c49ecb-cd91-3bcc-c5a0-97e81281e1dd@gotplt.org>

On Tue, 2022-03-15 at 22:10 +0530, Siddhesh Poyarekar wrote:
> On 15/03/2022 21:09, Martin Sebor wrote:
> > The strncmp function takes arrays as arguments (not necessarily
> > strings).  The main purpose of the -Wstringop-overread warning
> > for calls to it is to detect calls where one of the arrays is
> > not a nul-terminated string and the bound is larger than the size
> > of the array.  For example:
> > 
> >    char a[4], b[4];
> > 
> >    int f (void)
> >    {
> >      return strncmp (a, b, 8);   // -Wstringop-overread
> >    }
> > 
> > Such a call is suspect: if one of the arrays isn't nul-terminated
> > the call is undefined.  Otherwise, if both are nul-terminated there
> 
> Isn't "suspect" too harsh a description though?  The bound does not 
> specify the size of a or b, it specifies the maximum extent to which to
> compare a and b, the extent being any application-specific limit.  In
> fact the limit could be the size of some arbitrary third buffer that
> the 
> contents of a or b must be copied to, truncating to the bound.
> 
> I agree the call is undefined if one of the arrays is not nul-
> terminated 
> and that's the thing; nothing about the bound is undefined in this 
> context, it's the NUL termination that is key.
> 
> > is no point in calling strncmp with a bound greater than their sizes.
> 
> There is, when the bound describes something else, e.g. the size of a
> third destination buffer into which one of the input buffers may get 
> copied into.  Or when the bound describes the maximum length of a set
> of 
> strings where only a subset of the strings are reachable in the current
> function and ranger sees it, allowing us to reduce our input string
> size 
> estimate.  The bounds being the maximum of the lengths of two input 
> strings is just one of many possibilities.
> 
> > With no evidence that this warning is ever harmful I'd consider
> 
> There is, the false positives were seen in Fedora/RHEL builds.
> 
> > suppressing it a regression.  Since the warning is a deliberate
> > feature in a released compiler and GCC is now in a regression
> > fixing stage, this patch is out of scope even if a case where
> > the warning wasn't helpful did turn up (none has been reported
> > so far).
> 
> Wait, I just reported an issue and it's across multiple packages in 
> Fedora/RHEL :)

I think having more examples of where this is happening on real-world
code might help this discussion.

Do you have some URLs? (e.g. bug reports, build logs, etc)

Reading through this thread, all of the examples I've seen look like
they've been constructed; I'm interested in seeing what this looks like
"in the wild" as it were.

Quoting myself in the UX Guidelines:
  https://gcc.gnu.org/onlinedocs/gccint/Guidelines-for-Diagnostics.html

"28.1.3 Try the diagnostic on real-world code

It’s worth testing a new warning on many instances of real-world code,
written by different people, and seeing what it complains about, and
what it doesn’t complain about.

This may suggest heuristics that silence common false positives.

It may also suggest ways to improve the precision of the message."

...and various other parts of those guidelines may apply here.

Hope this is constructive
Dave





> 
> I think this is a regression since gcc 11 due to misunderstanding the
> specification and assuming too strong a relationship between the size
> argument of strncmp (and indeed strnlen and strndup) and the size of 
> objects being passed to it.  Compliant code relies on the compiler to
> do 
> the right thing here, i.e. optimize the strncmp call to strcmp and not 
> panic about the size argument being larger than the input buffer size. 
> If at all such a diagnostic needs to stay, it ought to go into the 
> analyzer, where such looser heuristic suggestions are more acceptable
> and sometimes even appreciated.
> 
> FWIW, I'm open to splitting the warning levels as you suggested if 
> that's the consensus since it at least provides a way to make these 
> warnings saner. However I still haven't found the rationale presented
> so 
> far compelling enough to justify these false positives; I just don't
> see 
> a proportional enough reward.  Hopefully more people can chime in with 
> their perspective on this.
> 
> Thanks,
> Siddhesh
> 



      parent reply	other threads:[~2022-03-17 13:02 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-11  7:12 [PATCH] " Siddhesh Poyarekar
2022-03-15  5:31 ` [PATCH v2] " Siddhesh Poyarekar
2022-03-15 15:39   ` Martin Sebor
2022-03-15 16:40     ` Siddhesh Poyarekar
2022-03-15 20:36       ` Martin Sebor
2022-03-16  1:24         ` Siddhesh Poyarekar
2022-03-16 23:41           ` Martin Sebor
2022-03-17  1:43             ` Siddhesh Poyarekar
2022-03-17 14:32               ` Aldy Hernandez
2022-03-17 15:31           ` Marek Polacek
2022-03-17 16:46             ` Jeff Law
2022-03-17 17:01               ` Andreas Schwab
2022-03-17 17:22               ` Siddhesh Poyarekar
2022-03-17 17:51                 ` Martin Sebor
2022-03-17 18:02                   ` Siddhesh Poyarekar
2022-03-17 18:13                     ` Martin Sebor
2022-03-17 10:35         ` Jonathan Wakely
2022-03-25 13:26           ` Jason Merrill
2022-03-25 14:14             ` Siddhesh Poyarekar
2022-03-17 13:02       ` David Malcolm [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=efb54703967c9a7c56ba19613acd827b582f743d.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=msebor@redhat.com \
    --cc=siddhesh@gotplt.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).