public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <richard.guenther@gmail.com>
To: Martin Sebor <msebor@gmail.com>
Cc: Jeff Law <law@redhat.com>, GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PING #4][PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)
Date: Fri, 30 Nov 2018 07:57:00 -0000	[thread overview]
Message-ID: <CAFiYyc3WEae8XzedLNd6kVWjSnRkWnuDVoHzrXy5b=1+cVZnDg@mail.gmail.com> (raw)
In-Reply-To: <d51af7f4-f1bb-fbea-bd7e-b67d4632f0a5@gmail.com>

On Thu, Nov 29, 2018 at 9:34 PM Martin Sebor <msebor@gmail.com> wrote:
>
> On 11/16/2018 02:07 AM, Richard Biener wrote:
> > On Fri, Nov 16, 2018 at 4:12 AM Martin Sebor <msebor@gmail.com> wrote:
> >>
> >> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
> >>
> >> Please let me know if there is something I need to change here
> >> to make the fix acceptable or if I should stop trying.
> >
> > I have one more comment about
> >
> > +  /* Defer warning (and folding) until the next statement in the basic
> > +     block is reachable.  */
> > +  if (!gimple_bb (stmt))
> > +    return false;
> > +
> >
> > it's not about the next statement in the basic-block being "reachable"
> > (even w/o a CFG you can use gsi_next()) but rather that the next
> > stmt isn't yet gimplified and thus not inserted into the gimple sequence,
> > right?
>
> No, it's about the current statement not being associated with
> a basic block yet when the warning code runs for the first time
> (during gimplify_expr), and so gsi_next() returning null.
>
> > You apply this to gimple_fold_builtin_strncpy but I'd rather
> > see us not sprinkling this over gimple-fold.c but instead do this
> > in gimplify.c:maybe_fold_stmt, delaying folding until say lowering.
> >
> > See the attached (untested).
>
> I would also prefer this solution.  I had tested it (in response
> to you first mentioning it back in September) and it causes quite
> a bit of fallout in tests that look for the folding to take place
> very early.  See the end of my reply here:
>
>    https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01248.html
>
> But I'm willing to do the test suite cleanup if you think it's
> suitable for GCC 9.  (If you're thinking GCC 10 please let me
> know now.)

I very much prefer that to the hacks in gimple-fold.c if it doesn't
help now then I'll rather live with some bogus warnings for GCC 9
and fix it up properly for GCC 10.

I expect the fallout to be quite minimal (also considering my
suggestion to do the folding in gimple-low.c).

Richard.

> Thanks
> Martin
>
> >
> > Richard.
> >
> >
> >
> >> On 10/31/2018 10:33 AM, Martin Sebor wrote:
> >>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
> >>>
> >>> On 10/20/2018 06:01 PM, Martin Sebor wrote:
> >>>> On 10/16/2018 03:21 PM, Jeff Law wrote:
> >>>>> On 10/4/18 9:51 AM, Martin Sebor wrote:
> >>>>>> On 10/04/2018 08:58 AM, Jeff Law wrote:
> >>>>>>> On 8/27/18 9:42 AM, Richard Biener wrote:
> >>>>>>>> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <law@redhat.com> wrote:
> >>>>>>>>>
> >>>>>>>>> On 08/27/2018 02:29 AM, Richard Biener wrote:
> >>>>>>>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
> >>>>>>>>>>>> The warning suppression for -Wstringop-truncation looks for
> >>>>>>>>>>>> the next statement after a truncating strncpy to see if it
> >>>>>>>>>>>> adds a terminating nul.  This only works when the next
> >>>>>>>>>>>> statement can be reached using the Gimple statement iterator
> >>>>>>>>>>>> which isn't until after gimplification.  As a result, strncpy
> >>>>>>>>>>>> calls that truncate their constant argument that are being
> >>>>>>>>>>>> folded to memcpy this early get diagnosed even if they are
> >>>>>>>>>>>> followed by the nul assignment:
> >>>>>>>>>>>>
> >>>>>>>>>>>>   const char s[] = "12345";
> >>>>>>>>>>>>   char d[3];
> >>>>>>>>>>>>
> >>>>>>>>>>>>   void f (void)
> >>>>>>>>>>>>   {
> >>>>>>>>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
> >>>>>>>>>>>>     d[sizeof d - 1] = 0;
> >>>>>>>>>>>>   }
> >>>>>>>>>>>>
> >>>>>>>>>>>> To avoid the warning I propose to defer folding strncpy to
> >>>>>>>>>>>> memcpy until the pointer to the basic block the strnpy call
> >>>>>>>>>>>> is in can be used to try to reach the next statement (this
> >>>>>>>>>>>> happens as early as ccp1).  I'm aware of the preference to
> >>>>>>>>>>>> fold things early but in the case of strncpy (a relatively
> >>>>>>>>>>>> rarely used function that is often misused), getting
> >>>>>>>>>>>> the warning right while folding a bit later but still fairly
> >>>>>>>>>>>> early on seems like a reasonable compromise.  I fear that
> >>>>>>>>>>>> otherwise, the false positives will drive users to adopt
> >>>>>>>>>>>> other unsafe solutions (like memcpy) where these kinds of
> >>>>>>>>>>>> bugs cannot be as readily detected.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Tested on x86_64-linux.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Martin
> >>>>>>>>>>>>
> >>>>>>>>>>>> PS There still are outstanding cases where the warning can
> >>>>>>>>>>>> be avoided.  I xfailed them in the test for now but will
> >>>>>>>>>>>> still try to get them to work for GCC 9.
> >>>>>>>>>>>>
> >>>>>>>>>>>> gcc-87028.diff
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
> >>>>>>>>>>>> strncpy with global variable source string
> >>>>>>>>>>>> gcc/ChangeLog:
> >>>>>>>>>>>>
> >>>>>>>>>>>>       PR tree-optimization/87028
> >>>>>>>>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid
> >>>>>>>>>>>> folding when
> >>>>>>>>>>>>       statement doesn't belong to a basic block.
> >>>>>>>>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle
> >>>>>>>>>>>> MEM_REF on
> >>>>>>>>>>>>       the left hand side of assignment.
> >>>>>>>>>>>>
> >>>>>>>>>>>> gcc/testsuite/ChangeLog:
> >>>>>>>>>>>>
> >>>>>>>>>>>>       PR tree-optimization/87028
> >>>>>>>>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
> >>>>>>>>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> >>>>>>>>>>>> index 07341eb..284c2fb 100644
> >>>>>>>>>>>> --- a/gcc/gimple-fold.c
> >>>>>>>>>>>> +++ b/gcc/gimple-fold.c
> >>>>>>>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
> >>>>>>>>>>>> (gimple_stmt_iterator *gsi,
> >>>>>>>>>>>>    if (tree_int_cst_lt (ssize, len))
> >>>>>>>>>>>>      return false;
> >>>>>>>>>>>>
> >>>>>>>>>>>> +  /* Defer warning (and folding) until the next statement in the
> >>>>>>>>>>>> basic
> >>>>>>>>>>>> +     block is reachable.  */
> >>>>>>>>>>>> +  if (!gimple_bb (stmt))
> >>>>>>>>>>>> +    return false;
> >>>>>>>>>>> I think you want cfun->cfg as the test here.  They should be
> >>>>>>>>>>> equivalent
> >>>>>>>>>>> in practice.
> >>>>>>>>>>
> >>>>>>>>>> Please do not add 'cfun' references.  Note that the next stmt is
> >>>>>>>>>> also accessible
> >>>>>>>>>> when there is no CFG.  I guess the issue is that we fold this
> >>>>>>>>>> during
> >>>>>>>>>> gimplification where the next stmt is not yet "there" (but still in
> >>>>>>>>>> GENERIC)?
> >>>>>>>>> That was my assumption.  I almost suggested peeking at gsi_next and
> >>>>>>>>> avoiding in that case.
> >>>>>>>>
> >>>>>>>> So I'd rather add guards to maybe_fold_stmt in the gimplifier then.
> >>>>>>> So I think the concern with adding the guards to maybe_fold_stmt is
> >>>>>>> the
> >>>>>>> possibility of further fallout.
> >>>>>>>
> >>>>>>> I guess they could be written to target this case specifically to
> >>>>>>> minimize fallout, but that feels like we're doing the same thing
> >>>>>>> (band-aid) just in a different place.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> We generally do not want to have unfolded stmts in the IL when we
> >>>>>>>>>> can avoid that
> >>>>>>>>>> which is why we fold most stmts during gimplification.  We also do
> >>>>>>>>>> that because
> >>>>>>>>>> we now do less folding on GENERIC.
> >>>>>>>>> But an unfolded call in the IL should always be safe and we've got
> >>>>>>>>> plenty of opportunities to fold it later.
> >>>>>>>>
> >>>>>>>> Well - we do.  The very first one is forwprop though which means
> >>>>>>>> we'll miss to
> >>>>>>>> re-write some memcpy parts into SSA:
> >>>>>>>>
> >>>>>>>>           NEXT_PASS (pass_ccp, false /* nonzero_p */);
> >>>>>>>>           /* After CCP we rewrite no longer addressed locals into SSA
> >>>>>>>>              form if possible.  */
> >>>>>>>>           NEXT_PASS (pass_forwprop);
> >>>>>>>>
> >>>>>>>> likewise early object-size will be confused by memcpy calls that just
> >>>>>>>> exist
> >>>>>>>> to avoid TBAA issues (another of our recommendations besides using
> >>>>>>>> unions).
> >>>>>>>>
> >>>>>>>> We do fold mem* early for a reason ;)
> >>>>>>>>
> >>>>>>>> "We can always do warnings earlier" would be a similar true sentence.
> >>>>>>> I'm not disagreeing at all.  There's a natural tension between the
> >>>>>>> benefits of folding early to enable more optimizations downstream and
> >>>>>>> leaving the IL in a state where we can give actionable warnings.
> >>>>>>
> >>>>>> Similar trade-offs between folding early and losing information
> >>>>>> as a result also impact high-level optimizations.
> >>>>>>
> >>>>>> For instance, folding the strlen argument below
> >>>>>>
> >>>>>>   void f3 (struct A* p)
> >>>>>>   {
> >>>>>>     __builtin_strcpy (p->a, "123");
> >>>>>>
> >>>>>>     if (__builtin_strlen (p->a + 1) != 2)   // not folded
> >>>>>>       __builtin_abort ();
> >>>>>>   }
> >>>>>>
> >>>>>> into
> >>>>>>
> >>>>>>   _2 = &MEM[(void *)p_4(D) + 2B];
> >>>>>>
> >>>>>> early on defeats the strlen optimization because there is no
> >>>>>> mechanism to determine what member (void *)p_4(D) + 2B refers
> >>>>>> to (this is bug 86955).
> >>>>>>
> >>>>>> Another example is folding of strlen calls with no-nconstant
> >>>>>> offsets into constant strings like here:
> >>>>>>
> >>>>>>   const char a[] = "123";
> >>>>>>
> >>>>>>   void f (int i)
> >>>>>>   {
> >>>>>>     if (__builtin_strlen (&a[i]) > 3)
> >>>>>>       __builtin_abort ();
> >>>>>>   }
> >>>>>>
> >>>>>> into sizeof a - 1 - i, which then prevents the result from
> >>>>>> being folded to false  (bug 86434), not to mention the code
> >>>>>> it emits for out-of-bounds indices.
> >>>>>>
> >>>>>> There are a number of other similar examples in Bugzilla
> >>>>>> that I've filed as I discovered then during testing my
> >>>>>> warnings (e.g., 86572).
> >>>>>>
> >>>>>> In my mind, transforming library calls into "lossy" low-level
> >>>>>> primitives like MEM_REF would be better done only after higher
> >>>>>> level optimizations have had a chance to analyze them.  Ditto
> >>>>>> for other similar transformations (like to other library calls).
> >>>>>> Having more accurate information helps both optimization and
> >>>>>> warnings.  It also makes the warnings more meaningful.
> >>>>>> Printing "memcpy overflows a buffer" when the source code
> >>>>>> has a call to strncpy is less than ideal.
> >>>>>>
> >>>>>>> Similarly there's a natural tension between warning early vs warning
> >>>>>>> late.  Code that triggers the warning may ultimately be proved
> >>>>>>> unreachable, or we may discover simplifications that either
> >>>>>>> suppress or
> >>>>>>> expose a warning.
> >>>>>>>
> >>>>>>> There is no easy answer here.  But I think we can legitimately ask
> >>>>>>> questions.  ie, does folding strnlen here really improve things
> >>>>>>> downstream in ways that are measurable?  Does the false positive
> >>>>>>> really
> >>>>>>> impact the utility of the warning?  etc.
> >>>>>>>
> >>>>>>> I'd hazard a guess that Martin is particularly sensitive to false
> >>>>>>> positives based on feedback he's received from our developer community
> >>>>>>> as well as downstream consumers of his work.
> >>>>>>
> >>>>>> Yes.  The kernel folks in particular have done a lot of work
> >>>>>> cleaning up their code in an effort to adopt the warning and
> >>>>>> attribute nonstring.  They have been keeping me in the loop
> >>>>>> on their progress (and feeding me back test cases with false
> >>>>>> positives and negatives they run into).
> >>>>> I can't recall seeing further guidance from Richi WRT putting the checks
> >>>>> earlier (maybe_fold_stmt).
> >>>>>
> >>>>> If the point here is to avoid false positives by not folding strncpy,
> >>>>> particularly in cases where we don't see the NUL in the copy, but it
> >>>>> appears in a subsequent store, then let's be fairly selective (so as not
> >>>>> to muck up things on the optimization side more than is necessary).
> >>>>>
> >>>>> ISTM we can do this by refactoring the warning bits so they're reusable
> >>>>> at different points in the pipeline.  Those bits would always return a
> >>>>> boolean indicating if the given statement might generate a warning or
> >>>>> not.
> >>>>>
> >>>>> When called early, they would not actually issue any warning.  They
> >>>>> would merely do the best analysis they can and return a status
> >>>>> indicating whether or not the statement would generate a warning given
> >>>>> current context.  The goal here is to leave statements that might
> >>>>> generate a warning as-is in the IL.
> >>>>>
> >>>>> When called late (assuming there is a point where we can walk the IL and
> >>>>> issue the appropriate warnings), the routine would actually issue the
> >>>>> warning.
> >>>>>
> >>>>> The kind of structure could potentially work for other builtins where we
> >>>>> may need to look at subsequent statements to avoid false positives, but
> >>>>> early folding hides cases by transforming the call into an undesirable
> >>>>> form.
> >>>>>
> >>>>> Note that for cases where a call looks problematical early because we
> >>>>> can't see statement which stores the terminator, but where the
> >>>>> terminator statement ultimately becomes visible, we still get folding,
> >>>>> it just happens later in the pipeline.
> >>>>>
> >>>>> Thoughts?
> >>>>
> >>>> The warning only triggers when the bound is less than or equal
> >>>> to the length of the constant source string (i.e, when strncpy
> >>>> truncates).  So IIUC, your suggestion would defer folding only
> >>>> such strncpy calls and let gimple_fold_builtin_strncpy fold
> >>>> those with a constant bound that's greater than the length of
> >>>> the constant source string.  That would be fine with me, but
> >>>> since strncpy calls with a bound that's greater than the length
> >>>> of the source are pointless I don't think they are important
> >>>> enough to worry about folding super early.  The constant ones
> >>>> that serve any purpose (and that are presumably important to
> >>>> optimize) are those that truncate.
> >>>>
> >>>> That said, when optimization isn't enabled, I don't think users
> >>>> expect calls to library functions to be transformed to calls to
> >>>> other  functions, or inlined.  Yet that's just what GCC does.
> >>>> For example, besides triggering the warning, the following:
> >>>>
> >>>>   char a[4];
> >>>>
> >>>>   void f (char *s)
> >>>>   {
> >>>>     __builtin_strncpy (a, "1234", sizeof a);
> >>>>     a[3] = 0;
> >>>>   }
> >>>>
> >>>> is transformed, even at -O0, into:
> >>>>
> >>>>   f (char * s)
> >>>>   {
> >>>>     <bb 2> :
> >>>>     MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})"1234"];
> >>>>     a[3] = 0;
> >>>>     return;
> >>>>   }
> >>>>
> >>>> That doesn't seem right.  GCC should avoid these transformations
> >>>> at -O0, and one way to do that is to defer folding until the CFG
> >>>> is constructed.  The patch does it for strncpy but a more general
> >>>> solution would do that for all calls, e.g., in maybe_fold_stmt
> >>>> as Richard suggested (and I subsequently tested).
> >>>>
> >>>> Martin
> >>>
> >>
>

  parent reply	other threads:[~2018-11-30  7:57 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-24 15:58 [PATCH] " Martin Sebor
2018-08-26  5:25 ` Jeff Law
2018-08-27  8:30   ` Richard Biener
2018-08-27 15:32     ` Jeff Law
2018-08-27 15:43       ` Richard Biener
2018-10-04 15:51         ` Jeff Law
2018-10-04 15:55           ` Martin Sebor
2018-10-08 10:14             ` Richard Biener
2018-10-08 21:40               ` Martin Sebor
2018-10-16 22:42             ` Jeff Law
2018-10-21  8:17               ` Martin Sebor
2018-10-31 17:07                 ` [PING #3][PATCH] " Martin Sebor
2018-11-16  3:12                   ` [PING #4][PATCH] " Martin Sebor
2018-11-16  9:07                     ` Richard Biener
2018-11-29 20:34                       ` Martin Sebor
2018-11-29 23:07                         ` Jeff Law
2018-11-29 23:43                           ` Martin Sebor
2018-11-30  2:02                             ` Jeff Law
2018-11-30  8:05                               ` Richard Biener
2018-11-30  8:30                                 ` Jakub Jelinek
2018-12-05 23:11                             ` Jeff Law
2018-12-06 13:00                               ` Christophe Lyon
2018-12-06 13:52                                 ` Jeff Law
2018-11-30  7:57                         ` Richard Biener [this message]
2018-11-30 15:51                           ` Martin Sebor
2018-11-07 21:28                 ` [PATCH] " Jeff Law
2018-11-09  1:25                   ` Martin Sebor
2018-10-04 19:55           ` Joseph Myers
2018-08-27 16:27     ` Martin Sebor
2018-08-28  4:27       ` Jeff Law
2018-08-28  9:56         ` Richard Biener
2018-08-28  9:57           ` Richard Biener
2018-08-29  0:12           ` Martin Sebor
2018-08-29  7:29             ` Richard Biener
2018-08-29 15:43               ` Martin Sebor
2018-08-30  0:27             ` Jeff Law
2018-08-30  8:48               ` Richard Biener
2018-09-12 15:50             ` Martin Sebor
2018-09-18  1:56             ` Jeff Law
2018-09-21 17:40               ` Martin Sebor
2018-10-01 21:31                 ` [PING] " Martin Sebor
2018-10-08 22:15                   ` Martin Sebor
2018-10-04 15:52             ` Jeff Law
2018-08-28 20:44         ` Martin Sebor
2018-08-28 22:17           ` Jeff Law
2018-08-27 20:31   ` Martin Sebor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFiYyc3WEae8XzedLNd6kVWjSnRkWnuDVoHzrXy5b=1+cVZnDg@mail.gmail.com' \
    --to=richard.guenther@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=law@redhat.com \
    --cc=msebor@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).