public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Martin Sebor <msebor@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Jeff Law <law@redhat.com>, GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)
Date: Mon, 08 Oct 2018 21:40:00 -0000	[thread overview]
Message-ID: <a188650e-eda3-28f4-c00f-30f7db26a487@gmail.com> (raw)
In-Reply-To: <CAFiYyc0NZ4tLbEmLgczef9zzMiwmMbENXNdiHPOHt6AB5OL83w@mail.gmail.com>

On 10/08/2018 04:05 AM, Richard Biener wrote:
> On Thu, Oct 4, 2018 at 5:51 PM Martin Sebor <msebor@gmail.com> wrote:
>>
>> On 10/04/2018 08:58 AM, Jeff Law wrote:
>>> On 8/27/18 9:42 AM, Richard Biener wrote:
>>>> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <law@redhat.com> wrote:
>>>>>
>>>>> On 08/27/2018 02:29 AM, Richard Biener wrote:
>>>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>>>>>>
>>>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>>>>>> The warning suppression for -Wstringop-truncation looks for
>>>>>>>> the next statement after a truncating strncpy to see if it
>>>>>>>> adds a terminating nul.  This only works when the next
>>>>>>>> statement can be reached using the Gimple statement iterator
>>>>>>>> which isn't until after gimplification.  As a result, strncpy
>>>>>>>> calls that truncate their constant argument that are being
>>>>>>>> folded to memcpy this early get diagnosed even if they are
>>>>>>>> followed by the nul assignment:
>>>>>>>>
>>>>>>>>   const char s[] = "12345";
>>>>>>>>   char d[3];
>>>>>>>>
>>>>>>>>   void f (void)
>>>>>>>>   {
>>>>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>>>>>     d[sizeof d - 1] = 0;
>>>>>>>>   }
>>>>>>>>
>>>>>>>> To avoid the warning I propose to defer folding strncpy to
>>>>>>>> memcpy until the pointer to the basic block the strnpy call
>>>>>>>> is in can be used to try to reach the next statement (this
>>>>>>>> happens as early as ccp1).  I'm aware of the preference to
>>>>>>>> fold things early but in the case of strncpy (a relatively
>>>>>>>> rarely used function that is often misused), getting
>>>>>>>> the warning right while folding a bit later but still fairly
>>>>>>>> early on seems like a reasonable compromise.  I fear that
>>>>>>>> otherwise, the false positives will drive users to adopt
>>>>>>>> other unsafe solutions (like memcpy) where these kinds of
>>>>>>>> bugs cannot be as readily detected.
>>>>>>>>
>>>>>>>> Tested on x86_64-linux.
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>> PS There still are outstanding cases where the warning can
>>>>>>>> be avoided.  I xfailed them in the test for now but will
>>>>>>>> still try to get them to work for GCC 9.
>>>>>>>>
>>>>>>>> gcc-87028.diff
>>>>>>>>
>>>>>>>>
>>>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
>>>>>>>> gcc/ChangeLog:
>>>>>>>>
>>>>>>>>       PR tree-optimization/87028
>>>>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>>>>>>       statement doesn't belong to a basic block.
>>>>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>>>>>>       the left hand side of assignment.
>>>>>>>>
>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>
>>>>>>>>       PR tree-optimization/87028
>>>>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>>>>>
>>>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>>>>>> index 07341eb..284c2fb 100644
>>>>>>>> --- a/gcc/gimple-fold.c
>>>>>>>> +++ b/gcc/gimple-fold.c
>>>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
>>>>>>>>    if (tree_int_cst_lt (ssize, len))
>>>>>>>>      return false;
>>>>>>>>
>>>>>>>> +  /* Defer warning (and folding) until the next statement in the basic
>>>>>>>> +     block is reachable.  */
>>>>>>>> +  if (!gimple_bb (stmt))
>>>>>>>> +    return false;
>>>>>>> I think you want cfun->cfg as the test here.  They should be equivalent
>>>>>>> in practice.
>>>>>>
>>>>>> Please do not add 'cfun' references.  Note that the next stmt is also accessible
>>>>>> when there is no CFG.  I guess the issue is that we fold this during
>>>>>> gimplification where the next stmt is not yet "there" (but still in GENERIC)?
>>>>> That was my assumption.  I almost suggested peeking at gsi_next and
>>>>> avoiding in that case.
>>>>
>>>> So I'd rather add guards to maybe_fold_stmt in the gimplifier then.
>>> So I think the concern with adding the guards to maybe_fold_stmt is the
>>> possibility of further fallout.
>>>
>>> I guess they could be written to target this case specifically to
>>> minimize fallout, but that feels like we're doing the same thing
>>> (band-aid) just in a different place.
>>>
>>>
>>>
>>>>
>>>>>>
>>>>>> We generally do not want to have unfolded stmts in the IL when we can avoid that
>>>>>> which is why we fold most stmts during gimplification.  We also do that because
>>>>>> we now do less folding on GENERIC.
>>>>> But an unfolded call in the IL should always be safe and we've got
>>>>> plenty of opportunities to fold it later.
>>>>
>>>> Well - we do.  The very first one is forwprop though which means we'll miss to
>>>> re-write some memcpy parts into SSA:
>>>>
>>>>           NEXT_PASS (pass_ccp, false /* nonzero_p */);
>>>>           /* After CCP we rewrite no longer addressed locals into SSA
>>>>              form if possible.  */
>>>>           NEXT_PASS (pass_forwprop);
>>>>
>>>> likewise early object-size will be confused by memcpy calls that just exist
>>>> to avoid TBAA issues (another of our recommendations besides using unions).
>>>>
>>>> We do fold mem* early for a reason ;)
>>>>
>>>> "We can always do warnings earlier" would be a similar true sentence.
>>> I'm not disagreeing at all.  There's a natural tension between the
>>> benefits of folding early to enable more optimizations downstream and
>>> leaving the IL in a state where we can give actionable warnings.
>>
>> Similar trade-offs between folding early and losing information
>> as a result also impact high-level optimizations.
>>
>> For instance, folding the strlen argument below
>>
>>    void f3 (struct A* p)
>>    {
>>      __builtin_strcpy (p->a, "123");
>>
>>      if (__builtin_strlen (p->a + 1) != 2)   // not folded
>>        __builtin_abort ();
>>    }
>>
>> into
>>
>>    _2 = &MEM[(void *)p_4(D) + 2B];
>>
>> early on defeats the strlen optimization because there is no
>> mechanism to determine what member (void *)p_4(D) + 2B refers
>> to (this is bug 86955).
>>
>> Another example is folding of strlen calls with no-nconstant
>> offsets into constant strings like here:
>>
>>    const char a[] = "123";
>>
>>    void f (int i)
>>    {
>>      if (__builtin_strlen (&a[i]) > 3)
>>        __builtin_abort ();
>>    }
>>
>> into sizeof a - 1 - i, which then prevents the result from
>> being folded to false  (bug 86434), not to mention the code
>> it emits for out-of-bounds indices.
>>
>> There are a number of other similar examples in Bugzilla
>> that I've filed as I discovered then during testing my
>> warnings (e.g., 86572).
>>
>> In my mind, transforming library calls into "lossy" low-level
>> primitives like MEM_REF would be better done only after higher
>> level optimizations have had a chance to analyze them.
>
> The issue is mostly inlining heuristics.  Not doing the transformation
> might end up not inlining the function which in turn might defeat
> having more context for your warning analysis...
>
> So it's a chicken-and-egg issue for diagnostics (you run them
> later because you do want inlining and optimization).
>
> And it's an important missed optimization for removing abstraction.
> IPA inlining runs very early.
>
> So IMHO the only sensible option is to do your warning analysis
> in an early IPA phase where you can also freely clone contexts
> (do "virtual" inlining) based on heuristics driven by diagnostic
> needs rather than relying on optimization heuristics to match
> yours.  The IPA phase would necessarily be the
> "all_small_ipa_passes" one, and placement needs to be before
> pass_local_optimization_passes.

That might work for some of the strncpy truncation warnings,
but it won't help with this problem (87028) because the folding
happens during gimplification.  Holding off on the folding until
the CFG has been constructed would help, and it shouldn't have
a noticeable impact -- the calls will still be folded.

Detecting some of the most serious bugs like buffer overflow
in functions like strcpy or sprintf in all but the most trivial
cases depends on the strlen, sprintf, and object size passes.
All those run much later than pass_local_optimization_passes.

Martin

  reply	other threads:[~2018-10-08 21:30 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-24 15:58 Martin Sebor
2018-08-26  5:25 ` Jeff Law
2018-08-27  8:30   ` Richard Biener
2018-08-27 15:32     ` Jeff Law
2018-08-27 15:43       ` Richard Biener
2018-10-04 15:51         ` Jeff Law
2018-10-04 15:55           ` Martin Sebor
2018-10-08 10:14             ` Richard Biener
2018-10-08 21:40               ` Martin Sebor [this message]
2018-10-16 22:42             ` Jeff Law
2018-10-21  8:17               ` Martin Sebor
2018-10-31 17:07                 ` [PING #3][PATCH] " Martin Sebor
2018-11-16  3:12                   ` [PING #4][PATCH] " Martin Sebor
2018-11-16  9:07                     ` Richard Biener
2018-11-29 20:34                       ` Martin Sebor
2018-11-29 23:07                         ` Jeff Law
2018-11-29 23:43                           ` Martin Sebor
2018-11-30  2:02                             ` Jeff Law
2018-11-30  8:05                               ` Richard Biener
2018-11-30  8:30                                 ` Jakub Jelinek
2018-12-05 23:11                             ` Jeff Law
2018-12-06 13:00                               ` Christophe Lyon
2018-12-06 13:52                                 ` Jeff Law
2018-11-30  7:57                         ` Richard Biener
2018-11-30 15:51                           ` Martin Sebor
2018-11-07 21:28                 ` [PATCH] " Jeff Law
2018-11-09  1:25                   ` Martin Sebor
2018-10-04 19:55           ` Joseph Myers
2018-08-27 16:27     ` Martin Sebor
2018-08-28  4:27       ` Jeff Law
2018-08-28  9:56         ` Richard Biener
2018-08-28  9:57           ` Richard Biener
2018-08-29  0:12           ` Martin Sebor
2018-08-29  7:29             ` Richard Biener
2018-08-29 15:43               ` Martin Sebor
2018-08-30  0:27             ` Jeff Law
2018-08-30  8:48               ` Richard Biener
2018-09-12 15:50             ` Martin Sebor
2018-09-18  1:56             ` Jeff Law
2018-09-21 17:40               ` Martin Sebor
2018-10-01 21:31                 ` [PING] " Martin Sebor
2018-10-08 22:15                   ` Martin Sebor
2018-10-04 15:52             ` Jeff Law
2018-08-28 20:44         ` Martin Sebor
2018-08-28 22:17           ` Jeff Law
2018-08-27 20:31   ` Martin Sebor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a188650e-eda3-28f4-c00f-30f7db26a487@gmail.com \
    --to=msebor@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=law@redhat.com \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).