public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Martin Sebor <msebor@gmail.com>
To: Jeff Law <law@redhat.com>, Richard Biener <richard.guenther@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)
Date: Tue, 28 Aug 2018 20:44:00 -0000	[thread overview]
Message-ID: <2fb5625b-22d2-e80a-9320-d931a3cb2343@gmail.com> (raw)
In-Reply-To: <fd26f079-1411-4f35-d0d2-cf4aabac2d7c@redhat.com>

On 08/27/2018 10:27 PM, Jeff Law wrote:
> On 08/27/2018 10:27 AM, Martin Sebor wrote:
>> On 08/27/2018 02:29 AM, Richard Biener wrote:
>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>>>
>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>>> The warning suppression for -Wstringop-truncation looks for
>>>>> the next statement after a truncating strncpy to see if it
>>>>> adds a terminating nul.  This only works when the next
>>>>> statement can be reached using the Gimple statement iterator
>>>>> which isn't until after gimplification.  As a result, strncpy
>>>>> calls that truncate their constant argument that are being
>>>>> folded to memcpy this early get diagnosed even if they are
>>>>> followed by the nul assignment:
>>>>>
>>>>>   const char s[] = "12345";
>>>>>   char d[3];
>>>>>
>>>>>   void f (void)
>>>>>   {
>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>>     d[sizeof d - 1] = 0;
>>>>>   }
>>>>>
>>>>> To avoid the warning I propose to defer folding strncpy to
>>>>> memcpy until the pointer to the basic block the strnpy call
>>>>> is in can be used to try to reach the next statement (this
>>>>> happens as early as ccp1).  I'm aware of the preference to
>>>>> fold things early but in the case of strncpy (a relatively
>>>>> rarely used function that is often misused), getting
>>>>> the warning right while folding a bit later but still fairly
>>>>> early on seems like a reasonable compromise.  I fear that
>>>>> otherwise, the false positives will drive users to adopt
>>>>> other unsafe solutions (like memcpy) where these kinds of
>>>>> bugs cannot be as readily detected.
>>>>>
>>>>> Tested on x86_64-linux.
>>>>>
>>>>> Martin
>>>>>
>>>>> PS There still are outstanding cases where the warning can
>>>>> be avoided.  I xfailed them in the test for now but will
>>>>> still try to get them to work for GCC 9.
>>>>>
>>>>> gcc-87028.diff
>>>>>
>>>>>
>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>>> strncpy with global variable source string
>>>>> gcc/ChangeLog:
>>>>>
>>>>>       PR tree-optimization/87028
>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>>>       statement doesn't belong to a basic block.
>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>>>       the left hand side of assignment.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>       PR tree-optimization/87028
>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>>
>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>>> index 07341eb..284c2fb 100644
>>>>> --- a/gcc/gimple-fold.c
>>>>> +++ b/gcc/gimple-fold.c
>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
>>>>> (gimple_stmt_iterator *gsi,
>>>>>    if (tree_int_cst_lt (ssize, len))
>>>>>      return false;
>>>>>
>>>>> +  /* Defer warning (and folding) until the next statement in the basic
>>>>> +     block is reachable.  */
>>>>> +  if (!gimple_bb (stmt))
>>>>> +    return false;
>>>> I think you want cfun->cfg as the test here.  They should be equivalent
>>>> in practice.
>>>
>>> Please do not add 'cfun' references.  Note that the next stmt is also
>>> accessible
>>> when there is no CFG.  I guess the issue is that we fold this during
>>> gimplification
>>> where the next stmt is not yet "there" (but still in GENERIC)?
>>>
>>> We generally do not want to have unfolded stmts in the IL when we can
>>> avoid that
>>> which is why we fold most stmts during gimplification.  We also do
>>> that because
>>> we now do less folding on GENERIC.
>>>
>>> There may be the possibility to refactor gimplification time folding
>>> to what we
>>> do during inlining - queue stmts we want to fold and perform all
>>> folding delayed.
>>> This of course means bigger compile-time due to cache effects.
>>>
>>>>
>>>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>>>> index d0792aa..f1988f6 100644
>>>>> --- a/gcc/tree-ssa-strlen.c
>>>>> +++ b/gcc/tree-ssa-strlen.c
>>>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc
>>>>> (gimple_stmt_iterator gsi, tree src, tree cnt)
>>>>>         && known_eq (dstoff, lhsoff)
>>>>>         && operand_equal_p (dstbase, lhsbase, 0))
>>>>>       return false;
>>>>> +
>>>>> +      if (code == MEM_REF
>>>>> +       && TREE_CODE (lhsbase) == SSA_NAME
>>>>> +       && known_eq (dstoff, lhsoff))
>>>>> +     {
>>>>> +       /* Extract the referenced variable from something like
>>>>> +            MEM[(char *)d_3(D) + 3B] = 0;  */
>>>>> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>>>>> +       if (gimple_nop_p (def))
>>>>> +         {
>>>>> +           lhsbase = SSA_NAME_VAR (lhsbase);
>>>>> +           if (lhsbase
>>>>> +               && dstbase
>>>>> +               && operand_equal_p (dstbase, lhsbase, 0))
>>>>> +             return false;
>>>>> +         }
>>>>> +     }
>>>> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
>>>> the wrong tree.  It'd be easier to suggest something here if I could see
>>>> the gimple (with virtual operands).  BUt at some level what you really
>>>> want to do is make sure the base of the MEM_REF is the same as what got
>>>> passed as the destination of the strncpy.  You'd want to be testing
>>>> SSA_NAMEs in that case.
>>>
>>> Yes.  Why not simply compare the SSA names?  Why would it be
>>> not OK to do that when !lhsbase?
>>
>> The added code handles this case:
>>
>>   void f (char *d)
>>   {
>>     __builtin_strncpy (d, "12345", 4);
>>     d[3] = 0;
>>   }
>>
>> where during forwprop we see:
>>
>>   __builtin_strncpy (d_3(D), "12345", 4);
>>   MEM[(char *)d_3(D) + 3B] = 0;
>>
>> The next statement after the strncpy is the assignment whose
>> lhs is the MEM_REF with a GIMPLE_NOP as an operand.  There
>> is no other information in the GIMPLE_NOP that I can see to
>> tell that the operand is d_3(D) or that it's the same as
>> the strncpy argument (i.e., the PARAM_DECl d).  Having to
>> do open-code this all the time seems so cumbersome -- is
>> there some API that would do this for me?  (I thought
>> get_addr_base_and_unit_offset was that API but clearly in
>> this case it doesn't do what I expect -- it just returns
>> the argument.)
>
> I think you need to look harder at that MEM_REF.  It references d_3.
> That's what you need to be checking.  The base (d_3) is the first
> operand of the MEM_REF, the offset is the second operand of the MEM_REF.
>
> (gdb) p debug_gimple_stmt ($2)
> # .MEM_5 = VDEF <.MEM_4>
> MEM[(char *)d_3(D) + 3B] = 0;
>
>
> (gdb) p gimple_assign_lhs ($2)
> $5 = (tree_node *) 0x7ffff01a6208
>
> (gdb) p debug_tree ($5)
>  <mem_ref 0x7ffff01a6208
>     type <integer_type 0x7ffff00723f0 char public string-flag QI
>         size <integer_cst 0x7ffff0059d80 constant 8>
>         unit-size <integer_cst 0x7ffff0059d98 constant 1>
>         align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max
> <integer_cst 0x7ffff0059df8 127>
>         pointer_to_this <pointer_type 0x7ffff007de70>>
>
>     arg:0 <ssa_name 0x7ffff0063dc8
>         type <pointer_type 0x7ffff007de70 type <integer_type
> 0x7ffff00723f0 char>
>             public unsigned DI
>             size <integer_cst 0x7ffff0059c90 constant 64>
>             unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>             align:64 warn_if_not_align:0 symtab:0 alias-set -1
> canonical-type 0x7ffff007de70 reference_to_this <reference_type
> 0x7ffff017d738>>
>         visited var <parm_decl 0x7ffff01a5000 d>
>         def_stmt GIMPLE_NOP
>         version:3>
>     arg:1 <integer_cst 0x7ffff018ae40 type <pointer_type 0x7ffff007de70>
> constant 3>
>     j.c:4:6 start: j.c:4:5 finish: j.c:4:8>
>
>
> Note arg:0 is the SSA_NAME d_3.  And not surprising that's lhsbase:

The d in the MEM_REF you see in the dump above is the SSA_NAME's
SSA_NAME_VAR:

           visited var <parm_decl 0x7ffff01a5000 d>

Here's the print_node() code that prints it:

	  print_node_brief (file, "var", SSA_NAME_VAR (node), indent + 4);

There is nothing else in the MEM_REF operand that tells me that.
Why is it wrong to look at the SSA_NAME_VAR?

> (gdb) p debug_tree (lhsbase)
> <ssa_name 0x7ffff0063dc8
>     type <pointer_type 0x7ffff007de70
>         type <integer_type 0x7ffff00723f0 char public string-flag QI
>             size <integer_cst 0x7ffff0059d80 constant 8>
>             unit-size <integer_cst 0x7ffff0059d98 constant 1>
>             align:8 warn_if_not_align:0 symtab:0 alias-set -1
> canonical-type 0x7ffff00723f0 precision:8 min <integer_cst
> 0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
>             pointer_to_this <pointer_type 0x7ffff007de70>>
>         public unsigned DI
>         size <integer_cst 0x7ffff0059c90 constant 64>
>         unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>         align:64 warn_if_not_align:0 symtab:0 alias-set -1
> canonical-type 0x7ffff007de70 reference_to_this <reference_type
> 0x7ffff017d738>>
>     visited var <parm_decl 0x7ffff01a5000 d>
>     def_stmt GIMPLE_NOP
>     version:3>
> Sadly, dstbase is the PARM_DECL for d.  That's where things are going
> "wrong".

As Richard observed, that's because get_attr_nonstring_decl()
returns the DECL that the expression refers to.  It does that
because that's where it looks for attribute nonstring, and so
that the warning can mention the DECL with the attribute.

I suppose since I'm not supposed to be using SSA_NAME_VAR
(I still don't understand why it's taboo) I'll have to avoid
using the get_attr_nonstring_decl() return value and instead
look into comparing the SSA_NAMEs.

Martin

> Not sure why you're getting the PARM_DECL in that case.  I'd
> debug get_addr_base_and_unit_offset to understand what's going on.
> Essentially you're getting different results of
> get_addr_base_and_unit_offset in a case where they arguably should be
> the same.
>
> Jeff
>
> Jeff
>

  parent reply	other threads:[~2018-08-28 20:44 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-24 15:58 Martin Sebor
2018-08-26  5:25 ` Jeff Law
2018-08-27  8:30   ` Richard Biener
2018-08-27 15:32     ` Jeff Law
2018-08-27 15:43       ` Richard Biener
2018-10-04 15:51         ` Jeff Law
2018-10-04 15:55           ` Martin Sebor
2018-10-08 10:14             ` Richard Biener
2018-10-08 21:40               ` Martin Sebor
2018-10-16 22:42             ` Jeff Law
2018-10-21  8:17               ` Martin Sebor
2018-10-31 17:07                 ` [PING #3][PATCH] " Martin Sebor
2018-11-16  3:12                   ` [PING #4][PATCH] " Martin Sebor
2018-11-16  9:07                     ` Richard Biener
2018-11-29 20:34                       ` Martin Sebor
2018-11-29 23:07                         ` Jeff Law
2018-11-29 23:43                           ` Martin Sebor
2018-11-30  2:02                             ` Jeff Law
2018-11-30  8:05                               ` Richard Biener
2018-11-30  8:30                                 ` Jakub Jelinek
2018-12-05 23:11                             ` Jeff Law
2018-12-06 13:00                               ` Christophe Lyon
2018-12-06 13:52                                 ` Jeff Law
2018-11-30  7:57                         ` Richard Biener
2018-11-30 15:51                           ` Martin Sebor
2018-11-07 21:28                 ` [PATCH] " Jeff Law
2018-11-09  1:25                   ` Martin Sebor
2018-10-04 19:55           ` Joseph Myers
2018-08-27 16:27     ` Martin Sebor
2018-08-28  4:27       ` Jeff Law
2018-08-28  9:56         ` Richard Biener
2018-08-28  9:57           ` Richard Biener
2018-08-29  0:12           ` Martin Sebor
2018-08-29  7:29             ` Richard Biener
2018-08-29 15:43               ` Martin Sebor
2018-08-30  0:27             ` Jeff Law
2018-08-30  8:48               ` Richard Biener
2018-09-12 15:50             ` Martin Sebor
2018-09-18  1:56             ` Jeff Law
2018-09-21 17:40               ` Martin Sebor
2018-10-01 21:31                 ` [PING] " Martin Sebor
2018-10-08 22:15                   ` Martin Sebor
2018-10-04 15:52             ` Jeff Law
2018-08-28 20:44         ` Martin Sebor [this message]
2018-08-28 22:17           ` Jeff Law
2018-08-27 20:31   ` Martin Sebor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2fb5625b-22d2-e80a-9320-d931a3cb2343@gmail.com \
    --to=msebor@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=law@redhat.com \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).