public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jeff Law <law@redhat.com>
To: Martin Sebor <msebor@gmail.com>,
	Richard Biener <richard.guenther@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)
Date: Tue, 28 Aug 2018 04:27:00 -0000	[thread overview]
Message-ID: <fd26f079-1411-4f35-d0d2-cf4aabac2d7c@redhat.com> (raw)
In-Reply-To: <4e613d61-40ad-f05f-9107-5fd7a5c2fdb6@gmail.com>

On 08/27/2018 10:27 AM, Martin Sebor wrote:
> On 08/27/2018 02:29 AM, Richard Biener wrote:
>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>>
>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>> The warning suppression for -Wstringop-truncation looks for
>>>> the next statement after a truncating strncpy to see if it
>>>> adds a terminating nul.  This only works when the next
>>>> statement can be reached using the Gimple statement iterator
>>>> which isn't until after gimplification.  As a result, strncpy
>>>> calls that truncate their constant argument that are being
>>>> folded to memcpy this early get diagnosed even if they are
>>>> followed by the nul assignment:
>>>>
>>>>   const char s[] = "12345";
>>>>   char d[3];
>>>>
>>>>   void f (void)
>>>>   {
>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>     d[sizeof d - 1] = 0;
>>>>   }
>>>>
>>>> To avoid the warning I propose to defer folding strncpy to
>>>> memcpy until the pointer to the basic block the strnpy call
>>>> is in can be used to try to reach the next statement (this
>>>> happens as early as ccp1).  I'm aware of the preference to
>>>> fold things early but in the case of strncpy (a relatively
>>>> rarely used function that is often misused), getting
>>>> the warning right while folding a bit later but still fairly
>>>> early on seems like a reasonable compromise.  I fear that
>>>> otherwise, the false positives will drive users to adopt
>>>> other unsafe solutions (like memcpy) where these kinds of
>>>> bugs cannot be as readily detected.
>>>>
>>>> Tested on x86_64-linux.
>>>>
>>>> Martin
>>>>
>>>> PS There still are outstanding cases where the warning can
>>>> be avoided.  I xfailed them in the test for now but will
>>>> still try to get them to work for GCC 9.
>>>>
>>>> gcc-87028.diff
>>>>
>>>>
>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>> strncpy with global variable source string
>>>> gcc/ChangeLog:
>>>>
>>>>       PR tree-optimization/87028
>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>>       statement doesn't belong to a basic block.
>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>>       the left hand side of assignment.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>       PR tree-optimization/87028
>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>
>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>> index 07341eb..284c2fb 100644
>>>> --- a/gcc/gimple-fold.c
>>>> +++ b/gcc/gimple-fold.c
>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
>>>> (gimple_stmt_iterator *gsi,
>>>>    if (tree_int_cst_lt (ssize, len))
>>>>      return false;
>>>>
>>>> +  /* Defer warning (and folding) until the next statement in the basic
>>>> +     block is reachable.  */
>>>> +  if (!gimple_bb (stmt))
>>>> +    return false;
>>> I think you want cfun->cfg as the test here.  They should be equivalent
>>> in practice.
>>
>> Please do not add 'cfun' references.  Note that the next stmt is also
>> accessible
>> when there is no CFG.  I guess the issue is that we fold this during
>> gimplification
>> where the next stmt is not yet "there" (but still in GENERIC)?
>>
>> We generally do not want to have unfolded stmts in the IL when we can
>> avoid that
>> which is why we fold most stmts during gimplification.  We also do
>> that because
>> we now do less folding on GENERIC.
>>
>> There may be the possibility to refactor gimplification time folding
>> to what we
>> do during inlining - queue stmts we want to fold and perform all
>> folding delayed.
>> This of course means bigger compile-time due to cache effects.
>>
>>>
>>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>>> index d0792aa..f1988f6 100644
>>>> --- a/gcc/tree-ssa-strlen.c
>>>> +++ b/gcc/tree-ssa-strlen.c
>>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc
>>>> (gimple_stmt_iterator gsi, tree src, tree cnt)
>>>>         && known_eq (dstoff, lhsoff)
>>>>         && operand_equal_p (dstbase, lhsbase, 0))
>>>>       return false;
>>>> +
>>>> +      if (code == MEM_REF
>>>> +       && TREE_CODE (lhsbase) == SSA_NAME
>>>> +       && known_eq (dstoff, lhsoff))
>>>> +     {
>>>> +       /* Extract the referenced variable from something like
>>>> +            MEM[(char *)d_3(D) + 3B] = 0;  */
>>>> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>>>> +       if (gimple_nop_p (def))
>>>> +         {
>>>> +           lhsbase = SSA_NAME_VAR (lhsbase);
>>>> +           if (lhsbase
>>>> +               && dstbase
>>>> +               && operand_equal_p (dstbase, lhsbase, 0))
>>>> +             return false;
>>>> +         }
>>>> +     }
>>> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
>>> the wrong tree.  It'd be easier to suggest something here if I could see
>>> the gimple (with virtual operands).  BUt at some level what you really
>>> want to do is make sure the base of the MEM_REF is the same as what got
>>> passed as the destination of the strncpy.  You'd want to be testing
>>> SSA_NAMEs in that case.
>>
>> Yes.  Why not simply compare the SSA names?  Why would it be
>> not OK to do that when !lhsbase?
> 
> The added code handles this case:
> 
>   void f (char *d)
>   {
>     __builtin_strncpy (d, "12345", 4);
>     d[3] = 0;
>   }
> 
> where during forwprop we see:
> 
>   __builtin_strncpy (d_3(D), "12345", 4);
>   MEM[(char *)d_3(D) + 3B] = 0;
> 
> The next statement after the strncpy is the assignment whose
> lhs is the MEM_REF with a GIMPLE_NOP as an operand.  There
> is no other information in the GIMPLE_NOP that I can see to
> tell that the operand is d_3(D) or that it's the same as
> the strncpy argument (i.e., the PARAM_DECl d).  Having to
> do open-code this all the time seems so cumbersome -- is
> there some API that would do this for me?  (I thought
> get_addr_base_and_unit_offset was that API but clearly in
> this case it doesn't do what I expect -- it just returns
> the argument.)

I think you need to look harder at that MEM_REF.  It references d_3.
That's what you need to be checking.  The base (d_3) is the first
operand of the MEM_REF, the offset is the second operand of the MEM_REF.

(gdb) p debug_gimple_stmt ($2)
# .MEM_5 = VDEF <.MEM_4>
MEM[(char *)d_3(D) + 3B] = 0;


(gdb) p gimple_assign_lhs ($2)
$5 = (tree_node *) 0x7ffff01a6208

(gdb) p debug_tree ($5)
 <mem_ref 0x7ffff01a6208
    type <integer_type 0x7ffff00723f0 char public string-flag QI
        size <integer_cst 0x7ffff0059d80 constant 8>
        unit-size <integer_cst 0x7ffff0059d98 constant 1>
        align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max
<integer_cst 0x7ffff0059df8 127>
        pointer_to_this <pointer_type 0x7ffff007de70>>

    arg:0 <ssa_name 0x7ffff0063dc8
        type <pointer_type 0x7ffff007de70 type <integer_type
0x7ffff00723f0 char>
            public unsigned DI
            size <integer_cst 0x7ffff0059c90 constant 64>
            unit-size <integer_cst 0x7ffff0059ca8 constant 8>
            align:64 warn_if_not_align:0 symtab:0 alias-set -1
canonical-type 0x7ffff007de70 reference_to_this <reference_type
0x7ffff017d738>>
        visited var <parm_decl 0x7ffff01a5000 d>
        def_stmt GIMPLE_NOP
        version:3>
    arg:1 <integer_cst 0x7ffff018ae40 type <pointer_type 0x7ffff007de70>
constant 3>
    j.c:4:6 start: j.c:4:5 finish: j.c:4:8>


Note arg:0 is the SSA_NAME d_3.  And not surprising that's lhsbase:

(gdb) p debug_tree (lhsbase)
<ssa_name 0x7ffff0063dc8
    type <pointer_type 0x7ffff007de70
        type <integer_type 0x7ffff00723f0 char public string-flag QI
            size <integer_cst 0x7ffff0059d80 constant 8>
            unit-size <integer_cst 0x7ffff0059d98 constant 1>
            align:8 warn_if_not_align:0 symtab:0 alias-set -1
canonical-type 0x7ffff00723f0 precision:8 min <integer_cst
0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
            pointer_to_this <pointer_type 0x7ffff007de70>>
        public unsigned DI
        size <integer_cst 0x7ffff0059c90 constant 64>
        unit-size <integer_cst 0x7ffff0059ca8 constant 8>
        align:64 warn_if_not_align:0 symtab:0 alias-set -1
canonical-type 0x7ffff007de70 reference_to_this <reference_type
0x7ffff017d738>>
    visited var <parm_decl 0x7ffff01a5000 d>
    def_stmt GIMPLE_NOP
    version:3>


Sadly, dstbase is the PARM_DECL for d.  That's where things are going
"wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
debug get_addr_base_and_unit_offset to understand what's going on.
Essentially you're getting different results of
get_addr_base_and_unit_offset in a case where they arguably should be
the same.

Jeff

Jeff

  reply	other threads:[~2018-08-28  4:27 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-24 15:58 Martin Sebor
2018-08-26  5:25 ` Jeff Law
2018-08-27  8:30   ` Richard Biener
2018-08-27 15:32     ` Jeff Law
2018-08-27 15:43       ` Richard Biener
2018-10-04 15:51         ` Jeff Law
2018-10-04 15:55           ` Martin Sebor
2018-10-08 10:14             ` Richard Biener
2018-10-08 21:40               ` Martin Sebor
2018-10-16 22:42             ` Jeff Law
2018-10-21  8:17               ` Martin Sebor
2018-10-31 17:07                 ` [PING #3][PATCH] " Martin Sebor
2018-11-16  3:12                   ` [PING #4][PATCH] " Martin Sebor
2018-11-16  9:07                     ` Richard Biener
2018-11-29 20:34                       ` Martin Sebor
2018-11-29 23:07                         ` Jeff Law
2018-11-29 23:43                           ` Martin Sebor
2018-11-30  2:02                             ` Jeff Law
2018-11-30  8:05                               ` Richard Biener
2018-11-30  8:30                                 ` Jakub Jelinek
2018-12-05 23:11                             ` Jeff Law
2018-12-06 13:00                               ` Christophe Lyon
2018-12-06 13:52                                 ` Jeff Law
2018-11-30  7:57                         ` Richard Biener
2018-11-30 15:51                           ` Martin Sebor
2018-11-07 21:28                 ` [PATCH] " Jeff Law
2018-11-09  1:25                   ` Martin Sebor
2018-10-04 19:55           ` Joseph Myers
2018-08-27 16:27     ` Martin Sebor
2018-08-28  4:27       ` Jeff Law [this message]
2018-08-28  9:56         ` Richard Biener
2018-08-28  9:57           ` Richard Biener
2018-08-29  0:12           ` Martin Sebor
2018-08-29  7:29             ` Richard Biener
2018-08-29 15:43               ` Martin Sebor
2018-08-30  0:27             ` Jeff Law
2018-08-30  8:48               ` Richard Biener
2018-09-12 15:50             ` Martin Sebor
2018-09-18  1:56             ` Jeff Law
2018-09-21 17:40               ` Martin Sebor
2018-10-01 21:31                 ` [PING] " Martin Sebor
2018-10-08 22:15                   ` Martin Sebor
2018-10-04 15:52             ` Jeff Law
2018-08-28 20:44         ` Martin Sebor
2018-08-28 22:17           ` Jeff Law
2018-08-27 20:31   ` Martin Sebor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fd26f079-1411-4f35-d0d2-cf4aabac2d7c@redhat.com \
    --to=law@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=msebor@gmail.com \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).