From: Jeff Law <law@redhat.com>
To: Martin Sebor <msebor@gmail.com>,
Richard Biener <richard.guenther@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)
Date: Tue, 28 Aug 2018 22:17:00 -0000 [thread overview]
Message-ID: <8f35e59d-c94f-b0c1-f9cd-519dcfe723d1@redhat.com> (raw)
In-Reply-To: <2fb5625b-22d2-e80a-9320-d931a3cb2343@gmail.com>
On 08/28/2018 02:43 PM, Martin Sebor wrote:
> On 08/27/2018 10:27 PM, Jeff Law wrote:
>> On 08/27/2018 10:27 AM, Martin Sebor wrote:
>>> On 08/27/2018 02:29 AM, Richard Biener wrote:
>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>>>>
>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>>>> The warning suppression for -Wstringop-truncation looks for
>>>>>> the next statement after a truncating strncpy to see if it
>>>>>> adds a terminating nul. This only works when the next
>>>>>> statement can be reached using the Gimple statement iterator
>>>>>> which isn't until after gimplification. As a result, strncpy
>>>>>> calls that truncate their constant argument that are being
>>>>>> folded to memcpy this early get diagnosed even if they are
>>>>>> followed by the nul assignment:
>>>>>>
>>>>>> Â const char s[] = "12345";
>>>>>> Â char d[3];
>>>>>>
>>>>>> Â void f (void)
>>>>>> Â {
>>>>>> Â Â Â strncpy (d, s, sizeof d - 1);Â Â // -Wstringop-truncation
>>>>>> Â Â Â d[sizeof d - 1] = 0;
>>>>>> Â }
>>>>>>
>>>>>> To avoid the warning I propose to defer folding strncpy to
>>>>>> memcpy until the pointer to the basic block the strnpy call
>>>>>> is in can be used to try to reach the next statement (this
>>>>>> happens as early as ccp1). I'm aware of the preference to
>>>>>> fold things early but in the case of strncpy (a relatively
>>>>>> rarely used function that is often misused), getting
>>>>>> the warning right while folding a bit later but still fairly
>>>>>> early on seems like a reasonable compromise. I fear that
>>>>>> otherwise, the false positives will drive users to adopt
>>>>>> other unsafe solutions (like memcpy) where these kinds of
>>>>>> bugs cannot be as readily detected.
>>>>>>
>>>>>> Tested on x86_64-linux.
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> PS There still are outstanding cases where the warning can
>>>>>> be avoided. I xfailed them in the test for now but will
>>>>>> still try to get them to work for GCC 9.
>>>>>>
>>>>>> gcc-87028.diff
>>>>>>
>>>>>>
>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>>>> strncpy with global variable source string
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>> Â Â Â Â Â PR tree-optimization/87028
>>>>>> Â Â Â Â Â * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding
>>>>>> when
>>>>>> Â Â Â Â Â statement doesn't belong to a basic block.
>>>>>> Â Â Â Â Â * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle
>>>>>> MEM_REF on
>>>>>> Â Â Â Â Â the left hand side of assignment.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>>
>>>>>> Â Â Â Â Â PR tree-optimization/87028
>>>>>> Â Â Â Â Â * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>>> Â Â Â Â Â * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>>>
>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>>>> index 07341eb..284c2fb 100644
>>>>>> --- a/gcc/gimple-fold.c
>>>>>> +++ b/gcc/gimple-fold.c
>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
>>>>>> (gimple_stmt_iterator *gsi,
>>>>>> Â Â if (tree_int_cst_lt (ssize, len))
>>>>>> Â Â Â Â return false;
>>>>>>
>>>>>> +Â /* Defer warning (and folding) until the next statement in the
>>>>>> basic
>>>>>> +    block is reachable. */
>>>>>> +Â if (!gimple_bb (stmt))
>>>>>> +Â Â Â return false;
>>>>> I think you want cfun->cfg as the test here. They should be
>>>>> equivalent
>>>>> in practice.
>>>>
>>>> Please do not add 'cfun' references. Note that the next stmt is also
>>>> accessible
>>>> when there is no CFG. I guess the issue is that we fold this during
>>>> gimplification
>>>> where the next stmt is not yet "there" (but still in GENERIC)?
>>>>
>>>> We generally do not want to have unfolded stmts in the IL when we can
>>>> avoid that
>>>> which is why we fold most stmts during gimplification. We also do
>>>> that because
>>>> we now do less folding on GENERIC.
>>>>
>>>> There may be the possibility to refactor gimplification time folding
>>>> to what we
>>>> do during inlining - queue stmts we want to fold and perform all
>>>> folding delayed.
>>>> This of course means bigger compile-time due to cache effects.
>>>>
>>>>>
>>>>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>>>>> index d0792aa..f1988f6 100644
>>>>>> --- a/gcc/tree-ssa-strlen.c
>>>>>> +++ b/gcc/tree-ssa-strlen.c
>>>>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc
>>>>>> (gimple_stmt_iterator gsi, tree src, tree cnt)
>>>>>> Â Â Â Â Â Â Â && known_eq (dstoff, lhsoff)
>>>>>> Â Â Â Â Â Â Â && operand_equal_p (dstbase, lhsbase, 0))
>>>>>> Â Â Â Â Â return false;
>>>>>> +
>>>>>> +Â Â Â Â Â if (code == MEM_REF
>>>>>> +Â Â Â Â Â Â && TREE_CODE (lhsbase) == SSA_NAME
>>>>>> +Â Â Â Â Â Â && known_eq (dstoff, lhsoff))
>>>>>> +Â Â Â Â {
>>>>>> +Â Â Â Â Â Â /* Extract the referenced variable from something like
>>>>>> +Â Â Â Â Â Â Â Â Â Â Â MEM[(char *)d_3(D) + 3B] = 0;Â */
>>>>>> +Â Â Â Â Â Â gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>>>>>> +Â Â Â Â Â Â if (gimple_nop_p (def))
>>>>>> +Â Â Â Â Â Â Â Â {
>>>>>> +Â Â Â Â Â Â Â Â Â Â lhsbase = SSA_NAME_VAR (lhsbase);
>>>>>> +Â Â Â Â Â Â Â Â Â Â if (lhsbase
>>>>>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â && dstbase
>>>>>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â && operand_equal_p (dstbase, lhsbase, 0))
>>>>>> +Â Â Â Â Â Â Â Â Â Â Â Â return false;
>>>>>> +Â Â Â Â Â Â Â Â }
>>>>>> +Â Â Â Â }
>>>>> If you find yourself looking at SSA_NAME_VAR, you're usually
>>>>> barking up
>>>>> the wrong tree. It'd be easier to suggest something here if I
>>>>> could see
>>>>> the gimple (with virtual operands). BUt at some level what you really
>>>>> want to do is make sure the base of the MEM_REF is the same as what
>>>>> got
>>>>> passed as the destination of the strncpy. You'd want to be testing
>>>>> SSA_NAMEs in that case.
>>>>
>>>> Yes. Why not simply compare the SSA names? Why would it be
>>>> not OK to do that when !lhsbase?
>>>
>>> The added code handles this case:
>>>
>>> Â void f (char *d)
>>> Â {
>>> Â Â Â __builtin_strncpy (d, "12345", 4);
>>> Â Â Â d[3] = 0;
>>> Â }
>>>
>>> where during forwprop we see:
>>>
>>> Â __builtin_strncpy (d_3(D), "12345", 4);
>>> Â MEM[(char *)d_3(D) + 3B] = 0;
>>>
>>> The next statement after the strncpy is the assignment whose
>>> lhs is the MEM_REF with a GIMPLE_NOP as an operand. There
>>> is no other information in the GIMPLE_NOP that I can see to
>>> tell that the operand is d_3(D) or that it's the same as
>>> the strncpy argument (i.e., the PARAM_DECl d). Having to
>>> do open-code this all the time seems so cumbersome -- is
>>> there some API that would do this for me? (I thought
>>> get_addr_base_and_unit_offset was that API but clearly in
>>> this case it doesn't do what I expect -- it just returns
>>> the argument.)
>>
>> I think you need to look harder at that MEM_REF. It references d_3.
>> That's what you need to be checking. The base (d_3) is the first
>> operand of the MEM_REF, the offset is the second operand of the MEM_REF.
>>
>> (gdb) p debug_gimple_stmt ($2)
>> # .MEM_5 = VDEF <.MEM_4>
>> MEM[(char *)d_3(D) + 3B] = 0;
>>
>>
>> (gdb) p gimple_assign_lhs ($2)
>> $5 = (tree_node *) 0x7ffff01a6208
>>
>> (gdb) p debug_tree ($5)
>> Â <mem_ref 0x7ffff01a6208
>> Â Â Â type <integer_type 0x7ffff00723f0 char public string-flag QI
>> Â Â Â Â Â Â Â size <integer_cst 0x7ffff0059d80 constant 8>
>> Â Â Â Â Â Â Â unit-size <integer_cst 0x7ffff0059d98 constant 1>
>> Â Â Â Â Â Â Â align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
>> 0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max
>> <integer_cst 0x7ffff0059df8 127>
>> Â Â Â Â Â Â Â pointer_to_this <pointer_type 0x7ffff007de70>>
>>
>> Â Â Â arg:0 <ssa_name 0x7ffff0063dc8
>> Â Â Â Â Â Â Â type <pointer_type 0x7ffff007de70 type <integer_type
>> 0x7ffff00723f0 char>
>> Â Â Â Â Â Â Â Â Â Â Â public unsigned DI
>> Â Â Â Â Â Â Â Â Â Â Â size <integer_cst 0x7ffff0059c90 constant 64>
>> Â Â Â Â Â Â Â Â Â Â Â unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>> Â Â Â Â Â Â Â Â Â Â Â align:64 warn_if_not_align:0 symtab:0 alias-set -1
>> canonical-type 0x7ffff007de70 reference_to_this <reference_type
>> 0x7ffff017d738>>
>> Â Â Â Â Â Â Â visited var <parm_decl 0x7ffff01a5000 d>
>> Â Â Â Â Â Â Â def_stmt GIMPLE_NOP
>> Â Â Â Â Â Â Â version:3>
>> Â Â Â arg:1 <integer_cst 0x7ffff018ae40 type <pointer_type 0x7ffff007de70>
>> constant 3>
>> Â Â Â j.c:4:6 start: j.c:4:5 finish: j.c:4:8>
>>
>>
>> Note arg:0 is the SSA_NAME d_3. And not surprising that's lhsbase:
>
> The d in the MEM_REF you see in the dump above is the SSA_NAME's
> SSA_NAME_VAR:
>
> Â Â Â Â Â Â Â Â Â visited var <parm_decl 0x7ffff01a5000 d>
>
> Here's the print_node() code that prints it:
>
> Â Â Â Â Â print_node_brief (file, "var", SSA_NAME_VAR (node), indent + 4);
>
> There is nothing else in the MEM_REF operand that tells me that.
> Why is it wrong to look at the SSA_NAME_VAR?
>
>> (gdb) p debug_tree (lhsbase)
>> <ssa_name 0x7ffff0063dc8
>> Â Â Â type <pointer_type 0x7ffff007de70
>> Â Â Â Â Â Â Â type <integer_type 0x7ffff00723f0 char public string-flag QI
>> Â Â Â Â Â Â Â Â Â Â Â size <integer_cst 0x7ffff0059d80 constant 8>
>> Â Â Â Â Â Â Â Â Â Â Â unit-size <integer_cst 0x7ffff0059d98 constant 1>
>> Â Â Â Â Â Â Â Â Â Â Â align:8 warn_if_not_align:0 symtab:0 alias-set -1
>> canonical-type 0x7ffff00723f0 precision:8 min <integer_cst
>> 0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
>> Â Â Â Â Â Â Â Â Â Â Â pointer_to_this <pointer_type 0x7ffff007de70>>
>> Â Â Â Â Â Â Â public unsigned DI
>> Â Â Â Â Â Â Â size <integer_cst 0x7ffff0059c90 constant 64>
>> Â Â Â Â Â Â Â unit-size <integer_cst 0x7ffff0059ca8 constant 8>
>> Â Â Â Â Â Â Â align:64 warn_if_not_align:0 symtab:0 alias-set -1
>> canonical-type 0x7ffff007de70 reference_to_this <reference_type
>> 0x7ffff017d738>>
>> Â Â Â visited var <parm_decl 0x7ffff01a5000 d>
>> Â Â Â def_stmt GIMPLE_NOP
>> Â Â Â version:3>
>> Sadly, dstbase is the PARM_DECL for d. That's where things are going
>> "wrong".
>
> As Richard observed, that's because get_attr_nonstring_decl()
> returns the DECL that the expression refers to. It does that
> because that's where it looks for attribute nonstring, and so
> that the warning can mention the DECL with the attribute.
>
> I suppose since I'm not supposed to be using SSA_NAME_VAR
> (I still don't understand why it's taboo) I'll have to avoid
> using the get_attr_nonstring_decl() return value and instead
> look into comparing the SSA_NAMEs.
Because it's not generally useful because it has no dataflow information
associated with it. SSA_NAMEs are what carry dataflow information and
what you need to check if you want to know if two objects are the same.
SSA_NAME_VAR's primary use is for diagnostic messages and debugging. We
do hang attributes off the _DECL node it refers to, so you can take an
SSA_NAME, query its SSA_NAME_VAR if you need to check if the SSA_NAME
has a particular attribute property. But if you're trying to see if two
objects in the IL are the same, you need to be looking at the SSA_NAME.
jeff
next prev parent reply other threads:[~2018-08-28 22:17 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-24 15:58 Martin Sebor
2018-08-26 5:25 ` Jeff Law
2018-08-27 8:30 ` Richard Biener
2018-08-27 15:32 ` Jeff Law
2018-08-27 15:43 ` Richard Biener
2018-10-04 15:51 ` Jeff Law
2018-10-04 15:55 ` Martin Sebor
2018-10-08 10:14 ` Richard Biener
2018-10-08 21:40 ` Martin Sebor
2018-10-16 22:42 ` Jeff Law
2018-10-21 8:17 ` Martin Sebor
2018-10-31 17:07 ` [PING #3][PATCH] " Martin Sebor
2018-11-16 3:12 ` [PING #4][PATCH] " Martin Sebor
2018-11-16 9:07 ` Richard Biener
2018-11-29 20:34 ` Martin Sebor
2018-11-29 23:07 ` Jeff Law
2018-11-29 23:43 ` Martin Sebor
2018-11-30 2:02 ` Jeff Law
2018-11-30 8:05 ` Richard Biener
2018-11-30 8:30 ` Jakub Jelinek
2018-12-05 23:11 ` Jeff Law
2018-12-06 13:00 ` Christophe Lyon
2018-12-06 13:52 ` Jeff Law
2018-11-30 7:57 ` Richard Biener
2018-11-30 15:51 ` Martin Sebor
2018-11-07 21:28 ` [PATCH] " Jeff Law
2018-11-09 1:25 ` Martin Sebor
2018-10-04 19:55 ` Joseph Myers
2018-08-27 16:27 ` Martin Sebor
2018-08-28 4:27 ` Jeff Law
2018-08-28 9:56 ` Richard Biener
2018-08-28 9:57 ` Richard Biener
2018-08-29 0:12 ` Martin Sebor
2018-08-29 7:29 ` Richard Biener
2018-08-29 15:43 ` Martin Sebor
2018-08-30 0:27 ` Jeff Law
2018-08-30 8:48 ` Richard Biener
2018-09-12 15:50 ` Martin Sebor
2018-09-18 1:56 ` Jeff Law
2018-09-21 17:40 ` Martin Sebor
2018-10-01 21:31 ` [PING] " Martin Sebor
2018-10-08 22:15 ` Martin Sebor
2018-10-04 15:52 ` Jeff Law
2018-08-28 20:44 ` Martin Sebor
2018-08-28 22:17 ` Jeff Law [this message]
2018-08-27 20:31 ` Martin Sebor
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8f35e59d-c94f-b0c1-f9cd-519dcfe723d1@redhat.com \
--to=law@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=msebor@gmail.com \
--cc=richard.guenther@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).