From: Jeff Law <law@redhat.com>
To: Martin Sebor <msebor@gmail.com>,
Richard Biener <richard.guenther@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)
Date: Tue, 28 Aug 2018 04:27:00 -0000
Message-ID: <fd26f079-1411-4f35-d0d2-cf4aabac2d7c@redhat.com>
In-Reply-To: <4e613d61-40ad-f05f-9107-5fd7a5c2fdb6@gmail.com>

On 08/27/2018 10:27 AM, Martin Sebor wrote:
> On 08/27/2018 02:29 AM, Richard Biener wrote:
>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>>
>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>> The warning suppression for -Wstringop-truncation looks for
>>>> the next statement after a truncating strncpy to see if it
>>>> adds a terminating nul. This only works when the next
>>>> statement can be reached using the Gimple statement iterator
>>>> which isn't until after gimplification. As a result, strncpy
>>>> calls that truncate their constant argument that are being
>>>> folded to memcpy this early get diagnosed even if they are
>>>> followed by the nul assignment:
>>>>
>>>>   const char s[] = "12345";
>>>>   char d[3];
>>>>
>>>>   void f (void)
>>>>   {
>>>>     strncpy (d, s, sizeof d - 1);  // -Wstringop-truncation
>>>>     d[sizeof d - 1] = 0;
>>>>   }
>>>>
>>>> To avoid the warning I propose to defer folding strncpy to
>>>> memcpy until the pointer to the basic block the strncpy call
>>>> is in can be used to try to reach the next statement (this
>>>> happens as early as ccp1). I'm aware of the preference to
>>>> fold things early but in the case of strncpy (a relatively
>>>> rarely used function that is often misused), getting
>>>> the warning right while folding a bit later but still fairly
>>>> early on seems like a reasonable compromise. I fear that
>>>> otherwise, the false positives will drive users to adopt
>>>> other unsafe solutions (like memcpy) where these kinds of
>>>> bugs cannot be as readily detected.
>>>>
>>>> Tested on x86_64-linux.
>>>>
>>>> Martin
>>>>
>>>> PS There still are outstanding cases where the warning can
>>>> be avoided. I xfailed them in the test for now but will
>>>> still try to get them to work for GCC 9.
>>>>
>>>> gcc-87028.diff
>>>>
>>>>
>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>> strncpy with global variable source string
>>>> gcc/ChangeLog:
>>>>
>>>>     PR tree-optimization/87028
>>>>     * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>>     statement doesn't belong to a basic block.
>>>>     * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>>     the left hand side of assignment.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>     PR tree-optimization/87028
>>>>     * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>     * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>
>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>> index 07341eb..284c2fb 100644
>>>> --- a/gcc/gimple-fold.c
>>>> +++ b/gcc/gimple-fold.c
>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
>>>>    if (tree_int_cst_lt (ssize, len))
>>>>      return false;
>>>>
>>>> +  /* Defer warning (and folding) until the next statement in the basic
>>>> +     block is reachable.  */
>>>> +  if (!gimple_bb (stmt))
>>>> +    return false;
>>> I think you want cfun->cfg as the test here. They should be equivalent
>>> in practice.
>>
>> Please do not add 'cfun' references. Note that the next stmt is also
>> accessible when there is no CFG. I guess the issue is that we fold this
>> during gimplification where the next stmt is not yet "there" (but still
>> in GENERIC)?
>>
>> We generally do not want to have unfolded stmts in the IL when we can
>> avoid that which is why we fold most stmts during gimplification. We
>> also do that because we now do less folding on GENERIC.
>>
>> There may be the possibility to refactor gimplification time folding
>> to what we do during inlining - queue stmts we want to fold and perform
>> all folding delayed. This of course means bigger compile-time due to
>> cache effects.
>>
>>>
>>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>>> index d0792aa..f1988f6 100644
>>>> --- a/gcc/tree-ssa-strlen.c
>>>> +++ b/gcc/tree-ssa-strlen.c
>>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi, tree src, tree cnt)
>>>>            && known_eq (dstoff, lhsoff)
>>>>            && operand_equal_p (dstbase, lhsbase, 0))
>>>>          return false;
>>>> +
>>>> +      if (code == MEM_REF
>>>> +          && TREE_CODE (lhsbase) == SSA_NAME
>>>> +          && known_eq (dstoff, lhsoff))
>>>> +        {
>>>> +          /* Extract the referenced variable from something like
>>>> +               MEM[(char *)d_3(D) + 3B] = 0;  */
>>>> +          gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>>>> +          if (gimple_nop_p (def))
>>>> +            {
>>>> +              lhsbase = SSA_NAME_VAR (lhsbase);
>>>> +              if (lhsbase
>>>> +                  && dstbase
>>>> +                  && operand_equal_p (dstbase, lhsbase, 0))
>>>> +                return false;
>>>> +            }
>>>> +        }
>>> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
>>> the wrong tree. It'd be easier to suggest something here if I could see
>>> the gimple (with virtual operands). But at some level what you really
>>> want to do is make sure the base of the MEM_REF is the same as what got
>>> passed as the destination of the strncpy. You'd want to be testing
>>> SSA_NAMEs in that case.
>>
>> Yes. Why not simply compare the SSA names? Why would it be
>> not OK to do that when !lhsbase?
>
> The added code handles this case:
>
>   void f (char *d)
>   {
>     __builtin_strncpy (d, "12345", 4);
>     d[3] = 0;
>   }
>
> where during forwprop we see:
>
>   __builtin_strncpy (d_3(D), "12345", 4);
>   MEM[(char *)d_3(D) + 3B] = 0;
>
> The next statement after the strncpy is the assignment whose
> lhs is the MEM_REF with a GIMPLE_NOP as an operand. There
> is no other information in the GIMPLE_NOP that I can see to
> tell that the operand is d_3(D) or that it's the same as
> the strncpy argument (i.e., the PARM_DECL d). Having to
> open-code this all the time seems so cumbersome -- is
> there some API that would do this for me? (I thought
> get_addr_base_and_unit_offset was that API but clearly in
> this case it doesn't do what I expect -- it just returns
> the argument.)

I think you need to look harder at that MEM_REF. It references d_3.
That's what you need to be checking. The base (d_3) is the first
operand of the MEM_REF, the offset is the second operand of the MEM_REF.

(gdb) p debug_gimple_stmt ($2)
# .MEM_5 = VDEF <.MEM_4>
MEM[(char *)d_3(D) + 3B] = 0;
(gdb) p gimple_assign_lhs ($2)
$5 = (tree_node *) 0x7ffff01a6208
(gdb) p debug_tree ($5)
<mem_ref 0x7ffff01a6208
type <integer_type 0x7ffff00723f0 char public string-flag QI
size <integer_cst 0x7ffff0059d80 constant 8>
unit-size <integer_cst 0x7ffff0059d98 constant 1>
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7ffff00723f0 precision:8 min <integer_cst 0x7ffff0059dc8 -128> max
<integer_cst 0x7ffff0059df8 127>
pointer_to_this <pointer_type 0x7ffff007de70>>
arg:0 <ssa_name 0x7ffff0063dc8
type <pointer_type 0x7ffff007de70 type <integer_type
0x7ffff00723f0 char>
public unsigned DI
size <integer_cst 0x7ffff0059c90 constant 64>
unit-size <integer_cst 0x7ffff0059ca8 constant 8>
align:64 warn_if_not_align:0 symtab:0 alias-set -1
canonical-type 0x7ffff007de70 reference_to_this <reference_type
0x7ffff017d738>>
visited var <parm_decl 0x7ffff01a5000 d>
def_stmt GIMPLE_NOP
version:3>
arg:1 <integer_cst 0x7ffff018ae40 type <pointer_type 0x7ffff007de70>
constant 3>
j.c:4:6 start: j.c:4:5 finish: j.c:4:8>

Note that arg:0 is the SSA_NAME d_3. Not surprisingly, that's also lhsbase:

(gdb) p debug_tree (lhsbase)
<ssa_name 0x7ffff0063dc8
type <pointer_type 0x7ffff007de70
type <integer_type 0x7ffff00723f0 char public string-flag QI
size <integer_cst 0x7ffff0059d80 constant 8>
unit-size <integer_cst 0x7ffff0059d98 constant 1>
align:8 warn_if_not_align:0 symtab:0 alias-set -1
canonical-type 0x7ffff00723f0 precision:8 min <integer_cst
0x7ffff0059dc8 -128> max <integer_cst 0x7ffff0059df8 127>
pointer_to_this <pointer_type 0x7ffff007de70>>
public unsigned DI
size <integer_cst 0x7ffff0059c90 constant 64>
unit-size <integer_cst 0x7ffff0059ca8 constant 8>
align:64 warn_if_not_align:0 symtab:0 alias-set -1
canonical-type 0x7ffff007de70 reference_to_this <reference_type
0x7ffff017d738>>
visited var <parm_decl 0x7ffff01a5000 d>
def_stmt GIMPLE_NOP
version:3>
Sadly, dstbase is the PARM_DECL for d. That's where things are going
"wrong". Not sure why you're getting the PARM_DECL in that case. I'd
debug get_addr_base_and_unit_offset to understand what's going on.
Essentially you're getting different results of
get_addr_base_and_unit_offset in a case where they arguably should be
the same.
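
To make the mismatch concrete, here is a rough standalone sketch. It is a toy model, not GCC code: the `toy_*` types and functions are made up for illustration, and pointer identity merely stands in for operand_equal_p. It shows why the comparison fails when the strncpy destination has been resolved down to the PARM_DECL while the store's base is still the SSA_NAME, and why it succeeds when both sides are the SSA_NAME.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for tree nodes.  Hypothetical; not the GCC API.  */
enum toy_code { TOY_PARM_DECL, TOY_SSA_NAME, TOY_MEM_REF };

struct toy_node
{
  enum toy_code code;
  const struct toy_node *op0;  /* SSA_NAME: underlying decl; MEM_REF: base */
  long op1;                    /* MEM_REF: byte offset; unused otherwise */
};

/* Base of a store's left-hand side: operand 0 of a MEM_REF.  */
const struct toy_node *
toy_lhs_base (const struct toy_node *lhs)
{
  assert (lhs != NULL);
  return lhs->code == TOY_MEM_REF ? lhs->op0 : lhs;
}

/* Hypothetical resolver that strips an SSA name down to its decl,
   imitating the PARM_DECL result observed for dstbase.  */
const struct toy_node *
toy_strip_to_decl (const struct toy_node *n)
{
  return n->code == TOY_SSA_NAME && n->op0 ? n->op0 : n;
}

/* Model of:
     __builtin_strncpy (d_3(D), "12345", 4);
     MEM[(char *)d_3(D) + 3B] = 0;
   Returns true if the store's base matches the call's destination.
   STRIP_DST selects whether the destination is first resolved to
   its decl (the inconsistent case) or kept as the SSA name.  */
bool
toy_bases_match (bool strip_dst)
{
  struct toy_node d = { TOY_PARM_DECL, NULL, 0 };
  struct toy_node d_3 = { TOY_SSA_NAME, &d, 0 };
  struct toy_node store_lhs = { TOY_MEM_REF, &d_3, 3 };

  const struct toy_node *dstbase
    = strip_dst ? toy_strip_to_decl (&d_3) : &d_3;
  const struct toy_node *lhsbase = toy_lhs_base (&store_lhs);

  /* Identity comparison stands in for operand_equal_p.  */
  return dstbase == lhsbase;
}
```

In this model toy_bases_match (true) is false and toy_bases_match (false) is true, which is the asymmetry I'd expect to go away once both sides are resolved the same way.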

Jeff