From: Martin Sebor <msebor@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Jeff Law <law@redhat.com>, GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PING #4][PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)
Date: Thu, 29 Nov 2018 20:34:00 -0000 [thread overview]
Message-ID: <d51af7f4-f1bb-fbea-bd7e-b67d4632f0a5@gmail.com> (raw)
In-Reply-To: <CAFiYyc2bsg_HbdnmTALmz+YNWnDWR9c5Sq6q9Yz5=e9PX9NKcg@mail.gmail.com>
On 11/16/2018 02:07 AM, Richard Biener wrote:
> On Fri, Nov 16, 2018 at 4:12 AM Martin Sebor <msebor@gmail.com> wrote:
>>
>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
>>
>> Please let me know if there is something I need to change here
>> to make the fix acceptable or if I should stop trying.
>
> I have one more comment about
>
> + /* Defer warning (and folding) until the next statement in the basic
> + block is reachable. */
> + if (!gimple_bb (stmt))
> + return false;
> +
>
> it's not about the next statement in the basic-block being "reachable"
> (even w/o a CFG you can use gsi_next()) but rather that the next
> stmt isn't yet gimplified and thus not inserted into the gimple sequence,
> right?
No, it's about the current statement not being associated with
a basic block yet when the warning code runs for the first time
(during gimplify_expr), and so gsi_next() returning null.
> You apply this to gimple_fold_builtin_strncpy but I'd rather
> see us not sprinkling this over gimple-fold.c but instead do this
> in gimplify.c:maybe_fold_stmt, delaying folding until say lowering.
>
> See the attached (untested).
I would also prefer this solution. I had tested it (in response
to you first mentioning it back in September) and it causes quite
a bit of fallout in tests that look for the folding to take place
very early. See the end of my reply here:
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01248.html
But I'm willing to do the test suite cleanup if you think it's
suitable for GCC 9. (If you're thinking GCC 10 please let me
know now.)
Thanks
Martin
>
> Richard.
>
>
>
>> On 10/31/2018 10:33 AM, Martin Sebor wrote:
>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html
>>>
>>> On 10/20/2018 06:01 PM, Martin Sebor wrote:
>>>> On 10/16/2018 03:21 PM, Jeff Law wrote:
>>>>> On 10/4/18 9:51 AM, Martin Sebor wrote:
>>>>>> On 10/04/2018 08:58 AM, Jeff Law wrote:
>>>>>>> On 8/27/18 9:42 AM, Richard Biener wrote:
>>>>>>>> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law <law@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> On 08/27/2018 02:29 AM, Richard Biener wrote:
>>>>>>>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law <law@redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>>>>>>>>>> The warning suppression for -Wstringop-truncation looks for
>>>>>>>>>>>> the next statement after a truncating strncpy to see if it
>>>>>>>>>>>> adds a terminating nul. This only works when the next
>>>>>>>>>>>> statement can be reached using the Gimple statement iterator
>>>>>>>>>>>> which isn't until after gimplification. As a result, strncpy
>>>>>>>>>>>> calls that truncate their constant argument that are being
>>>>>>>>>>>> folded to memcpy this early get diagnosed even if they are
>>>>>>>>>>>> followed by the nul assignment:
>>>>>>>>>>>>
>>>>>>>>>>>> const char s[] = "12345";
>>>>>>>>>>>> char d[3];
>>>>>>>>>>>>
>>>>>>>>>>>> void f (void)
>>>>>>>>>>>> {
>>>>>>>>>>>> strncpy (d, s, sizeof d - 1); // -Wstringop-truncation
>>>>>>>>>>>> d[sizeof d - 1] = 0;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> To avoid the warning I propose to defer folding strncpy to
>>>>>>>>>>>> memcpy until the pointer to the basic block the strnpy call
>>>>>>>>>>>> is in can be used to try to reach the next statement (this
>>>>>>>>>>>> happens as early as ccp1). I'm aware of the preference to
>>>>>>>>>>>> fold things early but in the case of strncpy (a relatively
>>>>>>>>>>>> rarely used function that is often misused), getting
>>>>>>>>>>>> the warning right while folding a bit later but still fairly
>>>>>>>>>>>> early on seems like a reasonable compromise. I fear that
>>>>>>>>>>>> otherwise, the false positives will drive users to adopt
>>>>>>>>>>>> other unsafe solutions (like memcpy) where these kinds of
>>>>>>>>>>>> bugs cannot be as readily detected.
>>>>>>>>>>>>
>>>>>>>>>>>> Tested on x86_64-linux.
>>>>>>>>>>>>
>>>>>>>>>>>> Martin
>>>>>>>>>>>>
>>>>>>>>>>>> PS There still are outstanding cases where the warning can
>>>>>>>>>>>> be avoided. I xfailed them in the test for now but will
>>>>>>>>>>>> still try to get them to work for GCC 9.
>>>>>>>>>>>>
>>>>>>>>>>>> gcc-87028.diff
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>>>>>>>>>> strncpy with global variable source string
>>>>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>>>>
>>>>>>>>>>>> PR tree-optimization/87028
>>>>>>>>>>>> * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid
>>>>>>>>>>>> folding when
>>>>>>>>>>>> statement doesn't belong to a basic block.
>>>>>>>>>>>> * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle
>>>>>>>>>>>> MEM_REF on
>>>>>>>>>>>> the left hand side of assignment.
>>>>>>>>>>>>
>>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>>>
>>>>>>>>>>>> PR tree-optimization/87028
>>>>>>>>>>>> * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>>>>>>>>> * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>>>>>>>>>> index 07341eb..284c2fb 100644
>>>>>>>>>>>> --- a/gcc/gimple-fold.c
>>>>>>>>>>>> +++ b/gcc/gimple-fold.c
>>>>>>>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
>>>>>>>>>>>> (gimple_stmt_iterator *gsi,
>>>>>>>>>>>> if (tree_int_cst_lt (ssize, len))
>>>>>>>>>>>> return false;
>>>>>>>>>>>>
>>>>>>>>>>>> + /* Defer warning (and folding) until the next statement in the
>>>>>>>>>>>> basic
>>>>>>>>>>>> + block is reachable. */
>>>>>>>>>>>> + if (!gimple_bb (stmt))
>>>>>>>>>>>> + return false;
>>>>>>>>>>> I think you want cfun->cfg as the test here. They should be
>>>>>>>>>>> equivalent
>>>>>>>>>>> in practice.
>>>>>>>>>>
>>>>>>>>>> Please do not add 'cfun' references. Note that the next stmt is
>>>>>>>>>> also accessible
>>>>>>>>>> when there is no CFG. I guess the issue is that we fold this
>>>>>>>>>> during
>>>>>>>>>> gimplification where the next stmt is not yet "there" (but still in
>>>>>>>>>> GENERIC)?
>>>>>>>>> That was my assumption. I almost suggested peeking at gsi_next and
>>>>>>>>> avoiding in that case.
>>>>>>>>
>>>>>>>> So I'd rather add guards to maybe_fold_stmt in the gimplifier then.
>>>>>>> So I think the concern with adding the guards to maybe_fold_stmt is
>>>>>>> the
>>>>>>> possibility of further fallout.
>>>>>>>
>>>>>>> I guess they could be written to target this case specifically to
>>>>>>> minimize fallout, but that feels like we're doing the same thing
>>>>>>> (band-aid) just in a different place.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We generally do not want to have unfolded stmts in the IL when we
>>>>>>>>>> can avoid that
>>>>>>>>>> which is why we fold most stmts during gimplification. We also do
>>>>>>>>>> that because
>>>>>>>>>> we now do less folding on GENERIC.
>>>>>>>>> But an unfolded call in the IL should always be safe and we've got
>>>>>>>>> plenty of opportunities to fold it later.
>>>>>>>>
>>>>>>>> Well - we do. The very first one is forwprop though which means
>>>>>>>> we'll miss to
>>>>>>>> re-write some memcpy parts into SSA:
>>>>>>>>
>>>>>>>> NEXT_PASS (pass_ccp, false /* nonzero_p */);
>>>>>>>> /* After CCP we rewrite no longer addressed locals into SSA
>>>>>>>> form if possible. */
>>>>>>>> NEXT_PASS (pass_forwprop);
>>>>>>>>
>>>>>>>> likewise early object-size will be confused by memcpy calls that just
>>>>>>>> exist
>>>>>>>> to avoid TBAA issues (another of our recommendations besides using
>>>>>>>> unions).
>>>>>>>>
>>>>>>>> We do fold mem* early for a reason ;)
>>>>>>>>
>>>>>>>> "We can always do warnings earlier" would be a similar true sentence.
>>>>>>> I'm not disagreeing at all. There's a natural tension between the
>>>>>>> benefits of folding early to enable more optimizations downstream and
>>>>>>> leaving the IL in a state where we can give actionable warnings.
>>>>>>
>>>>>> Similar trade-offs between folding early and losing information
>>>>>> as a result also impact high-level optimizations.
>>>>>>
>>>>>> For instance, folding the strlen argument below
>>>>>>
>>>>>> void f3 (struct A* p)
>>>>>> {
>>>>>> __builtin_strcpy (p->a, "123");
>>>>>>
>>>>>> if (__builtin_strlen (p->a + 1) != 2) // not folded
>>>>>> __builtin_abort ();
>>>>>> }
>>>>>>
>>>>>> into
>>>>>>
>>>>>> _2 = &MEM[(void *)p_4(D) + 2B];
>>>>>>
>>>>>> early on defeats the strlen optimization because there is no
>>>>>> mechanism to determine what member (void *)p_4(D) + 2B refers
>>>>>> to (this is bug 86955).
>>>>>>
>>>>>> Another example is folding of strlen calls with no-nconstant
>>>>>> offsets into constant strings like here:
>>>>>>
>>>>>> const char a[] = "123";
>>>>>>
>>>>>> void f (int i)
>>>>>> {
>>>>>> if (__builtin_strlen (&a[i]) > 3)
>>>>>> __builtin_abort ();
>>>>>> }
>>>>>>
>>>>>> into sizeof a - 1 - i, which then prevents the result from
>>>>>> being folded to false (bug 86434), not to mention the code
>>>>>> it emits for out-of-bounds indices.
>>>>>>
>>>>>> There are a number of other similar examples in Bugzilla
>>>>>> that I've filed as I discovered then during testing my
>>>>>> warnings (e.g., 86572).
>>>>>>
>>>>>> In my mind, transforming library calls into "lossy" low-level
>>>>>> primitives like MEM_REF would be better done only after higher
>>>>>> level optimizations have had a chance to analyze them. Ditto
>>>>>> for other similar transformations (like to other library calls).
>>>>>> Having more accurate information helps both optimization and
>>>>>> warnings. It also makes the warnings more meaningful.
>>>>>> Printing "memcpy overflows a buffer" when the source code
>>>>>> has a call to strncpy is less than ideal.
>>>>>>
>>>>>>> Similarly there's a natural tension between warning early vs warning
>>>>>>> late. Code that triggers the warning may ultimately be proved
>>>>>>> unreachable, or we may discover simplifications that either
>>>>>>> suppress or
>>>>>>> expose a warning.
>>>>>>>
>>>>>>> There is no easy answer here. But I think we can legitimately ask
>>>>>>> questions. ie, does folding strnlen here really improve things
>>>>>>> downstream in ways that are measurable? Does the false positive
>>>>>>> really
>>>>>>> impact the utility of the warning? etc.
>>>>>>>
>>>>>>> I'd hazard a guess that Martin is particularly sensitive to false
>>>>>>> positives based on feedback he's received from our developer community
>>>>>>> as well as downstream consumers of his work.
>>>>>>
>>>>>> Yes. The kernel folks in particular have done a lot of work
>>>>>> cleaning up their code in an effort to adopt the warning and
>>>>>> attribute nonstring. They have been keeping me in the loop
>>>>>> on their progress (and feeding me back test cases with false
>>>>>> positives and negatives they run into).
>>>>> I can't recall seeing further guidance from Richi WRT putting the checks
>>>>> earlier (maybe_fold_stmt).
>>>>>
>>>>> If the point here is to avoid false positives by not folding strncpy,
>>>>> particularly in cases where we don't see the NUL in the copy, but it
>>>>> appears in a subsequent store, then let's be fairly selective (so as not
>>>>> to muck up things on the optimization side more than is necessary).
>>>>>
>>>>> ISTM we can do this by refactoring the warning bits so they're reusable
>>>>> at different points in the pipeline. Those bits would always return a
>>>>> boolean indicating if the given statement might generate a warning or
>>>>> not.
>>>>>
>>>>> When called early, they would not actually issue any warning. They
>>>>> would merely do the best analysis they can and return a status
>>>>> indicating whether or not the statement would generate a warning given
>>>>> current context. The goal here is to leave statements that might
>>>>> generate a warning as-is in the IL.
>>>>>
>>>>> When called late (assuming there is a point where we can walk the IL and
>>>>> issue the appropriate warnings), the routine would actually issue the
>>>>> warning.
>>>>>
>>>>> The kind of structure could potentially work for other builtins where we
>>>>> may need to look at subsequent statements to avoid false positives, but
>>>>> early folding hides cases by transforming the call into an undesirable
>>>>> form.
>>>>>
>>>>> Note that for cases where a call looks problematical early because we
>>>>> can't see statement which stores the terminator, but where the
>>>>> terminator statement ultimately becomes visible, we still get folding,
>>>>> it just happens later in the pipeline.
>>>>>
>>>>> Thoughts?
>>>>
>>>> The warning only triggers when the bound is less than or equal
>>>> to the length of the constant source string (i.e, when strncpy
>>>> truncates). So IIUC, your suggestion would defer folding only
>>>> such strncpy calls and let gimple_fold_builtin_strncpy fold
>>>> those with a constant bound that's greater than the length of
>>>> the constant source string. That would be fine with me, but
>>>> since strncpy calls with a bound that's greater than the length
>>>> of the source are pointless I don't think they are important
>>>> enough to worry about folding super early. The constant ones
>>>> that serve any purpose (and that are presumably important to
>>>> optimize) are those that truncate.
>>>>
>>>> That said, when optimization isn't enabled, I don't think users
>>>> expect calls to library functions to be transformed to calls to
>>>> other functions, or inlined. Yet that's just what GCC does.
>>>> For example, besides triggering the warning, the following:
>>>>
>>>> char a[4];
>>>>
>>>> void f (char *s)
>>>> {
>>>> __builtin_strncpy (a, "1234", sizeof a);
>>>> a[3] = 0;
>>>> }
>>>>
>>>> is transformed, even at -O0, into:
>>>>
>>>> f (char * s)
>>>> {
>>>> <bb 2> :
>>>> MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})"1234"];
>>>> a[3] = 0;
>>>> return;
>>>> }
>>>>
>>>> That doesn't seem right. GCC should avoid these transformations
>>>> at -O0, and one way to do that is to defer folding until the CFG
>>>> is constructed. The patch does it for strncpy but a more general
>>>> solution would do that for all calls, e.g., in maybe_fold_stmt
>>>> as Richard suggested (and I subsequently tested).
>>>>
>>>> Martin
>>>
>>
next prev parent reply other threads:[~2018-11-29 20:34 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-24 15:58 [PATCH] " Martin Sebor
2018-08-26 5:25 ` Jeff Law
2018-08-27 8:30 ` Richard Biener
2018-08-27 15:32 ` Jeff Law
2018-08-27 15:43 ` Richard Biener
2018-10-04 15:51 ` Jeff Law
2018-10-04 15:55 ` Martin Sebor
2018-10-08 10:14 ` Richard Biener
2018-10-08 21:40 ` Martin Sebor
2018-10-16 22:42 ` Jeff Law
2018-10-21 8:17 ` Martin Sebor
2018-10-31 17:07 ` [PING #3][PATCH] " Martin Sebor
2018-11-16 3:12 ` [PING #4][PATCH] " Martin Sebor
2018-11-16 9:07 ` Richard Biener
2018-11-29 20:34 ` Martin Sebor [this message]
2018-11-29 23:07 ` Jeff Law
2018-11-29 23:43 ` Martin Sebor
2018-11-30 2:02 ` Jeff Law
2018-11-30 8:05 ` Richard Biener
2018-11-30 8:30 ` Jakub Jelinek
2018-12-05 23:11 ` Jeff Law
2018-12-06 13:00 ` Christophe Lyon
2018-12-06 13:52 ` Jeff Law
2018-11-30 7:57 ` Richard Biener
2018-11-30 15:51 ` Martin Sebor
2018-11-07 21:28 ` [PATCH] " Jeff Law
2018-11-09 1:25 ` Martin Sebor
2018-10-04 19:55 ` Joseph Myers
2018-08-27 16:27 ` Martin Sebor
2018-08-28 4:27 ` Jeff Law
2018-08-28 9:56 ` Richard Biener
2018-08-28 9:57 ` Richard Biener
2018-08-29 0:12 ` Martin Sebor
2018-08-29 7:29 ` Richard Biener
2018-08-29 15:43 ` Martin Sebor
2018-08-30 0:27 ` Jeff Law
2018-08-30 8:48 ` Richard Biener
2018-09-12 15:50 ` Martin Sebor
2018-09-18 1:56 ` Jeff Law
2018-09-21 17:40 ` Martin Sebor
2018-10-01 21:31 ` [PING] " Martin Sebor
2018-10-08 22:15 ` Martin Sebor
2018-10-04 15:52 ` Jeff Law
2018-08-28 20:44 ` Martin Sebor
2018-08-28 22:17 ` Jeff Law
2018-08-27 20:31 ` Martin Sebor
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d51af7f4-f1bb-fbea-bd7e-b67d4632f0a5@gmail.com \
--to=msebor@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=law@redhat.com \
--cc=richard.guenther@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).