From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 750 invoked by alias); 29 Nov 2018 20:34:44 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 741 invoked by uid 89); 29 Nov 2018 20:34:43 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=losing, defer, proved, disagreeing X-HELO: mail-qt1-f196.google.com Received: from mail-qt1-f196.google.com (HELO mail-qt1-f196.google.com) (209.85.160.196) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 29 Nov 2018 20:34:38 +0000 Received: by mail-qt1-f196.google.com with SMTP id n32so3527086qte.11 for ; Thu, 29 Nov 2018 12:34:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:subject:to:cc:references:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=F+8lOtLMN4lW9/j/HOhbtEuGeIY0hE5liGePxwP0R0Q=; b=SVrS4I7nclr8TNBjCZg0QQkfHuzvoFJu+fezGui97DdofzoEggFUE3jlIyCjrYp/nq xs9+ps2W/yep8pCoghlfBNhdP3JpvpWoUpyWEYryHgYliDlFM6oUkKeXeYGiPNKDEqfb rmdrAFMMqS9OSF1m85bfm7uDDrGtwGw/g9HfzdAzf6wtfLkOC0hD69XTyjnPAcXv8JI+ CtjwiwvxvPYaxsGoxdwDY5pm7t7Y1qA5duaDnwUfTWHmz5UAM7ZfZB0MrsVUhSujpQMP djigm4tZApopfSi4vA6up4thX25lgIODJjSn1X40WfTX4fc8yk0OO0simEtCBTOATVkU yIfQ== Return-Path: Received: from localhost.localdomain (97-118-99-160.hlrn.qwest.net. [97.118.99.160]) by smtp.gmail.com with ESMTPSA id o34sm1464032qte.4.2018.11.29.12.34.34 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 29 Nov 2018 12:34:34 -0800 (PST) From: Martin Sebor Subject: Re: [PING #4][PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028) To: Richard Biener Cc: Jeff Law , GCC Patches References: <88de1ee3-6ee4-d8d9-3e57-3a42474a4169@redhat.com> <27235a6d-6f2c-1f2d-d456-d4cd9b941894@redhat.com> <23ea3d13-d9c5-1b02-f01c-d2a0e11f3a10@redhat.com> <9e3f6d62-47b9-b80f-b8ac-5711628579c5@redhat.com> <09ce3b57-33a3-86ae-1308-39fd02f25228@gmail.com> <1437ae83-c0c2-e18f-0dc8-92717c2fdcfe@gmail.com> Message-ID: Date: Thu, 29 Nov 2018 20:34:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2018-11/txt/msg02492.txt.bz2 On 11/16/2018 02:07 AM, Richard Biener wrote: > On Fri, Nov 16, 2018 at 4:12 AM Martin Sebor wrote: >> >> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html >> >> Please let me know if there is something I need to change here >> to make the fix acceptable or if I should stop trying. > > I have one more comment about > > + /* Defer warning (and folding) until the next statement in the basic > + block is reachable. */ > + if (!gimple_bb (stmt)) > + return false; > + > > it's not about the next statement in the basic-block being "reachable" > (even w/o a CFG you can use gsi_next()) but rather that the next > stmt isn't yet gimplified and thus not inserted into the gimple sequence, > right? No, it's about the current statement not being associated with a basic block yet when the warning code runs for the first time (during gimplify_expr), and so gsi_next() returning null. > You apply this to gimple_fold_builtin_strncpy but I'd rather > see us not sprinkling this over gimple-fold.c but instead do this > in gimplify.c:maybe_fold_stmt, delaying folding until say lowering. > > See the attached (untested). I would also prefer this solution. I had tested it (in response to you first mentioning it back in September) and it causes quite a bit of fallout in tests that look for the folding to take place very early. See the end of my reply here: https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01248.html But I'm willing to do the test suite cleanup if you think it's suitable for GCC 9. (If you're thinking GCC 10 please let me know now.) Thanks Martin > > Richard. > > > >> On 10/31/2018 10:33 AM, Martin Sebor wrote: >>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html >>> >>> On 10/20/2018 06:01 PM, Martin Sebor wrote: >>>> On 10/16/2018 03:21 PM, Jeff Law wrote: >>>>> On 10/4/18 9:51 AM, Martin Sebor wrote: >>>>>> On 10/04/2018 08:58 AM, Jeff Law wrote: >>>>>>> On 8/27/18 9:42 AM, Richard Biener wrote: >>>>>>>> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law wrote: >>>>>>>>> >>>>>>>>> On 08/27/2018 02:29 AM, Richard Biener wrote: >>>>>>>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law wrote: >>>>>>>>>>> >>>>>>>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote: >>>>>>>>>>>> The warning suppression for -Wstringop-truncation looks for >>>>>>>>>>>> the next statement after a truncating strncpy to see if it >>>>>>>>>>>> adds a terminating nul. This only works when the next >>>>>>>>>>>> statement can be reached using the Gimple statement iterator >>>>>>>>>>>> which isn't until after gimplification. As a result, strncpy >>>>>>>>>>>> calls that truncate their constant argument that are being >>>>>>>>>>>> folded to memcpy this early get diagnosed even if they are >>>>>>>>>>>> followed by the nul assignment: >>>>>>>>>>>> >>>>>>>>>>>> const char s[] = "12345"; >>>>>>>>>>>> char d[3]; >>>>>>>>>>>> >>>>>>>>>>>> void f (void) >>>>>>>>>>>> { >>>>>>>>>>>> strncpy (d, s, sizeof d - 1); // -Wstringop-truncation >>>>>>>>>>>> d[sizeof d - 1] = 0; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> To avoid the warning I propose to defer folding strncpy to >>>>>>>>>>>> memcpy until the pointer to the basic block the strnpy call >>>>>>>>>>>> is in can be used to try to reach the next statement (this >>>>>>>>>>>> happens as early as ccp1). I'm aware of the preference to >>>>>>>>>>>> fold things early but in the case of strncpy (a relatively >>>>>>>>>>>> rarely used function that is often misused), getting >>>>>>>>>>>> the warning right while folding a bit later but still fairly >>>>>>>>>>>> early on seems like a reasonable compromise. I fear that >>>>>>>>>>>> otherwise, the false positives will drive users to adopt >>>>>>>>>>>> other unsafe solutions (like memcpy) where these kinds of >>>>>>>>>>>> bugs cannot be as readily detected. >>>>>>>>>>>> >>>>>>>>>>>> Tested on x86_64-linux. >>>>>>>>>>>> >>>>>>>>>>>> Martin >>>>>>>>>>>> >>>>>>>>>>>> PS There still are outstanding cases where the warning can >>>>>>>>>>>> be avoided. I xfailed them in the test for now but will >>>>>>>>>>>> still try to get them to work for GCC 9. >>>>>>>>>>>> >>>>>>>>>>>> gcc-87028.diff >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation >>>>>>>>>>>> strncpy with global variable source string >>>>>>>>>>>> gcc/ChangeLog: >>>>>>>>>>>> >>>>>>>>>>>> PR tree-optimization/87028 >>>>>>>>>>>> * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid >>>>>>>>>>>> folding when >>>>>>>>>>>> statement doesn't belong to a basic block. >>>>>>>>>>>> * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle >>>>>>>>>>>> MEM_REF on >>>>>>>>>>>> the left hand side of assignment. >>>>>>>>>>>> >>>>>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>>>>> >>>>>>>>>>>> PR tree-optimization/87028 >>>>>>>>>>>> * c-c++-common/Wstringop-truncation.c: Remove xfails. >>>>>>>>>>>> * gcc.dg/Wstringop-truncation-5.c: New test. >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c >>>>>>>>>>>> index 07341eb..284c2fb 100644 >>>>>>>>>>>> --- a/gcc/gimple-fold.c >>>>>>>>>>>> +++ b/gcc/gimple-fold.c >>>>>>>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy >>>>>>>>>>>> (gimple_stmt_iterator *gsi, >>>>>>>>>>>> if (tree_int_cst_lt (ssize, len)) >>>>>>>>>>>> return false; >>>>>>>>>>>> >>>>>>>>>>>> + /* Defer warning (and folding) until the next statement in the >>>>>>>>>>>> basic >>>>>>>>>>>> + block is reachable. */ >>>>>>>>>>>> + if (!gimple_bb (stmt)) >>>>>>>>>>>> + return false; >>>>>>>>>>> I think you want cfun->cfg as the test here. They should be >>>>>>>>>>> equivalent >>>>>>>>>>> in practice. >>>>>>>>>> >>>>>>>>>> Please do not add 'cfun' references. Note that the next stmt is >>>>>>>>>> also accessible >>>>>>>>>> when there is no CFG. I guess the issue is that we fold this >>>>>>>>>> during >>>>>>>>>> gimplification where the next stmt is not yet "there" (but still in >>>>>>>>>> GENERIC)? >>>>>>>>> That was my assumption. I almost suggested peeking at gsi_next and >>>>>>>>> avoiding in that case. >>>>>>>> >>>>>>>> So I'd rather add guards to maybe_fold_stmt in the gimplifier then. >>>>>>> So I think the concern with adding the guards to maybe_fold_stmt is >>>>>>> the >>>>>>> possibility of further fallout. >>>>>>> >>>>>>> I guess they could be written to target this case specifically to >>>>>>> minimize fallout, but that feels like we're doing the same thing >>>>>>> (band-aid) just in a different place. >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>>>> >>>>>>>>>> We generally do not want to have unfolded stmts in the IL when we >>>>>>>>>> can avoid that >>>>>>>>>> which is why we fold most stmts during gimplification. We also do >>>>>>>>>> that because >>>>>>>>>> we now do less folding on GENERIC. >>>>>>>>> But an unfolded call in the IL should always be safe and we've got >>>>>>>>> plenty of opportunities to fold it later. >>>>>>>> >>>>>>>> Well - we do. The very first one is forwprop though which means >>>>>>>> we'll miss to >>>>>>>> re-write some memcpy parts into SSA: >>>>>>>> >>>>>>>> NEXT_PASS (pass_ccp, false /* nonzero_p */); >>>>>>>> /* After CCP we rewrite no longer addressed locals into SSA >>>>>>>> form if possible. */ >>>>>>>> NEXT_PASS (pass_forwprop); >>>>>>>> >>>>>>>> likewise early object-size will be confused by memcpy calls that just >>>>>>>> exist >>>>>>>> to avoid TBAA issues (another of our recommendations besides using >>>>>>>> unions). >>>>>>>> >>>>>>>> We do fold mem* early for a reason ;) >>>>>>>> >>>>>>>> "We can always do warnings earlier" would be a similar true sentence. >>>>>>> I'm not disagreeing at all. There's a natural tension between the >>>>>>> benefits of folding early to enable more optimizations downstream and >>>>>>> leaving the IL in a state where we can give actionable warnings. >>>>>> >>>>>> Similar trade-offs between folding early and losing information >>>>>> as a result also impact high-level optimizations. >>>>>> >>>>>> For instance, folding the strlen argument below >>>>>> >>>>>> void f3 (struct A* p) >>>>>> { >>>>>> __builtin_strcpy (p->a, "123"); >>>>>> >>>>>> if (__builtin_strlen (p->a + 1) != 2) // not folded >>>>>> __builtin_abort (); >>>>>> } >>>>>> >>>>>> into >>>>>> >>>>>> _2 = &MEM[(void *)p_4(D) + 2B]; >>>>>> >>>>>> early on defeats the strlen optimization because there is no >>>>>> mechanism to determine what member (void *)p_4(D) + 2B refers >>>>>> to (this is bug 86955). >>>>>> >>>>>> Another example is folding of strlen calls with no-nconstant >>>>>> offsets into constant strings like here: >>>>>> >>>>>> const char a[] = "123"; >>>>>> >>>>>> void f (int i) >>>>>> { >>>>>> if (__builtin_strlen (&a[i]) > 3) >>>>>> __builtin_abort (); >>>>>> } >>>>>> >>>>>> into sizeof a - 1 - i, which then prevents the result from >>>>>> being folded to false (bug 86434), not to mention the code >>>>>> it emits for out-of-bounds indices. >>>>>> >>>>>> There are a number of other similar examples in Bugzilla >>>>>> that I've filed as I discovered then during testing my >>>>>> warnings (e.g., 86572). >>>>>> >>>>>> In my mind, transforming library calls into "lossy" low-level >>>>>> primitives like MEM_REF would be better done only after higher >>>>>> level optimizations have had a chance to analyze them. Ditto >>>>>> for other similar transformations (like to other library calls). >>>>>> Having more accurate information helps both optimization and >>>>>> warnings. It also makes the warnings more meaningful. >>>>>> Printing "memcpy overflows a buffer" when the source code >>>>>> has a call to strncpy is less than ideal. >>>>>> >>>>>>> Similarly there's a natural tension between warning early vs warning >>>>>>> late. Code that triggers the warning may ultimately be proved >>>>>>> unreachable, or we may discover simplifications that either >>>>>>> suppress or >>>>>>> expose a warning. >>>>>>> >>>>>>> There is no easy answer here. But I think we can legitimately ask >>>>>>> questions. ie, does folding strnlen here really improve things >>>>>>> downstream in ways that are measurable? Does the false positive >>>>>>> really >>>>>>> impact the utility of the warning? etc. >>>>>>> >>>>>>> I'd hazard a guess that Martin is particularly sensitive to false >>>>>>> positives based on feedback he's received from our developer community >>>>>>> as well as downstream consumers of his work. >>>>>> >>>>>> Yes. The kernel folks in particular have done a lot of work >>>>>> cleaning up their code in an effort to adopt the warning and >>>>>> attribute nonstring. They have been keeping me in the loop >>>>>> on their progress (and feeding me back test cases with false >>>>>> positives and negatives they run into). >>>>> I can't recall seeing further guidance from Richi WRT putting the checks >>>>> earlier (maybe_fold_stmt). >>>>> >>>>> If the point here is to avoid false positives by not folding strncpy, >>>>> particularly in cases where we don't see the NUL in the copy, but it >>>>> appears in a subsequent store, then let's be fairly selective (so as not >>>>> to muck up things on the optimization side more than is necessary). >>>>> >>>>> ISTM we can do this by refactoring the warning bits so they're reusable >>>>> at different points in the pipeline. Those bits would always return a >>>>> boolean indicating if the given statement might generate a warning or >>>>> not. >>>>> >>>>> When called early, they would not actually issue any warning. They >>>>> would merely do the best analysis they can and return a status >>>>> indicating whether or not the statement would generate a warning given >>>>> current context. The goal here is to leave statements that might >>>>> generate a warning as-is in the IL. >>>>> >>>>> When called late (assuming there is a point where we can walk the IL and >>>>> issue the appropriate warnings), the routine would actually issue the >>>>> warning. >>>>> >>>>> The kind of structure could potentially work for other builtins where we >>>>> may need to look at subsequent statements to avoid false positives, but >>>>> early folding hides cases by transforming the call into an undesirable >>>>> form. >>>>> >>>>> Note that for cases where a call looks problematical early because we >>>>> can't see statement which stores the terminator, but where the >>>>> terminator statement ultimately becomes visible, we still get folding, >>>>> it just happens later in the pipeline. >>>>> >>>>> Thoughts? >>>> >>>> The warning only triggers when the bound is less than or equal >>>> to the length of the constant source string (i.e, when strncpy >>>> truncates). So IIUC, your suggestion would defer folding only >>>> such strncpy calls and let gimple_fold_builtin_strncpy fold >>>> those with a constant bound that's greater than the length of >>>> the constant source string. That would be fine with me, but >>>> since strncpy calls with a bound that's greater than the length >>>> of the source are pointless I don't think they are important >>>> enough to worry about folding super early. The constant ones >>>> that serve any purpose (and that are presumably important to >>>> optimize) are those that truncate. >>>> >>>> That said, when optimization isn't enabled, I don't think users >>>> expect calls to library functions to be transformed to calls to >>>> other functions, or inlined. Yet that's just what GCC does. >>>> For example, besides triggering the warning, the following: >>>> >>>> char a[4]; >>>> >>>> void f (char *s) >>>> { >>>> __builtin_strncpy (a, "1234", sizeof a); >>>> a[3] = 0; >>>> } >>>> >>>> is transformed, even at -O0, into: >>>> >>>> f (char * s) >>>> { >>>> : >>>> MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})"1234"]; >>>> a[3] = 0; >>>> return; >>>> } >>>> >>>> That doesn't seem right. GCC should avoid these transformations >>>> at -O0, and one way to do that is to defer folding until the CFG >>>> is constructed. The patch does it for strncpy but a more general >>>> solution would do that for all calls, e.g., in maybe_fold_stmt >>>> as Richard suggested (and I subsequently tested). >>>> >>>> Martin >>> >>