From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 95335 invoked by alias); 16 Nov 2018 09:07:23 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 94849 invoked by uid 89); 16 Nov 2018 09:07:22 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-25.1 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=HX-Google-Smtp-Source:AJdET5fz X-HELO: mail-lj1-f196.google.com Received: from mail-lj1-f196.google.com (HELO mail-lj1-f196.google.com) (209.85.208.196) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 16 Nov 2018 09:07:19 +0000 Received: by mail-lj1-f196.google.com with SMTP id t22-v6so19649287lji.7 for ; Fri, 16 Nov 2018 01:07:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=sRkxArI04nfWu3MtQyFdwZ7mU6fxZRcvhEV/LiqPqPM=; b=teifz6xGj1jdnIFXlrd/908gDsXrRzE3IAGtYg1JPX91fVSBm1GtKrs/x3BPjBqPpo PzxRgTNV1+lwEDwQtFoHSl5DoqnzqyXLY17iVx2YAYv1pzovesT3NCql2yViyF6mn0nd XUgoojKo4/rgy8yCzeGsbWa9iPohSay2W0mqL4Kgk1LboSyblzYak1+hB6DEinbZVuwo lH0EjNI/oOEiagneTLZkwBaPLvorClq3xnxpT7pVSjl6X40HRCRQqLvarSN2pINZY39d 4oeAP87TtLdUIQm5OaFL1oKVx1kZShV3zvejz/fkstQCl+o15jZ2+aw5wpP9hWvgbWri kIJw== MIME-Version: 1.0 References: <88de1ee3-6ee4-d8d9-3e57-3a42474a4169@redhat.com> <27235a6d-6f2c-1f2d-d456-d4cd9b941894@redhat.com> <23ea3d13-d9c5-1b02-f01c-d2a0e11f3a10@redhat.com> <9e3f6d62-47b9-b80f-b8ac-5711628579c5@redhat.com> <09ce3b57-33a3-86ae-1308-39fd02f25228@gmail.com> <1437ae83-c0c2-e18f-0dc8-92717c2fdcfe@gmail.com> In-Reply-To: <1437ae83-c0c2-e18f-0dc8-92717c2fdcfe@gmail.com> From: Richard Biener Date: Fri, 16 Nov 2018 09:07:00 -0000 Message-ID: Subject: Re: [PING #4][PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028) To: Martin Sebor Cc: Jeff Law , GCC Patches Content-Type: multipart/mixed; boundary="0000000000006e234a057ac47e72" X-IsSubscribed: yes X-SW-Source: 2018-11/txt/msg01430.txt.bz2 --0000000000006e234a057ac47e72 Content-Type: text/plain; charset="UTF-8" Content-length: 13566 On Fri, Nov 16, 2018 at 4:12 AM Martin Sebor wrote: > > Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html > > Please let me know if there is something I need to change here > to make the fix acceptable or if I should stop trying. I have one more comment about + /* Defer warning (and folding) until the next statement in the basic + block is reachable. */ + if (!gimple_bb (stmt)) + return false; + it's not about the next statement in the basic-block being "reachable" (even w/o a CFG you can use gsi_next()) but rather that the next stmt isn't yet gimplified and thus not inserted into the gimple sequence, right? You apply this to gimple_fold_builtin_strncpy but I'd rather see us not sprinkling this over gimple-fold.c but instead do this in gimplify.c:maybe_fold_stmt, delaying folding until say lowering. See the attached (untested). Richard. > On 10/31/2018 10:33 AM, Martin Sebor wrote: > > Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html > > > > On 10/20/2018 06:01 PM, Martin Sebor wrote: > >> On 10/16/2018 03:21 PM, Jeff Law wrote: > >>> On 10/4/18 9:51 AM, Martin Sebor wrote: > >>>> On 10/04/2018 08:58 AM, Jeff Law wrote: > >>>>> On 8/27/18 9:42 AM, Richard Biener wrote: > >>>>>> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law wrote: > >>>>>>> > >>>>>>> On 08/27/2018 02:29 AM, Richard Biener wrote: > >>>>>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law wrote: > >>>>>>>>> > >>>>>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote: > >>>>>>>>>> The warning suppression for -Wstringop-truncation looks for > >>>>>>>>>> the next statement after a truncating strncpy to see if it > >>>>>>>>>> adds a terminating nul. This only works when the next > >>>>>>>>>> statement can be reached using the Gimple statement iterator > >>>>>>>>>> which isn't until after gimplification. As a result, strncpy > >>>>>>>>>> calls that truncate their constant argument that are being > >>>>>>>>>> folded to memcpy this early get diagnosed even if they are > >>>>>>>>>> followed by the nul assignment: > >>>>>>>>>> > >>>>>>>>>> const char s[] = "12345"; > >>>>>>>>>> char d[3]; > >>>>>>>>>> > >>>>>>>>>> void f (void) > >>>>>>>>>> { > >>>>>>>>>> strncpy (d, s, sizeof d - 1); // -Wstringop-truncation > >>>>>>>>>> d[sizeof d - 1] = 0; > >>>>>>>>>> } > >>>>>>>>>> > >>>>>>>>>> To avoid the warning I propose to defer folding strncpy to > >>>>>>>>>> memcpy until the pointer to the basic block the strnpy call > >>>>>>>>>> is in can be used to try to reach the next statement (this > >>>>>>>>>> happens as early as ccp1). I'm aware of the preference to > >>>>>>>>>> fold things early but in the case of strncpy (a relatively > >>>>>>>>>> rarely used function that is often misused), getting > >>>>>>>>>> the warning right while folding a bit later but still fairly > >>>>>>>>>> early on seems like a reasonable compromise. I fear that > >>>>>>>>>> otherwise, the false positives will drive users to adopt > >>>>>>>>>> other unsafe solutions (like memcpy) where these kinds of > >>>>>>>>>> bugs cannot be as readily detected. > >>>>>>>>>> > >>>>>>>>>> Tested on x86_64-linux. > >>>>>>>>>> > >>>>>>>>>> Martin > >>>>>>>>>> > >>>>>>>>>> PS There still are outstanding cases where the warning can > >>>>>>>>>> be avoided. I xfailed them in the test for now but will > >>>>>>>>>> still try to get them to work for GCC 9. > >>>>>>>>>> > >>>>>>>>>> gcc-87028.diff > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation > >>>>>>>>>> strncpy with global variable source string > >>>>>>>>>> gcc/ChangeLog: > >>>>>>>>>> > >>>>>>>>>> PR tree-optimization/87028 > >>>>>>>>>> * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid > >>>>>>>>>> folding when > >>>>>>>>>> statement doesn't belong to a basic block. > >>>>>>>>>> * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle > >>>>>>>>>> MEM_REF on > >>>>>>>>>> the left hand side of assignment. > >>>>>>>>>> > >>>>>>>>>> gcc/testsuite/ChangeLog: > >>>>>>>>>> > >>>>>>>>>> PR tree-optimization/87028 > >>>>>>>>>> * c-c++-common/Wstringop-truncation.c: Remove xfails. > >>>>>>>>>> * gcc.dg/Wstringop-truncation-5.c: New test. > >>>>>>>>>> > >>>>>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c > >>>>>>>>>> index 07341eb..284c2fb 100644 > >>>>>>>>>> --- a/gcc/gimple-fold.c > >>>>>>>>>> +++ b/gcc/gimple-fold.c > >>>>>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy > >>>>>>>>>> (gimple_stmt_iterator *gsi, > >>>>>>>>>> if (tree_int_cst_lt (ssize, len)) > >>>>>>>>>> return false; > >>>>>>>>>> > >>>>>>>>>> + /* Defer warning (and folding) until the next statement in the > >>>>>>>>>> basic > >>>>>>>>>> + block is reachable. */ > >>>>>>>>>> + if (!gimple_bb (stmt)) > >>>>>>>>>> + return false; > >>>>>>>>> I think you want cfun->cfg as the test here. They should be > >>>>>>>>> equivalent > >>>>>>>>> in practice. > >>>>>>>> > >>>>>>>> Please do not add 'cfun' references. Note that the next stmt is > >>>>>>>> also accessible > >>>>>>>> when there is no CFG. I guess the issue is that we fold this > >>>>>>>> during > >>>>>>>> gimplification where the next stmt is not yet "there" (but still in > >>>>>>>> GENERIC)? > >>>>>>> That was my assumption. I almost suggested peeking at gsi_next and > >>>>>>> avoiding in that case. > >>>>>> > >>>>>> So I'd rather add guards to maybe_fold_stmt in the gimplifier then. > >>>>> So I think the concern with adding the guards to maybe_fold_stmt is > >>>>> the > >>>>> possibility of further fallout. > >>>>> > >>>>> I guess they could be written to target this case specifically to > >>>>> minimize fallout, but that feels like we're doing the same thing > >>>>> (band-aid) just in a different place. > >>>>> > >>>>> > >>>>> > >>>>>> > >>>>>>>> > >>>>>>>> We generally do not want to have unfolded stmts in the IL when we > >>>>>>>> can avoid that > >>>>>>>> which is why we fold most stmts during gimplification. We also do > >>>>>>>> that because > >>>>>>>> we now do less folding on GENERIC. > >>>>>>> But an unfolded call in the IL should always be safe and we've got > >>>>>>> plenty of opportunities to fold it later. > >>>>>> > >>>>>> Well - we do. The very first one is forwprop though which means > >>>>>> we'll miss to > >>>>>> re-write some memcpy parts into SSA: > >>>>>> > >>>>>> NEXT_PASS (pass_ccp, false /* nonzero_p */); > >>>>>> /* After CCP we rewrite no longer addressed locals into SSA > >>>>>> form if possible. */ > >>>>>> NEXT_PASS (pass_forwprop); > >>>>>> > >>>>>> likewise early object-size will be confused by memcpy calls that just > >>>>>> exist > >>>>>> to avoid TBAA issues (another of our recommendations besides using > >>>>>> unions). > >>>>>> > >>>>>> We do fold mem* early for a reason ;) > >>>>>> > >>>>>> "We can always do warnings earlier" would be a similar true sentence. > >>>>> I'm not disagreeing at all. There's a natural tension between the > >>>>> benefits of folding early to enable more optimizations downstream and > >>>>> leaving the IL in a state where we can give actionable warnings. > >>>> > >>>> Similar trade-offs between folding early and losing information > >>>> as a result also impact high-level optimizations. > >>>> > >>>> For instance, folding the strlen argument below > >>>> > >>>> void f3 (struct A* p) > >>>> { > >>>> __builtin_strcpy (p->a, "123"); > >>>> > >>>> if (__builtin_strlen (p->a + 1) != 2) // not folded > >>>> __builtin_abort (); > >>>> } > >>>> > >>>> into > >>>> > >>>> _2 = &MEM[(void *)p_4(D) + 2B]; > >>>> > >>>> early on defeats the strlen optimization because there is no > >>>> mechanism to determine what member (void *)p_4(D) + 2B refers > >>>> to (this is bug 86955). > >>>> > >>>> Another example is folding of strlen calls with no-nconstant > >>>> offsets into constant strings like here: > >>>> > >>>> const char a[] = "123"; > >>>> > >>>> void f (int i) > >>>> { > >>>> if (__builtin_strlen (&a[i]) > 3) > >>>> __builtin_abort (); > >>>> } > >>>> > >>>> into sizeof a - 1 - i, which then prevents the result from > >>>> being folded to false (bug 86434), not to mention the code > >>>> it emits for out-of-bounds indices. > >>>> > >>>> There are a number of other similar examples in Bugzilla > >>>> that I've filed as I discovered then during testing my > >>>> warnings (e.g., 86572). > >>>> > >>>> In my mind, transforming library calls into "lossy" low-level > >>>> primitives like MEM_REF would be better done only after higher > >>>> level optimizations have had a chance to analyze them. Ditto > >>>> for other similar transformations (like to other library calls). > >>>> Having more accurate information helps both optimization and > >>>> warnings. It also makes the warnings more meaningful. > >>>> Printing "memcpy overflows a buffer" when the source code > >>>> has a call to strncpy is less than ideal. > >>>> > >>>>> Similarly there's a natural tension between warning early vs warning > >>>>> late. Code that triggers the warning may ultimately be proved > >>>>> unreachable, or we may discover simplifications that either > >>>>> suppress or > >>>>> expose a warning. > >>>>> > >>>>> There is no easy answer here. But I think we can legitimately ask > >>>>> questions. ie, does folding strnlen here really improve things > >>>>> downstream in ways that are measurable? Does the false positive > >>>>> really > >>>>> impact the utility of the warning? etc. > >>>>> > >>>>> I'd hazard a guess that Martin is particularly sensitive to false > >>>>> positives based on feedback he's received from our developer community > >>>>> as well as downstream consumers of his work. > >>>> > >>>> Yes. The kernel folks in particular have done a lot of work > >>>> cleaning up their code in an effort to adopt the warning and > >>>> attribute nonstring. They have been keeping me in the loop > >>>> on their progress (and feeding me back test cases with false > >>>> positives and negatives they run into). > >>> I can't recall seeing further guidance from Richi WRT putting the checks > >>> earlier (maybe_fold_stmt). > >>> > >>> If the point here is to avoid false positives by not folding strncpy, > >>> particularly in cases where we don't see the NUL in the copy, but it > >>> appears in a subsequent store, then let's be fairly selective (so as not > >>> to muck up things on the optimization side more than is necessary). > >>> > >>> ISTM we can do this by refactoring the warning bits so they're reusable > >>> at different points in the pipeline. Those bits would always return a > >>> boolean indicating if the given statement might generate a warning or > >>> not. > >>> > >>> When called early, they would not actually issue any warning. They > >>> would merely do the best analysis they can and return a status > >>> indicating whether or not the statement would generate a warning given > >>> current context. The goal here is to leave statements that might > >>> generate a warning as-is in the IL. > >>> > >>> When called late (assuming there is a point where we can walk the IL and > >>> issue the appropriate warnings), the routine would actually issue the > >>> warning. > >>> > >>> The kind of structure could potentially work for other builtins where we > >>> may need to look at subsequent statements to avoid false positives, but > >>> early folding hides cases by transforming the call into an undesirable > >>> form. > >>> > >>> Note that for cases where a call looks problematical early because we > >>> can't see statement which stores the terminator, but where the > >>> terminator statement ultimately becomes visible, we still get folding, > >>> it just happens later in the pipeline. > >>> > >>> Thoughts? > >> > >> The warning only triggers when the bound is less than or equal > >> to the length of the constant source string (i.e, when strncpy > >> truncates). So IIUC, your suggestion would defer folding only > >> such strncpy calls and let gimple_fold_builtin_strncpy fold > >> those with a constant bound that's greater than the length of > >> the constant source string. That would be fine with me, but > >> since strncpy calls with a bound that's greater than the length > >> of the source are pointless I don't think they are important > >> enough to worry about folding super early. The constant ones > >> that serve any purpose (and that are presumably important to > >> optimize) are those that truncate. > >> > >> That said, when optimization isn't enabled, I don't think users > >> expect calls to library functions to be transformed to calls to > >> other functions, or inlined. Yet that's just what GCC does. > >> For example, besides triggering the warning, the following: > >> > >> char a[4]; > >> > >> void f (char *s) > >> { > >> __builtin_strncpy (a, "1234", sizeof a); > >> a[3] = 0; > >> } > >> > >> is transformed, even at -O0, into: > >> > >> f (char * s) > >> { > >> : > >> MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})"1234"]; > >> a[3] = 0; > >> return; > >> } > >> > >> That doesn't seem right. GCC should avoid these transformations > >> at -O0, and one way to do that is to defer folding until the CFG > >> is constructed. The patch does it for strncpy but a more general > >> solution would do that for all calls, e.g., in maybe_fold_stmt > >> as Richard suggested (and I subsequently tested). > >> > >> Martin > > > --0000000000006e234a057ac47e72 Content-Type: application/octet-stream; name=p Content-Disposition: attachment; filename=p Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_jojsurey0 Content-length: 1383 ZGlmZiAtLWdpdCBhL2djYy9naW1wbGUtbG93LmMgYi9nY2MvZ2ltcGxlLWxv dy5jCmluZGV4IGMzNzc3YTFlNzYxLi4xYmI5Njg1MDg4MCAxMDA2NDQKLS0t IGEvZ2NjL2dpbXBsZS1sb3cuYworKysgYi9nY2MvZ2ltcGxlLWxvdy5jCkBA IC0zNzgsNiArMzc4LDEyIEBAIGxvd2VyX3N0bXQgKGdpbXBsZV9zdG10X2l0 ZXJhdG9yICpnc2ksIHN0cnVjdCBsb3dlcl9kYXRhICpkYXRhKQogCSAgICBn c2lfbmV4dCAoZ3NpKTsKIAkgICAgcmV0dXJuOwogCSAgfQorCisJLyogV2Ug ZGVsYXkgZm9sZGluZyBvZiBidWlsdCBjYWxscyBmcm9tIGdpbXBsaWZpY2F0 aW9uIHRvCisJICAgaGVyZSBzbyB0aGUgSUwgaXMgaW4gY29uc2lzdGVudCBz dGF0ZSBmb3IgdGhlIGRpYWdub3N0aWMKKwkgICBtYWNoaW5lcmllcyBqb2Iu ICAqLworCWlmIChnaW1wbGVfY2FsbF9idWlsdGluX3AgKHN0bXQpKQorCSAg Zm9sZF9zdG10IChnc2kpOwogICAgICAgfQogICAgICAgYnJlYWs7CiAKZGlm ZiAtLWdpdCBhL2djYy9naW1wbGlmeS5jIGIvZ2NjL2dpbXBsaWZ5LmMKaW5k ZXggODcwODJhZDEwZDIuLjBmYWM4YTE3MzZiIDEwMDY0NAotLS0gYS9nY2Mv Z2ltcGxpZnkuYworKysgYi9nY2MvZ2ltcGxpZnkuYwpAQCAtMzE4NCw2ICsz MTg0LDEwIEBAIG1heWJlX2ZvbGRfc3RtdCAoZ2ltcGxlX3N0bXRfaXRlcmF0 b3IgKmdzaSkKICAgICAgIHJldHVybiBmYWxzZTsKICAgICBlbHNlIGlmICgo Y3R4LT5yZWdpb25fdHlwZSAmIE9SVF9IT1NUX1RFQU1TKSA9PSBPUlRfSE9T VF9URUFNUykKICAgICAgIHJldHVybiBmYWxzZTsKKyAgLyogRGVsYXkgZm9s ZGluZyBvZiBidWlsdGlucyB1bnRpbCB0aGUgSUwgaXMgaW4gY29uc2lzdGVu dCBzdGF0ZQorICAgICBzbyB0aGUgZGlhZ25vc3RpYyBtYWNoaW5lcnkgY2Fu IGRvIGEgYmV0dGVyIGpvYi4gICovCisgIGlmIChnaW1wbGVfY2FsbF9idWls dGluX3AgKGdzaV9zdG10IChnc2kpKSkKKyAgICByZXR1cm4gZmFsc2U7CiAg IHJldHVybiBmb2xkX3N0bXQgKGdzaSk7CiB9CiAK --0000000000006e234a057ac47e72--