From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 49243 invoked by alias); 31 Oct 2018 16:33:52 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 48923 invoked by uid 89); 31 Oct 2018 16:33:51 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-25.4 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=suppression, hes, Richi, he's X-HELO: mail-qt1-f196.google.com Received: from mail-qt1-f196.google.com (HELO mail-qt1-f196.google.com) (209.85.160.196) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 31 Oct 2018 16:33:48 +0000 Received: by mail-qt1-f196.google.com with SMTP id v1-v6so13572041qtq.5 for ; Wed, 31 Oct 2018 09:33:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:cc:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding; bh=yWFvol6ulT/T7himIMNsnmeDle0S6qPhVOACSY8DAjQ=; b=Ue+xaE3elFrWf7TVaSZ1FQ0QNjloh+Kyo79CitVYgIW5uQZ3/xpSKykxN9IuuVte+S VS7rmA2f7a5TgpfildcCmSvrSgivuBul7Qowh6POP8hSApIUN5P6IcIkawNLAIfhNNfK dtYLvkNfX9sQnmVdVNP+D4IFAx9vFLK9RTGAtgl7n2lOHE6jVjx0IoeIw1Yam9YH3mUH xutaK8g5uq0BSGq5cFOkTtUEfBcjghe8IcWYqUpoN5h3oZoGFVOhgIxAAkTdpcw95ngQ C3s+vXRqftuBaiQ1f+9d4U+JxHNFDjOfXX6Z27DtnN61RpIaZcU0seevcpMsP1FFnzVL fPxQ== Return-Path: Received: from localhost.localdomain (184-96-239-209.hlrn.qwest.net. [184.96.239.209]) by smtp.gmail.com with ESMTPSA id o26-v6sm13987244qtr.10.2018.10.31.09.33.44 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 31 Oct 2018 09:33:44 -0700 (PDT) Subject: [PING #3][PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028) To: Jeff Law , Richard Biener References: <88de1ee3-6ee4-d8d9-3e57-3a42474a4169@redhat.com> <27235a6d-6f2c-1f2d-d456-d4cd9b941894@redhat.com> <23ea3d13-d9c5-1b02-f01c-d2a0e11f3a10@redhat.com> <9e3f6d62-47b9-b80f-b8ac-5711628579c5@redhat.com> Cc: GCC Patches From: Martin Sebor Message-ID: <09ce3b57-33a3-86ae-1308-39fd02f25228@gmail.com> Date: Wed, 31 Oct 2018 17:07:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2018-10/txt/msg02061.txt.bz2 Ping: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01818.html On 10/20/2018 06:01 PM, Martin Sebor wrote: > On 10/16/2018 03:21 PM, Jeff Law wrote: >> On 10/4/18 9:51 AM, Martin Sebor wrote: >>> On 10/04/2018 08:58 AM, Jeff Law wrote: >>>> On 8/27/18 9:42 AM, Richard Biener wrote: >>>>> On Mon, Aug 27, 2018 at 5:32 PM Jeff Law wrote: >>>>>> >>>>>> On 08/27/2018 02:29 AM, Richard Biener wrote: >>>>>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law wrote: >>>>>>>> >>>>>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote: >>>>>>>>> The warning suppression for -Wstringop-truncation looks for >>>>>>>>> the next statement after a truncating strncpy to see if it >>>>>>>>> adds a terminating nul. This only works when the next >>>>>>>>> statement can be reached using the Gimple statement iterator >>>>>>>>> which isn't until after gimplification. As a result, strncpy >>>>>>>>> calls that truncate their constant argument that are being >>>>>>>>> folded to memcpy this early get diagnosed even if they are >>>>>>>>> followed by the nul assignment: >>>>>>>>> >>>>>>>>> const char s[] = "12345"; >>>>>>>>> char d[3]; >>>>>>>>> >>>>>>>>> void f (void) >>>>>>>>> { >>>>>>>>> strncpy (d, s, sizeof d - 1); // -Wstringop-truncation >>>>>>>>> d[sizeof d - 1] = 0; >>>>>>>>> } >>>>>>>>> >>>>>>>>> To avoid the warning I propose to defer folding strncpy to >>>>>>>>> memcpy until the pointer to the basic block the strnpy call >>>>>>>>> is in can be used to try to reach the next statement (this >>>>>>>>> happens as early as ccp1). I'm aware of the preference to >>>>>>>>> fold things early but in the case of strncpy (a relatively >>>>>>>>> rarely used function that is often misused), getting >>>>>>>>> the warning right while folding a bit later but still fairly >>>>>>>>> early on seems like a reasonable compromise. I fear that >>>>>>>>> otherwise, the false positives will drive users to adopt >>>>>>>>> other unsafe solutions (like memcpy) where these kinds of >>>>>>>>> bugs cannot be as readily detected. >>>>>>>>> >>>>>>>>> Tested on x86_64-linux. >>>>>>>>> >>>>>>>>> Martin >>>>>>>>> >>>>>>>>> PS There still are outstanding cases where the warning can >>>>>>>>> be avoided. I xfailed them in the test for now but will >>>>>>>>> still try to get them to work for GCC 9. >>>>>>>>> >>>>>>>>> gcc-87028.diff >>>>>>>>> >>>>>>>>> >>>>>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation >>>>>>>>> strncpy with global variable source string >>>>>>>>> gcc/ChangeLog: >>>>>>>>> >>>>>>>>> PR tree-optimization/87028 >>>>>>>>> * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid >>>>>>>>> folding when >>>>>>>>> statement doesn't belong to a basic block. >>>>>>>>> * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle >>>>>>>>> MEM_REF on >>>>>>>>> the left hand side of assignment. >>>>>>>>> >>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>> >>>>>>>>> PR tree-optimization/87028 >>>>>>>>> * c-c++-common/Wstringop-truncation.c: Remove xfails. >>>>>>>>> * gcc.dg/Wstringop-truncation-5.c: New test. >>>>>>>>> >>>>>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c >>>>>>>>> index 07341eb..284c2fb 100644 >>>>>>>>> --- a/gcc/gimple-fold.c >>>>>>>>> +++ b/gcc/gimple-fold.c >>>>>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy >>>>>>>>> (gimple_stmt_iterator *gsi, >>>>>>>>> if (tree_int_cst_lt (ssize, len)) >>>>>>>>> return false; >>>>>>>>> >>>>>>>>> + /* Defer warning (and folding) until the next statement in the >>>>>>>>> basic >>>>>>>>> + block is reachable. */ >>>>>>>>> + if (!gimple_bb (stmt)) >>>>>>>>> + return false; >>>>>>>> I think you want cfun->cfg as the test here. They should be >>>>>>>> equivalent >>>>>>>> in practice. >>>>>>> >>>>>>> Please do not add 'cfun' references. Note that the next stmt is >>>>>>> also accessible >>>>>>> when there is no CFG. I guess the issue is that we fold this during >>>>>>> gimplification where the next stmt is not yet "there" (but still in >>>>>>> GENERIC)? >>>>>> That was my assumption. I almost suggested peeking at gsi_next and >>>>>> avoiding in that case. >>>>> >>>>> So I'd rather add guards to maybe_fold_stmt in the gimplifier then. >>>> So I think the concern with adding the guards to maybe_fold_stmt is the >>>> possibility of further fallout. >>>> >>>> I guess they could be written to target this case specifically to >>>> minimize fallout, but that feels like we're doing the same thing >>>> (band-aid) just in a different place. >>>> >>>> >>>> >>>>> >>>>>>> >>>>>>> We generally do not want to have unfolded stmts in the IL when we >>>>>>> can avoid that >>>>>>> which is why we fold most stmts during gimplification. We also do >>>>>>> that because >>>>>>> we now do less folding on GENERIC. >>>>>> But an unfolded call in the IL should always be safe and we've got >>>>>> plenty of opportunities to fold it later. >>>>> >>>>> Well - we do. The very first one is forwprop though which means >>>>> we'll miss to >>>>> re-write some memcpy parts into SSA: >>>>> >>>>> NEXT_PASS (pass_ccp, false /* nonzero_p */); >>>>> /* After CCP we rewrite no longer addressed locals into SSA >>>>> form if possible. */ >>>>> NEXT_PASS (pass_forwprop); >>>>> >>>>> likewise early object-size will be confused by memcpy calls that just >>>>> exist >>>>> to avoid TBAA issues (another of our recommendations besides using >>>>> unions). >>>>> >>>>> We do fold mem* early for a reason ;) >>>>> >>>>> "We can always do warnings earlier" would be a similar true sentence. >>>> I'm not disagreeing at all. There's a natural tension between the >>>> benefits of folding early to enable more optimizations downstream and >>>> leaving the IL in a state where we can give actionable warnings. >>> >>> Similar trade-offs between folding early and losing information >>> as a result also impact high-level optimizations. >>> >>> For instance, folding the strlen argument below >>> >>> void f3 (struct A* p) >>> { >>> __builtin_strcpy (p->a, "123"); >>> >>> if (__builtin_strlen (p->a + 1) != 2) // not folded >>> __builtin_abort (); >>> } >>> >>> into >>> >>> _2 = &MEM[(void *)p_4(D) + 2B]; >>> >>> early on defeats the strlen optimization because there is no >>> mechanism to determine what member (void *)p_4(D) + 2B refers >>> to (this is bug 86955). >>> >>> Another example is folding of strlen calls with no-nconstant >>> offsets into constant strings like here: >>> >>> const char a[] = "123"; >>> >>> void f (int i) >>> { >>> if (__builtin_strlen (&a[i]) > 3) >>> __builtin_abort (); >>> } >>> >>> into sizeof a - 1 - i, which then prevents the result from >>> being folded to false (bug 86434), not to mention the code >>> it emits for out-of-bounds indices. >>> >>> There are a number of other similar examples in Bugzilla >>> that I've filed as I discovered then during testing my >>> warnings (e.g., 86572). >>> >>> In my mind, transforming library calls into "lossy" low-level >>> primitives like MEM_REF would be better done only after higher >>> level optimizations have had a chance to analyze them. Ditto >>> for other similar transformations (like to other library calls). >>> Having more accurate information helps both optimization and >>> warnings. It also makes the warnings more meaningful. >>> Printing "memcpy overflows a buffer" when the source code >>> has a call to strncpy is less than ideal. >>> >>>> Similarly there's a natural tension between warning early vs warning >>>> late. Code that triggers the warning may ultimately be proved >>>> unreachable, or we may discover simplifications that either suppress or >>>> expose a warning. >>>> >>>> There is no easy answer here. But I think we can legitimately ask >>>> questions. ie, does folding strnlen here really improve things >>>> downstream in ways that are measurable? Does the false positive really >>>> impact the utility of the warning? etc. >>>> >>>> I'd hazard a guess that Martin is particularly sensitive to false >>>> positives based on feedback he's received from our developer community >>>> as well as downstream consumers of his work. >>> >>> Yes. The kernel folks in particular have done a lot of work >>> cleaning up their code in an effort to adopt the warning and >>> attribute nonstring. They have been keeping me in the loop >>> on their progress (and feeding me back test cases with false >>> positives and negatives they run into). >> I can't recall seeing further guidance from Richi WRT putting the checks >> earlier (maybe_fold_stmt). >> >> If the point here is to avoid false positives by not folding strncpy, >> particularly in cases where we don't see the NUL in the copy, but it >> appears in a subsequent store, then let's be fairly selective (so as not >> to muck up things on the optimization side more than is necessary). >> >> ISTM we can do this by refactoring the warning bits so they're reusable >> at different points in the pipeline. Those bits would always return a >> boolean indicating if the given statement might generate a warning or >> not. >> >> When called early, they would not actually issue any warning. They >> would merely do the best analysis they can and return a status >> indicating whether or not the statement would generate a warning given >> current context. The goal here is to leave statements that might >> generate a warning as-is in the IL. >> >> When called late (assuming there is a point where we can walk the IL and >> issue the appropriate warnings), the routine would actually issue the >> warning. >> >> The kind of structure could potentially work for other builtins where we >> may need to look at subsequent statements to avoid false positives, but >> early folding hides cases by transforming the call into an undesirable >> form. >> >> Note that for cases where a call looks problematical early because we >> can't see statement which stores the terminator, but where the >> terminator statement ultimately becomes visible, we still get folding, >> it just happens later in the pipeline. >> >> Thoughts? > > The warning only triggers when the bound is less than or equal > to the length of the constant source string (i.e, when strncpy > truncates). So IIUC, your suggestion would defer folding only > such strncpy calls and let gimple_fold_builtin_strncpy fold > those with a constant bound that's greater than the length of > the constant source string. That would be fine with me, but > since strncpy calls with a bound that's greater than the length > of the source are pointless I don't think they are important > enough to worry about folding super early. The constant ones > that serve any purpose (and that are presumably important to > optimize) are those that truncate. > > That said, when optimization isn't enabled, I don't think users > expect calls to library functions to be transformed to calls to > other functions, or inlined. Yet that's just what GCC does. > For example, besides triggering the warning, the following: > > char a[4]; > > void f (char *s) > { > __builtin_strncpy (a, "1234", sizeof a); > a[3] = 0; > } > > is transformed, even at -O0, into: > > f (char * s) > { > : > MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})"1234"]; > a[3] = 0; > return; > } > > That doesn't seem right. GCC should avoid these transformations > at -O0, and one way to do that is to defer folding until the CFG > is constructed. The patch does it for strncpy but a more general > solution would do that for all calls, e.g., in maybe_fold_stmt > as Richard suggested (and I subsequently tested). > > Martin