From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 120000 invoked by alias); 28 Aug 2018 09:56:13 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 119986 invoked by uid 89); 28 Aug 2018 09:56:12 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-25.4 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mail-lf1-f67.google.com Received: from mail-lf1-f67.google.com (HELO mail-lf1-f67.google.com) (209.85.167.67) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 28 Aug 2018 09:56:09 +0000 Received: by mail-lf1-f67.google.com with SMTP id l26-v6so854339lfc.8 for ; Tue, 28 Aug 2018 02:56:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Ea9AIwLbLNJfGIPzXznMLFZgbOrKqwdnfTNfzW9Sxdg=; b=EL5Aq+M/Z7zbwvWHnvO6K9lts0GFMtiOsTQf06J5O1NeHlwiiFuoyN0eCT6JLvZKZM TtI+k58T7rgK10B+9pQQzIHsbor1FU1HioXzOo0a9o2ySOxLujEYHdIr7wbv6CZXHIr7 bVbhKaksH5FNHc+JaLfd5UfolL/1czAZ9+ymPUc3qyVNXzqG4ji7zafoYHgbbxh/rZem GAr2hwuv1uRBWSmd0YHn92pQNmAJWm7AfoG8m0i/tW4ZKOHZvyfbhbRkTzJlhMd/mKGH hoCs/U6LFhiSIm2TaCZfZTyjFmojFZ5pGzTcWwCgwEv5Ju+te7G5SYNQabB3mDeOBUNd gh7A== MIME-Version: 1.0 References: <88de1ee3-6ee4-d8d9-3e57-3a42474a4169@redhat.com> <4e613d61-40ad-f05f-9107-5fd7a5c2fdb6@gmail.com> In-Reply-To: From: Richard Biener Date: Tue, 28 Aug 2018 09:56:00 -0000 Message-ID: Subject: Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028) To: Jeff Law Cc: Martin Sebor , GCC Patches Content-Type: text/plain; charset="UTF-8" X-IsSubscribed: yes X-SW-Source: 2018-08/txt/msg01741.txt.bz2 On Tue, Aug 28, 2018 at 6:27 AM Jeff Law wrote: > > On 08/27/2018 10:27 AM, Martin Sebor wrote: > > On 08/27/2018 02:29 AM, Richard Biener wrote: > >> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law wrote: > >>> > >>> On 08/24/2018 09:58 AM, Martin Sebor wrote: > >>>> The warning suppression for -Wstringop-truncation looks for > >>>> the next statement after a truncating strncpy to see if it > >>>> adds a terminating nul. This only works when the next > >>>> statement can be reached using the Gimple statement iterator > >>>> which isn't until after gimplification. As a result, strncpy > >>>> calls that truncate their constant argument that are being > >>>> folded to memcpy this early get diagnosed even if they are > >>>> followed by the nul assignment: > >>>> > >>>> const char s[] = "12345"; > >>>> char d[3]; > >>>> > >>>> void f (void) > >>>> { > >>>> strncpy (d, s, sizeof d - 1); // -Wstringop-truncation > >>>> d[sizeof d - 1] = 0; > >>>> } > >>>> > >>>> To avoid the warning I propose to defer folding strncpy to > >>>> memcpy until the pointer to the basic block the strnpy call > >>>> is in can be used to try to reach the next statement (this > >>>> happens as early as ccp1). I'm aware of the preference to > >>>> fold things early but in the case of strncpy (a relatively > >>>> rarely used function that is often misused), getting > >>>> the warning right while folding a bit later but still fairly > >>>> early on seems like a reasonable compromise. I fear that > >>>> otherwise, the false positives will drive users to adopt > >>>> other unsafe solutions (like memcpy) where these kinds of > >>>> bugs cannot be as readily detected. > >>>> > >>>> Tested on x86_64-linux. > >>>> > >>>> Martin > >>>> > >>>> PS There still are outstanding cases where the warning can > >>>> be avoided. I xfailed them in the test for now but will > >>>> still try to get them to work for GCC 9. > >>>> > >>>> gcc-87028.diff > >>>> > >>>> > >>>> PR tree-optimization/87028 - false positive -Wstringop-truncation > >>>> strncpy with global variable source string > >>>> gcc/ChangeLog: > >>>> > >>>> PR tree-optimization/87028 > >>>> * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when > >>>> statement doesn't belong to a basic block. > >>>> * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on > >>>> the left hand side of assignment. > >>>> > >>>> gcc/testsuite/ChangeLog: > >>>> > >>>> PR tree-optimization/87028 > >>>> * c-c++-common/Wstringop-truncation.c: Remove xfails. > >>>> * gcc.dg/Wstringop-truncation-5.c: New test. > >>>> > >>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c > >>>> index 07341eb..284c2fb 100644 > >>>> --- a/gcc/gimple-fold.c > >>>> +++ b/gcc/gimple-fold.c > >>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy > >>>> (gimple_stmt_iterator *gsi, > >>>> if (tree_int_cst_lt (ssize, len)) > >>>> return false; > >>>> > >>>> + /* Defer warning (and folding) until the next statement in the basic > >>>> + block is reachable. */ > >>>> + if (!gimple_bb (stmt)) > >>>> + return false; > >>> I think you want cfun->cfg as the test here. They should be equivalent > >>> in practice. > >> > >> Please do not add 'cfun' references. Note that the next stmt is also > >> accessible > >> when there is no CFG. I guess the issue is that we fold this during > >> gimplification > >> where the next stmt is not yet "there" (but still in GENERIC)? > >> > >> We generally do not want to have unfolded stmts in the IL when we can > >> avoid that > >> which is why we fold most stmts during gimplification. We also do > >> that because > >> we now do less folding on GENERIC. > >> > >> There may be the possibility to refactor gimplification time folding > >> to what we > >> do during inlining - queue stmts we want to fold and perform all > >> folding delayed. > >> This of course means bigger compile-time due to cache effects. > >> > >>> > >>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c > >>>> index d0792aa..f1988f6 100644 > >>>> --- a/gcc/tree-ssa-strlen.c > >>>> +++ b/gcc/tree-ssa-strlen.c > >>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc > >>>> (gimple_stmt_iterator gsi, tree src, tree cnt) > >>>> && known_eq (dstoff, lhsoff) > >>>> && operand_equal_p (dstbase, lhsbase, 0)) > >>>> return false; > >>>> + > >>>> + if (code == MEM_REF > >>>> + && TREE_CODE (lhsbase) == SSA_NAME > >>>> + && known_eq (dstoff, lhsoff)) > >>>> + { > >>>> + /* Extract the referenced variable from something like > >>>> + MEM[(char *)d_3(D) + 3B] = 0; */ > >>>> + gimple *def = SSA_NAME_DEF_STMT (lhsbase); > >>>> + if (gimple_nop_p (def)) > >>>> + { > >>>> + lhsbase = SSA_NAME_VAR (lhsbase); > >>>> + if (lhsbase > >>>> + && dstbase > >>>> + && operand_equal_p (dstbase, lhsbase, 0)) > >>>> + return false; > >>>> + } > >>>> + } > >>> If you find yourself looking at SSA_NAME_VAR, you're usually barking up > >>> the wrong tree. It'd be easier to suggest something here if I could see > >>> the gimple (with virtual operands). BUt at some level what you really > >>> want to do is make sure the base of the MEM_REF is the same as what got > >>> passed as the destination of the strncpy. You'd want to be testing > >>> SSA_NAMEs in that case. > >> > >> Yes. Why not simply compare the SSA names? Why would it be > >> not OK to do that when !lhsbase? > > > > The added code handles this case: > > > > void f (char *d) > > { > > __builtin_strncpy (d, "12345", 4); > > d[3] = 0; > > } > > > > where during forwprop we see: > > > > __builtin_strncpy (d_3(D), "12345", 4); > > MEM[(char *)d_3(D) + 3B] = 0; > > > > The next statement after the strncpy is the assignment whose > > lhs is the MEM_REF with a GIMPLE_NOP as an operand. There > > is no other information in the GIMPLE_NOP that I can see to > > tell that the operand is d_3(D) or that it's the same as > > the strncpy argument (i.e., the PARAM_DECl d). Having to > > do open-code this all the time seems so cumbersome -- is > > there some API that would do this for me? (I thought > > get_addr_base_and_unit_offset was that API but clearly in > > this case it doesn't do what I expect -- it just returns > > the argument.) > > I think you need to look harder at that MEM_REF. It references d_3. > That's what you need to be checking. The base (d_3) is the first > operand of the MEM_REF, the offset is the second operand of the MEM_REF. > > (gdb) p debug_gimple_stmt ($2) > # .MEM_5 = VDEF <.MEM_4> > MEM[(char *)d_3(D) + 3B] = 0; > > > (gdb) p gimple_assign_lhs ($2) > $5 = (tree_node *) 0x7ffff01a6208 > > (gdb) p debug_tree ($5) > type size > unit-size > align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type > 0x7ffff00723f0 precision:8 min max > > pointer_to_this > > > arg:0 type 0x7ffff00723f0 char> > public unsigned DI > size > unit-size > align:64 warn_if_not_align:0 symtab:0 alias-set -1 > canonical-type 0x7ffff007de70 reference_to_this 0x7ffff017d738>> > visited var > def_stmt GIMPLE_NOP > version:3> > arg:1 > constant 3> > j.c:4:6 start: j.c:4:5 finish: j.c:4:8> > > > Note arg:0 is the SSA_NAME d_3. And not surprising that's lhsbase: > > (gdb) p debug_tree (lhsbase) > type type size > unit-size > align:8 warn_if_not_align:0 symtab:0 alias-set -1 > canonical-type 0x7ffff00723f0 precision:8 min 0x7ffff0059dc8 -128> max > pointer_to_this > > public unsigned DI > size > unit-size > align:64 warn_if_not_align:0 symtab:0 alias-set -1 > canonical-type 0x7ffff007de70 reference_to_this 0x7ffff017d738>> > visited var > def_stmt GIMPLE_NOP > version:3> > > > Sadly, dstbase is the PARM_DECL for d. That's where things are going > "wrong". Not sure why you're getting the PARM_DECL in that case. I'd > debug get_addr_base_and_unit_offset to understand what's going on. > Essentially you're getting different results of > get_addr_base_and_unit_offset in a case where they arguably should be > the same. Probably get_attr_nonstring_decl has the same "mistake" and returns the PARM_DECL instead of the SSA name pointer. So we're comparing apples and oranges here. Yeah: /* If EXPR refers to a character array or pointer declared attribute nonstring return a decl for that array or pointer and set *REF to the referenced enclosing object or pointer. Otherwise returns null. */ tree get_attr_nonstring_decl (tree expr, tree *ref) { tree decl = expr; if (TREE_CODE (decl) == SSA_NAME) { gimple *def = SSA_NAME_DEF_STMT (decl); if (is_gimple_assign (def)) { tree_code code = gimple_assign_rhs_code (def); if (code == ADDR_EXPR || code == COMPONENT_REF || code == VAR_DECL) decl = gimple_assign_rhs1 (def); } else if (tree var = SSA_NAME_VAR (decl)) decl = var; } if (TREE_CODE (decl) == ADDR_EXPR) decl = TREE_OPERAND (decl, 0); if (ref) *ref = decl; I see a lot of "magic" here again in the attempt to "propagate" a nonstring attribute. Note foo (char *p __attribute__(("nonstring"))) { p = "bar"; strlen (p); // or whatever is necessary to call get_attr_nonstring_decl } is perfectly valid and p as passed to strlen is _not_ nonstring(?). I think in your code comparing bases you want to look at the _original_ argument to the string function rather than what get_attr_nonstring_decl returned as ref. Richard. > Jeff > > Jeff