From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)
To: Martin Sebor, Richard Biener
Cc: GCC Patches
References: <88de1ee3-6ee4-d8d9-3e57-3a42474a4169@redhat.com> <4e613d61-40ad-f05f-9107-5fd7a5c2fdb6@gmail.com>
From: Jeff Law
Date: Tue, 28 Aug 2018 04:27:00 -0000
In-Reply-To: <4e613d61-40ad-f05f-9107-5fd7a5c2fdb6@gmail.com>

On 08/27/2018 10:27 AM, Martin Sebor wrote:
> On 08/27/2018 02:29 AM, Richard Biener wrote:
>> On Sun, Aug 26,
2018 at 7:26 AM Jeff Law wrote:
>>>
>>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>>>> The warning suppression for -Wstringop-truncation looks for
>>>> the next statement after a truncating strncpy to see if it
>>>> adds a terminating nul.  This only works when the next
>>>> statement can be reached using the Gimple statement iterator
>>>> which isn't until after gimplification.  As a result, strncpy
>>>> calls that truncate their constant argument that are being
>>>> folded to memcpy this early get diagnosed even if they are
>>>> followed by the nul assignment:
>>>>
>>>>   const char s[] = "12345";
>>>>   char d[3];
>>>>
>>>>   void f (void)
>>>>   {
>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>>>     d[sizeof d - 1] = 0;
>>>>   }
>>>>
>>>> To avoid the warning I propose to defer folding strncpy to
>>>> memcpy until the pointer to the basic block the strncpy call
>>>> is in can be used to try to reach the next statement (this
>>>> happens as early as ccp1).  I'm aware of the preference to
>>>> fold things early but in the case of strncpy (a relatively
>>>> rarely used function that is often misused), getting
>>>> the warning right while folding a bit later but still fairly
>>>> early on seems like a reasonable compromise.  I fear that
>>>> otherwise, the false positives will drive users to adopt
>>>> other unsafe solutions (like memcpy) where these kinds of
>>>> bugs cannot be as readily detected.
>>>>
>>>> Tested on x86_64-linux.
>>>>
>>>> Martin
>>>>
>>>> PS There still are outstanding cases where the warning can
>>>> be avoided.  I xfailed them in the test for now but will
>>>> still try to get them to work for GCC 9.
>>>>
>>>> gcc-87028.diff
>>>>
>>>>
>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>>>> strncpy with global variable source string
>>>> gcc/ChangeLog:
>>>>
>>>>       PR tree-optimization/87028
>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>>>>       statement doesn't belong to a basic block.
>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>>>>       the left hand side of assignment.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>       PR tree-optimization/87028
>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>>>       * gcc.dg/Wstringop-truncation-5.c: New test.
>>>>
>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>>>> index 07341eb..284c2fb 100644
>>>> --- a/gcc/gimple-fold.c
>>>> +++ b/gcc/gimple-fold.c
>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
>>>>    if (tree_int_cst_lt (ssize, len))
>>>>      return false;
>>>>
>>>> +  /* Defer warning (and folding) until the next statement in the basic
>>>> +     block is reachable.  */
>>>> +  if (!gimple_bb (stmt))
>>>> +    return false;
>>> I think you want cfun->cfg as the test here.  They should be equivalent
>>> in practice.
>>
>> Please do not add 'cfun' references.  Note that the next stmt is also
>> accessible when there is no CFG.  I guess the issue is that we fold this
>> during gimplification where the next stmt is not yet "there" (but still
>> in GENERIC)?
>>
>> We generally do not want to have unfolded stmts in the IL when we can
>> avoid that which is why we fold most stmts during gimplification.  We
>> also do that because we now do less folding on GENERIC.
>>
>> There may be the possibility to refactor gimplification time folding
>> to what we do during inlining - queue stmts we want to fold and perform
>> all folding delayed.  This of course means bigger compile-time due to
>> cache effects.
>>
>>>
>>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>>> index d0792aa..f1988f6 100644
>>>> --- a/gcc/tree-ssa-strlen.c
>>>> +++ b/gcc/tree-ssa-strlen.c
>>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi, tree src, tree cnt)
>>>>         && known_eq (dstoff, lhsoff)
>>>>         && operand_equal_p (dstbase, lhsbase, 0))
>>>>       return false;
>>>> +
>>>> +      if (code == MEM_REF
>>>> +       && TREE_CODE (lhsbase) == SSA_NAME
>>>> +       && known_eq (dstoff, lhsoff))
>>>> +     {
>>>> +       /* Extract the referenced variable from something like
>>>> +            MEM[(char *)d_3(D) + 3B] = 0;  */
>>>> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>>>> +       if (gimple_nop_p (def))
>>>> +         {
>>>> +           lhsbase = SSA_NAME_VAR (lhsbase);
>>>> +           if (lhsbase
>>>> +               && dstbase
>>>> +               && operand_equal_p (dstbase, lhsbase, 0))
>>>> +             return false;
>>>> +         }
>>>> +     }
>>> If you find yourself looking at SSA_NAME_VAR, you're usually barking up
>>> the wrong tree.  It'd be easier to suggest something here if I could see
>>> the gimple (with virtual operands).  But at some level what you really
>>> want to do is make sure the base of the MEM_REF is the same as what got
>>> passed as the destination of the strncpy.  You'd want to be testing
>>> SSA_NAMEs in that case.
>>
>> Yes.  Why not simply compare the SSA names?  Why would it be
>> not OK to do that when !lhsbase?
>
> The added code handles this case:
>
>   void f (char *d)
>   {
>     __builtin_strncpy (d, "12345", 4);
>     d[3] = 0;
>   }
>
> where during forwprop we see:
>
>   __builtin_strncpy (d_3(D), "12345", 4);
>   MEM[(char *)d_3(D) + 3B] = 0;
>
> The next statement after the strncpy is the assignment whose
> lhs is the MEM_REF with a GIMPLE_NOP as an operand.
> There is no other information in the GIMPLE_NOP that I can see to
> tell that the operand is d_3(D) or that it's the same as
> the strncpy argument (i.e., the PARM_DECL d).  Having to
> open-code this all the time seems so cumbersome -- is
> there some API that would do this for me?  (I thought
> get_addr_base_and_unit_offset was that API but clearly in
> this case it doesn't do what I expect -- it just returns
> the argument.)

I think you need to look harder at that MEM_REF.  It references d_3.
That's what you need to be checking.  The base (d_3) is the first
operand of the MEM_REF, the offset is the second operand of the MEM_REF.

(gdb) p debug_gimple_stmt ($2)
# .MEM_5 = VDEF <.MEM_4>
MEM[(char *)d_3(D) + 3B] = 0;
(gdb) p gimple_assign_lhs ($2)
$5 = (tree_node *) 0x7ffff01a6208
(gdb) p debug_tree ($5)
    unit-size align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff00723f0 precision:8 min max pointer_to_this >
    arg:0 public unsigned DI size unit-size align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff007de70 reference_to_this >
        visited var def_stmt GIMPLE_NOP version:3>
    arg:1 constant 3>
    j.c:4:6 start: j.c:4:5 finish: j.c:4:8>

Note arg:0 is the SSA_NAME d_3.  And not surprisingly that's lhsbase:

(gdb) p debug_tree (lhsbase)
    unit-size align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff00723f0 precision:8 min max pointer_to_this >
    public unsigned DI size unit-size align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff007de70 reference_to_this >
    visited var def_stmt GIMPLE_NOP version:3>

Sadly, dstbase is the PARM_DECL for d.  That's where things are going
"wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
debug get_addr_base_and_unit_offset to understand what's going on.
Essentially you're getting different results of
get_addr_base_and_unit_offset in a case where they arguably should be
the same.

Jeff
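
For illustration only, the mismatch described above can be modelled
outside of GCC.  The sketch below is a toy: toy_node, toy_code,
toy_store_matches_dest and toy_demo are hypothetical names invented
here, not GCC types or APIs, and nodes are compared by pointer identity
as a stand-in for comparing SSA names.  It assumes the store is
MEM[(char *)d_3(D) + 3B] = 0 and shows that the check succeeds when the
destination base is the SSA_NAME d_3 and fails when it is the PARM_DECL
d.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for tree nodes.  These are NOT GCC types; they only
   mirror the shape of the dump above.  */
enum toy_code { TOY_PARM_DECL, TOY_SSA_NAME, TOY_MEM_REF };

struct toy_node
{
  enum toy_code code;
  const struct toy_node *op0;  /* MEM_REF: base.  SSA_NAME: underlying decl.  */
  long op1;                    /* MEM_REF: byte offset; unused otherwise.  */
};

/* Return true if LHS is a MEM_REF whose base is exactly DSTBASE and
   whose offset equals DSTOFF.  Nodes are compared by identity.  */
static bool
toy_store_matches_dest (const struct toy_node *lhs,
                        const struct toy_node *dstbase, long dstoff)
{
  return (lhs->code == TOY_MEM_REF
          && lhs->op0 == dstbase
          && lhs->op1 == dstoff);
}

/* Model "MEM[(char *)d_3(D) + 3B] = 0" and check it against both
   candidate destination bases.  */
void
toy_demo (void)
{
  const struct toy_node d = { TOY_PARM_DECL, NULL, 0 };
  const struct toy_node d_3 = { TOY_SSA_NAME, &d, 0 };
  const struct toy_node mem = { TOY_MEM_REF, &d_3, 3 };

  /* Destination base is the SSA_NAME: the nul store is recognized.  */
  assert (toy_store_matches_dest (&mem, &d_3, 3));

  /* Destination base is the PARM_DECL: the same store is missed.  */
  assert (!toy_store_matches_dest (&mem, &d, 3));
}
```

Calling toy_demo () from a test driver exercises both cases.  The model
says nothing about why get_addr_base_and_unit_offset yields the
PARM_DECL; it only shows why the two bases have to be the same kind of
node for the comparison to work.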