From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 117202 invoked by alias); 28 Aug 2018 22:17:19 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 117191 invoked by uid 89); 28 Aug 2018 22:17:18 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-25.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 28 Aug 2018 22:17:16 +0000 Received: from smtp.corp.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.26]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 5FF6C317C40E; Tue, 28 Aug 2018 22:17:15 +0000 (UTC) Received: from localhost.localdomain (ovpn-112-3.rdu2.redhat.com [10.10.112.3]) by smtp.corp.redhat.com (Postfix) with ESMTP id 228A130912F4; Tue, 28 Aug 2018 22:17:13 +0000 (UTC) Subject: Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028) To: Martin Sebor , Richard Biener Cc: GCC Patches References: <88de1ee3-6ee4-d8d9-3e57-3a42474a4169@redhat.com> <4e613d61-40ad-f05f-9107-5fd7a5c2fdb6@gmail.com> <2fb5625b-22d2-e80a-9320-d931a3cb2343@gmail.com> From: Jeff Law Openpgp: preference=signencrypt Message-ID: <8f35e59d-c94f-b0c1-f9cd-519dcfe723d1@redhat.com> Date: Tue, 28 Aug 2018 22:17:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <2fb5625b-22d2-e80a-9320-d931a3cb2343@gmail.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-IsSubscribed: yes X-SW-Source: 2018-08/txt/msg01808.txt.bz2 On 08/28/2018 02:43 PM, Martin Sebor wrote: > On 08/27/2018 10:27 PM, Jeff Law wrote: >> On 08/27/2018 10:27 AM, Martin Sebor wrote: >>> On 08/27/2018 02:29 AM, Richard Biener wrote: >>>> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law wrote: >>>>> >>>>> On 08/24/2018 09:58 AM, Martin Sebor wrote: >>>>>> The warning suppression for -Wstringop-truncation looks for >>>>>> the next statement after a truncating strncpy to see if it >>>>>> adds a terminating nul.  This only works when the next >>>>>> statement can be reached using the Gimple statement iterator >>>>>> which isn't until after gimplification.  As a result, strncpy >>>>>> calls that truncate their constant argument that are being >>>>>> folded to memcpy this early get diagnosed even if they are >>>>>> followed by the nul assignment: >>>>>> >>>>>>   const char s[] = "12345"; >>>>>>   char d[3]; >>>>>> >>>>>>   void f (void) >>>>>>   { >>>>>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation >>>>>>     d[sizeof d - 1] = 0; >>>>>>   } >>>>>> >>>>>> To avoid the warning I propose to defer folding strncpy to >>>>>> memcpy until the pointer to the basic block the strnpy call >>>>>> is in can be used to try to reach the next statement (this >>>>>> happens as early as ccp1).  I'm aware of the preference to >>>>>> fold things early but in the case of strncpy (a relatively >>>>>> rarely used function that is often misused), getting >>>>>> the warning right while folding a bit later but still fairly >>>>>> early on seems like a reasonable compromise.  I fear that >>>>>> otherwise, the false positives will drive users to adopt >>>>>> other unsafe solutions (like memcpy) where these kinds of >>>>>> bugs cannot be as readily detected. >>>>>> >>>>>> Tested on x86_64-linux. >>>>>> >>>>>> Martin >>>>>> >>>>>> PS There still are outstanding cases where the warning can >>>>>> be avoided.  I xfailed them in the test for now but will >>>>>> still try to get them to work for GCC 9. >>>>>> >>>>>> gcc-87028.diff >>>>>> >>>>>> >>>>>> PR tree-optimization/87028 - false positive -Wstringop-truncation >>>>>> strncpy with global variable source string >>>>>> gcc/ChangeLog: >>>>>> >>>>>>       PR tree-optimization/87028 >>>>>>       * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding >>>>>> when >>>>>>       statement doesn't belong to a basic block. >>>>>>       * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle >>>>>> MEM_REF on >>>>>>       the left hand side of assignment. >>>>>> >>>>>> gcc/testsuite/ChangeLog: >>>>>> >>>>>>       PR tree-optimization/87028 >>>>>>       * c-c++-common/Wstringop-truncation.c: Remove xfails. >>>>>>       * gcc.dg/Wstringop-truncation-5.c: New test. >>>>>> >>>>>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c >>>>>> index 07341eb..284c2fb 100644 >>>>>> --- a/gcc/gimple-fold.c >>>>>> +++ b/gcc/gimple-fold.c >>>>>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy >>>>>> (gimple_stmt_iterator *gsi, >>>>>>    if (tree_int_cst_lt (ssize, len)) >>>>>>      return false; >>>>>> >>>>>> +  /* Defer warning (and folding) until the next statement in the >>>>>> basic >>>>>> +     block is reachable.  */ >>>>>> +  if (!gimple_bb (stmt)) >>>>>> +    return false; >>>>> I think you want cfun->cfg as the test here.  They should be >>>>> equivalent >>>>> in practice. >>>> >>>> Please do not add 'cfun' references.  Note that the next stmt is also >>>> accessible >>>> when there is no CFG.  I guess the issue is that we fold this during >>>> gimplification >>>> where the next stmt is not yet "there" (but still in GENERIC)? >>>> >>>> We generally do not want to have unfolded stmts in the IL when we can >>>> avoid that >>>> which is why we fold most stmts during gimplification.  We also do >>>> that because >>>> we now do less folding on GENERIC. >>>> >>>> There may be the possibility to refactor gimplification time folding >>>> to what we >>>> do during inlining - queue stmts we want to fold and perform all >>>> folding delayed. >>>> This of course means bigger compile-time due to cache effects. >>>> >>>>> >>>>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c >>>>>> index d0792aa..f1988f6 100644 >>>>>> --- a/gcc/tree-ssa-strlen.c >>>>>> +++ b/gcc/tree-ssa-strlen.c >>>>>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc >>>>>> (gimple_stmt_iterator gsi, tree src, tree cnt) >>>>>>         && known_eq (dstoff, lhsoff) >>>>>>         && operand_equal_p (dstbase, lhsbase, 0)) >>>>>>       return false; >>>>>> + >>>>>> +      if (code == MEM_REF >>>>>> +       && TREE_CODE (lhsbase) == SSA_NAME >>>>>> +       && known_eq (dstoff, lhsoff)) >>>>>> +     { >>>>>> +       /* Extract the referenced variable from something like >>>>>> +            MEM[(char *)d_3(D) + 3B] = 0;  */ >>>>>> +       gimple *def = SSA_NAME_DEF_STMT (lhsbase); >>>>>> +       if (gimple_nop_p (def)) >>>>>> +         { >>>>>> +           lhsbase = SSA_NAME_VAR (lhsbase); >>>>>> +           if (lhsbase >>>>>> +               && dstbase >>>>>> +               && operand_equal_p (dstbase, lhsbase, 0)) >>>>>> +             return false; >>>>>> +         } >>>>>> +     } >>>>> If you find yourself looking at SSA_NAME_VAR, you're usually >>>>> barking up >>>>> the wrong tree.  It'd be easier to suggest something here if I >>>>> could see >>>>> the gimple (with virtual operands).  BUt at some level what you really >>>>> want to do is make sure the base of the MEM_REF is the same as what >>>>> got >>>>> passed as the destination of the strncpy.  You'd want to be testing >>>>> SSA_NAMEs in that case. >>>> >>>> Yes.  Why not simply compare the SSA names?  Why would it be >>>> not OK to do that when !lhsbase? >>> >>> The added code handles this case: >>> >>>   void f (char *d) >>>   { >>>     __builtin_strncpy (d, "12345", 4); >>>     d[3] = 0; >>>   } >>> >>> where during forwprop we see: >>> >>>   __builtin_strncpy (d_3(D), "12345", 4); >>>   MEM[(char *)d_3(D) + 3B] = 0; >>> >>> The next statement after the strncpy is the assignment whose >>> lhs is the MEM_REF with a GIMPLE_NOP as an operand.  There >>> is no other information in the GIMPLE_NOP that I can see to >>> tell that the operand is d_3(D) or that it's the same as >>> the strncpy argument (i.e., the PARAM_DECl d).  Having to >>> do open-code this all the time seems so cumbersome -- is >>> there some API that would do this for me?  (I thought >>> get_addr_base_and_unit_offset was that API but clearly in >>> this case it doesn't do what I expect -- it just returns >>> the argument.) >> >> I think you need to look harder at that MEM_REF.  It references d_3. >> That's what you need to be checking.  The base (d_3) is the first >> operand of the MEM_REF, the offset is the second operand of the MEM_REF. >> >> (gdb) p debug_gimple_stmt ($2) >> # .MEM_5 = VDEF <.MEM_4> >> MEM[(char *)d_3(D) + 3B] = 0; >> >> >> (gdb) p gimple_assign_lhs ($2) >> $5 = (tree_node *) 0x7ffff01a6208 >> >> (gdb) p debug_tree ($5) >>  >     type >         size >>         unit-size >>         align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type >> 0x7ffff00723f0 precision:8 min max >> >>         pointer_to_this > >> >>     arg:0 >         type > 0x7ffff00723f0 char> >>             public unsigned DI >>             size >>             unit-size >>             align:64 warn_if_not_align:0 symtab:0 alias-set -1 >> canonical-type 0x7ffff007de70 reference_to_this > 0x7ffff017d738>> >>         visited var >>         def_stmt GIMPLE_NOP >>         version:3> >>     arg:1 >> constant 3> >>     j.c:4:6 start: j.c:4:5 finish: j.c:4:8> >> >> >> Note arg:0 is the SSA_NAME d_3.  And not surprising that's lhsbase: > > The d in the MEM_REF you see in the dump above is the SSA_NAME's > SSA_NAME_VAR: > >           visited var > > Here's the print_node() code that prints it: > >       print_node_brief (file, "var", SSA_NAME_VAR (node), indent + 4); > > There is nothing else in the MEM_REF operand that tells me that. > Why is it wrong to look at the SSA_NAME_VAR? > >> (gdb) p debug_tree (lhsbase) >> >     type >         type >             size >>             unit-size >>             align:8 warn_if_not_align:0 symtab:0 alias-set -1 >> canonical-type 0x7ffff00723f0 precision:8 min > 0x7ffff0059dc8 -128> max >>             pointer_to_this > >>         public unsigned DI >>         size >>         unit-size >>         align:64 warn_if_not_align:0 symtab:0 alias-set -1 >> canonical-type 0x7ffff007de70 reference_to_this > 0x7ffff017d738>> >>     visited var >>     def_stmt GIMPLE_NOP >>     version:3> >> Sadly, dstbase is the PARM_DECL for d.  That's where things are going >> "wrong". > > As Richard observed, that's because get_attr_nonstring_decl() > returns the DECL that the expression refers to.  It does that > because that's where it looks for attribute nonstring, and so > that the warning can mention the DECL with the attribute. > > I suppose since I'm not supposed to be using SSA_NAME_VAR > (I still don't understand why it's taboo) I'll have to avoid > using the get_attr_nonstring_decl() return value and instead > look into comparing the SSA_NAMEs. Because it's not generally useful because it has no dataflow information associated with it. SSA_NAMEs are what carry dataflow information and what you need to check if you want to know if two objects are the same. SSA_NAME_VAR's primary use is for diagnostic messages and debugging. We do hang attributes off the _DECL node it refers to, so you can take an SSA_NAME, query its SSA_NAME_VAR if you need to check if the SSA_NAME has a particular attribute property. But if you're trying to see if two objects in the IL are the same, you need to be looking at the SSA_NAME. jeff