From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 56104 invoked by alias); 30 May 2019 15:34:22 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 56094 invoked by uid 89); 30 May 2019 15:34:21 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-20.2 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.1 spammy= X-HELO: mail-oi1-f195.google.com Received: from mail-oi1-f195.google.com (HELO mail-oi1-f195.google.com) (209.85.167.195) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 30 May 2019 15:34:19 +0000 Received: by mail-oi1-f195.google.com with SMTP id b21so1318121oic.8 for ; Thu, 30 May 2019 08:34:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to:content-language:content-transfer-encoding; bh=oWIyVpBTUH8y6NGblKuVbSTMzVsW9hM1/22CccmeTYs=; b=hjRwSyWEF6IUtK6tWG+/b3i6RK6W8TOA5FH15GnENHvEeJYBjm6kYlS2sDm0uipMZv 8ZLChoob1EjhA6VWvINjnr1m0yj8SX4l1wLSkgJN73RFAUBgbL1Tuctm9J47gVmx/8SS FHwrRxYJbhLihj5y/3U+A2kq8Y0xAyGrL2P0Li1i9DyHYJZgOBipllroqaVRzwJd0A3b 9cZlbJLt4qZt00p4D6Ygki4X+sEziHETUBRSqWnm+9xq2VJYMHSLL6l8nT0DY1VOFTTP 8ivDAIr0Sr7WZc7At6OU7wmDTqJUHa2Ui1v5+MlC9y0RkOfKDjHJbR0D1W2gbGcDaaGC kD1w== Return-Path: Received: from [192.168.0.41] (97-118-125-210.hlrn.qwest.net. [97.118.125.210]) by smtp.gmail.com with ESMTPSA id m66sm541645oia.58.2019.05.30.08.34.15 (version=TLS1_3 cipher=AEAD-AES128-GCM-SHA256 bits=128/128); Thu, 30 May 2019 08:34:15 -0700 (PDT) Subject: Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549) To: Jeff Law , gcc-patches References: <7dc3d2b2-7bd4-b82f-bc48-9e5878e27475@redhat.com> <0a1e0b3c-df54-6562-adf9-6facbc6dbd9f@gmail.com> <244d2a97-a753-4296-df9c-f53feaadd70d@redhat.com> From: Martin Sebor Message-ID: Date: Thu, 30 May 2019 15:48:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1 MIME-Version: 1.0 In-Reply-To: <244d2a97-a753-4296-df9c-f53feaadd70d@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-IsSubscribed: yes X-SW-Source: 2019-05/txt/msg02022.txt.bz2 On 5/30/19 9:13 AM, Jeff Law wrote: > On 5/30/19 8:52 AM, Martin Sebor wrote: >> On 5/29/19 11:45 AM, Jeff Law wrote: >>> On 5/22/19 3:34 PM, Martin Sebor wrote: >>>> -Wreturn-local-addr detects a subset of instances of returning >>>> the address of a local object from a function but the warning >>>> doesn't try to handle alloca or VLAs, or some non-trivial cases >>>> of ordinary automatic variables[1]. >>>> >>>> The attached patch extends the implementation of the warning to >>>> detect those.  It still doesn't detect instances where the address >>>> is the result of a built-in such strcpy[2]. >>>> >>>> Tested on x86_64-linux. >>>> >>>> Martin >>>> >>>> [1] For example, this is only diagnosed with the patch: >>>> >>>>    void* f (int i) >>>>    { >>>>      struct S { int a[2]; } s[2]; >>>>      return &s->a[i]; >>>>    } >>>> >>>> [2] The following is not diagnosed even with the patch: >>>> >>>>    void sink (void*); >>>> >>>>    void* f (int i) >>>>    { >>>>      char a[6]; >>>>      char *p = __builtin_strcpy (a, "123"); >>>>      sink (p); >>>>      return p; >>>>    } >>>> >>>> I would expect detecting to be possible and useful.  Maybe as >>>> a follow-up. >>>> >>>> gcc-71924.diff >>>> >>>> PR middle-end/71924 - missing -Wreturn-local-addr returning alloca >>>> result >>>> PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an >>>> address of a local array plus offset >>>> >>>> gcc/ChangeLog: >>>> >>>>     PR c/71924 >>>>     * gimple-ssa-isolate-paths.c (is_addr_local): New function. >>>>     (warn_return_addr_local_phi_arg, warn_return_addr_local): Same. >>>>     (find_implicit_erroneous_behavior): Call >>>> warn_return_addr_local_phi_arg. >>>>     (find_explicit_erroneous_behavior): Call warn_return_addr_local. >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>>     PR c/71924 >>>>     * gcc.dg/Wreturn-local-addr-2.c: New test. >>>>     * gcc.dg/Walloca-4.c: Prune expected warnings. >>>>     * gcc.dg/pr41551.c: Same. >>>>     * gcc.dg/pr59523.c: Same. >>>>     * gcc.dg/tree-ssa/pr88775-2.c: Same. >>>>     * gcc.dg/winline-7.c: Same. >>>> >>>> diff --git a/gcc/gimple-ssa-isolate-paths.c >>>> b/gcc/gimple-ssa-isolate-paths.c >>>> index 33fe352bb23..2933ecf502e 100644 >>>> --- a/gcc/gimple-ssa-isolate-paths.c >>>> +++ b/gcc/gimple-ssa-isolate-paths.c >>>> @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple >>>> *stmt) >>>>     return false; >>>>   } >>>>   +/* Return true if EXPR is a expression of pointer type that refers >>>> +   to the address of a variable with automatic storage duration. >>>> +   If so, set *PLOC to the location of the object or the call that >>>> +   allocated it (for alloca and VLAs).  When PMAYBE is non-null, >>>> +   also consider PHI statements and set *PMAYBE when some but not >>>> +   all arguments of such statements refer to local variables, and >>>> +   to clear it otherwise.  */ >>>> + >>>> +static bool >>>> +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL, >>>> +           hash_set *visited = NULL) >>>> +{ >>>> +  if (TREE_CODE (exp) == ADDR_EXPR) >>>> +    { >>>> +      tree baseaddr = get_base_address (TREE_OPERAND (exp, 0)); >>>> +      if (TREE_CODE (baseaddr) == MEM_REF) >>>> +    return is_addr_local (TREE_OPERAND (baseaddr, 0), ploc, pmaybe, >>>> visited); >>>> + >>>> +      if ((!VAR_P (baseaddr) >>>> +       || is_global_var (baseaddr)) >>>> +      && TREE_CODE (baseaddr) != PARM_DECL) >>>> +    return false; >>>> + >>>> +      *ploc = DECL_SOURCE_LOCATION (baseaddr); >>>> +      return true; >>>> +    } >>>> + >>>> +  if (TREE_CODE (exp) == SSA_NAME) >>>> +    { >>>> +      gimple *def_stmt = SSA_NAME_DEF_STMT (exp); >>>> +      enum gimple_code code = gimple_code (def_stmt); >>>> + >>>> +      if (is_gimple_assign (def_stmt)) >>>> +    { >>>> +      tree type = TREE_TYPE (gimple_assign_lhs (def_stmt)); >>>> +      if (POINTER_TYPE_P (type)) >>>> +        { >>>> +          tree ptr = gimple_assign_rhs1 (def_stmt); >>>> +          return is_addr_local (ptr, ploc, pmaybe, visited); >>>> +        } >>>> +      return false; >>>> +    } >>>> + >>>> +      if (code == GIMPLE_CALL >>>> +      && gimple_call_builtin_p (def_stmt)) >>>> +    { >>>> +      tree fn = gimple_call_fndecl (def_stmt); >>>> +      int code = DECL_FUNCTION_CODE (fn); >>>> +      if (code != BUILT_IN_ALLOCA >>>> +          && code != BUILT_IN_ALLOCA_WITH_ALIGN) >>>> +        return false; >>>> + >>>> +      *ploc = gimple_location (def_stmt); >>>> +      return true; >>>> +    } >>>> + >>>> +      if (code == GIMPLE_PHI && pmaybe) >>>> +    { >>>> +      unsigned count = 0; >>>> +      gphi *phi_stmt = as_a (def_stmt); >>>> + >>>> +      unsigned nargs = gimple_phi_num_args (phi_stmt); >>>> +      for (unsigned i = 0; i < nargs; ++i) >>>> +        { >>>> +          if (!visited->add (phi_stmt)) >>>> +        { >>>> +          tree arg = gimple_phi_arg_def (phi_stmt, i); >>>> +          if (is_addr_local (arg, ploc, pmaybe, visited)) >>>> +            ++count; >>>> +        } >>>> +        } >>>> + >>>> +      *pmaybe = count && count < nargs; >>>> +      return count != 0; >>>> +    } >>>> +    } >>>> + >>>> +  return false; >>>> +} >>> Is there some reason you didn't query the alias oracle here?  It would >>> seem a fairly natural fit.  Ultimately given a pointer (which will be an >>> SSA_NAME) you want to ask whether or not it conclusively points into the >>> stack. >>> >>> That would seem to dramatically simplify is_addr_local. >> >> I did think about it but decided against changing the existing >> design (iterating over PHI arguments), for a couple of reasons: >> >> 1) It feels like a bigger change than my simple "bug fix" calls >>    for. > I suspect the net result will be simpler though ;-) I think you just > get a pt solution and iterate over the things the solution says the > pointer can point to. > > >> 2) I'm not familiar enough with the alias oracle to say for sure >>    if it can be used to give the same results as the existing >>    implementation.  I.e., make it possible to identify and >>    isolate each path that returns a local address (rather than >>    just answering: yes, this pointer may point to some local >>    in this function). > Precision of the oracle is certainly the big question. > > >> >> If the alias oracle can be used to give the same results without >> excessive false positives then I think it would be fine to make >> use of it.  Is that something you consider a prerequisite for >> this change or should I look into it as a followup? > I think we should explore it a bit before making a final decision. It > may guide us for other work in this space (like detecting escaping > locals). I think a dirty prototype to see if it's even in the right > ballpark would make sense. Okay, let me look into it. >> FWIW, I'm working on enhancing this to detect returning freed >> pointers (under a different option).  That seems like a good >> opportunity to also look into making use of the alias oracle. > Yes. I think there's two interesting cases here to ponder. If we free > a pointer that must point to a local, then we can warn & trap. If we > free a pointer that may point to a local, then we can only warn (unless > we can isolate the path). I wasn't actually thinking of freeing locals but it sounds like a useful enhancement as well. Thanks for the suggestion! :) To be clear, what I'm working on is detecting: void* f (void *p) { free (p); // note: pointer was freed here // ... return p; // warning: returning a freed pointer } >> Besides these enhancements, there's also a request to diagnose >> dereferencing pointers to compound literals whose lifetime has >> ended (PR 89990), or more generally, those to any such local >> object.  It's about preventing essentially the same problem >> (accessing bad data or corrupting others) but it seems that >> both the analysis and the handling will be sufficiently >> different to consider implementing it somewhere else.  What >> are your thoughts on that? > I think the tough problem here is we lose binding scopes as we lower > into gimple, so solving it in the general case may be tough. We've > started adding some clobbers into the IL to denote object death points, > but I'm not sure if they're sufficient to implement this kind of warning. I was afraid that might be a problem. Thanks Martin