From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 44116 invoked by alias); 4 Jun 2019 19:40:25 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 44106 invoked by uid 89); 4 Jun 2019 19:40:24 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-17.1 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.1 spammy= X-HELO: mail-qt1-f194.google.com Received: from mail-qt1-f194.google.com (HELO mail-qt1-f194.google.com) (209.85.160.194) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 04 Jun 2019 19:40:22 +0000 Received: by mail-qt1-f194.google.com with SMTP id a15so739178qtn.7 for ; Tue, 04 Jun 2019 12:40:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:references:message-id:date:user-agent:mime-version :in-reply-to:content-language:content-transfer-encoding; bh=tIQSNjxmsbSwCgCWoXSxc+nfKRdHSw3SwTEaZQLTuyg=; b=o9ojSyYLJUVDsxxr23tbcQnC7vrGpSO1WEzUSAJYzQCCyL0Zx04ty87Xrl1cisc+6b w/rvBiCh3Ufg9dvVSuQNMxXT5gI33I4TLFqZHIzC0TWzhclBNfa8pUXvUr1gkujtG4w1 A2zdBqW10UWMwrmgt8XYB5CQghIaCUOEOVUsS/lVToCClCXL6kOWwAREBdkr62XOo9o/ h9Zqe3et95km2+5LWF43hqxFMFFhNYUiCYiNZFLDcXkPTibrNH+nG+M/oVw1kqNtj1SA 9FLk2XydgEWQSH/5MjsTGeerZN8VBTR7S40yEPN7LWfpGqyyH4NrGBBUMI8K9vdwJJ0s YYnA== Return-Path: Received: from [192.168.0.41] (75-166-109-122.hlrn.qwest.net. [75.166.109.122]) by smtp.gmail.com with ESMTPSA id u7sm8864880qkm.62.2019.06.04.12.40.18 (version=TLS1_3 cipher=AEAD-AES128-GCM-SHA256 bits=128/128); Tue, 04 Jun 2019 12:40:19 -0700 (PDT) Subject: Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549) From: Martin Sebor To: Jeff Law , gcc-patches References: Message-ID: <0b2e38e1-1fc7-f21e-0f26-596b50304e6e@gmail.com> Date: Tue, 04 Jun 2019 19:40:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-IsSubscribed: yes X-SW-Source: 2019-06/txt/msg00198.txt.bz2 On 6/3/19 5:24 PM, Martin Sebor wrote: > On 5/31/19 2:46 PM, Jeff Law wrote: >> On 5/22/19 3:34 PM, Martin Sebor wrote: >>> -Wreturn-local-addr detects a subset of instances of returning >>> the address of a local object from a function but the warning >>> doesn't try to handle alloca or VLAs, or some non-trivial cases >>> of ordinary automatic variables[1]. >>> >>> The attached patch extends the implementation of the warning to >>> detect those.  It still doesn't detect instances where the address >>> is the result of a built-in such strcpy[2]. >>> >>> Tested on x86_64-linux. >>> >>> Martin >>> >>> [1] For example, this is only diagnosed with the patch: >>> >>>    void* f (int i) >>>    { >>>      struct S { int a[2]; } s[2]; >>>      return &s->a[i]; >>>    } >>> >>> [2] The following is not diagnosed even with the patch: >>> >>>    void sink (void*); >>> >>>    void* f (int i) >>>    { >>>      char a[6]; >>>      char *p = __builtin_strcpy (a, "123"); >>>      sink (p); >>>      return p; >>>    } >>> >>> I would expect detecting to be possible and useful.  Maybe as >>> a follow-up. >>> >>> gcc-71924.diff >>> >>> PR middle-end/71924 - missing -Wreturn-local-addr returning alloca >>> result >>> PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an >>> address of a local array plus offset >>> >>> gcc/ChangeLog: >>> >>>     PR c/71924 >>>     * gimple-ssa-isolate-paths.c (is_addr_local): New function. >>>     (warn_return_addr_local_phi_arg, warn_return_addr_local): Same. >>>     (find_implicit_erroneous_behavior): Call >>> warn_return_addr_local_phi_arg. >>>     (find_explicit_erroneous_behavior): Call warn_return_addr_local. >>> >>> gcc/testsuite/ChangeLog: >>> >>>     PR c/71924 >>>     * gcc.dg/Wreturn-local-addr-2.c: New test. >>>     * gcc.dg/Walloca-4.c: Prune expected warnings. >>>     * gcc.dg/pr41551.c: Same. >>>     * gcc.dg/pr59523.c: Same. >>>     * gcc.dg/tree-ssa/pr88775-2.c: Same. >>>     * gcc.dg/winline-7.c: Same. >>> >>> diff --git a/gcc/gimple-ssa-isolate-paths.c >>> b/gcc/gimple-ssa-isolate-paths.c >>> index 33fe352bb23..2933ecf502e 100644 >>> --- a/gcc/gimple-ssa-isolate-paths.c >>> +++ b/gcc/gimple-ssa-isolate-paths.c >>> @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple >>> *stmt) >>>     return false; >>>   } >>> +/* Return true if EXPR is a expression of pointer type that refers >>> +   to the address of a variable with automatic storage duration. >>> +   If so, set *PLOC to the location of the object or the call that >>> +   allocated it (for alloca and VLAs).  When PMAYBE is non-null, >>> +   also consider PHI statements and set *PMAYBE when some but not >>> +   all arguments of such statements refer to local variables, and >>> +   to clear it otherwise.  */ >>> + >>> +static bool >>> +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL, >>> +           hash_set *visited = NULL) >>> +{ >>> +  if (TREE_CODE (exp) == SSA_NAME) >>> +    { >>> +      gimple *def_stmt = SSA_NAME_DEF_STMT (exp); >>> +      enum gimple_code code = gimple_code (def_stmt); >>> + >>> +      if (is_gimple_assign (def_stmt)) >>> +    { >>> +      tree type = TREE_TYPE (gimple_assign_lhs (def_stmt)); >>> +      if (POINTER_TYPE_P (type)) >>> +        { >>> +          tree ptr = gimple_assign_rhs1 (def_stmt); >>> +          return is_addr_local (ptr, ploc, pmaybe, visited); >>> +        } >>> +      return false; >>> +    } >> So this is going to recurse on the rhs1 of something like >> POINTER_PLUS_EXPR, that's a good thing :-)   But isn't it non-selective >> about the codes where we recurse? >> >> Consider >> >>    ptr = (cond) ? res1 : res2 >> >> I think we'll end up recursing on the condition rather than looking at >> res1 and res2. >> >> >> I suspect there are a very limited number of expression codes that >> appear on the RHS where we'd want to recurse on one or both operands. >> >> POINTER_PLUS_EXPR, NOP_EXPR, maybe COND_EXPR (where you have to recurse >> on both and logically and the result), BIT_AND (maybe we masked off some >> bits in an address).  That's probably about it :-) >> >> Are there any other codes you've seen or think would be useful in >> practice to recurse through?  I'd rather list them explicitly rather >> than just recurse down through every rhs1 we encounter. > > I don't have a list of codes to test for.  I initially contemplated > enumerating them but in the end decided the pointer type check would > be sufficient.  I wouldn't expect a COND_EXPR here.  Don't they get > transformed into PHIs?  In all my tests they do and and running > the whole test suite with an assert that it doesn't come up doesn't > expose any either.  (I left the assert for COND_EXPR there.)  If > a COND_EXPR really can come up in a GIMPLE assignment here can you > please show me how so I can add a test for it? > > I've added tests to exercise all C expressions that evaluate to > pointers.  I don't know of any others where what you bring up > should be a concern and I don't want to try to hardwire tests for > any that I can't to exercise in the testsuite or don't know how. > If you know of some I'm happy to add them and adjust the code. > >>> + >>> +      if (code == GIMPLE_PHI && pmaybe) >>> +    { >>> +      unsigned count = 0; >>> +      gphi *phi_stmt = as_a (def_stmt); >>> + >>> +      unsigned nargs = gimple_phi_num_args (phi_stmt); >>> +      for (unsigned i = 0; i < nargs; ++i) >>> +        { >>> +          if (!visited->add (phi_stmt)) >>> +        { >>> +          tree arg = gimple_phi_arg_def (phi_stmt, i); >>> +          if (is_addr_local (arg, ploc, pmaybe, visited)) >>> +            ++count; >>> +        } >>> +        } >> So imagine >> >> p = phi (x1, x1, x2) >> >> Where both x1 and x2 would pass is_addr_local.  I think this code would >> incorrectly set pmaybe. > > I suppose it would but this happens readily with or without my > patch, for example here: > >   int* f (int i) >   { >     int a[2], b[2]; >     int *p = i ? a : b; >     return p; >   } > > We end up with two instances of "function may return address > of local variable," one for each local, because > find_implicit_erroneous_behavior only issues the "may return" > kind of warning. > > I don't particularly like this -- that's why I added PMAYBE to > the new code.  But to avoid mission creep I'd decided not draw > the line there and avoid trying to fix the problem completely. > (I've enhanced this in the attached update.) > >> >> We process the first instance of x1, find it is local and bump count. >> When we encounter the second instance, it's in our map and we do >> nothing.  THen we process x2 and bump count again.  So count would be 2. >>   But count is going to be less than nargs so we'll set *pmaybe to true. >> >> ISTM you'd have to record the result for each phi argument and query >> that to determine if you should bump the counter if you've already >> visited the argument. > > I suspect that no solution will be perfect, at a minimum because > multiple return statements sometimes get merged into one, so what > in the source code is a single return that definitely returns > the address a local may end up merged with one that returns > a null (and triggering a maybe kind of warning).  I have also > seen warnings in some non-trivial test cases that suggest that > some return statements get split up into two where a "may return" > could turn into a "definitely does return." > >> It also seems to me that you can break the loop as soon as you've got a >> nonzero count and get a false return from is_addr_local.   Not sure if >> that'll matter in practice or not. > > This is only possible if we're willing to lose some detail (i.e., > if we are willing to only point to one variable when returning > the address of two or more).  I don't find that very user-friendly. > > To wrap up, while I would have preferred taking a simpler approach > at first and making bigger design changes later, I have redesigned > the warning for better accuracy as you suggested above. > > The attached revised patch first collects the return statements > in a hash table along with the locations of the local variables > whose addresses each statement returns, and then issues just one > warning for each statement, listing all the locals it refers to > in subsequent notes. > > In addition, since it was nearly trivial, I also added checking > for returning addresses of locals via calls to built-ins like > memcpy. I didn't fully retest the patch after a minor tweak and sure enough, the tweak was wrong. Please apply the following on top of the patch. (We only want to consider the argument of the call. The LHS is the call itself.) diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c index eb8e754870f..bb9616f26ec 100644 --- a/gcc/gimple-ssa-isolate-paths.c +++ b/gcc/gimple-ssa-isolate-paths.c @@ -475,14 +475,9 @@ is_addr_local (gimple *return_stmt, tree exp, location_t *ploc, case BUILT_IN_STRNCAT_CHK: case BUILT_IN_STRNCPY: case BUILT_IN_STRNCPY_CHK: - /* Check both the argument and the return value. */ - return (is_addr_local (return_stmt, - gimple_call_arg (def_stmt, 0), - ploc, plocmap, visited) - || is_addr_local (return_stmt, - gimple_call_lhs (def_stmt), - ploc, plocmap, visited)); - + return is_addr_local (return_stmt, + gimple_call_arg (def_stmt, 0), + ploc, plocmap, visited); default: return false; } > > I have beefed up the test suite to exercise non-trivial functions > involving different kinds of expressions and statements, including > loops.  Except for one xfail due to missing location information > (bug 90735 - missing location in -Wreturn-local-addr on a function > with two return statements​) I haven't found any issues. > The improved warning does find many more problems than the current > solution. > >> The other approach here (and I'm not suggesting you implement it) would >> be to use the propagation engine.  That's probably overkill here since >> we'd probably end up computing a whole lot of things we don't need.  My >> sense is an on-demand approach like you've done is probably less >> computationally expensive. > > I'm certainly not opposed to making further improvements (as I > mentioned, I'd like to add checking for returning freed pointers > and freeing locals as you suggested).  But I would prefer to make > these in incremental steps rather than growing what was at first > meant to be just small bug fix/enhancement for alloca and VLAs. > > The attached update has been tested on x86_64-linux. > > Martin