From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 61164 invoked by alias); 31 May 2019 20:47:06 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 61156 invoked by uid 89); 31 May 2019 20:47:06 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-14.3 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,SPF_HELO_PASS autolearn=ham version=3.3.1 spammy= X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 31 May 2019 20:47:04 +0000 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 569B781F22; Fri, 31 May 2019 20:46:58 +0000 (UTC) Received: from localhost.localdomain (ovpn-112-56.rdu2.redhat.com [10.10.112.56]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6875E6014E; Fri, 31 May 2019 20:46:57 +0000 (UTC) Subject: Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549) To: Martin Sebor , gcc-patches References: From: Jeff Law Openpgp: preference=signencrypt Message-ID: Date: Fri, 31 May 2019 21:19:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-IsSubscribed: yes X-SW-Source: 2019-05/txt/msg02172.txt.bz2 On 5/22/19 3:34 PM, Martin Sebor wrote: > -Wreturn-local-addr detects a subset of instances of returning > the address of a local object from a function but the warning > doesn't try to handle alloca or VLAs, or some non-trivial cases > of ordinary automatic variables[1]. > > The attached patch extends the implementation of the warning to > detect those.  It still doesn't detect instances where the address > is the result of a built-in such strcpy[2]. > > Tested on x86_64-linux. > > Martin > > [1] For example, this is only diagnosed with the patch: > >   void* f (int i) >   { >     struct S { int a[2]; } s[2]; >     return &s->a[i]; >   } > > [2] The following is not diagnosed even with the patch: > >   void sink (void*); > >   void* f (int i) >   { >     char a[6]; >     char *p = __builtin_strcpy (a, "123"); >     sink (p); >     return p; >   } > > I would expect detecting to be possible and useful.  Maybe as > a follow-up. > > gcc-71924.diff > > PR middle-end/71924 - missing -Wreturn-local-addr returning alloca result > PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an address of a local array plus offset > > gcc/ChangeLog: > > PR c/71924 > * gimple-ssa-isolate-paths.c (is_addr_local): New function. > (warn_return_addr_local_phi_arg, warn_return_addr_local): Same. > (find_implicit_erroneous_behavior): Call warn_return_addr_local_phi_arg. > (find_explicit_erroneous_behavior): Call warn_return_addr_local. > > gcc/testsuite/ChangeLog: > > PR c/71924 > * gcc.dg/Wreturn-local-addr-2.c: New test. > * gcc.dg/Walloca-4.c: Prune expected warnings. > * gcc.dg/pr41551.c: Same. > * gcc.dg/pr59523.c: Same. > * gcc.dg/tree-ssa/pr88775-2.c: Same. > * gcc.dg/winline-7.c: Same. > > diff --git a/gcc/gimple-ssa-isolate-paths.c b/gcc/gimple-ssa-isolate-paths.c > index 33fe352bb23..2933ecf502e 100644 > --- a/gcc/gimple-ssa-isolate-paths.c > +++ b/gcc/gimple-ssa-isolate-paths.c > @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple *stmt) > return false; > } > > +/* Return true if EXPR is a expression of pointer type that refers > + to the address of a variable with automatic storage duration. > + If so, set *PLOC to the location of the object or the call that > + allocated it (for alloca and VLAs). When PMAYBE is non-null, > + also consider PHI statements and set *PMAYBE when some but not > + all arguments of such statements refer to local variables, and > + to clear it otherwise. */ > + > +static bool > +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL, > + hash_set *visited = NULL) > +{ > + if (TREE_CODE (exp) == SSA_NAME) > + { > + gimple *def_stmt = SSA_NAME_DEF_STMT (exp); > + enum gimple_code code = gimple_code (def_stmt); > + > + if (is_gimple_assign (def_stmt)) > + { > + tree type = TREE_TYPE (gimple_assign_lhs (def_stmt)); > + if (POINTER_TYPE_P (type)) > + { > + tree ptr = gimple_assign_rhs1 (def_stmt); > + return is_addr_local (ptr, ploc, pmaybe, visited); > + } > + return false; > + } So this is going to recurse on the rhs1 of something like POINTER_PLUS_EXPR, that's a good thing :-) But isn't it non-selective about the codes where we recurse? Consider ptr = (cond) ? res1 : res2 I think we'll end up recursing on the condition rather than looking at res1 and res2. I suspect there are a very limited number of expression codes that appear on the RHS where we'd want to recurse on one or both operands. POINTER_PLUS_EXPR, NOP_EXPR, maybe COND_EXPR (where you have to recurse on both and logically and the result), BIT_AND (maybe we masked off some bits in an address). That's probably about it :-) Are there any other codes you've seen or think would be useful in practice to recurse through? I'd rather list them explicitly rather than just recurse down through every rhs1 we encounter. > + > + if (code == GIMPLE_PHI && pmaybe) > + { > + unsigned count = 0; > + gphi *phi_stmt = as_a (def_stmt); > + > + unsigned nargs = gimple_phi_num_args (phi_stmt); > + for (unsigned i = 0; i < nargs; ++i) > + { > + if (!visited->add (phi_stmt)) > + { > + tree arg = gimple_phi_arg_def (phi_stmt, i); > + if (is_addr_local (arg, ploc, pmaybe, visited)) > + ++count; > + } > + } So imagine p = phi (x1, x1, x2) Where both x1 and x2 would pass is_addr_local. I think this code would incorrectly set pmaybe. We process the first instance of x1, find it is local and bump count. When we encounter the second instance, it's in our map and we do nothing. THen we process x2 and bump count again. So count would be 2. But count is going to be less than nargs so we'll set *pmaybe to true. ISTM you'd have to record the result for each phi argument and query that to determine if you should bump the counter if you've already visited the argument. It also seems to me that you can break the loop as soon as you've got a nonzero count and get a false return from is_addr_local. Not sure if that'll matter in practice or not. The other approach here (and I'm not suggesting you implement it) would be to use the propagation engine. That's probably overkill here since we'd probably end up computing a whole lot of things we don't need. My sense is an on-demand approach like you've done is probably less computationally expensive. jeff