From: Martin Sebor <msebor@gmail.com>
To: Jeff Law <law@redhat.com>, gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
Date: Thu, 30 May 2019 15:48:00 -0000 [thread overview]
Message-ID: <ca9b5ba9-f641-eb76-a495-84f88a947e21@gmail.com> (raw)
In-Reply-To: <244d2a97-a753-4296-df9c-f53feaadd70d@redhat.com>
On 5/30/19 9:13 AM, Jeff Law wrote:
> On 5/30/19 8:52 AM, Martin Sebor wrote:
>> On 5/29/19 11:45 AM, Jeff Law wrote:
>>> On 5/22/19 3:34 PM, Martin Sebor wrote:
>>>> -Wreturn-local-addr detects a subset of instances of returning
>>>> the address of a local object from a function but the warning
>>>> doesn't try to handle alloca or VLAs, or some non-trivial cases
>>>> of ordinary automatic variables[1].
>>>>
>>>> The attached patch extends the implementation of the warning to
>>>> detect those.  It still doesn't detect instances where the address
>>>> is the result of a built-in such strcpy[2].
>>>>
>>>> Tested on x86_64-linux.
>>>>
>>>> Martin
>>>>
>>>> [1] For example, this is only diagnosed with the patch:
>>>>
>>>>    void* f (int i)
>>>> Â Â Â {
>>>>      struct S { int a[2]; } s[2];
>>>>      return &s->a[i];
>>>> Â Â Â }
>>>>
>>>> [2] The following is not diagnosed even with the patch:
>>>>
>>>>    void sink (void*);
>>>>
>>>>    void* f (int i)
>>>> Â Â Â {
>>>>      char a[6];
>>>>      char *p = __builtin_strcpy (a, "123");
>>>>      sink (p);
>>>>      return p;
>>>> Â Â Â }
>>>>
>>>> I would expect detecting to be possible and useful.  Maybe as
>>>> a follow-up.
>>>>
>>>> gcc-71924.diff
>>>>
>>>> PR middle-end/71924 - missing -Wreturn-local-addr returning alloca
>>>> result
>>>> PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an
>>>> address of a local array plus offset
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> Â Â Â Â PR c/71924
>>>> Â Â Â Â * gimple-ssa-isolate-paths.c (is_addr_local): New function.
>>>> Â Â Â Â (warn_return_addr_local_phi_arg, warn_return_addr_local): Same.
>>>> Â Â Â Â (find_implicit_erroneous_behavior): Call
>>>> warn_return_addr_local_phi_arg.
>>>> Â Â Â Â (find_explicit_erroneous_behavior): Call warn_return_addr_local.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> Â Â Â Â PR c/71924
>>>> Â Â Â Â * gcc.dg/Wreturn-local-addr-2.c: New test.
>>>> Â Â Â Â * gcc.dg/Walloca-4.c: Prune expected warnings.
>>>> Â Â Â Â * gcc.dg/pr41551.c: Same.
>>>> Â Â Â Â * gcc.dg/pr59523.c: Same.
>>>> Â Â Â Â * gcc.dg/tree-ssa/pr88775-2.c: Same.
>>>> Â Â Â Â * gcc.dg/winline-7.c: Same.
>>>>
>>>> diff --git a/gcc/gimple-ssa-isolate-paths.c
>>>> b/gcc/gimple-ssa-isolate-paths.c
>>>> index 33fe352bb23..2933ecf502e 100644
>>>> --- a/gcc/gimple-ssa-isolate-paths.c
>>>> +++ b/gcc/gimple-ssa-isolate-paths.c
>>>> @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple
>>>> *stmt)
>>>> Â Â Â return false;
>>>> Â }
>>>> Â +/* Return true if EXPR is a expression of pointer type that refers
>>>> +Â Â to the address of a variable with automatic storage duration.
>>>> +Â Â If so, set *PLOC to the location of the object or the call that
>>>> +  allocated it (for alloca and VLAs). When PMAYBE is non-null,
>>>> +Â Â also consider PHI statements and set *PMAYBE when some but not
>>>> +Â Â all arguments of such statements refer to local variables, and
>>>> +  to clear it otherwise. */
>>>> +
>>>> +static bool
>>>> +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL,
>>>> +Â Â Â Â Â Â Â Â Â Â hash_set<gphi *> *visited = NULL)
>>>> +{
>>>> +Â if (TREE_CODE (exp) == ADDR_EXPR)
>>>> +Â Â Â {
>>>> +Â Â Â Â Â tree baseaddr = get_base_address (TREE_OPERAND (exp, 0));
>>>> +Â Â Â Â Â if (TREE_CODE (baseaddr) == MEM_REF)
>>>> +Â Â Â return is_addr_local (TREE_OPERAND (baseaddr, 0), ploc, pmaybe,
>>>> visited);
>>>> +
>>>> +Â Â Â Â Â if ((!VAR_P (baseaddr)
>>>> +Â Â Â Â Â Â || is_global_var (baseaddr))
>>>> +Â Â Â Â Â && TREE_CODE (baseaddr) != PARM_DECL)
>>>> +Â Â Â return false;
>>>> +
>>>> +Â Â Â Â Â *ploc = DECL_SOURCE_LOCATION (baseaddr);
>>>> +Â Â Â Â Â return true;
>>>> +Â Â Â }
>>>> +
>>>> +Â if (TREE_CODE (exp) == SSA_NAME)
>>>> +Â Â Â {
>>>> +Â Â Â Â Â gimple *def_stmt = SSA_NAME_DEF_STMT (exp);
>>>> +Â Â Â Â Â enum gimple_code code = gimple_code (def_stmt);
>>>> +
>>>> +Â Â Â Â Â if (is_gimple_assign (def_stmt))
>>>> +Â Â Â {
>>>> +Â Â Â Â Â tree type = TREE_TYPE (gimple_assign_lhs (def_stmt));
>>>> +Â Â Â Â Â if (POINTER_TYPE_P (type))
>>>> +Â Â Â Â Â Â Â {
>>>> +Â Â Â Â Â Â Â Â Â tree ptr = gimple_assign_rhs1 (def_stmt);
>>>> +Â Â Â Â Â Â Â Â Â return is_addr_local (ptr, ploc, pmaybe, visited);
>>>> +Â Â Â Â Â Â Â }
>>>> +Â Â Â Â Â return false;
>>>> +Â Â Â }
>>>> +
>>>> +Â Â Â Â Â if (code == GIMPLE_CALL
>>>> +Â Â Â Â Â && gimple_call_builtin_p (def_stmt))
>>>> +Â Â Â {
>>>> +Â Â Â Â Â tree fn = gimple_call_fndecl (def_stmt);
>>>> +Â Â Â Â Â int code = DECL_FUNCTION_CODE (fn);
>>>> +Â Â Â Â Â if (code != BUILT_IN_ALLOCA
>>>> +Â Â Â Â Â Â Â Â Â && code != BUILT_IN_ALLOCA_WITH_ALIGN)
>>>> +Â Â Â Â Â Â Â return false;
>>>> +
>>>> +Â Â Â Â Â *ploc = gimple_location (def_stmt);
>>>> +Â Â Â Â Â return true;
>>>> +Â Â Â }
>>>> +
>>>> +Â Â Â Â Â if (code == GIMPLE_PHI && pmaybe)
>>>> +Â Â Â {
>>>> +Â Â Â Â Â unsigned count = 0;
>>>> +Â Â Â Â Â gphi *phi_stmt = as_a <gphi *> (def_stmt);
>>>> +
>>>> +Â Â Â Â Â unsigned nargs = gimple_phi_num_args (phi_stmt);
>>>> +Â Â Â Â Â for (unsigned i = 0; i < nargs; ++i)
>>>> +Â Â Â Â Â Â Â {
>>>> +Â Â Â Â Â Â Â Â Â if (!visited->add (phi_stmt))
>>>> +Â Â Â Â Â Â Â {
>>>> +Â Â Â Â Â Â Â Â Â tree arg = gimple_phi_arg_def (phi_stmt, i);
>>>> +Â Â Â Â Â Â Â Â Â if (is_addr_local (arg, ploc, pmaybe, visited))
>>>> +Â Â Â Â Â Â Â Â Â Â Â ++count;
>>>> +Â Â Â Â Â Â Â }
>>>> +Â Â Â Â Â Â Â }
>>>> +
>>>> +Â Â Â Â Â *pmaybe = count && count < nargs;
>>>> +Â Â Â Â Â return count != 0;
>>>> +Â Â Â }
>>>> +Â Â Â }
>>>> +
>>>> +Â return false;
>>>> +}
>>> Is there some reason you didn't query the alias oracle here? It would
>>> seem a fairly natural fit. Ultimately given a pointer (which will be an
>>> SSA_NAME) you want to ask whether or not it conclusively points into the
>>> stack.
>>>
>>> That would seem to dramatically simplify is_addr_local.
>>
>> I did think about it but decided against changing the existing
>> design (iterating over PHI arguments), for a couple of reasons:
>>
>> 1) It feels like a bigger change than my simple "bug fix" calls
>> Â Â for.
> I suspect the net result will be simpler though ;-) I think you just
> get a pt solution and iterate over the things the solution says the
> pointer can point to.
>
>
>> 2) I'm not familiar enough with the alias oracle to say for sure
>> Â Â if it can be used to give the same results as the existing
>>   implementation. I.e., make it possible to identify and
>> Â Â isolate each path that returns a local address (rather than
>> Â Â just answering: yes, this pointer may point to some local
>> Â Â in this function).
> Precision of the oracle is certainly the big question.
>
>
>>
>> If the alias oracle can be used to give the same results without
>> excessive false positives then I think it would be fine to make
>> use of it. Is that something you consider a prerequisite for
>> this change or should I look into it as a followup?
> I think we should explore it a bit before making a final decision. It
> may guide us for other work in this space (like detecting escaping
> locals). I think a dirty prototype to see if it's even in the right
> ballpark would make sense.
Okay, let me look into it.
>> FWIW, I'm working on enhancing this to detect returning freed
>> pointers (under a different option). That seems like a good
>> opportunity to also look into making use of the alias oracle.
> Yes. I think there's two interesting cases here to ponder. If we free
> a pointer that must point to a local, then we can warn & trap. If we
> free a pointer that may point to a local, then we can only warn (unless
> we can isolate the path).
I wasn't actually thinking of freeing locals but it sounds like
a useful enhancement as well. Thanks for the suggestion! :)
To be clear, what I'm working on is detecting:
void* f (void *p)
{
free (p); // note: pointer was freed here
// ...
return p; // warning: returning a freed pointer
}
>> Besides these enhancements, there's also a request to diagnose
>> dereferencing pointers to compound literals whose lifetime has
>> ended (PR 89990), or more generally, those to any such local
>> object. It's about preventing essentially the same problem
>> (accessing bad data or corrupting others) but it seems that
>> both the analysis and the handling will be sufficiently
>> different to consider implementing it somewhere else. What
>> are your thoughts on that?
> I think the tough problem here is we lose binding scopes as we lower
> into gimple, so solving it in the general case may be tough. We've
> started adding some clobbers into the IL to denote object death points,
> but I'm not sure if they're sufficient to implement this kind of warning.
I was afraid that might be a problem.
Thanks
Martin
next prev parent reply other threads:[~2019-05-30 15:34 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-22 21:34 Martin Sebor
2019-05-29 18:08 ` Jeff Law
2019-05-30 14:58 ` Martin Sebor
2019-05-30 15:21 ` Jeff Law
2019-05-30 15:48 ` Martin Sebor [this message]
2019-05-30 16:20 ` Jeff Law
2019-05-30 17:27 ` Jason Merrill
2019-05-31 0:26 ` Jeff Law
2019-05-30 23:35 ` Martin Sebor
2019-05-31 15:50 ` Jeff Law
2019-06-03 9:37 ` Richard Biener
2019-06-03 11:28 ` Richard Biener
2019-06-04 11:45 ` Richard Biener
2019-06-03 17:24 ` Jeff Law
2019-05-31 21:19 ` Jeff Law
2019-06-03 23:24 ` Martin Sebor
2019-06-04 19:40 ` Martin Sebor
[not found] ` <224f8161-e370-bcbc-3ee6-5cff5835e016@redhat.com>
2019-06-19 3:19 ` Martin Sebor
2019-06-26 14:59 ` [PING] " Martin Sebor
2019-06-27 0:12 ` Jeff Law
2019-06-30 21:50 ` Martin Sebor
2019-07-02 20:59 ` Jeff Law
2019-07-11 6:45 ` Martin Sebor
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ca9b5ba9-f641-eb76-a495-84f88a947e21@gmail.com \
--to=msebor@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=law@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).