public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Martin Sebor <msebor@gmail.com>
To: Jeff Law <law@redhat.com>, gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] warn on returning alloca and VLA (PR 71924, 90549)
Date: Thu, 30 May 2019 15:48:00 -0000	[thread overview]
Message-ID: <ca9b5ba9-f641-eb76-a495-84f88a947e21@gmail.com> (raw)
In-Reply-To: <244d2a97-a753-4296-df9c-f53feaadd70d@redhat.com>

On 5/30/19 9:13 AM, Jeff Law wrote:
> On 5/30/19 8:52 AM, Martin Sebor wrote:
>> On 5/29/19 11:45 AM, Jeff Law wrote:
>>> On 5/22/19 3:34 PM, Martin Sebor wrote:
>>>> -Wreturn-local-addr detects a subset of instances of returning
>>>> the address of a local object from a function but the warning
>>>> doesn't try to handle alloca or VLAs, or some non-trivial cases
>>>> of ordinary automatic variables[1].
>>>>
>>>> The attached patch extends the implementation of the warning to
>>>> detect those.  It still doesn't detect instances where the address
>>>> is the result of a built-in such strcpy[2].
>>>>
>>>> Tested on x86_64-linux.
>>>>
>>>> Martin
>>>>
>>>> [1] For example, this is only diagnosed with the patch:
>>>>
>>>>     void* f (int i)
>>>>     {
>>>>       struct S { int a[2]; } s[2];
>>>>       return &s->a[i];
>>>>     }
>>>>
>>>> [2] The following is not diagnosed even with the patch:
>>>>
>>>>     void sink (void*);
>>>>
>>>>     void* f (int i)
>>>>     {
>>>>       char a[6];
>>>>       char *p = __builtin_strcpy (a, "123");
>>>>       sink (p);
>>>>       return p;
>>>>     }
>>>>
>>>> I would expect detecting to be possible and useful.  Maybe as
>>>> a follow-up.
>>>>
>>>> gcc-71924.diff
>>>>
>>>> PR middle-end/71924 - missing -Wreturn-local-addr returning alloca
>>>> result
>>>> PR middle-end/90549 - missing -Wreturn-local-addr maybe returning an
>>>> address of a local array plus offset
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>      PR c/71924
>>>>      * gimple-ssa-isolate-paths.c (is_addr_local): New function.
>>>>      (warn_return_addr_local_phi_arg, warn_return_addr_local): Same.
>>>>      (find_implicit_erroneous_behavior): Call
>>>> warn_return_addr_local_phi_arg.
>>>>      (find_explicit_erroneous_behavior): Call warn_return_addr_local.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>      PR c/71924
>>>>      * gcc.dg/Wreturn-local-addr-2.c: New test.
>>>>      * gcc.dg/Walloca-4.c: Prune expected warnings.
>>>>      * gcc.dg/pr41551.c: Same.
>>>>      * gcc.dg/pr59523.c: Same.
>>>>      * gcc.dg/tree-ssa/pr88775-2.c: Same.
>>>>      * gcc.dg/winline-7.c: Same.
>>>>
>>>> diff --git a/gcc/gimple-ssa-isolate-paths.c
>>>> b/gcc/gimple-ssa-isolate-paths.c
>>>> index 33fe352bb23..2933ecf502e 100644
>>>> --- a/gcc/gimple-ssa-isolate-paths.c
>>>> +++ b/gcc/gimple-ssa-isolate-paths.c
>>>> @@ -341,6 +341,135 @@ stmt_uses_0_or_null_in_undefined_way (gimple
>>>> *stmt)
>>>>      return false;
>>>>    }
>>>>    +/* Return true if EXPR is a expression of pointer type that refers
>>>> +   to the address of a variable with automatic storage duration.
>>>> +   If so, set *PLOC to the location of the object or the call that
>>>> +   allocated it (for alloca and VLAs).  When PMAYBE is non-null,
>>>> +   also consider PHI statements and set *PMAYBE when some but not
>>>> +   all arguments of such statements refer to local variables, and
>>>> +   to clear it otherwise.  */
>>>> +
>>>> +static bool
>>>> +is_addr_local (tree exp, location_t *ploc, bool *pmaybe = NULL,
>>>> +           hash_set<gphi *> *visited = NULL)
>>>> +{
>>>> +  if (TREE_CODE (exp) == ADDR_EXPR)
>>>> +    {
>>>> +      tree baseaddr = get_base_address (TREE_OPERAND (exp, 0));
>>>> +      if (TREE_CODE (baseaddr) == MEM_REF)
>>>> +    return is_addr_local (TREE_OPERAND (baseaddr, 0), ploc, pmaybe,
>>>> visited);
>>>> +
>>>> +      if ((!VAR_P (baseaddr)
>>>> +       || is_global_var (baseaddr))
>>>> +      && TREE_CODE (baseaddr) != PARM_DECL)
>>>> +    return false;
>>>> +
>>>> +      *ploc = DECL_SOURCE_LOCATION (baseaddr);
>>>> +      return true;
>>>> +    }
>>>> +
>>>> +  if (TREE_CODE (exp) == SSA_NAME)
>>>> +    {
>>>> +      gimple *def_stmt = SSA_NAME_DEF_STMT (exp);
>>>> +      enum gimple_code code = gimple_code (def_stmt);
>>>> +
>>>> +      if (is_gimple_assign (def_stmt))
>>>> +    {
>>>> +      tree type = TREE_TYPE (gimple_assign_lhs (def_stmt));
>>>> +      if (POINTER_TYPE_P (type))
>>>> +        {
>>>> +          tree ptr = gimple_assign_rhs1 (def_stmt);
>>>> +          return is_addr_local (ptr, ploc, pmaybe, visited);
>>>> +        }
>>>> +      return false;
>>>> +    }
>>>> +
>>>> +      if (code == GIMPLE_CALL
>>>> +      && gimple_call_builtin_p (def_stmt))
>>>> +    {
>>>> +      tree fn = gimple_call_fndecl (def_stmt);
>>>> +      int code = DECL_FUNCTION_CODE (fn);
>>>> +      if (code != BUILT_IN_ALLOCA
>>>> +          && code != BUILT_IN_ALLOCA_WITH_ALIGN)
>>>> +        return false;
>>>> +
>>>> +      *ploc = gimple_location (def_stmt);
>>>> +      return true;
>>>> +    }
>>>> +
>>>> +      if (code == GIMPLE_PHI && pmaybe)
>>>> +    {
>>>> +      unsigned count = 0;
>>>> +      gphi *phi_stmt = as_a <gphi *> (def_stmt);
>>>> +
>>>> +      unsigned nargs = gimple_phi_num_args (phi_stmt);
>>>> +      for (unsigned i = 0; i < nargs; ++i)
>>>> +        {
>>>> +          if (!visited->add (phi_stmt))
>>>> +        {
>>>> +          tree arg = gimple_phi_arg_def (phi_stmt, i);
>>>> +          if (is_addr_local (arg, ploc, pmaybe, visited))
>>>> +            ++count;
>>>> +        }
>>>> +        }
>>>> +
>>>> +      *pmaybe = count && count < nargs;
>>>> +      return count != 0;
>>>> +    }
>>>> +    }
>>>> +
>>>> +  return false;
>>>> +}
>>> Is there some reason you didn't query the alias oracle here?  It would
>>> seem a fairly natural fit.  Ultimately given a pointer (which will be an
>>> SSA_NAME) you want to ask whether or not it conclusively points into the
>>> stack.
>>>
>>> That would seem to dramatically simplify is_addr_local.
>>
>> I did think about it but decided against changing the existing
>> design (iterating over PHI arguments), for a couple of reasons:
>>
>> 1) It feels like a bigger change than my simple "bug fix" calls
>>     for.
> I suspect the net result will be simpler though ;-)  I think you just
> get a pt solution and iterate over the things the solution says the
> pointer can point to.
> 
> 
>> 2) I'm not familiar enough with the alias oracle to say for sure
>>     if it can be used to give the same results as the existing
>>     implementation.  I.e., make it possible to identify and
>>     isolate each path that returns a local address (rather than
>>     just answering: yes, this pointer may point to some local
>>     in this function).
> Precision of the oracle is certainly the big question.
> 
> 
>>
>> If the alias oracle can be used to give the same results without
>> excessive false positives then I think it would be fine to make
>> use of it.  Is that something you consider a prerequisite for
>> this change or should I look into it as a followup?
> I think we should explore it a bit before making a final decision.  It
> may guide us for other work in this space (like detecting escaping
> locals).   I think a dirty prototype to see if it's even in the right
> ballpark would make sense.

Okay, let me look into it.

>> FWIW, I'm working on enhancing this to detect returning freed
>> pointers (under a different option).  That seems like a good
>> opportunity to also look into making use of the alias oracle.
> Yes.  I think there's two interesting cases here to ponder.  If we free
> a pointer that must point to a local, then we can warn & trap.  If we
> free a pointer that may point to a local, then we can only warn (unless
> we can isolate the path).

I wasn't actually thinking of freeing locals but it sounds like
a useful enhancement as well.  Thanks for the suggestion! :)

To be clear, what I'm working on is detecting:

   void* f (void *p)
   {
     free (p);   // note: pointer was freed here
     // ...
     return p;   // warning: returning a freed pointer
   }

>> Besides these enhancements, there's also a request to diagnose
>> dereferencing pointers to compound literals whose lifetime has
>> ended (PR 89990), or more generally, those to any such local
>> object.  It's about preventing essentially the same problem
>> (accessing bad data or corrupting others) but it seems that
>> both the analysis and the handling will be sufficiently
>> different to consider implementing it somewhere else.  What
>> are your thoughts on that?
> I think the tough problem here is we lose binding scopes as we lower
> into gimple, so solving it in the general case may be tough.  We've
> started adding some clobbers into the IL to denote object death points,
> but I'm not sure if they're sufficient to implement this kind of warning.

I was afraid that might be a problem.

Thanks
Martin

  reply	other threads:[~2019-05-30 15:34 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-22 21:34 Martin Sebor
2019-05-29 18:08 ` Jeff Law
2019-05-30 14:58   ` Martin Sebor
2019-05-30 15:21     ` Jeff Law
2019-05-30 15:48       ` Martin Sebor [this message]
2019-05-30 16:20         ` Jeff Law
2019-05-30 17:27           ` Jason Merrill
2019-05-31  0:26             ` Jeff Law
2019-05-30 23:35           ` Martin Sebor
2019-05-31 15:50             ` Jeff Law
2019-06-03  9:37               ` Richard Biener
2019-06-03 11:28                 ` Richard Biener
2019-06-04 11:45                   ` Richard Biener
2019-06-03 17:24                 ` Jeff Law
2019-05-31 21:19 ` Jeff Law
2019-06-03 23:24   ` Martin Sebor
2019-06-04 19:40     ` Martin Sebor
     [not found]       ` <224f8161-e370-bcbc-3ee6-5cff5835e016@redhat.com>
2019-06-19  3:19         ` Martin Sebor
2019-06-26 14:59           ` [PING] " Martin Sebor
2019-06-27  0:12           ` Jeff Law
2019-06-30 21:50             ` Martin Sebor
2019-07-02 20:59               ` Jeff Law
2019-07-11  6:45                 ` Martin Sebor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ca9b5ba9-f641-eb76-a495-84f88a947e21@gmail.com \
    --to=msebor@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=law@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).