public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
From: David Malcolm <dmalcolm@redhat.com>
To: Ankur Saini <arsenic.secondary@gmail.com>
Cc: gcc@gcc.gnu.org
Subject: Re: daily report on extending static analyzer project [GSoC]
Date: Thu, 05 Aug 2021 19:09:50 -0400	[thread overview]
Message-ID: <8b204698364f73b31cb9f3c4488a26cd59934cfb.camel@redhat.com> (raw)
In-Reply-To: <2634D9B0-51AC-4166-BBF0-6F39D9D15235@gmail.com>

On Thu, 2021-08-05 at 20:27 +0530, Ankur Saini wrote:
> 
> 
> > On 05-Aug-2021, at 4:56 AM, David Malcolm <dmalcolm@redhat.com>
> > wrote:
> > 
> > On Wed, 2021-08-04 at 21:32 +0530, Ankur Saini wrote:
> > 
> > [...snip...]
> > > 
> > > - From observation, a typical vfunc call that isn't devirtualised
> > > by
> > > the compiler's front end looks something like this 
> > > "OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5(D))"
> > > where "a_ptr_5(D)" is pointer that is being used to call the
> > > virtual
> > > function.
> > > 
> > > - We can access it's region to see what is the type of the object
> > > the
> > > pointer is actually pointing to.
> > > 
> > > - This is then used to find a call with DECL_CONTEXT of the object
> > > from the all the possible targets of that polymorphic call.
> > 
> > [...]
> > 
> > > 
> > > Patch file ( prototype ) : 
> > > 
> > 
> > > +  /* Call is possibly a polymorphic call.
> > > +  
> > > +     In such case, use devirtisation tools to find 
> > > +     possible callees of this function call.  */
> > > +  
> > > +  function *fun = get_current_function ();
> > > +  gcall *stmt  = const_cast<gcall *> (call);
> > > +  cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
> > > +  if (e->indirect_info->polymorphic)
> > > +  {
> > > +    void *cache_token;
> > > +    bool final;
> > > +    vec <cgraph_node *> targets
> > > +      = possible_polymorphic_call_targets (e, &final,
> > > &cache_token, true);
> > > +    if (!targets.is_empty ())
> > > +      {
> > > +        tree most_propbable_taget = NULL_TREE;
> > > +        if(targets.length () == 1)
> > > +           return targets[0]->decl;
> > > +    
> > > +        /* From the current state, check which subclass the
> > > pointer that 
> > > +           is being used to this polymorphic call points to, and
> > > use to
> > > +           filter out correct function call.  */
> > > +        tree t_val = gimple_call_arg (call, 0);
> > 
> > Maybe rename to "this_expr"?
> > 
> > 
> > > +        const svalue *sval = get_rvalue (t_val, ctxt);
> > 
> > and "this_sval"?
> 
> ok
> 
> > 
> > ...assuming that that's what the value is.
> > 
> > Probably should reject the case where there are zero arguments.
> 
> Ideally it should always have one argument representing the pointer
> used to call the function. 
> 
> for example, if the function is called like this : -
> 
> a_ptr->foo(arg);  // where foo() is a virtual function and a_ptr is a
> pointer to an object of a subclass.
> 
> I saw that it’s GIMPLE representation is as follows : -
> 
> OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5, arg);
> 
> > 
> > 
> > > +
> > > +        const region *reg
> > > +          = [&]()->const region *
> > > +              {
> > > +                switch (sval->get_kind ())
> > > +                  {
> > > +                    case SK_INITIAL:
> > > +                      {
> > > +                        const initial_svalue *initial_sval
> > > +                          = sval->dyn_cast_initial_svalue ();
> > > +                        return initial_sval->get_region ();
> > > +                      }
> > > +                      break;
> > > +                    case SK_REGION:
> > > +                      {
> > > +                        const region_svalue *region_sval 
> > > +                          = sval->dyn_cast_region_svalue ();
> > > +                        return region_sval->get_pointee ();
> > > +                      }
> > > +                      break;
> > > +
> > > +                    default:
> > > +                      return NULL;
> > > +                  }
> > > +              } ();
> > 
> > I think the above should probably be a subroutine.
> > 
> > That said, it's not clear to me what it's doing, or that this is
> > correct.
> 
> 
> Sorry, I think I should have explained it earlier.
> 
> Let's take an example code snippet :- 
> 
> Derived d;
> Base *base_ptr;
> base_ptr = &d;
> base_ptr->foo();        // where foo() is a virtual function
> 
> This genertes the following GIMPLE dump :- 
> 
> Derived::Derived (&d);
> base_ptr_6 = &d.D.3779;
> _1 = base_ptr_6->_vptr.Base;
> _2 = _1 + 8;
> _3 = *_2;
> OBJ_TYPE_REF(_3;(struct Base)base_ptr_6->1) (base_ptr_6);

I did a bit of playing with this example, and tried adding:

1876	    case OBJ_TYPE_REF:
1877	      gcc_unreachable ();
1878	      break;

to region_model::get_rvalue_1, and running cc1plus under the debugger.

The debugger hits the "gcc_unreachable ();", at this stmt:

     OBJ_TYPE_REF(_2;(struct Base)base_ptr_5->0) (base_ptr_5);

Looking at the region_model with region_model::debug() shows:

(gdb) call debug()
stack depth: 1
  frame (index 0): frame: ‘test’@1
clusters within frame: ‘test’@1
  cluster for: Derived d
    key:   {bytes 0-7}
    value: ‘int (*) () *’ {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)}
  cluster for: base_ptr_5: &Derived d.<anonymous>
  cluster for: _2: &‘foo’
m_called_unknown_fn: FALSE
constraint_manager:
  equiv classes:
    ec0: {&Derived d.<anonymous>}
    ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)}
    ec2: {(void *)0B == [m_constant]‘0B’}
    ec3: {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)}
  constraints:
    0: ec0: {&Derived d.<anonymous>} != ec2: {(void *)0B == [m_constant]‘0B’}
    1: ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)} != ec2: {(void *)0B == [m_constant]‘0B’}
    2: ec3: {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)} != ec2: {(void *)0B == [m_constant]‘0B’}

i.e. it already "knows" that _2 is &'foo' for Derived::foo.

So I think looking at OBJ_TYPE_REF_EXPR in the above case may give the
function pointer directly from the vtable for such cases, so something
like:

    case OBJ_TYPE_REF:
	{
	   tree expr = OBJ_TYPE_REF_EXPR (pv.m_tree);
	   return get_rvalue (expr, ctxt); 
	}
	break;

might get the function pointer.

(caveat: untested code)

> 
> Here instead of trying to extract virtual pointer from the call and see
> which subclass it belongs, I found it simpler to extract the actual
> pointer which is used to call the function itself (which from
> observation, is always the first parameter of the call) and used the
> region model at that point to figure out what is the type of the object
> it actually points to ultimately get the actual subclass who's function
> is being called here. :)
> 
> Now let me try to explain how I actually executed it ( A lot of
> assumptions here are based on observation, so please correct me
> wherever you think I made a false interpretation or forgot about a
> certain special case ) :
> 
> - once it is confirmed that the call that we are dealing with is a
> polymorphic call ( via the cgraph edge representing the call ), I used
> the "possible_polymorphic_call_targets ()" from ipa-utils.h ( defined
> in ipa-devirt.c ), to get the possible callee of that call. 
> 
>   function *fun = get_current_function ();
>   gcall *stmt  = const_cast<gcall *> (call);
>   cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
>   if (e->indirect_info->polymorphic)
>   {
>     void *cache_token;
>     bool final;
>     vec <cgraph_node *> targets
>       = possible_polymorphic_call_targets (e, &final, &cache_token,
> true);
> 
> - Now if the list contains more than one targets, I will make use of
> the current enode's region model to get more info about the pointer
> which was used to call the function .
> 
>         /* here I extract the pointer (which was used to call the
> function), which from observation, is always the zeroth argument of the
> call.  */
>         tree t_val = gimple_call_arg (call, 0);
>         const svalue *sval = get_rvalue (t_val, ctxt);
> 
> - In all the examples I used, the pointer is represented as
> region_svalue or as initial_svalue (I think, initial_svalue is the case
> where the pointer is taken as a parameter of the current function and
> analyzer is analysing top-level call to this function )
> 
> Here are some examples of the following, Where I used
> __analyzer_describe () to show the same 
>  . (https://godbolt.org/z/Mqs8oM6ff)
>  . (https://godbolt.org/z/z4sfTM3f5))
> 
>         /* here I extract the region that the pointer is pointing to,
> and as both of them returns a (const region *), I used a lambda to get
> it ( If you want, I can turn this into a separate function to make it
> more readable )  */
> 
>         const region *reg
>           = [&]()->const region *
>               {
>                 switch (sval->get_kind ())
>                   {
>                     case SK_INITIAL:
>                       {
>                         const initial_svalue *initial_sval
>                           = sval->dyn_cast_initial_svalue ();
>                         return initial_sval->get_region ();
>                       }
>                       break;
>                     case SK_REGION:
>                       {
>                         const region_svalue *region_sval 
>                           = sval->dyn_cast_region_svalue ();
>                         return region_sval->get_pointee ();
>                       }
>                       break;
> 
>                     default:
>                       return NULL;
>                   }
>               } ();
> 
>         gcc_assert (reg);
> 
>         /* Now that I have the region, I tried to get the type of the
> object it is holding and put it in ‘known_possible_subclass_type’.  */
> 
>         tree known_possible_subclass_type;
>         known_possible_subclass_type = reg->get_type ();
>         if (reg->get_kind () == RK_FIELD)
>           {
>              const field_region* field_reg = reg->dyn_cast_field_region
> ();
>              known_possible_subclass_type 
>                = DECL_CONTEXT (field_reg->get_field ());
>           }
> 
> /* After that I iterated over the entire array of possible calls to
> find the function which whose scope ( DECL_CONTEXT (fn_decl) ) is same
> as that of the type of the object that the pointer is actually pointing
> to.  */
> 
>         for (cgraph_node *x : targets)
>           {
>             if (DECL_CONTEXT (x->decl) == known_possible_subclass_type)
>               most_propbable_taget = x->decl;
>           }
>         return most_propbable_taget;
>       }
>    }
> 
> I tested it on all of the test programs I created and till now in all
> of the cases, the analyzer is correctly determining the call. I am
> currently in the process of creating more tests ( including multiple
> types of inheritances ) to see how successful is this implementation .

I'm still skeptical of the above code; my feeling is that with more
tests you'll find cases where it doesn't work.  Maybe dynamically
allocated instances?

Hope this is constructive

Dave


  reply	other threads:[~2021-08-05 23:09 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-24 14:29 Ankur Saini
2021-06-24 20:53 ` David Malcolm
2021-06-25 15:03   ` Ankur Saini
2021-06-25 15:34     ` David Malcolm
2021-06-26 15:20       ` Ankur Saini
2021-06-27 18:48         ` David Malcolm
2021-06-28 14:53           ` Ankur Saini
2021-06-28 23:39             ` David Malcolm
2021-06-29 16:34               ` Ankur Saini
2021-06-29 19:53                 ` David Malcolm
     [not found]                   ` <AD7A4C2F-1451-4317-BE53-99DE9E9853AE@gmail.com>
2021-06-30 17:17                     ` David Malcolm
2021-07-02 14:18                       ` Ankur Saini
2021-07-03 14:37                         ` Ankur Saini
2021-07-05 16:15                           ` Ankur Saini
2021-07-06 23:11                             ` David Malcolm
2021-07-06 22:46                           ` David Malcolm
2021-07-06 22:50                             ` David Malcolm
2021-07-07 13:52                             ` Ankur Saini
2021-07-07 14:37                               ` David Malcolm
2021-07-10 15:57                                 ` Ankur Saini
2021-07-11 17:01                                   ` Ankur Saini
2021-07-11 18:01                                     ` David Malcolm
2021-07-11 17:49                                   ` David Malcolm
2021-07-12 16:37                                     ` Ankur Saini
2021-07-14 17:11                                       ` Ankur Saini
2021-07-14 23:23                                         ` David Malcolm
2021-07-16 15:34                                           ` Ankur Saini
2021-07-16 21:27                                             ` David Malcolm
2021-07-21 16:14                                               ` Ankur Saini
2021-07-22 17:10                                                 ` Ankur Saini
2021-07-22 23:21                                                   ` David Malcolm
2021-07-24 16:35                                                   ` Ankur Saini
2021-07-27 15:05                                                     ` Ankur Saini
2021-07-28 15:49                                                       ` Ankur Saini
2021-07-29 12:50                                                         ` Ankur Saini
2021-07-30  0:05                                                           ` David Malcolm
     [not found]                                                             ` <ACE21DBF-8163-4F28-B755-6B05FDA27A0E@gmail.com>
2021-07-30 14:48                                                               ` David Malcolm
2021-08-03 16:12                                                                 ` Ankur Saini
2021-08-04 16:02                                                                   ` Ankur Saini
2021-08-04 23:26                                                                     ` David Malcolm
2021-08-05 14:57                                                                       ` Ankur Saini
2021-08-05 23:09                                                                         ` David Malcolm [this message]
2021-08-06 15:41                                                                           ` Ankur Saini
2021-07-22 23:07                                                 ` David Malcolm
2021-07-14 23:07                                       ` David Malcolm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8b204698364f73b31cb9f3c4488a26cd59934cfb.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=arsenic.secondary@gmail.com \
    --cc=gcc@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).