From: David Malcolm <dmalcolm@redhat.com>
To: Ankur Saini <arsenic.secondary@gmail.com>
Cc: gcc@gcc.gnu.org
Subject: Re: daily report on extending static analyzer project [GSoC]
Date: Thu, 05 Aug 2021 19:09:50 -0400 [thread overview]
Message-ID: <8b204698364f73b31cb9f3c4488a26cd59934cfb.camel@redhat.com> (raw)
In-Reply-To: <2634D9B0-51AC-4166-BBF0-6F39D9D15235@gmail.com>
On Thu, 2021-08-05 at 20:27 +0530, Ankur Saini wrote:
>
>
> > On 05-Aug-2021, at 4:56 AM, David Malcolm <dmalcolm@redhat.com>
> > wrote:
> >
> > On Wed, 2021-08-04 at 21:32 +0530, Ankur Saini wrote:
> >
> > [...snip...]
> > >
> > > - From observation, a typical vfunc call that isn't devirtualised
> > > by
> > > the compiler's front end looks something like this
> > > "OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5(D))"
> > > where "a_ptr_5(D)" is pointer that is being used to call the
> > > virtual
> > > function.
> > >
> > > - We can access it's region to see what is the type of the object
> > > the
> > > pointer is actually pointing to.
> > >
> > > - This is then used to find a call with DECL_CONTEXT of the object
> > > from the all the possible targets of that polymorphic call.
> >
> > [...]
> >
> > >
> > > Patch file ( prototype ) :
> > >
> >
> > > + /* Call is possibly a polymorphic call.
> > > +
> > > + In such case, use devirtisation tools to find
> > > + possible callees of this function call. */
> > > +
> > > + function *fun = get_current_function ();
> > > + gcall *stmt = const_cast<gcall *> (call);
> > > + cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
> > > + if (e->indirect_info->polymorphic)
> > > + {
> > > + void *cache_token;
> > > + bool final;
> > > + vec <cgraph_node *> targets
> > > + = possible_polymorphic_call_targets (e, &final,
> > > &cache_token, true);
> > > + if (!targets.is_empty ())
> > > + {
> > > + tree most_propbable_taget = NULL_TREE;
> > > + if(targets.length () == 1)
> > > + return targets[0]->decl;
> > > +
> > > + /* From the current state, check which subclass the
> > > pointer that
> > > + is being used to this polymorphic call points to, and
> > > use to
> > > + filter out correct function call. */
> > > + tree t_val = gimple_call_arg (call, 0);
> >
> > Maybe rename to "this_expr"?
> >
> >
> > > + const svalue *sval = get_rvalue (t_val, ctxt);
> >
> > and "this_sval"?
>
> ok
>
> >
> > ...assuming that that's what the value is.
> >
> > Probably should reject the case where there are zero arguments.
>
> Ideally it should always have one argument representing the pointer
> used to call the function.
>
> for example, if the function is called like this : -
>
> a_ptr->foo(arg); // where foo() is a virtual function and a_ptr is a
> pointer to an object of a subclass.
>
> I saw that it’s GIMPLE representation is as follows : -
>
> OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5, arg);
>
> >
> >
> > > +
> > > + const region *reg
> > > + = [&]()->const region *
> > > + {
> > > + switch (sval->get_kind ())
> > > + {
> > > + case SK_INITIAL:
> > > + {
> > > + const initial_svalue *initial_sval
> > > + = sval->dyn_cast_initial_svalue ();
> > > + return initial_sval->get_region ();
> > > + }
> > > + break;
> > > + case SK_REGION:
> > > + {
> > > + const region_svalue *region_sval
> > > + = sval->dyn_cast_region_svalue ();
> > > + return region_sval->get_pointee ();
> > > + }
> > > + break;
> > > +
> > > + default:
> > > + return NULL;
> > > + }
> > > + } ();
> >
> > I think the above should probably be a subroutine.
> >
> > That said, it's not clear to me what it's doing, or that this is
> > correct.
>
>
> Sorry, I think I should have explained it earlier.
>
> Let's take an example code snippet :-
>
> Derived d;
> Base *base_ptr;
> base_ptr = &d;
> base_ptr->foo(); // where foo() is a virtual function
>
> This genertes the following GIMPLE dump :-
>
> Derived::Derived (&d);
> base_ptr_6 = &d.D.3779;
> _1 = base_ptr_6->_vptr.Base;
> _2 = _1 + 8;
> _3 = *_2;
> OBJ_TYPE_REF(_3;(struct Base)base_ptr_6->1) (base_ptr_6);
I did a bit of playing with this example, and tried adding:
1876 case OBJ_TYPE_REF:
1877 gcc_unreachable ();
1878 break;
to region_model::get_rvalue_1, and running cc1plus under the debugger.
The debugger hits the "gcc_unreachable ();", at this stmt:
OBJ_TYPE_REF(_2;(struct Base)base_ptr_5->0) (base_ptr_5);
Looking at the region_model with region_model::debug() shows:
(gdb) call debug()
stack depth: 1
frame (index 0): frame: ‘test’@1
clusters within frame: ‘test’@1
cluster for: Derived d
key: {bytes 0-7}
value: ‘int (*) () *’ {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)}
cluster for: base_ptr_5: &Derived d.<anonymous>
cluster for: _2: &‘foo’
m_called_unknown_fn: FALSE
constraint_manager:
equiv classes:
ec0: {&Derived d.<anonymous>}
ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)}
ec2: {(void *)0B == [m_constant]‘0B’}
ec3: {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)}
constraints:
0: ec0: {&Derived d.<anonymous>} != ec2: {(void *)0B == [m_constant]‘0B’}
1: ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)} != ec2: {(void *)0B == [m_constant]‘0B’}
2: ec3: {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)} != ec2: {(void *)0B == [m_constant]‘0B’}
i.e. it already "knows" that _2 is &'foo' for Derived::foo.
So I think looking at OBJ_TYPE_REF_EXPR in the above case may give the
function pointer directly from the vtable for such cases, so something
like:
case OBJ_TYPE_REF:
{
tree expr = OBJ_TYPE_REF_EXPR (pv.m_tree);
return get_rvalue (expr, ctxt);
}
break;
might get the function pointer.
(caveat: untested code)
>
> Here instead of trying to extract virtual pointer from the call and see
> which subclass it belongs, I found it simpler to extract the actual
> pointer which is used to call the function itself (which from
> observation, is always the first parameter of the call) and used the
> region model at that point to figure out what is the type of the object
> it actually points to ultimately get the actual subclass who's function
> is being called here. :)
>
> Now let me try to explain how I actually executed it ( A lot of
> assumptions here are based on observation, so please correct me
> wherever you think I made a false interpretation or forgot about a
> certain special case ) :
>
> - once it is confirmed that the call that we are dealing with is a
> polymorphic call ( via the cgraph edge representing the call ), I used
> the "possible_polymorphic_call_targets ()" from ipa-utils.h ( defined
> in ipa-devirt.c ), to get the possible callee of that call.
>
> function *fun = get_current_function ();
> gcall *stmt = const_cast<gcall *> (call);
> cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
> if (e->indirect_info->polymorphic)
> {
> void *cache_token;
> bool final;
> vec <cgraph_node *> targets
> = possible_polymorphic_call_targets (e, &final, &cache_token,
> true);
>
> - Now if the list contains more than one targets, I will make use of
> the current enode's region model to get more info about the pointer
> which was used to call the function .
>
> /* here I extract the pointer (which was used to call the
> function), which from observation, is always the zeroth argument of the
> call. */
> tree t_val = gimple_call_arg (call, 0);
> const svalue *sval = get_rvalue (t_val, ctxt);
>
> - In all the examples I used, the pointer is represented as
> region_svalue or as initial_svalue (I think, initial_svalue is the case
> where the pointer is taken as a parameter of the current function and
> analyzer is analysing top-level call to this function )
>
> Here are some examples of the following, Where I used
> __analyzer_describe () to show the same
> . (https://godbolt.org/z/Mqs8oM6ff)
> . (https://godbolt.org/z/z4sfTM3f5))
>
> /* here I extract the region that the pointer is pointing to,
> and as both of them returns a (const region *), I used a lambda to get
> it ( If you want, I can turn this into a separate function to make it
> more readable ) */
>
> const region *reg
> = [&]()->const region *
> {
> switch (sval->get_kind ())
> {
> case SK_INITIAL:
> {
> const initial_svalue *initial_sval
> = sval->dyn_cast_initial_svalue ();
> return initial_sval->get_region ();
> }
> break;
> case SK_REGION:
> {
> const region_svalue *region_sval
> = sval->dyn_cast_region_svalue ();
> return region_sval->get_pointee ();
> }
> break;
>
> default:
> return NULL;
> }
> } ();
>
> gcc_assert (reg);
>
> /* Now that I have the region, I tried to get the type of the
> object it is holding and put it in ‘known_possible_subclass_type’. */
>
> tree known_possible_subclass_type;
> known_possible_subclass_type = reg->get_type ();
> if (reg->get_kind () == RK_FIELD)
> {
> const field_region* field_reg = reg->dyn_cast_field_region
> ();
> known_possible_subclass_type
> = DECL_CONTEXT (field_reg->get_field ());
> }
>
> /* After that I iterated over the entire array of possible calls to
> find the function which whose scope ( DECL_CONTEXT (fn_decl) ) is same
> as that of the type of the object that the pointer is actually pointing
> to. */
>
> for (cgraph_node *x : targets)
> {
> if (DECL_CONTEXT (x->decl) == known_possible_subclass_type)
> most_propbable_taget = x->decl;
> }
> return most_propbable_taget;
> }
> }
>
> I tested it on all of the test programs I created and till now in all
> of the cases, the analyzer is correctly determining the call. I am
> currently in the process of creating more tests ( including multiple
> types of inheritances ) to see how successful is this implementation .
I'm still skeptical of the above code; my feeling is that with more
tests you'll find cases where it doesn't work. Maybe dynamically
allocated instances?
Hope this is constructive
Dave
next prev parent reply other threads:[~2021-08-05 23:09 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-24 14:29 Ankur Saini
2021-06-24 20:53 ` David Malcolm
2021-06-25 15:03 ` Ankur Saini
2021-06-25 15:34 ` David Malcolm
2021-06-26 15:20 ` Ankur Saini
2021-06-27 18:48 ` David Malcolm
2021-06-28 14:53 ` Ankur Saini
2021-06-28 23:39 ` David Malcolm
2021-06-29 16:34 ` Ankur Saini
2021-06-29 19:53 ` David Malcolm
[not found] ` <AD7A4C2F-1451-4317-BE53-99DE9E9853AE@gmail.com>
2021-06-30 17:17 ` David Malcolm
2021-07-02 14:18 ` Ankur Saini
2021-07-03 14:37 ` Ankur Saini
2021-07-05 16:15 ` Ankur Saini
2021-07-06 23:11 ` David Malcolm
2021-07-06 22:46 ` David Malcolm
2021-07-06 22:50 ` David Malcolm
2021-07-07 13:52 ` Ankur Saini
2021-07-07 14:37 ` David Malcolm
2021-07-10 15:57 ` Ankur Saini
2021-07-11 17:01 ` Ankur Saini
2021-07-11 18:01 ` David Malcolm
2021-07-11 17:49 ` David Malcolm
2021-07-12 16:37 ` Ankur Saini
2021-07-14 17:11 ` Ankur Saini
2021-07-14 23:23 ` David Malcolm
2021-07-16 15:34 ` Ankur Saini
2021-07-16 21:27 ` David Malcolm
2021-07-21 16:14 ` Ankur Saini
2021-07-22 17:10 ` Ankur Saini
2021-07-22 23:21 ` David Malcolm
2021-07-24 16:35 ` Ankur Saini
2021-07-27 15:05 ` Ankur Saini
2021-07-28 15:49 ` Ankur Saini
2021-07-29 12:50 ` Ankur Saini
2021-07-30 0:05 ` David Malcolm
[not found] ` <ACE21DBF-8163-4F28-B755-6B05FDA27A0E@gmail.com>
2021-07-30 14:48 ` David Malcolm
2021-08-03 16:12 ` Ankur Saini
2021-08-04 16:02 ` Ankur Saini
2021-08-04 23:26 ` David Malcolm
2021-08-05 14:57 ` Ankur Saini
2021-08-05 23:09 ` David Malcolm [this message]
2021-08-06 15:41 ` Ankur Saini
2021-07-22 23:07 ` David Malcolm
2021-07-14 23:07 ` David Malcolm
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8b204698364f73b31cb9f3c4488a26cd59934cfb.camel@redhat.com \
--to=dmalcolm@redhat.com \
--cc=arsenic.secondary@gmail.com \
--cc=gcc@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).