From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl1-x631.google.com (mail-pl1-x631.google.com [IPv6:2607:f8b0:4864:20::631]) by sourceware.org (Postfix) with ESMTPS id 48ED0385702D for ; Fri, 6 Aug 2021 15:42:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 48ED0385702D Received: by mail-pl1-x631.google.com with SMTP id u2so7563539plg.10 for ; Fri, 06 Aug 2021 08:42:01 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=BFaS+t9wpmt4Qv/moPDW9aK0KKB99xBOBoNmEV4Ck4g=; b=LPNMtsageTvXfdh9fKxvuZZBEQebbu0mvp4KeCHgPNyRWmvmHTSc8GTc4QCt+QcNDQ Oodc8Yad19fdeug4i3OpHS/PKs5YhOGRTJhJ8S+ajeBM7Vrtv8I0zXqEOfQTHidvePv4 kmrM3KNkq3Tb69AhS9QFeXQCm7CZcrpJR6tZQgst3wYA5wJ/YLHeb9gqbxWT7DODdNC/ m0J9ouPmiajIz2Du13guIJK5HAEA0j2yS6qPv5CuNw4Qp7oUJaWEIJHn6xsLKH2tySLp AfVGjiCagdBZ6Tr6T1wGIOcZWcu25iirFgKEM9LBlYHJkqRdo9mPPgsz2n5tZBTORPuP Dt4Q== X-Gm-Message-State: AOAM5335ofS4jGuOPwZGEnW1B9u/U/FmpnBwb6Zjds8P1qYky17+sQ4f QkZPCH62e9QPiuD5b+0HmyI= X-Google-Smtp-Source: ABdhPJxaar/4QinhGlw8YD20VnUTHixM461mRptH0v0jOGOzitAmrlaEkz6nFWHj+SIL2HUccroWQg== X-Received: by 2002:a17:90b:1809:: with SMTP id lw9mr11034286pjb.126.1628264520225; Fri, 06 Aug 2021 08:42:00 -0700 (PDT) Received: from [192.168.100.2] ([146.196.34.153]) by smtp.gmail.com with ESMTPSA id s195sm11247526pfs.119.2021.08.06.08.41.57 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Fri, 06 Aug 2021 08:41:59 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.20.0.2.21\)) Subject: Re: daily report on extending static analyzer project [GSoC] From: Ankur Saini In-Reply-To: <8b204698364f73b31cb9f3c4488a26cd59934cfb.camel@redhat.com> Date: Fri, 6 Aug 2021 21:11:55 +0530 Cc: gcc@gcc.gnu.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <35A0246A-D4F8-4B41-A009-4A98F78E0395@gmail.com> <06DBCE04-B3AC-4091-979D-430507352213@gmail.com> <425d5e711663bbf0c1ebcfe05097780ebb2126a0.camel@redhat.com> <9e97b67cc5eb55a3a526b1c263a9980915556ce1.camel@redhat.com> <6ea20fa9093b566db6884cc2af51ae1bb7deee95.camel@redhat.com> <0ba8ad1a23fd87ef123ec51f76ccaf29ea114c79.camel@redhat.com> <0558C377-5A39-4D71-A2DC-DE23E737C65D@gmail.com> <0324cb3d52c80b3a6ff70488aa28e2322de7b832.camel@redhat.com> <9CDE4E6A-D7DB-43DE-AB00-95D1B4667061@gmail.com> <00236f15381ae32ac62704d40d064a726a849f50.camel@redhat.com> <0F8999E7-C7E8-4499-9293-55BF4185EF49@gmail.com> <3ff03555914a6ab2e42a288c420050679f615a7b.camel@redhat.com> <2095bdb6f2776f54f482b4d963e16de4d62877be.camel@redhat.com> <72893A18-FE13-4A22-85AC-BBC6ED270176@gmail.com> <720C2212-E325-487C-A8F8-EA33CB5C8FC1@gmail.com> <519AE121-7299-44D8-9A72-690407143316@gmail.com> <87a2b673a3d605fea65b3418446f3a900459f965.camel@redhat.com> <8BB81A76-38BD-4AE5-B4E3-E24CD6600A37@gmail.com> <4c9bdf22b6b9fe9eb9bd8de328a145afdb3d9712.camel@redhat.com> <2634D9B0-51AC-4166-BBF0-6F39D9D15235@gmail.com> <8b204698364f73b31cb9f3c4488a26cd59934cfb.camel@redhat.com> To: David Malcolm X-Mailer: Apple Mail (2.3654.20.0.2.21) X-Spam-Status: No, score=2.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, RCVD_IN_ABUSEAT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.4 X-Spam-Level: ** X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 Aug 2021 15:42:03 -0000 > On 06-Aug-2021, at 4:39 AM, David Malcolm wrote: >=20 > On Thu, 2021-08-05 at 20:27 +0530, Ankur Saini wrote: >>=20 >>=20 >>> On 05-Aug-2021, at 4:56 AM, David Malcolm >>> wrote: >>>=20 >>> On Wed, 2021-08-04 at 21:32 +0530, Ankur Saini wrote: >>>=20 >>> [...snip...] >>>>=20 >>>> - =46rom observation, a typical vfunc call that isn't devirtualised >>>> by >>>> the compiler's front end looks something like this=20 >>>> "OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5(D))" >>>> where "a_ptr_5(D)" is pointer that is being used to call the >>>> virtual >>>> function. >>>>=20 >>>> - We can access it's region to see what is the type of the object >>>> the >>>> pointer is actually pointing to. >>>>=20 >>>> - This is then used to find a call with DECL_CONTEXT of the object >>>> from the all the possible targets of that polymorphic call. >>>=20 >>> [...] >>>=20 >>>>=20 >>>> Patch file ( prototype ) :=20 >>>>=20 >>>=20 >>>> + /* Call is possibly a polymorphic call. >>>> + =20 >>>> + In such case, use devirtisation tools to find=20 >>>> + possible callees of this function call. */ >>>> + =20 >>>> + function *fun =3D get_current_function (); >>>> + gcall *stmt =3D const_cast (call); >>>> + cgraph_edge *e =3D cgraph_node::get (fun->decl)->get_edge = (stmt); >>>> + if (e->indirect_info->polymorphic) >>>> + { >>>> + void *cache_token; >>>> + bool final; >>>> + vec targets >>>> + =3D possible_polymorphic_call_targets (e, &final, >>>> &cache_token, true); >>>> + if (!targets.is_empty ()) >>>> + { >>>> + tree most_propbable_taget =3D NULL_TREE; >>>> + if(targets.length () =3D=3D 1) >>>> + return targets[0]->decl; >>>> + =20 >>>> + /* =46rom the current state, check which subclass the >>>> pointer that=20 >>>> + is being used to this polymorphic call points to, and >>>> use to >>>> + filter out correct function call. */ >>>> + tree t_val =3D gimple_call_arg (call, 0); >>>=20 >>> Maybe rename to "this_expr"? >>>=20 >>>=20 >>>> + const svalue *sval =3D get_rvalue (t_val, ctxt); >>>=20 >>> and "this_sval"? >>=20 >> ok >>=20 >>>=20 >>> ...assuming that that's what the value is. >>>=20 >>> Probably should reject the case where there are zero arguments. >>=20 >> Ideally it should always have one argument representing the pointer >> used to call the function.=20 >>=20 >> for example, if the function is called like this : - >>=20 >> a_ptr->foo(arg); // where foo() is a virtual function and a_ptr is a >> pointer to an object of a subclass. >>=20 >> I saw that it=E2=80=99s GIMPLE representation is as follows : - >>=20 >> OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5, arg); >>=20 >>>=20 >>>=20 >>>> + >>>> + const region *reg >>>> + =3D [&]()->const region * >>>> + { >>>> + switch (sval->get_kind ()) >>>> + { >>>> + case SK_INITIAL: >>>> + { >>>> + const initial_svalue *initial_sval >>>> + =3D sval->dyn_cast_initial_svalue (); >>>> + return initial_sval->get_region (); >>>> + } >>>> + break; >>>> + case SK_REGION: >>>> + { >>>> + const region_svalue *region_sval=20 >>>> + =3D sval->dyn_cast_region_svalue (); >>>> + return region_sval->get_pointee (); >>>> + } >>>> + break; >>>> + >>>> + default: >>>> + return NULL; >>>> + } >>>> + } (); >>>=20 >>> I think the above should probably be a subroutine. >>>=20 >>> That said, it's not clear to me what it's doing, or that this is >>> correct. >>=20 >>=20 >> Sorry, I think I should have explained it earlier. >>=20 >> Let's take an example code snippet :-=20 >>=20 >> Derived d; >> Base *base_ptr; >> base_ptr =3D &d; >> base_ptr->foo(); // where foo() is a virtual function >>=20 >> This genertes the following GIMPLE dump :-=20 >>=20 >> Derived::Derived (&d); >> base_ptr_6 =3D &d.D.3779; >> _1 =3D base_ptr_6->_vptr.Base; >> _2 =3D _1 + 8; >> _3 =3D *_2; >> OBJ_TYPE_REF(_3;(struct Base)base_ptr_6->1) (base_ptr_6); >=20 > I did a bit of playing with this example, and tried adding: >=20 > 1876 case OBJ_TYPE_REF: > 1877 gcc_unreachable (); > 1878 break; >=20 > to region_model::get_rvalue_1, and running cc1plus under the debugger. >=20 > The debugger hits the "gcc_unreachable ();", at this stmt: >=20 > OBJ_TYPE_REF(_2;(struct Base)base_ptr_5->0) (base_ptr_5); >=20 > Looking at the region_model with region_model::debug() shows: >=20 > (gdb) call debug() > stack depth: 1 > frame (index 0): frame: =E2=80=98test=E2=80=99@1 > clusters within frame: =E2=80=98test=E2=80=99@1 > cluster for: Derived d > key: {bytes 0-7} > value: =E2=80=98int (*) () *=E2=80=99 {(&constexpr int (* = Derived::_ZTV7Derived [3])(...)+(sizetype)16)} > cluster for: base_ptr_5: &Derived d. > cluster for: _2: &=E2=80=98foo=E2=80=99 > m_called_unknown_fn: FALSE > constraint_manager: > equiv classes: > ec0: {&Derived d.} > ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)} > ec2: {(void *)0B =3D=3D [m_constant]=E2=80=980B=E2=80=99} > ec3: {(&constexpr int (* Derived::_ZTV7Derived = [3])(...)+(sizetype)16)} > constraints: > 0: ec0: {&Derived d.} !=3D ec2: {(void *)0B =3D=3D = [m_constant]=E2=80=980B=E2=80=99} > 1: ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)} !=3D = ec2: {(void *)0B =3D=3D [m_constant]=E2=80=980B=E2=80=99} > 2: ec3: {(&constexpr int (* Derived::_ZTV7Derived = [3])(...)+(sizetype)16)} !=3D ec2: {(void *)0B =3D=3D = [m_constant]=E2=80=980B=E2=80=99} >=20 > i.e. it already "knows" that _2 is &'foo' for Derived::foo. >=20 > So I think looking at OBJ_TYPE_REF_EXPR in the above case may give the > function pointer directly from the vtable for such cases, so something > like: >=20 > case OBJ_TYPE_REF: > { > tree expr =3D OBJ_TYPE_REF_EXPR (pv.m_tree); > return get_rvalue (expr, ctxt);=20 > } > break; >=20 > might get the function pointer. I tried it, and yes, it works like a charm. Thanks : ) >=20 > (caveat: untested code) >=20 >>=20 >> Here instead of trying to extract virtual pointer from the call and = see >> which subclass it belongs, I found it simpler to extract the actual >> pointer which is used to call the function itself (which from >> observation, is always the first parameter of the call) and used the >> region model at that point to figure out what is the type of the = object >> it actually points to ultimately get the actual subclass who's = function >> is being called here. :) >>=20 >> Now let me try to explain how I actually executed it ( A lot of >> assumptions here are based on observation, so please correct me >> wherever you think I made a false interpretation or forgot about a >> certain special case ) : >>=20 >> - once it is confirmed that the call that we are dealing with is a >> polymorphic call ( via the cgraph edge representing the call ), I = used >> the "possible_polymorphic_call_targets ()" from ipa-utils.h ( defined >> in ipa-devirt.c ), to get the possible callee of that call.=20 >>=20 >> function *fun =3D get_current_function (); >> gcall *stmt =3D const_cast (call); >> cgraph_edge *e =3D cgraph_node::get (fun->decl)->get_edge (stmt); >> if (e->indirect_info->polymorphic) >> { >> void *cache_token; >> bool final; >> vec targets >> =3D possible_polymorphic_call_targets (e, &final, &cache_token, >> true); >>=20 >> - Now if the list contains more than one targets, I will make use of >> the current enode's region model to get more info about the pointer >> which was used to call the function . >>=20 >> /* here I extract the pointer (which was used to call the >> function), which from observation, is always the zeroth argument of = the >> call. */ >> tree t_val =3D gimple_call_arg (call, 0); >> const svalue *sval =3D get_rvalue (t_val, ctxt); >>=20 >> - In all the examples I used, the pointer is represented as >> region_svalue or as initial_svalue (I think, initial_svalue is the = case >> where the pointer is taken as a parameter of the current function and >> analyzer is analysing top-level call to this function ) >>=20 >> Here are some examples of the following, Where I used >> __analyzer_describe () to show the same=20 >> . (https://godbolt.org/z/Mqs8oM6ff) >> . (https://godbolt.org/z/z4sfTM3f5)) >>=20 >> /* here I extract the region that the pointer is pointing to, >> and as both of them returns a (const region *), I used a lambda to = get >> it ( If you want, I can turn this into a separate function to make it >> more readable ) */ >>=20 >> const region *reg >> =3D [&]()->const region * >> { >> switch (sval->get_kind ()) >> { >> case SK_INITIAL: >> { >> const initial_svalue *initial_sval >> =3D sval->dyn_cast_initial_svalue (); >> return initial_sval->get_region (); >> } >> break; >> case SK_REGION: >> { >> const region_svalue *region_sval=20 >> =3D sval->dyn_cast_region_svalue (); >> return region_sval->get_pointee (); >> } >> break; >>=20 >> default: >> return NULL; >> } >> } (); >>=20 >> gcc_assert (reg); >>=20 >> /* Now that I have the region, I tried to get the type of the >> object it is holding and put it in = =E2=80=98known_possible_subclass_type=E2=80=99. */ >>=20 >> tree known_possible_subclass_type; >> known_possible_subclass_type =3D reg->get_type (); >> if (reg->get_kind () =3D=3D RK_FIELD) >> { >> const field_region* field_reg =3D = reg->dyn_cast_field_region >> (); >> known_possible_subclass_type=20 >> =3D DECL_CONTEXT (field_reg->get_field ()); >> } >>=20 >> /* After that I iterated over the entire array of possible calls to >> find the function which whose scope ( DECL_CONTEXT (fn_decl) ) is = same >> as that of the type of the object that the pointer is actually = pointing >> to. */ >>=20 >> for (cgraph_node *x : targets) >> { >> if (DECL_CONTEXT (x->decl) =3D=3D = known_possible_subclass_type) >> most_propbable_taget =3D x->decl; >> } >> return most_propbable_taget; >> } >> } >>=20 >> I tested it on all of the test programs I created and till now in all >> of the cases, the analyzer is correctly determining the call. I am >> currently in the process of creating more tests ( including multiple >> types of inheritances ) to see how successful is this implementation = . >=20 > I'm still skeptical of the above code; my feeling is that with more > tests you'll find cases where it doesn't work. Maybe dynamically > allocated instances? That=E2=80=99s what I was thinking, and that=E2=80=99s why I wanted it = to test on more programs, but looks like I don=E2=80=99t have need this = anymore. >=20 > Hope this is constructive >=20 > Dave >=20 Thanks=20 - Ankur=