public inbox for gcc@gcc.gnu.org
* daily report on extending static analyzer project [GSoC]
@ 2021-06-24 14:29 Ankur Saini
  2021-06-24 20:53 ` David Malcolm
  0 siblings, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-06-24 14:29 UTC (permalink / raw)
  To: gcc

CURRENT STATUS :

The analyzer now splits nodes even at call sites that don't have a cgraph_edge. But since the call and return nodes are not yet connected, the part of the function after such a call becomes unreachable, making it impossible to analyse properly.

AIM for today : 

- try to create an intra-procedural link between the calling and returning snodes
- find the place where the exploded nodes and edges are being formed 
- figure out the program point where exploded graph would know about the function calls

—

PROGRESS :

- I initially tried to connect the calling and returning snodes with an intraprocedural sedge, but it looks like only nodes which have a cgraph_edge or a CFG edge are connected in the supergraph. I tried a few ways to connect them, but in the end decided I would be better off leaving them as they are and connecting them during the creation of the exploded graph itself.

- As the exploded graph is created while building and processing the worklist, "build_initial_worklist ()" and "process_worklist ()" should be the interesting areas to analyse, especially the processing part.

- "build_initial_worklist ()" just creates enodes for functions that can be called explicitly ( possible entry points ), so I guess the better place to investigate is the "process_worklist ()" function.

—

STATUS AT THE END OF THE DAY :- 

- try to create an intra-procedural link between the calling and returning snodes ( Abandoned )
- find the place where the exploded nodes and edges are being formed ( Done )
- figure out the program point where exploded graph knows about the function call ( Pending )


Thank you
- Ankur

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-06-24 14:29 daily report on extending static analyzer project [GSoC] Ankur Saini
@ 2021-06-24 20:53 ` David Malcolm
  2021-06-25 15:03   ` Ankur Saini
  0 siblings, 1 reply; 45+ messages in thread
From: David Malcolm @ 2021-06-24 20:53 UTC (permalink / raw)
  To: Ankur Saini, gcc

On Thu, 2021-06-24 at 19:59 +0530, Ankur Saini wrote:
> CURRENT STATUS :
> 
> analyzer is now splitting nodes even at call sites which doesn’t have
> a cgraph_edge. But as now the call and return nodes are not
> connected, the part of the function after such calls becomes
> unreachable making them impossible to properly analyse.
> 
> AIM for today : 
> 
> - try to create an intra-procedural link between the calls the
> calling and returning snodes 
> - find the place where the exploded nodes and edges are being formed 
> - figure out the program point where exploded graph would know about
> the function calls
> 
> —
> 
> PROGRESS :
> 
> - I initially tried to connect the calling and returning snodes with
> an intraprocedural sedge but looks like for that only nodes which
> have a cgraph_edge or a CFG edge are connected in the supergraph. I
> tried a few ways to connect them but at the end thought I would be
> better off leaving them like this and connecting them during the
> creation of exploded graph itself.
> 
> - As the exploded graph is created during building and processing of
> the worklist, "build_initial_worklist ()” and “process_worklist()”
> should be the interesting areas to analyse, especially the processing
> part.
> 
> - “build_initial_worklist()” is just creating enodes for functions
> that can be called explicitly ( possible entry points ) so I guess
> the better place to investigate is “process_worklist ()” function.

Yes.

Have a look at exploded_graph::process_node (which is called by
process_worklist).
The eedges for calls with supergraph edges are created there, in
the "case PK_AFTER_SUPERNODE:", which looks at the outgoing superedges
from that supernode and calls node->on_edge on them, creating an
exploded node/exploded edge for each outgoing superedge.

So you'll need to make some changes there, I think.
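
As a rough illustration of that control flow, here is a minimal, self-contained sketch of a worklist that walks outgoing superedges and only creates an exploded edge when an on_edge-style check accepts the edge. Every `toy_*` name is a hypothetical stand-in rather than one of the analyzer's real classes, and the rule that intraprocedural-call edges are rejected is an assumption chosen to mirror the situation described in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-ins for the analyzer's types; not GCC's real classes.
enum class toy_edge_kind { cfg, intraprocedural_call };

struct toy_superedge {
  std::size_t src;
  std::size_t dest;
  toy_edge_kind kind;
};

struct toy_exploded_edge {
  std::size_t src;
  std::size_t dest;
};

struct toy_exploded_graph {
  std::vector<toy_superedge> superedges;
  std::vector<toy_exploded_edge> eedges;
  std::vector<bool> visited;
  std::deque<std::size_t> worklist;

  explicit toy_exploded_graph (std::size_t num_supernodes)
    : visited (num_supernodes, false) {}

  // Models the "may we follow this edge?" decision.  Assumption: edges
  // for calls without a known callee are rejected, as in the thread.
  static bool on_edge (const toy_superedge &e)
  {
    return e.kind != toy_edge_kind::intraprocedural_call;
  }

  void enqueue (std::size_t node)
  {
    if (!visited[node])
      {
        visited[node] = true;
        worklist.push_back (node);
      }
  }

  // Look at each outgoing superedge; create an exploded edge (and queue
  // the destination) only for the edges that on_edge accepts.
  void process_node (std::size_t node)
  {
    for (const toy_superedge &e : superedges)
      if (e.src == node && on_edge (e))
        {
          eedges.push_back ({e.src, e.dest});
          enqueue (e.dest);
        }
  }

  void process_worklist (std::size_t entry)
  {
    enqueue (entry);
    while (!worklist.empty ())
      {
        std::size_t node = worklist.front ();
        worklist.pop_front ();
        process_node (node);
      }
  }
};
```

In this toy model, anything that is reachable only through a rejected edge is never visited, which is the "unreachable code after the call" symptom from the status report.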

> 
> —
> 
> STATUS AT THE END OF THE DAY :- 
> 
> - try to create an intra-procedural link between the calls the
> calling and returning snodes ( Abandoned )

You may find the above useful if you're going to do it based on the
code I mentioned above.

> - find the place where the exploded nodes and edges are being formed
> ( Done )
> - figure out the program point where exploded graph knows about the
> function call ( Pending )
> 

Thanks for the update.
Hope the above is helpful.

Dave



* Re: daily report on extending static analyzer project [GSoC]
  2021-06-24 20:53 ` David Malcolm
@ 2021-06-25 15:03   ` Ankur Saini
  2021-06-25 15:34     ` David Malcolm
  0 siblings, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-06-25 15:03 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

AIM for today : 

- try to create an intra-procedural link between the calling and returning snodes
- figure out the program point where exploded graph would know about the function calls
- figure out how the exploded node will know which function to call
- create enodes and eedges for the calls

—

PROGRESS :

- I created an intraprocedural link at the point where the splitting happens, to connect the call and returning snodes, like this :-

(in supergraph.cc at "supergraph::supergraph (logger *logger)" )
```
185		if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
186		{
187		   m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
188		   node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), NULL);
189		   m_cgraph_edge_to_caller_next_node.put (edge, node_for_stmts);
190		}
191	        else
192	        {
193	          gcall *call = dyn_cast<gcall *> (stmt);
194	          if (call)
195	          {
196	            supernode *old_node_for_stmts = node_for_stmts;
197	            node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), NULL);
198
199	            superedge *sedge = new callgraph_superedge (old_node_for_stmts,
200	                node_for_stmts,
201	                SUPEREDGE_INTRAPROCEDURAL_CALL,
202	                NULL);
203	            add_edge (sedge);
204	          }                
205	        }
```

- now that we have an intraprocedural link between such calls, the analyzer will consider them an "impossible edge" ( whenever "node->on_edge ()" returns false ) while processing the worklist, and I think this should be the correct place to speculate about the function call by creating exploded nodes and edges representing calls ( maybe by adding a custom edge info ).

- after several failed attempts to do as mentioned above, it looks like I was looking the wrong way all along. I think I just found out what my mentor meant when telling me to look into "calls node->on_edge". During the edge inspection ( in program_point::on_edge () ), if it's an intraprocedural sedge, maybe I can add an extra intraprocedural sedge to the correct edge right here with the state info of that program point.
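
The node-splitting idea from the first point above can be sketched in isolation as follows. This is only a simplified model under stated assumptions: `toy_stmt`, `toy_split_result` and `split_block_at_calls` are hypothetical names, statements are plain strings, and the only thing modelled is "start a new node after every call, and add an intraprocedural edge when the call has no cgraph_edge".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified statement: not GCC's gimple.
struct toy_stmt {
  std::string text;
  bool is_call;
  bool has_cgraph_edge;
};

struct toy_split_result {
  // Each inner vector is the list of statements held by one node.
  std::vector<std::vector<toy_stmt> > nodes;
  // (node ending in the call, node holding the code after the call)
  std::vector<std::pair<std::size_t, std::size_t> > intraprocedural_call_edges;
};

// Split one basic block into nodes, ending a node at every call.
toy_split_result split_block_at_calls (const std::vector<toy_stmt> &stmts)
{
  toy_split_result result;
  result.nodes.emplace_back ();
  for (const toy_stmt &s : stmts)
    {
      result.nodes.back ().push_back (s);
      if (!s.is_call)
        continue;
      std::size_t before = result.nodes.size () - 1;
      result.nodes.emplace_back ();
      // Calls with a cgraph_edge are linked by other means; only the
      // remaining calls get the extra intraprocedural link here.
      if (!s.has_cgraph_edge)
        result.intraprocedural_call_edges.push_back (
          std::make_pair (before, before + 1));
    }
  return result;
}
```

For a block such as "x = 1; fp (); y = 2;" where the call has no cgraph_edge, this yields two nodes joined by a single intraprocedural edge.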

Q. But even if we find out which function to call, how will the analyzer know which snode that function belongs to ?

Q. on line 461 of program-point.cc 

```
457		else
458		  {
459		    /* Otherwise, we ignore these edges  */
460		    if (logger)
461		      logger->log ("rejecting interprocedural edge");
462		    return false;
463		  }
```
Why are we rejecting an "interprocedural" edge when we are examining an "intraprocedural" edge ? Or is it for the "cg_sedge->m_cedge" edge, which is an interprocedural edge ?

STATUS AT THE END OF THE DAY :- 

- try to create an intra-procedural link between the calling and returning snodes ( Done )
- figure out the program point where exploded graph would know about the function calls ( Done )
- figure out how the exploded node will know which function to call ( Pending )
- create enodes and eedges for the calls ( Pending )


Thank you
- Ankur




* Re: daily report on extending static analyzer project [GSoC]
  2021-06-25 15:03   ` Ankur Saini
@ 2021-06-25 15:34     ` David Malcolm
  2021-06-26 15:20       ` Ankur Saini
  0 siblings, 1 reply; 45+ messages in thread
From: David Malcolm @ 2021-06-25 15:34 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Fri, 2021-06-25 at 20:33 +0530, Ankur Saini wrote:
> AIM for today : 
> 
> - try to create an intra-procedural link between the calls the calling
> and returning snodes
> - figure out the program point where exploded graph would know about
> the function calls
> - figure out how the exploded node will know which function to call
> - create enodes and eedges for the calls
> 
> —
> 
> PROGRESS :
> 
> - I created an intraprocedural link between where the the splitting is happening to connect the call and returning snodes. like this :-
> 
> (in supergraph.cc at "supergraph::supergraph (logger *logger)" )
> ```
> 185             if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
> 186             {
> 187                m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
> 188                node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), NULL);
> 189                m_cgraph_edge_to_caller_next_node.put (edge, node_for_stmts);
> 190             }
> 191             else
> 192             {
> 193               gcall *call = dyn_cast<gcall *> (stmt);
> 194               if (call)
> 195               {
> 196                 supernode *old_node_for_stmts = node_for_stmts;
> 197                 node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), NULL);
                                                          ^^^^^^^^^^^^^^^^^^^^^
Given the dyn_cast of stmt to gcall * at line 193 you can use "call"
here, without the as_a cast, as you've already got "stmt" as a gcall *
at line 193.

You might need to add a hash_map recording the mapping from such stmts
to the edges, like line 189 does.  I'm not sure, but you may need it
later.


> 198
> 199                 superedge *sedge = new callgraph_superedge (old_node_for_stmts,
> 200                     node_for_stmts,
> 201                     SUPEREDGE_INTRAPROCEDURAL_CALL,
> 202                     NULL);
> 203                 add_edge (sedge);
> 204               }                
> 205             }
> ```
> 
> - now that we have a intraprocedural link between such calls, and the
> analyzer will consider them as “impossible edge” ( whenever a
> "node->on_edge()” returns false ) while processing worklist, and I think this
> should be the correct place to speculate about the function call by
> creating exploded nodes and edges representing calls ( maybe by adding
> a custom edge info ).
> 
> - after several of failed attempts to do as mentioned above, looks like
> I was looking the wrong way all along. I think I just found out what my
> mentor meant when telling me to look into "calls node->on_edge”. During
> the edge inspection ( in program_point::on_edge() ) , if it’s an
> Intraprocedural s sedge, maybe I can add an extra intraprocedural sedge
> to the correct edge right here with the info state of that program
> point. 

I don't think we need a superedge for such a call, just an
exploded_edge.  (Though perhaps adding a superedge might make things
easier?  I'm not sure, but I'd first try not bothering to add one)

> 
> Q. But even if we find out which function to call, how will the
> analyzer know which snode does that function belong ?

Use this method of supergraph:
  supernode *get_node_for_function_entry (function *fun) const;
to get the supernode for the entrypoint of a given function.

You can get the function * from a fndecl via DECL_STRUCT_FUNCTION.

> Q. on line 461 of program-point.cc 
> 
> ```
> 457             else
> 458               {
> 459                 /* Otherwise, we ignore these edges  */
> 460                 if (logger)
> 461                   logger->log ("rejecting interprocedural edge");
> 462                 return false;
> 463               }
> ```
> why are we rejecting “interprocedural" edge when we are examining an
> “intraprocedural” edge ? or is it for the "cg_sedge->m_cedge” edge,
> which is an interprocedural edge ?

Currently, those interprocedural edges don't do much.  Just above the
"else" clause you quote, there is some support for call summaries.

The idea is that we ought to be able to compute summaries of what a
function call does, and avoid exponential explosions during the
analysis by reusing summaries at a callsite.  But that code doesn't
work well at the moment; see:
  https://gcc.gnu.org/bugzilla/showdependencytree.cgi?id=99390

If you ignore call summaries for now, I think you need to change this
logic so it detects if we have a function pointer that we "know" the
value of from the region_model, and have it generate an exploded_node
and exploded_edge for the call.  Have a look at how SUPEREDGE_CALL is
handled by program_state and program_point; you should implement
something similar, I think.  Given that you need the superedge, the
point *and* the state all together to detect this case, I think the
logic you need to add probably needs to be in exploded_node::on_edge,
as a special case before the call there to next_point->on_edge.
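
A very rough sketch of the shape of that special case, as a standalone toy rather than analyzer code: given a call through a function pointer, consult the state for a known target and, if there is one, look up the entry node of the callee. `toy_state`, `toy_entry_map`, `toy_call_site` and `resolve_indirect_call` are all hypothetical names; function identities are modelled as strings and supernodes as integer ids.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins; none of these are GCC's real classes.
struct toy_state {
  // What the analysis "knows" each function-pointer variable points at.
  std::map<std::string, std::string> known_fn_ptr_targets;
};

struct toy_entry_map {
  // Function name -> id of that function's entry node.
  std::map<std::string, int> entry_node_for_function;
};

struct toy_call_site {
  std::string fn_ptr_var;
};

// Returns true and sets *out_entry when the callee is known and has an
// entry node; returns false (leaving *out_entry alone) otherwise, in
// which case the caller would fall back to the generic edge handling.
bool resolve_indirect_call (const toy_call_site &call,
                            const toy_state &state,
                            const toy_entry_map &entries,
                            int *out_entry)
{
  std::map<std::string, std::string>::const_iterator target
    = state.known_fn_ptr_targets.find (call.fn_ptr_var);
  if (target == state.known_fn_ptr_targets.end ())
    return false;  // value of the pointer is not known on this path

  std::map<std::string, int>::const_iterator entry
    = entries.entry_node_for_function.find (target->second);
  if (entry == entries.entry_node_for_function.end ())
    return false;  // callee has no body we can enter

  *out_entry = entry->second;
  return true;
}
```

The point of the sketch is only the ordering: the check needs the call site, the state and the graph together, and runs before any generic per-edge logic.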

Hope this is helpful
Dave


> 
> STATUS AT THE END OF THE DAY :- 
> 
> - try to create an intra-procedural link between the calls the calling
> and returning snodes ( Done )
> - figure out the program point where exploded graph would know about
> the function calls ( Done )
> - figure out how the exploded node will know which function to call (
> Pending )
> - create enodes and eedges for the calls ( Pending )
> 
> 
> Thank you
> - Ankur




* Re: daily report on extending static analyzer project [GSoC]
  2021-06-25 15:34     ` David Malcolm
@ 2021-06-26 15:20       ` Ankur Saini
  2021-06-27 18:48         ` David Malcolm
  0 siblings, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-06-26 15:20 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc


> On 25-Jun-2021, at 9:04 PM, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> On Fri, 2021-06-25 at 20:33 +0530, Ankur Saini wrote:
>> AIM for today : 
>> 
>> - try to create an intra-procedural link between the calls the calling
>> and returning snodes
>> - figure out the program point where exploded graph would know about
>> the function calls
>> - figure out how the exploded node will know which function to call
>> - create enodes and eedges for the calls
>> 
>> —
>> 
>> PROGRESS :
>> 
>> - I created an intraprocedural link between where the the splitting is happening to connect the call and returning snodes. like this :-
>> 
>> (in supergraph.cc at "supergraph::supergraph (logger *logger)" )
>> ```
>> 185             if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
>> 186             {
>> 187                m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
>> 188                node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), NULL);
>> 189                m_cgraph_edge_to_caller_next_node.put (edge, node_for_stmts);
>> 190             }
>> 191             else
>> 192             {
>> 193               gcall *call = dyn_cast<gcall *> (stmt);
>> 194               if (call)
>> 195               {
>> 196                 supernode *old_node_for_stmts = node_for_stmts;
>> 197                 node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), NULL);
>                                                          ^^^^^^^^^^^^^^^^^^^^^
> Given the dyn_cast of stmt to gcall * at line 193 you can use "call"
> here, without the as_a cast, as you've already got "stmt" as a gcall *
> as tline 193.

ok

> 
> You might need to add a hash_map recording the mapping from such stmts
> to the edges, like line 189 does.  I'm not sure, but you may need it
> later.

But the node is only created when there is no cgraph_edge corresponding to the call, so which edge would I map "node_for_stmts" to ?

> 
> 
>> 198
>> 199                 superedge *sedge = new callgraph_superedge (old_node_for_stmts,
>> 200                     node_for_stmts,
>> 201                     SUPEREDGE_INTRAPROCEDURAL_CALL,
>> 202                     NULL);
>> 203                 add_edge (sedge);
>> 204               }                
>> 205             }
>> ```
>> 
>> - now that we have a intraprocedural link between such calls, and the
>> analyzer will consider them as “impossible edge” ( whenever a
>> "node->on_edge()” returns false ) while processing worklist, and I think this
>> should be the correct place to speculate about the function call by
>> creating exploded nodes and edges representing calls ( maybe by adding
>> a custom edge info ).
>> 
>> - after several of failed attempts to do as mentioned above, looks like
>> I was looking the wrong way all along. I think I just found out what my
>> mentor meant when telling me to look into "calls node->on_edge”. During
>> the edge inspection ( in program_point::on_edge() ) , if it’s an
>> Intraprocedural s sedge, maybe I can add an extra intraprocedural sedge
>> to the correct edge right here with the info state of that program
>> point. 
> 
> I don't think we need a superedge for such a call, just an
> exploded_edge.  (Though perhaps adding a superedge might make things
> easier?  I'm not sure, but I'd first try not bothering to add one)

ok, will scratch this idea for now.

> 
>> 
>> Q. But even if we find out which function to call, how will the
>> analyzer know which snode does that function belong ?
> 
> Use this method of supergraph:
>  supernode *get_node_for_function_entry (function *fun) const;
> to get the supernode for the entrypoint of a given function.
> 
> You can get the function * from a fndecl via DECL_STRUCT_FUNCTION.

so once we get fndecl, it should be comparatively smooth sailing from there. 

My attempt to get the value of the function pointer from the state :-

- to access the region model of the state, I tried to access "m_region_model" of that state.
- now I want to access the cluster for a function pointer.
- but when looking at the functions accessible on the region_model class, I couldn't find a fitting one. ( the closest I could find was "region_model::get_reachable_svalues ()", which gets the set of all the svalues reachable from that model )

> 
>> Q. on line 461 of program-point.cc 
>> 
>> ```
>> 457             else
>> 458               {
>> 459                 /* Otherwise, we ignore these edges  */
>> 460                 if (logger)
>> 461                   logger->log ("rejecting interprocedural edge");
>> 462                 return false;
>> 463               }
>> ```
>> why are we rejecting “interprocedural" edge when we are examining an
>> “intraprocedural” edge ? or is it for the "cg_sedge->m_cedge” edge,
>> which is an interprocedural edge ?
> 
> Currently, those interprocedural edges don't do much.  Above the "else"
> clause of the lines above the ones you quote is some support for call
> summaries.
> 
> The idea is that we ought to be able to compute summaries of what a
> function call does, and avoid exponential explosions during the
> analysis by reusing summaries at a callsite.  But that code doesn't
> work well at the moment; see:
>  https://gcc.gnu.org/bugzilla/showdependencytree.cgi?id=99390
> 
> If you ignore call summaries for now, I think you need to change this
> logic so it detects if we have a function pointer that we "know" the
> value of from the region_model, and have it generate an exploded_node
> and exploded_edge for the call.  Have a look at how SUPEREDGE_CALL is
> handled by program_state and program_point; you should implement
> something similar, I think.  Given that you need both the super_edge,
> point *and* state all together to detect this case, I think the logic
> you need to add probably needs to be in exploded_node::on_edge as a
> specialcase before the call there to next_point->on_edge.
> 
> Hope this is helpful
> Dave

Thank you
- Ankur



* Re: daily report on extending static analyzer project [GSoC]
  2021-06-26 15:20       ` Ankur Saini
@ 2021-06-27 18:48         ` David Malcolm
  2021-06-28 14:53           ` Ankur Saini
  0 siblings, 1 reply; 45+ messages in thread
From: David Malcolm @ 2021-06-27 18:48 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Sat, 2021-06-26 at 20:50 +0530, Ankur Saini wrote:
> 
> > On 25-Jun-2021, at 9:04 PM, David Malcolm <dmalcolm@redhat.com>
> > wrote:
> > 
> > On Fri, 2021-06-25 at 20:33 +0530, Ankur Saini wrote:
> > > AIM for today : 
> > > 
> > > - try to create an intra-procedural link between the calls the
> > > calling
> > > and returning snodes
> > > - figure out the program point where exploded graph would know
> > > about
> > > the function calls
> > > - figure out how the exploded node will know which function to
> > > call
> > > - create enodes and eedges for the calls
> > > 
> > > —
> > > 
> > > PROGRESS :
> > > 
> > > - I created an intraprocedural link between where the the
> > > splitting is happening to connect the call and returning snodes.
> > > like this :-
> > > 
> > > (in supergraph.cc at "supergraph::supergraph (logger *logger)" )
> > > ```
> > > 185             if (cgraph_edge *edge = supergraph_call_edge
> > > (fun, stmt))
> > > 186             {
> > > 187                m_cgraph_edge_to_caller_prev_node.put(edge,
> > > node_for_stmts);
> > > 188                node_for_stmts = add_node (fun, bb, as_a
> > > <gcall *> (stmt), NULL);
> > > 189                m_cgraph_edge_to_caller_next_node.put (edge,
> > > node_for_stmts);
> > > 190             }
> > > 191             else
> > > 192             {
> > > 193               gcall *call = dyn_cast<gcall *> (stmt);
> > > 194               if (call)
> > > 195               {
> > > 196                 supernode *old_node_for_stmts =
> > > node_for_stmts;
> > > 197                 node_for_stmts = add_node (fun, bb, as_a
> > > <gcall *> (stmt), NULL);
> >                                                         
> > ^^^^^^^^^^^^^^^^^^^^^
> > Given the dyn_cast of stmt to gcall * at line 193 you can use
> > "call"
> > here, without the as_a cast, as you've already got "stmt" as a
> > gcall *
> > as tline 193.
> 
> ok
> 
> > 
> > You might need to add a hash_map recording the mapping from such
> > stmts
> > to the edges, like line 189 does.  I'm not sure, but you may need
> > it
> > later.
> 
> but the node is being created if there is no cgraph_edge
> corresponding to the call, so to what edge will I map
> “node_for_stmts" to ?

Sorry; I think I got confused.  Re-reading this part of my email, it
doesn't make sense to me.  Sorry.

[...snip...]

> 
> 
> > 
> > > 
> > > Q. But even if we find out which function to call, how will the
> > > analyzer know which snode does that function belong ?
> > 
> > Use this method of supergraph:
> >  supernode *get_node_for_function_entry (function *fun) const;
> > to get the supernode for the entrypoint of a given function.
> > 
> > You can get the function * from a fndecl via DECL_STRUCT_FUNCTION.
> 
> so once we get fndecl, it should be comparatively smooth sailing from
> there. 
> 
> My attempt to get the value of function pointer from the state : -
> 
> - to access the region model of the state, I tried to access
> “m_region_model” of that state.
> - now I want to access cluster for a function pointer.
> - but when looking at the accessible functions to region model class,
> I couldn’t seem to find the fitting one. ( the closest I could find
> was “region_model::get_reachable_svalues()” to get a set of all the
> svalues reachable from that model )

In general you can use:
  region_model::get_rvalue
to go from a tree to a symbolic value for what the analyzer "thinks"
the value of that tree is at that point along the path.

If it "knows" that it's a specific function pointer, then IIRC this
will return a region_svalue where region_svalue::get_pointee () will
(hopefully) point at the function_region representing the memory
holding the code of the function.  function_region::get_fndecl should
then give you the tree for the specific FUNCTION_DECL, from which you
can find the supergraph node etc.

It looks like
  region_model::get_fndecl_for_call
might already do most of what you need, but it looks like it bails out
for the "NULL cgraph_node" case.  Maybe that needs fixing, so that it
returns the fndecl for that case?  That already gets used in some
places, so maybe try putting a breakpoint on that and see if fixing
that gets you further?
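
To illustrate the chain described above (value of the expression, then the region it points at, then the function declaration), here is a small self-contained model. It is not the analyzer's API: every `toy_*` type and `toy_get_fndecl_for_call` are hypothetical, expressions and declarations are plain strings, and `dynamic_cast` stands in for whatever downcast helpers the real classes provide.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical miniature of a region hierarchy.
struct toy_region {
  virtual ~toy_region () {}
};
struct toy_function_region : toy_region {
  std::string fndecl_name;
  explicit toy_function_region (const std::string &name)
    : fndecl_name (name) {}
};
struct toy_decl_region : toy_region {};  // an ordinary variable

// Hypothetical miniature of a symbolic-value hierarchy.
struct toy_svalue {
  virtual ~toy_svalue () {}
};
struct toy_unknown_svalue : toy_svalue {};
struct toy_region_svalue : toy_svalue {  // "pointer to a region"
  const toy_region *pointee;
  explicit toy_region_svalue (const toy_region *r) : pointee (r) {}
};

struct toy_region_model {
  std::map<std::string, const toy_svalue *> values;

  // Value the model currently associates with EXPR, or null if none.
  const toy_svalue *get_rvalue (const std::string &expr) const
  {
    std::map<std::string, const toy_svalue *>::const_iterator it
      = values.find (expr);
    return it == values.end () ? nullptr : it->second;
  }
};

// Returns the name of the function FN_PTR_EXPR is known to point at,
// or an empty string when the model cannot tell.
std::string toy_get_fndecl_for_call (const toy_region_model &model,
                                     const std::string &fn_ptr_expr)
{
  const toy_svalue *sval = model.get_rvalue (fn_ptr_expr);
  const toy_region_svalue *ptr
    = dynamic_cast<const toy_region_svalue *> (sval);
  if (!ptr)
    return "";  // not known to be a pointer to a specific region
  const toy_function_region *fn
    = dynamic_cast<const toy_function_region *> (ptr->pointee);
  if (!fn)
    return "";  // points at something that is not a function
  return fn->fndecl_name;
}
```

Each step can fail independently, which is why the sketch returns "unknown" at every stage instead of assuming the pointer always resolves.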

Hope this is helpful
Dave



* Re: daily report on extending static analyzer project [GSoC]
  2021-06-27 18:48         ` David Malcolm
@ 2021-06-28 14:53           ` Ankur Saini
  2021-06-28 23:39             ` David Malcolm
  0 siblings, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-06-28 14:53 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc



> On 28-Jun-2021, at 12:18 AM, David Malcolm <dmalcolm@redhat.com> wrote:
>> 
>>> 
>>>> 
>>>> Q. But even if we find out which function to call, how will the
>>>> analyzer know which snode does that function belong ?
>>> 
>>> Use this method of supergraph:
>>>  supernode *get_node_for_function_entry (function *fun) const;
>>> to get the supernode for the entrypoint of a given function.
>>> 
>>> You can get the function * from a fndecl via DECL_STRUCT_FUNCTION.
>> 
>> so once we get fndecl, it should be comparatively smooth sailing from
>> there. 
>> 
>> My attempt to get the value of function pointer from the state : -
>> 
>> - to access the region model of the state, I tried to access
>> “m_region_model” of that state.
>> - now I want to access cluster for a function pointer.
>> - but when looking at the accessible functions to region model class,
>> I couldn’t seem to find the fitting one. ( the closest I could find
>> was “region_model::get_reachable_svalues()” to get a set of all the
>> svalues reachable from that model )
> 
> In general you can use:
>  region_model::get_rvalue
> to go from a tree to a symbolic value for what the analyzer "thinks"
> the value of that tree is at that point along the path.
> 
> If it "knows" that it's a specific function pointer, then IIRC this
> will return a region_svalue where region_svalue::get_pointee () will
> (hopefully) point at the function_region representing the memory
> holding the code of the function.  function_region::get_fndecl should
> then give you the tree for the specific FUNCTION_DECL, from which you
> can find the supergraph node etc.
> 
> It looks like
>  region_model::get_fndecl_for_call
> might already do most of what you need, but it looks like it bails out
> for the "NULL cgraph_node" case.  Maybe that needs fixing, so that it
> returns the fndecl for that case?  That already gets used in some
> places, so maybe try putting a breakpoint on that and see if fixing
> that gets you further?

Shouldn’t the fn_decl still have a cgraph_node if the function is declared in the program itself? It should just not have an edge representing the call.
I ask because I was able to find the supergraph node just with the help of the function itself.

This is how "exploded_node::on_edge()" looks right now.

File: {$SCR_DIR}/gcc/analyzer/engine.cc
    bool
    exploded_node::on_edge (exploded_graph &eg,
                            const superedge *succ,
                            program_point *next_point,
                            program_state *next_state,
                            uncertainty_t *uncertainty)
    {
      LOG_FUNC (eg.get_logger ());

      if (succ->m_kind == SUPEREDGE_INTRAPROCEDURAL_CALL)
        {
          const program_point *this_point = &this->get_point ();
          const program_state *this_state = &this->get_state ();
          const gcall *call = this_point->get_supernode ()->get_final_call ();

          impl_region_model_context ctxt (eg,
                                          this,
                                          this_state,
                                          next_state,
                                          uncertainty,
                                          this_point->get_stmt ());

          region_model *model = this_state->m_region_model;
          tree fn_decl = model->get_fndecl_for_call (call, &ctxt);
          /* get_fndecl_for_call returns NULL_TREE when it cannot resolve
             the callee, so check that before DECL_STRUCT_FUNCTION.  */
          if (fn_decl && DECL_STRUCT_FUNCTION (fn_decl))
            {
              const supergraph *sg = &eg.get_supergraph ();
              supernode *sn
                = sg->get_node_for_function_entry (DECL_STRUCT_FUNCTION (fn_decl));
              // create enode and eedge ?
            }
        }

      if (!next_point->on_edge (eg, succ))
        return false;

      if (!next_state->on_edge (eg, this, succ, uncertainty))
        return false;

      return true;
    }

For now, it also detects calls that already have a call_sedge connecting them, so I think I also have to filter those out.
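
A minimal sketch of that filtering step, using toy stand-in types rather than the analyzer's real classes. Everything prefixed with toy_ is hypothetical, and the has_cgraph_edge flag is only an assumed way of saying "this call site already has a call superedge":

```cpp
#include <cassert>
#include <map>

// Toy stand-ins; these are not GCC's real types.
struct toy_function { const char *name; };
struct toy_supernode { const toy_function *fun; };

// One call site as seen while walking an intraprocedural call edge.
struct toy_call_site
{
  const toy_function *resolved_callee; // null if the callee is unknown
  bool has_cgraph_edge;                // true if a call superedge already exists
};

class toy_supergraph
{
public:
  void add_entry (const toy_function *fun, const toy_supernode *node)
  {
    assert (fun && node);
    m_entries[fun] = node;
  }

  // Toy analogue of looking up the entry supernode of a function.
  const toy_supernode *get_node_for_function_entry (const toy_function *fun) const
  {
    auto it = m_entries.find (fun);
    return it == m_entries.end () ? nullptr : it->second;
  }

private:
  std::map<const toy_function *, const toy_supernode *> m_entries;
};

// Return the entry node to jump to, or null when no extra edge is needed.
const toy_supernode *
entry_for_unlinked_call (const toy_supergraph &sg, const toy_call_site &site)
{
  if (site.has_cgraph_edge)
    return nullptr; // already handled by the existing call superedge
  if (!site.resolved_callee)
    return nullptr; // nothing known about the callee
  return sg.get_node_for_function_entry (site.resolved_callee);
}
```

Only call sites that are both unlinked and resolvable yield an entry node; every other case falls through to the existing handling.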

> 
> Hope this is helpful
> Dave

Thanks 
- Ankur



* Re: daily report on extending static analyzer project [GSoC]
  2021-06-28 14:53           ` Ankur Saini
@ 2021-06-28 23:39             ` David Malcolm
  2021-06-29 16:34               ` Ankur Saini
  0 siblings, 1 reply; 45+ messages in thread
From: David Malcolm @ 2021-06-28 23:39 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Mon, 2021-06-28 at 20:23 +0530, Ankur Saini wrote:
> 
> 
> > On 28-Jun-2021, at 12:18 AM, David Malcolm <dmalcolm@redhat.com>
> > wrote:
> > > 
> > > > 
> > > > > 
> > > > > Q. But even if we find out which function to call, how will
> > > > > the
> > > > > analyzer know which snode does that function belong ?
> > > > 
> > > > Use this method of supergraph:
> > > >  supernode *get_node_for_function_entry (function *fun) const;
> > > > to get the supernode for the entrypoint of a given function.
> > > > 
> > > > You can get the function * from a fndecl via
> > > > DECL_STRUCT_FUNCTION.
> > > 
> > > so once we get fndecl, it should be comparatively smooth sailing
> > > from
> > > there. 
> > > 
> > > My attempt to get the value of function pointer from the state :
> > > -
> > > 
> > > - to access the region model of the state, I tried to access
> > > “m_region_model” of that state.
> > > - now I want to access cluster for a function pointer.
> > > - but when looking at the accessible functions to region model
> > > class,
> > > I couldn’t seem to find the fitting one. ( the closest I could
> > > find
> > > was “region_model::get_reachable_svalues()” to get a set of all
> > > the
> > > svalues reachable from that model )
> > 
> > In general you can use:
> >  region_model::get_rvalue
> > to go from a tree to a symbolic value for what the analyzer
> > "thinks"
> > the value of that tree is at that point along the path.
> > 
> > If it "knows" that it's a specific function pointer, then IIRC this
> > will return a region_svalue where region_svalue::get_pointee ()
> > will
> > (hopefully) point at the function_region representing the memory
> > holding the code of the function.  function_region::get_fndecl
> > should
> > then give you the tree for the specific FUNCTION_DECL, from which
> > you
> > can find the supergraph node etc.
> > 
> > It looks like
> >  region_model::get_fndecl_for_call
> > might already do most of what you need, but it looks like it bails
> > out
> > for the "NULL cgraph_node" case.  Maybe that needs fixing, so that
> > it
> > returns the fndecl for that case?  That already gets used in some
> > places, so maybe try putting a breakpoint on that and see if fixing
> > that gets you further?
> 
> shouldn’t the fn_decl should still have a cgraph_node if the function
> is declared in the program itself ? it should just not have an edge
> representing the call.

That would make sense.  I'd suggest verifying that in the debugger.

> Because I was able to find the super-graph node just with the help of
> the function itself.

Great.


> 
> this is how the function looks "exploded_node::on_edge()" right now.
> 
> File: {$SCR_DIR}/gcc/analyzer/engine.cc
> 1305:     bool
> 1306:     exploded_node::on_edge (exploded_graph &eg,
> 1307:                           const superedge *succ,
> 1308:                           program_point *next_point,
> 1309:                           program_state *next_state,
> 1310:                           uncertainty_t *uncertainty)
> 1311:     {
> 1312:       LOG_FUNC (eg.get_logger ());
> 1313: 
> 1314:       if (succ->m_kind == SUPEREDGE_INTRAPROCEDURAL_CALL)
> 1315:       {    
> 1316:         const program_point *this_point = &this->get_point();
> 1317:         const program_state *this_state = &this->get_state ();
> 1318:         const gcall *call = this_point->get_supernode ()->get_final_call ();
> 1319: 
> 1320:         impl_region_model_context ctxt (eg, 
> 1321:           this, 
> 1322:           this_state, 
> 1323:           next_state, 
> 1324:           uncertainty,
> 1325:           this_point->get_stmt());
> 1326: 
> 1327:         region_model *model = this_state->m_region_model;
> 1328:         tree fn_decl = model->get_fndecl_for_call(call,&ctxt);
> 1329:         if(DECL_STRUCT_FUNCTION(fn_decl))
> 1330:         {
> 1331:           const supergraph *sg = &eg.get_supergraph();
> 1332:           supernode * sn =  sg->get_node_for_function_entry (DECL_STRUCT_FUNCTION(fn_decl));
> 1333:           // create enode and eedge ?
> 1334:         }
> 1335:       }
> 1336: 
> 1337:       if (!next_point->on_edge (eg, succ))
> 1338:         return false;
> 1339: 
> 1340:       if (!next_state->on_edge (eg, this, succ, uncertainty))
> 1341:         return false;
> 1342: 
> 1343:       return true;
> 1344:     }

Looks promising.

> 
> for now, it is also detecting calls that already have call_sedge
> connecting them, so I think I also have to filter them out.

Right, I think so too.

Dave



* Re: daily report on extending static analyzer project [GSoC]
  2021-06-28 23:39             ` David Malcolm
@ 2021-06-29 16:34               ` Ankur Saini
  2021-06-29 19:53                 ` David Malcolm
  0 siblings, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-06-29 16:34 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

AIM for today : 

- filter out the nodes which already have a supergraph edge representing the call, to avoid creating another edge for the call
- create enode for destination
- create eedge representing the call itself

—

PROGRESS :

- In order to filter out only the relevant edges, I simply used the fact that the edge that we care about will not have any call_graph edge associated with it ( meaning "sedge->get_any_callgraph_edge()" would return NULL ).

- I was also able to create the enode and connect it with an eedge representing the call, and could see it calling the correct function on some examples. :)

- But the problem now is returning from the function, which turned out to be bigger than I thought it was.

- In order to tackle this problem, I first tried to update the call_string with the call, but the only way I found to push a call to the string was via the "call_string::push_call()" function, which finds the return_superedge from the cgraph_edge representing the return call ( which we don’t have ).

So I decided to make an overload of "call_string::push_call()" which directly takes a return_superedge and pushes it to the underlying vector of edges instead of taking it from the calling edge. It looks something like this :-

File:  {$SCR_DIR}/gcc/analyzer/call-string.cc
void
call_string::push_call (const return_superedge *return_sedge)
{
  gcc_assert (return_sedge);
  m_return_edges.safe_push (return_sedge);
}

I also created a temporary return_superedge ( as we now have the source and destination ), and tried to update the call_string with it, only to find out that the call_string is private to program_point.

So my plan for the next day is to add a custom function to the program_point class to update the call stack and return to the correct spot.

If there is a better way of doing it then do let me know.

STATUS AT THE END OF THE DAY :- 

- filter out the nodes which already have a supergraph edge representing the call ( Done )
- create enode for destination ( Done )
- create eedge representing the call itself ( Done ? )

—

P.S. it has been over a week since I sent a mail to overseers@gcc.gnu.org regarding the ssh key incident and I haven’t got any response from them till now. Does it usually take this long for them to respond? Or does this mean I didn’t provide some information that I should have? Is there something else I can do regarding this problem?

Thank you
- Ankur


* Re: daily report on extending static analyzer project [GSoC]
  2021-06-29 16:34               ` Ankur Saini
@ 2021-06-29 19:53                 ` David Malcolm
       [not found]                   ` <AD7A4C2F-1451-4317-BE53-99DE9E9853AE@gmail.com>
  0 siblings, 1 reply; 45+ messages in thread
From: David Malcolm @ 2021-06-29 19:53 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Tue, 2021-06-29 at 22:04 +0530, Ankur Saini wrote:
> AIM for today : 
> 
> - filter out the the nodes which already have an supergraph edge
> representing the call to avoid creating another edge for call
> - create enode for destination
> - create eedge representing the call itself
> 
> —
> 
> PROGRESS :
> 
> - in order to filter out only the relevant edges, I simply used the
> fact that the edge that we care about will not have any call_graph
> edge associated with it. ( means “sedge->get_any_callgraph_edge()"
> would return NULL )
> 
> - I was also successfully able to create the enode and connect it
> with an eedge representing the call and was able to see it calling
> the correct function on some examples. :)
> 
> - But the problem now is returning from the function, which turned
> out bigger then I though it was. 
> 
> - In order to tackle this problem, I first tried to update the
> call_string with the call, but the only way to push a call to the
> string I found was via “call_string::push_call()” function which
> finds the return_superedge from the cgraph_edge representing the
> return call ( which we don’t have )
> 
> so I decided to make an overload of "call_string::push_call()” which
> directly takes a return_superedge and push it the underlying vector
> of edges instead of taking it from the calling edge. It looks
> something like this :-
> 
> File:  {$SCR_DIR}/gcc/analyzer/call-string.cc
> 158: void
> 159: call_string::push_call(const return_superedge *return_sedge)
> 160: {
> 161:   gcc_assert (return_sedge);
> 162:   m_return_edges.safe_push (return_sedge);
> 163: }

Looks reasonable.

> 
> I also created a temporary return_superedge ( as we now have the
> source and destination ), and try to update the call_string with it
> just to find out that call_string is private to program_point. 

I confess I'm having a little difficulty visualizing what the superedge
looks like with this new edge.


FWIW you can use the accessor:
  program_point::get_call_string ()
to get it in const form:
  const call_string &get_call_string () const { return m_call_string; }

but it sounds like you're trying to change things.



The purpose of class call_string is to track the stack of call sites,
so that when we return from a function, we return to the correct call
site.

I wonder if class call_string could be updated so that rather than
capturing a vec of superedges:
  auto_vec<const return_superedge *> m_return_edges;
it captures a vec of gcall *?

Then you wouldn't need a superedge ahead of time for the return from
the call.

I'm not sure if that would work, but that might be another approach you
could try, and might be simplest.  I'm not sure.

I *think* we only use the return_superedge within
program_point::on_edge, comparing against the successor edge, but that
code could be rewritten to look at which gcall * is associated with the
edge.

(again, I'm not sure, but maybe it's simpler)
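
To make that idea concrete, here is a rough sketch of a call string that records call statements instead of return superedges. It uses toy types (toy_call_stmt stands in for gcall, and on_return_edge is a hypothetical model of the comparison done for a return edge), so it only illustrates the stack discipline, not the analyzer's actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a call statement; not GCC's gcall.
struct toy_call_stmt { int id; };

class toy_call_string
{
public:
  // Record that we entered a function through CALL.
  void push_call (const toy_call_stmt *call)
  {
    assert (call);
    m_calls.push_back (call);
  }

  // A return is only valid if it goes back to the most recent call site.
  bool can_return_to (const toy_call_stmt *call) const
  {
    return !m_calls.empty () && m_calls.back () == call;
  }

  void pop ()
  {
    assert (!m_calls.empty ());
    m_calls.pop_back ();
  }

  std::size_t length () const { return m_calls.size (); }

private:
  std::vector<const toy_call_stmt *> m_calls;
};

// Model of following a return edge: CALL_FOR_EDGE is the call statement
// associated with that edge.  Reject the edge if it does not match the
// innermost pending call, otherwise pop the call string.
bool
on_return_edge (toy_call_string &cs, const toy_call_stmt *call_for_edge)
{
  if (!cs.can_return_to (call_for_edge))
    return false;
  cs.pop ();
  return true;
}
```

With this shape no return superedge has to exist when the call is pushed; the match is made when the return is followed.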


> So my plan for next day would be to create a custom function to the
> program_point class the update the call stack and return back to
> correct spot. 
> 
> If there is a better way of doing it then do let me know.
> 
> STATUS AT THE END OF THE DAY :- 
> 
> - filter out the the nodes which already have an supergraph edge
> representing the call ( Done )
> - create enode for destination ( Done )
> - create eedge representing the call itself ( Done ? )
> 
> —
> 
> P.S. it has been over a week since I sent a mail to    
> overseers@gcc.gnu.org regarding the
> ssh key incident and I haven’t got any response form them till now,
> does this usually take this long for them to respond ? or does this
> means I didn’t provide some information to them that I should have.
> Is there something else I can do regarding this problem ?

I'd try resending the email.

Hope this is helpful
Dave



* Re: daily report on extending static analyzer project [GSoC]
       [not found]                   ` <AD7A4C2F-1451-4317-BE53-99DE9E9853AE@gmail.com>
@ 2021-06-30 17:17                     ` David Malcolm
  2021-07-02 14:18                       ` Ankur Saini
  0 siblings, 1 reply; 45+ messages in thread
From: David Malcolm @ 2021-06-30 17:17 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Wed, 2021-06-30 at 21:39 +0530, Ankur Saini wrote:
> 
> 
> > On 30-Jun-2021, at 1:23 AM, David Malcolm <dmalcolm@redhat.com>
> > wrote:
> > 
> > On Tue, 2021-06-29 at 22:04 +0530, Ankur Saini wrote:
> > 

[...]

> > > P.S. it has been over a week since I sent a mail to    
> > > overseers@gcc.gnu.org regarding
> > > the
> > > ssh key incident and I haven’t got any response form them till
> > > now,
> > > does this usually take this long for them to respond ? or does
> > > this
> > > means I didn’t provide some information to them that I should
> > > have.
> > > Is there something else I can do regarding this problem ?
> > 
> > I'd try resending the email.
> 
> ok I would be resending the mail again.
> Also should I cc that mail to you also ( similar to how they expect
> us to cc the sponsor at the time of creation of a new account ) ?

Yes please; that's a good idea

Dave
> 



* Re: daily report on extending static analyzer project [GSoC]
  2021-06-30 17:17                     ` David Malcolm
@ 2021-07-02 14:18                       ` Ankur Saini
  2021-07-03 14:37                         ` Ankur Saini
  0 siblings, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-07-02 14:18 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

AIM for today : 

- find and try an alternative way to make the analyser return from the function
- if no worthy alternative is found, start changing the implementation of call_string to track gcall* instead of return_edges

—

PROGRESS :

- I initially tried to look for workarounds to make the call return from the function by creating the returning exploded nodes and edges at the time of the call itself, but failed.

- So I created a local (experimental) branch and started hacking the call_string to maintain a vector of gcall* instead of return_superedge*.

- I was able to modify all the functions except a few, for which I either didn’t quite understand what should be changed or thought it better to discuss with you before proceeding. They are as follows :-

1. call_string::validate () : ( this one performs a sanity check on the object ) earlier the object was considered sane if the caller was the callee of the previous entry, but how can I get the caller from the gcall ? Or how can I know that I have a correct vector of gcall* ?

2. call_string::cmp (const call_string &a, const call_string &b) : ( this one is a comparator of call-strings ) it was earlier comparing the indices of the edges and then the lengths. For now I have just condensed it down to compare based on the size of the gcall vector ( assuming more calls means a deeper call stack ).

3. call_string::print (pretty_printer *pp) : ( printer of the call-string ) how should the printed call stack look ?

To me, it looks like maybe just tracking gcalls is not enough, or maybe I don’t know enough about what I can access from the gcalls ( I am currently looking into gcc/gimple.c for all the possible actions I can take with a gimple statement ).
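
On point 2, a size-only comparison treats two different call strings of the same depth as equal. One possible alternative, sketched here with toy types, keeps the old "compare element by element, then by length" shape. The per-call id field is an assumption made for illustration; what a real gcall offers as a stable ordering key is not checked here:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a call statement with an assumed stable id.
struct toy_call_stmt { int id; };

// Three-way comparison of two call strings: negative if A sorts first,
// zero if they are equal, positive if B sorts first.
int
cmp_call_strings (const std::vector<const toy_call_stmt *> &a,
                  const std::vector<const toy_call_stmt *> &b)
{
  const std::size_t common = std::min (a.size (), b.size ());

  // Compare the shared prefix element by element.
  for (std::size_t i = 0; i < common; i++)
    {
      assert (a[i] && b[i]);
      if (a[i]->id != b[i]->id)
        return a[i]->id < b[i]->id ? -1 : 1;
    }

  // Same prefix: the shorter (shallower) call string sorts first.
  if (a.size () == b.size ())
    return 0;
  return a.size () < b.size () ? -1 : 1;
}
```

Two stacks of equal depth but different call sites now compare as different, which a length-only comparison cannot express.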

STATUS AT THE END OF THE DAY :- 

- find and try an alternative way to make the analyser return from the function ( abandoned )
- if no worthy alternative is found, start changing the implementation of call_string to track gcall* instead of return_edges ( pending )

Thank you
- Ankur


* Re: daily report on extending static analyzer project [GSoC]
  2021-07-02 14:18                       ` Ankur Saini
@ 2021-07-03 14:37                         ` Ankur Saini
  2021-07-05 16:15                           ` Ankur Saini
  2021-07-06 22:46                           ` David Malcolm
  0 siblings, 2 replies; 45+ messages in thread
From: Ankur Saini @ 2021-07-03 14:37 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

AIM for today : 

- update the call_stack to track something other than supergraph edges

—

PROGRESS :

- After some brainstorming about tracking the call stack, I think a better way to track it is to keep track of the source and destination of each call instead of the superedges representing them.

- So I branched again and updated the call-string to capture a pair of supernodes ( one representing the callee and the other the caller ). This way I was not only able to port the entire code to the new representation without much difficulty, but can now also push calls to functions that don’t have a superedge.

- The changes can be seen on the "refs/users/arsenic/heads/call_string_update" branch. ( Currently this branch doesn’t contain the other changes I have made so far, as I wanted to test the new call-string representation on its own and make sure it doesn’t affect the functionality of the base analyser. )

- Now hopefully all that is left for tomorrow is to update the analyzer to finally see, call and return from a function called via a function pointer.
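
A rough sketch of the pair-based representation described above, with toy types in place of the analyzer's supernode and function classes. The order inside the pair ( callee first, caller second ) and the validate rule are assumptions made for this illustration, not a description of the branch:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-ins; not GCC's real types.
struct toy_function { int id; };
struct toy_supernode { const toy_function *fun; };

class toy_call_string
{
public:
  // first = callee's entry node, second = caller's node to return to.
  typedef std::pair<const toy_supernode *, const toy_supernode *> call_pair;

  void push_call (const toy_supernode *caller, const toy_supernode *callee)
  {
    assert (caller && callee);
    m_elements.push_back (call_pair (callee, caller));
  }

  // Sanity check: each call must be made from inside the function
  // that the previous entry called into.
  bool validate () const
  {
    for (std::size_t i = 1; i < m_elements.size (); i++)
      if (m_elements[i].second->fun != m_elements[i - 1].first->fun)
        return false;
    return true;
  }

  // Node to resume at when the innermost call returns, or null if the
  // call string is empty.
  const toy_supernode *get_return_node () const
  {
    return m_elements.empty () ? nullptr : m_elements.back ().second;
  }

private:
  std::vector<call_pair> m_elements;
};
```

Because both ends of the call are stored, a call can be pushed even when no superedge connects them, and the caller of each entry is directly available for validation.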

STATUS AT THE END OF THE DAY :- 

- update the call_stack to track something other than supergraph edges ( done )

Thank you
- Ankur


* Re: daily report on extending static analyzer project [GSoC]
  2021-07-03 14:37                         ` Ankur Saini
@ 2021-07-05 16:15                           ` Ankur Saini
  2021-07-06 23:11                             ` David Malcolm
  2021-07-06 22:46                           ` David Malcolm
  1 sibling, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-07-05 16:15 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

I forgot to send the daily report yesterday, so this one covers the work done on both days

AIM : 

- make the analyzer call the function with the updated call-string representation ( even the ones that don’t have a superedge )
- make the analyzer figure out the point of return from the function called without the superedge
- make the analyser figure out the correct point to return to in the caller function
- make the enode and eedge representing the return call
- test the changes on the example I created before
- speculate what GCC generates for a vfunc call and discuss how we can use it to our advantage

—

PROGRESS ( changes can be seen on the "refs/users/arsenic/heads/analyzer_extension" branch of the repository ) :

- Thanks to the new call-string representation, I was able to push calls that don’t have a superedge onto the call stack, and could see the calls happening via the function pointer.

- To detect the returning point of the function, I used the fact that such supernodes would contain an EXIT bb, would not have any return superedge and would still have a pending call-stack.

- The next part was to find the destination node of the return. For this I again made use of the new call string and created a custom accessor to get the caller and callee supernodes of the return call, then I extracted the gcall* from the caller supernode to update the program state.

- Now that I had the next state and next point, it was time to put the final piece of the puzzle together and create the exploded node and edge representing the returning call.

- I tested the changes on the following program, where the analyzer was earlier giving a false positive ( a spurious leak report ) due to not detecting the call via a function pointer

```
#include <stdio.h>
#include <stdlib.h>

void fun(int *int_ptr)
{
    free(int_ptr);
}

int test()
{
    int *int_ptr = (int*)malloc(sizeof(int));
    void (*fun_ptr)(int *) = &fun;
    (*fun_ptr)(int_ptr);

    return 0;
}

void test_2()
{
  test();
}
```
( compiler explorer link : https://godbolt.org/z/9KfenGET9 )

The results showed success: the analyzer was now able to detect, call and return from the function that was called via the function pointer, and no longer reported the memory leak it was reporting before. : )
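
The return-detection rule described above ( EXIT block, no return superedge, pending call stack ) can be sketched roughly as follows. The types and the function name are toy stand-ins invented for this illustration, not code from the branch:

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a supernode; not GCC's real type.
struct toy_supernode
{
  bool is_exit;              // node holds the function's EXIT block
  bool has_return_superedge; // a return superedge already leaves this node
};

// One pending call: where it came from and where it went.
struct toy_call
{
  const toy_supernode *caller;
  const toy_supernode *callee;
};

// If NODE is the end of a function that was entered without a superedge,
// pop the innermost call and return the caller node to resume at.
// Otherwise return null and leave CALL_STACK untouched.
const toy_supernode *
maybe_return_without_superedge (const toy_supernode &node,
                                std::vector<toy_call> &call_stack)
{
  if (!node.is_exit || node.has_return_superedge || call_stack.empty ())
    return nullptr;

  const toy_supernode *resume_at = call_stack.back ().caller;
  assert (resume_at);
  call_stack.pop_back ();
  return resume_at;
}
```

Nodes that already have a return superedge are left to the existing machinery, so the extra path only triggers for calls that were pushed by hand.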

- I think I should point out that, in the process, I created a lot of custom functions to access/alter some data, which was not possible before.

- Now that calls via function pointer are taken care of, it was time to see what exactly GCC generates when a function is dispatched dynamically and, as planned earlier, I went to ipa-devirt.c ( GCC’s devirtualizer implementation ) to investigate.

- Although I didn’t understand everything that was happening there, here are some of the findings I thought might be interesting for the project :-
	> the polymorphic call is made through an OBJ_TYPE_REF, which contains otr_type ( the type of the class whose method is called ) and otr_token ( the index into the virtual table where the address is taken )
	> the devirtualizer builds a type inheritance graph to keep track of the entire inheritance hierarchy
	> the most interesting function I found was "possible_polymorphic_call_targets()", which returns the vector of all possible targets of the polymorphic call represented by a call edge or a gcall.
	> from what I understood, the devirtualizer searches these polymorphic calls and filters out the calls which are more likely to be called ( known as likely calls ), then turns them into speculative calls, which are later turned into direct calls.
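
To illustrate the idea behind collecting the possible targets of a polymorphic call, here is a toy model of a type inheritance graph with vtable slots. It is not how ipa-devirt.c is implemented, and every name in it ( toy_class, toy_hierarchy, toy_possible_targets ) is hypothetical; it only shows how an ( otr_type, otr_token ) pair narrows the set of candidate methods:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy class description: its base ( empty if none ) and the vtable
// slots it overrides, keyed by token.
struct toy_class
{
  std::string base;
  std::map<int, std::string> overrides;
};

typedef std::map<std::string, toy_class> toy_hierarchy;

// Implementation used by CLS for slot TOKEN, walking up the base chain.
std::string
resolve_slot (const toy_hierarchy &h, const std::string &cls, int token)
{
  std::string cur = cls;
  while (!cur.empty ())
    {
      auto it = h.find (cur);
      assert (it != h.end ());
      auto m = it->second.overrides.find (token);
      if (m != it->second.overrides.end ())
        return m->second;
      cur = it->second.base;
    }
  return std::string ();
}

// True if CLS is ANCESTOR or derives from it.
bool
derives_from (const toy_hierarchy &h, const std::string &cls,
              const std::string &ancestor)
{
  std::string cur = cls;
  while (!cur.empty ())
    {
      if (cur == ancestor)
        return true;
      cur = h.at (cur).base;
    }
  return false;
}

// Every implementation a call through OTR_TYPE at slot TOKEN could reach.
std::vector<std::string>
toy_possible_targets (const toy_hierarchy &h, const std::string &otr_type,
                      int token)
{
  std::vector<std::string> targets;
  for (const auto &entry : h)
    if (derives_from (h, entry.first, otr_type))
      {
        std::string impl = resolve_slot (h, entry.first, token);
        if (!impl.empty ()
            && std::find (targets.begin (), targets.end (), impl) == targets.end ())
          targets.push_back (impl);
      }
  return targets;
}
```

For a hierarchy like the A/B example in this mail, a call through a pointer to A could reach either implementation of foo, while a call through a pointer to B could only reach B's.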

- Another thing I was curious to know was how the analyzer would behave when encountering a polymorphic call, now that we are splitting the supernodes at every call.

The results were interesting: I was able to see the analyzer splitting supernodes for the calls right away, but this time they were not connected via an intraprocedural edge, making the analyzer crash at the call site ( I will look more into it tomorrow ).

the example I used was : -
```
struct A
{
    virtual int foo (void) 
    {
        return 42;
    }
};

struct B: public A
{
  int foo (void) 
    { 
    	return 0;
    }
};

int test()
{
    struct B b, *bptr=&b;
    bptr->foo();
    return bptr->foo();
}
```
( compiler explorer link : https://godbolt.org/z/d986ab7MY )

—

STATUS AT THE END OF THE DAY :- 

- make the analyzer call the function with the updated call-string representation ( even the ones that don’t have a superedge ) (done)
- make the analyzer figure out the point of return from the function called without the superedge (done)
- make the analyser figure out the correct point to return to in the caller function (done)
- make the enode and eedge representing the return call (done)
- test the changes on the example I created before (done)
- speculate what GCC generates for a vfunc call and discuss how we can use it to our advantage (done)


Thank you
- Ankur


* Re: daily report on extending static analyzer project [GSoC]
  2021-07-03 14:37                         ` Ankur Saini
  2021-07-05 16:15                           ` Ankur Saini
@ 2021-07-06 22:46                           ` David Malcolm
  2021-07-06 22:50                             ` David Malcolm
  2021-07-07 13:52                             ` Ankur Saini
  1 sibling, 2 replies; 45+ messages in thread
From: David Malcolm @ 2021-07-06 22:46 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Sat, 2021-07-03 at 20:07 +0530, Ankur Saini wrote:
> AIM for today : 
> 
> - update the call_stack to track something else other than supergraph
> edges
> 
> —
> 
> PROGRESS :
> 
> - After some brainstorming about tracking the callstack, I think one
> better way to track the call stack is to keep track of source and
> destination of the call instead of supperedges representing them. 
> 
> - so I branched again and updated the call-string to now capture a pair
> of supernodes ( one representing callee and other representing caller
> ), like this I was not only able to easily port the entire code to
> adapt it without much difficulty, but now can also push calls to
> functions that doesn’t possess a superedge.
> 
> - changes done can be seen on the "
> refs/users/arsenic/heads/call_string_update “ branch. ( currently this
> branch doesn’t contain other changes I have done till now, as I wanted
> to test the new call-string representation exclusively and make sure it
> doesn’t affect the functionality of the base analyser )

I'm not an expert at git, so it took me a while to figure out how to
access your branch.

It's easier for me if you can also use "git format-patch" to generate a
patch and "git send-email" to send it to this list (and me, please), so
that the patch content is going to the list.

The approach in the patch seems reasonable.

I think you may have a memory leak, though: you're changing call_string
from:
  auto_vec<const return_superedge *> m_return_edges;
to:
  auto_vec<const std::pair<const supernode *,const supernode *>*> m_supernodes;

and the std::pairs are being dynamically allocated on the heap.
Ownership gets transferred by call_string::operator=, but if I'm
reading the patch correctly they never get deleted.  This is OK for
prototyping, but will need fixing before the code can be merged.

It's probably simplest to get rid of the indirection and allocation in
m_supernodes and have the std::pair be stored by value, rather than by
pointer, i.e.:
  auto_vec<std::pair<const supernode *, const supernode *> > m_supernodes;

Does that work? (or is there a problem I'm not thinking of).

If that's a problem, I think you might be able to get away with
dropping the "first" from the pair, and simply storing the supernode to
return to; I think the only places that "first" gets used are in dumps
and in validation.  But "first" is probably helpful for debugging, so
given that you've got it working with that field, better to keep it.
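
As a small illustration of the by-value approach, the sketch below stores pair-like elements directly in a container and counts live instances to show that nothing is left behind after a copy and destruction. std::vector stands in for auto_vec here, and tracked_element is a hypothetical type added only to make the ownership visible; whether auto_vec behaves identically for this element type is not checked here:

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a supernode; not GCC's real type.
struct toy_supernode { int index; };

// Pair-like element that counts how many instances are alive.
struct tracked_element
{
  static int live;
  const toy_supernode *first;
  const toy_supernode *second;

  tracked_element (const toy_supernode *f, const toy_supernode *s)
  : first (f), second (s) { ++live; }
  tracked_element (const tracked_element &other)
  : first (other.first), second (other.second) { ++live; }
  tracked_element &operator= (const tracked_element &) = default;
  ~tracked_element () { --live; }
};

int tracked_element::live = 0;

// Build a call string, copy it ( as an assignment operator would ),
// let both go out of scope, and report how many elements are still alive.
int
live_elements_after_copy_and_destroy ()
{
  toy_supernode caller = {0};
  toy_supernode callee = {1};
  {
    std::vector<tracked_element> original;
    original.push_back (tracked_element (&callee, &caller));

    std::vector<tracked_element> copy;
    copy = original;

    // One element in each container, each owned by its container.
    assert (tracked_element::live == 2);
  }
  return tracked_element::live;
}
```

With elements held by value, the container's own copy and destruction take care of the pairs, so there is no separate ownership to transfer or free.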

Hope this is helpful
Dave

> 
> - now hopefully all that is left for tomorrow is to update the analyzer
> to finally see, call and return from the function aclled via the
> function pointer. 
> 
> STATUS AT THE END OF THE DAY :- 
> 
> - update the call_stack to track something else other than supergraph
> edges ( done )
> 
> Thank you
> - Ankur
> 




* Re: daily report on extending static analyzer project [GSoC]
  2021-07-06 22:46                           ` David Malcolm
@ 2021-07-06 22:50                             ` David Malcolm
  2021-07-07 13:52                             ` Ankur Saini
  1 sibling, 0 replies; 45+ messages in thread
From: David Malcolm @ 2021-07-06 22:50 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Tue, 2021-07-06 at 18:46 -0400, David Malcolm wrote:
> On Sat, 2021-07-03 at 20:07 +0530, Ankur Saini wrote:
> > AIM for today : 
> > 
> > - update the call_stack to track something else other than
> > supergraph
> > edges
> > 
> > —
> > 
> > PROGRESS :
> > 
> > - After some brainstorming about tracking the callstack, I think
> > one
> > better way to track the call stack is to keep track of source and
> > destination of the call instead of supperedges representing them. 
> > 
> > - so I branched again and updated the call-string to now capture a
> > pair
> > of supernodes ( one representing callee and other representing
> > caller
> > ), like this I was not only able to easily port the entire code to
> > adapt it without much difficulty, but now can also push calls to
> > functions that doesn’t possess a superedge.
> > 
> > - changes done can be seen on the "
> > refs/users/arsenic/heads/call_string_update “ branch. ( currently
> > this
> > branch doesn’t contain other changes I have done till now, as I
> > wanted
> > to test the new call-string representation exclusively and make
> > sure it
> > doesn’t affect the functionality of the base analyser )
> 
> I'm not an expert at git, so it took me a while to figure out how to
> access your branch.
> 
> It's easier for me if you can also use "git format-patch" to generate
> a
> patch and "git send-email" to send it to this list (and me, please),
> so
> that the patch content is going to the list.
> 
> The approach in the patch seems reasonable.
> 
> I think you may have a memory leak, though: you're changing
> call_string
> from:
>   auto_vec<const return_superedge *> m_return_edges;
> to:
> >   auto_vec<const std::pair<const supernode *,const supernode *>*> m_supernodes;

One other, unrelated idea that just occurred to me: those lines get
very long, so maybe introduce a typedef e.g. 
  typedef std::pair<const supernode *,const supernode *> element_t;
so that you can refer to the pairs as call_string::element_t, and just
element_t when you're in call_string scope, and just have a:

  auto_vec<element_t> m_supernodes;

or

  auto_vec<element_t> m_elements; 

within the call_string, if that makes sense.  Does that simplify
things?
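
A minimal sketch of what the typedef could look like, offered only as an illustration. It assumes std::vector as a stand-in for auto_vec, a placeholder supernode struct, and an arbitrary caller/callee ordering within the pair; push_call's signature, length () and get_callee () are hypothetical names, not the analyzer's real interface.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Placeholder for the analyzer's supernode; only its identity matters here.
struct supernode { int m_index; };

class call_string
{
public:
  // One call-stack entry.  Assumption of this sketch: first is the
  // caller's supernode, second is the callee's supernode.
  typedef std::pair<const supernode *, const supernode *> element_t;

  // Push a call; elements are stored by value, so nothing is allocated
  // with new and nothing needs to be deleted later.
  void push_call (const supernode *caller, const supernode *callee)
  {
    assert (caller && callee);
    m_elements.push_back (element_t (caller, callee));
  }

  std::size_t length () const { return m_elements.size (); }

  const element_t &operator[] (std::size_t idx) const
  {
    return m_elements[idx];
  }

private:
  // std::vector stands in for GCC's auto_vec<element_t>.
  std::vector<element_t> m_elements;
};

// Outside the class scope the type is spelled call_string::element_t.
inline const supernode *
get_callee (const call_string::element_t &e)
{
  return e.second;
}
```

Inside the class the short name element_t is enough, which keeps the member declaration well under 80 columns.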

Dave



* Re: daily report on extending static analyzer project [GSoC]
  2021-07-05 16:15                           ` Ankur Saini
@ 2021-07-06 23:11                             ` David Malcolm
  0 siblings, 0 replies; 45+ messages in thread
From: David Malcolm @ 2021-07-06 23:11 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Mon, 2021-07-05 at 21:45 +0530, Ankur Saini wrote:
> I forgot to send the daily report yesterday, so this one covers the
> work done on both days
> 
> AIM : 
> 
> - make the analyzer call the function with the updated call-string
> representation ( even the ones that don’t have a superedge )
> - make the analyzer figure out the point of return from the function
> called without the superedge
> - make the analyser figure out the correct point to return back in the
> caller function
> - make enode and eedge representing the return call
> - test the changes on the example I created before
> - speculate what GCC generates for a vfunc call and discuss how can we
> use it to our advantage
> 
> —
> 
> PROGRESS  ( changes can be seen on
> "refs/users/arsenic/heads/analyzer_extension “ branch of the repository
> ) :
> 
> - Thanks to the new call-string representation, I was able to push
> calls which don’t have a superedge onto the call stack, and was
> able to see the calls happening via the function pointer.
> 
> - To detect the returning point of the function I used the fact that
> such supernodes would contain an EXIT bb, would not have any return
> superedge and would still have a pending call-stack. 
> 
> - Now the next part was to find out the destination node of the return.
> For this I again made use of the new call string and created a custom
> accessor to get the caller and callee supernodes of the return call,
> then I extracted the gcall* from the caller supernode to update the
> program state.
> 
> - now that I have got next state and next point, it was time to put the
> final piece of puzzle together and create exploded node and edge
> representing the returning call.
> 
> - I tested the changes on the following program, where the analyzer
> was earlier giving a false positive ( a spurious leak report ) due to
> not detecting the call via a function pointer
> 
> ```
> #include <stdio.h>
> #include <stdlib.h>
> 
> void fun(int *int_ptr)
> {
>     free(int_ptr);
> }
> 
> int test()
> {
>     int *int_ptr = (int*)malloc(sizeof(int));
>     void (*fun_ptr)(int *) = &fun;
>     (*fun_ptr)(int_ptr);
> 
>     return 0;
> }
> 
> void test_2()
> {
>   test();
> }
> ```
> ( compiler explorer link : https://godbolt.org/z/9KfenGET9 )
> 
> and results were showing success where the analyzer was now able to
> successfully detect, call and return from the function that was called
> via the function pointer and no longer reported the memory leak it was
> reporting before. : )

This is great; well done!

It would be good to turn the above into a regression test.  I think you
can do that by simply adding it to gcc/testsuite/gcc.dg/analyzer.  You
could also add a case where fun_ptr is called twice, and check that it
reports it as a double-free (and add a dg-warning directive to verify
that it correctly complains).

I wonder if your branch has already fixed:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546

> 
> - I think I should point this out, in the process I created a lot of
> custom function to access/alter some data which was not possible
> before.
> 
> - now that calls via function pointer are taken care of, it was time
> to see what exactly GCC generates when a function is dispatched
> dynamically, and as planned earlier, I went to ipa-devirt.c ( GCC’s
> implementation of the devirtualizer ) to investigate.
> 
> - although I didn’t understand everything that was happening there,
> here are some of the findings I thought might be interesting for the
> project :- 
>         > the polymorphic call is called with an OBJ_TYPE_REF which
> contains otr_type ( the type of the class whose method is called ) and
> otr_token ( the index into the virtual table where the address is taken )
>         > the devirtualizer builds a type inheritance graph to keep
> track of the entire inheritance hierarchy
>         > the most interesting function I found was
> “possible_polymorphic_call_targets()” which returns the vector of all
> possible targets of the polymorphic call represented by a call edge or
> a gcall.
>         > what I understood is that the devirtualizer searches these
> polymorphic calls, picks out the targets which are more likely to be
> called ( known as likely calls ), and then turns them into
> speculative calls which are later turned into direct calls.
> 
> - another thing I was curious to know was, how would analyzer behave
> when encountered with a polymorphic call now that we are splitting
> the superedges at every call. 
> 
> the results were interesting, I was able to see the analyzer splitting
> supernodes for the calls right away, but this time they were not
> connected via an intraprocedural edge, making the analyzer crash at
> the callsite ( I will look more into it tomorrow ) 
> 
> the example I used was : -
> ```
> struct A
> {
>     virtual int foo (void) 
>     {
>         return 42;
>     }
> };
> 
> struct B: public A
> {
>   int foo (void) 
>     { 
>         return 0;
>     }
> };
> 
> int test()
> {
>     struct B b, *bptr=&b;
>     bptr->foo();
>     return bptr->foo();
> }
> ```
> ( compiler explorer link : https://godbolt.org/z/d986ab7MY )
> 

I can see the crash in gdb:

In state_purge_per_ssa_name::process_point, when
  if (snode->m_returning_call)
the code assumes that there will be a cgraph_edge, which isn't the case
anymore; it will need to go from the "return" supernode to the "call"
supernode (both within the caller function).


> —
> 
> STATUS AT THE END OF THE DAY :- 
> 
> - make the analyzer call the function with the updated call-string
> representation ( even the ones that don’t have a superedge ) (done)
> - make the analyzer figure out the point of return from the function
> called without the superedge (done)
> - make the analyser figure out the correct point to return back in
> the caller function (done)
> - make enode and eedge representing the return call (done)
> - test the changes on the example I created before (done)
> - speculate what GCC generates for a vfunc call and discuss how can
> we use it to our advantage (done)
> 

Good work; looks promising.
Dave



* Re: daily report on extending static analyzer project [GSoC]
  2021-07-06 22:46                           ` David Malcolm
  2021-07-06 22:50                             ` David Malcolm
@ 2021-07-07 13:52                             ` Ankur Saini
  2021-07-07 14:37                               ` David Malcolm
  1 sibling, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-07-07 13:52 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc



> On 07-Jul-2021, at 4:16 AM, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> On Sat, 2021-07-03 at 20:07 +0530, Ankur Saini wrote:
>> AIM for today : 
>> 
>> - update the call_stack to track something else other than supergraph
>> edges
>> 
>> —
>> 
>> PROGRESS :
>> 
>> - After some brainstorming about tracking the callstack, I think one
>> better way to track the call stack is to keep track of source and
>> destination of the call instead of superedges representing them. 
>> 
>> - so I branched again and updated the call-string to now capture a pair
>> of supernodes ( one representing callee and other representing caller
>> ), like this I was not only able to easily port the entire code to
>> adapt it without much difficulty, but now can also push calls to
>> functions that don’t possess a superedge.
>> 
>> - changes done can be seen on the "
>> refs/users/arsenic/heads/call_string_update “ branch. ( currently this
>> branch doesn’t contain other changes I have done till now, as I wanted
>> to test the new call-string representation exclusively and make sure it
>> doesn’t affect the functionality of the base analyser )
> 
> I'm not an expert at git, so it took me a while to figure out how to
> access your branch.
> 
> It's easier for me if you can also use "git format-patch" to generate a
> patch and "git send-email" to send it to this list (and me, please), so
> that the patch content is going to the list.
> 
> The approach in the patch seems reasonable.
> 
> I think you may have a memory leak, though: you're changing call_string
> from:
>  auto_vec<const return_superedge *> m_return_edges;
> to:
>  auto_vec<const std::pair<const supernode *,const supernode *>*> m_supernodes;
> 
> and the std::pairs are being dynamically allocated on the heap.
> Ownership gets transferred by call_string::operator=, but if I'm
> reading the patch correctly they never get deleted.  This is OK for
> prototyping, but will need fixing before the code can be merged.

> 
> It's probably simplest to get rid of the indirection and allocation in
> m_supernodes and have the std::pair be stored by value, rather than by
> pointer, i.e.:
>  auto_vec<std::pair<const supernode *, const supernode *> > m_supernodes;
> 
> Does that work? (or is there a problem I'm not thinking of).

yes, I noticed that while creating it. I was thinking of emptying the vector and deleting all the memory in the destructor of the call-string ( or making them unique pointers and letting them destroy themselves ), but it looks like storing the pairs by value would make more sense.

> 
> If that's a problem, I think you might be able to get away with
> dropping the "first" from the pair, and simply storing the supernode to
> return to; I think the only places that "first" gets used are in dumps
> and in validation.  But "first" is probably helpful for debugging, so
> given that you've got it working with that field, better to keep it.

yes, I see that too, but my idea is to keep it as is for now ( maybe it might turn out to be helpful in future ). I will change it back if it proves to be useless and we get time at the end.

> 
> Hope this is helpful
> Dave
> 
>> 
>> - now hopefully all that is left for tomorrow is to update the analyzer
>> to finally see, call and return from the function called via the
>> function pointer. 
>> 
>> STATUS AT THE END OF THE DAY :- 
>> 
>> - update the call_stack to track something else other than supergraph
>> edges ( done )
>> 
>> Thank you
>> - Ankur

———

> On 07-Jul-2021, at 4:20 AM, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> On Tue, 2021-07-06 at 18:46 -0400, David Malcolm wrote:
>> On Sat, 2021-07-03 at 20:07 +0530, Ankur Saini wrote:
>>> AIM for today : 
>>> 
>>> - update the call_stack to track something else other than
>>> supergraph
>>> edges
>>> 
>>> —
>>> 
>>> PROGRESS :
>>> 
>>> - After some brainstorming about tracking the callstack, I think
>>> one
>>> better way to track the call stack is to keep track of source and
>>> destination of the call instead of superedges representing them. 
>>> 
>>> - so I branched again and updated the call-string to now capture a
>>> pair
>>> of supernodes ( one representing callee and other representing
>>> caller
>>> ), like this I was not only able to easily port the entire code to
>>> adapt it without much difficulty, but now can also push calls to
>>> functions that don’t possess a superedge.
>>> 
>>> - changes done can be seen on the "
>>> refs/users/arsenic/heads/call_string_update “ branch. ( currently
>>> this
>>> branch doesn’t contain other changes I have done till now, as I
>>> wanted
>>> to test the new call-string representation exclusively and make
>>> sure it
>>> doesn’t affect the functionality of the base analyser )
>> 
>> I'm not an expert at git, so it took me a while to figure out how to
>> access your branch.
>> 
>> It's easier for me if you can also use "git format-patch" to generate
>> a
>> patch and "git send-email" to send it to this list (and me, please),
>> so
>> that the patch content is going to the list.
>> 
>> The approach in the patch seems reasonable.
>> 
>> I think you may have a memory leak, though: you're changing
>> call_string
>> from:
>>   auto_vec<const return_superedge *> m_return_edges;
>> to:
>>   auto_vec<const std::pair<const supernode *,const supernode *>*>
>> m_supernodes;
> 
> One other, unrelated idea that just occurred to me: those lines get
> very long, so maybe introduce a typedef e.g. 
>  typedef std::pair<const supernode *,const supernode *> element_t;
> so that you can refer to the pairs as call_string::element_t, and just
> element_t when you're in call_string scope, and just have a:
> 
>  auto_vec<element_t> m_supernodes;
> 
> or
> 
>  auto_vec<element_t> m_elements; 
> 
> within the call_string, if that makes sense.  Does that simplify
> things?

Yes, this is a nice idea. I will update the call-stack with the next update to the analyzer, or should I update it and send a patch to the mailing list with these call_string changes for review first and then work on the other changes ?

also regarding the ChangeLog 

how exactly does one update the changelog, does it get updated with the commit messages or does one have to update it manually ? 

I found and read this doc ( https://gcc.gnu.org/codingconventions.html#ChangeLogs ) about the same which says that “ ChangeLog entries are part of git commit messages and are automatically put into a corresponding ChangeLog file ”. But when I pushed the commits which look properly formatted to me, I was not able to see the changes reflected in gcc/analyzer/ChangeLog .

> 
> Dave

Thanks 

- Ankur Saini


* Re: daily report on extending static analyzer project [GSoC]
  2021-07-07 13:52                             ` Ankur Saini
@ 2021-07-07 14:37                               ` David Malcolm
  2021-07-10 15:57                                 ` Ankur Saini
  0 siblings, 1 reply; 45+ messages in thread
From: David Malcolm @ 2021-07-07 14:37 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Wed, 2021-07-07 at 19:22 +0530, Ankur Saini wrote:
> 
> 
> > On 07-Jul-2021, at 4:16 AM, David Malcolm <dmalcolm@redhat.com>
> > wrote:
> > 
> > On Sat, 2021-07-03 at 20:07 +0530, Ankur Saini wrote:
> > > AIM for today : 
> > > 
> > > - update the call_stack to track something else other than
> > > supergraph
> > > edges
> > > 
> > > —
> > > 
> > > PROGRESS :
> > > 
> > > - After some brainstorming about tracking the callstack, I think
> > > one
> > > better way to track the call stack is to keep track of source and
> > > destination of the call instead of superedges representing them.
> > > 
> > > - so I branched again and updated the call-string to now capture
> > > a pair
> > > of supernodes ( one representing callee and other representing
> > > caller
> > > ), like this I was not only able to easily port the entire code
> > > to
> > > adapt it without much difficulty, but now can also push calls to
> > > functions that don’t possess a superedge.
> > > 
> > > - changes done can be seen on the "
> > > refs/users/arsenic/heads/call_string_update “ branch. ( currently
> > > this
> > > branch doesn’t contain other changes I have done till now, as I
> > > wanted
> > > to test the new call-string representation exclusively and make
> > > sure it
> > > doesn’t affect the functionality of the base analyser )
> > 
> > I'm not an expert at git, so it took me a while to figure out how
> > to
> > access your branch.
> > 
> > It's easier for me if you can also use "git format-patch" to
> > generate a
> > patch and "git send-email" to send it to this list (and me,
> > please), so
> > that the patch content is going to the list.
> > 
> > The approach in the patch seems reasonable.
> > 
> > I think you may have a memory leak, though: you're changing
> > call_string
> > from:
> >  auto_vec<const return_superedge *> m_return_edges;
> > to:
> >  auto_vec<const std::pair<const supernode *,const supernode *>*>
> > m_supernodes;
> > 
> > and the std::pairs are being dynamically allocated on the heap.
> > Ownership gets transferred by call_string::operator=, but if I'm
> > reading the patch correctly they never get deleted.  This is OK for
> > prototyping, but will need fixing before the code can be merged.
> 
> > 
> > It's probably simplest to get rid of the indirection and allocation
> > in
> > m_supernodes and have the std::pair be stored by value, rather than
> > by
> > pointer, i.e.:
> >  auto_vec<std::pair<const supernode *, const supernode *> >
> > m_supernodes;
> > 
> > Does that work? (or is there a problem I'm not thinking of).
> 
> yes, I noticed that while creating it. I was thinking of emptying the
> vector and deleting all the memory in the destructor of the
> call-string ( or making them unique pointers and letting them destroy
> themselves ), but it looks like storing the pairs by value would make
> more sense.

Yes, just storing the std::pair rather than new/delete is much simpler.

There's also an auto_delete_vec<T> which stores (T *) as the elements
and deletes all of the elements in its dtor, but the assignment
operator/copy-ctor/move-assign/move-ctor probably don't work properly,
and the overhead of new/delete probably isn't needed.
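
As a rough illustration of why by-value storage is simpler, here is a small sketch. It assumes std::vector as a stand-in for auto_vec, and mini_call_string with its push/pop/length members is a hypothetical type made up for this example. With values as elements, the compiler-generated copy operations of the stand-in already do the right thing and there is no destructor to write.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Placeholder for the analyzer's supernode.
struct supernode { int m_index; };

typedef std::pair<const supernode *, const supernode *> element_t;

// Hypothetical miniature call string holding its elements by value.
// No user-written destructor, copy constructor or assignment operator
// is needed: copying the container copies each pair.
struct mini_call_string
{
  std::vector<element_t> m_elements;  // stand-in for auto_vec<element_t>

  void push (const supernode *first, const supernode *second)
  {
    m_elements.push_back (element_t (first, second));
  }

  void pop ()
  {
    assert (!m_elements.empty ());
    m_elements.pop_back ();
  }

  std::size_t length () const { return m_elements.size (); }
};
```

Copying a mini_call_string and popping from the copy leaves the original untouched, and neither object owns heap-allocated pairs that could leak. Whether auto_vec itself offers the same copy behaviour is not assumed here.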

> 
> > 
> > If that's a problem, I think you might be able to get away with
> > dropping the "first" from the pair, and simply storing the
> > supernode to
> > return to; I think the only places that "first" gets used are in
> > dumps
> > and in validation.  But "first" is probably helpful for debugging,
> > so
> > given that you've got it working with that field, better to keep
> > it.
> 
> yes, I see that too, but my idea is to keep it as is for now ( maybe
> it might turn out to be helpful in future ). I will change it back if
> it proves to be useless and we get time at the end.

Yes; my suggestion was just in case there were issues with fixing the
auto_vec.   It's better for debugging to have both of the pointers in
the element.

[...snip...]

> > > 
> > > I think you may have a memory leak, though: you're changing
> > > call_string
> > > from:
> > >   auto_vec<const return_superedge *> m_return_edges;
> > > to:
> > >   auto_vec<const std::pair<const supernode *,const supernode *>*>
> > > m_supernodes;
> > 
> > One other, unrelated idea that just occurred to me: those lines get
> > very long, so maybe introduce a typedef e.g. 
> >  typedef std::pair<const supernode *,const supernode *> element_t;
> > so that you can refer to the pairs as call_string::element_t, and
> > just
> > element_t when you're in call_string scope, and just have a:
> > 
> >  auto_vec<element_t> m_supernodes;
> > 
> > or
> > 
> >  auto_vec<element_t> m_elements; 
> > 
> > within the call_string, if that makes sense.  Does that simplify
> > things?
> 
> Yes, this is a nice idea. I will update the call-stack with the next
> update to the analyzer, or should I update it and send a patch to the
> mailing list with these call_string changes for review first and then
> work on the other changes ?

I prefer reviewing code via emails to the mailing list, rather than
looking at it in the repository.  One benefit is that other list
subscribers (and archive readers) can easily see the code we're
discussing; this will become more significant as we go into the ipa-
devirt code which wasn't written by me.

That said, one benefit of having your own branch is so you don't have
to wait for review - you can prototype things and press on without
waiting, and not have to worry about everything being perfect
immediately.  Hopefully we can review the code frequently enough to
allow you to "course-correct" without delaying you.

> 
> also regarding the ChangeLog 
> 
> how exactly does one update the changelog, does it get updated with
> the commit messages or does one have to update it manually ? 
> 
> I found and read this doc (
> https://gcc.gnu.org/codingconventions.html#ChangeLogs ) about the
> same which says that “ ChangeLog entries are part of git commit
> messages and are automatically put into a corresponding ChangeLog
> file ”. But when I pushed the commits which look properly formatted
> to me, I was not able to see the changes reflected in
> gcc/analyzer/ChangeLog .

There's a script that runs every 24 hours on the main branches which
looks at the commits that have happened since the script last ran,
tries to parse the commit messages to find ChangeLog entries, and adds
a commit adding those entries to the ChangeLog files.  I don't think it
runs on user branches.

We require ChangeLog entries when merging code to trunk and the release
branches, but it may be overkill for a personal development branch. 
When I'm working on a new feature, I only bother writing the ChangeLog
when a patch is nearly ready for trunk, since often I find that patches
need a fair amount of reworking as I test them.

The call_string patch looks nearly ready for trunk, and thus probably
needs a ChangeLog, but the ipa-devirt work is going to be more
experimental for now, so I wouldn't bother with ChangeLogs on it yet
until it's clearer what the changes will be and how to land them into
trunk.

I'm currently working on implementing detection of uses of
uninitialized values in -fanalyzer for GCC 12, and so I'm making my own
changes (mostly to region-model/store/constraint-manager).  Hopefully
we won't get too many clashes between our changes.

Hope this makes sense
Dave



* Re: daily report on extending static analyzer project [GSoC]
  2021-07-07 14:37                               ` David Malcolm
@ 2021-07-10 15:57                                 ` Ankur Saini
  2021-07-11 17:01                                   ` Ankur Saini
  2021-07-11 17:49                                   ` David Malcolm
  0 siblings, 2 replies; 45+ messages in thread
From: Ankur Saini @ 2021-07-10 15:57 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

AIM for today : 

- update the call_string to store a vector of pairs of supernode* instead of pointers to such pairs
- create a typedef to give a meaningful name to these "pair of supernode*"
- send the patch to the gcc-patches mailing list
- add the example program I created to the GCC tests

—

PROGRESS :

- I successfully changed the entire call string representation again, from "auto_vec<const std::pair<const supernode *,const supernode *>*> m_supernodes;" to "auto_vec<element_t> m_elements;".

- while doing so, one hurdle I found was changing "hashval_t hash () const;", whose function I didn’t quite understand. Looking at the source, it seemed to just need some value ( integer or pointer ) to add to ‘hstate’, and ‘m_elements’ is neither a pointer nor an integer, so for now I add the pointer to the callee supernode ( the “second” of each element of m_elements ) to ‘hstate’.

- for the call_string patch, I created a patch file ( using git format-patch ) and sent it to the gcc-patches mailing list ( via git send-email ), CCing my mentor.
Although the output of git send-email said the mail was sent, I didn’t receive that mail myself ( despite being subscribed to the patches list ). I sent it just before sending this mail.
The mail should have been sent from arsenic@sourceware.org

- I also added the example I was using to test the analyzer to regression tests as "gcc/testsuite/gcc.dg/analyzer/call-via-fnptr.c".
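
The hashing step described above could also fold both halves of each element into the hash state. The following is only a sketch under stated assumptions: ptr_hasher is a made-up FNV-1a style accumulator used here in place of GCC's real hash state, std::vector stands in for auto_vec, and hash_elements is a hypothetical helper, not the actual call_string::hash.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Placeholder for the analyzer's supernode.
struct supernode { int m_index; };

typedef std::pair<const supernode *, const supernode *> element_t;

// Hypothetical accumulator (64-bit FNV-1a over the pointer's bytes).
// This is NOT GCC's hash state, only a stand-in for the sketch.
class ptr_hasher
{
public:
  ptr_hasher () : m_val (0xcbf29ce484222325ULL) {}

  void add_ptr (const void *p)
  {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t> (p);
    for (std::size_t i = 0; i < sizeof (bits); ++i)
      {
        m_val ^= (bits >> (8 * i)) & 0xff;
        m_val *= 0x100000001b3ULL;
      }
  }

  std::uint64_t end () const { return m_val; }

private:
  std::uint64_t m_val;
};

// Feed both supernode pointers of every element into the hash state.
inline std::uint64_t
hash_elements (const std::vector<element_t> &elements)
{
  ptr_hasher hstate;
  for (const element_t &e : elements)
    {
      hstate.add_ptr (e.first);
      hstate.add_ptr (e.second);
    }
  return hstate.end ();
}
```

Hashing only the “second” pointer is still a valid hash, since equal call strings hash equally; adding “first” as well merely lets call strings that differ only in that member land in different buckets more often.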

STATUS AT THE END OF THE DAY :- 

- update the call_string to store a vector of pairs of supernode* instead of pointers to such pairs ( done )
- create a typedef to give a meaningful name to these "pair of supernode*" ( done )
- send the patch to the gcc-patches mailing list ( done ? )
- add the example program I created to the GCC tests ( done )


> On 07-Jul-2021, at 8:07 PM, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> On Wed, 2021-07-07 at 19:22 +0530, Ankur Saini wrote:

[…]

>> 
>> also regarding the ChangeLog 
>> 
>> how exactly does one update the changelog, does it get updated with
>> the commit messages or does one have to update it manually ? 
>> 
>> I found and read this doc (
>> https://gcc.gnu.org/codingconventions.html#ChangeLogs ) about the
>> same which says that “ ChangeLog entries are part of git commit
>> messages and are automatically put into a corresponding ChangeLog
>> file ”. But when I pushed the commits which look properly formatted
>> to me, I was not able to see the changes reflected in
>> gcc/analyzer/ChangeLog .
> 
> There's a script that runs every 24 hours on the main branches which
> looks at the commits that have happened since the script last ran,
> tries to parse the commit messages to find ChangeLog entries, and adds
> a commit adding those entries to the ChangeLog files.  I don't think it
> runs on user branches.
> 
> We require ChangeLog entries when merging code to trunk and the release
> branches, but it may be overkill for a personal development branch. 
> When I'm working on a new feature, I only bother writing the ChangeLog
> when a patch is nearly ready for trunk, since often I find that patches
> need a fair amount of reworking as I test them.

Makes sense.

> 
> The call_string patch looks nearly ready for trunk, and thus probably
> needs a ChangeLog, but the ipa-devirt work is going to be more
> experimental for now, so I wouldn't bother with ChangeLogs on it yet
> until it's clearer what the changes will be and how to land them into
> trunk.

> 
> I'm currently working on implementing detection of uses of
> uninitialized values in -fanalyzer for GCC 12, and so I'm making my own
> changes (mostly to region-model/store/constraint-manager).  Hopefully
> we won't get too many clashes between our changes.

Ok , I will keep that in mind.

Thank you
- Ankur




* Re: daily report on extending static analyzer project [GSoC]
  2021-07-10 15:57                                 ` Ankur Saini
@ 2021-07-11 17:01                                   ` Ankur Saini
  2021-07-11 18:01                                     ` David Malcolm
  2021-07-11 17:49                                   ` David Malcolm
  1 sibling, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-07-11 17:01 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

AIM for today : 

- get "state_purge_per_ssa_name::process_point ()" to go from the "return" supernode to the "call" supernode.
- fix bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546 in the process 
- test and observe the effect of changes done so far on vfunc calls

—

PROGRESS :

- In order to go from the “return” supernode to the “call” supernode, I used the fact that the return supernode has the GIMPLE call statement ( m_returning_call ), which when passed to “get_supernode_for_stmt ()” returns a pointer to the “call” supernode. 

now that part of the function looks something like this 

File: {SCR_DIR}/gcc/analyzer/state-purge.cc

            /* Add any intraprocedural edge for a call.  */
            if (snode->m_returning_call)
              {
                cgraph_edge *cedge
                  = supergraph_call_edge (snode->m_fun,
                                          snode->m_returning_call);
                if (!cedge)
                  {
                    /* No cgraph_edge for this call: find the "call"
                       supernode from the call statement instead.  */
                    supernode *callernode
                      = map.get_sg ().get_supernode_for_stmt
                          (snode->m_returning_call);
                    gcc_assert (callernode);
                    add_to_worklist
                      (function_point::after_supernode (callernode),
                       worklist, logger);
                  }
                else
                  {
                    superedge *sedge
                      = map.get_sg ().get_intraprocedural_edge_for_call (cedge);
                    gcc_assert (sedge);
                    add_to_worklist
                      (function_point::after_supernode (sedge->m_src),
                       worklist, logger);
                  }
              }
- test and observe the effect of changes done so far on vfunc calls ( done )

—
P.S. 
regarding the patch I sent to the mailing list yesterday: I found it. Apparently the mail was detected as spam by my system and was redirected to my spam folder. 
Btw I am also attaching that patch file to this mail for the record.




Thank you 
- Ankur


* Re: daily report on extending static analyzer project [GSoC]
  2021-07-10 15:57                                 ` Ankur Saini
  2021-07-11 17:01                                   ` Ankur Saini
@ 2021-07-11 17:49                                   ` David Malcolm
  2021-07-12 16:37                                     ` Ankur Saini
  1 sibling, 1 reply; 45+ messages in thread
From: David Malcolm @ 2021-07-11 17:49 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Sat, 2021-07-10 at 21:27 +0530, Ankur Saini wrote:
> AIM for today : 
> 
> - update the call_string to store a vector of pairs of supernode*
> instead of pointers to such pairs
> - create a typedef to give a meaningful name to these "pair of
> supernode*"
> - send the patch to the gcc-patches mailing list
> - add the example program I created to the GCC tests
> 
> —
> 
> PROGRESS :
> 
> - I successfully changed the entire call string representation again,
> from "auto_vec<const std::pair<const supernode *,const supernode *>*>
> m_supernodes;" to "auto_vec<element_t> m_elements;".
> 
> - while doing so, one hurdle I found was changing "hashval_t hash ()
> const;", whose function I didn’t quite understand. Looking at the
> source, it seemed to just need some value ( integer or pointer ) to
> add to ‘hstate’, and ‘m_elements’ is neither a pointer nor an integer,
> so for now I add the pointer to the callee supernode ( the “second” of
> each element of m_elements ) to ‘hstate’.
> 
> - for the call_string patch, I created a patch file ( using git
> format-patch ) and sent it to the gcc-patches mailing list ( via git
> send-email ), CCing my mentor.
> Although the output of git send-email said the mail was sent, I didn’t
> receive that mail myself ( despite being subscribed to the patches
> list ). I sent it just before sending this mail.
> The mail should have been sent from arsenic@sourceware.org

Thanks.

I see the patch email in the list archives here:
  https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574888.html
but for some reason it's not showing up in my inbox.  I'm not sure why;
I recently got migrated to a new email server and my filters are
currently a mess so it could be a problem at my end; sorry if that's
the case.

Given that neither you nor I seem to have received the patch I wonder
if anyone else received it?

Given that, I'm going to reply to that patch email inline here (by
copying and pasting it from the archive):

> [PATCH 1/2] analyzer: refactor callstring to work with pairs of supernodes [GSoC]
> 
> 2021-07-3  Ankur Saini  <arsenic@sourceware.org>

There are some formatting rules that we follow with ChangeLog entries.
We have a script:

  ./contrib/gcc-changelog/git_check_commit.py --help

that you can run to check the formatting.

>         * gcc/analyzer/call-string.cc: refactor callstring to work with pair of supernodes instead of super superedges
>         * gcc/analyzer/call-string.h: make callstring work with pairs of supernodes
>         * gcc/analyzer/program-point.cc: refactor program point to work with new call-string format

The "gcc/analyzer" directory has its own ChangeLog file, and filenames
should be expressed relative to it, so these entries should read
something like:

gcc/analyzer/ChangeLog:
	* call-string.cc: ...etc
	* call-string.h: ...etc
	* program-point.cc: ...etc

The entries should be sentences (i.e. initial capital letter and
trailing full-stop).

They should be line-wrapped (I do it at 74 columns), giving this:

gcc/analyzer/ChangeLog:
	* call-string.cc: Refactor callstring to work with pair of
	supernodes instead of superedges.
	* call-string.h: Make callstring work with pairs of supernodes.
	* program-point.cc: Refactor program point to work with new
	call-string format.

Your text editor might have a mode for working with ChangeLog files.
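
As a rough illustration of the wrapping rule (this is only a sketch, not
something that exists in the GCC tree), a greedy word-wrap that accounts
for the leading tab could look like the following.  The name
"wrap_entry" and its default parameters are made up for this example,
and it counts bytes rather than display columns:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

/* Hypothetical helper: split TEXT into lines so that each line, once
   prefixed by a tab occupying INDENT_COLUMNS columns, fits within
   MAX_COLUMNS columns.  Words are never split.  */
inline std::vector<std::string>
wrap_entry (const std::string &text,
            std::size_t max_columns = 74,
            std::size_t indent_columns = 8)
{
  const std::size_t width = max_columns - indent_columns;
  std::vector<std::string> lines;
  std::istringstream words (text);
  std::string word, line;
  while (words >> word)
    {
      /* Start a new line if adding " word" would overflow.  */
      if (!line.empty () && line.size () + 1 + word.size () > width)
        {
          lines.push_back (line);
          line.clear ();
        }
      if (!line.empty ())
        line += ' ';
      line += word;
    }
  if (!line.empty ())
    lines.push_back (line);
  return lines;
}
```

Each returned line would then be written out with a leading tab.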

	[...snip...]

> @@ -152,22 +152,40 @@ call_string::push_call (const supergraph &sg,
>    gcc_assert (call_sedge);
>    const return_superedge *return_sedge = call_sedge->get_edge_for_return (sg);
>    gcc_assert (return_sedge);
> -  m_return_edges.safe_push (return_sedge);
> +  const std::pair<const supernode *,const supernode *> *e = new (std::pair<const supernode *,const supernode *>)

We don't want lines wider than 80 columns unless it can't be helped. 
Does your text editor have a feature to warn you about overlong lines?

Changing from:
  std::pair<const supernode *,const supernode *>
to:
  element_t
should make it much easier to avoid overlong lines.

[...snip...]

> diff --git a/gcc/analyzer/call-string.h b/gcc/analyzer/call-string.h
> index 7721571ed60..0134d185b90 100644
> --- a/gcc/analyzer/call-string.h
> +++ b/gcc/analyzer/call-string.h

[...snip...]

> +
> +  void push_call (const supernode *src, 
> +    const supernode *dest);

There's some odd indentation here.  Does your text editor have an option
to
(a) show visible whitespace (distinguish between tabs vs spaces)
(b) enforce a coding standard?

If your editor supports it, it's easy to comply with a project's coding
standards, otherwise it can be a pain.

[...snip...]

>  private:
> -  auto_vec<const return_superedge *> m_return_edges;
> +  //auto_vec<const return_superedge *> m_return_edges;
> +  auto_vec<const std::pair<const supernode *,const supernode *>*> m_supernodes;
>  };

Commenting out lines is OK during prototyping.  Obviously as the patch
gets closer to being ready we want to simply delete them instead.

[...]

> >From 95572742f1aaad1975aa35a663e8b26e671d4323 Mon Sep 17 00:00:00 2001
> From: Ankur Saini <arsenic@sourceware.org>
> Date: Sat, 10 Jul 2021 19:28:49 +0530
> Subject: [PATCH 2/2] analyzer: make callstring's pairs of supernodes
>  statically allocated [GSoC]
> 
>     2021-07-10  Ankur Saini  <arsenic@sourceware.org>
> 
>     	gcc/analyzer/
>             * call-string.cc: store a vector of std::pair of supernode* instead of pointer to them
>             * call-string.h: create a typedef for "auto_vec<const std::pair<const supernode *,const supernode *>*> m_supernodes;" to enhance readibility

...and to avoid having really long lines!

>             * program-point.cc: refactor program point to work with new call-string format

I think it's going to be much easier for me if you squash these two
patches into a single patch so I can review the combined change.  (If
you've not seen it yet, try out "git rebase -i" to see how to do this).

> ---
>  gcc/analyzer/call-string.cc   | 99 ++++++++++++++++++++---------------
>  gcc/analyzer/call-string.h    | 21 +++++---
>  gcc/analyzer/program-point.cc |  8 +--
>  3 files changed, 73 insertions(+), 55 deletions(-)
> 
[...]

> diff --git a/gcc/analyzer/call-string.h b/gcc/analyzer/call-string.h
> index 0134d185b90..450af6da21a 100644
> --- a/gcc/analyzer/call-string.h
> +++ b/gcc/analyzer/call-string.h
> @@ -28,6 +28,8 @@ class supernode;
>  class call_superedge;
>  class return_superedge;
>  
> + typedef std::pair<const supernode *,const supernode *> element_t;

Rather than a std::pair, I think a struct inside call_string like this
would be better: rather than "first" and "second" we can refer to
"m_caller" and "m_callee", which is closer to being self-documenting,
and it allows us to add member functions e.g. "get_caller_function":

class call_string
{
public:
  struct element_t
  {
    element_t (const supernode *caller, const supernode *callee)
    : m_caller (caller), m_callee (callee)
    {
    }

    function *get_caller_function () const {/*etc*/}
    function *get_callee_function () const {/*etc*/}

    const supernode *m_caller;
    const supernode *m_callee;
  };

};

[...snip...]

which might clarify the code further.
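
To make that concrete, here is a minimal self-contained sketch.  The
"function" and "supernode" types below are tiny stand-ins for the real
GCC ones (which look nothing like this), and "is_direct_recursion_p" is
a hypothetical helper added only to show why named accessors read
better than "first" and "second":

```cpp
#include <cassert>

/* Stand-ins for the real GCC types, for illustration only.  */
struct function { const char *name; };

struct supernode
{
  function *m_fun;
  function *get_function () const { return m_fun; }
};

class call_string
{
public:
  struct element_t
  {
    element_t (const supernode *caller, const supernode *callee)
    : m_caller (caller), m_callee (callee)
    {
    }

    function *get_caller_function () const
    {
      return m_caller->get_function ();
    }
    function *get_callee_function () const
    {
      return m_callee->get_function ();
    }

    const supernode *m_caller;
    const supernode *m_callee;
  };
};

/* Hypothetical helper: true if caller and callee of E are in the
   same function, i.e. the call is directly recursive.  */
inline bool
is_direct_recursion_p (const call_string::element_t &e)
{
  return e.get_caller_function () == e.get_callee_function ();
}
```

From outside the class the type is spelled "call_string::element_t",
so the short name doesn't pollute the global namespace.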

> 
> - I also added the example I was using to test the analyzer to
> regression tests as "gcc/testsuite/gcc.dg/analyzer/call-via-fnptr.c”.

Great!  Please add it to the patch.

> 
> STATUS AT THE END OF THE DAY :- 
> 
> - update the call_string to store a vector of pairs of supernode*
> instead of pointers to them ( done )
> - create a typedef to give a meaningful name to these "pairs of
> supernode*" ( done )
> - send the patch list to the gcc-patches mailing list ( done ? )
> - add the example program I created to the GCC tests ( done )
> 

Hope this is constructive
Dave



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-07-11 17:01                                   ` Ankur Saini
@ 2021-07-11 18:01                                     ` David Malcolm
  0 siblings, 0 replies; 45+ messages in thread
From: David Malcolm @ 2021-07-11 18:01 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Sun, 2021-07-11 at 22:31 +0530, Ankur Saini wrote:
> AIM for today : 
> 
> - get "state_purge_per_ssa_name::process_point ()" to go from the
> "return" supernode to the "call" supernode.
> - fix bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546 in the
> process
> - test and observe the effect of changes done so far on vfunc calls
> 
> —
> 
> PROGRESS :
> 
> - In order to go from the "return" supernode to the "call" supernode, I
> used the fact that the return supernode has a GIMPLE call statement
> which, when passed to "get_supernode_for_stmt ()", returns a pointer to
> the "call" supernode.
> 
> now that part of the function looks something like this
> 
> File: {SCR_DIR}/gcc/analyzer/state-purge.cc
> 
> /* Add any intraprocedural edge for a call.  */
> if (snode->m_returning_call)
>   {
>     cgraph_edge *cedge
>       = supergraph_call_edge (snode->m_fun,
>                               snode->m_returning_call);
>     if (!cedge)
>       {
>         /* No call graph edge: find the "call" supernode from the
>            call statement itself.  */
>         supernode *callernode
>           = map.get_sg ().get_supernode_for_stmt
>               (snode->m_returning_call);
>         gcc_assert (callernode);
>         add_to_worklist
>           (function_point::after_supernode (callernode),
>            worklist, logger);
>       }
>     else
>       {
>         gcc_assert (cedge);
>         superedge *sedge
>           = map.get_sg ().get_intraprocedural_edge_for_call (cedge);
>         gcc_assert (sedge);
>         add_to_worklist
>           (function_point::after_supernode (sedge->m_src),
>            worklist, logger);
>       }
>   }
> 
> - now the patch also fixes bug #100546 (
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546 ) and no longer
> gives a false report about dereferencing a null pointer, which can
> never happen.

Excellent.  You should add a testcase for that bug to the test suite.
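
For anyone following along, the fallback in the quoted snippet (use the
call graph edge when there is one, otherwise look the call supernode up
from the statement) can be sketched in isolation like this.  Everything
here is a toy stand-in: the empty structs, "toy_supergraph" and
"find_call_snode" are hypothetical names for this example and are not
the analyzer's real data structures:

```cpp
#include <cassert>
#include <map>

/* Toy stand-ins for the real GCC types; illustration only.  */
struct gcall {};
struct supernode {};
struct cgraph_edge {};

struct toy_supergraph
{
  /* Call statement -> supernode ending in that call.  */
  std::map<const gcall *, const supernode *> m_stmt_to_snode;
  /* Call graph edge -> source supernode of its intraprocedural edge.  */
  std::map<const cgraph_edge *, const supernode *> m_cedge_to_src;
};

/* Return the "call" supernode to resume from when walking backwards
   from a "return" supernode, or null if none is known.  CEDGE may be
   null (e.g. a call through a function pointer), in which case fall
   back to looking RETURNING_CALL up directly.  */
inline const supernode *
find_call_snode (const toy_supergraph &sg, const cgraph_edge *cedge,
                 const gcall *returning_call)
{
  if (cedge)
    {
      auto it = sg.m_cedge_to_src.find (cedge);
      return it != sg.m_cedge_to_src.end () ? it->second : nullptr;
    }
  auto it = sg.m_stmt_to_snode.find (returning_call);
  return it != sg.m_stmt_to_snode.end () ? it->second : nullptr;
}
```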

> 
> - I then tested it with vfuncs to see what happens in that case. The
> results were as expected: the analyzer detects the call to the virtual
> function and splits the call and returning supernodes, but does not
> understand which function to call, making the nodes after it unreachable.
> 
> - Now if we are somehow able to update the region model to understand
> which function is called ( or may be called ), then the analyzer can
> easily call and analyze that virtual function call.

I had some ideas about how to do this here:
  https://gcc.gnu.org/pipermail/gcc/2021-April/235335.html
which might work for simple cases where we have a code path through a
ctor of a known subclass

...but I haven't looked in detail at ipa-devirt.c yet, so I might be
wrong.

> 
> 
> STATUS AT THE END OF THE DAY :- 
> 
> - get "state_purge_per_ssa_name::process_point ()" to go from the
> "return" supernode to the "call" supernode. ( done )
> - fix bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546 in the
> process. ( done )
> - test and observe the effect of changes done so far on vfunc calls (
> done )
> 
> —
> P.S. 
> regarding the patch I sent to mailing list yesterday. I found it,
> apparently the mail was detected as a "spam mail” by my system and was
> redirected  to my spam inbox. 

Strange.  I didn't see it in my spam folder.

> Btw I am also attaching that patch file with this mail for the records.

Thanks.  I've replied to it in another email here:
  https://gcc.gnu.org/pipermail/gcc/2021-July/236726.html

Dave


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-07-11 17:49                                   ` David Malcolm
@ 2021-07-12 16:37                                     ` Ankur Saini
  2021-07-14 17:11                                       ` Ankur Saini
  2021-07-14 23:07                                       ` David Malcolm
  0 siblings, 2 replies; 45+ messages in thread
From: Ankur Saini @ 2021-07-12 16:37 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

> 
> On 11-Jul-2021, at 11:19 PM, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> On Sat, 2021-07-10 at 21:27 +0530, Ankur Saini wrote:
>> AIM for today : 
>> 
>> - update the call_string to store a vector of pairs of supernode*
>> instead of pointers to them
>> - create a typedef to give a meaningful name to these "pairs of
>> supernode*"
>> - send the patch list to the gcc-patches mailing list
>> - add the example program I created to the GCC tests
>> 
>> —
>> 
>> PROGRESS :
>> 
>> - I successfully changed the entire call string representation again:
>> it now keeps track of "auto_vec<element_t> m_elements;" instead of
>> "auto_vec<const std::pair<const supernode *,const supernode *>*>
>> m_supernodes;"
>> 
>> - while doing so, one hurdle I found was changing "hashval_t hash ()
>> const;", a function I didn't quite understand, but
>> looking at the source, it looked like it just needed some value ( integer
>> or pointer ) to add to 'hstate', and 'm_elements' was neither a pointer
>> nor an integer, so I instead added the pointer to the callee supernode (
>> "second" of the m_elements ) to 'hstate' for now.
>> 
>> - for the callstring patch, I created a patch file ( using git
>> format-patch ) and sent it to the patches mailing list ( via git
>> send-email ) and CCed my mentor.
>> Although the command log ( git send-email ) reported that the mail was
>> sent, I didn't receive it myself ( despite being
>> subscribed to the patches list ); I sent it just
>> before sending this mail.
>> The mail should be sent from arsenic@sourceware.org
> 
> Thanks.
> 
> I see the patch email in the list archives here:
>  https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574888.html
> but for some reason it's not showing up in my inbox.  I'm not sure why;
> I recently got migrated to a new email server and my filters are
> currently a mess so it could be a problem at my end; sorry if that's
> the case.
> 
> Given that neither you nor I seem to have received the patch I wonder
> if anyone else received it?

Then I think it's better to attach the patch file with the updates here instead.

> 
> Given that, I'm going to reply to that patch email inline here (by
> copying and pasting it from the archive):
> 
>> [PATCH 1/2] analyzer: refactor callstring to work with pairs of supernodes [GSoC]
>> 
>> 2021-07-3  Ankur Saini  <arsenic@sourceware.org>
> 
> There are some formatting rules that we follow with ChangeLog entries.
> We have a script:
> 
>  ./contrib/gcc-changelog/git_check_commit.py --help

Ok, I will keep in mind to double check with it from now on.

> 
> that you can run to check the formatting.
> 
>>        * gcc/analyzer/call-string.cc: refactor callstring to work with pair of supernodes instead of super superedges
>>        * gcc/analyzer/call-string.h: make callstring work with pairs of supernodes
>>        * gcc/analyzer/program-point.cc: refactor program point to work with new call-string format
> 
> The "gcc/analyzer" directory has its own ChangeLog file, and filenames
> should be expressed relative to it, so these entries should read
> something like:
> 
> gcc/analyzer/ChangeLog:
> 	* call-string.cc: ...etc
> 	* call-string.h: ...etc
> 	* program-point.cc: ...etc
> 
> The entries should be sentences (i.e. initial capital letter and
> trailing full-stop).
> 
> They should be line-wrapped (I do it at 74 columns), giving this:
> 
> gcc/analyzer/ChangeLog:
> 	* call-string.cc: Refactor callstring to work with pair of
> 	supernodes instead of superedges.
> 	* call-string.h: Make callstring work with pairs of supernodes.
> 	* program-point.cc: Refactor program point to work with new
> 	call-string format.
> 
> Your text editor might have a mode for working with ChangeLog files.

Yes, my text editor can wrap text after a certain number of columns; I will use that from now on when working with changelogs.

> 
> 	[...snip...]
> 
>> @@ -152,22 +152,40 @@ call_string::push_call (const supergraph &sg,
>>   gcc_assert (call_sedge);
>>   const return_superedge *return_sedge = call_sedge->get_edge_for_return (sg);
>>   gcc_assert (return_sedge);
>> -  m_return_edges.safe_push (return_sedge);
>> +  const std::pair<const supernode *,const supernode *> *e = new (std::pair<const supernode *,const supernode *>)
> 
> We don't want lines wider than 80 columns unless it can't be helped. 
> Does your text editor have a feature to warn you about overlong lines?
> 
> Changing from:
>  std::pair<const supernode *,const supernode *>
> to:
>  element_t
> should make it much easier to avoid overlong lines.
> 
> [...snip...]
> 
>> diff --git a/gcc/analyzer/call-string.h b/gcc/analyzer/call-string.h
>> index 7721571ed60..0134d185b90 100644
>> --- a/gcc/analyzer/call-string.h
>> +++ b/gcc/analyzer/call-string.h
> 
> [...snip...]
> 
>> +
>> +  void push_call (const supernode *src, 
>> +    const supernode *dest);
> 
> There's some odd indentation here.  Does your text editor have option
> to
> (a) show visible whitespace (distinguish between tabs vs spaces)
> (b) enforce a coding standard?
> 
> If your editor supports it, it's easy to comply with a project's coding
> standards, otherwise it can be a pain.

Oh, I see. This explains the weird indentation convention I was seeing throughout the source. My editor dynamically adjusts the tab width depending on the style used in the source file, and for some reason it decided that a tab was 2 spaces wide here, which led to some weird indentation throughout the source.
This should be fixed now: I have manually set the tab width to the standard 8 columns and switched off converting tabs to spaces in my editor settings.

> 
> [...snip...]
> 
>> private:
>> -  auto_vec<const return_superedge *> m_return_edges;
>> +  //auto_vec<const return_superedge *> m_return_edges;
>> +  auto_vec<const std::pair<const supernode *,const supernode *>*> m_supernodes;
>> };
> 
> Commenting out lines is OK during prototyping.  Obviously as the patch
> gets closer to be being ready we want to simply delete them instead.

Sorry, I might have missed this one during my reviewing phase.

> 
> [...]
> 
>>> From 95572742f1aaad1975aa35a663e8b26e671d4323 Mon Sep 17 00:00:00 2001
>> From: Ankur Saini <arsenic@sourceware.org>
>> Date: Sat, 10 Jul 2021 19:28:49 +0530
>> Subject: [PATCH 2/2] analyzer: make callstring's pairs of supernodes
>> statically allocated [GSoC]
>> 
>>    2021-07-10  Ankur Saini  <arsenic@sourceware.org>
>> 
>>    	gcc/analyzer/
>>            * call-string.cc: store a vector of std::pair of supernode* instead of pointer to them
>>            * call-string.h: create a typedef for "auto_vec<const std::pair<const supernode *,const supernode *>*> m_supernodes;" to enhance readibility
> 
> ...and to avoid having really long lines!
> 
>>            * program-point.cc: refactor program point to work with new call-string format
> 
> I think it's going to be much easier for me if you squash these two
> patches into a single patch so I can review the combined change.  (If
> you've not seen it yet, try out "git rebase -i" to see how to do this).

woah, this is magic !
I always used to perform a soft reset ( git reset --soft <commit> ) and commit again in order to squash or reword my commits, but I never knew we could change history locally like this, amazing : D

> 
>> ---
>> gcc/analyzer/call-string.cc   | 99 ++++++++++++++++++++---------------
>> gcc/analyzer/call-string.h    | 21 +++++---
>> gcc/analyzer/program-point.cc |  8 +--
>> 3 files changed, 73 insertions(+), 55 deletions(-)
>> 
> [...]
> 
>> diff --git a/gcc/analyzer/call-string.h b/gcc/analyzer/call-string.h
>> index 0134d185b90..450af6da21a 100644
>> --- a/gcc/analyzer/call-string.h
>> +++ b/gcc/analyzer/call-string.h
>> @@ -28,6 +28,8 @@ class supernode;
>> class call_superedge;
>> class return_superedge;
>> 
>> + typedef std::pair<const supernode *,const supernode *> element_t;
> 
> Rather than a std::pair, I think a struct inside call_string like this
> would be better: rather than "first" and "second" we can refer to
> "m_caller" and "m_callee", which is closer to being self-documenting,
> and it allows us to add member functions e.g. "get_caller_function":
> 
> class call_string
> {
> public:
>  struct element_t
>  {
>    element_t (const supernode *caller, const supernode *callee)
>    : m_caller (caller), m_callee (callee)
>    {
>    }
> 
>    function *get_caller_function () const {/*etc*/}
>    function *get_callee_function () const {/*etc*/}
> 
>    const supernode *m_caller;
>    const supernode *m_callee;
>  };
> 
> };
> 
> [...snip...]
> 
> which might clarify the code further.

Instead of putting that struct inside the class, I declared it globally and overloaded some basic operators ( "==" and "!=" ) so that it works without having to change much of how it is handled in other areas of the source ( program-point.cc and engine.cc ).
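
A minimal sketch of the idea ( not the patch itself ) is below. Here "supernode" is an empty stand-in, std::vector is used in place of auto_vec, and "hash_element" and "same_call_string_p" are hypothetical helpers I made up for the example; the hash mixes both pointers, which is only one possible choice:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

/* Empty stand-in for the real supernode; only addresses matter here.  */
struct supernode {};

struct element_t
{
  element_t (const supernode *caller, const supernode *callee)
  : m_caller (caller), m_callee (callee)
  {
  }

  bool operator== (const element_t &other) const
  {
    return (m_caller == other.m_caller
            && m_callee == other.m_callee);
  }
  bool operator!= (const element_t &other) const
  {
    return !(*this == other);
  }

  const supernode *m_caller;
  const supernode *m_callee;
};

/* Hypothetical hash: elements that compare equal hash equal.  */
inline std::size_t
hash_element (const element_t &e)
{
  std::hash<const supernode *> h;
  return h (e.m_caller) * 31 + h (e.m_callee);
}

/* Two call strings match if they have the same elements in the same
   order; std::vector's operator== uses element_t::operator==.  */
inline bool
same_call_string_p (const std::vector<element_t> &a,
                    const std::vector<element_t> &b)
{
  return a == b;
}
```

With the operators in place, code that used to compare the stored pointers directly can keep using "==" and "!=" on the elements.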

> 
>> 
>> - I also added the example I was using to test the analyzer to
>> regression tests as "gcc/testsuite/gcc.dg/analyzer/call-via-fnptr.c”.
> 
> Great!  Please add it to the patch.

I am thinking of adding it with later patches, as this one only focuses on changing the call_string and doesn't quite fix this bug.

> 
>> 
>> STATUS AT THE END OF THE DAY :- 
>> 
>> - update the call_string to store a vector of pairs of supernode*
>> instead of pointers to them ( done )
>> - create a typedef to give a meaningful name to these "pairs of
>> supernode*" ( done )
>> - send the patch list to the gcc-patches mailing list ( done ? )
>> - add the example program I created to the GCC tests ( done )
>> 
> 
> Hope this is constructive
> Dave

—

( new fixed patch should be attached with this mail )



Thanks 
- Ankur

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-07-12 16:37                                     ` Ankur Saini
@ 2021-07-14 17:11                                       ` Ankur Saini
  2021-07-14 23:23                                         ` David Malcolm
  2021-07-14 23:07                                       ` David Malcolm
  1 sibling, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-07-14 17:11 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

CURRENT STATUS OF PROJECT:

- The analyzer can now successfully detect and analyze function calls that
  don't have a callgraph edge ( like a call via function pointer )

- The weird indentation problem caused by my text editor, pointed out in
  one of the previous mails (https://gcc.gnu.org/pipermail/gcc/2021-July/236747.html),
  is now fixed in the editor, but it had already messed up the indentation
  in all of the changes I have done so far.

- The analyzer still cannot detect a call via a vtable pointer

---
AIM FOR TODAY: 

- Complete the first evaluation of GSoC
- Fix the indentation errors generated by my editor in the changes done till now
- Add the tests to regression testing
- Create a ChangeLog for the next patch 
- Attach the patch with this mail 
- Layout a new region subclass for vtables ( getting ready for next patch )

---
PROGRESS  :

- To fix the indentation problem, I simply created a diff and fixed all of
  the affected lines manually. I also found and read the document describing
  the coding conventions used by GCC
  (https://gcc.gnu.org/codingconventions.html) and reworked the changes and
  the changelog to follow them.

- After that I branched out and laid the foundation for the next update:
  I started creating a region subclass for vtables ( vtable_region ), which
  currently does nothing

- After that, in order to give some finishing touches to the previous
  changes, I created a changelog and added 2 more tests to the analyzer
  testsuite, as follows :

  1. (function-ptr-4.c)
  ```
  #include <stdio.h>
  #include <stdlib.h>
  
  void fun(int *int_ptr)
  {
	  free(int_ptr);  /* { dg-warning "double-‘free’ of ‘int_ptr’" } */
  }
  
  void single_call()
  {
	  int *int_ptr = (int*)malloc(sizeof(int));
	  void (*fun_ptr)(int *) = &fun;
	  (*fun_ptr)(int_ptr);
  }
  
  void double_call()
  {
	  int *int_ptr = (int*)malloc(sizeof(int));
	  void (*fun_ptr)(int *) = &fun;
	  (*fun_ptr)(int_ptr);
	  (*fun_ptr)(int_ptr);
  }
  
  /*{ dg-begin-multiline-output "" }
      6 |         free(int_ptr);
        |         ^~~~~~~~~~~~~
    ‘double_call’: events 1-2
      |
      |   16 | void double_call()
      |      |      ^~~~~~~~~~~
      |      |      |
      |      |      (1) entry to ‘double_call’
      |   17 | {
      |   18 |         int *int_ptr = (int*)malloc(sizeof(int));
      |      |                              ~~~~~~~~~~~~~~~~~~~
      |      |                              |
      |      |                              (2) allocated here
      |
      +--> ‘fun’: events 3-6
             |
             |    4 | void fun(int *int_ptr)
             |      |      ^~~
             |      |      |
             |      |      (3) entry to ‘fun’
             |      |      (5) entry to ‘fun’
             |    5 | {
             |    6 |         free(int_ptr);
             |      |         ~~~~~~~~~~~~~
             |      |         |
             |      |         (4) first ‘free’ here
             |      |         (6) second ‘free’ here; first ‘free’ was at (4)
             |
  */
  ```
  (godbolt link <https://godbolt.org/z/1o3cK4aYo>)

  2. ( pr100546.c <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546>)
  ```
  #include <stdio.h>
  #include <stdlib.h>
  
  static void noReturn(const char *str) __attribute__((noreturn));
  static void noReturn(const char *str) {
      printf("%s\n", str);
      exit(1);
  }
  
  void (*noReturnPtr)(const char *str) = &noReturn;
  
  int main(int argc, char **argv) {
      char *str = 0;
      if (!str)
          noReturnPtr(__FILE__);
      return printf("%c\n", *str);
  }
  ```
  (godbolt link <https://godbolt.org/z/aWfW51se3>)

- But at the time of testing ( command used
  was `make check-gcc RUNTESTFLAGS="-v -v analyzer.exp=pr100546.c"`), both of
  them failed unexpectedly with a segmentation fault at the call

- On further inspection, I found out that this is due to the
  "-fanalyzer-call-summaries" option, which appears to enable call summaries

- I will look into this in more detail ( with gdb ) tomorrow; right now
  my guess is that this is either due to the changes I made in state-purge.cc
  or a call-summary related problem ( I remember it not being
  perfectly implemented right now ).

---
STATUS AT THE END OF THE DAY :- 

- Complete the first evaluation of GSoC ( done )
- Fix the indentation errors generated by my editor in the changes done till now ( done )
- Layout a new region subclass for vtables ( done )
- Create a ChangeLog for the next patch ( done )
- Add the tests to regression testing ( pending )
- Attach the patch with this mail ( pending )

---
HOUR-O-METER :- 
no. of hours spent on the project today : 4 hours
Grand total (by the end of 14th July 2021): 195 hours

Thank you
- Ankur

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-07-12 16:37                                     ` Ankur Saini
  2021-07-14 17:11                                       ` Ankur Saini
@ 2021-07-14 23:07                                       ` David Malcolm
  1 sibling, 0 replies; 45+ messages in thread
From: David Malcolm @ 2021-07-14 23:07 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Mon, 2021-07-12 at 22:07 +0530, Ankur Saini wrote:
> > 
> > On 11-Jul-2021, at 11:19 PM, David Malcolm <dmalcolm@redhat.com>
> > wrote:
> > 
> > On Sat, 2021-07-10 at 21:27 +0530, Ankur Saini wrote:

[...]

> > > 
> > > - for the callstring patch, I created a patch file ( using git
> > > format-patch ) and sent it to the patches mailing list ( via git
> > > send-email ) and CCed my mentor.
> > > Although the command log ( git send-email ) reported that the mail
> > > was sent, I didn't receive it myself ( despite being
> > > subscribed to the patches list ); I sent it just
> > > before sending this mail.
> > > The mail should be sent from arsenic@sourceware.org
> > 
> > Thanks.
> > 
> > I see the patch email in the list archives here:
> >  https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574888.html
> > but for some reason it's not showing up in my inbox.  I'm not sure
> > why;
> > I recently got migrated to a new email server and my filters are
> > currently a mess so it could be a problem at my end; sorry if that's
> > the case.
> > 
> > Given that neither you nor I seem to have received the patch I wonder
> > if anyone else received it?
> 
> Then I think it’s better to attach patch file with the updates here
> instead.

FWIW I use "git send-email".  Might be worth trying that again to see
if it happens again, or if it was a one-time glitch.

[...]

> 
> 
> > 
> > If your editor supports it, it's easy to comply with a project's
> > coding
> > standards, otherwise it can be a pain.
> 
> Oh, I see. This explains the weird indentation convention I was seeing
> throughout the source. My editor dynamically adjusts the tab width
> depending on the style used in the source file, and for some
> reason it decided that a tab was 2 spaces wide here, which led to
> some weird indentation throughout the source.
> This should be fixed now: I have manually set the tab width to the
> standard 8 columns and switched off converting tabs to spaces in my
> editor settings.

Well, it is 2 spaces wide, but using tabs to take the place of 8 spaces
at a time when the indentation gets too large.
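
In other words, the leading whitespace for a given indentation depth is
"as many tabs as fit, then spaces for the remainder".  As a small
sketch ("make_indent" is just a name for this example, not anything in
the tree):

```cpp
#include <cassert>
#include <string>

/* Return the leading whitespace for an indentation of COLUMNS
   columns: one tab per full 8 columns, then spaces for the rest.  */
inline std::string
make_indent (unsigned columns)
{
  return std::string (columns / 8, '\t')
         + std::string (columns % 8, ' ');
}
```

So a line indented by 2 or 6 columns uses only spaces, whereas one
indented by 10 columns is a tab followed by 2 spaces.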

[...]
> 
> 
> > 
> > [...]
> > 
> > > > From 95572742f1aaad1975aa35a663e8b26e671d4323 Mon Sep 17 00:00:00
> > > > 2001
> > > From: Ankur Saini <arsenic@sourceware.org>
> > > Date: Sat, 10 Jul 2021 19:28:49 +0530
> > > Subject: [PATCH 2/2] analyzer: make callstring's pairs of
> > > supernodes
> > > statically allocated [GSoC]
> > > 
> > >    2021-07-10  Ankur Saini  <arsenic@sourceware.org>
> > > 
> > >         gcc/analyzer/
> > >            * call-string.cc: store a
> > > vector of std::pair of supernode* instead of pointer to them
> > >            * call-string.h: create a typedef for "auto_vec<const
> > > std::pair<const supernode *,const supernode *>*> m_supernodes;" to
> > > enhance readibility
> > 
> > ...and to avoid having really long lines!
> > 
> > >            * program-point.cc: refactor
> > > program point to work with new call-string format
> > 
> > I think it's going to be much easier for me if you squash these two
> > patches into a single patch so I can review the combined change.  (If
> > you've not seen it yet, try out "git rebase -i" to see how to do
> > this).
> 
> woah, this is magic !
> I always used to perform a soft reset ( git reset --soft <commit> ) and
> commit again in order to squash or reword my commits, but I never knew
> we could change history locally like this, amazing : D

I love "git rebase -i" and "git add -p"; together they make me look
like a much better programmer than I really am :)

[...]

> > > 
> > > + typedef std::pair<const supernode *,const supernode *> element_t;
> > 
> > Rather than a std::pair, I think a struct inside call_string like
> > this
> > would be better: rather than "first" and "second" we can refer to
> > "m_caller" and "m_callee", which is closer to being self-documenting,
> > and it allows us to add member functions e.g. "get_caller_function":
> > 
> > class call_string
> > {
> > public:
> >  struct element_t
> >  {
> >    element_t (const supernode *caller, const supernode *callee)
> >    : m_caller (caller), m_callee (callee)
> >    {
> >    }
> > 
> >    function *get_caller_function () const {/*etc*/}
> >    function *get_callee_function () const {/*etc*/}
> > 
> >    const supernode *m_caller;
> >    const supernode *m_callee;
> >  };
> > 
> > };
> > 
> > [...snip...]
> > 
> > which might clarify the code further.
> 
> Instead of putting that struct inside the class, I declared it globally
> and overloaded some basic operators ( "==" and "!=" ) so that it works
> without having to change much of how it is handled in other areas
> of the source ( program-point.cc and engine.cc ).

Fair enough, but calling it "element_t" seems too generic to me if it's
going to be in global scope.  I was suggesting to put it in call_string
so that it's a call_string::element_t when referred to from global
scope.
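
To make the scoping point concrete, here is a minimal sketch of the
nested form, with the "==" and "!=" overloads kept as members.  The
"function" and "supernode" types below are hypothetical stand-ins, not
the real analyzer classes, so treat this as an illustration rather than
the patch itself:

```cpp
#include <cassert>

// Stand-in types: the real supernode and function live in GCC and are
// far richer than this.
struct function { int id; };

struct supernode
{
  function *m_fun;
  function *get_function () const { return m_fun; }
};

class call_string
{
public:
  // Nested, so code outside the class spells it call_string::element_t.
  struct element_t
  {
    element_t (const supernode *caller, const supernode *callee)
    : m_caller (caller), m_callee (callee)
    {
    }

    // Two elements are equal when both endpoints are the same nodes.
    bool operator== (const element_t &other) const
    {
      return m_caller == other.m_caller && m_callee == other.m_callee;
    }
    bool operator!= (const element_t &other) const
    {
      return !(*this == other);
    }

    function *get_caller_function () const
    {
      return m_caller->get_function ();
    }
    function *get_callee_function () const
    {
      return m_callee->get_function ();
    }

    const supernode *m_caller;
    const supernode *m_callee;
  };
};
```

With this layout a generic name like "element_t" is harmless, because
it can only be reached from outside as call_string::element_t.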

> 
> > 
> > > 
> > > > - I also added the example I was using to test the analyzer to
> > > > regression tests as
> > > > "gcc/testsuite/gcc.dg/analyzer/call-via-fnptr.c".
> > 
> > Great!  Please add it to the patch.
> 
> I am thinking of adding it with later patches, as this one focuses
> only on changing the call_string and doesn’t quite fix this bug.

[...snip...]

The patch is looking good.  Some very minor nits:
> +function *
> +element_t::get_caller_function () const
> +{
> +  return m_caller->get_function();

Missing space before the open-parenthesis in the call to get_function.

> +}
> +
> +function *
> +element_t::get_callee_function () const
> +{
> +  return m_callee->get_function();
> +}

Likewise here.

Otherwise, looks good.  How does this affect the testsuite results?

Dave


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-07-14 17:11                                       ` Ankur Saini
@ 2021-07-14 23:23                                         ` David Malcolm
  2021-07-16 15:34                                           ` Ankur Saini
  0 siblings, 1 reply; 45+ messages in thread
From: David Malcolm @ 2021-07-14 23:23 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Wed, 2021-07-14 at 22:41 +0530, Ankur Saini wrote:
> CURRENT STATUS OF PROJECT:
> 
> - The analyzer can now successfully detect and analyze function calls
>   that don't have a callgraph edge ( like a call via function pointer )

Excellent.

> 
> - A weird indentation problem caused by my text editor, pointed out in
>   one of the previous mails
>   ( https://gcc.gnu.org/pipermail/gcc/2021-July/236747.html ), which,
>   despite the editor being fixed, is still present in all of the
>   changes I have made so far.
> 
> - The analyzer still cannot detect a call via vtable pointer
> 
> ---
> AIM FOR TODAY: 
> 
> - Complete the first evaluation of GSoC
> - Fix the indentation errors generated by my editor in the changes
>   made so far
> - Add the tests to regression testing
> - Create a ChangeLog for the next patch
> - Attach the patch with this mail
> - Lay out a new region subclass for vtables ( getting ready for next
>   patch )
> 
> ---
> PROGRESS  :
> 
> - To fix the indentation problem, I simply created a diff and fixed
>   all of the errors manually. I also found and read a doc on the coding
>   conventions used by GCC
>   (https://gcc.gnu.org/codingconventions.html) and refactored the
>   changes and changelog to follow them.

Great.

> 
> - After that I branched out, laid the foundation for the next update,
>   and started creating a region subclass for vtables ( vtable_region ),
>   which currently does nothing
> 
> - After that, to put some finishing touches on the previous changes,
>   I created a changelog and added 2 more tests to the analyzer
>   testsuite, as follows :
> 
>   1. (function-ptr-4.c)
>   ```
[...snip...]
>   ```
>   (godbolt link <https://godbolt.org/z/1o3cK4aYo>)

Looks promising.

Does this work in DejaGnu?  The directive:
  /* { dg-warning "double-‘free’ of ‘int_ptr’" } */
might need changing to:
  /* { dg-warning "double-'free' of 'int_ptr'" } */
i.e. fixing the quotes to use ASCII ' rather than ‘ and ’.

It's worth running the testcases with LANG=C when generating the
expected outputs.  IIRC this is done automatically by the various "make
check-*".


> 
>   2. ( pr100546.c <   
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546>)
>   ```
>   #include <stdio.h>
>   #include <stdlib.h>
>   
>   static void noReturn(const char *str) __attribute__((noreturn));
>   static void noReturn(const char *str) {
>       printf("%s\n", str);
>       exit(1);
>   }
>   
>   void (*noReturnPtr)(const char *str) = &noReturn;
>   
>   int main(int argc, char **argv) {
>       char *str = 0;
>       if (!str)
>           noReturnPtr(__FILE__);
>       return printf("%c\n", *str);
>   }
>   ```
>   (godbolt link <https://godbolt.org/z/aWfW51se3>)
> 
> - But at the time of testing ( command used
>   was `make check-gcc RUNTESTFLAGS="-v -v analyzer.exp=pr100546.c"` ),
>   both of them failed unexpectedly with a segmentation fault at the call
> 
> - On further inspection, I found that this is due to the
>   "-fanalyzer-call-summaries" option, which appears to activate call
>   summaries
> 
> - I will look into this in more detail ( with gdb ) tomorrow; right now
>   my guess is that this is either due to the changes I made in
>   state-purge.cc or is a call-summary related problem ( I remember it
>   not being perfectly implemented right now ).

I'm not proud of the call summary code, so that may well be the
problem.

Are you able to use gdb on the analyzer?  It ought to be fairly
painless to identify where a segfault is happening, so let me know if
you're running into any problems with that.

> 
> ---
> STATUS AT THE END OF THE DAY :- 
> 
> - Complete the first evaluation of GSoC ( done )
> - Fix the indentation errors generated by my editor in the changes
>   made so far ( done )
> - Lay out a new region subclass for vtables ( done )
> - Create a ChangeLog for the next patch ( done )
> - Add the tests to regression testing ( pending )
> - Attach the patch with this mail ( pending )
> 
> ---
> HOUR-O-METER :- 
> no. of hours spent on the project today : 4 hours
> Grand total (by the end of 14th July 2021): 195 hours

Thanks for estimating the time you're spending on the project.  I'm
wondering what the "grand total" above is covering: are you counting
the application and "community bonding" periods, or just the "coding"
period?

Do you have more of a per-week breakdown for the coding period?

The guidance from Google is that students are expected to spend
roughly 175 hours total in the coding period of a GSoC 2021 project,
so I'm a bit alarmed if you've already spent more than that time when
we're only halfway through.

Dave


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-07-14 23:23                                         ` David Malcolm
@ 2021-07-16 15:34                                           ` Ankur Saini
  2021-07-16 21:27                                             ` David Malcolm
  0 siblings, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-07-16 15:34 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc



> On 15-Jul-2021, at 4:53 AM, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> On Wed, 2021-07-14 at 22:41 +0530, Ankur Saini wrote:
>> CURRENT STATUS OF PROJECT:
>> 
>> - The analyzer can now successfully detect and analyze function calls
>>   that don't have a callgraph edge ( like a call via function pointer )
> 
> Excellent.
> 
>> 
>> - A weird indentation problem caused by my text editor, pointed out in
>>   one of the previous mails
>>   ( https://gcc.gnu.org/pipermail/gcc/2021-July/236747.html ), which,
>>   despite the editor being fixed, is still present in all of the
>>   changes I have made so far.
>> 
>> - The analyzer still cannot detect a call via vtable pointer
>> 
>> ---
>> AIM FOR TODAY: 
>> 
>> - Complete the first evaluation of GSoC
>> - Fix the indentation errors generated by my editor in the changes
>>   made so far
>> - Add the tests to regression testing
>> - Create a ChangeLog for the next patch
>> - Attach the patch with this mail
>> - Lay out a new region subclass for vtables ( getting ready for next
>>   patch )
>> 
>> ---
>> PROGRESS  :
>> 
>> - To fix the indentation problem, I simply created a diff and fixed
>>   all of the errors manually. I also found and read a doc on the coding
>>   conventions used by GCC
>>   (https://gcc.gnu.org/codingconventions.html) and refactored the
>>   changes and changelog to follow them.
> 
> Great.
> 
>> 
>> - After that I branched out, laid the foundation for the next update,
>>   and started creating a region subclass for vtables ( vtable_region ),
>>   which currently does nothing
>> 
>> - After that, to put some finishing touches on the previous changes,
>>   I created a changelog and added 2 more tests to the analyzer
>>   testsuite, as follows :
>> 
>>   1. (function-ptr-4.c)
>>   ```
> [...snip...]
>>   ```
>>   (godbolt link <https://godbolt.org/z/1o3cK4aYo>)
> 
> Looks promising.
> 
> Does this work in DejaGnu?  The directive:
>  /* { dg-warning "double-‘free’ of ‘int_ptr’" } */
> might need changing to:
>  /* { dg-warning "double-'free' of 'int_ptr'" } */
> i.e. fixing the quotes to use ASCII ' rather than ‘ and ’.
> 
> It's worth running the testcases with LANG=C when generating the
> expected outputs.  IIRC this is done automatically by the various "make
> check-*".

ok

> 
> 
>> 
>>   2. ( pr100546.c <
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546>)
>>   ```
>>   #include <stdio.h>
>>   #include <stdlib.h>
>>   
>>   static void noReturn(const char *str) __attribute__((noreturn));
>>   static void noReturn(const char *str) {
>>       printf("%s\n", str);
>>       exit(1);
>>   }
>>   
>>   void (*noReturnPtr)(const char *str) = &noReturn;
>>   
>>   int main(int argc, char **argv) {
>>       char *str = 0;
>>       if (!str)
>>           noReturnPtr(__FILE__);
>>       return printf("%c\n", *str);
>>   }
>>   ```
>>   (godbolt link <https://godbolt.org/z/aWfW51se3>)
>> 
>> - But at the time of testing ( command used
>>   was `make check-gcc RUNTESTFLAGS="-v -v analyzer.exp=pr100546.c"` ),
>>   both of them failed unexpectedly with a segmentation fault at the call
>> 
>> - On further inspection, I found that this is due to the
>>   "-fanalyzer-call-summaries" option, which appears to activate call
>>   summaries
>> 
>> - I will look into this in more detail ( with gdb ) tomorrow; right now
>>   my guess is that this is either due to the changes I made in
>>   state-purge.cc or is a call-summary related problem ( I remember it
>>   not being perfectly implemented right now ).
> 
> I'm not proud of the call summary code, so that may well be the
> problem.
> 
> Are you able to use gdb on the analyzer?  It ought to be fairly
> painless to identify where a segfault is happening, so let me know if
> you're running into any problems with that.

Yes, I used gdb on the analyzer to dig into the details, and it looks like I was correct: the program was crashing in “analysis_plan::use_summary_p ()” on line 114 ( const cgraph_node *callee = edge->callee; ), where it was trying to access a callgraph edge which didn’t exist.

I fixed it by simply making the analyzer skip call summaries in the absence of a callgraph edge.

File: {src-dir}/gcc/analyzer/analysis-plan.cc

105: bool
106: analysis_plan::use_summary_p (const cgraph_edge *edge) const
107: {
108:   /* Don't use call summaries if -fno-analyzer-call-summaries.  */
109:   if (!flag_analyzer_call_summaries)
110:     return false;
111: 
112:   /* Don't use call summaries if there is no callgraph edge */
113:   if(!edge || !edge->callee)
114:     return false;

and now the tests pass successfully ( both manually and via DejaGnu ).
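
The guard can also be tried outside of GCC. The following is only a rough sketch with hypothetical stand-in types ( the real cgraph_edge, cgraph_node and option flag are much richer ), and it checks just the edge pointer before reading the callee:

```cpp
#include <cassert>

// Hypothetical stand-ins for GCC's cgraph types and option flag.
struct cgraph_node { int uid; };
struct cgraph_edge { cgraph_node *callee; };

static bool flag_analyzer_call_summaries = true;

// Decide whether a call summary may be used for EDGE.
static bool
use_summary_p (const cgraph_edge *edge)
{
  if (!flag_analyzer_call_summaries)
    return false;

  // A call through a function pointer has no callgraph edge, so bail
  // out before anything dereferences EDGE.
  if (!edge)
    return false;

  const cgraph_node *callee = edge->callee;
  // Whatever checks the real function applies to CALLEE are omitted;
  // this sketch only requires that one is present.
  return callee != nullptr;
}
```

Passing a null edge now simply answers "no summary" instead of crashing.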

I have attached a sample patch of the work done so far with this mail for review ( I haven’t sent this one to the patches list as its change log is not complete yet ).

P.S. I have also sent another mail ( https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575396.html ) to the patches list with the previous call-string patch, and this time it popped up in my inbox as it should. Did you also receive it this time ?



Thanks 
- Ankur

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-07-16 15:34                                           ` Ankur Saini
@ 2021-07-16 21:27                                             ` David Malcolm
  2021-07-21 16:14                                               ` Ankur Saini
  0 siblings, 1 reply; 45+ messages in thread
From: David Malcolm @ 2021-07-16 21:27 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Fri, 2021-07-16 at 21:04 +0530, Ankur Saini wrote:
> 
> 
> > On 15-Jul-2021, at 4:53 AM, David Malcolm <dmalcolm@redhat.com>
> > wrote:
> > 
> > On Wed, 2021-07-14 at 22:41 +0530, Ankur Saini wrote:
> > 
> > 

[...snip...]

> > 
> > > 
> > >   2. ( pr100546.c <
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546>)
> > >   ```
> > >   #include <stdio.h>
> > >   #include <stdlib.h>
> > >   
> > >   static void noReturn(const char *str) __attribute__((noreturn));
> > >   static void noReturn(const char *str) {
> > >       printf("%s\n", str);
> > >       exit(1);
> > >   }
> > >   
> > >   void (*noReturnPtr)(const char *str) = &noReturn;
> > >   
> > >   int main(int argc, char **argv) {
> > >       char *str = 0;
> > >       if (!str)
> > >           noReturnPtr(__FILE__);
> > >       return printf("%c\n", *str);
> > >   }
> > >   ```
> > >   (godbolt link <https://godbolt.org/z/aWfW51se3>)
> > > 
> > > - But at the time of testing ( command used
> > >   was `make check-gcc RUNTESTFLAGS="-v -v
> > >   analyzer.exp=pr100546.c"` ), both of them failed unexpectedly
> > >   with a segmentation fault at the call
> > > 
> > > - On further inspection, I found that this is due to the
> > >   "-fanalyzer-call-summaries" option, which appears to activate
> > >   call summaries
> > > 
> > > - I will look into this in more detail ( with gdb ) tomorrow;
> > >   right now my guess is that this is either due to the changes I
> > >   made in state-purge.cc or is a call-summary related problem
> > >   ( I remember it not being perfectly implemented right now ).
> > 
> > I'm not proud of the call summary code, so that may well be the
> > problem.
> > 
> > Are you able to use gdb on the analyzer?  It ought to be fairly
> > painless to identify where a segfault is happening, so let me know if
> > you're running into any problems with that.
> 
> Yes, I used gdb on the analyzer to dig into the details, and it looks
> like I was correct: the program was crashing in
> “analysis_plan::use_summary_p ()” on line 114
> ( const cgraph_node *callee = edge->callee; ), where it was trying to
> access a callgraph edge which didn’t exist.
> 
> I fixed it by simply making the analyzer skip call summaries in the
> absence of a callgraph edge.
> 
> File: {src-dir}/gcc/analyzer/analysis-plan.cc
> 
> 105: bool
> 106: analysis_plan::use_summary_p (const cgraph_edge *edge) const
> 107: {
> 108:   /* Don't use call summaries if -fno-analyzer-call-summaries.  */
> 109:   if (!flag_analyzer_call_summaries)
> 110:     return false;
> 111: 
> 112:   /* Don't use call summaries if there is no callgraph edge */
> 113:   if(!edge || !edge->callee)
> 114:     return false;
> 
> and now the tests pass successfully ( both manually and via
> DejaGnu ).

Great.

> 
> I have attached a sample patch of the work done so far with this mail
> for review ( I haven’t sent this one to the patches list as its change
> log is not complete yet ).
> 
> P.S. I have also sent another mail (
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575396.html ) to
> the patches list with the previous call-string patch, and this time it
> popped up in my inbox as it should. Did you also receive it this time ?

I can see it in the archive URL, but for some reason it's not showing
up in my inbox.  Bother.  Please can you try resending it directly to
me?

Putting email issues to one side, the patch you linked to above looks
good.  To what extent has it been tested?  If it bootstraps and passes
the test suite, it's ready for trunk.

Note that over the last couple of days I pushed my "use of
uninitialized values" detection work to trunk (aka master), along with
various other changes, so it's worth pulling master and rebasing on top
of that before testing.  I *think* we've been touching different parts
of the analyzer code, but there's a chance you might have to resolve
some merge conflicts.

As for the patch you attached to this email
  "[PATCH] analyzer: make analyer detect calls via function pointers"
here's an initial review:

> diff --git a/gcc/analyzer/analysis-plan.cc b/gcc/analyzer/analysis-plan.cc
> index 7dfc48e9c3e..1c7e4d2cc84 100644
> --- a/gcc/analyzer/analysis-plan.cc
> +++ b/gcc/analyzer/analysis-plan.cc
> @@ -109,6 +109,10 @@ analysis_plan::use_summary_p (const cgraph_edge *edge) const
>    if (!flag_analyzer_call_summaries)
>      return false;
>  
> +  /* Don't use call summaries if there is no callgraph edge */
> +  if(!edge || !edge->callee)
> +    return false;

Is it possible for a cgraph_edge to have a NULL callee?  (I don't think
so, but I could be wrong)

Nit: missing space between "if" and open-paren

> diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
> index 7662a7f7bab..f45a614c0ab 100644
> --- a/gcc/analyzer/engine.cc
> +++ b/gcc/analyzer/engine.cc
> @@ -3170,10 +3170,13 @@ exploded_graph::process_node (exploded_node *node)
>        break;
>      case PK_AFTER_SUPERNODE:
>        {
> +        bool found_a_superedge = false;
> +        bool is_an_exit_block = false;
>  	/* If this is an EXIT BB, detect leaks, and potentially
>  	   create a function summary.  */
>  	if (point.get_supernode ()->return_p ())
>  	  {
> +	    is_an_exit_block = true;
>  	    node->detect_leaks (*this);
>  	    if (flag_analyzer_call_summaries
>  		&& point.get_call_string ().empty_p ())
> @@ -3201,6 +3204,7 @@ exploded_graph::process_node (exploded_node *node)
>  	superedge *succ;
>  	FOR_EACH_VEC_ELT (point.get_supernode ()->m_succs, i, succ)
>  	  {
> +	    found_a_superedge = true;
>  	    if (logger)
>  	      logger->log ("considering SN: %i -> SN: %i",
>  			   succ->m_src->m_index, succ->m_dest->m_index);
> @@ -3210,19 +3214,100 @@ exploded_graph::process_node (exploded_node *node)
>  						 point.get_call_string ());
>  	    program_state next_state (state);
>  	    uncertainty_t uncertainty;
> +
> +	    /* Check if now the analyzer know about the call via 
> +               function pointer or not. */
> +            if (succ->m_kind == SUPEREDGE_INTRAPROCEDURAL_CALL && 
> +                !(succ->get_any_callgraph_edge()))

Some formatting nits:
- on a multiline conditional, our convention is to put the "&&" at the
start of a line, rather than the end
- missing space between function name and open-paren in call.

Hence this should be formatted as:

            if (succ->m_kind == SUPEREDGE_INTRAPROCEDURAL_CALL
	        && !(succ->get_any_callgraph_edge ()))


> +              {    
> +                const program_point *this_point = &node->get_point();
> +                const program_state *this_state = &node->get_state ();
> +                const gcall *call 
> +                  = this_point->get_supernode ()->get_final_call ();
> +
> +                impl_region_model_context ctxt (*this,
> +                  node, 
> +                  this_state, 
> +                  &next_state, 
> +                  &uncertainty,
> +                  this_point->get_stmt());
> +
> +                region_model *model = this_state->m_region_model;
> +                tree fn_decl = model->get_fndecl_for_call(call,&ctxt);
> +                function *fun = DECL_STRUCT_FUNCTION(fn_decl);
> +                if(fun)
> +                {
> +                  const supergraph *sg = &(this->get_supergraph());

Might as well turn this into

> +                  const supergraph &sg = get_supergraph ();

and update various "sg->" into "sg." below.

> +                  supernode * sn_entry = sg->get_node_for_function_entry (fun);
> +                  supernode * sn_exit = sg->get_node_for_function_exit (fun);
> +
> +                  program_point new_point 
> +                    = program_point::before_supernode (sn_entry,
> +            				               NULL,
> +            				               point.get_call_string ());
> +
> +                  new_point.push_to_call_stack (sn_exit,
> +                  				next_point.get_supernode());
> +
> +                  next_state.push_call(*this, node, call, &uncertainty);

Some logging here could be helpful.


[...]

> +
> +    /* Return from the calls which doesn't have a return superedge.
> +    	Such case occurs when GCC's middle end didn't knew which function to
> +    	call but analyzer did */
> +    if((is_an_exit_block && !found_a_superedge) && 
> +       (!point.get_call_string().empty_p()))

Similar formatting nits as above; this should look something like:

    if ((is_an_exit_block && !found_a_superedge)
         && !point.get_call_string ().empty_p ())


[...snip...]

>  
> +/* do exatly what region_model::update_for_return_superedge() do
> +   but get the call info from CALL_STMT instead from a suerpedge and 
> +   is availabe publicically   */
> +void
> +region_model::update_for_return_gcall (const gcall *call_stmt,
> +             			       region_model_context *ctxt)
> +{
> +  /* Get the region for the result of the call, within the caller frame.  */
> +  const region *result_dst_reg = NULL;
> +  tree lhs = gimple_call_lhs (call_stmt);
> +  if (lhs)
> +    {
> +      /* Normally we access the top-level frame, which is:
> +         path_var (expr, get_stack_depth () - 1)
> +         whereas here we need the caller frame, hence "- 2" here.  */
> +      gcc_assert (get_stack_depth () >= 2);
> +      result_dst_reg = get_lvalue (path_var (lhs, get_stack_depth () - 2),
> +           			   ctxt);
> +    }
> +
> +  pop_frame (result_dst_reg, NULL, ctxt);
> +}

I *think* this can be consolidated with update_for_return_superedge. 
Maybe instead, just rename that to update_for_return, and update the
callsite of update_for_return_superedge to get the call_stmt there.

That way we avoid copy&paste repetition of code.

I think something similar can be done for
region_model::update_for_gcall and update_for_call_superedge.
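
As a rough illustration of the shape I have in mind (the types and the
frame handling below are hypothetical simplifications, not the real
region_model), one function keyed on the call statement can serve both
kinds of caller:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: a call statement that may assign its result,
// and an edge that remembers which call statement it came from.
struct gcall { bool has_lhs; };
struct superedge { const gcall *call_stmt; };

class region_model
{
public:
  void push_frame (int value) { m_frames.push_back (value); }
  std::size_t get_stack_depth () const { return m_frames.size (); }
  int top () const { return m_frames.back (); }

  // The one implementation: it needs only the call statement.
  void update_for_return (const gcall *call_stmt)
  {
    assert (get_stack_depth () >= 2);
    const int result = m_frames.back ();
    m_frames.pop_back ();
    // Store the callee's result in the caller frame if the call has
    // an lhs.
    if (call_stmt->has_lhs)
      m_frames.back () = result;
  }

private:
  std::vector<int> m_frames;
};

// A call site that holds a superedge extracts the statement and reuses
// the same code path, so nothing is duplicated.
static void
return_via_superedge (region_model &model, const superedge &edge)
{
  model.update_for_return (edge.call_stmt);
}
```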

[...snip...]

> diff --git a/gcc/analyzer/supergraph.cc b/gcc/analyzer/supergraph.cc
> index 8611d0f8689..ace9e7b128a 100644
> --- a/gcc/analyzer/supergraph.cc
> +++ b/gcc/analyzer/supergraph.cc
> @@ -183,11 +183,34 @@ supergraph::supergraph (logger *logger)
>  	      m_stmt_to_node_t.put (stmt, node_for_stmts);
>  	      m_stmt_uids.make_uid_unique (stmt);
>  	      if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
> -		{
> -		  m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
> -		  node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), NULL);
> -		  m_cgraph_edge_to_caller_next_node.put (edge, node_for_stmts);
> -		}
> +    		{
> +    		  m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
> +    		  node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt),
> +    		   			     NULL);
> +    		  m_cgraph_edge_to_caller_next_node.put (edge, node_for_stmts);
> +    		}
> +	        else
> +	        {
> +	          // maybe call is via a function pointer
> +	          gcall *call = dyn_cast<gcall *> (stmt);
> +	          if (call)

You can combine the if and the decl/dyn_cast into one line:

	          if (gcall *call = dyn_cast<gcall *> (stmt))

which saves a little vertical space and (slightly) limits the scope of
"call".
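
The same idiom works in plain C++ with dynamic_cast.  A small sketch,
using a made-up statement hierarchy rather than the gimple classes
(which use their own dyn_cast<> machinery instead of RTTI):

```cpp
#include <cassert>

// Hypothetical statement hierarchy for illustration only.
struct stmt { virtual ~stmt () {} };
struct call_stmt : stmt { int num_args = 0; };
struct assign_stmt : stmt {};

// Return the argument count if S is a call, or -1 otherwise.
static int
count_call_args (stmt *s)
{
  // CALL is only in scope inside the if statement.
  if (call_stmt *call = dynamic_cast<call_stmt *> (s))
    return call->num_args;
  return -1;
}
```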

> +	          {
> +	            cgraph_edge *edge 
> +		      = cgraph_node::get (fun->decl)->get_edge (stmt);
> +	            if (!edge || !edge->callee)
> +	            {
> +	              supernode *old_node_for_stmts = node_for_stmts;
> +	              node_for_stmts = add_node (fun, bb, call, NULL);
> +
> +	              superedge *sedge 
> +	                = new callgraph_superedge (old_node_for_stmts,
> +	                  			   node_for_stmts,
> +	                  			   SUPEREDGE_INTRAPROCEDURAL_CALL,
> +	                  			   NULL);
> +	              add_edge (sedge);
> +	            }
> +	          }
> +	        }
>  	    }

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c b/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
> new file mode 100644
> index 00000000000..1054ab4f240
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
> @@ -0,0 +1,53 @@
> +#include <stdio.h>
> +#include <stdlib.h>
> +
> +void fun(int *int_ptr)
> +{
> +	free(int_ptr);  /* { dg-warning "double-'free' of 'int_ptr'" } */
> +}
> +
> +void single_call()
> +{
> +	int *int_ptr = (int*)malloc(sizeof(int));
> +	void (*fun_ptr)(int *) = &fun;
> +	(*fun_ptr)(int_ptr);
> +}
> +
> +void double_call()
> +{
> +	int *int_ptr = (int*)malloc(sizeof(int));
> +	void (*fun_ptr)(int *) = &fun;
> +	(*fun_ptr)(int_ptr);
> +	(*fun_ptr)(int_ptr);
> +}
> +
> +/*{ dg-begin-multiline-output "" }

Normally the test suite injects "-fdiagnostics-plain-output" into the
options, which turns off the source code printing and ASCII art for
interprocedural paths.  If you want to test them via dg-begin-
multiline-output you'll need to have a look at the directives at the
top of one of the existing tests that uses dg-begin-multiline-output.

> +    6 |         free(int_ptr);
> +      |         ^~~~~~~~~~~~~
> +  'double_call': events 1-2
> +    |
> +    |   16 | void double_call()
> +    |      |      ^~~~~~~~~~~
> +    |      |      |
> +    |      |      (1) entry to 'double_call'
> +    |   17 | {
> +    |   18 |         int *int_ptr = (int*)malloc(sizeof(int));
> +    |      |                              ~~~~~~~~~~~~~~~~~~~
> +    |      |                              |
> +    |      |                              (2) allocated here
> +    |
> +    +--> 'fun': events 3-6
> +           |
> +           |    4 | void fun(int *int_ptr)
> +           |      |      ^~~
> +           |      |      |
> +           |      |      (3) entry to ‘fun’
> +           |      |      (5) entry to ‘fun’
> +           |    5 | {
> +           |    6 |         free(int_ptr);
> +           |      |         ~~~~~~~~~~~~~
> +           |      |         |
> +           |      |         (4) first 'free' here
> +           |      |         (6) second 'free' here; first 'free' was at (4)
> +           |
> +*/

and normally there would be a terminating:

  { dg-end-multiline-output "" } */

In any case, the output above is missing some events: I think ideally
it would show the calls from *fun_ptr to fun and the returns, giving
something like the following (which I mocked up by hand):

  'double_call': events 1-3
    |
    |   16 | void double_call()
    |      |      ^~~~~~~~~~~
    |      |      |
    |      |      (1) entry to 'double_call'
    |   17 | {
    |   18 |         int *int_ptr = (int*)malloc(sizeof(int));
    |      |                              ~~~~~~~~~~~~~~~~~~~
    |      |                              |
    |      |                              (2) allocated here
    | ...  |
    |   19 |         (*fun_ptr)(int_ptr);
    |      |         ^~~~~~~~~~~~~~~~~~~
    |      |         |
    |      |         (3) calling 'fun' from 'double_call'
    |
    +--> 'fun': events 4-5
           |
           |    4 | void fun(int *int_ptr)
           |      |      ^~~
           |      |      |
           |      |      (4) entry to 'fun'
           |    5 | {
           |    6 |         free(int_ptr);
           |      |         ~~~~~~~~~~~~~
           |      |         |
           |      |         (5) first 'free' here
           |
    <------+
    |
  'double_call': events 6-7
    |
    |   19 |         (*fun_ptr)(int_ptr);
    |      |         ^~~~~~~~~~~~~~~~~~~
    |      |         |
    |      |         (6) returning to 'double_call' from 'fun'
    |   20 |         (*fun_ptr)(int_ptr);
    |      |         ^~~~~~~~~~~~~~~~~~~
    |      |         |
    |      |         (7) calling 'fun' from 'double_call'
    |
    +--> 'fun': events 8-9
           |
           |    4 | void fun(int *int_ptr)
           |      |      ^~~
           |      |      |
           |      |      (8) entry to 'fun'
           |    5 | {
           |    6 |         free(int_ptr);
           |      |         ~~~~~~~~~~~~~
           |      |         |
           |      |         (9) second 'free' here; first 'free' was at (5)

The events are created in diagnostic-manager.cc

Am I right in thinking that there's an interprocedural superedge for the
dynamically-discovered calls?

diagnostic_manager::add_events_for_superedge creates events for calls
and returns, so maybe you just need to update the case
SUPEREDGE_INTRAPROCEDURAL_CALL there, to do something for the
"dynamically discovered edge" cases (compare it with the other cases in
that function).   You might need to update the existing call_event and
return_event subclasses slightly (see checker-path.cc/h)

Ideally, event (7) above would read
  "passing freed pointer 'int_ptr' in call to 'fun' from 'double_call'"
but that's advanced material :)

[...snip...]

Hope this makes sense and is constructive
Dave



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-07-16 21:27                                             ` David Malcolm
@ 2021-07-21 16:14                                               ` Ankur Saini
  2021-07-22 17:10                                                 ` Ankur Saini
  2021-07-22 23:07                                                 ` David Malcolm
  0 siblings, 2 replies; 45+ messages in thread
From: Ankur Saini @ 2021-07-21 16:14 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc



> On 17-Jul-2021, at 2:57 AM, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> On Fri, 2021-07-16 at 21:04 +0530, Ankur Saini wrote:
>> 
>> 
>>> On 15-Jul-2021, at 4:53 AM, David Malcolm <dmalcolm@redhat.com>
>>> wrote:
>>> 
>>> On Wed, 2021-07-14 at 22:41 +0530, Ankur Saini wrote:
>>> 
>>> 
> 
> [...snip...]
> 
>>> 
>>>> 
>>>>   2. ( pr100546.c <
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100546>)
>>>>   ```
>>>>   #include <stdio.h>
>>>>   #include <stdlib.h>
>>>>   
>>>>   static void noReturn(const char *str) __attribute__((noreturn));
>>>>   static void noReturn(const char *str) {
>>>>       printf("%s\n", str);
>>>>       exit(1);
>>>>   }
>>>>   
>>>>   void (*noReturnPtr)(const char *str) = &noReturn;
>>>>   
>>>>   int main(int argc, char **argv) {
>>>>       char *str = 0;
>>>>       if (!str)
>>>>           noReturnPtr(__FILE__);
>>>>       return printf("%c\n", *str);
>>>>   }
>>>>   ```
>>>>   (godbolt link <https://godbolt.org/z/aWfW51se3>)
>>>> 
>>>> - But at the time of testing ( command used
>>>>   was `make check-gcc RUNTESTFLAGS="-v -v
>>>>   analyzer.exp=pr100546.c"` ), both of them failed unexpectedly
>>>>   with a segmentation fault at the call
>>>> 
>>>> - On further inspection, I found that this is due to the
>>>>   "-fanalyzer-call-summaries" option, which appears to activate
>>>>   call summaries
>>>> 
>>>> - I will look into this in more detail ( with gdb ) tomorrow;
>>>>   right now my guess is that this is either due to the changes I
>>>>   made in state-purge.cc or is a call-summary related problem
>>>>   ( I remember it not being perfectly implemented right now ).
>>> 
>>> I'm not proud of the call summary code, so that may well be the
>>> problem.
>>> 
>>> Are you able to use gdb on the analyzer?  It ought to be fairly
>>> painless to identify where a segfault is happening, so let me know if
>>> you're running into any problems with that.
>> 
>> Yes, I used gdb on the analyzer to go into details, and it looks like I was
>> correct: the program was crashing in “analysis_plan::use_summary_p ()”
>> on line 114 ( const cgraph_node *callee = edge->callee; ), where it
>> was trying to access a callgraph edge which didn’t exist.
>> 
>> I fixed it by simply making the analyzer skip call summaries in the
>> absence of a callgraph edge.
>> 
>> File: {src-dir}/gcc/analyzer/analysis-plan.cc
>> 
>> 105: bool
>> 106: analysis_plan::use_summary_p (const cgraph_edge *edge) const
>> 107: {
>> 108:   /* Don't use call summaries if -fno-analyzer-call-summaries.  */
>> 109:   if (!flag_analyzer_call_summaries)
>> 110:     return false;
>> 111: 
>> 112:   /* Don't use call summaries if there is no callgraph edge */
>> 113:   if(!edge || !edge->callee)
>> 114:     return false;
>> 
>> and now the tests are passing successfully. ( both manually and via
>> DejaGnu ).
> 
> Great.
> 
>> 
>> I have attached a sample patch of the work done so far with this mail for
>> review ( I haven’t sent this one to the patches list as its change log
>> is not complete yet ).
>> 
>> P.S. I have also sent another mail
>> ( <https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575396.html> )
>> to the patches list with the previous call-string patch, and this time it
>> popped up in my inbox as it should. Did you also receive it now?
> 
> I can see it in the archive URL, but for some reason it's not showing
> up in my inbox.  Bother.  Please can you try resending it directly to
> me?

Ok, I have sent the call-string patch directly to you. I actually sent 2 mails ( from different mail ids ) to check whether it’s the id or the contents of the mail that is causing the issue.

> 
> Putting email issues to one side, the patch you linked to above looks
> good.  To what extent has it been tested?  If it bootstraps and passes
> the test suite, it's ready for trunk.

It bootstrapped successfully on a couple of the x86_64 machines ( on the GCC compile farm ), and regression testing is underway.

> 
> Note that over the last couple of days I pushed my "use of
> uninitialized values" detection work to trunk (aka master), along with
> various other changes, so it's worth pulling master and rebasing on top
> of that before testing.  I *think* we've been touching different parts
> of the analyzer code, but there's a chance you might have to resolve
> some merge conflicts.

I have been updating my branch whenever I see an analyzer commit ( on the patches list ) to avoid any conflicts.
So far I haven’t encountered any.

> 
> As for the patch you attached to this email
>  "[PATCH] analyzer: make analyer detect calls via function pointers"
> here's an initial review:

Thanks for the review.

> 
>> diff --git a/gcc/analyzer/analysis-plan.cc b/gcc/analyzer/analysis-plan.cc
>> index 7dfc48e9c3e..1c7e4d2cc84 100644
>> --- a/gcc/analyzer/analysis-plan.cc
>> +++ b/gcc/analyzer/analysis-plan.cc
>> @@ -109,6 +109,10 @@ analysis_plan::use_summary_p (const cgraph_edge *edge) const
>>   if (!flag_analyzer_call_summaries)
>>     return false;
>> 
>> +  /* Don't use call summaries if there is no callgraph edge */
>> +  if(!edge || !edge->callee)
>> +    return false;
> 
> Is it possible for a cgraph_edge to have a NULL callee?  (I don't think
> so, but I could be wrong)

Maybe not. I added the check because supergraph_call_edge () contains a similar test, with a comment noting that a NULL callee can occur when the call happens via a function pointer.

File: {source_dir}/gcc/analyzer/supergraph.cc

/* Get the cgraph_edge, but only if there's an underlying function body.  */

cgraph_edge *
supergraph_call_edge (function *fun, gimple *stmt)
{
  gcall *call = dyn_cast<gcall *> (stmt);
  if (!call)
    return NULL;
  cgraph_edge *edge = cgraph_node::get (fun->decl)->get_edge (stmt);
  if (!edge)
    return NULL;
  if (!edge->callee)
    return NULL; /* e.g. for a function pointer.  */
  if (!get_ultimate_function_for_cgraph_edge (edge))
    return NULL;
  return edge;
}
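
To show the effect of the guard in isolation, here is a minimal standalone sketch. The types cgraph_node_stub and cgraph_edge_stub and the function use_summary_p_sketch are made-up stand-ins for illustration only; the real cgraph_edge and analysis_plan::use_summary_p carry much more state and logic.

```cpp
// Hypothetical stand-ins for GCC's cgraph_node / cgraph_edge; these are
// NOT the real classes, just enough structure to show the guard.
struct cgraph_node_stub
{
};

struct cgraph_edge_stub
{
  const cgraph_node_stub *callee;  // may be null in this sketch
};

// Same shape as the guarded check: bail out before touching
// edge->callee when there is no edge (or no callee) to look at.
bool
use_summary_p_sketch (const cgraph_edge_stub *edge, bool summaries_enabled)
{
  // Corresponds to the -fno-analyzer-call-summaries early exit.
  if (!summaries_enabled)
    return false;

  // A call through a function pointer has no callgraph edge, so EDGE
  // can be null; reading edge->callee unguarded is what crashed.
  if (!edge || !edge->callee)
    return false;

  // The real function applies further heuristics at this point.
  return true;
}
```

Passing a null edge (the function-pointer case) now simply reports that no summary should be used, instead of dereferencing the pointer.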

> 
> Nit: missing space between "if" and open-paren

Fixed

> 
>> diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
>> index 7662a7f7bab..f45a614c0ab 100644
>> --- a/gcc/analyzer/engine.cc
>> +++ b/gcc/analyzer/engine.cc
>> @@ -3170,10 +3170,13 @@ exploded_graph::process_node (exploded_node *node)
>>       break;
>>     case PK_AFTER_SUPERNODE:
>>       {
>> +        bool found_a_superedge = false;
>> +        bool is_an_exit_block = false;
>> 	/* If this is an EXIT BB, detect leaks, and potentially
>> 	   create a function summary.  */
>> 	if (point.get_supernode ()->return_p ())
>> 	  {
>> +	    is_an_exit_block = true;
>> 	    node->detect_leaks (*this);
>> 	    if (flag_analyzer_call_summaries
>> 		&& point.get_call_string ().empty_p ())
>> @@ -3201,6 +3204,7 @@ exploded_graph::process_node (exploded_node *node)
>> 	superedge *succ;
>> 	FOR_EACH_VEC_ELT (point.get_supernode ()->m_succs, i, succ)
>> 	  {
>> +	    found_a_superedge = true;
>> 	    if (logger)
>> 	      logger->log ("considering SN: %i -> SN: %i",
>> 			   succ->m_src->m_index, succ->m_dest->m_index);
>> @@ -3210,19 +3214,100 @@ exploded_graph::process_node (exploded_node *node)
>> 						 point.get_call_string ());
>> 	    program_state next_state (state);
>> 	    uncertainty_t uncertainty;
>> +
>> +	    /* Check if now the analyzer know about the call via 
>> +               function pointer or not. */
>> +            if (succ->m_kind == SUPEREDGE_INTRAPROCEDURAL_CALL && 
>> +                !(succ->get_any_callgraph_edge()))
> 
> Some formatting nits:
> - on a multiline conditional, our convention is to put the "&&" at the
> start of a line, rather than the end
> - missing space between function name and open-paren in call.
> 
> Hence this should be formatted as:
> 
>            if (succ->m_kind == SUPEREDGE_INTRAPROCEDURAL_CALL
> 	        && !(succ->get_any_callgraph_edge ()))

Fixed

> 
> 
>> +              {    
>> +                const program_point *this_point = &node->get_point();
>> +                const program_state *this_state = &node->get_state ();
>> +                const gcall *call 
>> +                  = this_point->get_supernode ()->get_final_call ();
>> +
>> +                impl_region_model_context ctxt (*this,
>> +                  node, 
>> +                  this_state, 
>> +                  &next_state, 
>> +                  &uncertainty,
>> +                  this_point->get_stmt());
>> +
>> +                region_model *model = this_state->m_region_model;
>> +                tree fn_decl = model->get_fndecl_for_call(call,&ctxt);
>> +                function *fun = DECL_STRUCT_FUNCTION(fn_decl);
>> +                if(fun)
>> +                {
>> +                  const supergraph *sg = &(this->get_supergraph());
> 
> Might as well turn this into
> 
>> +                  const supergraph &sg = get_supergraph ();
> 
> and update various "sg->" into "sg." below.

Done

> 
>> +                  supernode * sn_entry = sg->get_node_for_function_entry (fun);
>> +                  supernode * sn_exit = sg->get_node_for_function_exit (fun);
>> +
>> +                  program_point new_point 
>> +                    = program_point::before_supernode (sn_entry,
>> +            				               NULL,
>> +            				               point.get_call_string ());
>> +
>> +                  new_point.push_to_call_stack (sn_exit,
>> +                  				next_point.get_supernode());
>> +
>> +                  next_state.push_call(*this, node, call, &uncertainty);
> 
> Some logging here could be helpful.

Will do

> 
> 
> [...]
> 
>> +
>> +    /* Return from the calls which doesn't have a return superedge.
>> +    	Such case occurs when GCC's middle end didn't knew which function to
>> +    	call but analyzer did */
>> +    if((is_an_exit_block && !found_a_superedge) && 
>> +       (!point.get_call_string().empty_p()))
> 
> Similar formatting nits as above; this should look something like:
> 
>    if ((is_an_exit_block && !found_a_superedge)
>         && !point.get_call_string ().empty_p ())

Fixed

> 
> 
> [...snip...]
> 
>> 
>> +/* do exatly what region_model::update_for_return_superedge() do
>> +   but get the call info from CALL_STMT instead from a suerpedge and 
>> +   is availabe publicically   */
>> +void
>> +region_model::update_for_return_gcall (const gcall *call_stmt,
>> +             			       region_model_context *ctxt)
>> +{
>> +  /* Get the region for the result of the call, within the caller frame.  */
>> +  const region *result_dst_reg = NULL;
>> +  tree lhs = gimple_call_lhs (call_stmt);
>> +  if (lhs)
>> +    {
>> +      /* Normally we access the top-level frame, which is:
>> +         path_var (expr, get_stack_depth () - 1)
>> +         whereas here we need the caller frame, hence "- 2" here.  */
>> +      gcc_assert (get_stack_depth () >= 2);
>> +      result_dst_reg = get_lvalue (path_var (lhs, get_stack_depth () - 2),
>> +           			   ctxt);
>> +    }
>> +
>> +  pop_frame (result_dst_reg, NULL, ctxt);
>> +}
> 
> I *think* this can be consolidated with update_for_return_superedge. 
> Maybe instead, just rename that to update_for_return, and update the
> callsite of update_for_return_superedge to get the call_stmt there.
> 
> That way we avoid copy&paste repetition of code.
> 
> I think something similar can be done for
> region_model::update_for_gcall and update_for_call_superedge.

Done
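
For anyone following along, the consolidated shape is roughly as below. This is only a sketch with made-up stand-in types (gcall_sketch, frame_sketch, region_model_sketch, handle_return_superedge are all hypothetical names), not the code from the patch: one update_for_return keyed on the call statement, and the superedge call site just extracts the statement and delegates.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins, not the analyzer's real types.
struct gcall_sketch
{
  int lhs_slot;  // caller-frame slot receiving the result, or -1 if none
};

struct return_superedge_sketch
{
  const gcall_sketch *call_stmt;
};

struct frame_sketch
{
  std::vector<int> slots;
  int return_value;
};

struct region_model_sketch
{
  std::vector<frame_sketch> frames;

  // The one shared implementation, keyed only on the call statement.
  void update_for_return (const gcall_sketch &call)
  {
    // Both the callee frame (to pop) and the caller frame (to receive
    // the result) must exist; cf. the "get_stack_depth () >= 2" assert.
    assert (frames.size () >= 2);
    const int result = frames.back ().return_value;
    frames.pop_back ();
    if (call.lhs_slot >= 0)
      frames.back ().slots.at (call.lhs_slot) = result;
  }
};

// A call site that still starts from a superedge extracts the call
// statement and delegates, instead of duplicating the logic.
void
handle_return_superedge (region_model_sketch &model,
                         const return_superedge_sketch &sedge)
{
  model.update_for_return (*sedge.call_stmt);
}
```

The dynamically discovered case can then call update_for_return directly with the call statement it already has, so neither path needs its own copy of the frame-popping code.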

> 
> [...snip...]
> 
>> diff --git a/gcc/analyzer/supergraph.cc b/gcc/analyzer/supergraph.cc
>> index 8611d0f8689..ace9e7b128a 100644
>> --- a/gcc/analyzer/supergraph.cc
>> +++ b/gcc/analyzer/supergraph.cc
>> @@ -183,11 +183,34 @@ supergraph::supergraph (logger *logger)
>> 	      m_stmt_to_node_t.put (stmt, node_for_stmts);
>> 	      m_stmt_uids.make_uid_unique (stmt);
>> 	      if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
>> -		{
>> -		  m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
>> -		  node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), NULL);
>> -		  m_cgraph_edge_to_caller_next_node.put (edge, node_for_stmts);
>> -		}
>> +    		{
>> +    		  m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
>> +    		  node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt),
>> +    		   			     NULL);
>> +    		  m_cgraph_edge_to_caller_next_node.put (edge, node_for_stmts);
>> +    		}
>> +	        else
>> +	        {
>> +	          // maybe call is via a function pointer
>> +	          gcall *call = dyn_cast<gcall *> (stmt);
>> +	          if (call)
> 
> You can combine the if and the decl/dyn_cast into one line:
> 
> 	          if (gcall *call = dyn_cast<gcall *> (stmt))
> 
> which saves a little vertical space and (slightly) limits the scope of
> "call”.

Done

> 
>> +	          {
>> +	            cgraph_edge *edge 
>> +		      = cgraph_node::get (fun->decl)->get_edge (stmt);
>> +	            if (!edge || !edge->callee)
>> +	            {
>> +	              supernode *old_node_for_stmts = node_for_stmts;
>> +	              node_for_stmts = add_node (fun, bb, call, NULL);
>> +
>> +	              superedge *sedge 
>> +	                = new callgraph_superedge (old_node_for_stmts,
>> +	                  			   node_for_stmts,
>> +	                  			   SUPEREDGE_INTRAPROCEDURAL_CALL,
>> +	                  			   NULL);
>> +	              add_edge (sedge);
>> +	            }
>> +	          }
>> +	        }
>> 	    }
> 
> [...snip...]
> 
>> diff --git a/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c b/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
>> new file mode 100644
>> index 00000000000..1054ab4f240
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
>> @@ -0,0 +1,53 @@
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +
>> +void fun(int *int_ptr)
>> +{
>> +	free(int_ptr);  /* { dg-warning "double-'free' of 'int_ptr'" } */
>> +}
>> +
>> +void single_call()
>> +{
>> +	int *int_ptr = (int*)malloc(sizeof(int));
>> +	void (*fun_ptr)(int *) = &fun;
>> +	(*fun_ptr)(int_ptr);
>> +}
>> +
>> +void double_call()
>> +{
>> +	int *int_ptr = (int*)malloc(sizeof(int));
>> +	void (*fun_ptr)(int *) = &fun;
>> +	(*fun_ptr)(int_ptr);
>> +	(*fun_ptr)(int_ptr);
>> +}
>> +
>> +/*{ dg-begin-multiline-output "" }
> 
> Normally the test suite injects "-fdiagnostics-plain-output" into the
> options, which turns off the source code printing and ASCII art for
> interprocedural paths.  If you want to test them via dg-begin-
> multiline-output you'll need to have a look at the directives at the
> top of one of the existing tests that uses dg-begin-multiline-output.
> 
>> +    6 |         free(int_ptr);
>> +      |         ^~~~~~~~~~~~~
>> +  'double_call': events 1-2
>> +    |
>> +    |   16 | void double_call()
>> +    |      |      ^~~~~~~~~~~
>> +    |      |      |
>> +    |      |      (1) entry to 'double_call'
>> +    |   17 | {
>> +    |   18 |         int *int_ptr = (int*)malloc(sizeof(int));
>> +    |      |                              ~~~~~~~~~~~~~~~~~~~
>> +    |      |                              |
>> +    |      |                              (2) allocated here
>> +    |
>> +    +--> 'fun': events 3-6
>> +           |
>> +           |    4 | void fun(int *int_ptr)
>> +           |      |      ^~~
>> +           |      |      |
>> +           |      |      (3) entry to ‘fun’
>> +           |      |      (5) entry to ‘fun’
>> +           |    5 | {
>> +           |    6 |         free(int_ptr);
>> +           |      |         ~~~~~~~~~~~~~
>> +           |      |         |
>> +           |      |         (4) first 'free' here
>> +           |      |         (6) second 'free' here; first 'free' was at (4)
>> +           |
>> +*/
> 
> and normally there would be a terminating:
> 
>  { dg-end-multiline-output "" } */
> 
> In any case, the output above is missing some events: I think ideally
> it would show the calls from *fun_ptr to fun and the returns, giving
> something like the following (which I mocked up by hand):
> 
>  'double_call': events 1-3
>    |
>    |   16 | void double_call()
>    |      |      ^~~~~~~~~~~
>    |      |      |
>    |      |      (1) entry to 'double_call'
>    |   17 | {
>    |   18 |         int *int_ptr = (int*)malloc(sizeof(int));
>    |      |                              ~~~~~~~~~~~~~~~~~~~
>    |      |                              |
>    |      |                              (2) allocated here
>    | ...  |
>    |   19 |         (*fun_ptr)(int_ptr);
>    |      |         ^~~~~~~~~~~~~~~~~~~
>    |      |         |
>    |      |         (3) calling 'fun' from 'double_call'
>    |
>    +--> 'fun': events 4-5
>           |
>           |    4 | void fun(int *int_ptr)
>           |      |      ^~~
>           |      |      |
>           |      |      (4) entry to ‘fun’
>           |    5 | {
>           |    6 |         free(int_ptr);
>           |      |         ~~~~~~~~~~~~~
>           |      |         |
>           |      |         (5) first 'free' here
>           |
>    <------+
>    |
>  'double_call': events 6-7
>    |
>    |   19 |         (*fun_ptr)(int_ptr);
>    |      |         ^~~~~~~~~~~~~~~~~~~
>    |      |         |
>    |      |         (6) returning to 'double_call' from 'fun'
>    |   20 |         (*fun_ptr)(int_ptr);
>    |      |         ^~~~~~~~~~~~~~~~~~~
>    |      |         |
>    |      |         (7) calling 'fun' from 'double_call'
>    |
>    +--> 'fun': events 8-9
>           |
>           |    4 | void fun(int *int_ptr)
>           |      |      ^~~
>           |      |      |
>           |      |      (8) entry to ‘fun’
>           |    5 | {
>           |    6 |         free(int_ptr);
>           |      |         ~~~~~~~~~~~~~
>           |      |         |
>           |      |         (9) second 'free' here; first 'free' was at (5)
> 
> The events are created in diagnostic-manager.cc
> 
> Am I right in thinking that there's an interprocedural superedge for the
> dynamically-discovered calls?

No, there isn’t; such calls only have an exploded edge and no interprocedural superedge.

> 
> diagnostic_manager::add_events_for_superedge creates events for calls
> and returns, so maybe you just need to update the case
> SUPEREDGE_INTRAPROCEDURAL_CALL there, to do something for the
> "dynamically discovered edge" cases (compare it with the other cases in
> that function).   You might need to update the existing call_event and
> return_event subclasses slightly (see checker-path.cc/h)

As we already have exploded edges representing the call, my idea was to add events for such cases via custom edge info ( similar to what we have for the longjmp case ) instead of creating a special case in diagnostic_manager::add_events_for_superedge ().
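
A rough sketch of that idea, using simplified stand-in types rather than the analyzer's real classes: every name ending in _sketch, and the helper add_events_for_eedge, is hypothetical, and a plain vector of strings stands in for the checker path.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins for the analyzer's types.
using checker_path_sketch = std::vector<std::string>;

struct exploded_edge_sketch;

// Base class: lets an exploded edge describe itself in a diagnostic path.
struct custom_edge_info_sketch
{
  virtual ~custom_edge_info_sketch () = default;
  virtual void add_events_to_path (checker_path_sketch &path,
                                   const exploded_edge_sketch &eedge) const = 0;
};

struct exploded_edge_sketch
{
  std::string src_fn;   // function the edge leaves
  std::string dest_fn;  // function the edge enters
  std::unique_ptr<custom_edge_info_sketch> custom_info;  // may be null
};

// Custom info for a call (or return) that has no superedge because it
// was only discovered during analysis, e.g. via a function pointer.
struct dynamic_call_info_sketch : custom_edge_info_sketch
{
  explicit dynamic_call_info_sketch (bool is_returning_call)
  : m_is_returning_call (is_returning_call) {}

  void add_events_to_path (checker_path_sketch &path,
                           const exploded_edge_sketch &eedge) const override
  {
    const char *prefix = m_is_returning_call ? "returning to '" : "calling '";
    path.push_back (prefix + eedge.dest_fn + "' from '" + eedge.src_fn + "'");
  }

  bool m_is_returning_call;
};

// Path building: an edge with custom info emits its own events, so no
// special case is needed in the superedge-based code.
void
add_events_for_eedge (checker_path_sketch &path,
                      const exploded_edge_sketch &eedge)
{
  if (eedge.custom_info)
    eedge.custom_info->add_events_to_path (path, eedge);
}
```

The point of the design is that the superedge-driven event code stays untouched; only edges carrying the custom info contribute the extra call and return events.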

> 
> Ideally, event (7) above would read
>  "passing freed pointer 'int_ptr' in call to 'fun' from 'double_call'"
> but that's advanced material :)
> 
> [...snip...]
> 
> Hope this makes sense and is constructive
> Dave

Thanks 
- Ankur


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-07-21 16:14                                               ` Ankur Saini
@ 2021-07-22 17:10                                                 ` Ankur Saini
  2021-07-22 23:21                                                   ` David Malcolm
  2021-07-24 16:35                                                   ` Ankur Saini
  2021-07-22 23:07                                                 ` David Malcolm
  1 sibling, 2 replies; 45+ messages in thread
From: Ankur Saini @ 2021-07-22 17:10 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

AIM FOR TODAY: 

- Add custom edge info to the eedges created for dynamically discovered calls
- Add the custom events to be shown in diagnostics
- Update call_event and return_event to also work for the cases where there is no underlying superedge representing the call

---
PROGRESS  :

- I created a "dynamic_call_info_t" subclass representing custom info on the exploded edge for a dynamically discovered call.

- I overrode its "add_events_to_path ()" function to add call and return events to the checker's path.

- The call_event and return_event subclasses currently rely on the underlying interprocedural superedge representing the call in order to work properly. To tackle this problem, I used the same method I used for the call-string patch earlier ( working with the src and dest supernodes instead of the superedge ).

- The call_event subclass ( and the same applies to the return_event subclass ) now has 2 additional pointers to the source and destination supernodes, representing the call in the absence of a superedge.

- I have also tweaked a few more things to make it work. I think the best way to show them all is to attach a patch ( it should be attached to this mail ) with just the changes I made today, to make clear exactly what has changed since the last update. ( This patch will be squashed into the previous one before the final review. )

- After all these changes, the analyzer now emits the following diagnostic for the test program ( godbolt link <https://godbolt.org/z/Td8n4c9a6> ), which I think now includes all the events it was missing before.

```
test.c: In function ‘fun’:
test.c:6:9: warning: double-‘free’ of ‘int_ptr’ [CWE-415] [-Wanalyzer-double-free]
    6 |         free(int_ptr);
      |         ^~~~~~~~~~~~~
  ‘double_call’: events 1-3
    |
    |   16 | void double_call()
    |      |      ^~~~~~~~~~~
    |      |      |
    |      |      (1) entry to ‘double_call’
    |   17 | {
    |   18 |         int *int_ptr = (int*)malloc(sizeof(int));
    |      |                              ~~~~~~~~~~~~~~~~~~~
    |      |                              |
    |      |                              (2) allocated here
    |   19 |         void (*fun_ptr)(int *) = &fun;
    |   20 |         (*fun_ptr)(int_ptr);
    |      |         ~~~~~~~~~~~~~~~~~~~
    |      |          |
    |      |          (3) calling ‘fun’ from ‘double_call’
    |
    +--> ‘fun’: events 4-5
           |
           |    4 | void fun(int *int_ptr)
           |      |      ^~~
           |      |      |
           |      |      (4) entry to ‘fun’
           |    5 | {
           |    6 |         free(int_ptr);
           |      |         ~~~~~~~~~~~~~
           |      |         |
           |      |         (5) first ‘free’ here
           |
    <------+
    |
  ‘double_call’: events 6-7
    |
    |   20 |         (*fun_ptr)(int_ptr);
    |      |         ~^~~~~~~~~~~~~~~~~~
    |      |          |
    |      |          (6) returning to ‘double_call’ from ‘fun’
    |   21 |         (*fun_ptr)(int_ptr);
    |      |         ~~~~~~~~~~~~~~~~~~~
    |      |          |
    |      |          (7) calling ‘fun’ from ‘double_call’
    |
    +--> ‘fun’: events 8-9
           |
           |    4 | void fun(int *int_ptr)
           |      |      ^~~
           |      |      |
           |      |      (8) entry to ‘fun’
           |    5 | {
           |    6 |         free(int_ptr);
           |      |         ~~~~~~~~~~~~~
           |      |         |
           |      |         (9) second ‘free’ here; first ‘free’ was at (5)
           |
```
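
The call_event change described in the progress notes above can be pictured roughly as follows. This is a standalone sketch with hypothetical stand-in types ( supernode_sketch, superedge_sketch, call_event_sketch ), not the code in the attached patch: the event keeps the two endpoint supernodes, so it can still describe itself when no superedge exists.

```cpp
#include <string>

// Hypothetical stand-ins for the analyzer's supernode / superedge.
struct supernode_sketch
{
  std::string function_name;
};

struct superedge_sketch
{
  const supernode_sketch *src;
  const supernode_sketch *dest;
};

class call_event_sketch
{
public:
  // Usual case: an interprocedural superedge describes the call.
  explicit call_event_sketch (const superedge_sketch &sedge)
  : m_sedge (&sedge), m_src_snode (sedge.src), m_dest_snode (sedge.dest) {}

  // Dynamically discovered call: no superedge, only the two endpoints.
  call_event_sketch (const supernode_sketch &src,
                     const supernode_sketch &dest)
  : m_sedge (nullptr), m_src_snode (&src), m_dest_snode (&dest) {}

  bool has_superedge_p () const { return m_sedge != nullptr; }

  // The description only needs the endpoints, so it works in both cases.
  std::string get_desc () const
  {
    return "calling '" + m_dest_snode->function_name
           + "' from '" + m_src_snode->function_name + "'";
  }

private:
  const superedge_sketch *m_sedge;      // null for dynamic calls
  const supernode_sketch *m_src_snode;  // caller side
  const supernode_sketch *m_dest_snode; // callee side
};
```

Both constructors yield the same description text, which matches the intent that events (3) and (7) in the output read the same as for ordinary calls.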

---
STATUS AT THE END OF THE DAY :- 

- Add custom edge info to the eedges created for dynamically discovered calls ( done )
- Add the custom events to be shown in diagnostics ( done )
- Update call_event and return_event to also work for the cases where there is no underlying superedge representing the call ( done )

--- 
Question / doubt :- 

- In "case EK_RETURN_EDGE" of the "diagnostic_manager::prune_for_sm_diagnostic ()" function:

File:{source_dir}/gcc/analyzer/diagnostic-manager.cc
2105: 			log ("event %i:"
2106: 			     " recording critical state for %qs at return"
2107: 			     " from %qE in caller to %qE in callee",
2108: 			     idx, sval_desc.m_buffer, callee_var, callee_var);

Shouldn’t it be

2107: 			     " from %qE in caller to %qE in callee",
2108: 			     idx, sval_desc.m_buffer, caller_var, callee_var);

and get the value of caller_var beforehand? Will they always be the same?

---
Patch representing changes done today :-



Thank you
- Ankur

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-07-21 16:14                                               ` Ankur Saini
  2021-07-22 17:10                                                 ` Ankur Saini
@ 2021-07-22 23:07                                                 ` David Malcolm
  1 sibling, 0 replies; 45+ messages in thread
From: David Malcolm @ 2021-07-22 23:07 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Wed, 2021-07-21 at 21:44 +0530, Ankur Saini wrote:
> 
> 
> > On 17-Jul-2021, at 2:57 AM, David Malcolm <dmalcolm@redhat.com>
> > wrote:
> > 
> > On Fri, 2021-07-16 at 21:04 +0530, Ankur Saini wrote:
> > > 
> > > 
> > > > On 15-Jul-2021, at 4:53 AM, David Malcolm <dmalcolm@redhat.com>
> > > > wrote:
> > > > 
> > > > On Wed, 2021-07-14 at 22:41 +0530, Ankur Saini wrote:
> > > > 
> > > > 
> > 

[...snip...]

> > > 
> > > I have attached a sample patch of the work done so far with this
> > > mail for review ( I haven’t sent this one to the patches list as
> > > its change log is not complete yet ).
> > > 
> > > P.S. I have also sent another mail
> > > ( <https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575396.html> )
> > > to the patches list with the previous call-string patch, and this
> > > time it popped up in my inbox as it should. Did you also receive it
> > > now?
> > 
> > I can see it in the archive URL, but for some reason it's not
> > showing
> > up in my inbox.  Bother.  Please can you try resending it directly
> > to
> > me?
> 
> Ok, I have sent the call-string patch directly to you. I actually
> sent 2 mails ( from different mail ids ) to check whether it’s the id
> or the contents of the mail that is causing the issue.

I've been looking, but I don't see the patch.  Sorry about this.

> 
> > 
> > Putting email issues to one side, the patch you linked to above
> > looks
> > good.  To what extent has it been tested?  If it bootstraps and
> > passes
> > the test suite, it's ready for trunk.
> 
> It bootstrapped successfully on a couple of the x86_64 machines ( on
> the GCC compile farm ), and regression testing is underway.

Great.

[...snip....]

> > 
> > In any case, the output above is missing some events: I think
> > ideally
> > it would show the calls from *fun_ptr to fun and the returns,
> > giving
> > something like the following (which I mocked up by hand):
> > 
> >  'double_call': events 1-3
> >    |
> >    |   16 | void double_call()
> >    |      |      ^~~~~~~~~~~
> >    |      |      |
> >    |      |      (1) entry to 'double_call'
> >    |   17 | {
> >    |   18 |         int *int_ptr = (int*)malloc(sizeof(int));
> >    |      |                              ~~~~~~~~~~~~~~~~~~~
> >    |      |                              |
> >    |      |                              (2) allocated here
> >    | ...  |
> >    |   19 |         (*fun_ptr)(int_ptr);
> >    |      |         ^~~~~~~~~~~~~~~~~~~
> >    |      |         |
> >    |      |         (3) calling 'fun' from 'double_call'
> >    |
> >    +--> 'fun': events 4-5
> >           |
> >           |    4 | void fun(int *int_ptr)
> >           |      |      ^~~
> >           |      |      |
> >           |      |      (4) entry to ‘fun’
> >           |    5 | {
> >           |    6 |         free(int_ptr);
> >           |      |         ~~~~~~~~~~~~~
> >           |      |         |
> >           |      |         (5) first 'free' here
> >           |
> >    <------+
> >    |
> >  'double_call': events 6-7
> >    |
> >    |   19 |         (*fun_ptr)(int_ptr);
> >    |      |         ^~~~~~~~~~~~~~~~~~~
> >    |      |         |
> >    |      |         (6) returning to 'double_call' from 'fun'
> >    |   20 |         (*fun_ptr)(int_ptr);
> >    |      |         ^~~~~~~~~~~~~~~~~~~
> >    |      |         |
> >    |      |         (7) calling 'fun' from 'double_call'
> >    |
> >    +--> 'fun': events 8-9
> >           |
> >           |    4 | void fun(int *int_ptr)
> >           |      |      ^~~
> >           |      |      |
> >           |      |      (8) entry to ‘fun’
> >           |    5 | {
> >           |    6 |         free(int_ptr);
> >           |      |         ~~~~~~~~~~~~~
> >           |      |         |
> >           |      |         (9) second 'free' here; first 'free' was at (5)
> > 
> > The events are created in diagnostic-manager.cc
> > 
> > Am I right in thinking that there's an interprocedural superedge for
> > the dynamically-discovered calls?
> 
> No, there isn’t; such calls only have an exploded edge and no
> interprocedural superedge.
> 
> > 
> > diagnostic_manager::add_events_for_superedge creates events for
> > calls
> > and returns, so maybe you just need to update the case
> > SUPEREDGE_INTRAPROCEDURAL_CALL there, to do something for the
> > "dynamically discovered edge" cases (compare it with the other
> > cases in
> > that function).   You might need to update the existing call_event
> > and
> > return_event subclasses slightly (see checker-path.cc/h)
> 
> As we already have exploded edges representing the call, my idea was
> to add events for such cases via custom edge info ( similar to what we
> have for the longjmp case ) instead of creating a special case in
> diagnostic_manager::add_events_for_superedge ().

That sounds like it could work too.

Dave

> 
> > 
> > Ideally, event (7) above would read
> >  "passing freed pointer 'int_ptr' in call to 'fun' from
> > 'double_call'"
> > but that's advanced material :)
> > 
> > [...snip...]
> > 
> > Hope this makes sense and is constructive
> > Dave
> 
> Thanks 
> - Ankur
> 



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-07-22 17:10                                                 ` Ankur Saini
@ 2021-07-22 23:21                                                   ` David Malcolm
  2021-07-24 16:35                                                   ` Ankur Saini
  1 sibling, 0 replies; 45+ messages in thread
From: David Malcolm @ 2021-07-22 23:21 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Thu, 2021-07-22 at 22:40 +0530, Ankur Saini wrote:
> AIM FOR TODAY: 
> 
> - Add custom edge info to the eedges created for dynamically
> discovered calls
> - Add the custom events to be shown in diagnostics
> - Update call_event and return_event to also work for the cases where
> there is no underlying superedge representing the call
> 
> ---
> PROGRESS  :
> 
> - I created a "dynamic_call_info_t" subclass representing custom info
> on the exploded edge for a dynamically discovered call.
> 
> - I overrode its "add_events_to_path ()" function to add call and
> return events to the checker's path.
> 
> - The call_event and return_event subclasses currently rely on the
> underlying interprocedural superedge representing the call in order to
> work properly. To tackle this problem, I used the same method I used
> for the call-string patch earlier ( working with the src and dest
> supernodes instead of the superedge ).
> 
> - The call_event subclass ( and the same applies to the return_event
> subclass ) now has 2 additional pointers to the source and destination
> supernodes, representing the call in the absence of a superedge.
> 
> - I have also tweaked a few more things to make it work. I think the
> best way to show them all is to attach a patch ( it should be attached
> to this mail ) with just the changes I made today, to make clear
> exactly what has changed since the last update. ( This patch will be
> squashed into the previous one before the final review. )

It's much easier to understand via the patch :)

> 
> - After all these changes, the analyzer now emits the following
> diagnostic for the test program ( godbolt link
> <https://godbolt.org/z/Td8n4c9a6> ), which I think now includes all
> the events it was missing before.
> 
> ```
> test.c: In function ‘fun’:
> test.c:6:9: warning: double-‘free’ of ‘int_ptr’ [CWE-415] [-Wanalyzer-double-free]
>     6 |         free(int_ptr);
>       |         ^~~~~~~~~~~~~
>   ‘double_call’: events 1-3
>     |
>     |   16 | void double_call()
>     |      |      ^~~~~~~~~~~
>     |      |      |
>     |      |      (1) entry to ‘double_call’
>     |   17 | {
>     |   18 |         int *int_ptr = (int*)malloc(sizeof(int));
>     |      |                              ~~~~~~~~~~~~~~~~~~~
>     |      |                              |
>     |      |                              (2) allocated here
>     |   19 |         void (*fun_ptr)(int *) = &fun;
>     |   20 |         (*fun_ptr)(int_ptr);
>     |      |         ~~~~~~~~~~~~~~~~~~~
>     |      |          |
>     |      |          (3) calling ‘fun’ from ‘double_call’
>     |
>     +--> ‘fun’: events 4-5
>            |
>            |    4 | void fun(int *int_ptr)
>            |      |      ^~~
>            |      |      |
>            |      |      (4) entry to ‘fun’
>            |    5 | {
>            |    6 |         free(int_ptr);
>            |      |         ~~~~~~~~~~~~~
>            |      |         |
>            |      |         (5) first ‘free’ here
>            |
>     <------+
>     |
>   ‘double_call’: events 6-7
>     |
>     |   20 |         (*fun_ptr)(int_ptr);
>     |      |         ~^~~~~~~~~~~~~~~~~~
>     |      |          |
>     |      |          (6) returning to ‘double_call’ from ‘fun’
>     |   21 |         (*fun_ptr)(int_ptr);
>     |      |         ~~~~~~~~~~~~~~~~~~~
>     |      |          |
>     |      |          (7) calling ‘fun’ from ‘double_call’
>     |
>     +--> ‘fun’: events 8-9
>            |
>            |    4 | void fun(int *int_ptr)
>            |      |      ^~~
>            |      |      |
>            |      |      (8) entry to ‘fun’
>            |    5 | {
>            |    6 |         free(int_ptr);
>            |      |         ~~~~~~~~~~~~~
>            |      |         |
>            |      |         (9) second ‘free’ here; first ‘free’ was at (5)
>            |
> ```

Looks great.


> 
> ---
> STATUS AT THE END OF THE DAY :- 
> 
> - Add custom edge info to the eedges created for dynamically
> discovered calls (done )
> - Add the custom events to be shown in diagnostics (done)
> - update call_event and return_event to also work for the cases where
> there is no underlying superedge representing the call (done)
> 
> --- 
> Question / doubt :- 
> 
> - In "case EK_RETURN_EDGE" of the
> "diagnostic_manager::prune_for_sm_diagnostic ()" function:
> 
> File:{source_dir}/gcc/analyzer/diagnostic-manager.cc
> 2105:                   log ("event %i:"
> 2106:                        " recording critical state for %qs at return"
> 2107:                        " from %qE in caller to %qE in callee",
> 2108:                        idx, sval_desc.m_buffer, callee_var, callee_var);
> 
> shouldn't it be 
> 
> 2107:                        " from %qE in caller to %qE in callee",
> 2108:                        idx, sval_desc.m_buffer, caller_var, callee_var);

Good catch: I think it should be the latter version.  (posting it as a
unified diff would make it easier for me to read, but maybe your email
client makes that hard?)

> 
> and get the value of caller_var before? Will they always be the same?

IIRC this is for if you have something like:

#include <stdlib.h>

void set_int (int *q);

int *
foo ()
{
   int *p = malloc (sizeof (int));
   set_int (p);
   return p;
}

void set_int (int *q)
{
  *q = 42;
}

We want to complain about the malloc result being unchecked, and thus a
possible deref of NULL at "*q = 42;"

The "critical state" is that "q" is unchecked in set_int, but as we
walk backwards through the events we see that it was passed to set_int
as "p", and that in "foo" it is "p" that is unchecked.  Here the
callee_var is "q" (in set_int) and the caller_var is "p" (in foo).

This affects the wording of the events (see the "precision of wording"
hooks in pending-diagnostic.h).
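
As a rough illustration of that renaming, here is a standalone toy sketch (not
analyzer code; `toy_call_site` and `map_callee_to_caller` are hypothetical
names, and it only models positional parameter passing) of mapping the
callee's name for a value back to the name used in the caller:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of a call site (hypothetical, not an analyzer class):
// callee parameter names and caller argument names, matched by position.
struct toy_call_site
{
  std::vector<std::string> callee_params; // e.g. {"q"} for set_int
  std::vector<std::string> caller_args;   // e.g. {"p"} at the call in foo
};

// Translate CALLEE_VAR (a name inside the callee) into the name the
// caller passed in that position.  Returns "" if CALLEE_VAR is not a
// parameter, i.e. there is nothing to map.
std::string
map_callee_to_caller (const toy_call_site &cs, const std::string &callee_var)
{
  const std::size_t n = cs.callee_params.size () < cs.caller_args.size ()
                          ? cs.callee_params.size ()
                          : cs.caller_args.size ();
  for (std::size_t i = 0; i < n; ++i)
    if (cs.callee_params[i] == callee_var)
      return cs.caller_args[i];
  return std::string ();
}
```

With callee_params {"q"} and caller_args {"p"}, looking up "q" gives "p", as in
the foo/set_int example; a name that is not a parameter maps to nothing.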


> ---
> Patch representing changes done today :-

The patch looks reasonable; does it preserve the wording of the various
events?


You've made a lot of progress in handling jumps through dynamically-
discovered function pointers.  I think you should now look at simple
cases of virtual function calls (building on your work so far).

Hope this is helpful
Dave


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-07-22 17:10                                                 ` Ankur Saini
  2021-07-22 23:21                                                   ` David Malcolm
@ 2021-07-24 16:35                                                   ` Ankur Saini
  2021-07-27 15:05                                                     ` Ankur Saini
  1 sibling, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-07-24 16:35 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

AIM FOR TODAY: 

- Send the callstring patch to the patches list after a successful bootstrap and 
regression test run
- Create some basic examples to see how virtual functions are being called 
- Lay out a basic `vtable_region : public region` subclass to store the vtables 
found so far

---
PROGRESS :

- I sent the final callstring patch after successfully bootstrapping and running 
the regression tests on it with no unexpected fails/passes. This time I wrote the 
mail manually (instead of using git send-email) and attached the patch file to 
it.

- While testing this on various examples I created, I found out the following: 

1. The vtable (which I guess looks something like _ZTV1<classname> in the GIMPLE 
representation) appears either inside the constructor of the class or (if the 
default constructor is not overloaded) in the function where the instance of 
the class is created.

2. At the call site (the call looks something like 
OBJ_TYPE_REF(_3;(struct <classname>)a_7->0) (a_7); ) we get the name of the 
class whose function is being called and the index in the vtable from which the 
address is taken.

- Although at first I thought it was impossible to get the info about the 
functions the vtable is holding by just looking at its GIMPLE IR, after a 
little digging in ipa-devirt.c I found out that apparently the compiler's front end 
actually attaches this information (known as BINFO) to every RECORD_TYPE tree, 
which can be used to access info about the same. (I will be digging 
more into it to see the extent to which I can use this BINFO to the project's 
advantage.)

- So maybe the new region should hold a collection (maybe a vector?) of cgraph 
nodes of the functions present in the vtable, which will be evaluated when the 
analyzer tries to find the function decl of the dynamically discovered call.

- The problem arises when we have a base class pointer and we don't know exactly 
which subclass's method is being called. 

In a previous discussion (https://gcc.gnu.org/pipermail/gcc/2021-April/235335.html),
a possible plan to tackle this was to "potentially have the analyzer speculate 
about the subclasses it knows about, adding exploded edges to speculated calls 
for the various subclasses." But I would like to look a bit more into ipa-devirt, 
especially the possible_polymorphic_call_targets() function (which seems to do 
most of the work we want to do), to find a way to overcome 
this problem.
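
To make the "speculate about the subclasses it knows about" idea concrete, here
is a minimal standalone sketch. It is only a toy model under my own
assumptions: `toy_class`, `hierarchy`, `resolve`, `derives_from` and
`possible_targets` are hypothetical names, it uses single inheritance only,
and it is not how ipa-devirt or BINFO actually represent things.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy class description (hypothetical; not GCC's BINFO).
struct toy_class
{
  std::string base;                // empty if there is no base class
  std::set<std::string> overrides; // virtual methods this class defines
};

typedef std::map<std::string, toy_class> hierarchy;

// Which implementation of METHOD runs for an object whose dynamic type
// is CLS?  Walk up the base chain until a class defines it.
std::string
resolve (const hierarchy &h, std::string cls, const std::string &method)
{
  while (!cls.empty ())
    {
      hierarchy::const_iterator it = h.find (cls);
      if (it == h.end ())
        return "";
      if (it->second.overrides.count (method))
        return cls + "::" + method;
      cls = it->second.base;
    }
  return "";
}

// True if CLS is STATIC_TYPE itself or (transitively) derives from it.
bool
derives_from (const hierarchy &h, std::string cls,
              const std::string &static_type)
{
  while (!cls.empty ())
    {
      if (cls == static_type)
        return true;
      hierarchy::const_iterator it = h.find (cls);
      if (it == h.end ())
        return false;
      cls = it->second.base;
    }
  return false;
}

// All implementations a call to METHOD through a STATIC_TYPE pointer
// could reach: one per known class that could be the dynamic type.
std::set<std::string>
possible_targets (const hierarchy &h, const std::string &static_type,
                  const std::string &method)
{
  std::set<std::string> targets;
  for (hierarchy::const_iterator it = h.begin (); it != h.end (); ++it)
    if (derives_from (h, it->first, static_type))
      {
        std::string target = resolve (h, it->first, method);
        if (!target.empty ())
          targets.insert (target);
      }
  return targets;
}
```

In this model, a call to foo through a pointer to a base A with a subclass B
that overrides foo yields both A::foo and B::foo as candidates, and the
analyzer would then add one speculated exploded edge per candidate.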

---
STATUS AT THE END OF THE DAY :- 

- Send the callstring patch to the patches list after a successful bootstrap and 
regression test run ( done )
- Create some basic examples to see how virtual functions are being called ( done )
- Lay out a basic `vtable_region : public region` subclass to store the vtables 
found so far ( started )

Thank you
- Ankur

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-07-24 16:35                                                   ` Ankur Saini
@ 2021-07-27 15:05                                                     ` Ankur Saini
  2021-07-28 15:49                                                       ` Ankur Saini
  0 siblings, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-07-27 15:05 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

Sorry for the lack of updates recently. Most of the time was consumed in exploring GCC's devirtualiser and experimenting with some approaches, and I didn't get enough content out every day for a daily report.

AIM: 

- get the analyzer to figure out which function to call when a virtual function is called.

---
PROGRESS :

The plan is to use functions from GCC's devirtualiser to directly find the possible target functions that can be called when a virtual function is called, and then let the analyzer analyze every single one of them by creating enodes and eedges. 

- I expanded upon my last update (detecting calls via function pointers), and figured out that in case of a vfunc call, the region model would not be able to find a fn_decl for the given gcall (i.e. model->get_fndecl_for_call(call,&ctxt) would return NULL).

- The only function I want to use from ipa-devirt is possible_polymorphic_call_targets () (declared in ipa-utils.h:114; it returns a vector of cgraph_nodes representing the possible callees of an indirect polymorphic call, which is represented by a cgraph_edge), and to use it I needed the cgraph_edge representing the call. 

- In case of a vfunc call, we would have an indirect call edge (an edge where the callee is not known at compile time), which I obtained from the gimple call of the stmt.

- After that I checked whether it is a polymorphic call or not (condition: edge->indirect_info->polymorphic should be set).

- Once sure that it's a vfunc call the analyzer is looking at, I simply used the possible_polymorphic_call_targets () function to get a vector of all the possible targets it can call. 

- The results were amazing: not only was the analyzer now able to figure out which functions can be called for simple cases, but the fact that ipa-devirt also uses its inheritance graph to search for possible calls made it possible for the analyzer (which doesn't understand inheritance yet) to even correctly detect calls that were happening via a base class pointer. :)

- Now all that is left is to make the analyzer speculate those calls by creating enodes and eedges for the calls ( similar to how it does in case for function pointers ).

---
STATUS AT THE END OF THE DAY :- 

- get the analyzer to figure out which function to call when a virtual function is called. ( done )

Thank you
- Ankur

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-07-27 15:05                                                     ` Ankur Saini
@ 2021-07-28 15:49                                                       ` Ankur Saini
  2021-07-29 12:50                                                         ` Ankur Saini
  0 siblings, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-07-28 15:49 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

AIM For Today: 

- Make the analyzer evaluate the vfunc calls by creating enodes and eedges 
  representing the call. 

- Make the analyzer not treat such calls as call to "unknown function"

---
PROGRESS :

- After the analyzer successfully found which functions to call ( thanks to 
  functions from ipa-devirt.c ), all that was left was to create enodes and 
  eedges for analyzing such calls, for which I took the same approach I used to 
  detect and analyze calls via function pointers. 

- Now the analyzer was successfully detecting and analyzing the calls via 
  function pointers, but for some reason it was still not giving out 
  analysis reports for such calls. 

- After a bit of digging, I found out that the analyzer was also treating such 
  calls as "calls to unknown functions", and in case of such calls the analyzer 
  plays safe by resetting all of the state-machine state for the values that 
  are reachable by the function call, to avoid false reports. ( This was also 
  mentioned in a previous discussion <https://gcc.gnu.org/pipermail/gcc/2021-April/235335.html> on the list. )

- I fixed the problem by simply updating exploded_node::on_stmt () to set
  the value of "unknown_side_effects" ( a boolean which is set to true when the
  analyzer thinks executing a gimple stmt would cause some side effects, making it 
  act conservatively for such cases ) to "false" when the gimple call stmt is a 
  polymorphic call. Something like this, at the end of the definition of 
  exploded_node::on_stmt ():

File: {src-dir}/gcc/analyzer/engine.cc
    /* If the statement is a polymorphic call then assume 
       there are no side effects.  */
    gimple *call_stmt = const_cast<gimple *>(stmt);
    if (gcall *call = dyn_cast<gcall *> (call_stmt))
    {
      function *fun = this->get_function();
      cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (call);
      if ((e && e->indirect_info) && (e->indirect_info->polymorphic))
        unknown_side_effects = false;
    }

  on_stmt_post (stmt, state, unknown_side_effects, &ctxt);

  return on_stmt_flags ();

- After this the analyzer was working as expected for virtual function calls on
  all of the examples I created. 
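
  To show why the conservative reset was hiding the report, here is a tiny
  standalone model. It is only a sketch under my own assumptions: the enum and
  function names are hypothetical, it tracks a single pointer, and it is not
  the analyzer's real malloc state machine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy per-pointer state (hypothetical names, single tracked pointer).
enum ptr_state { PS_START, PS_ALLOCATED, PS_FREED };

// Toy events along one execution path.
enum op_kind { OP_MALLOC, OP_FREE, OP_UNKNOWN_CALL };

// Walk the path and count how many double-free reports would be issued.
int
count_double_frees (const std::vector<op_kind> &ops)
{
  ptr_state state = PS_START;
  int reports = 0;
  for (std::size_t i = 0; i < ops.size (); ++i)
    switch (ops[i])
      {
      case OP_MALLOC:
        state = PS_ALLOCATED;
        break;
      case OP_FREE:
        if (state == PS_FREED)
          ++reports;          // freeing something already freed
        state = PS_FREED;
        break;
      case OP_UNKNOWN_CALL:
        // Conservative handling: the callee might have done anything
        // to the pointer, so forget what we knew about it.
        state = PS_START;
        break;
      }
  return reports;
}
```

  In this model the path malloc, free, free gives one report, while malloc,
  free, unknown call, free gives none: treating the already analyzed vfunc
  call as unknown wipes the "freed" state before the second free is seen.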

- Let's take the example of the following program ( the same program which was
  mentioned in the proposal, for which the analyzer was giving out a false positive )

File: test.cpp
01: #include <cstdlib>
02: 
03: struct A
04: {
05:     virtual int foo (void) 
06:     {
07:         return 42;
08:     }
09: };
10: 
11: struct B: public A
12: {
13: 	int *ptr;
14:     void alloc ()
15:     {
16:         ptr = (int*)malloc(sizeof(int));
17:     }
18: 	int foo (void) 
19:     { 
20:         free(ptr);
21:         return 0;
22:     }
23: };
24: 
25: int test()
26: {
27:     struct B b, *bptr=&b;
28:     b.alloc();
29:     bptr->foo();
30:     return bptr->foo();
31: }
32: 
33: int main()
34: {
35:     test();
36: }
37: 
( godbolt link of the above code (https://godbolt.org/z/n17WK4MxG) )

For the same program, the analyzer now generates the following warning message:

warning: double-‘free’ of ‘b.B::ptr’ [CWE-415] [-Wanalyzer-double-free]
   20 |         free(ptr);
      |         ~~~~^~~~~
  ‘int test()’: events 1-2
    |
    |   25 | int test()
    |      |     ^~~~
    |      |     |
    |      |     (1) entry to ‘test’
    |......
    |   28 |     b.alloc();
    |      |     ~~~~~~~~~
    |      |            |
    |      |            (2) calling ‘B::alloc’ from ‘test’
    |
    +--> ‘void B::alloc()’: events 3-4
           |
           |   14 |     void alloc ()
           |      |          ^~~~~
           |      |          |
           |      |          (3) entry to ‘B::alloc’
           |   15 |     {
           |   16 |         ptr = (int*)malloc(sizeof(int));
           |      |                     ~~~~~~~~~~~~~~~~~~~
           |      |                           |
           |      |                           (4) allocated here
           |
    <------+
    |
  ‘int test()’: events 5-6
    |
    |   28 |     b.alloc();
    |      |     ~~~~~~~^~
    |      |            |
    |      |            (5) returning to ‘test’ from ‘B::alloc’
    |   29 |     bptr->foo();
    |      |     ~~~~~~~~~~~
    |      |              |
    |      |              (6) calling ‘B::foo’ from ‘test’
    |
    +--> ‘virtual int B::foo()’: events 7-8
           |
           |   18 |         int foo (void)
           |      |             ^~~
           |      |             |
           |      |             (7) entry to ‘B::foo’
           |   19 |     {
           |   20 |         free(ptr);
           |      |         ~~~~~~~~~
           |      |             |
           |      |             (8) first ‘free’ here
           |
    <------+
    |
  ‘int test()’: events 9-10
    |
    |   29 |     bptr->foo();
    |      |     ~~~~~~~~~^~
    |      |              |
    |      |              (9) returning to ‘test’ from ‘B::foo’
    |   30 |     return bptr->foo();
    |      |            ~~~~~~~~~~~
    |      |                     |
    |      |                     (10) calling ‘B::foo’ from ‘test’
    |
    +--> ‘virtual int B::foo()’: events 11-12
           |
           |   18 |         int foo (void)
           |      |             ^~~
           |      |             |
           |      |             (11) entry to ‘B::foo’
           |   19 |     {
           |   20 |         free(ptr);
           |      |         ~~~~~~~~~
           |      |             |
           |      |             (12) second ‘free’ here; first ‘free’ was at (8)
           |

- Sorry, no patch for these changes is available right now as there is
  still a bit of cleaning and refining left to do ( I will post the patch
  with tomorrow's report ).

- That being said, I also want to test this patch with a few
  more examples of vfunc calls to check for possible loopholes.

---
STATUS AT THE END OF THE DAY :- 

- Make the analyzer evaluate the vfunc calls by creating enodes and eedges
  representing the call. ( done )

- Make the analyzer not treat such calls as call to "unknown function" ( done )

Thank you
- Ankur

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-07-28 15:49                                                       ` Ankur Saini
@ 2021-07-29 12:50                                                         ` Ankur Saini
  2021-07-30  0:05                                                           ` David Malcolm
  0 siblings, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-07-29 12:50 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

[-- Attachment #1: Type: text/plain, Size: 785 bytes --]

I have attached the patches (one is the updated version of the previous patch to 
detect calls via function pointers) of the changes made to make the analyzer 
understand calls to virtual functions, for initial review. 

1. I decided to make a dedicated function to create enodes and eedges for the 
dynamically discovered calls, as I found myself using the exact same piece of 
code again to analyse vfunc calls.

2. Bootstrapping and testing of these changes are underway.

3. Regarding the regression tests that have to be added to test the functionality of 
the vfunc extension patch:
Should I add many test files for different types of inheritance, or should I 
add one (or two) test files with a lot of functions in them testing different 
types of calls?

---
Patches :

[-- Attachment #2: fn_ptr.patch --]
[-- Type: application/octet-stream, Size: 30682 bytes --]

From 7ef22601bc0d5573e5fccc56e4817b15ccce2661 Mon Sep 17 00:00:00 2001
From: Ankur Saini <arsenic@sourceware.org>
Date: Thu, 29 Jul 2021 15:48:07 +0530
Subject: [PATCH] analyzer: detect and analyze calls via function pointer

---
 gcc/analyzer/analysis-plan.cc                 |   4 +
 gcc/analyzer/checker-path.cc                  |  28 +--
 gcc/analyzer/checker-path.h                   |   6 +
 gcc/analyzer/diagnostic-manager.cc            |  19 +-
 gcc/analyzer/engine.cc                        | 164 +++++++++++++++++-
 gcc/analyzer/exploded-graph.h                 |  35 ++++
 gcc/analyzer/program-point.cc                 |  18 ++
 gcc/analyzer/program-point.h                  |   3 +-
 gcc/analyzer/program-state.cc                 |  44 +++++
 gcc/analyzer/program-state.h                  |  11 ++
 gcc/analyzer/region-model.cc                  |  44 +++--
 gcc/analyzer/region-model.h                   |   6 +
 gcc/analyzer/state-purge.cc                   |  36 ++--
 gcc/analyzer/supergraph.cc                    |  32 +++-
 gcc/analyzer/supergraph.h                     |   5 +
 .../gcc.dg/analyzer/function-ptr-4.c          |  25 +++
 gcc/testsuite/gcc.dg/analyzer/pr100546.c      |  17 ++
 17 files changed, 441 insertions(+), 56 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr100546.c

diff --git a/gcc/analyzer/analysis-plan.cc b/gcc/analyzer/analysis-plan.cc
index 7dfc48e9c3e..57a6dcb1f6e 100644
--- a/gcc/analyzer/analysis-plan.cc
+++ b/gcc/analyzer/analysis-plan.cc
@@ -109,6 +109,10 @@ analysis_plan::use_summary_p (const cgraph_edge *edge) const
   if (!flag_analyzer_call_summaries)
     return false;
 
+  /* Don't use call summaries if there is no callgraph edge */
+  if (!edge || !edge->callee)
+    return false;
+
   /* TODO: don't count callsites each time.  */
   int num_call_sites = 0;
   const cgraph_node *callee = edge->callee;
diff --git a/gcc/analyzer/checker-path.cc b/gcc/analyzer/checker-path.cc
index e10c8e2bb7c..e132f003470 100644
--- a/gcc/analyzer/checker-path.cc
+++ b/gcc/analyzer/checker-path.cc
@@ -614,7 +614,11 @@ call_event::call_event (const exploded_edge &eedge,
 			location_t loc, tree fndecl, int depth)
 : superedge_event (EK_CALL_EDGE, eedge, loc, fndecl, depth)
 {
-  gcc_assert (eedge.m_sedge->m_kind == SUPEREDGE_CALL);
+  if (eedge.m_sedge)
+    gcc_assert (eedge.m_sedge->m_kind == SUPEREDGE_CALL);
+
+   m_src_snode = eedge.m_src->get_supernode ();
+   m_dest_snode = eedge.m_dest->get_supernode ();
 }
 
 /* Implementation of diagnostic_event::get_desc vfunc for
@@ -638,8 +642,8 @@ call_event::get_desc (bool can_colorize) const
       label_text custom_desc
 	= m_pending_diagnostic->describe_call_with_state
 	    (evdesc::call_with_state (can_colorize,
-				      m_sedge->m_src->m_fun->decl,
-				      m_sedge->m_dest->m_fun->decl,
+				      m_src_snode->m_fun->decl,
+				      m_dest_snode->m_fun->decl,
 				      var,
 				      m_critical_state));
       if (custom_desc.m_buffer)
@@ -648,8 +652,8 @@ call_event::get_desc (bool can_colorize) const
 
   return make_label_text (can_colorize,
 			  "calling %qE from %qE",
-			  m_sedge->m_dest->m_fun->decl,
-			  m_sedge->m_src->m_fun->decl);
+			  m_dest_snode->m_fun->decl,
+			  m_src_snode->m_fun->decl);
 }
 
 /* Override of checker_event::is_call_p for calls.  */
@@ -668,7 +672,11 @@ return_event::return_event (const exploded_edge &eedge,
 			    location_t loc, tree fndecl, int depth)
 : superedge_event (EK_RETURN_EDGE, eedge, loc, fndecl, depth)
 {
-  gcc_assert (eedge.m_sedge->m_kind == SUPEREDGE_RETURN);
+  if (eedge.m_sedge)
+    gcc_assert (eedge.m_sedge->m_kind == SUPEREDGE_RETURN);
+
+  m_src_snode = eedge.m_src->get_supernode ();
+  m_dest_snode = eedge.m_dest->get_supernode ();
 }
 
 /* Implementation of diagnostic_event::get_desc vfunc for
@@ -694,16 +702,16 @@ return_event::get_desc (bool can_colorize) const
       label_text custom_desc
 	= m_pending_diagnostic->describe_return_of_state
 	    (evdesc::return_of_state (can_colorize,
-				      m_sedge->m_dest->m_fun->decl,
-				      m_sedge->m_src->m_fun->decl,
+				      m_dest_snode->m_fun->decl,
+				      m_src_snode->m_fun->decl,
 				      m_critical_state));
       if (custom_desc.m_buffer)
 	return custom_desc;
     }
   return make_label_text (can_colorize,
 			  "returning to %qE from %qE",
-			  m_sedge->m_dest->m_fun->decl,
-			  m_sedge->m_src->m_fun->decl);
+			  m_dest_snode->m_fun->decl,
+			  m_src_snode->m_fun->decl);
 }
 
 /* Override of checker_event::is_return_p for returns.  */
diff --git a/gcc/analyzer/checker-path.h b/gcc/analyzer/checker-path.h
index 1843c4bc7b4..27634c20864 100644
--- a/gcc/analyzer/checker-path.h
+++ b/gcc/analyzer/checker-path.h
@@ -338,6 +338,9 @@ public:
   label_text get_desc (bool can_colorize) const FINAL OVERRIDE;
 
   bool is_call_p () const FINAL OVERRIDE;
+
+  const supernode *m_src_snode;
+  const supernode *m_dest_snode;
 };
 
 /* A concrete event subclass for an interprocedural return.  */
@@ -351,6 +354,9 @@ public:
   label_text get_desc (bool can_colorize) const FINAL OVERRIDE;
 
   bool is_return_p () const FINAL OVERRIDE;
+
+  const supernode *m_src_snode;
+  const supernode *m_dest_snode;
 };
 
 /* A concrete event subclass for the start of a consolidated run of CFG
diff --git a/gcc/analyzer/diagnostic-manager.cc b/gcc/analyzer/diagnostic-manager.cc
index 631fef6ad78..d7d9fa4c3d8 100644
--- a/gcc/analyzer/diagnostic-manager.cc
+++ b/gcc/analyzer/diagnostic-manager.cc
@@ -2060,18 +2060,17 @@ diagnostic_manager::prune_for_sm_diagnostic (checker_path *path,
 	case EK_CALL_EDGE:
 	  {
 	    call_event *event = (call_event *)base_event;
-	    const callgraph_superedge& cg_superedge
-	      = event->get_callgraph_superedge ();
 	    const region_model *callee_model
 	      = event->m_eedge.m_dest->get_state ().m_region_model;
+	    const region_model *caller_model
+	      = event->m_eedge.m_src->get_state ().m_region_model;
 	    tree callee_var = callee_model->get_representative_tree (sval);
 	    /* We could just use caller_model->get_representative_tree (sval);
 	       to get the caller_var, but for now use
 	       map_expr_from_callee_to_caller so as to only record critical
 	       state for parms and the like.  */
 	    callsite_expr expr;
-	    tree caller_var
-	      = cg_superedge.map_expr_from_callee_to_caller (callee_var, &expr);
+	    tree caller_var = caller_model->get_representative_tree (sval);
 	    if (caller_var)
 	      {
 		if (get_logger ())
@@ -2093,15 +2092,11 @@ diagnostic_manager::prune_for_sm_diagnostic (checker_path *path,
 	    if (sval)
 	      {
 		return_event *event = (return_event *)base_event;
-		const callgraph_superedge& cg_superedge
-		  = event->get_callgraph_superedge ();
-		const region_model *caller_model
-		  = event->m_eedge.m_dest->get_state ().m_region_model;
-		tree caller_var = caller_model->get_representative_tree (sval);
 		callsite_expr expr;
-		tree callee_var
-		  = cg_superedge.map_expr_from_caller_to_callee (caller_var,
-								 &expr);
+
+		const region_model *callee_model
+	      	  = event->m_eedge.m_src->get_state ().m_region_model;
+		tree callee_var = callee_model->get_representative_tree (sval);
 		if (callee_var)
 		  {
 		    if (get_logger ())
diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index ee625fbdcdf..773fda144b0 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -1627,6 +1627,50 @@ exploded_node::dump_succs_and_preds (FILE *outf) const
   }
 }
 
+/* class dynamic_call_info_t : public exploded_edge::custom_info_t.  */
+
+/* Implementation of exploded_edge::custom_info_t::update_model vfunc
+   for dynamic_call_info_t.
+
+   Update state for the dynamically discorverd calls */
+
+void
+dynamic_call_info_t::update_model (region_model *model,
+				   const exploded_edge &eedge)
+{
+  const program_state &dest_state = eedge.m_dest->get_state ();
+  *model = *dest_state.m_region_model;
+}
+
+/* Implementation of exploded_edge::custom_info_t::add_events_to_path vfunc
+   for dynamic_call_info_t.  */
+
+void
+dynamic_call_info_t::add_events_to_path (checker_path *emission_path,
+				   const exploded_edge &eedge)
+{
+  const exploded_node *src_node = eedge.m_src;
+  const program_point &src_point = src_node->get_point ();
+  const int src_stack_depth = src_point.get_stack_depth ();
+  const exploded_node *dest_node = eedge.m_dest;
+  const program_point &dest_point = dest_node->get_point ();
+  const int dest_stack_depth = dest_point.get_stack_depth ();
+
+  if (m_is_returning_call)
+    emission_path->add_event (new return_event (eedge, (m_dynamic_call
+	                   			        ? m_dynamic_call->location
+	           	   		                : UNKNOWN_LOCATION),
+	          	      dest_point.get_fndecl (),
+	          	      dest_stack_depth));
+  else
+    emission_path->add_event (new call_event (eedge, (m_dynamic_call
+	                   			      ? m_dynamic_call->location
+	           	   		              : UNKNOWN_LOCATION),
+	          	      src_point.get_fndecl (),
+	          	      src_stack_depth));
+
+}
+
 /* class rewind_info_t : public exploded_edge::custom_info_t.  */
 
 /* Implementation of exploded_edge::custom_info_t::update_model vfunc
@@ -2980,6 +3024,53 @@ state_change_requires_new_enode_p (const program_state &old_state,
   return false;
 }
 
+/* Create enodes and eedges for the function calls that doesn't have an 
+   underlying call superedge.
+
+   Such case occurs when GCC's middle end didn't knew which function to
+   call but analyzer did (for example calls to virtual functions or calls 
+   that happen via function pointer).  */
+
+void
+exploded_graph::create_dynamic_call (const gcall *call,
+                                     tree fn_decl,
+                                     exploded_node *node,
+                                     program_state &next_state,
+                                     program_point &next_point,
+                                     uncertainty_t *uncertainty)
+{
+  const program_point *this_point = &node->get_point ();
+  // assert for fn_decl ?
+  function *fun = DECL_STRUCT_FUNCTION (fn_decl);
+  if (fun)
+    {
+      const supergraph &sg = this->get_supergraph ();
+      supernode * sn_entry = sg.get_node_for_function_entry (fun);
+      supernode * sn_exit = sg.get_node_for_function_exit (fun);
+
+      program_point new_point
+        = program_point::before_supernode (sn_entry,
+				           NULL,
+				           this_point->get_call_string ());
+
+      new_point.push_to_call_stack (sn_exit,
+  				    next_point.get_supernode());
+      next_state.push_call (*this, node, call, uncertainty);
+
+      // TODO: add some logging here regarding dynamic call
+      
+      if (next_state.m_valid)
+        {
+          exploded_node *enode = get_or_create_node (new_point,
+					             next_state,
+					             node);
+          if (enode)
+            add_edge (node,enode, NULL,
+          	      new dynamic_call_info_t (call));
+        }
+     }
+}
+
 /* The core of exploded_graph::process_worklist (the main analysis loop),
    handling one node in the worklist.
 
@@ -3174,10 +3265,13 @@ exploded_graph::process_node (exploded_node *node)
       break;
     case PK_AFTER_SUPERNODE:
       {
+        bool found_a_superedge = false;
+        bool is_an_exit_block = false;
 	/* If this is an EXIT BB, detect leaks, and potentially
 	   create a function summary.  */
 	if (point.get_supernode ()->return_p ())
 	  {
+	    is_an_exit_block = true;
 	    node->detect_leaks (*this);
 	    if (flag_analyzer_call_summaries
 		&& point.get_call_string ().empty_p ())
@@ -3205,6 +3299,7 @@ exploded_graph::process_node (exploded_node *node)
 	superedge *succ;
 	FOR_EACH_VEC_ELT (point.get_supernode ()->m_succs, i, succ)
 	  {
+	    found_a_superedge = true;
 	    if (logger)
 	      logger->log ("considering SN: %i -> SN: %i",
 			   succ->m_src->m_index, succ->m_dest->m_index);
@@ -3214,19 +3309,74 @@ exploded_graph::process_node (exploded_node *node)
 						 point.get_call_string ());
 	    program_state next_state (state);
 	    uncertainty_t uncertainty;
+
+	    /* Try to discover and analyse indirect function calls. */
+            if (succ->m_kind == SUPEREDGE_INTRAPROCEDURAL_CALL
+            	&& !(succ->get_any_callgraph_edge ()))
+              {
+                const gcall *call
+                  = point.get_supernode ()->get_final_call ();
+
+                impl_region_model_context ctxt (*this,
+                                                node,
+                                                &state,
+                                                &next_state,
+                                                &uncertainty,
+                                                point.get_stmt());
+
+                region_model *model = state.m_region_model;
+
+                /* Call is possibly happening via a function pointer.  */
+                if (tree fn_decl = model->get_fndecl_for_call(call,&ctxt))
+                  create_dynamic_call (call, fn_decl, node, next_state,
+                                       next_point, &uncertainty);
+              }
+
 	    if (!node->on_edge (*this, succ, &next_point, &next_state,
-				&uncertainty))
+			        &uncertainty))
 	      {
-		if (logger)
-		  logger->log ("skipping impossible edge to SN: %i",
-			       succ->m_dest->m_index);
-		continue;
+	        if (logger)
+	          logger->log ("skipping impossible edge to SN: %i",
+		               succ->m_dest->m_index);
+	        continue;
 	      }
-	    exploded_node *next = get_or_create_node (next_point, next_state,
-						      node);
+	    exploded_node *next = get_or_create_node (next_point,
+	    					      next_state,
+					              node);
 	    if (next)
 	      add_edge (node, next, succ);
 	  }
+
+        /* Return from the calls which doesn't have a return superedge.
+    	   Such case occurs when GCC's middle end didn't knew which function to
+    	   call but analyzer did.  */
+        if((is_an_exit_block && !found_a_superedge)
+           && (!point.get_call_string ().empty_p ()))
+          {
+            const call_string cs = point.get_call_string ();
+            program_point next_point
+              = program_point::before_supernode (cs.get_caller_node (),
+                                                 NULL, cs);
+            program_state next_state (state);
+            uncertainty_t uncertainty;
+
+            const gcall *call
+              = next_point.get_supernode ()->get_returning_call ();
+
+            if(call)
+              next_state.returning_call (*this, node, call, &uncertainty);
+
+            if (next_state.m_valid)
+              {
+                next_point.pop_from_call_stack ();
+                exploded_node *enode = get_or_create_node (next_point,
+                                                           next_state,
+                                                           node);
+                if (enode)
+                  add_edge (node, enode, NULL,
+                            new dynamic_call_info_t (call, true));
+              }
+          }
       }
       break;
     }
diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h
index 8f48d8a286c..9a2d9522d6c 100644
--- a/gcc/analyzer/exploded-graph.h
+++ b/gcc/analyzer/exploded-graph.h
@@ -362,6 +362,34 @@ private:
   DISABLE_COPY_AND_ASSIGN (exploded_edge);
 };
 
+/* Extra data for an exploded_edge that represents a dynamic call (a call
+   that doesn't have a superedge representing it).  */
+
+class dynamic_call_info_t : public exploded_edge::custom_info_t
+{
+public:
+  dynamic_call_info_t (const gcall *dynamic_call,
+  		       const bool is_returning_call = false)
+  : m_dynamic_call (dynamic_call), 
+    m_is_returning_call (is_returning_call)
+  {}
+
+  void print (pretty_printer *pp) FINAL OVERRIDE
+  {
+    pp_string (pp, "dynamic_call");
+  }
+
+  void update_model (region_model *model,
+		     const exploded_edge &eedge) FINAL OVERRIDE;
+
+  void add_events_to_path (checker_path *emission_path,
+			   const exploded_edge &eedge) FINAL OVERRIDE;
+private:
+  const gcall *m_dynamic_call;
+  const bool m_is_returning_call;
+};
+
+
 /* Extra data for an exploded_edge that represents a rewind from a
    longjmp to a setjmp (or from a siglongjmp to a sigsetjmp).  */
 
@@ -785,6 +813,13 @@ public:
   bool maybe_process_run_of_before_supernode_enodes (exploded_node *node);
   void process_node (exploded_node *node);
 
+  void create_dynamic_call (const gcall *call,
+                            tree fn_decl,
+                            exploded_node *node,
+                            program_state &next_state,
+                            program_point &next_point,
+                            uncertainty_t *uncertainty);
+
   exploded_node *get_or_create_node (const program_point &point,
 				     const program_state &state,
 				     exploded_node *enode_for_diag);
diff --git a/gcc/analyzer/program-point.cc b/gcc/analyzer/program-point.cc
index 2e8d98ada2a..cb2b4e052cf 100644
--- a/gcc/analyzer/program-point.cc
+++ b/gcc/analyzer/program-point.cc
@@ -323,6 +323,24 @@ program_point::to_json () const
   return point_obj;
 }
 
+/* Update the callstack to represent a call from caller to callee.
+
+   Generally used to push a custom call to a particular program point
+   where we don't have a superedge representing the call.  */
+void
+program_point::push_to_call_stack (const supernode *caller,
+				   const supernode *callee)
+{
+  m_call_string.push_call (callee, caller);
+}
+
+/* Pop the topmost call from the current callstack.  */
+void
+program_point::pop_from_call_stack ()
+{
+  m_call_string.pop ();
+}
+
 /* Generate a hash value for this program_point.  */
 
 hashval_t
diff --git a/gcc/analyzer/program-point.h b/gcc/analyzer/program-point.h
index 5f86745cd1e..6bae29b23e8 100644
--- a/gcc/analyzer/program-point.h
+++ b/gcc/analyzer/program-point.h
@@ -293,7 +293,8 @@ public:
   }
 
   bool on_edge (exploded_graph &eg, const superedge *succ);
-
+  void push_to_call_stack (const supernode *caller, const supernode *callee);
+  void pop_from_call_stack ();
   void validate () const;
 
   /* For before_stmt, go to next stmt.  */
diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc
index 5bb86767873..eb05994f0ef 100644
--- a/gcc/analyzer/program-state.cc
+++ b/gcc/analyzer/program-state.cc
@@ -1034,6 +1034,50 @@ program_state::on_edge (exploded_graph &eg,
   return true;
 }
 
+/* Update this program_state to reflect a call to the function
+   represented by CALL_STMT.
+   Currently used only when the call doesn't have a superedge representing
+   it (e.g. a call via a function pointer).  */
+void
+program_state::push_call (exploded_graph &eg,
+      			  exploded_node *enode,
+      			  const gcall *call_stmt,
+      			  uncertainty_t *uncertainty)
+{
+  /* Update state.  */
+  const program_point &point = enode->get_point ();
+  const gimple *last_stmt = point.get_supernode ()->get_last_stmt ();
+
+  impl_region_model_context ctxt (eg, enode,
+          			  &enode->get_state (),
+          			  this,
+          			  uncertainty,
+          			  last_stmt);
+  m_region_model->update_for_gcall (call_stmt, &ctxt);
+}
+
+/* Update this program_state to reflect a return from the function
+   whose call is represented by CALL_STMT.
+   Currently used only when the call doesn't have a superedge representing
+   the return.  */
+void
+program_state::returning_call (exploded_graph &eg,
+      			       exploded_node *enode,
+      			       const gcall *call_stmt,
+      			       uncertainty_t *uncertainty)
+{
+  /* Update state.  */
+  const program_point &point = enode->get_point ();
+  const gimple *last_stmt = point.get_supernode ()->get_last_stmt ();
+
+  impl_region_model_context ctxt (eg, enode,
+          			  &enode->get_state (),
+          			  this,
+          			  uncertainty,
+          			  last_stmt);
+  m_region_model->update_for_return_gcall (call_stmt, &ctxt);
+}
+
 /* Generate a simpler version of THIS, discarding state that's no longer
    relevant at POINT.
    The idea is that we're more likely to be able to consolidate
diff --git a/gcc/analyzer/program-state.h b/gcc/analyzer/program-state.h
index 8dee930665c..658dbb69075 100644
--- a/gcc/analyzer/program-state.h
+++ b/gcc/analyzer/program-state.h
@@ -218,6 +218,17 @@ public:
   void push_frame (const extrinsic_state &ext_state, function *fun);
   function * get_current_function () const;
 
+  void push_call (exploded_graph &eg,
+      		  exploded_node *enode,
+      		  const gcall *call_stmt,
+      		  uncertainty_t *uncertainty);
+
+  void returning_call (exploded_graph &eg,
+      		       exploded_node *enode,
+      		       const gcall *call_stmt,
+      		       uncertainty_t *uncertainty);
+
+
   bool on_edge (exploded_graph &eg,
 		exploded_node *enode,
 		const superedge *succ,
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 6d02c60449c..1e86d1f3bf8 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3139,12 +3139,11 @@ region_model::maybe_update_for_edge (const superedge &edge,
    caller's frame.  */
 
 void
-region_model::update_for_call_superedge (const call_superedge &call_edge,
-					 region_model_context *ctxt)
+region_model::update_for_gcall (const gcall *call_stmt,
+				region_model_context *ctxt)
 {
   /* Build a vec of argument svalues, using the current top
      frame for resolving tree expressions.  */
-  const gcall *call_stmt = call_edge.get_call_stmt ();
   auto_vec<const svalue *> arg_svals (gimple_call_num_args (call_stmt));
 
   for (unsigned i = 0; i < gimple_call_num_args (call_stmt); i++)
@@ -3153,33 +3152,58 @@ region_model::update_for_call_superedge (const call_superedge &call_edge,
       arg_svals.quick_push (get_rvalue (arg, ctxt));
     }
 
-  push_frame (call_edge.get_callee_function (), &arg_svals, ctxt);
+  /* Get the function * from the call.  */
+  tree fn_decl = get_fndecl_for_call (call_stmt,ctxt);
+  function *fun = DECL_STRUCT_FUNCTION (fn_decl);
+  push_frame (fun, &arg_svals, ctxt);
 }
 
 /* Pop the top-most frame_region from the stack, and copy the return
    region's values (if any) into the region for the lvalue of the LHS of
    the call (if any).  */
+
 void
-region_model::update_for_return_superedge (const return_superedge &return_edge,
-					   region_model_context *ctxt)
+region_model::update_for_return_gcall (const gcall *call_stmt,
+             			       region_model_context *ctxt)
 {
   /* Get the region for the result of the call, within the caller frame.  */
   const region *result_dst_reg = NULL;
-  const gcall *call_stmt = return_edge.get_call_stmt ();
   tree lhs = gimple_call_lhs (call_stmt);
   if (lhs)
     {
       /* Normally we access the top-level frame, which is:
-	   path_var (expr, get_stack_depth () - 1)
-	 whereas here we need the caller frame, hence "- 2" here.  */
+         path_var (expr, get_stack_depth () - 1)
+         whereas here we need the caller frame, hence "- 2" here.  */
       gcc_assert (get_stack_depth () >= 2);
       result_dst_reg = get_lvalue (path_var (lhs, get_stack_depth () - 2),
-				   ctxt);
+           			   ctxt);
     }
 
   pop_frame (result_dst_reg, NULL, ctxt);
 }
 
+/* Extract calling information from the superedge and update the model for
+   the call.  */
+
+void
+region_model::update_for_call_superedge (const call_superedge &call_edge,
+					 region_model_context *ctxt)
+{
+  const gcall *call_stmt = call_edge.get_call_stmt ();
+  update_for_gcall (call_stmt,ctxt);
+}
+
+/* Extract calling information from the return superedge and update the
+   model for the returning call.  */
+
+void
+region_model::update_for_return_superedge (const return_superedge &return_edge,
+					   region_model_context *ctxt)
+{
+  const gcall *call_stmt = return_edge.get_call_stmt ();
+  update_for_return_gcall (call_stmt, ctxt);
+}
+
 /* Update this region_model with a summary of the effect of calling
    and returning from CG_SEDGE.
 
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 734ec601237..a15bc9e2f6d 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -589,6 +589,12 @@ class region_model
 			      region_model_context *ctxt,
 			      rejected_constraint **out);
 
+  void update_for_gcall (const gcall *call_stmt,
+			 region_model_context *ctxt);
+  
+  void update_for_return_gcall (const gcall *call_stmt,
+				region_model_context *ctxt);
+
   const region *push_frame (function *fun, const vec<const svalue *> *arg_sids,
 			    region_model_context *ctxt);
   const frame_region *get_current_frame () const { return m_current_frame; }
diff --git a/gcc/analyzer/state-purge.cc b/gcc/analyzer/state-purge.cc
index e82ea87e735..c4050c2cf9a 100644
--- a/gcc/analyzer/state-purge.cc
+++ b/gcc/analyzer/state-purge.cc
@@ -377,18 +377,32 @@ state_purge_per_ssa_name::process_point (const function_point &point,
 	  {
 	    /* Add any intraprocedually edge for a call.  */
 	    if (snode->m_returning_call)
-	      {
-		cgraph_edge *cedge
+	    {
+	      gcall *returning_call = snode->m_returning_call;
+	      cgraph_edge *cedge
 		  = supergraph_call_edge (snode->m_fun,
-					  snode->m_returning_call);
-		gcc_assert (cedge);
-		superedge *sedge
-		  = map.get_sg ().get_intraprocedural_edge_for_call (cedge);
-		gcc_assert (sedge);
-		add_to_worklist
-		  (function_point::after_supernode (sedge->m_src),
-		   worklist, logger);
-	      }
+					  returning_call);
+	      if(!cedge)
+	        {
+	          supernode *callernode 
+	            = map.get_sg ().get_supernode_for_stmt (returning_call);
+
+	          gcc_assert (callernode);
+	          add_to_worklist 
+	            (function_point::after_supernode (callernode),
+		     worklist, logger);
+	        }
+	      else
+	        {
+		  gcc_assert (cedge);
+		  superedge *sedge
+		    = map.get_sg ().get_intraprocedural_edge_for_call (cedge);
+		  gcc_assert (sedge);
+		  add_to_worklist 
+		    (function_point::after_supernode (sedge->m_src),
+		     worklist, logger);
+	         }
+	    }
 	  }
       }
       break;
diff --git a/gcc/analyzer/supergraph.cc b/gcc/analyzer/supergraph.cc
index 8611d0f8689..598965dd7fc 100644
--- a/gcc/analyzer/supergraph.cc
+++ b/gcc/analyzer/supergraph.cc
@@ -183,11 +183,33 @@ supergraph::supergraph (logger *logger)
 	      m_stmt_to_node_t.put (stmt, node_for_stmts);
 	      m_stmt_uids.make_uid_unique (stmt);
 	      if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
-		{
-		  m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
-		  node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), NULL);
-		  m_cgraph_edge_to_caller_next_node.put (edge, node_for_stmts);
-		}
+    		{
+    		  m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
+    		  node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt),
+    		   			     NULL);
+    		  m_cgraph_edge_to_caller_next_node.put (edge, node_for_stmts);
+    		}
+	       else
+	        {
+	          // maybe call is via a function pointer
+	          if (gcall *call = dyn_cast<gcall *> (stmt))
+	          {
+	            cgraph_edge *edge 
+		      = cgraph_node::get (fun->decl)->get_edge (stmt);
+	            if (!edge || !edge->callee)
+	            {
+	              supernode *old_node_for_stmts = node_for_stmts;
+	              node_for_stmts = add_node (fun, bb, call, NULL);
+
+	              superedge *sedge 
+	                = new callgraph_superedge (old_node_for_stmts,
+	                  			   node_for_stmts,
+	                  			   SUPEREDGE_INTRAPROCEDURAL_CALL,
+	                  			   NULL);
+	              add_edge (sedge);
+	            }
+	          }
+	        }
 	    }
 
 	  m_bb_to_final_node.put (bb, node_for_stmts);
diff --git a/gcc/analyzer/supergraph.h b/gcc/analyzer/supergraph.h
index f4090fd5e0e..6ca8b418224 100644
--- a/gcc/analyzer/supergraph.h
+++ b/gcc/analyzer/supergraph.h
@@ -268,6 +268,11 @@ class supernode : public dnode<supergraph_traits>
     return i;
   }
 
+  gcall *get_returning_call () const
+  {
+    return m_returning_call;
+  }
+
   gimple *get_last_stmt () const
   {
     if (m_stmts.length () == 0)
diff --git a/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c b/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
new file mode 100644
index 00000000000..c62510c026f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
@@ -0,0 +1,25 @@
+/* Test to see whether the analyzer detects and analyzes calls via
+   function pointers.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+void fun(int *int_ptr)
+{
+	free(int_ptr); /* { dg-warning "double-'free' of 'int_ptr'" } */
+}
+
+void single_call()
+{
+	int *int_ptr = (int*)malloc(sizeof(int));
+	void (*fun_ptr)(int *) = &fun;
+	(*fun_ptr)(int_ptr);
+}
+
+void double_call()
+{
+	int *int_ptr = (int*)malloc(sizeof(int));
+	void (*fun_ptr)(int *) = &fun;
+	(*fun_ptr)(int_ptr);
+	(*fun_ptr)(int_ptr);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr100546.c b/gcc/testsuite/gcc.dg/analyzer/pr100546.c
new file mode 100644
index 00000000000..3349d4067af
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr100546.c
@@ -0,0 +1,17 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+static void noReturn(const char *str) __attribute__((noreturn));
+static void noReturn(const char *str) {
+    printf("%s\n", str);
+    exit(1);
+}
+
+void (*noReturnPtr)(const char *str) = &noReturn;
+
+int main(int argc, char **argv) {
+    char *str = 0;
+    if (!str)
+        noReturnPtr(__FILE__);
+    return printf("%c\n", *str);
+}
-- 
2.32.0


[-- Attachment #3: v_table.patch --]
[-- Type: application/octet-stream, Size: 7195 bytes --]

From f9bbc8080a5f40c9cbb44addf390c4dcaf233cad Mon Sep 17 00:00:00 2001
From: Ankur Saini <arsenic@sourceware.org>
Date: Thu, 29 Jul 2021 17:20:10 +0530
Subject: [PATCH] analyzer: detect and analyzer vfunc calls

---
 gcc/analyzer/engine.cc        | 56 +++++++++++++++++++++++++++++++----
 gcc/analyzer/program-state.cc |  5 ++--
 gcc/analyzer/program-state.h  |  3 +-
 gcc/analyzer/region-model.cc  | 11 ++++---
 gcc/analyzer/region-model.h   |  3 +-
 5 files changed, 65 insertions(+), 13 deletions(-)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 773fda144b0..ec990ad22ec 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #include <zlib.h>
 #include "plugin.h"
 #include "target.h"
+#include "ipa-utils.h"
 
 /* For an overview, see gcc/doc/analyzer.texi.  */
 
@@ -1242,6 +1243,17 @@ exploded_node::on_stmt (exploded_graph &eg,
 	unknown_side_effects = false;
     }
 
+  /* If the statement is a polymorphic call then assume
+     there are no side effects.  */
+  gimple *call_stmt = const_cast<gimple *>(stmt);
+  if (gcall *call = dyn_cast<gcall *> (call_stmt))
+  {
+    function *fun = this->get_function();
+    cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (call);
+    if ((e && e->indirect_info) && (e->indirect_info->polymorphic))
+        unknown_side_effects = false;
+  }
+
   on_stmt_post (stmt, state, unknown_side_effects, &ctxt);
 
   return on_stmt_flags ();
@@ -3040,7 +3052,6 @@ exploded_graph::create_dynamic_call (const gcall *call,
                                      uncertainty_t *uncertainty)
 {
   const program_point *this_point = &node->get_point ();
-  // assert for fn_decl ?
   function *fun = DECL_STRUCT_FUNCTION (fn_decl);
   if (fun)
     {
@@ -3055,7 +3066,7 @@ exploded_graph::create_dynamic_call (const gcall *call,
 
       new_point.push_to_call_stack (sn_exit,
   				    next_point.get_supernode());
-      next_state.push_call (*this, node, call, uncertainty);
+      next_state.push_call (*this, node, call, uncertainty, fn_decl);
 
       // TODO: add some logging here regarding dynamic call
       
@@ -3327,9 +3338,44 @@ exploded_graph::process_node (exploded_node *node)
                 region_model *model = state.m_region_model;
 
                 /* Call is possibly happening via a function pointer.  */
-                if (tree fn_decl = model->get_fndecl_for_call(call,&ctxt))
-                  create_dynamic_call (call, fn_decl, node, next_state,
-                                       next_point, &uncertainty);
+                if (tree fn_decl = model->get_fndecl_for_call (call,&ctxt))
+                  create_dynamic_call (call,
+                  		       fn_decl,
+                  		       node,
+                  		       next_state,
+                                       next_point,
+                                       &uncertainty);
+                else
+                  {
+                    /* Call is possibly a polymorphic call.
+
+                       In such a case, use the devirtualization tools to
+                       find possible callees of this function call and let
+                       the analyzer speculate them all.  */
+		    function *fun = node->get_function ();
+                    gcall *stmt  = const_cast<gcall *>(call);
+                    cgraph_edge *e
+                      = cgraph_node::get (fun->decl)->get_edge (stmt);
+                    if (e->indirect_info->polymorphic)
+                      {
+			void *cache_token;
+			bool final;
+			vec <cgraph_node *> targets
+			  = possible_polymorphic_call_targets (e,
+							       &final,
+							       &cache_token,
+							       true);
+	       		for (cgraph_node *x : targets)
+	       		  {
+	       		    create_dynamic_call (stmt,
+						 x->decl,
+						 node,
+						 next_state,
+						 next_point,
+						 &uncertainty);
+	       		  }
+                      }
+                  }
               }
 
 	    if (!node->on_edge (*this, succ, &next_point, &next_state,
diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc
index eb05994f0ef..b1c1d67bb8b 100644
--- a/gcc/analyzer/program-state.cc
+++ b/gcc/analyzer/program-state.cc
@@ -1042,7 +1042,8 @@ void
 program_state::push_call (exploded_graph &eg,
       			  exploded_node *enode,
       			  const gcall *call_stmt,
-      			  uncertainty_t *uncertainty)
+      			  uncertainty_t *uncertainty,
+      			  tree fn_decl)
 {
   /* Update state.  */
   const program_point &point = enode->get_point ();
@@ -1053,7 +1054,7 @@ program_state::push_call (exploded_graph &eg,
           			  this,
           			  uncertainty,
           			  last_stmt);
-  m_region_model->update_for_gcall (call_stmt, &ctxt);
+  m_region_model->update_for_gcall(call_stmt, &ctxt, fn_decl);
 }
 
 /* Update this program_state to reflect a return from function
diff --git a/gcc/analyzer/program-state.h b/gcc/analyzer/program-state.h
index 658dbb69075..f4bd4cbcf49 100644
--- a/gcc/analyzer/program-state.h
+++ b/gcc/analyzer/program-state.h
@@ -221,7 +221,8 @@ public:
   void push_call (exploded_graph &eg,
       		  exploded_node *enode,
       		  const gcall *call_stmt,
-      		  uncertainty_t *uncertainty);
+      		  uncertainty_t *uncertainty,
+      		  tree fn_decl = NULL);
 
   void returning_call (exploded_graph &eg,
       		       exploded_node *enode,
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 1e86d1f3bf8..935a8c6431a 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3140,7 +3140,8 @@ region_model::maybe_update_for_edge (const superedge &edge,
 
 void
 region_model::update_for_gcall (const gcall *call_stmt,
-				region_model_context *ctxt)
+				region_model_context *ctxt,
+				tree fn_decl)
 {
   /* Build a vec of argument svalues, using the current top
      frame for resolving tree expressions.  */
@@ -3151,9 +3152,11 @@ region_model::update_for_gcall (const gcall *call_stmt,
       tree arg = gimple_call_arg (call_stmt, i);
       arg_svals.quick_push (get_rvalue (arg, ctxt));
     }
-
-  /* Get the function * from the call.  */
-  tree fn_decl = get_fndecl_for_call (call_stmt,ctxt);
+  
+  /* Get the fn_decl from the call if not provided as argument.  */
+  if (!fn_decl)
+    fn_decl = get_fndecl_for_call (call_stmt,ctxt);
+ 
   function *fun = DECL_STRUCT_FUNCTION (fn_decl);
   push_frame (fun, &arg_svals, ctxt);
 }
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index a15bc9e2f6d..f06788771dc 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -590,7 +590,8 @@ class region_model
 			      rejected_constraint **out);
 
   void update_for_gcall (const gcall *call_stmt,
-			 region_model_context *ctxt);
+			 region_model_context *ctxt,
+			 tree fn_decl = NULL);
   
   void update_for_return_gcall (const gcall *call_stmt,
 				region_model_context *ctxt);
-- 
2.32.0


[-- Attachment #4: Type: text/plain, Size: 20 bytes --]



Thank you 
- Ankur

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-07-29 12:50                                                         ` Ankur Saini
@ 2021-07-30  0:05                                                           ` David Malcolm
       [not found]                                                             ` <ACE21DBF-8163-4F28-B755-6B05FDA27A0E@gmail.com>
  0 siblings, 1 reply; 45+ messages in thread
From: David Malcolm @ 2021-07-30  0:05 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Thu, 2021-07-29 at 18:20 +0530, Ankur Saini wrote:
> I have attached the patches (one is the updated version of the previous
> patch to detect calls via function pointers) with the changes made to
> make the analyzer understand calls to virtual functions, for initial
> review.
> 
> 1. I decided to make a dedicated function to create enodes and eedges
> for the dynamically discovered calls, as I found myself using the exact
> same piece of code again to analyse vfunc calls.

Makes sense.

> 
> 2. Bootstrapping and testing of these changes are underway.
> 
> 3. Regarding the regression tests that have to be added to test the
> functionality of the vfunc extension patch:
> Should I add many test files for different types of inheritance, or
> should I add one (or two) test files with a lot of functions in them
> testing different types of calls?

Both approaches have merit, and there's an element of personal taste.

I find that during development and debugging it's handy to have the
tests broken out into individual files, but it's good to eventually
combine the tests to minimize the number of invocations that the test
harness has to do.

That said, interprocedural tests tend to be fiddly, so it's often good
to keep these in separate files.

I tend to combine my tests and add them to git, and then to temporarily
trim them down when debugging them to minimize the amount of unrelated
stuff I'm having to look at when debugging, knowing that git has the
full version saved.

I hope that answers your question.

> 
> ---
> Patches :

This isn't a full review, but...

fn_ptr.patch:

> diff --git a/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c b/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
> new file mode 100644
> index 00000000000..c62510c026f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
> @@ -0,0 +1,25 @@
> +/* Test to see whether the analyzer detects and analyzes calls via
> +   function pointers.  */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +
> +void fun(int *int_ptr)
> +{
> +	free(int_ptr); /* { dg-warning "double-'free' of 'int_ptr'" } */
> +}
> +
> +void single_call()
> +{
> +	int *int_ptr = (int*)malloc(sizeof(int));
> +	void (*fun_ptr)(int *) = &fun;
> +	(*fun_ptr)(int_ptr);
> +}
> +
> +void double_call()
> +{
> +	int *int_ptr = (int*)malloc(sizeof(int));
> +	void (*fun_ptr)(int *) = &fun;
> +	(*fun_ptr)(int_ptr);
> +	(*fun_ptr)(int_ptr);
> +}

...thinking back to our discussion about events, it would be good to
verify that the analyzer is emitting them.  You can put directives
like:

   /* { dg-message "calling 'fun' from 'double_call'" } */

on the appropriate lines to test this via DejaGnu.
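
For instance, a minimal sketch of how the test might look with such a
directive, assuming the call event is reported on the line of the call
itself (the exact wording and placement would need checking against the
analyzer's actual output). The free_calls counter is hypothetical and is
only there so the sketch can be exercised at run time; double_call () is
meant to be analyzed, not executed, since it really does free twice:

```cpp
#include <stdlib.h>

/* Hypothetical counter, not part of the original test; it only lets
   the sketch be checked at run time.  */
static int free_calls = 0;

void fun (int *int_ptr)
{
  ++free_calls;
  free (int_ptr); /* { dg-warning "double-'free' of 'int_ptr'" } */
}

void single_call ()
{
  int *int_ptr = (int *) malloc (sizeof (int));
  void (*fun_ptr) (int *) = &fun;
  (*fun_ptr) (int_ptr);
}

/* Only meant to be analyzed: running it would free the buffer twice.  */
void double_call ()
{
  int *int_ptr = (int *) malloc (sizeof (int));
  void (*fun_ptr) (int *) = &fun;
  (*fun_ptr) (int_ptr);
  (*fun_ptr) (int_ptr); /* { dg-message "calling 'fun' from 'double_call'" } */
}
```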

"analyzer: detect and analyzer vfunc calls"

[...snip...]

> @@ -1242,6 +1243,17 @@ exploded_node::on_stmt (exploded_graph &eg,
>  	unknown_side_effects = false;
>      }
>  
> +  /* If the statement is a polymorphic call then assume
> +     there are no side effects.  */
> +  gimple *call_stmt = const_cast<gimple *>(stmt);
> +  if (gcall *call = dyn_cast<gcall *> (call_stmt))
> +  {
> +    function *fun = this->get_function();
> +    cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (call);
> +    if ((e && e->indirect_info) && (e->indirect_info->polymorphic))
> +        unknown_side_effects = false;
> +  }
> +

This seems wrong; surely it depends on what the call is - or am I
missing something?  Is the issue that we're speculating lots of
possibilities as dynamic calls?  If so, would it be better to terminate
the remaining analysis path (if that makes sense), and assume that any
further analysis happens on extra edges added for the speculated calls?

FWIW I've been experimenting with adding "bifurcation" support so that
you can do:
  program_state *other = ctxt->bifurcate ();
and have it split the analysis into states (e.g. for handling realloc,
so that we can split into 3 states: "succeeded", "succeeded but moved",
"failed").  Unfortunately my code for this is a mess (it's a hacked up
prototype).  Should I try to post what I have for this?
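
To make the idea concrete, here is a toy sketch of the three-way realloc
split. It is only an illustration under stated assumptions: toy_state and
bifurcate_for_realloc are hypothetical names, and none of this is the
analyzer's program_state or the prototype's actual interface.

```cpp
#include <string>
#include <vector>

// Toy stand-in for an analysis state; NOT the analyzer's program_state.
struct toy_state
{
  std::string outcome;   // label for this branch of the analysis
  bool old_ptr_valid;    // may the pointer passed to realloc still be used?
  bool result_is_null;   // did realloc return NULL on this branch?
};

// Hypothetical helper: split one incoming state into the three realloc
// outcomes, each of which would then be analysed on its own path.
std::vector<toy_state>
bifurcate_for_realloc (const toy_state &in)
{
  std::vector<toy_state> out (3, in);   // three copies of the incoming state

  out[0].outcome = "succeeded";         // buffer resized in place
  out[0].old_ptr_valid = true;
  out[0].result_is_null = false;

  out[1].outcome = "succeeded but moved"; // old buffer was released
  out[1].old_ptr_valid = false;
  out[1].result_is_null = false;

  out[2].outcome = "failed";            // NULL returned, old buffer untouched
  out[2].old_ptr_valid = true;
  out[2].result_is_null = true;

  return out;
}
```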


[...snip...]


> @@ -3327,9 +3338,44 @@ exploded_graph::process_node (exploded_node *node)
>                  region_model *model = state.m_region_model;
>  
>                  /* Call is possibly happening via a function pointer.  */
> -                if (tree fn_decl = model->get_fndecl_for_call(call,&ctxt))
> -                  create_dynamic_call (call, fn_decl, node, next_state,
> -                                       next_point, &uncertainty);
> +                if (tree fn_decl = model->get_fndecl_for_call (call,&ctxt))
> +                  create_dynamic_call (call,
> +                  		       fn_decl,
> +                  		       node,
> +                  		       next_state,
> +                                       next_point,
> +                                       &uncertainty);
> +                else
> +                  {
> +                    /* Call is possibly a polymorphic call.
> +
> +                       In such a case, use the devirtualization tools to
> +                       find possible callees of this function call and let
> +                       the analyzer speculate them all.  */
> +		    function *fun = node->get_function ();
> +                    gcall *stmt  = const_cast<gcall *>(call);
> +                    cgraph_edge *e
> +                      = cgraph_node::get (fun->decl)->get_edge (stmt);
> +                    if (e->indirect_info->polymorphic)
> +                      {
> +			void *cache_token;
> +			bool final;
> +			vec <cgraph_node *> targets
> +			  = possible_polymorphic_call_targets (e,
> +							       &final,
> +							       &cache_token,
> +							       true);
> +	       		for (cgraph_node *x : targets)
> +	       		  {
> +	       		    create_dynamic_call (stmt,
> +						 x->decl,
> +						 node,
> +						 next_state,
> +						 next_point,
> +						 &uncertainty);
> +	       		  }
> +                      }
> +                  }

Interesting.

Caveat: I'm not familiar with the devirt code.

If we're speculating that a particular call happens, do we gain
information about what kind of object we're dealing with?

If we have a repeated call to the same vfunc, do we know that we're
calling the same function?
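
A small sketch of the situation I have in mind (Base, Derived and
same_target_twice are hypothetical names, used only for illustration).
At run time both calls below must reach the same override, so an analysis
that picks a target independently for each call could explore a path
(e.g. Base::id then Derived::id) that can never happen:

```cpp
struct Base
{
  virtual ~Base () {}
  virtual int id () const { return 0; }
};

struct Derived : Base
{
  int id () const override { return 1; }
};

// Two calls through the same pointer, with nothing in between that could
// change the dynamic type of *p, dispatch to the same override.
bool same_target_twice (const Base *p)
{
  int first = p->id ();
  int second = p->id ();
  return first == second;
}
```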

[...snip...]

I'm interested in seeing what test cases you have.


Hope this is helpful
Dave


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
       [not found]                                                             ` <ACE21DBF-8163-4F28-B755-6B05FDA27A0E@gmail.com>
@ 2021-07-30 14:48                                                               ` David Malcolm
  2021-08-03 16:12                                                                 ` Ankur Saini
  0 siblings, 1 reply; 45+ messages in thread
From: David Malcolm @ 2021-07-30 14:48 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Fri, 2021-07-30 at 18:11 +0530, Ankur Saini wrote:
> 
> 
> > On 30-Jul-2021, at 5:35 AM, David Malcolm <dmalcolm@redhat.com>
> > wrote:
> > 
> > On Thu, 2021-07-29 at 18:20 +0530, Ankur Saini wrote:

[..snip...]
> > > 
> 
> > 
> > > @@ -1242,6 +1243,17 @@ exploded_node::on_stmt (exploded_graph &eg,
> > >         unknown_side_effects = false;
> > >     }
> > > 
> > > > +  /* If the statement is a polymorphic call then assume
> > > > +     there are no side effects.  */
> > > +  gimple *call_stmt = const_cast<gimple *>(stmt);
> > > +  if (gcall *call = dyn_cast<gcall *> (call_stmt))
> > > +  {
> > > +    function *fun = this->get_function();
> > > > +    cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (call);
> > > > +    if ((e && e->indirect_info) && (e->indirect_info->polymorphic))
> > > +        unknown_side_effects = false;
> > > +  }
> > > +
> > 
> > This seems wrong; surely it depends on what the call is - or am I
> > missing something?  Is the issue that we're speculating lots of
> > possibilities as dynamic calls?  If so, would it be better to
> > terminate
> > the remaining analysis path (if that makes sense), and assume that
> > any
> > further analysis happens on extra edges added for the speculated
> > calls?
> 
> Actually the issue here was that the analyzer was not able to find the
> body of the callee function, so it was treating all polymorphic calls as
> “calls to unknown functions”, resetting all of the state machines and
> not generating the desired diagnostics. I just changed it to assume
> there are no side effects of the call for now.
> Should I maybe check for the unknown call later (when the analyzer
> knows which function it is calling)?

I'm not sure.  I think it at least warrants a "FIXME" style comment in
the code above.  It also suggests some more test cases, to cover "call
has side effects" vs "call doesn't have side effects": e.g. a
case where, say, A::foo() modifies a global variable, and B::foo()
doesn't; and something like:
  int saved_g = g;
  a->foo ();
  __analyzer_eval (saved_g == g);
could be used in DejaGnu to see if we know what got called.
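
A minimal sketch of such a pair of classes, assuming B derives from A
(the class layout and the helper g_unchanged_after_foo are hypothetical).
In an actual DejaGnu test the comparison would be passed to
__analyzer_eval rather than returned:

```cpp
int g = 0;

struct A
{
  virtual ~A () {}
  virtual void foo () { g++; }   // has a side effect on the global
};

struct B : A
{
  void foo () override {}        // leaves the global alone
};

// Returns whether g is unchanged across the virtual call; which answer
// is right depends on the override that actually gets called.
bool g_unchanged_after_foo (A *a)
{
  int saved_g = g;
  a->foo ();
  return saved_g == g;
}
```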


> 
> > 
> > FWIW I've been experimenting with adding "bifurcation" support so
> > that
> > you can do:
> >  program_state *other = ctxt->bifurcate ();
> > and have it split the analysis into states (e.g. for handling
> > realloc,
> > so that we can split into 3 states: "succeeded", "succeeded but
> > moved",
> > "failed").  Unfortunately my code for this is a mess (it's a hacked
> > up
> > prototype).  Should I try to post what I have for this?
> 
> This surely looks like a thing which the project can take advantage of,
> maybe by bifurcating the analysis at virtual function calls. 

(nods)

I have some non-analyzer tasks to focus on today, so I'm not going to
have that code ready until next week.
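
As a rough illustration of the three-way split for realloc described
above, here is a toy model.  It is plain C++ and uses nothing from the
analyzer: toy_state and bifurcate_on_realloc are made-up names, and the
real ctxt->bifurcate () interface may look quite different.

```cpp
#include <cassert>
#include <string>
#include <vector>

/* Toy stand-in for an analysis state; not the analyzer's
   program_state.  */
struct toy_state
{
  std::string outcome; /* Which realloc outcome this path assumes.  */
  bool old_ptr_valid;  /* May the old pointer still be used?  */
  bool new_ptr_null;   /* Is the returned pointer NULL?  */
};

/* Split one incoming state into the three outcomes of realloc,
   one state per outcome.  */
std::vector<toy_state>
bifurcate_on_realloc ()
{
  return {
    {"succeeded", true, false},            /* Resized in place.  */
    {"succeeded but moved", false, false}, /* Old block was freed.  */
    {"failed", true, true},                /* Old block untouched.  */
  };
}

/* Check the properties each path is meant to capture.  */
void
check_bifurcation ()
{
  std::vector<toy_state> states = bifurcate_on_realloc ();
  assert (states.size () == 3);
  assert (!states[1].old_ptr_valid);
  assert (states[2].new_ptr_null);
}
```

Each returned state would then be analysed as a separate path.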

[..snip...]

> > 
> > If we're speculating that a particular call happens, do we gain
> > information about what kind of object we're dealing with?
> 
> I think I used the wrong wording there, we are not quite "speculating”
> the calls, but are calling and analysing them instead.
> 
> > 
> > If we have a repeated call to the same vfunc, do we know that we're
> > calling the same function?
> 
> yes, this is a problem I didn’t foresee. When a call can have multiple
> targets, the analyser acts as if all of those functions are called from
> that point and then on second function call, it will do the same,
> leading to some weird diagnostic results. Looks like this is where the
> "bifurcation” might come in handy
> 
> But for a lot of simple cases, apparently
> possible_polymorphic_call_targets () does the job well enough to give
> out only a single, accurate target ( even when a base class pointer is
> used )
> 
> > 
> > [...snip...]
> > 
> > I'm interested in seeing what test cases you have.
> 
> I was testing it on a couple simple test programs ( which I think are
> not enough now ). 
> - - - 
> 1.  https://godbolt.org/z/qboq35bar (
> analyser doesn’t generate any warnings as this one was just to check if
> the analyser is detecting calls correctly or not  )
> 
> super graph and exploded graph :-

FWIW some of these examples are hitting the complexity limit in
eg_traits::dump_args_t::show_enode_details_p and thus not showing the
details, (which is probably a good thing; we don't want to be sending
huge attachments to the list).  Unfortunately, beyond a certain point,
the .dot dumps get unreadable.

Looking at test_1.cpp.eg.dot I see that after the constructor runs, the
state has e.g.:

  cluster for A a: (&constexpr int (* A::_ZTV1A [4])(...)+(sizetype)16)

which I think means that we "know" that the object vtable is the vtable
for A i.e. that this object is an A:

  $ echo "_ZTV1A" | c++filt
  vtable for A

The cluster dump is simplified if the value is for the whole object,
which is happening here, so the dump might be clearer with an example
that adds some member data in the ctor.  For that case, in the state
after the ctor returns I expect you will see the cluster has a ptr to
the vtable, then the member data.

So in an interprocedural example where a ctor runs, the state should
capture what actual subclass the instance is after the ctor has
returned.

I wonder if it's possible to make use of that knowledge when handling a
virtual function call.

An example might be:

struct A
{
  A (int data) : m_data (data) {}
  virtual ~A () {}
  virtual char foo () const { return 'A'; }
  int m_data;
};
struct B : public A
{
  B (int data) : A (data) {}
  virtual char foo () const { return 'B'; }
};

__attribute__((noinline)) A *make_a (int v) { return new A(v); }
__attribute__((noinline)) B *make_b (int v) { return new B(v); }

void test_base (int v)
{
  A *base = make_a (v);
  __analyzer_eval (base->foo () == 'A'); // should be TRUE
  delete base;
}

void test_sub (int v)
{
  A *sub = make_b (v);
  __analyzer_eval (sub->foo () == 'B'); // should be TRUE
  delete sub;
}

where the "noinline" factory functions should hide the specific
subclasses from the optimizer, so the analyzer has to rely on the
vtables bound to the clusters in the store, but in theory can tell that
"base" is specifically an A, and that "sub" is specifically a B, and
hence "know" exactly what is called at each foo vfunc callsite.

(caveat: untested code)

Hope the above makes sense.

Dave



* Re: daily report on extending static analyzer project [GSoC]
  2021-07-30 14:48                                                               ` David Malcolm
@ 2021-08-03 16:12                                                                 ` Ankur Saini
  2021-08-04 16:02                                                                   ` Ankur Saini
  0 siblings, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-08-03 16:12 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

AIM: 

- Transfer the vfunc handling code to region_model::get_fndecl_for_call ()
- Filter the possible targets of a polymorphic call down to the single most probable target

---
PROGRESS :

- I decided to move the virtual-call detection code into region_model::get_fndecl_for_call () so that the analyzer effectively "devirtualises" polymorphic calls and returns a single, accurate fn_decl for the target.
This makes it possible to remove the workaround where the analyzer assumed that a polymorphic call has no side effects when analysing a call stmt.

- The analyzer is better able to see through a polymorphic call because the state of the exploded node at the time of the call knows which subclass the pointer used to call the vfunc actually points to.
( here is an example showing this: https://godbolt.org/z/8MWx58dWo )

- So currently I am working on a way to extract this info from the state and use it to find the most accurate target amongst all the possible targets of a polymorphic call that we already have, and let the analyzer call only one function at the callsite.
The current idea is to compare the DECL_CONTEXT of both ( the possible fn_decl and the pointee of the pointer used to call the vfunc ) to see if we find a match.
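
The matching idea can be sketched with a toy model.  This is plain C++
rather than GCC internals: toy_target and pick_target are made-up names,
and the string "context" merely stands in for what DECL_CONTEXT would give
for a candidate fn_decl.

```cpp
#include <cassert>
#include <string>
#include <vector>

/* Toy stand-in for a candidate fn_decl: the class it is declared in
   plus its name.  */
struct toy_target
{
  std::string context;
  std::string name;
};

/* Return the candidate whose context matches DYNAMIC_TYPE (the type
   the state says the pointer actually points to), or nullptr if no
   candidate matches.  */
const toy_target *
pick_target (const std::vector<toy_target> &candidates,
             const std::string &dynamic_type)
{
  for (const toy_target &t : candidates)
    if (t.context == dynamic_type)
      return &t;
  return nullptr;
}

/* Two possible targets for a call to foo through an A pointer.  */
void
check_pick_target ()
{
  std::vector<toy_target> candidates = {{"A", "foo"}, {"B", "foo"}};
  const toy_target *t = pick_target (candidates, "B");
  assert (t && t->context == "B");
  assert (pick_target (candidates, "C") == nullptr);
}
```

Note that in this toy version a subclass that inherits foo without
overriding it finds no match, so a real implementation would presumably
have to handle that case as well.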

---
STATUS AT THE END OF THE DAY :- 

- Transfer the vfunc handling code to region_model::get_fndecl_for_call () ( done )
- Filter the possible targets of a polymorphic call down to the single most probable target ( pending )

Thank you
- Ankur


* Re: daily report on extending static analyzer project [GSoC]
  2021-08-03 16:12                                                                 ` Ankur Saini
@ 2021-08-04 16:02                                                                   ` Ankur Saini
  2021-08-04 23:26                                                                     ` David Malcolm
  0 siblings, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-08-04 16:02 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

[-- Attachment #1: Type: text/plain, Size: 5868 bytes --]

AIM for today: 

- Extract out the pointer that is being used to call the vfunc from the current region.
- Search its regions to find out which subclass the pointer is actually pointing to.
- Make use of this information to filter out one most probable call to function out of all of the possible functions that can be called at that callsite.

—

PROGRESS :

- From observation, a typical vfunc call that isn't devirtualised by the compiler's front end looks something like this:
"OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5(D))"
where "a_ptr_5(D)" is the pointer that is being used to call the virtual function.

- We can access its region to see the type of the object the pointer is actually pointing to.

- This is then used to find, among all the possible targets of that polymorphic call, the one whose DECL_CONTEXT is the type of that object.

- The changes can be tested on refs/users/arsenic/heads/polymorphic_cal branch of the repository.

- I tested the changes on the following test program ( https://godbolt.org/z/fqrsE1d84 )

And it successfully provided the following analysis :

```
/Users/ankursaini/Desktop/test.cpp: In member function ‘virtual int B::deallocate()’:
/Users/ankursaini/Desktop/test.cpp:25:13: warning: double-‘free’ of ‘b.B::ptr’ [CWE-415] [-Wanalyzer-double-free]
   25 |         free(ptr);
      |         ~~~~^~~~~
  ‘void test()’: events 1-2
    |
    |   35 | void test()
    |      |      ^~~~
    |      |      |
    |      |      (1) entry to ‘test’
    |......
    |   40 |     b.allocate();
    |      |     ~~~~~~~~~~~~
    |      |               |
    |      |               (2) calling ‘B::allocate’ from ‘test’
    |
    +--> ‘void B::allocate()’: events 3-4
           |
           |   19 |     void allocate ()
           |      |          ^~~~~~~~
           |      |          |
           |      |          (3) entry to ‘B::allocate’
           |   20 |     {
           |   21 |         ptr = (int*)malloc(sizeof(int));
           |      |                     ~~~~~~~~~~~~~~~~~~~
           |      |                           |
           |      |                           (4) allocated here
           |
    <------+
    |
  ‘void test()’: events 5-6
    |
    |   40 |     b.allocate();
    |      |     ~~~~~~~~~~^~
    |      |               |
    |      |               (5) returning to ‘test’ from ‘B::allocate’
    |   41 |     foo(aptr);
    |      |     ~~~~~~~~~  
    |      |        |
    |      |        (6) calling ‘foo’ from ‘test’
    |
    +--> ‘void foo(A*)’: events 7-8
           |
           |   30 | void foo(A *a_ptr)
           |      |      ^~~
           |      |      |
           |      |      (7) entry to ‘foo’
           |   31 | {
           |   32 |     printf("%d\n",a_ptr->deallocate());
           |      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           |      |           |
           |      |           (8) calling ‘B::deallocate’ from ‘foo’
           |
           +--> ‘virtual int B::deallocate()’: events 9-10
                  |
                  |   23 |     int deallocate (void)
                  |      |         ^~~~~~~~~~
                  |      |         |
                  |      |         (9) entry to ‘B::deallocate’
                  |   24 |     {
                  |   25 |         free(ptr);
                  |      |         ~~~~~~~~~
                  |      |             |
                  |      |             (10) first ‘free’ here
                  |
           <------+
           |
         ‘void foo(A*)’: event 11
           |
           |   32 |     printf("%d\n",a_ptr->deallocate());
           |      |     ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
           |      |           |
           |      |           (11) returning to ‘foo’ from ‘B::deallocate’
           |
    <------+
    |
  ‘void test()’: events 12-13
    |
    |   41 |     foo(aptr);
    |      |     ~~~^~~~~~
    |      |        |
    |      |        (12) returning to ‘test’ from ‘foo’
    |......
    |   45 |     foo(aptr);
    |      |     ~~~~~~~~~
    |      |        |
    |      |        (13) calling ‘foo’ from ‘test’
    |
    +--> ‘void foo(A*)’: events 14-15
           |
           |   30 | void foo(A *a_ptr)
           |      |      ^~~
           |      |      |
           |      |      (14) entry to ‘foo’
           |   31 | {
           |   32 |     printf("%d\n",a_ptr->deallocate());
           |      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           |      |           |
           |      |           (15) calling ‘B::deallocate’ from ‘foo’
           |
           +--> ‘virtual int B::deallocate()’: events 16-17
                  |
                  |   23 |     int deallocate (void)
                  |      |         ^~~~~~~~~~
                  |      |         |
                  |      |         (16) entry to ‘B::deallocate’
                  |   24 |     {
                  |   25 |         free(ptr);
                  |      |         ~~~~~~~~~
                  |      |             |
                  |      |             (17) second ‘free’ here; first ‘free’ was at (10)
                  |
```
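
For readers who cannot open the godbolt link, a program of roughly the
following shape would be consistent with the trace above.  This is a
reconstruction from the source lines quoted in the diagnostic, not the
actual test file: the body of A::deallocate and its return value, the
return value of B::deallocate, and the test_single_free function are
assumptions.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

struct A
{
  /* Assumed base implementation; only B's version appears in the
     trace.  */
  virtual int deallocate (void) { return 42; }
};

struct B : public A
{
  int *ptr;
  void allocate ()
  {
    ptr = (int*)malloc(sizeof(int));
  }
  int deallocate (void)
  {
    free(ptr);
    return 0;
  }
};

void foo(A *a_ptr)
{
  printf("%d\n",a_ptr->deallocate());
}

/* Meant to be analysed, not run: the second call frees b.ptr again.  */
void test()
{
  B b;
  A *aptr = &b;
  b.allocate();
  foo(aptr);
  foo(aptr);
}

/* For contrast (not in the trace): one allocate, one deallocate.  */
void test_single_free()
{
  B b;
  A *aptr = &b;
  b.allocate();
  int result = aptr->deallocate();
  assert(result == 0);
}
```

The call through a_ptr in foo is the polymorphic call; the trace shows it
being resolved to B::deallocate both times.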

—

STATUS AT THE END OF THE DAY :- 

- Extract out the pointer that is being used to call the vfunc from the current region. (done)
- Search its regions to find out which subclass the pointer is actually pointing to. (done)
- Make use of this information to filter out one most probable call to function out of all of the possible functions that can be called at that call-site. (done)

--- 

Patch file ( prototype ) : 



[-- Attachment #2: indirect_calls.patch --]
[-- Type: application/octet-stream, Size: 41244 bytes --]

From 57120cbe7269e857be179a7adc024ffca618d074 Mon Sep 17 00:00:00 2001
From: Ankur Saini <arsenic@sourceware.org>
Date: Thu, 29 Jul 2021 15:48:07 +0530
Subject: [PATCH 1/2] analyzer: detect and analyze calls via function pointer

---
 gcc/analyzer/analysis-plan.cc                 |   4 +
 gcc/analyzer/checker-path.cc                  |  28 +--
 gcc/analyzer/checker-path.h                   |   6 +
 gcc/analyzer/diagnostic-manager.cc            |  19 +-
 gcc/analyzer/engine.cc                        | 167 +++++++++++++++++-
 gcc/analyzer/exploded-graph.h                 |  35 ++++
 gcc/analyzer/program-point.cc                 |  18 ++
 gcc/analyzer/program-point.h                  |   3 +-
 gcc/analyzer/program-state.cc                 |  44 +++++
 gcc/analyzer/program-state.h                  |  11 ++
 gcc/analyzer/region-model.cc                  |  44 +++--
 gcc/analyzer/region-model.h                   |   6 +
 gcc/analyzer/state-purge.cc                   |  36 ++--
 gcc/analyzer/supergraph.cc                    |  32 +++-
 gcc/analyzer/supergraph.h                     |   5 +
 .../gcc.dg/analyzer/function-ptr-4.c          |  25 +++
 gcc/testsuite/gcc.dg/analyzer/pr100546.c      |  17 ++
 17 files changed, 444 insertions(+), 56 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr100546.c

diff --git a/gcc/analyzer/analysis-plan.cc b/gcc/analyzer/analysis-plan.cc
index 7dfc48e9c3e..57a6dcb1f6e 100644
--- a/gcc/analyzer/analysis-plan.cc
+++ b/gcc/analyzer/analysis-plan.cc
@@ -109,6 +109,10 @@ analysis_plan::use_summary_p (const cgraph_edge *edge) const
   if (!flag_analyzer_call_summaries)
     return false;
 
+  /* Don't use call summaries if there is no callgraph edge */
+  if (!edge || !edge->callee)
+    return false;
+
   /* TODO: don't count callsites each time.  */
   int num_call_sites = 0;
   const cgraph_node *callee = edge->callee;
diff --git a/gcc/analyzer/checker-path.cc b/gcc/analyzer/checker-path.cc
index e10c8e2bb7c..e132f003470 100644
--- a/gcc/analyzer/checker-path.cc
+++ b/gcc/analyzer/checker-path.cc
@@ -614,7 +614,11 @@ call_event::call_event (const exploded_edge &eedge,
 			location_t loc, tree fndecl, int depth)
 : superedge_event (EK_CALL_EDGE, eedge, loc, fndecl, depth)
 {
-  gcc_assert (eedge.m_sedge->m_kind == SUPEREDGE_CALL);
+  if (eedge.m_sedge)
+    gcc_assert (eedge.m_sedge->m_kind == SUPEREDGE_CALL);
+
+   m_src_snode = eedge.m_src->get_supernode ();
+   m_dest_snode = eedge.m_dest->get_supernode ();
 }
 
 /* Implementation of diagnostic_event::get_desc vfunc for
@@ -638,8 +642,8 @@ call_event::get_desc (bool can_colorize) const
       label_text custom_desc
 	= m_pending_diagnostic->describe_call_with_state
 	    (evdesc::call_with_state (can_colorize,
-				      m_sedge->m_src->m_fun->decl,
-				      m_sedge->m_dest->m_fun->decl,
+				      m_src_snode->m_fun->decl,
+				      m_dest_snode->m_fun->decl,
 				      var,
 				      m_critical_state));
       if (custom_desc.m_buffer)
@@ -648,8 +652,8 @@ call_event::get_desc (bool can_colorize) const
 
   return make_label_text (can_colorize,
 			  "calling %qE from %qE",
-			  m_sedge->m_dest->m_fun->decl,
-			  m_sedge->m_src->m_fun->decl);
+			  m_dest_snode->m_fun->decl,
+			  m_src_snode->m_fun->decl);
 }
 
 /* Override of checker_event::is_call_p for calls.  */
@@ -668,7 +672,11 @@ return_event::return_event (const exploded_edge &eedge,
 			    location_t loc, tree fndecl, int depth)
 : superedge_event (EK_RETURN_EDGE, eedge, loc, fndecl, depth)
 {
-  gcc_assert (eedge.m_sedge->m_kind == SUPEREDGE_RETURN);
+  if (eedge.m_sedge)
+    gcc_assert (eedge.m_sedge->m_kind == SUPEREDGE_RETURN);
+
+  m_src_snode = eedge.m_src->get_supernode ();
+  m_dest_snode = eedge.m_dest->get_supernode ();
 }
 
 /* Implementation of diagnostic_event::get_desc vfunc for
@@ -694,16 +702,16 @@ return_event::get_desc (bool can_colorize) const
       label_text custom_desc
 	= m_pending_diagnostic->describe_return_of_state
 	    (evdesc::return_of_state (can_colorize,
-				      m_sedge->m_dest->m_fun->decl,
-				      m_sedge->m_src->m_fun->decl,
+				      m_dest_snode->m_fun->decl,
+				      m_src_snode->m_fun->decl,
 				      m_critical_state));
       if (custom_desc.m_buffer)
 	return custom_desc;
     }
   return make_label_text (can_colorize,
 			  "returning to %qE from %qE",
-			  m_sedge->m_dest->m_fun->decl,
-			  m_sedge->m_src->m_fun->decl);
+			  m_dest_snode->m_fun->decl,
+			  m_src_snode->m_fun->decl);
 }
 
 /* Override of checker_event::is_return_p for returns.  */
diff --git a/gcc/analyzer/checker-path.h b/gcc/analyzer/checker-path.h
index 1843c4bc7b4..27634c20864 100644
--- a/gcc/analyzer/checker-path.h
+++ b/gcc/analyzer/checker-path.h
@@ -338,6 +338,9 @@ public:
   label_text get_desc (bool can_colorize) const FINAL OVERRIDE;
 
   bool is_call_p () const FINAL OVERRIDE;
+
+  const supernode *m_src_snode;
+  const supernode *m_dest_snode;
 };
 
 /* A concrete event subclass for an interprocedural return.  */
@@ -351,6 +354,9 @@ public:
   label_text get_desc (bool can_colorize) const FINAL OVERRIDE;
 
   bool is_return_p () const FINAL OVERRIDE;
+
+  const supernode *m_src_snode;
+  const supernode *m_dest_snode;
 };
 
 /* A concrete event subclass for the start of a consolidated run of CFG
diff --git a/gcc/analyzer/diagnostic-manager.cc b/gcc/analyzer/diagnostic-manager.cc
index 631fef6ad78..d7d9fa4c3d8 100644
--- a/gcc/analyzer/diagnostic-manager.cc
+++ b/gcc/analyzer/diagnostic-manager.cc
@@ -2060,18 +2060,17 @@ diagnostic_manager::prune_for_sm_diagnostic (checker_path *path,
 	case EK_CALL_EDGE:
 	  {
 	    call_event *event = (call_event *)base_event;
-	    const callgraph_superedge& cg_superedge
-	      = event->get_callgraph_superedge ();
 	    const region_model *callee_model
 	      = event->m_eedge.m_dest->get_state ().m_region_model;
+	    const region_model *caller_model
+	      = event->m_eedge.m_src->get_state ().m_region_model;
 	    tree callee_var = callee_model->get_representative_tree (sval);
 	    /* We could just use caller_model->get_representative_tree (sval);
 	       to get the caller_var, but for now use
 	       map_expr_from_callee_to_caller so as to only record critical
 	       state for parms and the like.  */
 	    callsite_expr expr;
-	    tree caller_var
-	      = cg_superedge.map_expr_from_callee_to_caller (callee_var, &expr);
+	    tree caller_var = caller_model->get_representative_tree (sval);
 	    if (caller_var)
 	      {
 		if (get_logger ())
@@ -2093,15 +2092,11 @@ diagnostic_manager::prune_for_sm_diagnostic (checker_path *path,
 	    if (sval)
 	      {
 		return_event *event = (return_event *)base_event;
-		const callgraph_superedge& cg_superedge
-		  = event->get_callgraph_superedge ();
-		const region_model *caller_model
-		  = event->m_eedge.m_dest->get_state ().m_region_model;
-		tree caller_var = caller_model->get_representative_tree (sval);
 		callsite_expr expr;
-		tree callee_var
-		  = cg_superedge.map_expr_from_caller_to_callee (caller_var,
-								 &expr);
+
+		const region_model *callee_model
+	      	  = event->m_eedge.m_src->get_state ().m_region_model;
+		tree callee_var = callee_model->get_representative_tree (sval);
 		if (callee_var)
 		  {
 		    if (get_logger ())
diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index ee625fbdcdf..60bd46abdff 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -1627,6 +1627,50 @@ exploded_node::dump_succs_and_preds (FILE *outf) const
   }
 }
 
+/* class dynamic_call_info_t : public exploded_edge::custom_info_t.  */
+
+/* Implementation of exploded_edge::custom_info_t::update_model vfunc
+   for dynamic_call_info_t.
+
+   Update state for the dynamically discovered calls.  */
+
+void
+dynamic_call_info_t::update_model (region_model *model,
+				   const exploded_edge &eedge)
+{
+  const program_state &dest_state = eedge.m_dest->get_state ();
+  *model = *dest_state.m_region_model;
+}
+
+/* Implementation of exploded_edge::custom_info_t::add_events_to_path vfunc
+   for dynamic_call_info_t.  */
+
+void
+dynamic_call_info_t::add_events_to_path (checker_path *emission_path,
+				   const exploded_edge &eedge)
+{
+  const exploded_node *src_node = eedge.m_src;
+  const program_point &src_point = src_node->get_point ();
+  const int src_stack_depth = src_point.get_stack_depth ();
+  const exploded_node *dest_node = eedge.m_dest;
+  const program_point &dest_point = dest_node->get_point ();
+  const int dest_stack_depth = dest_point.get_stack_depth ();
+
+  if (m_is_returning_call)
+    emission_path->add_event (new return_event (eedge, (m_dynamic_call
+	                   			        ? m_dynamic_call->location
+	           	   		                : UNKNOWN_LOCATION),
+	          	      dest_point.get_fndecl (),
+	          	      dest_stack_depth));
+  else
+    emission_path->add_event (new call_event (eedge, (m_dynamic_call
+	                   			      ? m_dynamic_call->location
+	           	   		              : UNKNOWN_LOCATION),
+	          	      src_point.get_fndecl (),
+	          	      src_stack_depth));
+
+}
+
 /* class rewind_info_t : public exploded_edge::custom_info_t.  */
 
 /* Implementation of exploded_edge::custom_info_t::update_model vfunc
@@ -2980,6 +3024,55 @@ state_change_requires_new_enode_p (const program_state &old_state,
   return false;
 }
 
+/* Create enodes and eedges for the function calls that don't have an
+   underlying call superedge.
+
+   Such a case occurs when GCC's middle end didn't know which function to
+   call but the analyzer does (with the help of the current state).
+
+   Some examples of such calls are dynamically dispatched calls to virtual
+   functions or calls that happen via a function pointer.  */
+
+void
+exploded_graph::create_dynamic_call (const gcall *call,
+                                     tree fn_decl,
+                                     exploded_node *node,
+                                     program_state next_state,
+                                     program_point &next_point,
+                                     uncertainty_t *uncertainty)
+{
+  const program_point *this_point = &node->get_point ();
+  // assert for fn_decl ?
+  function *fun = DECL_STRUCT_FUNCTION (fn_decl);
+  if (fun)
+    {
+      const supergraph &sg = this->get_supergraph ();
+      supernode * sn_entry = sg.get_node_for_function_entry (fun);
+      supernode * sn_exit = sg.get_node_for_function_exit (fun);
+
+      program_point new_point
+        = program_point::before_supernode (sn_entry,
+				           NULL,
+				           this_point->get_call_string ());
+
+      new_point.push_to_call_stack (sn_exit,
+  				    next_point.get_supernode());
+      next_state.push_call (*this, node, call, uncertainty);
+
+      // TODO: add some logging here regarding dynamic call
+      
+      if (next_state.m_valid)
+        {
+          exploded_node *enode = get_or_create_node (new_point,
+					             next_state,
+					             node);
+          if (enode)
+            add_edge (node,enode, NULL,
+          	      new dynamic_call_info_t (call));
+        }
+     }
+}
+
 /* The core of exploded_graph::process_worklist (the main analysis loop),
    handling one node in the worklist.
 
@@ -3174,10 +3267,13 @@ exploded_graph::process_node (exploded_node *node)
       break;
     case PK_AFTER_SUPERNODE:
       {
+        bool found_a_superedge = false;
+        bool is_an_exit_block = false;
 	/* If this is an EXIT BB, detect leaks, and potentially
 	   create a function summary.  */
 	if (point.get_supernode ()->return_p ())
 	  {
+	    is_an_exit_block = true;
 	    node->detect_leaks (*this);
 	    if (flag_analyzer_call_summaries
 		&& point.get_call_string ().empty_p ())
@@ -3205,6 +3301,7 @@ exploded_graph::process_node (exploded_node *node)
 	superedge *succ;
 	FOR_EACH_VEC_ELT (point.get_supernode ()->m_succs, i, succ)
 	  {
+	    found_a_superedge = true;
 	    if (logger)
 	      logger->log ("considering SN: %i -> SN: %i",
 			   succ->m_src->m_index, succ->m_dest->m_index);
@@ -3214,19 +3311,75 @@ exploded_graph::process_node (exploded_node *node)
 						 point.get_call_string ());
 	    program_state next_state (state);
 	    uncertainty_t uncertainty;
+
+	    /* Try to discover and analyse indirect function calls. */
+            if (succ->m_kind == SUPEREDGE_INTRAPROCEDURAL_CALL
+            	&& !(succ->get_any_callgraph_edge ()))
+              {
+                const gcall *call
+                  = point.get_supernode ()->get_final_call ();
+
+                impl_region_model_context ctxt (*this,
+                                                node,
+                                                &state,
+                                                &next_state,
+                                                &uncertainty,
+                                                point.get_stmt());
+
+                region_model *model = state.m_region_model;
+
+                /* Call is possibly happening via a function pointer.  */
+                if (tree fn_decl = model->get_fndecl_for_call(call,&ctxt))
+                  create_dynamic_call (call, fn_decl, node, next_state,
+                                       next_point, &uncertainty);
+              }
+
 	    if (!node->on_edge (*this, succ, &next_point, &next_state,
-				&uncertainty))
+			        &uncertainty))
 	      {
-		if (logger)
-		  logger->log ("skipping impossible edge to SN: %i",
-			       succ->m_dest->m_index);
-		continue;
+	        if (logger)
+	          logger->log ("skipping impossible edge to SN: %i",
+		               succ->m_dest->m_index);
+	        continue;
 	      }
-	    exploded_node *next = get_or_create_node (next_point, next_state,
-						      node);
+	    exploded_node *next = get_or_create_node (next_point,
+	    					      next_state,
+					              node);
 	    if (next)
 	      add_edge (node, next, succ);
 	  }
+
+        /* Return from the calls which don't have a return superedge.
+    	   Such a case occurs when GCC's middle end didn't know which function to
+    	   call but the analyzer did.  */
+        if((is_an_exit_block && !found_a_superedge)
+           && (!point.get_call_string ().empty_p ()))
+          {
+            const call_string cs = point.get_call_string ();
+            program_point next_point
+              = program_point::before_supernode (cs.get_caller_node (),
+                                                 NULL,
+                                                 cs);
+            program_state next_state (state);
+            uncertainty_t uncertainty;
+
+            const gcall *call
+              = next_point.get_supernode ()->get_returning_call ();
+
+            if(call)
+              next_state.returning_call (*this, node, call, &uncertainty);
+
+            if (next_state.m_valid)
+              {
+                next_point.pop_from_call_stack ();
+                exploded_node *enode = get_or_create_node (next_point,
+                                                           next_state,
+                                                           node);
+                if (enode)
+                  add_edge (node, enode, NULL,
+                            new dynamic_call_info_t (call, true));
+              }
+          }
       }
       break;
     }
diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h
index 8f48d8a286c..291983ad7dc 100644
--- a/gcc/analyzer/exploded-graph.h
+++ b/gcc/analyzer/exploded-graph.h
@@ -362,6 +362,34 @@ private:
   DISABLE_COPY_AND_ASSIGN (exploded_edge);
 };
 
+/* Extra data for an exploded_edge that represents a dynamic call ( a call
+   that doesn't have a superedge representing it ).  */
+
+class dynamic_call_info_t : public exploded_edge::custom_info_t
+{
+public:
+  dynamic_call_info_t (const gcall *dynamic_call,
+  		       const bool is_returning_call = false)
+  : m_dynamic_call (dynamic_call), 
+    m_is_returning_call (is_returning_call)
+  {}
+
+  void print (pretty_printer *pp) FINAL OVERRIDE
+  {
+    pp_string (pp, "dynamic_call");
+  }
+
+  void update_model (region_model *model,
+		     const exploded_edge &eedge) FINAL OVERRIDE;
+
+  void add_events_to_path (checker_path *emission_path,
+			   const exploded_edge &eedge) FINAL OVERRIDE;
+private:
+  const gcall *m_dynamic_call;
+  const bool m_is_returning_call;
+};
+
+
 /* Extra data for an exploded_edge that represents a rewind from a
    longjmp to a setjmp (or from a siglongjmp to a sigsetjmp).  */
 
@@ -785,6 +813,13 @@ public:
   bool maybe_process_run_of_before_supernode_enodes (exploded_node *node);
   void process_node (exploded_node *node);
 
+  void create_dynamic_call (const gcall *call,
+                            tree fn_decl,
+                            exploded_node *node,
+                            program_state next_state,
+                            program_point &next_point,
+                            uncertainty_t *uncertainty);
+
   exploded_node *get_or_create_node (const program_point &point,
 				     const program_state &state,
 				     exploded_node *enode_for_diag);
diff --git a/gcc/analyzer/program-point.cc b/gcc/analyzer/program-point.cc
index 2e8d98ada2a..cb2b4e052cf 100644
--- a/gcc/analyzer/program-point.cc
+++ b/gcc/analyzer/program-point.cc
@@ -323,6 +323,24 @@ program_point::to_json () const
   return point_obj;
 }
 
+/* Update the callstack to represent a call from caller to callee.
+
+   Generally used to push a custom call to a particular program point
+   where we don't have a superedge representing the call.  */
+void
+program_point::push_to_call_stack (const supernode *caller,
+				   const supernode *callee)
+{
+  m_call_string.push_call (callee, caller);
+}
+
+/* Pop the topmost call from the current callstack.  */
+void
+program_point::pop_from_call_stack ()
+{
+  m_call_string.pop ();
+}
+
 /* Generate a hash value for this program_point.  */
 
 hashval_t
diff --git a/gcc/analyzer/program-point.h b/gcc/analyzer/program-point.h
index 5f86745cd1e..6bae29b23e8 100644
--- a/gcc/analyzer/program-point.h
+++ b/gcc/analyzer/program-point.h
@@ -293,7 +293,8 @@ public:
   }
 
   bool on_edge (exploded_graph &eg, const superedge *succ);
-
+  void push_to_call_stack (const supernode *caller, const supernode *callee);
+  void pop_from_call_stack ();
   void validate () const;
 
   /* For before_stmt, go to next stmt.  */
diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc
index 5bb86767873..eb05994f0ef 100644
--- a/gcc/analyzer/program-state.cc
+++ b/gcc/analyzer/program-state.cc
@@ -1034,6 +1034,50 @@ program_state::on_edge (exploded_graph &eg,
   return true;
 }
 
+/* Update this program_state to reflect a call to the function
+   represented by CALL_STMT.
+   Currently used only when the call doesn't have a superedge representing
+   it ( like a call via a function pointer ).  */
+void
+program_state::push_call (exploded_graph &eg,
+      			  exploded_node *enode,
+      			  const gcall *call_stmt,
+      			  uncertainty_t *uncertainty)
+{
+  /* Update state.  */
+  const program_point &point = enode->get_point ();
+  const gimple *last_stmt = point.get_supernode ()->get_last_stmt ();
+
+  impl_region_model_context ctxt (eg, enode,
+          			  &enode->get_state (),
+          			  this,
+          			  uncertainty,
+          			  last_stmt);
+  m_region_model->update_for_gcall (call_stmt, &ctxt);
+}
+
+/* Update this program_state to reflect a return from function
+   call to which is represented by CALL_STMT.
+   currently used only when the call doesn't have a superedge representing 
+   the return */
+void
+program_state::returning_call (exploded_graph &eg,
+      			       exploded_node *enode,
+      			       const gcall *call_stmt,
+      			       uncertainty_t *uncertainty)
+{
+  /* Update state.  */
+  const program_point &point = enode->get_point ();
+  const gimple *last_stmt = point.get_supernode ()->get_last_stmt ();
+
+  impl_region_model_context ctxt (eg, enode,
+          			  &enode->get_state (),
+          			  this,
+          			  uncertainty,
+          			  last_stmt);
+  m_region_model->update_for_return_gcall (call_stmt, &ctxt);
+}
+
 /* Generate a simpler version of THIS, discarding state that's no longer
    relevant at POINT.
    The idea is that we're more likely to be able to consolidate
diff --git a/gcc/analyzer/program-state.h b/gcc/analyzer/program-state.h
index 8dee930665c..658dbb69075 100644
--- a/gcc/analyzer/program-state.h
+++ b/gcc/analyzer/program-state.h
@@ -218,6 +218,17 @@ public:
   void push_frame (const extrinsic_state &ext_state, function *fun);
   function * get_current_function () const;
 
+  void push_call (exploded_graph &eg,
+      		  exploded_node *enode,
+      		  const gcall *call_stmt,
+      		  uncertainty_t *uncertainty);
+
+  void returning_call (exploded_graph &eg,
+      		       exploded_node *enode,
+      		       const gcall *call_stmt,
+      		       uncertainty_t *uncertainty);
+
+
   bool on_edge (exploded_graph &eg,
 		exploded_node *enode,
 		const superedge *succ,
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 6d02c60449c..1e86d1f3bf8 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3139,12 +3139,11 @@ region_model::maybe_update_for_edge (const superedge &edge,
    caller's frame.  */
 
 void
-region_model::update_for_call_superedge (const call_superedge &call_edge,
-					 region_model_context *ctxt)
+region_model::update_for_gcall (const gcall *call_stmt,
+				region_model_context *ctxt)
 {
   /* Build a vec of argument svalues, using the current top
      frame for resolving tree expressions.  */
-  const gcall *call_stmt = call_edge.get_call_stmt ();
   auto_vec<const svalue *> arg_svals (gimple_call_num_args (call_stmt));
 
   for (unsigned i = 0; i < gimple_call_num_args (call_stmt); i++)
@@ -3153,33 +3152,58 @@ region_model::update_for_call_superedge (const call_superedge &call_edge,
       arg_svals.quick_push (get_rvalue (arg, ctxt));
     }
 
-  push_frame (call_edge.get_callee_function (), &arg_svals, ctxt);
+  /* Get the function * from the call.  */
+  tree fn_decl = get_fndecl_for_call (call_stmt,ctxt);
+  function *fun = DECL_STRUCT_FUNCTION (fn_decl);
+  push_frame (fun, &arg_svals, ctxt);
 }
 
 /* Pop the top-most frame_region from the stack, and copy the return
    region's values (if any) into the region for the lvalue of the LHS of
    the call (if any).  */
+
 void
-region_model::update_for_return_superedge (const return_superedge &return_edge,
-					   region_model_context *ctxt)
+region_model::update_for_return_gcall (const gcall *call_stmt,
+             			       region_model_context *ctxt)
 {
   /* Get the region for the result of the call, within the caller frame.  */
   const region *result_dst_reg = NULL;
-  const gcall *call_stmt = return_edge.get_call_stmt ();
   tree lhs = gimple_call_lhs (call_stmt);
   if (lhs)
     {
       /* Normally we access the top-level frame, which is:
-	   path_var (expr, get_stack_depth () - 1)
-	 whereas here we need the caller frame, hence "- 2" here.  */
+         path_var (expr, get_stack_depth () - 1)
+         whereas here we need the caller frame, hence "- 2" here.  */
       gcc_assert (get_stack_depth () >= 2);
       result_dst_reg = get_lvalue (path_var (lhs, get_stack_depth () - 2),
-				   ctxt);
+           			   ctxt);
     }
 
   pop_frame (result_dst_reg, NULL, ctxt);
 }
 
+/* Extract calling information from the superedge and update the model for the
+   call.  */
+
+void
+region_model::update_for_call_superedge (const call_superedge &call_edge,
+					 region_model_context *ctxt)
+{
+  const gcall *call_stmt = call_edge.get_call_stmt ();
+  update_for_gcall (call_stmt,ctxt);
+}
+
+/* Extract calling information from the return superedge and update the model
+   for the returning call.  */
+
+void
+region_model::update_for_return_superedge (const return_superedge &return_edge,
+					   region_model_context *ctxt)
+{
+  const gcall *call_stmt = return_edge.get_call_stmt ();
+  update_for_return_gcall (call_stmt, ctxt);
+}
+
 /* Update this region_model with a summary of the effect of calling
    and returning from CG_SEDGE.
 
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 734ec601237..a15bc9e2f6d 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -589,6 +589,12 @@ class region_model
 			      region_model_context *ctxt,
 			      rejected_constraint **out);
 
+  void update_for_gcall (const gcall *call_stmt,
+			 region_model_context *ctxt);
+  
+  void update_for_return_gcall (const gcall *call_stmt,
+				region_model_context *ctxt);
+
   const region *push_frame (function *fun, const vec<const svalue *> *arg_sids,
 			    region_model_context *ctxt);
   const frame_region *get_current_frame () const { return m_current_frame; }
diff --git a/gcc/analyzer/state-purge.cc b/gcc/analyzer/state-purge.cc
index e82ea87e735..c4050c2cf9a 100644
--- a/gcc/analyzer/state-purge.cc
+++ b/gcc/analyzer/state-purge.cc
@@ -377,18 +377,32 @@ state_purge_per_ssa_name::process_point (const function_point &point,
 	  {
 	    /* Add any intraprocedually edge for a call.  */
 	    if (snode->m_returning_call)
-	      {
-		cgraph_edge *cedge
+	    {
+	      gcall *returning_call = snode->m_returning_call;
+	      cgraph_edge *cedge
 		  = supergraph_call_edge (snode->m_fun,
-					  snode->m_returning_call);
-		gcc_assert (cedge);
-		superedge *sedge
-		  = map.get_sg ().get_intraprocedural_edge_for_call (cedge);
-		gcc_assert (sedge);
-		add_to_worklist
-		  (function_point::after_supernode (sedge->m_src),
-		   worklist, logger);
-	      }
+					  returning_call);
+	      if(!cedge)
+	        {
+	          supernode *callernode 
+	            = map.get_sg ().get_supernode_for_stmt (returning_call);
+
+	          gcc_assert (callernode);
+	          add_to_worklist 
+	            (function_point::after_supernode (callernode),
+		     worklist, logger);
+	        }
+	      else
+	        {
+		  gcc_assert (cedge);
+		  superedge *sedge
+		    = map.get_sg ().get_intraprocedural_edge_for_call (cedge);
+		  gcc_assert (sedge);
+		  add_to_worklist 
+		    (function_point::after_supernode (sedge->m_src),
+		     worklist, logger);
+	         }
+	    }
 	  }
       }
       break;
diff --git a/gcc/analyzer/supergraph.cc b/gcc/analyzer/supergraph.cc
index 8611d0f8689..598965dd7fc 100644
--- a/gcc/analyzer/supergraph.cc
+++ b/gcc/analyzer/supergraph.cc
@@ -183,11 +183,33 @@ supergraph::supergraph (logger *logger)
 	      m_stmt_to_node_t.put (stmt, node_for_stmts);
 	      m_stmt_uids.make_uid_unique (stmt);
 	      if (cgraph_edge *edge = supergraph_call_edge (fun, stmt))
-		{
-		  m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
-		  node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt), NULL);
-		  m_cgraph_edge_to_caller_next_node.put (edge, node_for_stmts);
-		}
+    		{
+    		  m_cgraph_edge_to_caller_prev_node.put(edge, node_for_stmts);
+    		  node_for_stmts = add_node (fun, bb, as_a <gcall *> (stmt),
+    		   			     NULL);
+    		  m_cgraph_edge_to_caller_next_node.put (edge, node_for_stmts);
+    		}
+	       else
+	        {
+	          // maybe call is via a function pointer
+	          if (gcall *call = dyn_cast<gcall *> (stmt))
+	          {
+	            cgraph_edge *edge 
+		      = cgraph_node::get (fun->decl)->get_edge (stmt);
+	            if (!edge || !edge->callee)
+	            {
+	              supernode *old_node_for_stmts = node_for_stmts;
+	              node_for_stmts = add_node (fun, bb, call, NULL);
+
+	              superedge *sedge 
+	                = new callgraph_superedge (old_node_for_stmts,
+	                  			   node_for_stmts,
+	                  			   SUPEREDGE_INTRAPROCEDURAL_CALL,
+	                  			   NULL);
+	              add_edge (sedge);
+	            }
+	          }
+	        }
 	    }
 
 	  m_bb_to_final_node.put (bb, node_for_stmts);
diff --git a/gcc/analyzer/supergraph.h b/gcc/analyzer/supergraph.h
index f4090fd5e0e..6ca8b418224 100644
--- a/gcc/analyzer/supergraph.h
+++ b/gcc/analyzer/supergraph.h
@@ -268,6 +268,11 @@ class supernode : public dnode<supergraph_traits>
     return i;
   }
 
+  gcall *get_returning_call () const
+  {
+    return m_returning_call;
+  }
+
   gimple *get_last_stmt () const
   {
     if (m_stmts.length () == 0)
diff --git a/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c b/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
new file mode 100644
index 00000000000..c62510c026f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/function-ptr-4.c
@@ -0,0 +1,25 @@
+/* Test to see if the analyzer detects and analyzes calls via
+   function pointers or not.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+void fun(int *int_ptr)
+{
+	free(int_ptr); /* { dg-warning "double-'free' of 'int_ptr'" } */
+}
+
+void single_call()
+{
+	int *int_ptr = (int*)malloc(sizeof(int));
+	void (*fun_ptr)(int *) = &fun;
+	(*fun_ptr)(int_ptr);
+}
+
+void double_call()
+{
+	int *int_ptr = (int*)malloc(sizeof(int));
+	void (*fun_ptr)(int *) = &fun;
+	(*fun_ptr)(int_ptr);
+	(*fun_ptr)(int_ptr);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr100546.c b/gcc/testsuite/gcc.dg/analyzer/pr100546.c
new file mode 100644
index 00000000000..3349d4067af
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr100546.c
@@ -0,0 +1,17 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+static void noReturn(const char *str) __attribute__((noreturn));
+static void noReturn(const char *str) {
+    printf("%s\n", str);
+    exit(1);
+}
+
+void (*noReturnPtr)(const char *str) = &noReturn;
+
+int main(int argc, char **argv) {
+    char *str = 0;
+    if (!str)
+        noReturnPtr(__FILE__);
+    return printf("%c\n", *str);
+}
-- 
2.32.0


From 7746e93ec6e7248eded75b62e63cc3a882fbb66f Mon Sep 17 00:00:00 2001
From: Ankur Saini <arsenic@sourceware.org>
Date: Thu, 29 Jul 2021 17:20:10 +0530
Subject: [PATCH 2/2] analyzer: detect and analyze vfunc calls

---
 gcc/analyzer/engine.cc        | 38 +++++++++++-----
 gcc/analyzer/exploded-graph.h |  3 +-
 gcc/analyzer/program-state.cc |  5 ++-
 gcc/analyzer/program-state.h  |  3 +-
 gcc/analyzer/region-model.cc  | 83 +++++++++++++++++++++++++++++++++--
 gcc/analyzer/region-model.h   |  3 +-
 6 files changed, 114 insertions(+), 21 deletions(-)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 60bd46abdff..dd3433a0f11 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -3037,12 +3037,14 @@ void
 exploded_graph::create_dynamic_call (const gcall *call,
                                      tree fn_decl,
                                      exploded_node *node,
-                                     program_state &next_state,
+                                     program_state next_state,
                                      program_point &next_point,
-                                     uncertainty_t *uncertainty)
+                                     uncertainty_t *uncertainty,
+                                     logger *logger)
 {
+  LOG_FUNC (logger);
+
   const program_point *this_point = &node->get_point ();
-  // assert for fn_decl ?
   function *fun = DECL_STRUCT_FUNCTION (fn_decl);
   if (fun)
     {
@@ -3057,12 +3059,16 @@ exploded_graph::create_dynamic_call (const gcall *call,
 
       new_point.push_to_call_stack (sn_exit,
   				    next_point.get_supernode());
-      next_state.push_call (*this, node, call, uncertainty);
+      next_state.push_call (*this, node, call, uncertainty, fn_decl);
 
-      // TODO: add some logging here regarding dynamic call
-      
       if (next_state.m_valid)
         {
+          if (logger)
+            logger->log ("Discovered call to %s [SN: %i -> SN: %i]",
+            		  function_name(fun),
+		          this_point->get_supernode ()->m_index,
+		          sn_entry->m_index);
+
           exploded_node *enode = get_or_create_node (new_point,
 					             next_state,
 					             node);
@@ -3312,7 +3318,10 @@ exploded_graph::process_node (exploded_node *node)
 	    program_state next_state (state);
 	    uncertainty_t uncertainty;
 
-	    /* Try to discover and analyse indirect function calls. */
+	    /* Try to discover and analyse indirect function calls.
+
+	       Some examples of such calls are virtual function calls
+	       and calls that happen via a function pointer.  */
             if (succ->m_kind == SUPEREDGE_INTRAPROCEDURAL_CALL
             	&& !(succ->get_any_callgraph_edge ()))
               {
@@ -3327,11 +3336,16 @@ exploded_graph::process_node (exploded_node *node)
                                                 point.get_stmt());
 
                 region_model *model = state.m_region_model;
-
-                /* Call is possibly happening via a function pointer.  */
-                if (tree fn_decl = model->get_fndecl_for_call(call,&ctxt))
-                  create_dynamic_call (call, fn_decl, node, next_state,
-                                       next_point, &uncertainty);
+                if (tree fn_decl = model->get_fndecl_for_call (call,&ctxt))
+                  {
+                    create_dynamic_call (call,
+                                         fn_decl,
+                                         node,
+                                         next_state,
+                                         next_point,
+                                         &uncertainty,
+                                         logger);
+                  }
               }
 
 	    if (!node->on_edge (*this, succ, &next_point, &next_state,
diff --git a/gcc/analyzer/exploded-graph.h b/gcc/analyzer/exploded-graph.h
index 291983ad7dc..c9edb9a4abf 100644
--- a/gcc/analyzer/exploded-graph.h
+++ b/gcc/analyzer/exploded-graph.h
@@ -818,7 +818,8 @@ public:
                             exploded_node *node,
                             program_state next_state,
                             program_point &next_point,
-                            uncertainty_t *uncertainty);
+                            uncertainty_t *uncertainty,
+                            logger *logger);
 
   exploded_node *get_or_create_node (const program_point &point,
 				     const program_state &state,
diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc
index eb05994f0ef..b1c1d67bb8b 100644
--- a/gcc/analyzer/program-state.cc
+++ b/gcc/analyzer/program-state.cc
@@ -1042,7 +1042,8 @@ void
 program_state::push_call (exploded_graph &eg,
       			  exploded_node *enode,
       			  const gcall *call_stmt,
-      			  uncertainty_t *uncertainty)
+      			  uncertainty_t *uncertainty,
+      			  tree fn_decl)
 {
   /* Update state.  */
   const program_point &point = enode->get_point ();
@@ -1053,7 +1054,7 @@ program_state::push_call (exploded_graph &eg,
           			  this,
           			  uncertainty,
           			  last_stmt);
-  m_region_model->update_for_gcall (call_stmt, &ctxt);
+  m_region_model->update_for_gcall(call_stmt, &ctxt, fn_decl);
 }
 
 /* Update this program_state to reflect a return from function
diff --git a/gcc/analyzer/program-state.h b/gcc/analyzer/program-state.h
index 658dbb69075..f4bd4cbcf49 100644
--- a/gcc/analyzer/program-state.h
+++ b/gcc/analyzer/program-state.h
@@ -221,7 +221,8 @@ public:
   void push_call (exploded_graph &eg,
       		  exploded_node *enode,
       		  const gcall *call_stmt,
-      		  uncertainty_t *uncertainty);
+      		  uncertainty_t *uncertainty,
+      		  tree fn_decl = NULL);
 
   void returning_call (exploded_graph &eg,
       		       exploded_node *enode,
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 1e86d1f3bf8..300ed3f17d0 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stor-layout.h"
 #include "attribs.h"
 #include "tree-object-size.h"
+#include "ipa-utils.h"
 
 #if ENABLE_ANALYZER
 
@@ -3140,7 +3141,8 @@ region_model::maybe_update_for_edge (const superedge &edge,
 
 void
 region_model::update_for_gcall (const gcall *call_stmt,
-				region_model_context *ctxt)
+				region_model_context *ctxt,
+				tree fn_decl)
 {
   /* Build a vec of argument svalues, using the current top
      frame for resolving tree expressions.  */
@@ -3151,9 +3153,11 @@ region_model::update_for_gcall (const gcall *call_stmt,
       tree arg = gimple_call_arg (call_stmt, i);
       arg_svals.quick_push (get_rvalue (arg, ctxt));
     }
-
-  /* Get the function * from the call.  */
-  tree fn_decl = get_fndecl_for_call (call_stmt,ctxt);
+  
+  /* Get the fn_decl from the call if not provided as argument.  */
+  if (!fn_decl)
+    fn_decl = get_fndecl_for_call (call_stmt,ctxt);
+ 
   function *fun = DECL_STRUCT_FUNCTION (fn_decl);
   push_frame (fun, &arg_svals, ctxt);
 }
@@ -3654,6 +3658,77 @@ region_model::get_fndecl_for_call (const gcall *call,
 	}
     }
 
+  /* Call is possibly a polymorphic call.
+  
+     In such case, use devirtisation tools to find 
+     possible callees of this function call.  */
+  
+  function *fun = get_current_function ();
+  gcall *stmt  = const_cast<gcall *> (call);
+  cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
+  if (e->indirect_info->polymorphic)
+  {
+    void *cache_token;
+    bool final;
+    vec <cgraph_node *> targets
+      = possible_polymorphic_call_targets (e, &final, &cache_token, true);
+    if (!targets.is_empty ())
+      {
+        tree most_propbable_taget = NULL_TREE;
+        if(targets.length () == 1)
+    	    return targets[0]->decl;
+    
+        /* From the current state, check which subclass the pointer that 
+           is being used to this polymorphic call points to, and use to
+           filter out correct function call.  */
+        tree t_val = gimple_call_arg (call, 0);
+        const svalue *sval = get_rvalue (t_val, ctxt);
+
+        const region *reg
+          = [&]()->const region *
+              {
+                switch (sval->get_kind ())
+                  {
+                    case SK_INITIAL:
+                      {
+                        const initial_svalue *initial_sval
+                          = sval->dyn_cast_initial_svalue ();
+                        return initial_sval->get_region ();
+                      }
+                      break;
+                    case SK_REGION:
+                      {
+                        const region_svalue *region_sval 
+                          = sval->dyn_cast_region_svalue ();
+                        return region_sval->get_pointee ();
+                      }
+                      break;
+
+                    default:
+                      return NULL;
+                  }
+              } ();
+
+        gcc_assert (reg);
+
+        tree known_possible_subclass_type;
+        known_possible_subclass_type = reg->get_type ();
+        if (reg->get_kind () == RK_FIELD)
+          {
+             const field_region* field_reg = reg->dyn_cast_field_region ();
+             known_possible_subclass_type 
+               = DECL_CONTEXT (field_reg->get_field ());
+          }
+
+        for (cgraph_node *x : targets)
+          {
+            if (DECL_CONTEXT (x->decl) == known_possible_subclass_type)
+              most_propbable_taget = x->decl;
+          }
+        return most_propbable_taget;
+      }
+   }
+
   return NULL_TREE;
 }
 
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index a15bc9e2f6d..f06788771dc 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -590,7 +590,8 @@ class region_model
 			      rejected_constraint **out);
 
   void update_for_gcall (const gcall *call_stmt,
-			 region_model_context *ctxt);
+			 region_model_context *ctxt,
+			 tree fn_decl = NULL);
   
   void update_for_return_gcall (const gcall *call_stmt,
 				region_model_context *ctxt);
-- 
2.32.0


[-- Attachment #3: Type: text/plain, Size: 20 bytes --]




Thank you
- Ankur

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-08-04 16:02                                                                   ` Ankur Saini
@ 2021-08-04 23:26                                                                     ` David Malcolm
  2021-08-05 14:57                                                                       ` Ankur Saini
  0 siblings, 1 reply; 45+ messages in thread
From: David Malcolm @ 2021-08-04 23:26 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Wed, 2021-08-04 at 21:32 +0530, Ankur Saini wrote:

[...snip...]
> 
> - From observation, a typical vfunc call that isn't devirtualised by
> the compiler's front end looks something like this 
> "OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5(D))"
> where "a_ptr_5(D)" is pointer that is being used to call the virtual
> function.
> 
> - We can access it's region to see what is the type of the object the
> pointer is actually pointing to.
> 
> - This is then used to find a call with DECL_CONTEXT of the object
> from the all the possible targets of that polymorphic call.

[...]

> 
> Patch file ( prototype ) : 
> 

> +  /* Call is possibly a polymorphic call.
> +  
> +     In such case, use devirtisation tools to find 
> +     possible callees of this function call.  */
> +  
> +  function *fun = get_current_function ();
> +  gcall *stmt  = const_cast<gcall *> (call);
> +  cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
> +  if (e->indirect_info->polymorphic)
> +  {
> +    void *cache_token;
> +    bool final;
> +    vec <cgraph_node *> targets
> +      = possible_polymorphic_call_targets (e, &final, &cache_token, true);
> +    if (!targets.is_empty ())
> +      {
> +        tree most_propbable_taget = NULL_TREE;
> +        if(targets.length () == 1)
> +    	    return targets[0]->decl;
> +    
> +        /* From the current state, check which subclass the pointer that 
> +           is being used to this polymorphic call points to, and use to
> +           filter out correct function call.  */
> +        tree t_val = gimple_call_arg (call, 0);

Maybe rename to "this_expr"?


> +        const svalue *sval = get_rvalue (t_val, ctxt);

and "this_sval"?

...assuming that that's what the value is.

Probably should reject the case where there are zero arguments.


> +
> +        const region *reg
> +          = [&]()->const region *
> +              {
> +                switch (sval->get_kind ())
> +                  {
> +                    case SK_INITIAL:
> +                      {
> +                        const initial_svalue *initial_sval
> +                          = sval->dyn_cast_initial_svalue ();
> +                        return initial_sval->get_region ();
> +                      }
> +                      break;
> +                    case SK_REGION:
> +                      {
> +                        const region_svalue *region_sval 
> +                          = sval->dyn_cast_region_svalue ();
> +                        return region_sval->get_pointee ();
> +                      }
> +                      break;
> +
> +                    default:
> +                      return NULL;
> +                  }
> +              } ();
 
I think the above should probably be a subroutine.

That said, it's not clear to me what it's doing, or that this is correct.

I'm guessing that you need to see if
  *((void **)this)
is a vtable pointer (or something like that), and, if so, which class
it is for.

Is there a way of getting the vtable pointer as an svalue?

> +        gcc_assert (reg);
> +
> +        tree known_possible_subclass_type;
> +        known_possible_subclass_type = reg->get_type ();
> +        if (reg->get_kind () == RK_FIELD)
> +          {
> +             const field_region* field_reg = reg->dyn_cast_field_region ();
> +             known_possible_subclass_type 
> +               = DECL_CONTEXT (field_reg->get_field ());
> +          }
> +
> +        for (cgraph_node *x : targets)
> +          {
> +            if (DECL_CONTEXT (x->decl) == known_possible_subclass_type)
> +              most_propbable_taget = x->decl;
> +          }
> +        return most_propbable_taget;
> +      }
> +   }
> +
>    return NULL_TREE;
>  }

Dave



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: daily report on extending static analyzer project [GSoC]
  2021-08-04 23:26                                                                     ` David Malcolm
@ 2021-08-05 14:57                                                                       ` Ankur Saini
  2021-08-05 23:09                                                                         ` David Malcolm
  0 siblings, 1 reply; 45+ messages in thread
From: Ankur Saini @ 2021-08-05 14:57 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc



> On 05-Aug-2021, at 4:56 AM, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> On Wed, 2021-08-04 at 21:32 +0530, Ankur Saini wrote:
> 
> [...snip...]
>> 
>> - From observation, a typical vfunc call that isn't devirtualised by
>> the compiler's front end looks something like this 
>> "OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5(D))"
>> where "a_ptr_5(D)" is pointer that is being used to call the virtual
>> function.
>> 
>> - We can access it's region to see what is the type of the object the
>> pointer is actually pointing to.
>> 
>> - This is then used to find a call with DECL_CONTEXT of the object
>> from the all the possible targets of that polymorphic call.
> 
> [...]
> 
>> 
>> Patch file ( prototype ) : 
>> 
> 
>> +  /* Call is possibly a polymorphic call.
>> +  
>> +     In such case, use devirtisation tools to find 
>> +     possible callees of this function call.  */
>> +  
>> +  function *fun = get_current_function ();
>> +  gcall *stmt  = const_cast<gcall *> (call);
>> +  cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
>> +  if (e->indirect_info->polymorphic)
>> +  {
>> +    void *cache_token;
>> +    bool final;
>> +    vec <cgraph_node *> targets
>> +      = possible_polymorphic_call_targets (e, &final, &cache_token, true);
>> +    if (!targets.is_empty ())
>> +      {
>> +        tree most_propbable_taget = NULL_TREE;
>> +        if(targets.length () == 1)
>> +    	    return targets[0]->decl;
>> +    
>> +        /* From the current state, check which subclass the pointer that 
>> +           is being used to this polymorphic call points to, and use to
>> +           filter out correct function call.  */
>> +        tree t_val = gimple_call_arg (call, 0);
> 
> Maybe rename to "this_expr"?
> 
> 
>> +        const svalue *sval = get_rvalue (t_val, ctxt);
> 
> and "this_sval"?

ok

> 
> ...assuming that that's what the value is.
> 
> Probably should reject the case where there are zero arguments.

Ideally, a vfunc call should always have at least one argument: the pointer used to call the function.

For example, if the function is called like this:

a_ptr->foo(arg);  // where foo() is a virtual function and a_ptr is a pointer to an object of a subclass.

I saw that its GIMPLE representation is as follows:

OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5, arg);

> 
> 
>> +
>> +        const region *reg
>> +          = [&]()->const region *
>> +              {
>> +                switch (sval->get_kind ())
>> +                  {
>> +                    case SK_INITIAL:
>> +                      {
>> +                        const initial_svalue *initial_sval
>> +                          = sval->dyn_cast_initial_svalue ();
>> +                        return initial_sval->get_region ();
>> +                      }
>> +                      break;
>> +                    case SK_REGION:
>> +                      {
>> +                        const region_svalue *region_sval 
>> +                          = sval->dyn_cast_region_svalue ();
>> +                        return region_sval->get_pointee ();
>> +                      }
>> +                      break;
>> +
>> +                    default:
>> +                      return NULL;
>> +                  }
>> +              } ();
> 
> I think the above should probably be a subroutine.
> 
> That said, it's not clear to me what it's doing, or that this is correct.


Sorry, I think I should have explained it earlier.

Let's take an example code snippet:

Derived d;
Base *base_ptr;
base_ptr = &d;
base_ptr->foo();	// where foo() is a virtual function

This generates the following GIMPLE dump:

Derived::Derived (&d);
base_ptr_6 = &d.D.3779;
_1 = base_ptr_6->_vptr.Base;
_2 = _1 + 8;
_3 = *_2;
OBJ_TYPE_REF(_3;(struct Base)base_ptr_6->1) (base_ptr_6);

Here, instead of trying to extract the virtual pointer from the call and see which subclass it belongs to, I found it simpler to extract the actual pointer that is used to call the function (which, from observation, is always the first argument of the call). I then use the region model at that point to figure out the type of the object it actually points to, and ultimately the subclass whose function is being called here. :)
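
For reference, a minimal self-contained version of the snippet above might look like the following. Only the names Base, Derived and foo () come from the snippet; the class bodies and the helper call_through_base () are assumptions made for illustration and are not taken from the actual test programs.

```cpp
#include <string>

// Hypothetical classes: the bodies are made up for illustration.
struct Base
{
  virtual ~Base () {}
  virtual std::string foo () { return "Base::foo"; }
};

struct Derived : Base
{
  std::string foo () override { return "Derived::foo"; }
};

// The static type of BASE_PTR is Base *, but the callee is selected
// through the vtable of the object that BASE_PTR actually points to.
// The pointer itself is passed as the implicit first argument.
std::string
call_through_base (Base *base_ptr)
{
  return base_ptr->foo ();
}
```

Calling call_through_base () with the address of a Derived object runs Derived::foo (), which is the callee the analyzer has to recover from the type of the pointed-to object.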

Now let me try to explain how I actually implemented it. (A lot of the assumptions here are based on observation, so please correct me wherever you think I made a false interpretation or forgot about a certain special case.)

- Once it is confirmed that the call we are dealing with is a polymorphic call (via the cgraph edge representing the call), I use "possible_polymorphic_call_targets ()" from ipa-utils.h (defined in ipa-devirt.c) to get the possible callees of that call.

  function *fun = get_current_function ();
  gcall *stmt  = const_cast<gcall *> (call);
  cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
  if (e->indirect_info->polymorphic)
  {
    void *cache_token;
    bool final;
    vec <cgraph_node *> targets
      = possible_polymorphic_call_targets (e, &final, &cache_token, true);

- Now, if the list contains more than one target, I make use of the current enode's region model to get more info about the pointer that was used to call the function.

    	/* here I extract the pointer (which was used to call the function), which from observation, is always the zeroth argument of the call.  */
        tree t_val = gimple_call_arg (call, 0);
        const svalue *sval = get_rvalue (t_val, ctxt);

- In all the examples I used, the pointer is represented as a region_svalue or as an initial_svalue (I think initial_svalue is the case where the pointer is taken as a parameter of the current function and the analyzer is analysing a top-level call to this function).

Here are some examples of both, where I used __analyzer_describe () to show this:
 . (https://godbolt.org/z/Mqs8oM6ff)
 . (https://godbolt.org/z/z4sfTM3f5)

 	/* Here I extract the region that the pointer is pointing to. As both cases return a (const region *), I used a lambda to get it (if you want, I can turn this into a separate function to make it more readable).  */

        const region *reg
          = [&]()->const region *
              {
                switch (sval->get_kind ())
                  {
                    case SK_INITIAL:
                      {
                        const initial_svalue *initial_sval
                          = sval->dyn_cast_initial_svalue ();
                        return initial_sval->get_region ();
                      }
                      break;
                    case SK_REGION:
                      {
                        const region_svalue *region_sval 
                          = sval->dyn_cast_region_svalue ();
                        return region_sval->get_pointee ();
                      }
                      break;

                    default:
                      return NULL;
                  }
              } ();

        gcc_assert (reg);

        /* Now that I have the region, I try to get the type of the object it holds and put it in ‘known_possible_subclass_type’.  */

        tree known_possible_subclass_type;
        known_possible_subclass_type = reg->get_type ();
        if (reg->get_kind () == RK_FIELD)
          {
             const field_region* field_reg = reg->dyn_cast_field_region ();
             known_possible_subclass_type 
               = DECL_CONTEXT (field_reg->get_field ());
          }

        /* After that, I iterate over the entire array of possible targets to find the function whose scope ( DECL_CONTEXT (fn_decl) ) is the same as the type of the object that the pointer actually points to.  */

        for (cgraph_node *x : targets)
          {
            if (DECL_CONTEXT (x->decl) == known_possible_subclass_type)
              most_propbable_taget = x->decl;
          }
        return most_propbable_taget;
      }
   }

I tested it on all of the test programs I created, and so far the analyzer determines the call correctly in every case. I am currently creating more tests (including multiple kinds of inheritance) to see how well this implementation holds up.
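
Stripped of the GCC types, the filtering step above amounts to the following sketch. The `target` struct and `pick_target` are hypothetical stand-ins (a string plays the role of DECL_CONTEXT); this is not analyzer code. It also stops at the first match, whereas the loop above keeps the last one.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a candidate callee: the class that declares the
// function (playing the role of DECL_CONTEXT) and a printable name.
struct target
{
  std::string context;
  std::string name;
};

// Return the first candidate declared in KNOWN_TYPE, or nullptr if
// no candidate is declared in exactly that class.
const target *
pick_target (const std::vector<target> &targets,
             const std::string &known_type)
{
  for (const target &t : targets)
    if (t.context == known_type)
      return &t;
  return nullptr;
}
```

With candidates declared in "Base" and "Derived", asking for "Derived" yields the Derived candidate. Asking for a type that declares no candidate of its own (say, a hypothetical Derived2 that inherits foo () unchanged) yields nullptr in this model, which is one case worth covering in the extra tests.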

> 
> I'm guessing that you need to see if
>  *((void **)this)
> is a vtable pointer (or something like that), and, if so, which class
> it is for.
> 
> Is there a way of getting the vtable pointer as an svalue?
> 
>> +        gcc_assert (reg);
>> +
>> +        tree known_possible_subclass_type;
>> +        known_possible_subclass_type = reg->get_type ();
>> +        if (reg->get_kind () == RK_FIELD)
>> +          {
>> +             const field_region* field_reg = reg->dyn_cast_field_region ();
>> +             known_possible_subclass_type 
>> +               = DECL_CONTEXT (field_reg->get_field ());
>> +          }
>> +
>> +        for (cgraph_node *x : targets)
>> +          {
>> +            if (DECL_CONTEXT (x->decl) == known_possible_subclass_type)
>> +              most_propbable_taget = x->decl;
>> +          }
>> +        return most_propbable_taget;
>> +      }
>> +   }
>> +
>>   return NULL_TREE;
>> }
> 
> Dave
> 
> 

Thanks 
- Ankur


* Re: daily report on extending static analyzer project [GSoC]
  2021-08-05 14:57                                                                       ` Ankur Saini
@ 2021-08-05 23:09                                                                         ` David Malcolm
  2021-08-06 15:41                                                                           ` Ankur Saini
  0 siblings, 1 reply; 45+ messages in thread
From: David Malcolm @ 2021-08-05 23:09 UTC (permalink / raw)
  To: Ankur Saini; +Cc: gcc

On Thu, 2021-08-05 at 20:27 +0530, Ankur Saini wrote:
> 
> 
> > On 05-Aug-2021, at 4:56 AM, David Malcolm <dmalcolm@redhat.com>
> > wrote:
> > 
> > On Wed, 2021-08-04 at 21:32 +0530, Ankur Saini wrote:
> > 
> > [...snip...]
> > > 
> > > - From observation, a typical vfunc call that isn't devirtualised
> > > by
> > > the compiler's front end looks something like this 
> > > "OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5(D))"
> > > where "a_ptr_5(D)" is pointer that is being used to call the
> > > virtual
> > > function.
> > > 
> > > > - We can access its region to see what is the type of the object
> > > the
> > > pointer is actually pointing to.
> > > 
> > > - This is then used to find a call with DECL_CONTEXT of the object
> > > from the all the possible targets of that polymorphic call.
> > 
> > [...]
> > 
> > > 
> > > Patch file ( prototype ) : 
> > > 
> > 
> > > +  /* Call is possibly a polymorphic call.
> > > +  
> > > +     In such case, use devirtisation tools to find 
> > > +     possible callees of this function call.  */
> > > +  
> > > +  function *fun = get_current_function ();
> > > +  gcall *stmt  = const_cast<gcall *> (call);
> > > +  cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
> > > +  if (e->indirect_info->polymorphic)
> > > +  {
> > > +    void *cache_token;
> > > +    bool final;
> > > +    vec <cgraph_node *> targets
> > > +      = possible_polymorphic_call_targets (e, &final,
> > > &cache_token, true);
> > > +    if (!targets.is_empty ())
> > > +      {
> > > +        tree most_propbable_taget = NULL_TREE;
> > > +        if(targets.length () == 1)
> > > +           return targets[0]->decl;
> > > +    
> > > +        /* From the current state, check which subclass the
> > > pointer that 
> > > +           is being used to this polymorphic call points to, and
> > > use to
> > > +           filter out correct function call.  */
> > > +        tree t_val = gimple_call_arg (call, 0);
> > 
> > Maybe rename to "this_expr"?
> > 
> > 
> > > +        const svalue *sval = get_rvalue (t_val, ctxt);
> > 
> > and "this_sval"?
> 
> ok
> 
> > 
> > ...assuming that that's what the value is.
> > 
> > Probably should reject the case where there are zero arguments.
> 
> Ideally it should always have one argument representing the pointer
> used to call the function. 
> 
> for example, if the function is called like this : -
> 
> a_ptr->foo(arg);  // where foo() is a virtual function and a_ptr is a
> pointer to an object of a subclass.
> 
> I saw that it’s GIMPLE representation is as follows : -
> 
> OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5, arg);
> 
> > 
> > 
> > > +
> > > +        const region *reg
> > > +          = [&]()->const region *
> > > +              {
> > > +                switch (sval->get_kind ())
> > > +                  {
> > > +                    case SK_INITIAL:
> > > +                      {
> > > +                        const initial_svalue *initial_sval
> > > +                          = sval->dyn_cast_initial_svalue ();
> > > +                        return initial_sval->get_region ();
> > > +                      }
> > > +                      break;
> > > +                    case SK_REGION:
> > > +                      {
> > > +                        const region_svalue *region_sval 
> > > +                          = sval->dyn_cast_region_svalue ();
> > > +                        return region_sval->get_pointee ();
> > > +                      }
> > > +                      break;
> > > +
> > > +                    default:
> > > +                      return NULL;
> > > +                  }
> > > +              } ();
> > 
> > I think the above should probably be a subroutine.
> > 
> > That said, it's not clear to me what it's doing, or that this is
> > correct.
> 
> 
> Sorry, I think I should have explained it earlier.
> 
> Let's take an example code snippet :- 
> 
> Derived d;
> Base *base_ptr;
> base_ptr = &d;
> base_ptr->foo();        // where foo() is a virtual function
> 
> This generates the following GIMPLE dump :- 
> 
> Derived::Derived (&d);
> base_ptr_6 = &d.D.3779;
> _1 = base_ptr_6->_vptr.Base;
> _2 = _1 + 8;
> _3 = *_2;
> OBJ_TYPE_REF(_3;(struct Base)base_ptr_6->1) (base_ptr_6);

I did a bit of playing with this example, and tried adding:

    case OBJ_TYPE_REF:
      gcc_unreachable ();
      break;

to region_model::get_rvalue_1, and running cc1plus under the debugger.

The debugger hits the "gcc_unreachable ();", at this stmt:

     OBJ_TYPE_REF(_2;(struct Base)base_ptr_5->0) (base_ptr_5);

Looking at the region_model with region_model::debug() shows:

(gdb) call debug()
stack depth: 1
  frame (index 0): frame: ‘test’@1
clusters within frame: ‘test’@1
  cluster for: Derived d
    key:   {bytes 0-7}
    value: ‘int (*) () *’ {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)}
  cluster for: base_ptr_5: &Derived d.<anonymous>
  cluster for: _2: &‘foo’
m_called_unknown_fn: FALSE
constraint_manager:
  equiv classes:
    ec0: {&Derived d.<anonymous>}
    ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)}
    ec2: {(void *)0B == [m_constant]‘0B’}
    ec3: {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)}
  constraints:
    0: ec0: {&Derived d.<anonymous>} != ec2: {(void *)0B == [m_constant]‘0B’}
    1: ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)} != ec2: {(void *)0B == [m_constant]‘0B’}
    2: ec3: {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)} != ec2: {(void *)0B == [m_constant]‘0B’}

i.e. it already "knows" that _2 is &'foo' for Derived::foo.

So I think looking at OBJ_TYPE_REF_EXPR in the above case may give the
function pointer directly from the vtable for such cases, so something
like:

    case OBJ_TYPE_REF:
	{
	   tree expr = OBJ_TYPE_REF_EXPR (pv.m_tree);
	   return get_rvalue (expr, ctxt); 
	}
	break;

might get the function pointer.

(caveat: untested code)
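
To illustrate the idea outside of GCC, here is a toy model of that forwarding. It is a sketch only: `expr`, `kind`, `store` and this `get_rvalue` are made-up stand-ins, not the analyzer's classes, and values are plain strings.

```cpp
#include <map>
#include <string>

// Toy expression node; not GCC's "tree".
enum class kind { ssa_name, obj_type_ref };

struct expr
{
  kind k;
  std::string name;     // used when k == kind::ssa_name
  const expr *wrapped;  // used when k == kind::obj_type_ref
};

// Toy store mapping SSA names to the symbolic values bound to them.
using store = std::map<std::string, std::string>;

// Look up the value of E.  An obj_type_ref has no value of its own
// here; it forwards to the expression it wraps, mirroring the case
// suggested above.
std::string
get_rvalue (const expr &e, const store &s)
{
  switch (e.k)
    {
    case kind::obj_type_ref:
      return get_rvalue (*e.wrapped, s);
    case kind::ssa_name:
      {
        auto it = s.find (e.name);
        return it != s.end () ? it->second : "UNKNOWN";
      }
    }
  return "UNKNOWN";
}
```

In this model, binding "_2" to "&foo" in the store and evaluating an obj_type_ref that wraps _2 gives back "&foo", which is the behaviour the real case would need for the call above.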

> 
> Here instead of trying to extract virtual pointer from the call and see
> which subclass it belongs, I found it simpler to extract the actual
> pointer which is used to call the function itself (which from
> observation, is always the first parameter of the call) and used the
> region model at that point to figure out what is the type of the object
> it actually points to ultimately get the actual subclass whose function
> is being called here. :)
> 
> Now let me try to explain how I actually executed it ( A lot of
> assumptions here are based on observation, so please correct me
> wherever you think I made a false interpretation or forgot about a
> certain special case ) :
> 
> - once it is confirmed that the call that we are dealing with is a
> polymorphic call ( via the cgraph edge representing the call ), I used
> the "possible_polymorphic_call_targets ()" from ipa-utils.h ( defined
> in ipa-devirt.c ), to get the possible callee of that call. 
> 
>   function *fun = get_current_function ();
>   gcall *stmt  = const_cast<gcall *> (call);
>   cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
>   if (e->indirect_info->polymorphic)
>   {
>     void *cache_token;
>     bool final;
>     vec <cgraph_node *> targets
>       = possible_polymorphic_call_targets (e, &final, &cache_token,
> true);
> 
> - Now if the list contains more than one targets, I will make use of
> the current enode's region model to get more info about the pointer
> which was used to call the function .
> 
>         /* here I extract the pointer (which was used to call the
> function), which from observation, is always the zeroth argument of the
> call.  */
>         tree t_val = gimple_call_arg (call, 0);
>         const svalue *sval = get_rvalue (t_val, ctxt);
> 
> - In all the examples I used, the pointer is represented as
> region_svalue or as initial_svalue (I think, initial_svalue is the case
> where the pointer is taken as a parameter of the current function and
> analyzer is analysing top-level call to this function )
> 
> Here are some examples of the following, Where I used
> __analyzer_describe () to show the same 
>  . (https://godbolt.org/z/Mqs8oM6ff)
>  . (https://godbolt.org/z/z4sfTM3f5))
> 
>         /* here I extract the region that the pointer is pointing to,
> and as both of them returns a (const region *), I used a lambda to get
> it ( If you want, I can turn this into a separate function to make it
> more readable )  */
> 
>         const region *reg
>           = [&]()->const region *
>               {
>                 switch (sval->get_kind ())
>                   {
>                     case SK_INITIAL:
>                       {
>                         const initial_svalue *initial_sval
>                           = sval->dyn_cast_initial_svalue ();
>                         return initial_sval->get_region ();
>                       }
>                       break;
>                     case SK_REGION:
>                       {
>                         const region_svalue *region_sval 
>                           = sval->dyn_cast_region_svalue ();
>                         return region_sval->get_pointee ();
>                       }
>                       break;
> 
>                     default:
>                       return NULL;
>                   }
>               } ();
> 
>         gcc_assert (reg);
> 
>         /* Now that I have the region, I tried to get the type of the
> object it is holding and put it in ‘known_possible_subclass_type’.  */
> 
>         tree known_possible_subclass_type;
>         known_possible_subclass_type = reg->get_type ();
>         if (reg->get_kind () == RK_FIELD)
>           {
>              const field_region* field_reg = reg->dyn_cast_field_region
> ();
>              known_possible_subclass_type 
>                = DECL_CONTEXT (field_reg->get_field ());
>           }
> 
> /* After that I iterated over the entire array of possible calls to
> find the function which whose scope ( DECL_CONTEXT (fn_decl) ) is same
> as that of the type of the object that the pointer is actually pointing
> to.  */
> 
>         for (cgraph_node *x : targets)
>           {
>             if (DECL_CONTEXT (x->decl) == known_possible_subclass_type)
>               most_propbable_taget = x->decl;
>           }
>         return most_propbable_taget;
>       }
>    }
> 
> I tested it on all of the test programs I created and till now in all
> of the cases, the analyzer is correctly determining the call. I am
> currently in the process of creating more tests ( including multiple
> types of inheritances ) to see how successful is this implementation .

I'm still skeptical of the above code; my feeling is that with more
tests you'll find cases where it doesn't work.  Maybe dynamically
allocated instances?
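
For instance, a test along these lines (a sketch; the class bodies and the name test_heap are illustrative assumptions) makes the pointee a heap allocation rather than a declared Derived object, so the region behind the pointer may not carry the Derived type the way a local variable does:

```cpp
struct Base
{
  virtual ~Base () = default;
  virtual int foo () { return 1; }
};

struct Derived : Base
{
  int foo () override { return 2; }
};

// Call a virtual function through a pointer to a heap-allocated
// instance; at run time this dispatches to Derived::foo.
int
test_heap ()
{
  Base *base_ptr = new Derived;
  int result = base_ptr->foo ();
  delete base_ptr;
  return result;
}
```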

Hope this is constructive

Dave



* Re: daily report on extending static analyzer project [GSoC]
  2021-08-05 23:09                                                                         ` David Malcolm
@ 2021-08-06 15:41                                                                           ` Ankur Saini
  0 siblings, 0 replies; 45+ messages in thread
From: Ankur Saini @ 2021-08-06 15:41 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc



> On 06-Aug-2021, at 4:39 AM, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> On Thu, 2021-08-05 at 20:27 +0530, Ankur Saini wrote:
>> 
>> 
>>> On 05-Aug-2021, at 4:56 AM, David Malcolm <dmalcolm@redhat.com>
>>> wrote:
>>> 
>>> On Wed, 2021-08-04 at 21:32 +0530, Ankur Saini wrote:
>>> 
>>> [...snip...]
>>>> 
>>>> - From observation, a typical vfunc call that isn't devirtualised
>>>> by
>>>> the compiler's front end looks something like this 
>>>> "OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5(D))"
>>>> where "a_ptr_5(D)" is pointer that is being used to call the
>>>> virtual
>>>> function.
>>>> 
>>>> - We can access its region to see what is the type of the object
>>>> the
>>>> pointer is actually pointing to.
>>>> 
>>>> - This is then used to find a call with DECL_CONTEXT of the object
>>>> from the all the possible targets of that polymorphic call.
>>> 
>>> [...]
>>> 
>>>> 
>>>> Patch file ( prototype ) : 
>>>> 
>>> 
>>>> +  /* Call is possibly a polymorphic call.
>>>> +  
>>>> +     In such case, use devirtisation tools to find 
>>>> +     possible callees of this function call.  */
>>>> +  
>>>> +  function *fun = get_current_function ();
>>>> +  gcall *stmt  = const_cast<gcall *> (call);
>>>> +  cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
>>>> +  if (e->indirect_info->polymorphic)
>>>> +  {
>>>> +    void *cache_token;
>>>> +    bool final;
>>>> +    vec <cgraph_node *> targets
>>>> +      = possible_polymorphic_call_targets (e, &final,
>>>> &cache_token, true);
>>>> +    if (!targets.is_empty ())
>>>> +      {
>>>> +        tree most_propbable_taget = NULL_TREE;
>>>> +        if(targets.length () == 1)
>>>> +           return targets[0]->decl;
>>>> +    
>>>> +        /* From the current state, check which subclass the
>>>> pointer that 
>>>> +           is being used to this polymorphic call points to, and
>>>> use to
>>>> +           filter out correct function call.  */
>>>> +        tree t_val = gimple_call_arg (call, 0);
>>> 
>>> Maybe rename to "this_expr"?
>>> 
>>> 
>>>> +        const svalue *sval = get_rvalue (t_val, ctxt);
>>> 
>>> and "this_sval"?
>> 
>> ok
>> 
>>> 
>>> ...assuming that that's what the value is.
>>> 
>>> Probably should reject the case where there are zero arguments.
>> 
>> Ideally it should always have one argument representing the pointer
>> used to call the function. 
>> 
>> for example, if the function is called like this : -
>> 
>> a_ptr->foo(arg);  // where foo() is a virtual function and a_ptr is a
>> pointer to an object of a subclass.
>> 
>> I saw that it’s GIMPLE representation is as follows : -
>> 
>> OBJ_TYPE_REF(_2;(struct A)a_ptr_5(D)->0) (a_ptr_5, arg);
>> 
>>> 
>>> 
>>>> +
>>>> +        const region *reg
>>>> +          = [&]()->const region *
>>>> +              {
>>>> +                switch (sval->get_kind ())
>>>> +                  {
>>>> +                    case SK_INITIAL:
>>>> +                      {
>>>> +                        const initial_svalue *initial_sval
>>>> +                          = sval->dyn_cast_initial_svalue ();
>>>> +                        return initial_sval->get_region ();
>>>> +                      }
>>>> +                      break;
>>>> +                    case SK_REGION:
>>>> +                      {
>>>> +                        const region_svalue *region_sval 
>>>> +                          = sval->dyn_cast_region_svalue ();
>>>> +                        return region_sval->get_pointee ();
>>>> +                      }
>>>> +                      break;
>>>> +
>>>> +                    default:
>>>> +                      return NULL;
>>>> +                  }
>>>> +              } ();
>>> 
>>> I think the above should probably be a subroutine.
>>> 
>>> That said, it's not clear to me what it's doing, or that this is
>>> correct.
>> 
>> 
>> Sorry, I think I should have explained it earlier.
>> 
>> Let's take an example code snippet :- 
>> 
>> Derived d;
>> Base *base_ptr;
>> base_ptr = &d;
>> base_ptr->foo();        // where foo() is a virtual function
>> 
>> This generates the following GIMPLE dump :- 
>> 
>> Derived::Derived (&d);
>> base_ptr_6 = &d.D.3779;
>> _1 = base_ptr_6->_vptr.Base;
>> _2 = _1 + 8;
>> _3 = *_2;
>> OBJ_TYPE_REF(_3;(struct Base)base_ptr_6->1) (base_ptr_6);
> 
> I did a bit of playing with this example, and tried adding:
> 
>     case OBJ_TYPE_REF:
>       gcc_unreachable ();
>       break;
> 
> to region_model::get_rvalue_1, and running cc1plus under the debugger.
> 
> The debugger hits the "gcc_unreachable ();", at this stmt:
> 
>     OBJ_TYPE_REF(_2;(struct Base)base_ptr_5->0) (base_ptr_5);
> 
> Looking at the region_model with region_model::debug() shows:
> 
> (gdb) call debug()
> stack depth: 1
>  frame (index 0): frame: ‘test’@1
> clusters within frame: ‘test’@1
>  cluster for: Derived d
>    key:   {bytes 0-7}
>    value: ‘int (*) () *’ {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)}
>  cluster for: base_ptr_5: &Derived d.<anonymous>
>  cluster for: _2: &‘foo’
> m_called_unknown_fn: FALSE
> constraint_manager:
>  equiv classes:
>    ec0: {&Derived d.<anonymous>}
>    ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)}
>    ec2: {(void *)0B == [m_constant]‘0B’}
>    ec3: {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)}
>  constraints:
>    0: ec0: {&Derived d.<anonymous>} != ec2: {(void *)0B == [m_constant]‘0B’}
>    1: ec1: {&constexpr int (* Derived::_ZTV7Derived [3])(...)} != ec2: {(void *)0B == [m_constant]‘0B’}
>    2: ec3: {(&constexpr int (* Derived::_ZTV7Derived [3])(...)+(sizetype)16)} != ec2: {(void *)0B == [m_constant]‘0B’}
> 
> i.e. it already "knows" that _2 is &'foo' for Derived::foo.
> 
> So I think looking at OBJ_TYPE_REF_EXPR in the above case may give the
> function pointer directly from the vtable for such cases, so something
> like:
> 
>    case OBJ_TYPE_REF:
> 	{
> 	   tree expr = OBJ_TYPE_REF_EXPR (pv.m_tree);
> 	   return get_rvalue (expr, ctxt); 
> 	}
> 	break;
> 
> might get the function pointer.

I tried it, and yes, it works like a charm. Thanks : )
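
For reference, a minimal test of the kind discussed above, reconstructed as a sketch: the names Base, Derived, foo, base_ptr and test come from the dumps in this thread, while the function bodies and return values are assumptions.

```cpp
struct Base
{
  virtual int foo () { return 1; }
};

struct Derived : Base
{
  int foo () override { return 2; }
};

// The virtual call goes through a Base pointer that is known to
// point at a Derived object, so it should resolve to Derived::foo.
int
test ()
{
  Derived d;
  Base *base_ptr;
  base_ptr = &d;
  return base_ptr->foo ();
}
```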

> 
> (caveat: untested code)
> 
>> 
>> Here instead of trying to extract virtual pointer from the call and see
>> which subclass it belongs, I found it simpler to extract the actual
>> pointer which is used to call the function itself (which from
>> observation, is always the first parameter of the call) and used the
>> region model at that point to figure out what is the type of the object
>> it actually points to ultimately get the actual subclass whose function
>> is being called here. :)
>> 
>> Now let me try to explain how I actually executed it ( A lot of
>> assumptions here are based on observation, so please correct me
>> wherever you think I made a false interpretation or forgot about a
>> certain special case ) :
>> 
>> - once it is confirmed that the call that we are dealing with is a
>> polymorphic call ( via the cgraph edge representing the call ), I used
>> the "possible_polymorphic_call_targets ()" from ipa-utils.h ( defined
>> in ipa-devirt.c ), to get the possible callee of that call. 
>> 
>>   function *fun = get_current_function ();
>>   gcall *stmt  = const_cast<gcall *> (call);
>>   cgraph_edge *e = cgraph_node::get (fun->decl)->get_edge (stmt);
>>   if (e->indirect_info->polymorphic)
>>   {
>>     void *cache_token;
>>     bool final;
>>     vec <cgraph_node *> targets
>>       = possible_polymorphic_call_targets (e, &final, &cache_token,
>> true);
>> 
>> - Now if the list contains more than one targets, I will make use of
>> the current enode's region model to get more info about the pointer
>> which was used to call the function .
>> 
>>         /* here I extract the pointer (which was used to call the
>> function), which from observation, is always the zeroth argument of the
>> call.  */
>>         tree t_val = gimple_call_arg (call, 0);
>>         const svalue *sval = get_rvalue (t_val, ctxt);
>> 
>> - In all the examples I used, the pointer is represented as
>> region_svalue or as initial_svalue (I think, initial_svalue is the case
>> where the pointer is taken as a parameter of the current function and
>> analyzer is analysing top-level call to this function )
>> 
>> Here are some examples of the following, Where I used
>> __analyzer_describe () to show the same 
>>  . (https://godbolt.org/z/Mqs8oM6ff)
>>  . (https://godbolt.org/z/z4sfTM3f5))
>> 
>>         /* here I extract the region that the pointer is pointing to,
>> and as both of them returns a (const region *), I used a lambda to get
>> it ( If you want, I can turn this into a separate function to make it
>> more readable )  */
>> 
>>         const region *reg
>>           = [&]()->const region *
>>               {
>>                 switch (sval->get_kind ())
>>                   {
>>                     case SK_INITIAL:
>>                       {
>>                         const initial_svalue *initial_sval
>>                           = sval->dyn_cast_initial_svalue ();
>>                         return initial_sval->get_region ();
>>                       }
>>                       break;
>>                     case SK_REGION:
>>                       {
>>                         const region_svalue *region_sval 
>>                           = sval->dyn_cast_region_svalue ();
>>                         return region_sval->get_pointee ();
>>                       }
>>                       break;
>> 
>>                     default:
>>                       return NULL;
>>                   }
>>               } ();
>> 
>>         gcc_assert (reg);
>> 
>>         /* Now that I have the region, I tried to get the type of the
>> object it is holding and put it in ‘known_possible_subclass_type’.  */
>> 
>>         tree known_possible_subclass_type;
>>         known_possible_subclass_type = reg->get_type ();
>>         if (reg->get_kind () == RK_FIELD)
>>           {
>>              const field_region* field_reg = reg->dyn_cast_field_region
>> ();
>>              known_possible_subclass_type 
>>                = DECL_CONTEXT (field_reg->get_field ());
>>           }
>> 
>> /* After that I iterated over the entire array of possible calls to
>> find the function which whose scope ( DECL_CONTEXT (fn_decl) ) is same
>> as that of the type of the object that the pointer is actually pointing
>> to.  */
>> 
>>         for (cgraph_node *x : targets)
>>           {
>>             if (DECL_CONTEXT (x->decl) == known_possible_subclass_type)
>>               most_propbable_taget = x->decl;
>>           }
>>         return most_propbable_taget;
>>       }
>>    }
>> 
>> I tested it on all of the test programs I created and till now in all
>> of the cases, the analyzer is correctly determining the call. I am
>> currently in the process of creating more tests ( including multiple
>> types of inheritances ) to see how successful is this implementation .
> 
> I'm still skeptical of the above code; my feeling is that with more
> tests you'll find cases where it doesn't work.  Maybe dynamically
> allocated instances?

That’s what I was thinking, and that’s why I wanted to test it on more programs, but it looks like I don’t need this anymore.

> 
> Hope this is constructive
> 
> Dave
> 

Thanks 
- Ankur


end of thread, other threads:[~2021-08-06 15:42 UTC | newest]

Thread overview: 45+ messages
2021-06-24 14:29 daily report on extending static analyzer project [GSoC] Ankur Saini
2021-06-24 20:53 ` David Malcolm
2021-06-25 15:03   ` Ankur Saini
2021-06-25 15:34     ` David Malcolm
2021-06-26 15:20       ` Ankur Saini
2021-06-27 18:48         ` David Malcolm
2021-06-28 14:53           ` Ankur Saini
2021-06-28 23:39             ` David Malcolm
2021-06-29 16:34               ` Ankur Saini
2021-06-29 19:53                 ` David Malcolm
     [not found]                   ` <AD7A4C2F-1451-4317-BE53-99DE9E9853AE@gmail.com>
2021-06-30 17:17                     ` David Malcolm
2021-07-02 14:18                       ` Ankur Saini
2021-07-03 14:37                         ` Ankur Saini
2021-07-05 16:15                           ` Ankur Saini
2021-07-06 23:11                             ` David Malcolm
2021-07-06 22:46                           ` David Malcolm
2021-07-06 22:50                             ` David Malcolm
2021-07-07 13:52                             ` Ankur Saini
2021-07-07 14:37                               ` David Malcolm
2021-07-10 15:57                                 ` Ankur Saini
2021-07-11 17:01                                   ` Ankur Saini
2021-07-11 18:01                                     ` David Malcolm
2021-07-11 17:49                                   ` David Malcolm
2021-07-12 16:37                                     ` Ankur Saini
2021-07-14 17:11                                       ` Ankur Saini
2021-07-14 23:23                                         ` David Malcolm
2021-07-16 15:34                                           ` Ankur Saini
2021-07-16 21:27                                             ` David Malcolm
2021-07-21 16:14                                               ` Ankur Saini
2021-07-22 17:10                                                 ` Ankur Saini
2021-07-22 23:21                                                   ` David Malcolm
2021-07-24 16:35                                                   ` Ankur Saini
2021-07-27 15:05                                                     ` Ankur Saini
2021-07-28 15:49                                                       ` Ankur Saini
2021-07-29 12:50                                                         ` Ankur Saini
2021-07-30  0:05                                                           ` David Malcolm
     [not found]                                                             ` <ACE21DBF-8163-4F28-B755-6B05FDA27A0E@gmail.com>
2021-07-30 14:48                                                               ` David Malcolm
2021-08-03 16:12                                                                 ` Ankur Saini
2021-08-04 16:02                                                                   ` Ankur Saini
2021-08-04 23:26                                                                     ` David Malcolm
2021-08-05 14:57                                                                       ` Ankur Saini
2021-08-05 23:09                                                                         ` David Malcolm
2021-08-06 15:41                                                                           ` Ankur Saini
2021-07-22 23:07                                                 ` David Malcolm
2021-07-14 23:07                                       ` David Malcolm
