More questions on points-to analysis

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* More questions on points-to analysis
@ 2021-03-17 10:31 Erick Ochoa
  2021-03-17 10:51 ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Erick Ochoa @ 2021-03-17 10:31 UTC (permalink / raw)
  To: gcc

Hello,

I'm still trying to compare the solution generated from the
intraprocedural points-to analysis in GCC against an external solver.

Yesterday it was pointed out that "NULL is not conservatively
correctly represented in the constraints". Can someone expand on this?
To me this sounds like a couple of things:
* even though malloc may return NULL, NULL is not added to the
points-to sets of whatever variable is on the left hand side of the
malloc call.
* the process in GCC that generates the constraints for NULL somehow
does not generate enough constraints to treat NULL conservatively and
therefore there might be points-to sets which should contain NULL but
don't. (However, doesn't this mean that feeding the constraints to an
external solver should still give the same answers?)
* the process in GCC that generates the constraints for NULL works
fine (i.e., feeding the constraints generated by GCC to an external
solver should yield a conservatively correct answer) but the process
that solves the constraints relaxes the solutions for the NULL
constraint variable (i.e., GCC has deviated from the constraint
solving algorithm somehow)

Also, "at some point we decided to encode optimistic info into
pt->null which means points-to now has to compute a conservatively
correct pt->null." Doesn't this contradict itself? How is a pt->null
first optimistically and now conservatively? Is what this is trying to
say that:

* NULL constraints were conservative first
* pt->null optimistic first
* Then conversion to SSA happened and NULL constraints became not
conservatively represented in the constraints (effectively becoming
somewhat optimistic)
* To avoid NULL and pt->null be both unsafe, pt->null was changed to
be conservative

I've been looking at find_what_vars_points_to and have changed my code
which verifies the constraint points-to sets. Basically, I now find
which variables have been collapsed and only for "real" constraint
pointer variables I take a look at the points to solution struct.
Before looking into vars, I take a look at the fields and compare the
null, anything, escape, etc, against the id of the pointee-variable.
Checking vars is slightly confusing for me at the moment, since it
appears that there are at least 3 plausible ways of validating the
solution (I haven't actually gotten there because assertions are being
triggered).

```
for (auto &output : *orel) {
       int from_i;
       int to_i;

      // Since find_what_var_points_to
      // doesn't change the solution for collapsed
      // variables, only verify the answer for the real ones.
       varinfo_t from_var = get_varinfo(from_i);
       varinfo_t vi = get_varinfo (find (from_i));
       if (from_var->id != vi->id) continue;
       if (!from_var->may_have_pointers) continue;

       // compute the pt_solution
       pt_solution solution = find_what_var_points_to (cfun->decl, from_var);

       // pointee variable according to external analysis
       varinfo_t vi_to = get_varinfo(to_i);

       // Since some artificial variables are stored in fields instead
of the bitset
       // assert based on field values.
      // However you can see that I already had to disable some of the
assertions.
       if (vi_to->is_artificial_var)
        {
          if (vi_to->id == nothing_id)
          {
            gcc_assert(solution.null && vi_to->id == nothing_id);
            continue;
          }
          else if (vi_to->id == escaped_id)
            {
              if (in_ipa_mode)
              {
                gcc_assert(solution.ipa_escaped && vi_to->id == escaped_id);
              }
              else
              {
                //gcc_assert(solution.escaped && vi_to->id == escaped_id);
              }
              continue;
              /* Expand some special vars of ESCAPED in-place here. ??*/
            }
          // More...
     }

       if (solution.anything) continue;

       bitmap vars = solution.vars;
       if (!vars) continue;

       if (dump_file) fprintf(dump_file, "SAME = %s\n",
bitmap_bit_p(vars, DECL_PT_UID(vi_to->decl)) ? "true" : "false");
       if (dump_file) fprintf(dump_file, "SAME2 = %s\n",
bitmap_bit_p(vars, to_i) ? "true" : "false");
       if (dump_file) fprintf(dump_file, "SAME3 = %s\n",
bitmap_bit_p(from_var->solution, to_i) ? "true" : "false");
```

Can someone help me figure out why even though I have a "real"
variable and I compute its solution with the "find_what_var_points_to"
method the solution does not have the fields that I expect to be set?
(I would expect solution.escaped to be escaped if the pointee variable
vi_to has an id = escaped_id).

And also, how is the DECL_PT_UID different from the varinfo id field?
Shouldn't they be the same? It seems that during
"find_what_var_points_to" DECL_PT_UID is being used to set the bit in
the bitmap, but in previous instances it was the varinfo id offset?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: More questions on points-to analysis
  2021-03-17 10:31 More questions on points-to analysis Erick Ochoa
@ 2021-03-17 10:51 ` Richard Biener
  2021-03-17 15:16   ` Erick Ochoa
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2021-03-17 10:51 UTC (permalink / raw)
  To: Erick Ochoa; +Cc: GCC Development

On Wed, Mar 17, 2021 at 11:34 AM Erick Ochoa via Gcc <gcc@gcc.gnu.org> wrote:
>
> Hello,
>
> I'm still trying to compare the solution generated from the
> intraprocedural points-to analysis in GCC against an external solver.
>
> Yesterday it was pointed out that "NULL is not conservatively
> correctly represented in the constraints". Can someone expand on this?
> To me this sounds like a couple of things:
> * even though malloc may return NULL, NULL is not added to the
> points-to sets of whatever variable is on the left hand side of the
> malloc call.
> * the process in GCC that generates the constraints for NULL somehow
> does not generate enough constraints to treat NULL conservatively and
> therefore there might be points-to sets which should contain NULL but
> don't. (However, doesn't this mean that feeding the constraints to an
> external solver should still give the same answers?)

Yes, there is an unknown number of places that get this "wrong".  Getting
it "right" wasn't needed until we started using pt->null for anything.

> * the process in GCC that generates the constraints for NULL works
> fine (i.e., feeding the constraints generated by GCC to an external
> solver should yield a conservatively correct answer) but the process
> that solves the constraints relaxes the solutions for the NULL
> constraint variable (i.e., GCC has deviated from the constraint
> solving algorithm somehow)

No, that part should work OK.

> Also, "at some point we decided to encode optimistic info into
> pt->null which means points-to now has to compute a conservatively
> correct pt->null." Doesn't this contradict itself? How is a pt->null
> first optimistically and now conservatively? Is what this is trying to
> say that:
>
> * NULL constraints were conservative first
> * pt->null optimistic first
> * Then conversion to SSA happened and NULL constraints became not
> conservatively represented in the constraints (effectively becoming
> somewhat optimistic)
> * To avoid NULL and pt->null be both unsafe, pt->null was changed to
> be conservative

The SSA points-to solution is what gets used, it is now populated not only
by PTA but also by range analysis which eventually sets pt->null to false.

PTA now simply sets pt->null to true as a conservative measure because
it cannot guarantee that when !pt->null the pointer is actually never NULL
(because of the above issues).

> I've been looking at find_what_vars_points_to and have changed my code
> which verifies the constraint points-to sets. Basically, I now find
> which variables have been collapsed and only for "real" constraint
> pointer variables I take a look at the points to solution struct.
> Before looking into vars, I take a look at the fields and compare the
> null, anything, escape, etc, against the id of the pointee-variable.
> Checking vars is slightly confusing for me at the moment, since it
> appears that there are at least 3 plausible ways of validating the
> solution (I haven't actually gotten there because assertions are being
> triggered).
>
> ```
> for (auto &output : *orel) {
>        int from_i;
>        int to_i;
>
>       // Since find_what_var_points_to
>       // doesn't change the solution for collapsed
>       // variables, only verify the answer for the real ones.
>        varinfo_t from_var = get_varinfo(from_i);
>        varinfo_t vi = get_varinfo (find (from_i));
>        if (from_var->id != vi->id) continue;
>        if (!from_var->may_have_pointers) continue;
>
>        // compute the pt_solution
>        pt_solution solution = find_what_var_points_to (cfun->decl, from_var);
>
>        // pointee variable according to external analysis
>        varinfo_t vi_to = get_varinfo(to_i);
>
>        // Since some artificial variables are stored in fields instead
> of the bitset
>        // assert based on field values.
>       // However you can see that I already had to disable some of the
> assertions.
>        if (vi_to->is_artificial_var)
>         {
>           if (vi_to->id == nothing_id)
>           {
>             gcc_assert(solution.null && vi_to->id == nothing_id);
>             continue;
>           }
>           else if (vi_to->id == escaped_id)
>             {
>               if (in_ipa_mode)
>               {
>                 gcc_assert(solution.ipa_escaped && vi_to->id == escaped_id);
>               }
>               else
>               {
>                 //gcc_assert(solution.escaped && vi_to->id == escaped_id);
>               }
>               continue;
>               /* Expand some special vars of ESCAPED in-place here. ??*/
>             }
>           // More...
>      }
>
>        if (solution.anything) continue;
>
>        bitmap vars = solution.vars;
>        if (!vars) continue;
>
>
>        if (dump_file) fprintf(dump_file, "SAME = %s\n",
> bitmap_bit_p(vars, DECL_PT_UID(vi_to->decl)) ? "true" : "false");
>        if (dump_file) fprintf(dump_file, "SAME2 = %s\n",
> bitmap_bit_p(vars, to_i) ? "true" : "false");
>        if (dump_file) fprintf(dump_file, "SAME3 = %s\n",
> bitmap_bit_p(from_var->solution, to_i) ? "true" : "false");
> ```
>
> Can someone help me figure out why even though I have a "real"
> variable and I compute its solution with the "find_what_var_points_to"
> method the solution does not have the fields that I expect to be set?
> (I would expect solution.escaped to be escaped if the pointee variable
> vi_to has an id = escaped_id).

solution.escaped is 1 if the pointer variable may point to everything
that escaped.  That is, solution.vars is really solution.vars |
cfun->gimple_df->escaped
(there is also ipa_escaped).  Have a look at pt_solution_includes_1 for an
outline how to interpret the fields when asking whether DECL is a member of
the points-to solution.

>
> And also, how is the DECL_PT_UID different from the varinfo id field?
> Shouldn't they be the same? It seems that during
> "find_what_var_points_to" DECL_PT_UID is being used to set the bit in
> the bitmap, but in previous instances it was the varinfo id offset?

The "SSA" points-to sets contain DECL_UIDs, not varinfo IDs which are
only transitional during points-to computation.  DECL_PT_UID maintains
the original UID also after inlining duplicates decls so the points-to solutions
remain valid even after inlining (or cloning, or IPA ICF, etc.).

Richard.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: More questions on points-to analysis
  2021-03-17 10:51 ` Richard Biener
@ 2021-03-17 15:16   ` Erick Ochoa
  2021-03-17 16:21     ` Erick Ochoa
  2021-03-18 12:27     ` Richard Biener
  0 siblings, 2 replies; 9+ messages in thread
From: Erick Ochoa @ 2021-03-17 15:16 UTC (permalink / raw)
  To: Richard Biener; +Cc: Erick Ochoa, GCC Development

Hi Richard, I think I misunderstood yesterday's answer and deviated a
little bit. But now I want to focus on this:

> > * the process in GCC that generates the constraints for NULL works
> > fine (i.e., feeding the constraints generated by GCC to an external
> > solver should yield a conservatively correct answer) but the process
> > that solves the constraints relaxes the solutions for the NULL
> > constraint variable (i.e., GCC has deviated from the constraint
> > solving algorithm somehow)
>
> No, that part should work OK.
>

So, let's ignore the other solver for now and instead focus on the
concrete example I presented on the previous email. If GCC is solving
these constraints:

```
ANYTHING = &ANYTHING
ESCAPED = *ESCAPED
ESCAPED = ESCAPED + UNKNOWN
*ESCAPED = NONLOCAL
NONLOCAL = &NONLOCAL
NONLOCAL = &ESCAPED
INTEGER = &ANYTHING
ISRA.4 = &NONLOCAL
derefaddrtmp(9) = &NULL
*ISRA.4 = derefaddrtmp(9)
i = NONLOCAL
i = &NONLOCAL
ESCAPED = &NONLOCAL
_2 = *ISRA.4
```

What would a hand calculated solution gives us vs what was the
solution given by GCC?

I am following the algorithm stated on Section 3.3 of Structure
Aliasing in GCC, and I will be ignoring the ESCAPED = ESCAPED +
UNKNOWN constraint since there isn't any other field offset that needs
to be calculated.

First, I want to make some adjustments. I am going to be using "=" to
signify the \supseteq symbol and I will be adding curly braces to
specify the element in a set as opposed to the whole set. Therefore
the constraints will now become (ordered slightly differently):

```
====direct contraints========
ANYTHING = { ANYTHING }
ESCAPED = { NONLOCAL }
NONLOCAL = { NONLOCAL }
NONLOCAL =  { ESCAPED }
INTEGER = { ANYTHING }
ISRA.4 = { NONLOCAL }
derefaddrtmp(9) = { NULL }
i = { NONLOCAL }

====complex constraints======
ESCAPED = *ESCAPED
*ESCAPED = NONLOCAL
*ISRA.4 = derefaddrtmp(9)
_2 = *ISRA.4

===== copy-constraints======
ESCAPED = ESCAPED // again ignoring the + UNKNOWN since I don't think
it will matter...
i = NONLOCAL
```

Solution sets are basically the direct constraints at the moment.

Let's now create the graph

1. node ESCAPED has an edge going to itself (due to the copy constraint)
2. node ISRA.4 has no outgoing copy edges
3. node derefaddrtmp(9) has no outgoing edges
4. node _2 has no outgoing edges
5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
6. node NONLOCAL has no outgoing edge

Now, we can iterate over this set of nodes

1. Walking over node ESCAPED. Sol(ESCAPED) = {NONLOCAL}. There are no
edges, but it has complex-constraints. Let's modify the graph.
  1. Looking at ESCAPED = *ESCAPED we note that we need to add a copy
edge from ESCAPED to NONLOCAL.
  2. Looking at *ESCAPED = NONLOCAL we note that we need to add a copy
edge from NONLOCAL to NONLOCAL

The graph is now transformed to

1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
2. node ISRA.4 has no outgoing copy edges
3. node derefaddrtmp(9) has no outgoing edges
4. node _2 has no outgoing edges
5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
6. node NONLOCAL has an edge going to NONLOCAL

The solution set of escaped is now Sol(ESCAPED) = Sol(ESCAPED) U
Sol(NONLOCAL) = {NONLOCAL, ESCAPED}

Now we continue walking

2. Walking over node ISRA.4. It has the solution set { NONLOCAL }.
There are no edges, but it has complex-constraints. Let's modify the
graph.
  1. Looking at *ISRA.4 = derefaddrtmp(9), we note that we need to add
a copy edge from NONLOCAL to derefaddrtmp(9).

The graph is now transformed to

1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
2. node ISRA.4 has no outgoing copy edges
3. node derefaddrtmp(9) has no outgoing edges
4. node _2 has no outgoing edges
5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
6. node NONLOCAL has an edge going to NONLOCAL, derefaddrtmp(9)

The Sol(NONLOCAL) = Sol(NONLOCAL) U Sol(derefaddrtmp(9)) = {NONLOCAL,
ESCAPED, NULL}.

Now I could continue, but here is already something that is not shown
in the points-to sets in the dump. It shows that

NONLOCAL = {NONLOCAL, ESCAPED, NULL}

Looking at the data that I showed yesterday:

```
NONLOCAL = { ESCAPED NONLOCAL } same as i
```

we see that NULL is not in the solution set of NONLOCAL.

Now, yesterday you said that NULL is not conservatively correctly
represented in the constraints. You also said today the points-to
analysis should be solving the constraints fine. What I now understand
from this is that while NULL may be pointed to by some constraints, it
doesn't mean that not being on the set means that a pointer will not
point to NULL. However, it should still be shown in the dumps where
the points-to sets are shown for the constraint variables since it is
solved using the same analysis? Is this correct? Am I doing the points
to analysis by hand wrong somehow? Why would NULL not be in
Sol(NONLOCAL) if it is solving the same constraints that I am solving
by hand?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: More questions on points-to analysis
  2021-03-17 15:16   ` Erick Ochoa
@ 2021-03-17 16:21     ` Erick Ochoa
  2021-03-17 16:44       ` Erick Ochoa
  2021-03-18 12:27     ` Richard Biener
  1 sibling, 1 reply; 9+ messages in thread
From: Erick Ochoa @ 2021-03-17 16:21 UTC (permalink / raw)
  Cc: Richard Biener, GCC Development

Mm... ignore this for now please, I think I messed up the analysis by
hand. I will try again. Thanks!

On Wed, 17 Mar 2021 at 16:16, Erick Ochoa <eochoa@gcc.gnu.org> wrote:
>
> Hi Richard, I think I misunderstood yesterday's answer and deviated a
> little bit. But now I want to focus on this:
>
> > > * the process in GCC that generates the constraints for NULL works
> > > fine (i.e., feeding the constraints generated by GCC to an external
> > > solver should yield a conservatively correct answer) but the process
> > > that solves the constraints relaxes the solutions for the NULL
> > > constraint variable (i.e., GCC has deviated from the constraint
> > > solving algorithm somehow)
> >
> > No, that part should work OK.
> >
>
> So, let's ignore the other solver for now and instead focus on the
> concrete example I presented on the previous email. If GCC is solving
> these constraints:
>
> ```
> ANYTHING = &ANYTHING
> ESCAPED = *ESCAPED
> ESCAPED = ESCAPED + UNKNOWN
> *ESCAPED = NONLOCAL
> NONLOCAL = &NONLOCAL
> NONLOCAL = &ESCAPED
> INTEGER = &ANYTHING
> ISRA.4 = &NONLOCAL
> derefaddrtmp(9) = &NULL
> *ISRA.4 = derefaddrtmp(9)
> i = NONLOCAL
> i = &NONLOCAL
> ESCAPED = &NONLOCAL
> _2 = *ISRA.4
> ```
>
> What would a hand calculated solution gives us vs what was the
> solution given by GCC?
>
> I am following the algorithm stated on Section 3.3 of Structure
> Aliasing in GCC, and I will be ignoring the ESCAPED = ESCAPED +
> UNKNOWN constraint since there isn't any other field offset that needs
> to be calculated.
>
> First, I want to make some adjustments. I am going to be using "=" to
> signify the \supseteq symbol and I will be adding curly braces to
> specify the element in a set as opposed to the whole set. Therefore
> the constraints will now become (ordered slightly differently):
>
> ```
> ====direct contraints========
> ANYTHING = { ANYTHING }
> ESCAPED = { NONLOCAL }
> NONLOCAL = { NONLOCAL }
> NONLOCAL =  { ESCAPED }
> INTEGER = { ANYTHING }
> ISRA.4 = { NONLOCAL }
> derefaddrtmp(9) = { NULL }
> i = { NONLOCAL }
>
> ====complex constraints======
> ESCAPED = *ESCAPED
> *ESCAPED = NONLOCAL
> *ISRA.4 = derefaddrtmp(9)
> _2 = *ISRA.4
>
> ===== copy-constraints======
> ESCAPED = ESCAPED // again ignoring the + UNKNOWN since I don't think
> it will matter...
> i = NONLOCAL
> ```
>
> Solution sets are basically the direct constraints at the moment.
>
> Let's now create the graph
>
> 1. node ESCAPED has an edge going to itself (due to the copy constraint)
> 2. node ISRA.4 has no outgoing copy edges
> 3. node derefaddrtmp(9) has no outgoing edges
> 4. node _2 has no outgoing edges
> 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> 6. node NONLOCAL has no outgoing edge
>
> Now, we can iterate over this set of nodes
>
> 1. Walking over node ESCAPED. Sol(ESCAPED) = {NONLOCAL}. There are no
> edges, but it has complex-constraints. Let's modify the graph.
>   1. Looking at ESCAPED = *ESCAPED we note that we need to add a copy
> edge from ESCAPED to NONLOCAL.
>   2. Looking at *ESCAPED = NONLOCAL we note that we need to add a copy
> edge from NONLOCAL to NONLOCAL
>
> The graph is now transformed to
>
> 1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
> 2. node ISRA.4 has no outgoing copy edges
> 3. node derefaddrtmp(9) has no outgoing edges
> 4. node _2 has no outgoing edges
> 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> 6. node NONLOCAL has an edge going to NONLOCAL
>
> The solution set of escaped is now Sol(ESCAPED) = Sol(ESCAPED) U
> Sol(NONLOCAL) = {NONLOCAL, ESCAPED}
>
> Now we continue walking
>
> 2. Walking over node ISRA.4. It has the solution set { NONLOCAL }.
> There are no edges, but it has complex-constraints. Let's modify the
> graph.
>   1. Looking at *ISRA.4 = derefaddrtmp(9), we note that we need to add
> a copy edge from NONLOCAL to derefaddrtmp(9).
>
> The graph is now transformed to
>
> 1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
> 2. node ISRA.4 has no outgoing copy edges
> 3. node derefaddrtmp(9) has no outgoing edges
> 4. node _2 has no outgoing edges
> 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> 6. node NONLOCAL has an edge going to NONLOCAL, derefaddrtmp(9)
>
> The Sol(NONLOCAL) = Sol(NONLOCAL) U Sol(derefaddrtmp(9)) = {NONLOCAL,
> ESCAPED, NULL}.
>
> Now I could continue, but here is already something that is not shown
> in the points-to sets in the dump. It shows that
>
> NONLOCAL = {NONLOCAL, ESCAPED, NULL}
>
> Looking at the data that I showed yesterday:
>
> ```
> NONLOCAL = { ESCAPED NONLOCAL } same as i
> ```
>
> we see that NULL is not in the solution set of NONLOCAL.
>
> Now, yesterday you said that NULL is not conservatively correctly
> represented in the constraints. You also said today the points-to
> analysis should be solving the constraints fine. What I now understand
> from this is that while NULL may be pointed to by some constraints, it
> doesn't mean that not being on the set means that a pointer will not
> point to NULL. However, it should still be shown in the dumps where
> the points-to sets are shown for the constraint variables since it is
> solved using the same analysis? Is this correct? Am I doing the points
> to analysis by hand wrong somehow? Why would NULL not be in
> Sol(NONLOCAL) if it is solving the same constraints that I am solving
> by hand?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: More questions on points-to analysis
  2021-03-17 16:21     ` Erick Ochoa
@ 2021-03-17 16:44       ` Erick Ochoa
  0 siblings, 0 replies; 9+ messages in thread
From: Erick Ochoa @ 2021-03-17 16:44 UTC (permalink / raw)
  Cc: Richard Biener, GCC Development

No, I think it is correct. Any help would be clearly appreciated.
Thanks! (Even if I'm wrong, of course I'd like to know.)

On Wed, 17 Mar 2021 at 17:21, Erick Ochoa <eochoa@gcc.gnu.org> wrote:
>
> Mm... ignore this for now please, I think I messed up the analysis by
> hand. I will try again. Thanks!
>
> On Wed, 17 Mar 2021 at 16:16, Erick Ochoa <eochoa@gcc.gnu.org> wrote:
> >
> > Hi Richard, I think I misunderstood yesterday's answer and deviated a
> > little bit. But now I want to focus on this:
> >
> > > > * the process in GCC that generates the constraints for NULL works
> > > > fine (i.e., feeding the constraints generated by GCC to an external
> > > > solver should yield a conservatively correct answer) but the process
> > > > that solves the constraints relaxes the solutions for the NULL
> > > > constraint variable (i.e., GCC has deviated from the constraint
> > > > solving algorithm somehow)
> > >
> > > No, that part should work OK.
> > >
> >
> > So, let's ignore the other solver for now and instead focus on the
> > concrete example I presented on the previous email. If GCC is solving
> > these constraints:
> >
> > ```
> > ANYTHING = &ANYTHING
> > ESCAPED = *ESCAPED
> > ESCAPED = ESCAPED + UNKNOWN
> > *ESCAPED = NONLOCAL
> > NONLOCAL = &NONLOCAL
> > NONLOCAL = &ESCAPED
> > INTEGER = &ANYTHING
> > ISRA.4 = &NONLOCAL
> > derefaddrtmp(9) = &NULL
> > *ISRA.4 = derefaddrtmp(9)
> > i = NONLOCAL
> > i = &NONLOCAL
> > ESCAPED = &NONLOCAL
> > _2 = *ISRA.4
> > ```
> >
> > What would a hand calculated solution gives us vs what was the
> > solution given by GCC?
> >
> > I am following the algorithm stated on Section 3.3 of Structure
> > Aliasing in GCC, and I will be ignoring the ESCAPED = ESCAPED +
> > UNKNOWN constraint since there isn't any other field offset that needs
> > to be calculated.
> >
> > First, I want to make some adjustments. I am going to be using "=" to
> > signify the \supseteq symbol and I will be adding curly braces to
> > specify the element in a set as opposed to the whole set. Therefore
> > the constraints will now become (ordered slightly differently):
> >
> > ```
> > ====direct contraints========
> > ANYTHING = { ANYTHING }
> > ESCAPED = { NONLOCAL }
> > NONLOCAL = { NONLOCAL }
> > NONLOCAL =  { ESCAPED }
> > INTEGER = { ANYTHING }
> > ISRA.4 = { NONLOCAL }
> > derefaddrtmp(9) = { NULL }
> > i = { NONLOCAL }
> >
> > ====complex constraints======
> > ESCAPED = *ESCAPED
> > *ESCAPED = NONLOCAL
> > *ISRA.4 = derefaddrtmp(9)
> > _2 = *ISRA.4
> >
> > ===== copy-constraints======
> > ESCAPED = ESCAPED // again ignoring the + UNKNOWN since I don't think
> > it will matter...
> > i = NONLOCAL
> > ```
> >
> > Solution sets are basically the direct constraints at the moment.
> >
> > Let's now create the graph
> >
> > 1. node ESCAPED has an edge going to itself (due to the copy constraint)
> > 2. node ISRA.4 has no outgoing copy edges
> > 3. node derefaddrtmp(9) has no outgoing edges
> > 4. node _2 has no outgoing edges
> > 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> > 6. node NONLOCAL has no outgoing edge
> >
> > Now, we can iterate over this set of nodes
> >
> > 1. Walking over node ESCAPED. Sol(ESCAPED) = {NONLOCAL}. There are no
> > edges, but it has complex-constraints. Let's modify the graph.
> >   1. Looking at ESCAPED = *ESCAPED we note that we need to add a copy
> > edge from ESCAPED to NONLOCAL.
> >   2. Looking at *ESCAPED = NONLOCAL we note that we need to add a copy
> > edge from NONLOCAL to NONLOCAL
> >
> > The graph is now transformed to
> >
> > 1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
> > 2. node ISRA.4 has no outgoing copy edges
> > 3. node derefaddrtmp(9) has no outgoing edges
> > 4. node _2 has no outgoing edges
> > 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> > 6. node NONLOCAL has an edge going to NONLOCAL
> >
> > The solution set of escaped is now Sol(ESCAPED) = Sol(ESCAPED) U
> > Sol(NONLOCAL) = {NONLOCAL, ESCAPED}
> >
> > Now we continue walking
> >
> > 2. Walking over node ISRA.4. It has the solution set { NONLOCAL }.
> > There are no edges, but it has complex-constraints. Let's modify the
> > graph.
> >   1. Looking at *ISRA.4 = derefaddrtmp(9), we note that we need to add
> > a copy edge from NONLOCAL to derefaddrtmp(9).
> >
> > The graph is now transformed to
> >
> > 1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
> > 2. node ISRA.4 has no outgoing copy edges
> > 3. node derefaddrtmp(9) has no outgoing edges
> > 4. node _2 has no outgoing edges
> > 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> > 6. node NONLOCAL has an edge going to NONLOCAL, derefaddrtmp(9)
> >
> > The Sol(NONLOCAL) = Sol(NONLOCAL) U Sol(derefaddrtmp(9)) = {NONLOCAL,
> > ESCAPED, NULL}.
> >
> > Now I could continue, but here is already something that is not shown
> > in the points-to sets in the dump. It shows that
> >
> > NONLOCAL = {NONLOCAL, ESCAPED, NULL}
> >
> > Looking at the data that I showed yesterday:
> >
> > ```
> > NONLOCAL = { ESCAPED NONLOCAL } same as i
> > ```
> >
> > we see that NULL is not in the solution set of NONLOCAL.
> >
> > Now, yesterday you said that NULL is not conservatively correctly
> > represented in the constraints. You also said today the points-to
> > analysis should be solving the constraints fine. What I now understand
> > from this is that while NULL may be pointed to by some constraints, it
> > doesn't mean that not being on the set means that a pointer will not
> > point to NULL. However, it should still be shown in the dumps where
> > the points-to sets are shown for the constraint variables since it is
> > solved using the same analysis? Is this correct? Am I doing the points
> > to analysis by hand wrong somehow? Why would NULL not be in
> > Sol(NONLOCAL) if it is solving the same constraints that I am solving
> > by hand?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: More questions on points-to analysis
  2021-03-17 15:16   ` Erick Ochoa
  2021-03-17 16:21     ` Erick Ochoa
@ 2021-03-18 12:27     ` Richard Biener
  2021-03-18 12:36       ` Erick Ochoa
  1 sibling, 1 reply; 9+ messages in thread
From: Richard Biener @ 2021-03-18 12:27 UTC (permalink / raw)
  To: Erick Ochoa; +Cc: GCC Development

On Wed, Mar 17, 2021 at 4:17 PM Erick Ochoa <eochoa@gcc.gnu.org> wrote:
>
> Hi Richard, I think I misunderstood yesterday's answer and deviated a
> little bit. But now I want to focus on this:
>
> > > * the process in GCC that generates the constraints for NULL works
> > > fine (i.e., feeding the constraints generated by GCC to an external
> > > solver should yield a conservatively correct answer) but the process
> > > that solves the constraints relaxes the solutions for the NULL
> > > constraint variable (i.e., GCC has deviated from the constraint
> > > solving algorithm somehow)
> >
> > No, that part should work OK.
> >
>
> So, let's ignore the other solver for now and instead focus on the
> concrete example I presented on the previous email. If GCC is solving
> these constraints:
>
> ```
> ANYTHING = &ANYTHING
> ESCAPED = *ESCAPED
> ESCAPED = ESCAPED + UNKNOWN
> *ESCAPED = NONLOCAL
> NONLOCAL = &NONLOCAL
> NONLOCAL = &ESCAPED
> INTEGER = &ANYTHING
> ISRA.4 = &NONLOCAL
> derefaddrtmp(9) = &NULL
> *ISRA.4 = derefaddrtmp(9)
> i = NONLOCAL
> i = &NONLOCAL
> ESCAPED = &NONLOCAL
> _2 = *ISRA.4
> ```
>
> What would a hand calculated solution gives us vs what was the
> solution given by GCC?
>
> I am following the algorithm stated on Section 3.3 of Structure
> Aliasing in GCC, and I will be ignoring the ESCAPED = ESCAPED +
> UNKNOWN constraint since there isn't any other field offset that needs
> to be calculated.
>
> First, I want to make some adjustments. I am going to be using "=" to
> signify the \supseteq symbol and I will be adding curly braces to
> specify the element in a set as opposed to the whole set. Therefore
> the constraints will now become (ordered slightly differently):
>
> ```
> ====direct contraints========
> ANYTHING = { ANYTHING }
> ESCAPED = { NONLOCAL }
> NONLOCAL = { NONLOCAL }
> NONLOCAL =  { ESCAPED }
> INTEGER = { ANYTHING }
> ISRA.4 = { NONLOCAL }
> derefaddrtmp(9) = { NULL }
> i = { NONLOCAL }
>
> ====complex constraints======
> ESCAPED = *ESCAPED
> *ESCAPED = NONLOCAL
> *ISRA.4 = derefaddrtmp(9)
> _2 = *ISRA.4
>
> ===== copy-constraints======
> ESCAPED = ESCAPED // again ignoring the + UNKNOWN since I don't think
> it will matter...
> i = NONLOCAL
> ```
>
> Solution sets are basically the direct constraints at the moment.
>
> Let's now create the graph
>
> 1. node ESCAPED has an edge going to itself (due to the copy constraint)
> 2. node ISRA.4 has no outgoing copy edges
> 3. node derefaddrtmp(9) has no outgoing edges
> 4. node _2 has no outgoing edges
> 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> 6. node NONLOCAL has no outgoing edge
>
> Now, we can iterate over this set of nodes
>
> 1. Walking over node ESCAPED. Sol(ESCAPED) = {NONLOCAL}. There are no
> edges, but it has complex-constraints. Let's modify the graph.
>   1. Looking at ESCAPED = *ESCAPED we note that we need to add a copy
> edge from ESCAPED to NONLOCAL.
>   2. Looking at *ESCAPED = NONLOCAL we note that we need to add a copy
> edge from NONLOCAL to NONLOCAL
>
> The graph is now transformed to
>
> 1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
> 2. node ISRA.4 has no outgoing copy edges
> 3. node derefaddrtmp(9) has no outgoing edges
> 4. node _2 has no outgoing edges
> 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> 6. node NONLOCAL has an edge going to NONLOCAL
>
> The solution set of escaped is now Sol(ESCAPED) = Sol(ESCAPED) U
> Sol(NONLOCAL) = {NONLOCAL, ESCAPED}
>
> Now we continue walking
>
> 2. Walking over node ISRA.4. It has the solution set { NONLOCAL }.
> There are no edges, but it has complex-constraints. Let's modify the
> graph.
>   1. Looking at *ISRA.4 = derefaddrtmp(9), we note that we need to add
> a copy edge from NONLOCAL to derefaddrtmp(9).
>
> The graph is now transformed to
>
> 1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
> 2. node ISRA.4 has no outgoing copy edges
> 3. node derefaddrtmp(9) has no outgoing edges
> 4. node _2 has no outgoing edges
> 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> 6. node NONLOCAL has an edge going to NONLOCAL, derefaddrtmp(9)
>
> The Sol(NONLOCAL) = Sol(NONLOCAL) U Sol(derefaddrtmp(9)) = {NONLOCAL,
> ESCAPED, NULL}.
>
> Now I could continue, but here is already something that is not shown
> in the points-to sets in the dump. It shows that
>
> NONLOCAL = {NONLOCAL, ESCAPED, NULL}
>
> Looking at the data that I showed yesterday:
>
> ```
> NONLOCAL = { ESCAPED NONLOCAL } same as i
> ```
>
> we see that NULL is not in the solution set of NONLOCAL.
>
> Now, yesterday you said that NULL is not conservatively correctly
> represented in the constraints. You also said today the points-to
> analysis should be solving the constraints fine. What I now understand
> from this is that while NULL may be pointed to by some constraints, it
> doesn't mean that not being on the set means that a pointer will not
> point to NULL. However, it should still be shown in the dumps where
> the points-to sets are shown for the constraint variables since it is
> solved using the same analysis? Is this correct? Am I doing the points
> to analysis by hand wrong somehow? Why would NULL not be in
> Sol(NONLOCAL) if it is solving the same constraints that I am solving
> by hand?

Because NONLOCAL is a special var and those do not get "solved".
Their points-to set is fixed.  NONLOCAL is the set of "global"
variables _at function entry_ (otherwise it would be the same as
ESCAPED).

Richard.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: More questions on points-to analysis
  2021-03-18 12:27     ` Richard Biener
@ 2021-03-18 12:36       ` Erick Ochoa
  2021-03-19 10:29         ` Erick Ochoa
  0 siblings, 1 reply; 9+ messages in thread
From: Erick Ochoa @ 2021-03-18 12:36 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Development

Perfect, thank you! This is exactly the answer that I was looking for.


On Thu, 18 Mar 2021 at 13:27, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Wed, Mar 17, 2021 at 4:17 PM Erick Ochoa <eochoa@gcc.gnu.org> wrote:
> >
> > Hi Richard, I think I misunderstood yesterday's answer and deviated a
> > little bit. But now I want to focus on this:
> >
> > > > * the process in GCC that generates the constraints for NULL works
> > > > fine (i.e., feeding the constraints generated by GCC to an external
> > > > solver should yield a conservatively correct answer) but the process
> > > > that solves the constraints relaxes the solutions for the NULL
> > > > constraint variable (i.e., GCC has deviated from the constraint
> > > > solving algorithm somehow)
> > >
> > > No, that part should work OK.
> > >
> >
> > So, let's ignore the other solver for now and instead focus on the
> > concrete example I presented on the previous email. If GCC is solving
> > these constraints:
> >
> > ```
> > ANYTHING = &ANYTHING
> > ESCAPED = *ESCAPED
> > ESCAPED = ESCAPED + UNKNOWN
> > *ESCAPED = NONLOCAL
> > NONLOCAL = &NONLOCAL
> > NONLOCAL = &ESCAPED
> > INTEGER = &ANYTHING
> > ISRA.4 = &NONLOCAL
> > derefaddrtmp(9) = &NULL
> > *ISRA.4 = derefaddrtmp(9)
> > i = NONLOCAL
> > i = &NONLOCAL
> > ESCAPED = &NONLOCAL
> > _2 = *ISRA.4
> > ```
> >
> > What would a hand calculated solution gives us vs what was the
> > solution given by GCC?
> >
> > I am following the algorithm stated on Section 3.3 of Structure
> > Aliasing in GCC, and I will be ignoring the ESCAPED = ESCAPED +
> > UNKNOWN constraint since there isn't any other field offset that needs
> > to be calculated.
> >
> > First, I want to make some adjustments. I am going to be using "=" to
> > signify the \supseteq symbol and I will be adding curly braces to
> > specify the element in a set as opposed to the whole set. Therefore
> > the constraints will now become (ordered slightly differently):
> >
> > ```
> > ====direct contraints========
> > ANYTHING = { ANYTHING }
> > ESCAPED = { NONLOCAL }
> > NONLOCAL = { NONLOCAL }
> > NONLOCAL =  { ESCAPED }
> > INTEGER = { ANYTHING }
> > ISRA.4 = { NONLOCAL }
> > derefaddrtmp(9) = { NULL }
> > i = { NONLOCAL }
> >
> > ====complex constraints======
> > ESCAPED = *ESCAPED
> > *ESCAPED = NONLOCAL
> > *ISRA.4 = derefaddrtmp(9)
> > _2 = *ISRA.4
> >
> > ===== copy-constraints======
> > ESCAPED = ESCAPED // again ignoring the + UNKNOWN since I don't think
> > it will matter...
> > i = NONLOCAL
> > ```
> >
> > Solution sets are basically the direct constraints at the moment.
> >
> > Let's now create the graph
> >
> > 1. node ESCAPED has an edge going to itself (due to the copy constraint)
> > 2. node ISRA.4 has no outgoing copy edges
> > 3. node derefaddrtmp(9) has no outgoing edges
> > 4. node _2 has no outgoing edges
> > 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> > 6. node NONLOCAL has no outgoing edge
> >
> > Now, we can iterate over this set of nodes
> >
> > 1. Walking over node ESCAPED. Sol(ESCAPED) = {NONLOCAL}. There are no
> > edges, but it has complex-constraints. Let's modify the graph.
> >   1. Looking at ESCAPED = *ESCAPED we note that we need to add a copy
> > edge from ESCAPED to NONLOCAL.
> >   2. Looking at *ESCAPED = NONLOCAL we note that we need to add a copy
> > edge from NONLOCAL to NONLOCAL
> >
> > The graph is now transformed to
> >
> > 1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
> > 2. node ISRA.4 has no outgoing copy edges
> > 3. node derefaddrtmp(9) has no outgoing edges
> > 4. node _2 has no outgoing edges
> > 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> > 6. node NONLOCAL has an edge going to NONLOCAL
> >
> > The solution set of escaped is now Sol(ESCAPED) = Sol(ESCAPED) U
> > Sol(NONLOCAL) = {NONLOCAL, ESCAPED}
> >
> > Now we continue walking
> >
> > 2. Walking over node ISRA.4. It has the solution set { NONLOCAL }.
> > There are no edges, but it has complex-constraints. Let's modify the
> > graph.
> >   1. Looking at *ISRA.4 = derefaddrtmp(9), we note that we need to add
> > a copy edge from NONLOCAL to derefaddrtmp(9).
> >
> > The graph is now transformed to
> >
> > 1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
> > 2. node ISRA.4 has no outgoing copy edges
> > 3. node derefaddrtmp(9) has no outgoing edges
> > 4. node _2 has no outgoing edges
> > 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> > 6. node NONLOCAL has an edge going to NONLOCAL, derefaddrtmp(9)
> >
> > The Sol(NONLOCAL) = Sol(NONLOCAL) U Sol(derefaddrtmp(9)) = {NONLOCAL,
> > ESCAPED, NULL}.
> >
> > Now I could continue, but here is already something that is not shown
> > in the points-to sets in the dump. It shows that
> >
> > NONLOCAL = {NONLOCAL, ESCAPED, NULL}
> >
> > Looking at the data that I showed yesterday:
> >
> > ```
> > NONLOCAL = { ESCAPED NONLOCAL } same as i
> > ```
> >
> > we see that NULL is not in the solution set of NONLOCAL.
> >
> > Now, yesterday you said that NULL is not conservatively correctly
> > represented in the constraints. You also said today the points-to
> > analysis should be solving the constraints fine. What I now understand
> > from this is that while NULL may be pointed to by some constraints, it
> > doesn't mean that not being on the set means that a pointer will not
> > point to NULL. However, it should still be shown in the dumps where
> > the points-to sets are shown for the constraint variables since it is
> > solved using the same analysis? Is this correct? Am I doing the points
> > to analysis by hand wrong somehow? Why would NULL not be in
> > Sol(NONLOCAL) if it is solving the same constraints that I am solving
> > by hand?
>
> Because NONLOCAL is a special var and those do not get "solved".
> Their points-to set is fixed.  NONLOCAL is the set of "global"
> variables _at function entry_ (otherwise it would be the same as
> ESCAPED).
>
> Richard.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: More questions on points-to analysis
  2021-03-18 12:36       ` Erick Ochoa
@ 2021-03-19 10:29         ` Erick Ochoa
  2021-03-19 10:37           ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Erick Ochoa @ 2021-03-19 10:29 UTC (permalink / raw)
  Cc: Richard Biener, GCC Development

Hello Richard,

I have a related question. I have the following GCC program (which is
again an altered version of ipa-pta-16.c:

```

struct X
{
  long l1;
  struct Y
    {
      long l2;
      int *p;
    } y;
};
int i;
static int __attribute__((noinline))
foo (struct X *x)
{
  struct Y y = x->y;
  *y.p = 0;
   //i = 1;
  return *y.p;
}
extern void abort (void);
int main()
{
  struct X x;
  struct X t;
  x.y.p = &i;
  long* e = &t.l1;
  if (foo(&x) != 1)
    abort ();
  return e;
}
```

I am compiling it with gcc -O2 -ftree-pta -fdump-tree-alias
-ftree-salias ipa-pta-16.c

and I get the following constraints for function foo.

```
ANYTHING = &ANYTHING
ESCAPED = *ESCAPED
ESCAPED = ESCAPED + UNKNOWN
*ESCAPED = NONLOCAL
NONLOCAL = &NONLOCAL
NONLOCAL = &ESCAPED
INTEGER = &ANYTHING
ISRA.4 = &NONLOCAL
derefaddrtmp(9) = &NULL
*ISRA.4 = derefaddrtmp(9)
```

I am just ordering to make this easier:

```
ANYTHING = &ANYTHING
NONLOCAL = &NONLOCAL
NONLOCAL = &ESCAPED
INTEGER = &ANYTHING
ISRA.4 = &NONLOCAL
derefaddrtmp(9) = &NULL

ESCAPED = ESCAPED + UNKNOWN

ESCAPED = *ESCAPED
*ESCAPED = NONLOCAL
*ISRA.4 = derefaddrtmp(9)
```

I now construct a constraint graph. It is basically a collection of
unconnected nodes, except for ESCAPED which points to itself.

I then walk over this graph and first encounter the constraint
"ESCAPED = ESCAPED + UNKNOWN". Since Sol(ESCAPED) is the empty set
nothing happens.
Similar for the constraints "ESCAPED = *ESCAPED" and "*ESCAPED = NONLOCAL".

I then walk to "*ISRA.4 = derefaddrtmp(9)". Sol(ISRA.4) = { NONLOCAL
}. Then there's possibly an edge made from NONLOCAL to derefaddrtmp.
(I am not sure of the implementation, since NONLOCAL doesn't change it
might not require an added edge.). And because NONLOCAL is a special
variable, Sol(NONLOCAL) does not change.

Now, none of the solutions have changed. We have possibly added an
edge to NONLOCAL, so let's just process the graph once more. Again
Sol(ESCAPED) is still empty so "ESCAPED = ESCAPED + UNKNOWN", "ESCAPED
= *ESCAPED", and "*ESCAPED = NONLOCAL" should have no effect ."Then
*ISRA.4 = derefaddrtmp(9)" also has no effect since nothing can
happen.

My question here is, why if we look at the points-to sets dumped by GCC we see

ESCAPED = { NULL }

Here is the full points-to sets:

computed for foo:

```
ANYTHING = { ANYTHING }
ESCAPED = { NULL }
NONLOCAL = { ESCAPED NONLOCAL }
STOREDANYTHING = { }
INTEGER = { ANYTHING }
ISRA.4 = { NONLOCAL }
derefaddrtmp(9) = { NULL }
foo.constprop.0.isra.0 = { }
```

I think this might have to do with "+ UNKNOWN" but I am unsure about
its semantics in the constraint set.

Thanks, any help is appreciated!


On Thu, 18 Mar 2021 at 13:36, Erick Ochoa <eochoa@gcc.gnu.org> wrote:
>
> Perfect, thank you! This is exactly the answer that I was looking for.
>
>
> On Thu, 18 Mar 2021 at 13:27, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Wed, Mar 17, 2021 at 4:17 PM Erick Ochoa <eochoa@gcc.gnu.org> wrote:
> > >
> > > Hi Richard, I think I misunderstood yesterday's answer and deviated a
> > > little bit. But now I want to focus on this:
> > >
> > > > > * the process in GCC that generates the constraints for NULL works
> > > > > fine (i.e., feeding the constraints generated by GCC to an external
> > > > > solver should yield a conservatively correct answer) but the process
> > > > > that solves the constraints relaxes the solutions for the NULL
> > > > > constraint variable (i.e., GCC has deviated from the constraint
> > > > > solving algorithm somehow)
> > > >
> > > > No, that part should work OK.
> > > >
> > >
> > > So, let's ignore the other solver for now and instead focus on the
> > > concrete example I presented on the previous email. If GCC is solving
> > > these constraints:
> > >
> > > ```
> > > ANYTHING = &ANYTHING
> > > ESCAPED = *ESCAPED
> > > ESCAPED = ESCAPED + UNKNOWN
> > > *ESCAPED = NONLOCAL
> > > NONLOCAL = &NONLOCAL
> > > NONLOCAL = &ESCAPED
> > > INTEGER = &ANYTHING
> > > ISRA.4 = &NONLOCAL
> > > derefaddrtmp(9) = &NULL
> > > *ISRA.4 = derefaddrtmp(9)
> > > i = NONLOCAL
> > > i = &NONLOCAL
> > > ESCAPED = &NONLOCAL
> > > _2 = *ISRA.4
> > > ```
> > >
> > > What would a hand calculated solution gives us vs what was the
> > > solution given by GCC?
> > >
> > > I am following the algorithm stated on Section 3.3 of Structure
> > > Aliasing in GCC, and I will be ignoring the ESCAPED = ESCAPED +
> > > UNKNOWN constraint since there isn't any other field offset that needs
> > > to be calculated.
> > >
> > > First, I want to make some adjustments. I am going to be using "=" to
> > > signify the \supseteq symbol and I will be adding curly braces to
> > > specify the element in a set as opposed to the whole set. Therefore
> > > the constraints will now become (ordered slightly differently):
> > >
> > > ```
> > > ====direct contraints========
> > > ANYTHING = { ANYTHING }
> > > ESCAPED = { NONLOCAL }
> > > NONLOCAL = { NONLOCAL }
> > > NONLOCAL =  { ESCAPED }
> > > INTEGER = { ANYTHING }
> > > ISRA.4 = { NONLOCAL }
> > > derefaddrtmp(9) = { NULL }
> > > i = { NONLOCAL }
> > >
> > > ====complex constraints======
> > > ESCAPED = *ESCAPED
> > > *ESCAPED = NONLOCAL
> > > *ISRA.4 = derefaddrtmp(9)
> > > _2 = *ISRA.4
> > >
> > > ===== copy-constraints======
> > > ESCAPED = ESCAPED // again ignoring the + UNKNOWN since I don't think
> > > it will matter...
> > > i = NONLOCAL
> > > ```
> > >
> > > Solution sets are basically the direct constraints at the moment.
> > >
> > > Let's now create the graph
> > >
> > > 1. node ESCAPED has an edge going to itself (due to the copy constraint)
> > > 2. node ISRA.4 has no outgoing copy edges
> > > 3. node derefaddrtmp(9) has no outgoing edges
> > > 4. node _2 has no outgoing edges
> > > 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> > > 6. node NONLOCAL has no outgoing edge
> > >
> > > Now, we can iterate over this set of nodes
> > >
> > > 1. Walking over node ESCAPED. Sol(ESCAPED) = {NONLOCAL}. There are no
> > > edges, but it has complex-constraints. Let's modify the graph.
> > >   1. Looking at ESCAPED = *ESCAPED we note that we need to add a copy
> > > edge from ESCAPED to NONLOCAL.
> > >   2. Looking at *ESCAPED = NONLOCAL we note that we need to add a copy
> > > edge from NONLOCAL to NONLOCAL
> > >
> > > The graph is now transformed to
> > >
> > > 1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
> > > 2. node ISRA.4 has no outgoing copy edges
> > > 3. node derefaddrtmp(9) has no outgoing edges
> > > 4. node _2 has no outgoing edges
> > > 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> > > 6. node NONLOCAL has an edge going to NONLOCAL
> > >
> > > The solution set of escaped is now Sol(ESCAPED) = Sol(ESCAPED) U
> > > Sol(NONLOCAL) = {NONLOCAL, ESCAPED}
> > >
> > > Now we continue walking
> > >
> > > 2. Walking over node ISRA.4. It has the solution set { NONLOCAL }.
> > > There are no edges, but it has complex-constraints. Let's modify the
> > > graph.
> > >   1. Looking at *ISRA.4 = derefaddrtmp(9), we note that we need to add
> > > a copy edge from NONLOCAL to derefaddrtmp(9).
> > >
> > > The graph is now transformed to
> > >
> > > 1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
> > > 2. node ISRA.4 has no outgoing copy edges
> > > 3. node derefaddrtmp(9) has no outgoing edges
> > > 4. node _2 has no outgoing edges
> > > 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> > > 6. node NONLOCAL has an edge going to NONLOCAL, derefaddrtmp(9)
> > >
> > > The Sol(NONLOCAL) = Sol(NONLOCAL) U Sol(derefaddrtmp(9)) = {NONLOCAL,
> > > ESCAPED, NULL}.
> > >
> > > Now I could continue, but here is already something that is not shown
> > > in the points-to sets in the dump. It shows that
> > >
> > > NONLOCAL = {NONLOCAL, ESCAPED, NULL}
> > >
> > > Looking at the data that I showed yesterday:
> > >
> > > ```
> > > NONLOCAL = { ESCAPED NONLOCAL } same as i
> > > ```
> > >
> > > we see that NULL is not in the solution set of NONLOCAL.
> > >
> > > Now, yesterday you said that NULL is not conservatively correctly
> > > represented in the constraints. You also said today the points-to
> > > analysis should be solving the constraints fine. What I now understand
> > > from this is that while NULL may be pointed to by some constraints, it
> > > doesn't mean that not being on the set means that a pointer will not
> > > point to NULL. However, it should still be shown in the dumps where
> > > the points-to sets are shown for the constraint variables since it is
> > > solved using the same analysis? Is this correct? Am I doing the points
> > > to analysis by hand wrong somehow? Why would NULL not be in
> > > Sol(NONLOCAL) if it is solving the same constraints that I am solving
> > > by hand?
> >
> > Because NONLOCAL is a special var and those do not get "solved".
> > Their points-to set is fixed.  NONLOCAL is the set of "global"
> > variables _at function entry_ (otherwise it would be the same as
> > ESCAPED).
> >
> > Richard.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: More questions on points-to analysis
  2021-03-19 10:29         ` Erick Ochoa
@ 2021-03-19 10:37           ` Richard Biener
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Biener @ 2021-03-19 10:37 UTC (permalink / raw)
  To: Erick Ochoa; +Cc: GCC Development

On Fri, Mar 19, 2021 at 11:29 AM Erick Ochoa <eochoa@gcc.gnu.org> wrote:
>
> Hello Richard,
>
> I have a related question. I have the following GCC program (which is
> again an altered version of ipa-pta-16.c:
>
> ```
>
> struct X
> {
>   long l1;
>   struct Y
>     {
>       long l2;
>       int *p;
>     } y;
> };
> int i;
> static int __attribute__((noinline))
> foo (struct X *x)
> {
>   struct Y y = x->y;
>   *y.p = 0;
>    //i = 1;
>   return *y.p;
> }
> extern void abort (void);
> int main()
> {
>   struct X x;
>   struct X t;
>   x.y.p = &i;
>   long* e = &t.l1;
>   if (foo(&x) != 1)
>     abort ();
>   return e;
> }
> ```
>
> I am compiling it with gcc -O2 -ftree-pta -fdump-tree-alias
> -ftree-salias ipa-pta-16.c
>
> and I get the following constraints for function foo.
>
> ```
> ANYTHING = &ANYTHING
> ESCAPED = *ESCAPED
> ESCAPED = ESCAPED + UNKNOWN
> *ESCAPED = NONLOCAL
> NONLOCAL = &NONLOCAL
> NONLOCAL = &ESCAPED
> INTEGER = &ANYTHING
> ISRA.4 = &NONLOCAL
> derefaddrtmp(9) = &NULL
> *ISRA.4 = derefaddrtmp(9)
> ```
>
> I am just ordering to make this easier:
>
> ```
> ANYTHING = &ANYTHING
> NONLOCAL = &NONLOCAL
> NONLOCAL = &ESCAPED
> INTEGER = &ANYTHING
> ISRA.4 = &NONLOCAL
> derefaddrtmp(9) = &NULL
>
> ESCAPED = ESCAPED + UNKNOWN
>
> ESCAPED = *ESCAPED
> *ESCAPED = NONLOCAL
> *ISRA.4 = derefaddrtmp(9)
> ```
>
> I now construct a constraint graph. It is basically a collection of
> unconnected nodes, except for ESCAPED which points to itself.
>
> I then walk over this graph and first encounter the constraint
> "ESCAPED = ESCAPED + UNKNOWN". Since Sol(ESCAPED) is the empty set
> nothing happens.
> Similar for the constraints "ESCAPED = *ESCAPED" and "*ESCAPED = NONLOCAL".
>
> I then walk to "*ISRA.4 = derefaddrtmp(9)". Sol(ISRA.4) = { NONLOCAL
> }. Then there's possibly an edge made from NONLOCAL to derefaddrtmp.
> (I am not sure of the implementation, since NONLOCAL doesn't change it
> might not require an added edge.). And because NONLOCAL is a special
> variable, Sol(NONLOCAL) does not change.
>
> Now, none of the solutions have changed. We have possibly added an
> edge to NONLOCAL, so let's just process the graph once more. Again
> Sol(ESCAPED) is still empty so "ESCAPED = ESCAPED + UNKNOWN", "ESCAPED
> = *ESCAPED", and "*ESCAPED = NONLOCAL" should have no effect ."Then
> *ISRA.4 = derefaddrtmp(9)" also has no effect since nothing can
> happen.
>
> My question here is, why if we look at the points-to sets dumped by GCC we see
>
> ESCAPED = { NULL }

That's likely because ISRA points to global memory (NONLOCAL) and thus
all stores to it, like *ISRA.4 = derefaddrtmp(9), escape.  That detail
is handled
in the solver in do_ds_constraint:

             /* If v is a global variable then this is an escape point.  */
              if (v->is_global_var
                  && !escaped_p)
                {
                  t = find (escaped_id);
                  if (add_graph_edge (graph, t, rhs)
                      && bitmap_ior_into (get_varinfo (t)->solution, sol))
                    bitmap_set_bit (changed, t);
                  /* Enough to let rhs escape once.  */
                  escaped_p = true;
                }

which is an artifact on how we have split NONLOCAL and ESCAPED
(for the benefit of getting some flow sensitivity, function start and later).

You can also see special-casing for stores to ANYTHING in this
function.

> Here is the full points-to sets:
>
> computed for foo:
>
> ```
> ANYTHING = { ANYTHING }
> ESCAPED = { NULL }
> NONLOCAL = { ESCAPED NONLOCAL }
> STOREDANYTHING = { }
> INTEGER = { ANYTHING }
> ISRA.4 = { NONLOCAL }
> derefaddrtmp(9) = { NULL }
> foo.constprop.0.isra.0 = { }
> ```
>
> I think this might have to do with "+ UNKNOWN" but I am unsure about
> its semantics in the constraint set.
>
> Thanks, any help is appreciated!
>
>
> On Thu, 18 Mar 2021 at 13:36, Erick Ochoa <eochoa@gcc.gnu.org> wrote:
> >
> > Perfect, thank you! This is exactly the answer that I was looking for.
> >
> >
> > On Thu, 18 Mar 2021 at 13:27, Richard Biener <richard.guenther@gmail.com> wrote:
> > >
> > > On Wed, Mar 17, 2021 at 4:17 PM Erick Ochoa <eochoa@gcc.gnu.org> wrote:
> > > >
> > > > Hi Richard, I think I misunderstood yesterday's answer and deviated a
> > > > little bit. But now I want to focus on this:
> > > >
> > > > > > * the process in GCC that generates the constraints for NULL works
> > > > > > fine (i.e., feeding the constraints generated by GCC to an external
> > > > > > solver should yield a conservatively correct answer) but the process
> > > > > > that solves the constraints relaxes the solutions for the NULL
> > > > > > constraint variable (i.e., GCC has deviated from the constraint
> > > > > > solving algorithm somehow)
> > > > >
> > > > > No, that part should work OK.
> > > > >
> > > >
> > > > So, let's ignore the other solver for now and instead focus on the
> > > > concrete example I presented on the previous email. If GCC is solving
> > > > these constraints:
> > > >
> > > > ```
> > > > ANYTHING = &ANYTHING
> > > > ESCAPED = *ESCAPED
> > > > ESCAPED = ESCAPED + UNKNOWN
> > > > *ESCAPED = NONLOCAL
> > > > NONLOCAL = &NONLOCAL
> > > > NONLOCAL = &ESCAPED
> > > > INTEGER = &ANYTHING
> > > > ISRA.4 = &NONLOCAL
> > > > derefaddrtmp(9) = &NULL
> > > > *ISRA.4 = derefaddrtmp(9)
> > > > i = NONLOCAL
> > > > i = &NONLOCAL
> > > > ESCAPED = &NONLOCAL
> > > > _2 = *ISRA.4
> > > > ```
> > > >
> > > > What would a hand calculated solution gives us vs what was the
> > > > solution given by GCC?
> > > >
> > > > I am following the algorithm stated on Section 3.3 of Structure
> > > > Aliasing in GCC, and I will be ignoring the ESCAPED = ESCAPED +
> > > > UNKNOWN constraint since there isn't any other field offset that needs
> > > > to be calculated.
> > > >
> > > > First, I want to make some adjustments. I am going to be using "=" to
> > > > signify the \supseteq symbol and I will be adding curly braces to
> > > > specify the element in a set as opposed to the whole set. Therefore
> > > > the constraints will now become (ordered slightly differently):
> > > >
> > > > ```
> > > > ====direct contraints========
> > > > ANYTHING = { ANYTHING }
> > > > ESCAPED = { NONLOCAL }
> > > > NONLOCAL = { NONLOCAL }
> > > > NONLOCAL =  { ESCAPED }
> > > > INTEGER = { ANYTHING }
> > > > ISRA.4 = { NONLOCAL }
> > > > derefaddrtmp(9) = { NULL }
> > > > i = { NONLOCAL }
> > > >
> > > > ====complex constraints======
> > > > ESCAPED = *ESCAPED
> > > > *ESCAPED = NONLOCAL
> > > > *ISRA.4 = derefaddrtmp(9)
> > > > _2 = *ISRA.4
> > > >
> > > > ===== copy-constraints======
> > > > ESCAPED = ESCAPED // again ignoring the + UNKNOWN since I don't think
> > > > it will matter...
> > > > i = NONLOCAL
> > > > ```
> > > >
> > > > Solution sets are basically the direct constraints at the moment.
> > > >
> > > > Let's now create the graph
> > > >
> > > > 1. node ESCAPED has an edge going to itself (due to the copy constraint)
> > > > 2. node ISRA.4 has no outgoing copy edges
> > > > 3. node derefaddrtmp(9) has no outgoing edges
> > > > 4. node _2 has no outgoing edges
> > > > 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> > > > 6. node NONLOCAL has no outgoing edge
> > > >
> > > > Now, we can iterate over this set of nodes
> > > >
> > > > 1. Walking over node ESCAPED. Sol(ESCAPED) = {NONLOCAL}. There are no
> > > > edges, but it has complex-constraints. Let's modify the graph.
> > > >   1. Looking at ESCAPED = *ESCAPED we note that we need to add a copy
> > > > edge from ESCAPED to NONLOCAL.
> > > >   2. Looking at *ESCAPED = NONLOCAL we note that we need to add a copy
> > > > edge from NONLOCAL to NONLOCAL
> > > >
> > > > The graph is now transformed to
> > > >
> > > > 1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
> > > > 2. node ISRA.4 has no outgoing copy edges
> > > > 3. node derefaddrtmp(9) has no outgoing edges
> > > > 4. node _2 has no outgoing edges
> > > > 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> > > > 6. node NONLOCAL has an edge going to NONLOCAL
> > > >
> > > > The solution set of escaped is now Sol(ESCAPED) = Sol(ESCAPED) U
> > > > Sol(NONLOCAL) = {NONLOCAL, ESCAPED}
> > > >
> > > > Now we continue walking
> > > >
> > > > 2. Walking over node ISRA.4. It has the solution set { NONLOCAL }.
> > > > There are no edges, but it has complex-constraints. Let's modify the
> > > > graph.
> > > >   1. Looking at *ISRA.4 = derefaddrtmp(9), we note that we need to add
> > > > a copy edge from NONLOCAL to derefaddrtmp(9).
> > > >
> > > > The graph is now transformed to
> > > >
> > > > 1. node ESCAPED has an edge going to ESCAPED and NONLOCAL
> > > > 2. node ISRA.4 has no outgoing copy edges
> > > > 3. node derefaddrtmp(9) has no outgoing edges
> > > > 4. node _2 has no outgoing edges
> > > > 5. node i has an outgoing edge to NONLOCAL (due to the copy constraint)
> > > > 6. node NONLOCAL has an edge going to NONLOCAL, derefaddrtmp(9)
> > > >
> > > > The Sol(NONLOCAL) = Sol(NONLOCAL) U Sol(derefaddrtmp(9)) = {NONLOCAL,
> > > > ESCAPED, NULL}.
> > > >
> > > > Now I could continue, but here is already something that is not shown
> > > > in the points-to sets in the dump. It shows that
> > > >
> > > > NONLOCAL = {NONLOCAL, ESCAPED, NULL}
> > > >
> > > > Looking at the data that I showed yesterday:
> > > >
> > > > ```
> > > > NONLOCAL = { ESCAPED NONLOCAL } same as i
> > > > ```
> > > >
> > > > we see that NULL is not in the solution set of NONLOCAL.
> > > >
> > > > Now, yesterday you said that NULL is not conservatively correctly
> > > > represented in the constraints. You also said today the points-to
> > > > analysis should be solving the constraints fine. What I now understand
> > > > from this is that while NULL may be pointed to by some constraints, it
> > > > doesn't mean that not being on the set means that a pointer will not
> > > > point to NULL. However, it should still be shown in the dumps where
> > > > the points-to sets are shown for the constraint variables since it is
> > > > solved using the same analysis? Is this correct? Am I doing the points
> > > > to analysis by hand wrong somehow? Why would NULL not be in
> > > > Sol(NONLOCAL) if it is solving the same constraints that I am solving
> > > > by hand?
> > >
> > > Because NONLOCAL is a special var and those do not get "solved".
> > > Their points-to set is fixed.  NONLOCAL is the set of "global"
> > > variables _at function entry_ (otherwise it would be the same as
> > > ESCAPED).
> > >
> > > Richard.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-03-19 10:37 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-17 10:31 More questions on points-to analysis Erick Ochoa
2021-03-17 10:51 ` Richard Biener
2021-03-17 15:16   ` Erick Ochoa
2021-03-17 16:21     ` Erick Ochoa
2021-03-17 16:44       ` Erick Ochoa
2021-03-18 12:27     ` Richard Biener
2021-03-18 12:36       ` Erick Ochoa
2021-03-19 10:29         ` Erick Ochoa
2021-03-19 10:37           ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).