public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Static Analyzer: correlate state for different region/svalue for reporting
@ 2023-03-15 15:24 Pierrick Philippe
  2023-03-15 16:26 ` David Malcolm
  0 siblings, 1 reply; 5+ messages in thread
From: Pierrick Philippe @ 2023-03-15 15:24 UTC (permalink / raw)
  To: gcc

[-- Attachment #1: Type: text/plain, Size: 3150 bytes --]

Hi everyone,

I have some question regarding the analyzer.
I'm currently working on an fully out-of-tree static analyzer plugin.
I started development on commit tagged as /basepoints/gcc-13/, but 
recently moved my code to a more recent /trunk/ branch (build /20230309/).

 From my different experiments and analyzer's log readings, I have the 
intuition that '/ana::state_machine::state/' class is mostly (if not 
only) linked with '/ana::svalue/', i.e. the lvalue of a memory region.
In my analysis case, I would like to also be able to track state for 
some rvalue of a memory region, i.e. '/ana::region/' objects.
So, first question: is there any way to associate and track the state of 
a rvalue, independently of its lvalue?

To try to clarify the question, here's an example:

'''
int __attribute__("__some_attribute__") x = 42;
/* STEP-1
 From now on, we consider the state of x as being marked by some_attribute.
But in fact, in the log, we can observe that we'll have something like 
this in the new '/ana::program_state/':
{0x4b955b0: (int)42: marked_state (‘x’)} */

int *y = &x;
/* STEP-2
For analysis purpose, you want to know that from now on, y is pointing 
to marked data.
So you set state of the LHS of the GIMPLE statement (i.e. some ssa_name 
instance of y) accordingly, with a state named 'points-to_marked_data'
and setting 'x' as the origin of the state (in the sense of the argument 
/origin/ from '/ana::sm_context::on_transition/'.
What we now have in the new '/ana::program_state/' is this:
{0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: (int)42 
(‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */

int z = *y;
/* STEP-3
Now you have implicit copy of marked data, and you want to report it.
So you state the LHS of the GIMPLE statement (i.e. some ssa_name 
instance of z) as being marked, with 'y' as the origin.
What we now have in the new '/ana::program_state/' is this:
{0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: (int)42 
(‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */
'''

In STEP-2:

We lost the information saying that the rvalue of y (i.e. y), is 
pointing to marked data.
Only remain the information that its lvalue (i.e. &x), is pointing to 
marked data, which is of course true.

In STEP-3:

No information is added regarding z in the new program_state.
In fact, if one have a closer look at the log, we see that the LHS of 
the GIMPLE statement (the ssa_name instance of z), is already in the 
state of 'marked_data'.
Through the log to the call to '/sm_context::on_transition/' this can be 
seen:
     marked_sm: state transition of ‘z_5’: marked_data -> marked_data

All of this step is somehow leading to confusing diagnostic.
For example, you miss the fact that 'z' is being marked, because no 
event happen as it is somehow aliasing 'x' svalue.
Though it might not be true in case of missed optimization.

Of course, if anyone wants to have a look at my code, I can provide it 
to you (it is not yet publicly available at the moment).
Also, let me know if there is any imprecise information.

Thank you for your time,

Pierrick

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Static Analyzer: correlate state for different region/svalue for reporting
  2023-03-15 15:24 Static Analyzer: correlate state for different region/svalue for reporting Pierrick Philippe
@ 2023-03-15 16:26 ` David Malcolm
  2023-03-16  8:54   ` Pierrick Philippe
  0 siblings, 1 reply; 5+ messages in thread
From: David Malcolm @ 2023-03-15 16:26 UTC (permalink / raw)
  To: Pierrick Philippe, gcc

On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote:
> Hi everyone,

Hi Pierrick

> 
> I have some question regarding the analyzer.
> I'm currently working on an fully out-of-tree static analyzer plugin.
> I started development on commit tagged as /basepoints/gcc-13/, but 
> recently moved my code to a more recent /trunk/ branch (build
> /20230309/).
> 
>  From my different experiments and analyzer's log readings, I have
> the 
> intuition that '/ana::state_machine::state/' class is mostly (if not 
> only) linked with '/ana::svalue/', 

Yes: a program_state (which describes the state at a particular node
along an execution path) contains (among other things) an sm_state_map
per state_machine, and the sm_state_map maps from svalues to
state_machine::states.

> i.e. the lvalue of a memory region.
> In my analysis case, I would like to also be able to track state for 
> some rvalue of a memory region, i.e. '/ana::region/' objects.

You're using the terms "lvalue" and "rvalue", but if I'm reading your
email right, you're using them in exactly the opposite way that I
understand them.

I think of an "lvalue" as something that can be on the left-hand side
of an assignment: something that can be written to (an ana::region),
whereas an "rvalue" is something that can be on the right-hand side of
an assignment: a pattern of bits (an ana:svalue) that be written into
an lvalue.

An ana::svalue is a pattern of bits (possibly symbolic, such as "the
constant 42" or "the initial value of param x")

An ana::region is a description of a memory location for reads/writes
(possibly symbolic, such as "local var x within this frame", or "the
block reached by dereferencing the initial value of ptr in the frame
above").

Sorry if I've misread things; I'll try to answer the rest of the email
as best as I can...

> So, first question: is there any way to associate and track the state
> of 
> a rvalue, independently of its lvalue?
> 
> To try to clarify the question, here's an example:
> 
> '''
> int __attribute__("__some_attribute__") x = 42;
> /* STEP-1
>  From now on, we consider the state of x as being marked by
> some_attribute.
> But in fact, in the log, we can observe that we'll have something
> like 
> this in the new '/ana::program_state/':
> {0x4b955b0: (int)42: marked_state (‘x’)} */

Yes: sm-state is associated within a program_state with an svalue, in
this case with the constant 42.

There isn't yet a way to instead give sm-state to "x" itself.  I
suppose you could give sm-state to &x (the pointer that points to x is
an instance of region_svalue, which is an svalue), but I haven't tried
this approach myself.

> 
> int *y = &x;
> /* STEP-2
> For analysis purpose, you want to know that from now on, y is
> pointing 
> to marked data.
> So you set state of the LHS of the GIMPLE statement (i.e. some
> ssa_name 
> instance of y) accordingly, with a state named 'points-
> to_marked_data'
> and setting 'x' as the origin of the state (in the sense of the
> argument 
> /origin/ from '/ana::sm_context::on_transition/'.
> What we now have in the new '/ana::program_state/' is this:
> {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
> (int)42 
> (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */

Yes: you've set the state on the svalue "&x", not on "y".

> 
> int z = *y;
> /* STEP-3
> Now you have implicit copy of marked data, and you want to report it.
> So you state the LHS of the GIMPLE statement (i.e. some ssa_name 
> instance of z) as being marked, with 'y' as the origin.
> What we now have in the new '/ana::program_state/' is this:
> {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
> (int)42 
> (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */
> '''

Presumably the program_state also shows that you have a binding for the
region "z", containing the svalue 42 (this is represented within the
"store", within the region_model within the program_state).

> 
> In STEP-2:
> 
> We lost the information saying that the rvalue of y (i.e. y), is 
> pointing to marked data.
> Only remain the information that its lvalue (i.e. &x), is pointing to
> marked data, which is of course true.

Note that the analyzer by default attempts to purge state that it
doesn't think will be needed anymore, so it may have purged the state
of "y" if y isn't going to be read from anymore.  You could try -fno-
analyzer-state-purge to avoid this purging.

> 
> In STEP-3:
> 
> No information is added regarding z in the new program_state.
> In fact, if one have a closer look at the log, we see that the LHS of
> the GIMPLE statement (the ssa_name instance of z), is already in the 
> state of 'marked_data'.
> Through the log to the call to '/sm_context::on_transition/' this can
> be 
> seen:
>      marked_sm: state transition of ‘z_5’: marked_data -> marked_data
> 
> All of this step is somehow leading to confusing diagnostic.
> For example, you miss the fact that 'z' is being marked, because no 
> event happen as it is somehow aliasing 'x' svalue.
> Though it might not be true in case of missed optimization.
> 
> Of course, if anyone wants to have a look at my code, I can provide
> it 
> to you (it is not yet publicly available at the moment).

I think I'd have to look at the code to be of more help; I confess that
I stopped understanding somewhere around step 3, sorry.

Are you able to post a simple example of the code you'd like to
analyze, with an idea of the behavior you'd like from the analyzer?
(e.g. the state transitions)

> Also, let me know if there is any imprecise information.
> 
> Thank you for your time,

Thanks; hope the above is helpful
Dave


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Static Analyzer: correlate state for different region/svalue for reporting
  2023-03-15 16:26 ` David Malcolm
@ 2023-03-16  8:54   ` Pierrick Philippe
  2023-03-16 13:44     ` David Malcolm
  0 siblings, 1 reply; 5+ messages in thread
From: Pierrick Philippe @ 2023-03-16  8:54 UTC (permalink / raw)
  To: David Malcolm, gcc

On 15/03/2023 17:26, David Malcolm wrote:
> On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote:
>> Hi everyone,
> Hi Pierrick
>
>> I have some question regarding the analyzer.
>> I'm currently working on an fully out-of-tree static analyzer plugin.
>> I started development on commit tagged as /basepoints/gcc-13/, but
>> recently moved my code to a more recent /trunk/ branch (build
>> /20230309/).
>>
>>   From my different experiments and analyzer's log readings, I have
>> the
>> intuition that '/ana::state_machine::state/' class is mostly (if not
>> only) linked with '/ana::svalue/',
> Yes: a program_state (which describes the state at a particular node
> along an execution path) contains (among other things) an sm_state_map
> per state_machine, and the sm_state_map maps from svalues to
> state_machine::states.
>
>> i.e. the lvalue of a memory region.
>> In my analysis case, I would like to also be able to track state for
>> some rvalue of a memory region, i.e. '/ana::region/' objects.
> You're using the terms "lvalue" and "rvalue", but if I'm reading your
> email right, you're using them in exactly the opposite way that I
> understand them.
I apologize, you are completely right, I swapped their usage...
>
> I think of an "lvalue" as something that can be on the left-hand side
> of an assignment: something that can be written to (an ana::region),
> whereas an "rvalue" is something that can be on the right-hand side of
> an assignment: a pattern of bits (an ana:svalue) that be written into
> an lvalue.
>
> An ana::svalue is a pattern of bits (possibly symbolic, such as "the
> constant 42" or "the initial value of param x")
>
> An ana::region is a description of a memory location for reads/writes
> (possibly symbolic, such as "local var x within this frame", or "the
> block reached by dereferencing the initial value of ptr in the frame
> above").
>
> Sorry if I've misread things; I'll try to answer the rest of the email
> as best as I can...
>
>> So, first question: is there any way to associate and track the state
>> of
>> a rvalue, independently of its lvalue?
>>
>> To try to clarify the question, here's an example:
>>
>> '''
>> int __attribute__("__some_attribute__") x = 42;
>> /* STEP-1
>>   From now on, we consider the state of x as being marked by
>> some_attribute.
>> But in fact, in the log, we can observe that we'll have something
>> like
>> this in the new '/ana::program_state/':
>> {0x4b955b0: (int)42: marked_state (‘x’)} */
> Yes: sm-state is associated within a program_state with an svalue, in
> this case with the constant 42.
>
> There isn't yet a way to instead give sm-state to "x" itself.  I
> suppose you could give sm-state to &x (the pointer that points to x is
> an instance of region_svalue, which is an svalue), but I haven't tried
> this approach myself.
>
>> int *y = &x;
>> /* STEP-2
>> For analysis purpose, you want to know that from now on, y is
>> pointing
>> to marked data.
>> So you set state of the LHS of the GIMPLE statement (i.e. some
>> ssa_name
>> instance of y) accordingly, with a state named 'points-
>> to_marked_data'
>> and setting 'x' as the origin of the state (in the sense of the
>> argument
>> /origin/ from '/ana::sm_context::on_transition/'.
>> What we now have in the new '/ana::program_state/' is this:
>> {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
>> (int)42
>> (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */
> Yes: you've set the state on the svalue "&x", not on "y".
>
>> int z = *y;
>> /* STEP-3
>> Now you have implicit copy of marked data, and you want to report it.
>> So you state the LHS of the GIMPLE statement (i.e. some ssa_name
>> instance of z) as being marked, with 'y' as the origin.
>> What we now have in the new '/ana::program_state/' is this:
>> {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
>> (int)42
>> (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */
>> '''
> Presumably the program_state also shows that you have a binding for the
> region "z", containing the svalue 42 (this is represented within the
> "store", within the region_model within the program_state).

Indeed, there is a binding for region "z" to the svalue 42 within the 
new program state.

>> In STEP-2:
>>
>> We lost the information saying that the rvalue of y (i.e. y), is
>> pointing to marked data.
>> Only remain the information that its lvalue (i.e. &x), is pointing to
>> marked data, which is of course true.
> Note that the analyzer by default attempts to purge state that it
> doesn't think will be needed anymore, so it may have purged the state
> of "y" if y isn't going to be read from anymore.  You could try -fno-
> analyzer-state-purge to avoid this purging.

Nothing changes when I run it with the -fno-anlyzer-state-purge,
it is still the state of &x which is tracked.

>> In STEP-3:
>>
>> No information is added regarding z in the new program_state.
>> In fact, if one have a closer look at the log, we see that the LHS of
>> the GIMPLE statement (the ssa_name instance of z), is already in the
>> state of 'marked_data'.
>> Through the log to the call to '/sm_context::on_transition/' this can
>> be
>> seen:
>>       marked_sm: state transition of ‘z_5’: marked_data -> marked_data
>>
>> All of this step is somehow leading to confusing diagnostic.
>> For example, you miss the fact that 'z' is being marked, because no
>> event happen as it is somehow aliasing 'x' svalue.
>> Though it might not be true in case of missed optimization.
>>
>> Of course, if anyone wants to have a look at my code, I can provide
>> it
>> to you (it is not yet publicly available at the moment).
> I think I'd have to look at the code to be of more help; I confess that
> I stopped understanding somewhere around step 3, sorry.
No worries, the idea is that regarding the state_map, z_3 is already 
considered as marked.
> Are you able to post a simple example of the code you'd like to
> analyze, with an idea of the behavior you'd like from the analyzer?
> (e.g. the state transitions)

Sure here is an exemple:

'''
int __attribute__("__some_attribute__") x = 42;
// Transition state for x: start -> marked
int *y = &x;
// Transition state for y: start -> points-to_marked
int z = *y;
// Transition state for z: start -> marked
if (z);
// Diagnostic is issued because of the use of a marked data in a 
condition, for analysis purpose.
'''

The issued diagnostic I do have in mind looks like:

'''
1 | int __attribute__("__some_attribute__") x = 42;
    | ^
    | |
    | (1) ‘x’ gets marked here
2 | int *y = &x;
    |       ~
    |        |
    |      (2) ‘y’ points-to marked data here
3 | int z = *y;
    |     ~
    |      |
    |    (3) ‘z’ gets marked here
4 | if (z);
    |     ~
    |      |
    |    (4) used of a marked value 'z' in a condition
'''

What I get from now on is this:

'''
1 |     int __attribute__((__taint__)) x = 42;
    |                                                 ^
    |                                                  |
    |                                                (1) ‘x’ gets 
tainted here
    |......
4 |     if(z) return 666;
    |        ~
    |         |
    |       (2) use of tainted value ‘z’ in a condition
'''

And for some reason, if I change the analyzed code to this:

'''
int __attribute__("__some_attribute__") x = 42;
int *y = &x;
x = 6; // New line
int z = *y;
if (z);
'''

This is what I get:

'''
2 |     int __attribute__((__taint__)) x = 42;
    |                                                 ^
    |                                                  |
    |                                                (1) ‘x’ gets marked 
here
3 |     int * y = &x;
    |            ~
    |             |
    |           (2) ‘&x’ points to tainted data here
4 |     x = 6;
5 |     int z = *y;
    |          ~
    |           |
    |         (3) ‘x’ gets tainted here
6 |     if(z) return 666;
    |        ~
    |         |
    |       (4) use of tainted value ‘z’ in a condition
'''

Sorry for the long mail here,
Let me know if you want to have a look at the exploded graph or at the 
analyzer's log.

Thank you for your time,

Pierrick


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Static Analyzer: correlate state for different region/svalue for reporting
  2023-03-16  8:54   ` Pierrick Philippe
@ 2023-03-16 13:44     ` David Malcolm
  2023-03-21 14:36       ` Pierrick Philippe
  0 siblings, 1 reply; 5+ messages in thread
From: David Malcolm @ 2023-03-16 13:44 UTC (permalink / raw)
  To: Pierrick Philippe, gcc

On Thu, 2023-03-16 at 09:54 +0100, Pierrick Philippe wrote:
> On 15/03/2023 17:26, David Malcolm wrote:
> > On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote:

[...snip...]

> > 
> > 
> > An ana::svalue is a pattern of bits (possibly symbolic, such as
> > "the
> > constant 42" or "the initial value of param x")
> > 
> > An ana::region is a description of a memory location for
> > reads/writes
> > (possibly symbolic, such as "local var x within this frame", or
> > "the
> > block reached by dereferencing the initial value of ptr in the
> > frame
> > above").
> > 
> > Sorry if I've misread things; I'll try to answer the rest of the
> > email
> > as best as I can...
> > 
> > > So, first question: is there any way to associate and track the
> > > state
> > > of
> > > a rvalue, independently of its lvalue?
> > > 
> > > To try to clarify the question, here's an example:
> > > 
> > > '''
> > > int __attribute__("__some_attribute__") x = 42;
> > > /* STEP-1
> > >   From now on, we consider the state of x as being marked by
> > > some_attribute.
> > > But in fact, in the log, we can observe that we'll have something
> > > like
> > > this in the new '/ana::program_state/':
> > > {0x4b955b0: (int)42: marked_state (‘x’)} */
> > Yes: sm-state is associated within a program_state with an svalue,
> > in
> > this case with the constant 42.
> > 
> > There isn't yet a way to instead give sm-state to "x" itself.  I
> > suppose you could give sm-state to &x (the pointer that points to x
> > is
> > an instance of region_svalue, which is an svalue), but I haven't
> > tried
> > this approach myself.
> > 
> > > int *y = &x;
> > > /* STEP-2
> > > For analysis purpose, you want to know that from now on, y is
> > > pointing
> > > to marked data.
> > > So you set state of the LHS of the GIMPLE statement (i.e. some
> > > ssa_name
> > > instance of y) accordingly, with a state named 'points-
> > > to_marked_data'
> > > and setting 'x' as the origin of the state (in the sense of the
> > > argument
> > > /origin/ from '/ana::sm_context::on_transition/'.
> > > What we now have in the new '/ana::program_state/' is this:
> > > {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
> > > (int)42
> > > (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */
> > Yes: you've set the state on the svalue "&x", not on "y".
> > 
> > > int z = *y;
> > > /* STEP-3
> > > Now you have implicit copy of marked data, and you want to report
> > > it.
> > > So you state the LHS of the GIMPLE statement (i.e. some ssa_name
> > > instance of z) as being marked, with 'y' as the origin.
> > > What we now have in the new '/ana::program_state/' is this:
> > > {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
> > > (int)42
> > > (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */
> > > '''
> > Presumably the program_state also shows that you have a binding for
> > the
> > region "z", containing the svalue 42 (this is represented within
> > the
> > "store", within the region_model within the program_state).
> 
> Indeed, there is a binding for region "z" to the svalue 42 within the
> new program state.
> 
> > > In STEP-2:
> > > 
> > > We lost the information saying that the rvalue of y (i.e. y), is
> > > pointing to marked data.
> > > Only remain the information that its lvalue (i.e. &x), is
> > > pointing to
> > > marked data, which is of course true.
> > Note that the analyzer by default attempts to purge state that it
> > doesn't think will be needed anymore, so it may have purged the
> > state
> > of "y" if y isn't going to be read from anymore.  You could try -
> > fno-
> > analyzer-state-purge to avoid this purging.
> 
> Nothing changes when I run it with the -fno-anlyzer-state-purge,
> it is still the state of &x which is tracked.
> 
> > > In STEP-3:
> > > 
> > > No information is added regarding z in the new program_state.
> > > In fact, if one have a closer look at the log, we see that the
> > > LHS of
> > > the GIMPLE statement (the ssa_name instance of z), is already in
> > > the
> > > state of 'marked_data'.
> > > Through the log to the call to '/sm_context::on_transition/' this
> > > can
> > > be
> > > seen:
> > >       marked_sm: state transition of ‘z_5’: marked_data ->
> > > marked_data
> > > 
> > > All of this step is somehow leading to confusing diagnostic.
> > > For example, you miss the fact that 'z' is being marked, because
> > > no
> > > event happen as it is somehow aliasing 'x' svalue.
> > > Though it might not be true in case of missed optimization.
> > > 
> > > Of course, if anyone wants to have a look at my code, I can
> > > provide
> > > it
> > > to you (it is not yet publicly available at the moment).
> > I think I'd have to look at the code to be of more help; I confess
> > that
> > I stopped understanding somewhere around step 3, sorry.
> No worries, the idea is that regarding the state_map, z_3 is already 
> considered as marked.
> > Are you able to post a simple example of the code you'd like to
> > analyze, with an idea of the behavior you'd like from the analyzer?
> > (e.g. the state transitions)
> 
> Sure here is an exemple:
> 
> '''
> int __attribute__("__some_attribute__") x = 42;
> // Transition state for x: start -> marked
> int *y = &x;
> // Transition state for y: start -> points-to_marked
> int z = *y;
> // Transition state for z: start -> marked
> if (z);
> // Diagnostic is issued because of the use of a marked data in a 
> condition, for analysis purpose.
> '''
> 
> The issued diagnostic I do have in mind looks like:
> 
> '''
> 1 | int __attribute__("__some_attribute__") x = 42;
>     | ^
>     | |
>     | (1) ‘x’ gets marked here
> 2 | int *y = &x;
>     |       ~
>     |        |
>     |      (2) ‘y’ points-to marked data here
> 3 | int z = *y;
>     |     ~
>     |      |
>     |    (3) ‘z’ gets marked here
> 4 | if (z);
>     |     ~
>     |      |
>     |    (4) used of a marked value 'z' in a condition
> '''
> 
> What I get from now on is this:
> 
> '''
> 1 |     int __attribute__((__taint__)) x = 42;
>     |                                                 ^
>     |                                                  |
>     |                                                (1) ‘x’ gets 
> tainted here
>     |......
> 4 |     if(z) return 666;
>     |        ~
>     |         |
>     |       (2) use of tainted value ‘z’ in a condition
> '''

If I'm reading things right, it looks like the analyzer is correctly
complaining that the marked svalue is being used in a condition, but
the problem is that the events in the diagnostic_path aren't very good:
it's missing events (2) and (3) from the "what you had in mind"
diagnostic.

If I'm right, then the issue here seems to be more about how we
generate the diagnostic_path (the list of event messages displayed for
the diagnostic) when we emit the saved_diagnostic, rather than about
the state-tracking.

Unfortunately, when I wrote that code I was primarily looking at state
machines for tracking allocation state (e.g. double-free), and it
doesn't work well with other kinds of state machine.  I've experimented
with other approaches that might work better.  I think ultimately we
want to be able to walk backwards from a saved_diagnostic and be able
to answer the question "why did it get into this state?", which might
involve following assignments backwards.  I have a semi-working
prototype of this, but it would be a lot of work to get ready for trunk
(it tries to record a set of reversible "change" objects, so that we
can walk backwards from the diagnostic, undoing changes to see where a
value came from).

> 
> And for some reason, if I change the analyzed code to this:
> 
> '''
> int __attribute__("__some_attribute__") x = 42;
> int *y = &x;
> x = 6; // New line
> int z = *y;
> if (z);
> '''
> 
> This is what I get:
> 
> '''
> 2 |     int __attribute__((__taint__)) x = 42;
>     |                                                 ^
>     |                                                  |
>     |                                                (1) ‘x’ gets
> marked 
> here
> 3 |     int * y = &x;
>     |            ~
>     |             |
>     |           (2) ‘&x’ points to tainted data here
> 4 |     x = 6;
> 5 |     int z = *y;
>     |          ~
>     |           |
>     |         (3) ‘x’ gets tainted here
> 6 |     if(z) return 666;
>     |        ~
>     |         |
>     |       (4) use of tainted value ‘z’ in a condition

That event (3) looks bogus; looks like a bug in how we generate events
(the existing version of the code I was talking about above).

> '''
> 
> Sorry for the long mail here,

No worries; sorry about the bugs in the analyzer :/

> Let me know if you want to have a look at the exploded graph or at
> the 
> analyzer's log.

Does the above help?

Dave


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Static Analyzer: correlate state for different region/svalue for reporting
  2023-03-16 13:44     ` David Malcolm
@ 2023-03-21 14:36       ` Pierrick Philippe
  0 siblings, 0 replies; 5+ messages in thread
From: Pierrick Philippe @ 2023-03-21 14:36 UTC (permalink / raw)
  To: David Malcolm, gcc

[-- Attachment #1: Type: text/plain, Size: 3659 bytes --]

On 16/03/2023 14:44, David Malcolm wrote:
> On Thu, 2023-03-16 at 09:54 +0100, Pierrick Philippe wrote:
>> On 15/03/2023 17:26, David Malcolm wrote:
>>> On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote:
[stripping]
>>>> So, first question: is there any way to associate and track the
>>>> state
>>>> of
>>>> a rvalue, independently of its lvalue?
>>>>
>>>> To try to clarify the question, here's an example:
>>>>
>>>> '''
>>>> int __attribute__("__some_attribute__") x = 42;
>>>> /* STEP-1
>>>>    From now on, we consider the state of x as being marked by
>>>> some_attribute.
>>>> But in fact, in the log, we can observe that we'll have something
>>>> like
>>>> this in the new '/ana::program_state/':
>>>> {0x4b955b0: (int)42: marked_state (‘x’)} */
[stripping]
>>>> int *y = &x;
>>>> /* STEP-2
>>>> For analysis purpose, you want to know that from now on, y is
>>>> pointing
>>>> to marked data.
>>>> So you set state of the LHS of the GIMPLE statement (i.e. some
>>>> ssa_name
>>>> instance of y) accordingly, with a state named 'points-
>>>> to_marked_data'
>>>> and setting 'x' as the origin of the state (in the sense of the
>>>> argument
>>>> /origin/ from '/ana::sm_context::on_transition/'.
>>>> What we now have in the new '/ana::program_state/' is this:
>>>> {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
>>>> (int)42
>>>> (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */
>>> Yes: you've set the state on the svalue "&x", not on "y".
>>>
>>>> int z = *y;
>>>> /* STEP-3
>>>> Now you have implicit copy of marked data, and you want to report
>>>> it.
>>>> So you state the LHS of the GIMPLE statement (i.e. some ssa_name
>>>> instance of z) as being marked, with 'y' as the origin.
>>>> What we now have in the new '/ana::program_state/' is this:
>>>> {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
>>>> (int)42
>>>> (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */
>>>> '''
>>> Presumably the program_state also shows that you have a binding for
>>> the
>>> region "z", containing the svalue 42 (this is represented within
>>> the
>>> "store", within the region_model within the program_state).

[stripping]

This is an update about tracking state of svalues instead of region for 
other kind of variables than pointers.

If you consider the following code:

'''
int __attribute__((__taint__)) x = 42;
/* Program state:
{0x4b955b0: (int)42: marked_state (‘x’)}
*/
int y = 42;
// Program state unchanged
if (y);
/* When querying the sm_context about the state of y,
it returns it as being in a "marked_state", because its svalue is the 
same as x's one.
Even though no call to change y's state has been made.
And here it triggers a diagnostic for my analysis.
*/
'''

I understand way better now the internals of the analyzer regarding the 
state's tracking.
I do completely understand now, why you've said you've mainly designed 
it for pointers, because this allow you to avoid to do some points-to 
analysis, by associating state with pointer's svalues instead of 
pointer's region.

But as you can see in the above code example, it has its drawback for 
analyzing variables with a different semantics, such as integer type 
variable.

I will have to modified the analyzer's code to add a way for state 
machine to ask the analyzer to track region's state instead of svalue's 
state to be able to keep using it with my analysis plugin.

Do you think it would be interesting having such features merged within 
the analyzer?
In any case, I'll start to work on it over the last /trunk/ branch, 
within an appropriate branch.

Thank you for your time,

Pierrick

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-03-21 14:36 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-15 15:24 Static Analyzer: correlate state for different region/svalue for reporting Pierrick Philippe
2023-03-15 16:26 ` David Malcolm
2023-03-16  8:54   ` Pierrick Philippe
2023-03-16 13:44     ` David Malcolm
2023-03-21 14:36       ` Pierrick Philippe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).