From: Pierrick Philippe <pierrick.philippe@irisa.fr>
To: David Malcolm <dmalcolm@redhat.com>, gcc@gcc.gnu.org
Subject: Re: Static Analyzer: correlate state for different region/svalue for reporting
Date: Thu, 16 Mar 2023 09:54:09 +0100 [thread overview]
Message-ID: <ff85e786-4401-5735-9f4d-8b1f7bd3601a@irisa.fr> (raw)
In-Reply-To: <98024bfba94a9dc306475fd2753bdcb55961dada.camel@redhat.com>
On 15/03/2023 17:26, David Malcolm wrote:
> On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote:
>> Hi everyone,
> Hi Pierrick
>
>> I have some question regarding the analyzer.
>> I'm currently working on an fully out-of-tree static analyzer plugin.
>> I started development on commit tagged as /basepoints/gcc-13/, but
>> recently moved my code to a more recent /trunk/ branch (build
>> /20230309/).
>>
>> From my different experiments and analyzer's log readings, I have
>> the
>> intuition that '/ana::state_machine::state/' class is mostly (if not
>> only) linked with '/ana::svalue/',
> Yes: a program_state (which describes the state at a particular node
> along an execution path) contains (among other things) an sm_state_map
> per state_machine, and the sm_state_map maps from svalues to
> state_machine::states.
>
>> i.e. the lvalue of a memory region.
>> In my analysis case, I would like to also be able to track state for
>> some rvalue of a memory region, i.e. '/ana::region/' objects.
> You're using the terms "lvalue" and "rvalue", but if I'm reading your
> email right, you're using them in exactly the opposite way that I
> understand them.
I apologize, you are completely right, I swapped their usage...
>
> I think of an "lvalue" as something that can be on the left-hand side
> of an assignment: something that can be written to (an ana::region),
> whereas an "rvalue" is something that can be on the right-hand side of
> an assignment: a pattern of bits (an ana:svalue) that be written into
> an lvalue.
>
> An ana::svalue is a pattern of bits (possibly symbolic, such as "the
> constant 42" or "the initial value of param x")
>
> An ana::region is a description of a memory location for reads/writes
> (possibly symbolic, such as "local var x within this frame", or "the
> block reached by dereferencing the initial value of ptr in the frame
> above").
>
> Sorry if I've misread things; I'll try to answer the rest of the email
> as best as I can...
>
>> So, first question: is there any way to associate and track the state
>> of
>> a rvalue, independently of its lvalue?
>>
>> To try to clarify the question, here's an example:
>>
>> '''
>> int __attribute__("__some_attribute__") x = 42;
>> /* STEP-1
>> From now on, we consider the state of x as being marked by
>> some_attribute.
>> But in fact, in the log, we can observe that we'll have something
>> like
>> this in the new '/ana::program_state/':
>> {0x4b955b0: (int)42: marked_state (‘x’)} */
> Yes: sm-state is associated within a program_state with an svalue, in
> this case with the constant 42.
>
> There isn't yet a way to instead give sm-state to "x" itself. I
> suppose you could give sm-state to &x (the pointer that points to x is
> an instance of region_svalue, which is an svalue), but I haven't tried
> this approach myself.
>
>> int *y = &x;
>> /* STEP-2
>> For analysis purpose, you want to know that from now on, y is
>> pointing
>> to marked data.
>> So you set state of the LHS of the GIMPLE statement (i.e. some
>> ssa_name
>> instance of y) accordingly, with a state named 'points-
>> to_marked_data'
>> and setting 'x' as the origin of the state (in the sense of the
>> argument
>> /origin/ from '/ana::sm_context::on_transition/'.
>> What we now have in the new '/ana::program_state/' is this:
>> {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
>> (int)42
>> (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */
> Yes: you've set the state on the svalue "&x", not on "y".
>
>> int z = *y;
>> /* STEP-3
>> Now you have implicit copy of marked data, and you want to report it.
>> So you state the LHS of the GIMPLE statement (i.e. some ssa_name
>> instance of z) as being marked, with 'y' as the origin.
>> What we now have in the new '/ana::program_state/' is this:
>> {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0:
>> (int)42
>> (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */
>> '''
> Presumably the program_state also shows that you have a binding for the
> region "z", containing the svalue 42 (this is represented within the
> "store", within the region_model within the program_state).
Indeed, there is a binding for region "z" to the svalue 42 within the
new program state.
>> In STEP-2:
>>
>> We lost the information saying that the rvalue of y (i.e. y), is
>> pointing to marked data.
>> Only remain the information that its lvalue (i.e. &x), is pointing to
>> marked data, which is of course true.
> Note that the analyzer by default attempts to purge state that it
> doesn't think will be needed anymore, so it may have purged the state
> of "y" if y isn't going to be read from anymore. You could try -fno-
> analyzer-state-purge to avoid this purging.
Nothing changes when I run it with the -fno-anlyzer-state-purge,
it is still the state of &x which is tracked.
>> In STEP-3:
>>
>> No information is added regarding z in the new program_state.
>> In fact, if one have a closer look at the log, we see that the LHS of
>> the GIMPLE statement (the ssa_name instance of z), is already in the
>> state of 'marked_data'.
>> Through the log to the call to '/sm_context::on_transition/' this can
>> be
>> seen:
>> marked_sm: state transition of ‘z_5’: marked_data -> marked_data
>>
>> All of this step is somehow leading to confusing diagnostic.
>> For example, you miss the fact that 'z' is being marked, because no
>> event happen as it is somehow aliasing 'x' svalue.
>> Though it might not be true in case of missed optimization.
>>
>> Of course, if anyone wants to have a look at my code, I can provide
>> it
>> to you (it is not yet publicly available at the moment).
> I think I'd have to look at the code to be of more help; I confess that
> I stopped understanding somewhere around step 3, sorry.
No worries, the idea is that regarding the state_map, z_3 is already
considered as marked.
> Are you able to post a simple example of the code you'd like to
> analyze, with an idea of the behavior you'd like from the analyzer?
> (e.g. the state transitions)
Sure here is an exemple:
'''
int __attribute__("__some_attribute__") x = 42;
// Transition state for x: start -> marked
int *y = &x;
// Transition state for y: start -> points-to_marked
int z = *y;
// Transition state for z: start -> marked
if (z);
// Diagnostic is issued because of the use of a marked data in a
condition, for analysis purpose.
'''
The issued diagnostic I do have in mind looks like:
'''
1 | int __attribute__("__some_attribute__") x = 42;
| ^
| |
| (1) ‘x’ gets marked here
2 | int *y = &x;
| ~
| |
| (2) ‘y’ points-to marked data here
3 | int z = *y;
| ~
| |
| (3) ‘z’ gets marked here
4 | if (z);
| ~
| |
| (4) used of a marked value 'z' in a condition
'''
What I get from now on is this:
'''
1 | int __attribute__((__taint__)) x = 42;
| ^
| |
| (1) ‘x’ gets
tainted here
|......
4 | if(z) return 666;
| ~
| |
| (2) use of tainted value ‘z’ in a condition
'''
And for some reason, if I change the analyzed code to this:
'''
int __attribute__("__some_attribute__") x = 42;
int *y = &x;
x = 6; // New line
int z = *y;
if (z);
'''
This is what I get:
'''
2 | int __attribute__((__taint__)) x = 42;
| ^
| |
| (1) ‘x’ gets marked
here
3 | int * y = &x;
| ~
| |
| (2) ‘&x’ points to tainted data here
4 | x = 6;
5 | int z = *y;
| ~
| |
| (3) ‘x’ gets tainted here
6 | if(z) return 666;
| ~
| |
| (4) use of tainted value ‘z’ in a condition
'''
Sorry for the long mail here,
Let me know if you want to have a look at the exploded graph or at the
analyzer's log.
Thank you for your time,
Pierrick
next prev parent reply other threads:[~2023-03-16 8:54 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-15 15:24 Pierrick Philippe
2023-03-15 16:26 ` David Malcolm
2023-03-16 8:54 ` Pierrick Philippe [this message]
2023-03-16 13:44 ` David Malcolm
2023-03-21 14:36 ` Pierrick Philippe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ff85e786-4401-5735-9f4d-8b1f7bd3601a@irisa.fr \
--to=pierrick.philippe@irisa.fr \
--cc=dmalcolm@redhat.com \
--cc=gcc@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).