* Static Analyzer: correlate state for different region/svalue for reporting @ 2023-03-15 15:24 Pierrick Philippe 2023-03-15 16:26 ` David Malcolm 0 siblings, 1 reply; 5+ messages in thread From: Pierrick Philippe @ 2023-03-15 15:24 UTC (permalink / raw) To: gcc [-- Attachment #1: Type: text/plain, Size: 3150 bytes --] Hi everyone, I have some question regarding the analyzer. I'm currently working on an fully out-of-tree static analyzer plugin. I started development on commit tagged as /basepoints/gcc-13/, but recently moved my code to a more recent /trunk/ branch (build /20230309/). From my different experiments and analyzer's log readings, I have the intuition that '/ana::state_machine::state/' class is mostly (if not only) linked with '/ana::svalue/', i.e. the lvalue of a memory region. In my analysis case, I would like to also be able to track state for some rvalue of a memory region, i.e. '/ana::region/' objects. So, first question: is there any way to associate and track the state of a rvalue, independently of its lvalue? To try to clarify the question, here's an example: ''' int __attribute__("__some_attribute__") x = 42; /* STEP-1 From now on, we consider the state of x as being marked by some_attribute. But in fact, in the log, we can observe that we'll have something like this in the new '/ana::program_state/': {0x4b955b0: (int)42: marked_state (‘x’)} */ int *y = &x; /* STEP-2 For analysis purpose, you want to know that from now on, y is pointing to marked data. So you set state of the LHS of the GIMPLE statement (i.e. some ssa_name instance of y) accordingly, with a state named 'points-to_marked_data' and setting 'x' as the origin of the state (in the sense of the argument /origin/ from '/ana::sm_context::on_transition/'. What we now have in the new '/ana::program_state/' is this: {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: (int)42 (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ int z = *y; /* STEP-3 Now you have implicit copy of marked data, and you want to report it. So you state the LHS of the GIMPLE statement (i.e. some ssa_name instance of z) as being marked, with 'y' as the origin. What we now have in the new '/ana::program_state/' is this: {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: (int)42 (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ ''' In STEP-2: We lost the information saying that the rvalue of y (i.e. y), is pointing to marked data. Only remain the information that its lvalue (i.e. &x), is pointing to marked data, which is of course true. In STEP-3: No information is added regarding z in the new program_state. In fact, if one have a closer look at the log, we see that the LHS of the GIMPLE statement (the ssa_name instance of z), is already in the state of 'marked_data'. Through the log to the call to '/sm_context::on_transition/' this can be seen: marked_sm: state transition of ‘z_5’: marked_data -> marked_data All of this step is somehow leading to confusing diagnostic. For example, you miss the fact that 'z' is being marked, because no event happen as it is somehow aliasing 'x' svalue. Though it might not be true in case of missed optimization. Of course, if anyone wants to have a look at my code, I can provide it to you (it is not yet publicly available at the moment). Also, let me know if there is any imprecise information. Thank you for your time, Pierrick ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Static Analyzer: correlate state for different region/svalue for reporting 2023-03-15 15:24 Static Analyzer: correlate state for different region/svalue for reporting Pierrick Philippe @ 2023-03-15 16:26 ` David Malcolm 2023-03-16 8:54 ` Pierrick Philippe 0 siblings, 1 reply; 5+ messages in thread From: David Malcolm @ 2023-03-15 16:26 UTC (permalink / raw) To: Pierrick Philippe, gcc On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote: > Hi everyone, Hi Pierrick > > I have some question regarding the analyzer. > I'm currently working on an fully out-of-tree static analyzer plugin. > I started development on commit tagged as /basepoints/gcc-13/, but > recently moved my code to a more recent /trunk/ branch (build > /20230309/). > > From my different experiments and analyzer's log readings, I have > the > intuition that '/ana::state_machine::state/' class is mostly (if not > only) linked with '/ana::svalue/', Yes: a program_state (which describes the state at a particular node along an execution path) contains (among other things) an sm_state_map per state_machine, and the sm_state_map maps from svalues to state_machine::states. > i.e. the lvalue of a memory region. > In my analysis case, I would like to also be able to track state for > some rvalue of a memory region, i.e. '/ana::region/' objects. You're using the terms "lvalue" and "rvalue", but if I'm reading your email right, you're using them in exactly the opposite way that I understand them. I think of an "lvalue" as something that can be on the left-hand side of an assignment: something that can be written to (an ana::region), whereas an "rvalue" is something that can be on the right-hand side of an assignment: a pattern of bits (an ana:svalue) that be written into an lvalue. An ana::svalue is a pattern of bits (possibly symbolic, such as "the constant 42" or "the initial value of param x") An ana::region is a description of a memory location for reads/writes (possibly symbolic, such as "local var x within this frame", or "the block reached by dereferencing the initial value of ptr in the frame above"). Sorry if I've misread things; I'll try to answer the rest of the email as best as I can... > So, first question: is there any way to associate and track the state > of > a rvalue, independently of its lvalue? > > To try to clarify the question, here's an example: > > ''' > int __attribute__("__some_attribute__") x = 42; > /* STEP-1 > From now on, we consider the state of x as being marked by > some_attribute. > But in fact, in the log, we can observe that we'll have something > like > this in the new '/ana::program_state/': > {0x4b955b0: (int)42: marked_state (‘x’)} */ Yes: sm-state is associated within a program_state with an svalue, in this case with the constant 42. There isn't yet a way to instead give sm-state to "x" itself. I suppose you could give sm-state to &x (the pointer that points to x is an instance of region_svalue, which is an svalue), but I haven't tried this approach myself. > > int *y = &x; > /* STEP-2 > For analysis purpose, you want to know that from now on, y is > pointing > to marked data. > So you set state of the LHS of the GIMPLE statement (i.e. some > ssa_name > instance of y) accordingly, with a state named 'points- > to_marked_data' > and setting 'x' as the origin of the state (in the sense of the > argument > /origin/ from '/ana::sm_context::on_transition/'. > What we now have in the new '/ana::program_state/' is this: > {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: > (int)42 > (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ Yes: you've set the state on the svalue "&x", not on "y". > > int z = *y; > /* STEP-3 > Now you have implicit copy of marked data, and you want to report it. > So you state the LHS of the GIMPLE statement (i.e. some ssa_name > instance of z) as being marked, with 'y' as the origin. > What we now have in the new '/ana::program_state/' is this: > {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: > (int)42 > (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ > ''' Presumably the program_state also shows that you have a binding for the region "z", containing the svalue 42 (this is represented within the "store", within the region_model within the program_state). > > In STEP-2: > > We lost the information saying that the rvalue of y (i.e. y), is > pointing to marked data. > Only remain the information that its lvalue (i.e. &x), is pointing to > marked data, which is of course true. Note that the analyzer by default attempts to purge state that it doesn't think will be needed anymore, so it may have purged the state of "y" if y isn't going to be read from anymore. You could try -fno- analyzer-state-purge to avoid this purging. > > In STEP-3: > > No information is added regarding z in the new program_state. > In fact, if one have a closer look at the log, we see that the LHS of > the GIMPLE statement (the ssa_name instance of z), is already in the > state of 'marked_data'. > Through the log to the call to '/sm_context::on_transition/' this can > be > seen: > marked_sm: state transition of ‘z_5’: marked_data -> marked_data > > All of this step is somehow leading to confusing diagnostic. > For example, you miss the fact that 'z' is being marked, because no > event happen as it is somehow aliasing 'x' svalue. > Though it might not be true in case of missed optimization. > > Of course, if anyone wants to have a look at my code, I can provide > it > to you (it is not yet publicly available at the moment). I think I'd have to look at the code to be of more help; I confess that I stopped understanding somewhere around step 3, sorry. Are you able to post a simple example of the code you'd like to analyze, with an idea of the behavior you'd like from the analyzer? (e.g. the state transitions) > Also, let me know if there is any imprecise information. > > Thank you for your time, Thanks; hope the above is helpful Dave ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Static Analyzer: correlate state for different region/svalue for reporting 2023-03-15 16:26 ` David Malcolm @ 2023-03-16 8:54 ` Pierrick Philippe 2023-03-16 13:44 ` David Malcolm 0 siblings, 1 reply; 5+ messages in thread From: Pierrick Philippe @ 2023-03-16 8:54 UTC (permalink / raw) To: David Malcolm, gcc On 15/03/2023 17:26, David Malcolm wrote: > On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote: >> Hi everyone, > Hi Pierrick > >> I have some question regarding the analyzer. >> I'm currently working on an fully out-of-tree static analyzer plugin. >> I started development on commit tagged as /basepoints/gcc-13/, but >> recently moved my code to a more recent /trunk/ branch (build >> /20230309/). >> >> From my different experiments and analyzer's log readings, I have >> the >> intuition that '/ana::state_machine::state/' class is mostly (if not >> only) linked with '/ana::svalue/', > Yes: a program_state (which describes the state at a particular node > along an execution path) contains (among other things) an sm_state_map > per state_machine, and the sm_state_map maps from svalues to > state_machine::states. > >> i.e. the lvalue of a memory region. >> In my analysis case, I would like to also be able to track state for >> some rvalue of a memory region, i.e. '/ana::region/' objects. > You're using the terms "lvalue" and "rvalue", but if I'm reading your > email right, you're using them in exactly the opposite way that I > understand them. I apologize, you are completely right, I swapped their usage... > > I think of an "lvalue" as something that can be on the left-hand side > of an assignment: something that can be written to (an ana::region), > whereas an "rvalue" is something that can be on the right-hand side of > an assignment: a pattern of bits (an ana:svalue) that be written into > an lvalue. > > An ana::svalue is a pattern of bits (possibly symbolic, such as "the > constant 42" or "the initial value of param x") > > An ana::region is a description of a memory location for reads/writes > (possibly symbolic, such as "local var x within this frame", or "the > block reached by dereferencing the initial value of ptr in the frame > above"). > > Sorry if I've misread things; I'll try to answer the rest of the email > as best as I can... > >> So, first question: is there any way to associate and track the state >> of >> a rvalue, independently of its lvalue? >> >> To try to clarify the question, here's an example: >> >> ''' >> int __attribute__("__some_attribute__") x = 42; >> /* STEP-1 >> From now on, we consider the state of x as being marked by >> some_attribute. >> But in fact, in the log, we can observe that we'll have something >> like >> this in the new '/ana::program_state/': >> {0x4b955b0: (int)42: marked_state (‘x’)} */ > Yes: sm-state is associated within a program_state with an svalue, in > this case with the constant 42. > > There isn't yet a way to instead give sm-state to "x" itself. I > suppose you could give sm-state to &x (the pointer that points to x is > an instance of region_svalue, which is an svalue), but I haven't tried > this approach myself. > >> int *y = &x; >> /* STEP-2 >> For analysis purpose, you want to know that from now on, y is >> pointing >> to marked data. >> So you set state of the LHS of the GIMPLE statement (i.e. some >> ssa_name >> instance of y) accordingly, with a state named 'points- >> to_marked_data' >> and setting 'x' as the origin of the state (in the sense of the >> argument >> /origin/ from '/ana::sm_context::on_transition/'. >> What we now have in the new '/ana::program_state/' is this: >> {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: >> (int)42 >> (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ > Yes: you've set the state on the svalue "&x", not on "y". > >> int z = *y; >> /* STEP-3 >> Now you have implicit copy of marked data, and you want to report it. >> So you state the LHS of the GIMPLE statement (i.e. some ssa_name >> instance of z) as being marked, with 'y' as the origin. >> What we now have in the new '/ana::program_state/' is this: >> {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: >> (int)42 >> (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ >> ''' > Presumably the program_state also shows that you have a binding for the > region "z", containing the svalue 42 (this is represented within the > "store", within the region_model within the program_state). Indeed, there is a binding for region "z" to the svalue 42 within the new program state. >> In STEP-2: >> >> We lost the information saying that the rvalue of y (i.e. y), is >> pointing to marked data. >> Only remain the information that its lvalue (i.e. &x), is pointing to >> marked data, which is of course true. > Note that the analyzer by default attempts to purge state that it > doesn't think will be needed anymore, so it may have purged the state > of "y" if y isn't going to be read from anymore. You could try -fno- > analyzer-state-purge to avoid this purging. Nothing changes when I run it with the -fno-anlyzer-state-purge, it is still the state of &x which is tracked. >> In STEP-3: >> >> No information is added regarding z in the new program_state. >> In fact, if one have a closer look at the log, we see that the LHS of >> the GIMPLE statement (the ssa_name instance of z), is already in the >> state of 'marked_data'. >> Through the log to the call to '/sm_context::on_transition/' this can >> be >> seen: >> marked_sm: state transition of ‘z_5’: marked_data -> marked_data >> >> All of this step is somehow leading to confusing diagnostic. >> For example, you miss the fact that 'z' is being marked, because no >> event happen as it is somehow aliasing 'x' svalue. >> Though it might not be true in case of missed optimization. >> >> Of course, if anyone wants to have a look at my code, I can provide >> it >> to you (it is not yet publicly available at the moment). > I think I'd have to look at the code to be of more help; I confess that > I stopped understanding somewhere around step 3, sorry. No worries, the idea is that regarding the state_map, z_3 is already considered as marked. > Are you able to post a simple example of the code you'd like to > analyze, with an idea of the behavior you'd like from the analyzer? > (e.g. the state transitions) Sure here is an exemple: ''' int __attribute__("__some_attribute__") x = 42; // Transition state for x: start -> marked int *y = &x; // Transition state for y: start -> points-to_marked int z = *y; // Transition state for z: start -> marked if (z); // Diagnostic is issued because of the use of a marked data in a condition, for analysis purpose. ''' The issued diagnostic I do have in mind looks like: ''' 1 | int __attribute__("__some_attribute__") x = 42; | ^ | | | (1) ‘x’ gets marked here 2 | int *y = &x; | ~ | | | (2) ‘y’ points-to marked data here 3 | int z = *y; | ~ | | | (3) ‘z’ gets marked here 4 | if (z); | ~ | | | (4) used of a marked value 'z' in a condition ''' What I get from now on is this: ''' 1 | int __attribute__((__taint__)) x = 42; | ^ | | | (1) ‘x’ gets tainted here |...... 4 | if(z) return 666; | ~ | | | (2) use of tainted value ‘z’ in a condition ''' And for some reason, if I change the analyzed code to this: ''' int __attribute__("__some_attribute__") x = 42; int *y = &x; x = 6; // New line int z = *y; if (z); ''' This is what I get: ''' 2 | int __attribute__((__taint__)) x = 42; | ^ | | | (1) ‘x’ gets marked here 3 | int * y = &x; | ~ | | | (2) ‘&x’ points to tainted data here 4 | x = 6; 5 | int z = *y; | ~ | | | (3) ‘x’ gets tainted here 6 | if(z) return 666; | ~ | | | (4) use of tainted value ‘z’ in a condition ''' Sorry for the long mail here, Let me know if you want to have a look at the exploded graph or at the analyzer's log. Thank you for your time, Pierrick ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Static Analyzer: correlate state for different region/svalue for reporting 2023-03-16 8:54 ` Pierrick Philippe @ 2023-03-16 13:44 ` David Malcolm 2023-03-21 14:36 ` Pierrick Philippe 0 siblings, 1 reply; 5+ messages in thread From: David Malcolm @ 2023-03-16 13:44 UTC (permalink / raw) To: Pierrick Philippe, gcc On Thu, 2023-03-16 at 09:54 +0100, Pierrick Philippe wrote: > On 15/03/2023 17:26, David Malcolm wrote: > > On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote: [...snip...] > > > > > > An ana::svalue is a pattern of bits (possibly symbolic, such as > > "the > > constant 42" or "the initial value of param x") > > > > An ana::region is a description of a memory location for > > reads/writes > > (possibly symbolic, such as "local var x within this frame", or > > "the > > block reached by dereferencing the initial value of ptr in the > > frame > > above"). > > > > Sorry if I've misread things; I'll try to answer the rest of the > > email > > as best as I can... > > > > > So, first question: is there any way to associate and track the > > > state > > > of > > > a rvalue, independently of its lvalue? > > > > > > To try to clarify the question, here's an example: > > > > > > ''' > > > int __attribute__("__some_attribute__") x = 42; > > > /* STEP-1 > > > From now on, we consider the state of x as being marked by > > > some_attribute. > > > But in fact, in the log, we can observe that we'll have something > > > like > > > this in the new '/ana::program_state/': > > > {0x4b955b0: (int)42: marked_state (‘x’)} */ > > Yes: sm-state is associated within a program_state with an svalue, > > in > > this case with the constant 42. > > > > There isn't yet a way to instead give sm-state to "x" itself. I > > suppose you could give sm-state to &x (the pointer that points to x > > is > > an instance of region_svalue, which is an svalue), but I haven't > > tried > > this approach myself. > > > > > int *y = &x; > > > /* STEP-2 > > > For analysis purpose, you want to know that from now on, y is > > > pointing > > > to marked data. > > > So you set state of the LHS of the GIMPLE statement (i.e. some > > > ssa_name > > > instance of y) accordingly, with a state named 'points- > > > to_marked_data' > > > and setting 'x' as the origin of the state (in the sense of the > > > argument > > > /origin/ from '/ana::sm_context::on_transition/'. > > > What we now have in the new '/ana::program_state/' is this: > > > {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: > > > (int)42 > > > (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ > > Yes: you've set the state on the svalue "&x", not on "y". > > > > > int z = *y; > > > /* STEP-3 > > > Now you have implicit copy of marked data, and you want to report > > > it. > > > So you state the LHS of the GIMPLE statement (i.e. some ssa_name > > > instance of z) as being marked, with 'y' as the origin. > > > What we now have in the new '/ana::program_state/' is this: > > > {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: > > > (int)42 > > > (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ > > > ''' > > Presumably the program_state also shows that you have a binding for > > the > > region "z", containing the svalue 42 (this is represented within > > the > > "store", within the region_model within the program_state). > > Indeed, there is a binding for region "z" to the svalue 42 within the > new program state. > > > > In STEP-2: > > > > > > We lost the information saying that the rvalue of y (i.e. y), is > > > pointing to marked data. > > > Only remain the information that its lvalue (i.e. &x), is > > > pointing to > > > marked data, which is of course true. > > Note that the analyzer by default attempts to purge state that it > > doesn't think will be needed anymore, so it may have purged the > > state > > of "y" if y isn't going to be read from anymore. You could try - > > fno- > > analyzer-state-purge to avoid this purging. > > Nothing changes when I run it with the -fno-anlyzer-state-purge, > it is still the state of &x which is tracked. > > > > In STEP-3: > > > > > > No information is added regarding z in the new program_state. > > > In fact, if one have a closer look at the log, we see that the > > > LHS of > > > the GIMPLE statement (the ssa_name instance of z), is already in > > > the > > > state of 'marked_data'. > > > Through the log to the call to '/sm_context::on_transition/' this > > > can > > > be > > > seen: > > > marked_sm: state transition of ‘z_5’: marked_data -> > > > marked_data > > > > > > All of this step is somehow leading to confusing diagnostic. > > > For example, you miss the fact that 'z' is being marked, because > > > no > > > event happen as it is somehow aliasing 'x' svalue. > > > Though it might not be true in case of missed optimization. > > > > > > Of course, if anyone wants to have a look at my code, I can > > > provide > > > it > > > to you (it is not yet publicly available at the moment). > > I think I'd have to look at the code to be of more help; I confess > > that > > I stopped understanding somewhere around step 3, sorry. > No worries, the idea is that regarding the state_map, z_3 is already > considered as marked. > > Are you able to post a simple example of the code you'd like to > > analyze, with an idea of the behavior you'd like from the analyzer? > > (e.g. the state transitions) > > Sure here is an exemple: > > ''' > int __attribute__("__some_attribute__") x = 42; > // Transition state for x: start -> marked > int *y = &x; > // Transition state for y: start -> points-to_marked > int z = *y; > // Transition state for z: start -> marked > if (z); > // Diagnostic is issued because of the use of a marked data in a > condition, for analysis purpose. > ''' > > The issued diagnostic I do have in mind looks like: > > ''' > 1 | int __attribute__("__some_attribute__") x = 42; > | ^ > | | > | (1) ‘x’ gets marked here > 2 | int *y = &x; > | ~ > | | > | (2) ‘y’ points-to marked data here > 3 | int z = *y; > | ~ > | | > | (3) ‘z’ gets marked here > 4 | if (z); > | ~ > | | > | (4) used of a marked value 'z' in a condition > ''' > > What I get from now on is this: > > ''' > 1 | int __attribute__((__taint__)) x = 42; > | ^ > | | > | (1) ‘x’ gets > tainted here > |...... > 4 | if(z) return 666; > | ~ > | | > | (2) use of tainted value ‘z’ in a condition > ''' If I'm reading things right, it looks like the analyzer is correctly complaining that the marked svalue is being used in a condition, but the problem is that the events in the diagnostic_path aren't very good: it's missing events (2) and (3) from the "what you had in mind" diagnostic. If I'm right, then the issue here seems to be more about how we generate the diagnostic_path (the list of event messages displayed for the diagnostic) when we emit the saved_diagnostic, rather than about the state-tracking. Unfortunately, when I wrote that code I was primarily looking at state machines for tracking allocation state (e.g. double-free), and it doesn't work well with other kinds of state machine. I've experimented with other approaches that might work better. I think ultimately we want to be able to walk backwards from a saved_diagnostic and be able to answer the question "why did it get into this state?", which might involve following assignments backwards. I have a semi-working prototype of this, but it would be a lot of work to get ready for trunk (it tries to record a set of reversible "change" objects, so that we can walk backwards from the diagnostic, undoing changes to see where a value came from). > > And for some reason, if I change the analyzed code to this: > > ''' > int __attribute__("__some_attribute__") x = 42; > int *y = &x; > x = 6; // New line > int z = *y; > if (z); > ''' > > This is what I get: > > ''' > 2 | int __attribute__((__taint__)) x = 42; > | ^ > | | > | (1) ‘x’ gets > marked > here > 3 | int * y = &x; > | ~ > | | > | (2) ‘&x’ points to tainted data here > 4 | x = 6; > 5 | int z = *y; > | ~ > | | > | (3) ‘x’ gets tainted here > 6 | if(z) return 666; > | ~ > | | > | (4) use of tainted value ‘z’ in a condition That event (3) looks bogus; looks like a bug in how we generate events (the existing version of the code I was talking about above). > ''' > > Sorry for the long mail here, No worries; sorry about the bugs in the analyzer :/ > Let me know if you want to have a look at the exploded graph or at > the > analyzer's log. Does the above help? Dave ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Static Analyzer: correlate state for different region/svalue for reporting 2023-03-16 13:44 ` David Malcolm @ 2023-03-21 14:36 ` Pierrick Philippe 0 siblings, 0 replies; 5+ messages in thread From: Pierrick Philippe @ 2023-03-21 14:36 UTC (permalink / raw) To: David Malcolm, gcc [-- Attachment #1: Type: text/plain, Size: 3659 bytes --] On 16/03/2023 14:44, David Malcolm wrote: > On Thu, 2023-03-16 at 09:54 +0100, Pierrick Philippe wrote: >> On 15/03/2023 17:26, David Malcolm wrote: >>> On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote: [stripping] >>>> So, first question: is there any way to associate and track the >>>> state >>>> of >>>> a rvalue, independently of its lvalue? >>>> >>>> To try to clarify the question, here's an example: >>>> >>>> ''' >>>> int __attribute__("__some_attribute__") x = 42; >>>> /* STEP-1 >>>> From now on, we consider the state of x as being marked by >>>> some_attribute. >>>> But in fact, in the log, we can observe that we'll have something >>>> like >>>> this in the new '/ana::program_state/': >>>> {0x4b955b0: (int)42: marked_state (‘x’)} */ [stripping] >>>> int *y = &x; >>>> /* STEP-2 >>>> For analysis purpose, you want to know that from now on, y is >>>> pointing >>>> to marked data. >>>> So you set state of the LHS of the GIMPLE statement (i.e. some >>>> ssa_name >>>> instance of y) accordingly, with a state named 'points- >>>> to_marked_data' >>>> and setting 'x' as the origin of the state (in the sense of the >>>> argument >>>> /origin/ from '/ana::sm_context::on_transition/'. >>>> What we now have in the new '/ana::program_state/' is this: >>>> {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: >>>> (int)42 >>>> (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ >>> Yes: you've set the state on the svalue "&x", not on "y". >>> >>>> int z = *y; >>>> /* STEP-3 >>>> Now you have implicit copy of marked data, and you want to report >>>> it. >>>> So you state the LHS of the GIMPLE statement (i.e. some ssa_name >>>> instance of z) as being marked, with 'y' as the origin. >>>> What we now have in the new '/ana::program_state/' is this: >>>> {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: >>>> (int)42 >>>> (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ >>>> ''' >>> Presumably the program_state also shows that you have a binding for >>> the >>> region "z", containing the svalue 42 (this is represented within >>> the >>> "store", within the region_model within the program_state). [stripping] This is an update about tracking state of svalues instead of region for other kind of variables than pointers. If you consider the following code: ''' int __attribute__((__taint__)) x = 42; /* Program state: {0x4b955b0: (int)42: marked_state (‘x’)} */ int y = 42; // Program state unchanged if (y); /* When querying the sm_context about the state of y, it returns it as being in a "marked_state", because its svalue is the same as x's one. Even though no call to change y's state has been made. And here it triggers a diagnostic for my analysis. */ ''' I understand way better now the internals of the analyzer regarding the state's tracking. I do completely understand now, why you've said you've mainly designed it for pointers, because this allow you to avoid to do some points-to analysis, by associating state with pointer's svalues instead of pointer's region. But as you can see in the above code example, it has its drawback for analyzing variables with a different semantics, such as integer type variable. I will have to modified the analyzer's code to add a way for state machine to ask the analyzer to track region's state instead of svalue's state to be able to keep using it with my analysis plugin. Do you think it would be interesting having such features merged within the analyzer? In any case, I'll start to work on it over the last /trunk/ branch, within an appropriate branch. Thank you for your time, Pierrick ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-03-21 14:36 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-03-15 15:24 Static Analyzer: correlate state for different region/svalue for reporting Pierrick Philippe 2023-03-15 16:26 ` David Malcolm 2023-03-16 8:54 ` Pierrick Philippe 2023-03-16 13:44 ` David Malcolm 2023-03-21 14:36 ` Pierrick Philippe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).