From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sourceware.org (Postfix) with ESMTPS id 8BBE23858D38 for ; Thu, 16 Mar 2023 08:54:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 8BBE23858D38 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=irisa.fr Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=irisa.fr Authentication-Results: mail3-relais-sop.national.inria.fr; dkim=none (message not signed) header.i=none X-Ironport-Dmarc-Check-Result: validskip X-IronPort-AV: E=Sophos;i="5.98,265,1673910000"; d="scan'208";a="50343614" Received: from ptb-5cg22835fs.irisa.fr (HELO [131.254.21.198]) ([131.254.21.198]) by mail3-relais-sop.national.inria.fr with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Mar 2023 09:54:10 +0100 Message-ID: Date: Thu, 16 Mar 2023 09:54:09 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.8.0 Subject: Re: Static Analyzer: correlate state for different region/svalue for reporting To: David Malcolm , gcc@gcc.gnu.org References: <98024bfba94a9dc306475fd2753bdcb55961dada.camel@redhat.com> Content-Language: fr, en-US From: Pierrick Philippe In-Reply-To: <98024bfba94a9dc306475fd2753bdcb55961dada.camel@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.5 required=5.0 tests=BAYES_00,BODY_8BITS,KAM_DMARC_STATUS,NICE_REPLY_A,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 15/03/2023 17:26, David Malcolm wrote: > On Wed, 2023-03-15 at 16:24 +0100, Pierrick Philippe wrote: >> Hi everyone, > Hi Pierrick > >> I have some question regarding the analyzer. >> I'm currently working on an fully out-of-tree static analyzer plugin. >> I started development on commit tagged as /basepoints/gcc-13/, but >> recently moved my code to a more recent /trunk/ branch (build >> /20230309/). >> >>  From my different experiments and analyzer's log readings, I have >> the >> intuition that '/ana::state_machine::state/' class is mostly (if not >> only) linked with '/ana::svalue/', > Yes: a program_state (which describes the state at a particular node > along an execution path) contains (among other things) an sm_state_map > per state_machine, and the sm_state_map maps from svalues to > state_machine::states. > >> i.e. the lvalue of a memory region. >> In my analysis case, I would like to also be able to track state for >> some rvalue of a memory region, i.e. '/ana::region/' objects. > You're using the terms "lvalue" and "rvalue", but if I'm reading your > email right, you're using them in exactly the opposite way that I > understand them. I apologize, you are completely right, I swapped their usage... > > I think of an "lvalue" as something that can be on the left-hand side > of an assignment: something that can be written to (an ana::region), > whereas an "rvalue" is something that can be on the right-hand side of > an assignment: a pattern of bits (an ana:svalue) that be written into > an lvalue. > > An ana::svalue is a pattern of bits (possibly symbolic, such as "the > constant 42" or "the initial value of param x") > > An ana::region is a description of a memory location for reads/writes > (possibly symbolic, such as "local var x within this frame", or "the > block reached by dereferencing the initial value of ptr in the frame > above"). > > Sorry if I've misread things; I'll try to answer the rest of the email > as best as I can... > >> So, first question: is there any way to associate and track the state >> of >> a rvalue, independently of its lvalue? >> >> To try to clarify the question, here's an example: >> >> ''' >> int __attribute__("__some_attribute__") x = 42; >> /* STEP-1 >>  From now on, we consider the state of x as being marked by >> some_attribute. >> But in fact, in the log, we can observe that we'll have something >> like >> this in the new '/ana::program_state/': >> {0x4b955b0: (int)42: marked_state (‘x’)} */ > Yes: sm-state is associated within a program_state with an svalue, in > this case with the constant 42. > > There isn't yet a way to instead give sm-state to "x" itself. I > suppose you could give sm-state to &x (the pointer that points to x is > an instance of region_svalue, which is an svalue), but I haven't tried > this approach myself. > >> int *y = &x; >> /* STEP-2 >> For analysis purpose, you want to know that from now on, y is >> pointing >> to marked data. >> So you set state of the LHS of the GIMPLE statement (i.e. some >> ssa_name >> instance of y) accordingly, with a state named 'points- >> to_marked_data' >> and setting 'x' as the origin of the state (in the sense of the >> argument >> /origin/ from '/ana::sm_context::on_transition/'. >> What we now have in the new '/ana::program_state/' is this: >> {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: >> (int)42 >> (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ > Yes: you've set the state on the svalue "&x", not on "y". > >> int z = *y; >> /* STEP-3 >> Now you have implicit copy of marked data, and you want to report it. >> So you state the LHS of the GIMPLE statement (i.e. some ssa_name >> instance of z) as being marked, with 'y' as the origin. >> What we now have in the new '/ana::program_state/' is this: >> {0x4b9acb0: &x: points-to-marked_data (‘&x’) (origin: 0x4b955b0: >> (int)42 >> (‘x’)), 0x4b955b0: (int)42: marked_state (‘x’)} */ >> ''' > Presumably the program_state also shows that you have a binding for the > region "z", containing the svalue 42 (this is represented within the > "store", within the region_model within the program_state). Indeed, there is a binding for region "z" to the svalue 42 within the new program state. >> In STEP-2: >> >> We lost the information saying that the rvalue of y (i.e. y), is >> pointing to marked data. >> Only remain the information that its lvalue (i.e. &x), is pointing to >> marked data, which is of course true. > Note that the analyzer by default attempts to purge state that it > doesn't think will be needed anymore, so it may have purged the state > of "y" if y isn't going to be read from anymore. You could try -fno- > analyzer-state-purge to avoid this purging. Nothing changes when I run it with the -fno-anlyzer-state-purge, it is still the state of &x which is tracked. >> In STEP-3: >> >> No information is added regarding z in the new program_state. >> In fact, if one have a closer look at the log, we see that the LHS of >> the GIMPLE statement (the ssa_name instance of z), is already in the >> state of 'marked_data'. >> Through the log to the call to '/sm_context::on_transition/' this can >> be >> seen: >>      marked_sm: state transition of ‘z_5’: marked_data -> marked_data >> >> All of this step is somehow leading to confusing diagnostic. >> For example, you miss the fact that 'z' is being marked, because no >> event happen as it is somehow aliasing 'x' svalue. >> Though it might not be true in case of missed optimization. >> >> Of course, if anyone wants to have a look at my code, I can provide >> it >> to you (it is not yet publicly available at the moment). > I think I'd have to look at the code to be of more help; I confess that > I stopped understanding somewhere around step 3, sorry. No worries, the idea is that regarding the state_map, z_3 is already considered as marked. > Are you able to post a simple example of the code you'd like to > analyze, with an idea of the behavior you'd like from the analyzer? > (e.g. the state transitions) Sure here is an exemple: ''' int __attribute__("__some_attribute__") x = 42; // Transition state for x: start -> marked int *y = &x; // Transition state for y: start -> points-to_marked int z = *y; // Transition state for z: start -> marked if (z); // Diagnostic is issued because of the use of a marked data in a condition, for analysis purpose. ''' The issued diagnostic I do have in mind looks like: ''' 1 | int __attribute__("__some_attribute__") x = 42;    | ^    | |    | (1) ‘x’ gets marked here 2 | int *y = &x;    |       ~    |        |    |      (2) ‘y’ points-to marked data here 3 | int z = *y;    |     ~    |      |    |    (3) ‘z’ gets marked here 4 | if (z);    |     ~    |      |    |    (4) used of a marked value 'z' in a condition ''' What I get from now on is this: ''' 1 |     int __attribute__((__taint__)) x = 42;    |                                                 ^    |                                                  |    |                                                (1) ‘x’ gets tainted here    |...... 4 |     if(z) return 666;    |        ~    |         |    |       (2) use of tainted value ‘z’ in a condition ''' And for some reason, if I change the analyzed code to this: ''' int __attribute__("__some_attribute__") x = 42; int *y = &x; x = 6; // New line int z = *y; if (z); ''' This is what I get: ''' 2 |     int __attribute__((__taint__)) x = 42;    |                                                 ^    |                                                  |    |                                                (1) ‘x’ gets marked here 3 |     int * y = &x;    |            ~    |             |    |           (2) ‘&x’ points to tainted data here 4 |     x = 6; 5 |     int z = *y;    |          ~    |           |    |         (3) ‘x’ gets tainted here 6 |     if(z) return 666;    |        ~    |         |    |       (4) use of tainted value ‘z’ in a condition ''' Sorry for the long mail here, Let me know if you want to have a look at the exploded graph or at the analyzer's log. Thank you for your time, Pierrick