From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=Fa0p=7N=irisa.fr=pierrick.philippe@sourceware.org>
Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104])
	by sourceware.org (Postfix) with ESMTPS id 1AC0338582A3
	for <gcc@gcc.gnu.org>; Tue, 21 Mar 2023 08:22:01 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1AC0338582A3
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=irisa.fr
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=irisa.fr
Authentication-Results: mail3-relais-sop.national.inria.fr; dkim=none (message not signed) header.i=none
X-Ironport-Dmarc-Check-Result: validskip
X-IronPort-AV: E=Sophos;i="5.98,278,1673910000"; 
   d="scan'208";a="50770260"
Received: from ptb-5cg22835fs.irisa.fr (HELO [131.254.21.198]) ([131.254.21.198])
  by mail3-relais-sop.national.inria.fr with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Mar 2023 09:22:00 +0100
Message-ID: <805abf28-3991-df57-51b5-d1e1f4f398b6@irisa.fr>
Date: Tue, 21 Mar 2023 09:21:59 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.8.0
Subject: Re: [Static Analyzer] Loop handling - False positive for malloc-sm
To: David Malcolm <dmalcolm@redhat.com>, gcc@gcc.gnu.org
References: <34efc6e0-5bd8-879c-0288-154ba28f5f05@irisa.fr>
 <3b77234afb96947c9694d375b43b3096cbd45467.camel@redhat.com>
From: Pierrick Philippe <pierrick.philippe@irisa.fr>
Content-Language: en-US
In-Reply-To: <3b77234afb96947c9694d375b43b3096cbd45467.camel@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-0.4 required=5.0 tests=BAYES_00,BODY_8BITS,KAM_DMARC_STATUS,NICE_REPLY_A,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc.gcc.gnu.org>

On 21/03/2023 00:30, David Malcolm wrote:
> On Mon, 2023-03-20 at 13:28 +0100, Pierrick Philippe wrote:
>> Hi everyone,
>>
>> I'm still playing around with the analyzer, and wanted to have a look at
>> loop handling.
>> I'm using a build from /trunk/ branch (/20230309/).
>>
>> Here is my analyzed code:
>>
>> '''
>> #include <stdlib.h>
>> int main(void) {
>>    void * ptr = malloc(sizeof(int));
>>    for (int i = 0; i < 10; i++) {
>>        if (i == 5) free(ptr);
>>    }
>> }
>> '''
[stripping]
>> So, I'm guessing that this false positive is due to how the analyzer is
>> handling loops.
>> Which leads to my question: how are loops handled by the analyzer?
> Sadly, the answer is currently "not very well" :/
>
> I implemented my own approach, with a "widening_svalue" subclass of
> symbolic value.  This is widening in the Abstract Interpretation sense,
> (as opposed to the bitwise operations sense): if I see multiple values
> on successive iterations, the widening_svalue tries to simulate that we
> know the start value and the direction the variable is moving in.
>
> This doesn't work well; arguably I should rewrite it, perhaps with an
> iterator_svalue, though I'm not sure how it ought to work.  Some ideas:
>
> * reuse gcc's existing SSA-based loop analysis, which I believe can
> identify SSA names that are iterator variables, figure out their
> bounds, and their per-iteration increments, etc.
>
> * rework the program_point or supergraph code to have a notion of "1st
> iteration of loop", "2nd iteration of loop", "subsequent iterations",
> or similar, so that the analyzer can explore those cases differently
> (on the assumption that such iterations hopefully catch the most
> interesting bugs)

I see. I don't know if you have ever considered letting state machines
deal with loops on their own, for example through an API for registering
a loop-handling callback, without making it mandatory, or through a set
of optional APIs that a state machine can implement and the analyzer
calls.

That would let a state machine analyze loops according to the semantics
of its own analysis.

State machines could then try to find a fixed point of the loop
execution, that is, a point after which further iterations no longer
change the program state for that state machine. Kind of like a custom
loop invariant.
Depending on the analysis goal of the state machine, you might only
need to symbolically execute the loop a few times before the state at
loop entry is the same as the state at the end of the loop body.

In fact, this could be driven directly by the analyzer, which would call
the loop-handling APIs only for the state machines that have not yet
reached such a fixed point in their program state for the analyzed loop,
with a maximum number of iterations set by the analyzer to bound
execution time.

Does what I'm saying make sense?

In terms of implementation, loop detection can be done by looking for
strongly connected components (SCCs) with more than one node in a
function's graph.
I don't know whether this is already how the analyzer does it?

Thank you for your time,

Pierrick