From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=JWAA=7P=irisa.fr=pierrick.philippe@sourceware.org>
Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83])
	by sourceware.org (Postfix) with ESMTPS id 6B867385B516
	for <gcc@gcc.gnu.org>; Thu, 23 Mar 2023 08:06:36 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6B867385B516
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=irisa.fr
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=irisa.fr
Authentication-Results: mail2-relais-roc.national.inria.fr; dkim=none (message not signed) header.i=none
X-Ironport-Dmarc-Check-Result: validskip
X-IronPort-AV: E=Sophos;i="5.98,283,1673910000"; 
   d="scan'208";a="98661362"
Received: from ptb-5cg22835fs.irisa.fr (HELO [131.254.21.198]) ([131.254.21.198])
  by mail2-relais-roc.national.inria.fr with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Mar 2023 09:06:33 +0100
Message-ID: <0dc78fd3-d06f-61bb-993a-e2b1f587dac4@irisa.fr>
Date: Thu, 23 Mar 2023 09:06:33 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.8.0
Subject: Re: [Static Analyzer] Loop handling - False positive for malloc-sm
To: David Malcolm <dmalcolm@redhat.com>, gcc@gcc.gnu.org
References: <34efc6e0-5bd8-879c-0288-154ba28f5f05@irisa.fr>
 <3b77234afb96947c9694d375b43b3096cbd45467.camel@redhat.com>
 <805abf28-3991-df57-51b5-d1e1f4f398b6@irisa.fr>
 <b285f0a74eb8417e0a541002fb207bbcd1b49a74.camel@redhat.com>
From: Pierrick Philippe <pierrick.philippe@irisa.fr>
Content-Language: en-US
In-Reply-To: <b285f0a74eb8417e0a541002fb207bbcd1b49a74.camel@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,KAM_DMARC_STATUS,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc.gcc.gnu.org>

On 22/03/2023 19:19, David Malcolm wrote:
> On Tue, 2023-03-21 at 09:21 +0100, Pierrick Philippe wrote:
[stripping]
>> In fact, this could be done directly by the analyzer, and only
>> calling
>> state machine APIs for loop handling which still has not reached
>> such a fixed point in their program state for the analyzed loop, with
>> a
>> maximum number of execution fixed by the analyzer to limit execution
>> time.
>>
>> Does what I'm saying make sense?
> I think so, though I'm not sure how it would work in practice.
> Consider e.g.
>
>    for (int i = 0; i < n; i++)
>       head = prepend_node (head, i);
>
> which builds a chain of N dynamically-allocated nodes in a linked list.

Well, that would be a case where the loop's analysis will depend of the 
state machine.
If we consider the malloc-sm, it would allow it to track as different 
pointers each allocated pointers, until the limit of symbolic execution 
imposed by the analyzer is reached, are the svalue of N if it is a known 
integer at the current analysis point.

For other, such as a the file-sm, it would only be needed to 
symbolically execute it once, assuming prepend_node() is not opening any 
files.
So this state machine would not have to be executed more than once on 
the loop at this program point by the analyzer.

I think the different steps for such a different cases of loop analysis, 
is somehow using the second point of the RFE you shared above.

The "algorithm" I came with when thinking about it looks like this.
Of course, I'm definitely not an expert on the analyzer, so it is 
possibly not feasible.

* Detect loop, and try to get the termination constraint (possibly 
reduced if possible).
* Iterate on the loops' node N:
     * If N is the loop's first node:
         * Check if the actual program state is in a sufficient state to 
satisfy the loop's termination constraint,
             If so, stop analyzing the loop
         * Otherwise, check if the maximum number of symbolic execution 
fixed by the analyzer is reached,
             If so, stop analyzing the loop
         * Otherwise, keep going
     * Call every sm still impacting their program state map on node N

This should work for loops iterating on integer, for other kind of 
loops, it might be trickier though.

>> In terms of implementation, loop detection can be done by looking for
>> strongly connected components (SCCs)
>> in a function graph having more than one node.
>> I don't know if this is how it is already done within the analyzer or
>> not?
> It isn't yet done in the analyzer, but as noted above there is code in
> GCC that already does that (in cfgloop.{h,cc}).

I definitely have to look at this files.

Thank you for your time,

Pierrick