From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id DC8EA3858433 for ; Wed, 22 Mar 2023 18:19:45 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org DC8EA3858433 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1679509185; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5T7KQILcNVbzB+bMKrdOp0Av5TCDOe8zSs6mReBfV5o=; b=BwBjhpbUQns5FEcflYw/s+SHpYnmfqeBBp+KpVh315intt5NQm/gXDod/Sym4B7xw1YuAv KesXjNq2dqbv1dFnYemTop9oigMORNE5cAJqB/W/JCPTecOW6i5idAin81fV3fFWjscIo3 JT5NxzyKV3jPPy/wW7uR1KXlkDTd4dY= Received: from mail-qk1-f199.google.com (mail-qk1-f199.google.com [209.85.222.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-477-URwKtZyNNPag06VJCK8j9g-1; Wed, 22 Mar 2023 14:19:44 -0400 X-MC-Unique: URwKtZyNNPag06VJCK8j9g-1 Received: by mail-qk1-f199.google.com with SMTP id dw26-20020a05620a601a00b0074300c772c0so9101663qkb.11 for ; Wed, 22 Mar 2023 11:19:43 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1679509183; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:to:from:subject:message-id:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=5T7KQILcNVbzB+bMKrdOp0Av5TCDOe8zSs6mReBfV5o=; b=Ginl42qLrP2H3qpXvtKSgJlUOxu79sHXvYddv9V936D0j4rSS9F4Rx+x9df1MigZIW 8IxOVFwEA9JLvA/LbYRuDjLGrs+dU0XNcN8cp7Ae7QON5zgsz/fzUh/PIijZ+sJYg6m+ 8wpuFJmbN3fzFGqhMUkPuZ9KHjZpcgI6v8MeYKaTPFvT9eWRhzog7uRv1ID4CL0sZgGO Onx8D7SHNdbVrVoKGkZPTuRMKkKyHs+hZ+KcextJQlHi6c9emP3DVr3S6SvMT9bRKrDg g1Qhj53tMIh+eKlCK5PoeGwAEZTgj+ntBvB0f3/I/Sn8FiLb5acWcbA4QKHLj3BRSO7T TosQ== X-Gm-Message-State: AO0yUKXegB6BkVG4mqtN/fR3jjk6DfgwFGb1OwtsGxJHaktaKxr39DAV 26rIzxTAIW5+b4HSdao9OeCvSVetlFkIKw7SyYpMrhnB35Tif/ENZoj6PDf12xTZUoWNzsvzFHn 7+2S8tmAcLOzpbZI= X-Received: by 2002:ad4:5de4:0:b0:56e:bc62:e164 with SMTP id jn4-20020ad45de4000000b0056ebc62e164mr7225429qvb.8.1679509183073; Wed, 22 Mar 2023 11:19:43 -0700 (PDT) X-Google-Smtp-Source: AK7set95MeL+MUuVPQHQiv3ile3sjeMiD85JLZvxClTh+9wSl6zJS2hm0pFMFuTQPZHBCD87V8GX7Q== X-Received: by 2002:ad4:5de4:0:b0:56e:bc62:e164 with SMTP id jn4-20020ad45de4000000b0056ebc62e164mr7225398qvb.8.1679509182716; Wed, 22 Mar 2023 11:19:42 -0700 (PDT) Received: from t14s.localdomain (c-73-69-212-193.hsd1.nh.comcast.net. [73.69.212.193]) by smtp.gmail.com with ESMTPSA id n67-20020a37bd46000000b007469c93ac2dsm3753332qkf.31.2023.03.22.11.19.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 22 Mar 2023 11:19:42 -0700 (PDT) Message-ID: Subject: Re: [Static Analyzer] Loop handling - False positive for malloc-sm From: David Malcolm To: Pierrick Philippe , gcc@gcc.gnu.org Date: Wed, 22 Mar 2023 14:19:41 -0400 In-Reply-To: <805abf28-3991-df57-51b5-d1e1f4f398b6@irisa.fr> References: <34efc6e0-5bd8-879c-0288-154ba28f5f05@irisa.fr> <3b77234afb96947c9694d375b43b3096cbd45467.camel@redhat.com> <805abf28-3991-df57-51b5-d1e1f4f398b6@irisa.fr> User-Agent: Evolution 3.44.4 (3.44.4-1.fc36) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,BODY_8BITS,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,KAM_SHORT,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, 2023-03-21 at 09:21 +0100, Pierrick Philippe wrote: > On 21/03/2023 00:30, David Malcolm wrote: > > On Mon, 2023-03-20 at 13:28 +0100, Pierrick Philippe wrote: > > > Hi everyone, > > >=20 > > > I'm still playing around with the analyzer, and wanted to have a > > > look > > > at > > > loop handling. > > > I'm using a build from /trunk/ branch (/20230309/). > > >=20 > > > Here is my analyzed code: > > >=20 > > > ''' > > > 1| #include > > > 2| int main(void) { > > > 3| =C2=A0=C2=A0 void * ptr =3D malloc(sizeof(int)); > > > 4| =C2=A0=C2=A0 for (int i =3D 0; i < 10; i++) { > > > 5| =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (i =3D=3D 5) free(ptr); > > > 6| =C2=A0=C2=A0 } > > > 7|} > > > ''' > [stripping] > > > So, I'm guessing that this false positive is due to how the > > > analyzer > > > is > > > handling loops. > > > Which lead to my question: how are loops handled by the analyzer? > > Sadly, the answer is currently "not very well" :/ > >=20 > > I implemented my own approach, with a "widening_svalue" subclass of > > symbolic value.=C2=A0 This is widening in the Abstract Interpretation > > sense, > > (as opposed to the bitwise operations sense): if I see multiple > > values > > on successive iterations, the widening_svalue tries to simulate > > that we > > know the start value and the direction the variable is moving in. > >=20 > > This doesn't work well; arguably I should rewrite it, perhaps with > > an > > iterator_svalue, though I'm not sure how it ought to work.=C2=A0 Some > > ideas: > >=20 > > * reuse gcc's existing SSA-based loop analysis, which I believe can > > identify SSA names that are iterator variables, figure out their > > bounds, and their per-iteration increments, etc. > >=20 > > * rework the program_point or supergraph code to have a notion of > > "1st > > iteration of loop", "2nd iteration of loop", "subsequent > > iterations", > > or similar, so that the analyzer can explore those cases > > differently > > (on the assumption that such iterations hopefully catch the most > > interesting bugs) I've filed an RFE discussing some of the problems with -fanalyzer's loop-handling here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D109252 including the idea of making use of GCC's existing SSA-based loop analysis (which discovers a tree of loops within each function's CFG). >=20 > I see, I don't know if you ever considered allowing state machines to > deal with loops on their own. > Such as having an API to allow to register a callback to handle > loops,=20 > but not in a mandatory way. > Or having a set of APIs to optionally implement for the analyzer to > call. I hadn't thought of that, but it sounds like a reasonable idea. >=20 > It would allow state machines to analyze loops with the meaning of > their=20 > inner analysis. >=20 > Which could allow them to try to find a fixed point in the loop=20 > execution which doesn't have > any impact on the program state for that state machine. Kind of like > a=20 > custom loop invariant. > Because depending of the analysis goal of the state machine, you > might=20 > need to symbolically execute the loop > only a few times before reentering the loop and having the entry > state=20 > being the same as the end-of-loop state. The analyzer performs symbolic execution; it tries to achieve a reasonable balance between: * precision of state tracking versus * achieving decent coverage of code and data flow * ensuring termination via various heuristics. Its current loop implementation uses widening_svalue and the complexity limits on svalues/regions to attempt to have the symbolic execution terminate due to hitting already-visited nodes in the exploded_graph, or else hit per-program-point limits. Unfortuately this often doesn't work well. GCC's optimization code has both GIMPLE and RTL loop analysis code.=20 The RTL code runs too late for the analyzer, but the GIMPLE loop analysis code is in cfgloop.{h,cc} and thus we would have access to information about loops, at least for well-behaved cases - though possibly only when optimization is enabled. >=20 > In fact, this could be done directly by the analyzer, and only > calling=20 > state machine APIs for loop handling which still has not reached > such a fixed point in their program state for the analyzed loop, with > a=20 > maximum number of execution fixed by the analyzer to limit execution > time. >=20 > Does what I'm saying make sense? I think so, though I'm not sure how it would work in practice.=20 Consider e.g.=20 for (int i =3D 0; i < n; i++) head =3D prepend_node (head, i); which builds a chain of N dynamically-allocated nodes in a linked list. >=20 > In terms of implementation, loop detection can be done by looking for > strongly connected components (SCCs) > in a function graph having more than one node. > I don't know if this is how it is already done within the analyzer or > not? It isn't yet done in the analyzer, but as noted above there is code in GCC that already does that (in cfgloop.{h,cc}). Dave