From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from www523.your-server.de (www523.your-server.de [159.69.224.22]) by sourceware.org (Postfix) with ESMTPS id 4AC2E385742B for ; Tue, 21 Jun 2022 23:05:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4AC2E385742B Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=tim-lange.me Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=tim-lange.me DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=tim-lange.me; s=default2108; h=In-Reply-To:References:Message-Id:Date: Content-Transfer-Encoding:Cc:To:From:Subject:Content-Type:Mime-Version:Sender :Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID; bh=iX+3FQ2CnqfWhoQU1yhhUhgZigBy4Br/HLz4A0B/OnE=; b=WGm51d46GODlFi4E2OrGfgfX0m iS5w9cgA7eIyTKcinBCGeA1amPksOa8HuHu0QDUvPsaoAAsDtuOC5PRfK4StO01WShqbx2jmS8TN8 dd3vM8EiXbc4lyGKMmhtBgtuN26sZjnxuxtK2qKtHWQqlmUgdKDktfowGpxHyU6NM1j57wPVLw8FT 3aPjErjMhPNnsKmd5Ai/PoFFpO3gbkV/nGA455v4c8dIrDHhn8SrdJkdSL7qJGRvoQXApm5dv35cQ ImHXrU9jQQX0mpyeQF3GakeliCFQ3kNWMVzLpN7tneU7PYgWVJWGI785AjsKI4lpxU1rn5FGbm+2p aWsob2CQ==; Received: from sslproxy06.your-server.de ([78.46.172.3]) by www523.your-server.de with esmtpsa (TLSv1.3:TLS_AES_256_GCM_SHA384:256) (Exim 4.92.3) (envelope-from ) id 1o3mvp-0004c3-BS; Wed, 22 Jun 2022 01:05:13 +0200 Received: from [2a02:908:1861:d6a0::de4e] (helo=localhost) by sslproxy06.your-server.de with esmtpsa (TLSv1.3:TLS_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1o3mvp-000J0C-4j; Wed, 22 Jun 2022 01:05:13 +0200 Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Subject: Re: [RFC] analyzer: allocation size warning From: "Tim Lange" To: "David Malcolm" Cc: "GCC Mailing List" Content-Transfer-Encoding: quoted-printable Date: Tue, 21 Jun 2022 22:00:34 +0200 Message-Id: X-Mailer: aerc 0.10.0 References: <42afee94ad4da41b87980c6b4a7cd7dcc6cb1e97.camel@redhat.com> <20220617202305.kg2otzye6k57fgyo@fedora> <77a08baeea445a0d91a8df2dec32e5d8b766bc0f.camel@redhat.com> In-Reply-To: <77a08baeea445a0d91a8df2dec32e5d8b766bc0f.camel@redhat.com> X-Authenticated-Sender: mail@tim-lange.me X-Virus-Scanned: Clear (ClamAV 0.103.6/26579/Tue Jun 21 10:15:30 2022) X-Spam-Status: No, score=0.3 required=5.0 tests=BAYES_00, BODY_8BITS, DATE_IN_PAST_03_06, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, KAM_ASCII_DIVIDERS, KAM_INFOUSMEBIZ, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2022 23:05:18 -0000 On Sat Jun 18, 2022 at 12:13 AM CEST, David Malcolm wrote: > On Fri, 2022-06-17 at 22:23 +0200, Tim Lange wrote: > > On Fri, Jun 17, 2022 at 01:48:09PM -0400, David Malcolm wrote: > > > On Fri, 2022-06-17 at 17:54 +0200, Tim Lange wrote: > > [...snip...] > > > >=20 > >=20 > > I have resent the patch using git send-email as a reply to my original > > message.=20 > > The new message looks properly formatted in the archive: > > =C2=A0=C2=A0=C2=A0 https://gcc.gnu.org/pipermail/gcc/2022-June/238911.h= tml > > Thanks; that's *much* more readable. > > > [...snip...] > > > >=20 > > >=20 > > >=20 > > > On symbolic buffer sizes: > > > warning: Allocated buffer size is not a multiple of the pointee's > > > size=20 > > > [CWE-131] [-Wanalyzer-allocation-size] > > > =C2=A0=C2=A0 33 | int *ptr =3D malloc (n + sizeof(int)); /* { dg-line= malloc3 } > > > */ > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 | ^~~~~~~~~~~~~~~~~~~~~~~~ > > > =C2=A0 =E2=80=98test_3=E2=80=99: event 1 > > > =C2=A0=C2=A0=C2=A0 | > > > =C2=A0=C2=A0=C2=A0 | 33 | int *ptr =3D malloc (n + sizeof(int)); /* {= dg-line malloc3 > > > }=20 > > > */ > > > =C2=A0=C2=A0=C2=A0 | | ^~~~~~~~~~~~~~~~~~~~~~~~ > > > =C2=A0=C2=A0=C2=A0 | | | > > > =C2=A0=C2=A0=C2=A0 | | (1) Allocation is incompatible with =E2=80=98i= nt *=E2=80=99; either the=20 > > > allocated size is bogus or the type on the left-hand side is wrong > > > =C2=A0=C2=A0=C2=A0 | > > >=20 > > >=20 > > > Is there location information for both the malloc and for the > > > assignment, here? > >=20 > > I'm not sure whether I understand your question but the warning is=20 > > emitted at the gcall* with a ssa var lhs and the call_fndecl on the > > rhs. > > I think that is enough to split that up into "(1) n + sizeof(int)=20 > > allocated here" and "(2) Allocation at (1) is incompatible with..."?=20 > > Probably, yes. > > FWIW I wrote some more notes about the events in my reply to to your > reply to Prathamesh, here: > https://gcc.gnu.org/pipermail/gcc/2022-June/238917.html > > [...snip...] > > > >=20 > > > There are some things to discuss from my side: > > > * The tests with the "toy re-implementation of CPython's object=20 > > > model"[2] fail due to a extra warning emitted. Because the analyzer > > > can't know the calculation actually results in a correct buffer size= =20 > > > when viewed as a string_obj later on, it emits a warning, e.g. at > > > line=20 > > > 61 in data-model-5.c. The only mitigation would be to disable the=20 > > > warning for structs entirely. Now, the question is to rather have > > > noise > > > on these cases or disable the warning for structs entirely? > > >=20 > > > Can you post the full warning please? > >=20 > > /path/to/data-model-5.c: In function =E2=80=98alloc_obj=E2=80=99: > > /path/to/data-model-5.c:61:31: warning: Allocated buffer size is not a > > multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size] > > =C2=A0=C2=A0 61 |=C2=A0=C2=A0 base_obj *obj =3D (base_obj *)malloc (sz)= ; > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ^~~~~~~~= ~~~ > > =C2=A0 =E2=80=98new_string_obj=E2=80=99: events 1-2 > > =C2=A0=C2=A0=C2=A0 | > > =C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0 69 | base_obj *new_string_obj (const c= har *str) > > =C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ^~~~~~~~~~~~~~ > > =C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 | > > =C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 (1) entry to =E2=80=98new_string= _obj=E2=80=99 > > =C2=A0=C2=A0=C2=A0 |...... > > =C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0 75 |=C2=A0=C2=A0=C2=A0=C2=A0 =3D (stri= ng_obj *)alloc_obj (&str_type, sizeof > > (string_obj) + len + 1); > > =C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > =C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 | > > =C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 (2) calling =E2=80=98alloc_obj=E2=80=99 from > > =E2=80=98new_string_obj=E2=80=99 > > =C2=A0=C2=A0=C2=A0 | > > =C2=A0=C2=A0=C2=A0 +--> =E2=80=98alloc_obj=E2=80=99: events 3-4 > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 | > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2= =A0 59 | base_obj *alloc_obj (type_obj *ob_type, size_t sz) > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0 ^~~~~~~~~ > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0 | > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0 (3) entry to =E2=80=98alloc_obj=E2=80=99 > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2= =A0 60 | { > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2= =A0 61 |=C2=A0=C2=A0 base_obj *obj =3D (base_obj *)malloc (sz); > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ~~~~~~~~~~~ > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 | > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 (4) Allocation is > > incompatible with =E2=80=98base_obj *=E2=80=99; either the allocated si= ze is bogus or > > the type on the left-hand side is wrong > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 | > >=20 > > >=20 > > > These testcases exhibit a common way of faking inheritance in C, and > > > I > > > think it ought to be possible to support this in the warning. > > >=20 > > > I thing what's happening is we have > > >=20 > > > struct base > > > {=C2=A0 > > > =C2=A0 /* fields */ > > > }; > > >=20 > > > struct sub > > > { > > > =C2=A0 struct base m_base; > > > =C2=A0 /* extra fields.=C2=A0 */ > > > }; > > >=20 > > > struct base *construct_base (size_t sz) > > > { > > > =C2=A0 struct base *p =3D (struct base *) malloc (sz); > > >=20 > > > =C2=A0 /* set up fields of base in p=C2=A0 */ > > >=20 > > > =C2=A0 return p; > > > } > > >=20 > > > Or is this on the interprocedural path as called with a specific > > > sizeof > > > for struct sub? > >=20 > > At (4), it does not know that base_obj is later used as a "base > > struct".=20 > > As it is called with sizeof(struct sub), my checker thinks the buffer > > is > > too large for one but too small for another base_obj. > >=20 > > >=20 > > > Maybe we can special-case these by detecting where struct sub's first > > > field is struct base, and hence where we expect this pattern?=C2=A0 (= and > > > use > > > this to suppress the warning for such cases?) > >=20 > > I already excluded all structs with structs inside with=20 > > struct_or_union_with_inheritance_p inside sm-malloc.cc. This does not > > help=20 > > in the case size for struct sub is allocated but casted as base. Maybe, > > we > > should do a special case for structs where we only warn when the sizeof > > is > > too small to hold the base struct together with supressing warnings > > when > > the first field is a struct?=20 > > That sounds like it could work. > > There are several things going on in the above example: > - fake inheritance > - the "trailing array idiom": struct string_obj's final field is: > char str_buf[]; > meaning that the string_obj will have the char buffer trailing off > the end, and the allocation is expected to support this. > > This is not uncommon in C; it occurs in CPython, see e.g.: > https://github.com/python/cpython/blob/main/Include/cpython/bytesobject.h > > where CPython's PyBytesObject has the bytes in an ob_sval field > trailing off the end: > > typedef struct { > PyObject_VAR_HEAD > Py_DEPRECATED(3.11) Py_hash_t ob_shash; > char ob_sval[1]; > > /* Invariants: > * ob_sval contains space for 'ob_size+1' elements. > * ob_sval[ob_size] =3D=3D 0. > * ob_shash is the hash of the byte string or -1 if not computed = yet. > */ > } PyBytesObject; > > so it would be good for the warning to handle it gracefully, which I > think your proposal above would. > > I try to have plenty of idiomatic C code in the analyzer test suite to > try to catch this kind of thing (as well as more "unit test" kinds of > test coverage); we want the warnings to have a good signal:noise ratio. > > > > >=20 > >=20 > > * I'm unable to emit a warning whenever the cast happens at an=20 > > assignment with a call as the rhs, e.g. test_1 in allocation-size-4.c.= =20 > > This is because I'm unable to access a region_svalue for the returned > > value. Even in the new_program_state, the svalue of the lhs is still a= =20 > > conjured_svalue. Maybe David can lead me to a place where I can access= =20 > > the return value's region_svalue or do I have to adapt the engine? > >=20 > > Please can you try reposting the patch?=C2=A0 I tried to read it, but a= m > > having trouble with the mangled indentation. > > See my inline answer above. Both, the test case and from where I want > to access the region_svalue are commented with // FIXME. > > What does the dump of the state look like? e.g. via calling=C2=A0 > > (gdb) call m_region_model->debug() > > from within gdb > > A conjured_svalue represents the result of a call to an external > function (or a side-effect written out to a *out-style param of such a > function), but we have the body of create_buffer, so the call to > create_buffer should be analyzed interprocedurally, and we should have > a region_svalue pointing at a heap_allocated_region. > > You might want to simplify things to just the functions of interest, > and then have a look at the output of -fdump-analyzer-exploded-graph in > your favorite .dot viewer (I like xdot; it's in python-xdot in Fedora). > > I wonder if my idea from the other email of moving the test from sm- > malloc.cc to region-model.cc might affect this; the state machines run > at a slightly different time to the region model updates. I've now moved the checker inside the region_model. While I got everything working again and fixed the bogus struct error, I'm stuck at the warning if the cast is on a return. Lets take the example: void *create_buffer(int n) { return malloc(n); } int main (int argc, char **argv) { int *buf =3D create_buffer(42); free (buf); return 0; } After moving it to the region_model, it reaches ctxt->warn (new dubious_allocation_size(...)) but does not emit a warning because inside impl_region_model_context::warn m_stmt and m_stmt_finder are both NULL. m_stmt is null, because the exploded node at which set_value is called on, is the after node of create_buffer (the last bb of the function), which doesn't have any statements. I=C2=B7tried=C2=B7to=C2=B7add=C2=B7the=C2=B7call=C2=B7site=C2=B7to=C2=B7the= =C2=B7context=C2=B7inside program_state::on_edge to=C2=B7get=C2=B7the=C2=B7warning.=C2=B7While=C2=B7the=C2=B7warning=C2=B7ap= pears, the=C2=B7notes=C2=B7in=C2=B7the=C2=B7diagnostic are=C2=B7not=C2=B7correct=C2=B7anymore,=C2=B7i.e.=C2=B7the=C2=B7call=C2=B7s= ite has=C2=B7the=C2=B7same=C2=B7indentation=C2=B7as the=C2=B7callee=C2=B7and=C2=B7the=C2=B7return=C2=B7note=C2=B7is missing. On the other hand, I can not split the return value set_value call out of pop_frame, because then I'm unable to retrieve the as it is already gone. If I simply delay the pop_frame call until the 'before' supernode, I get the warning I wanted but break around 600 test, which can't be right either - at least not for a simple fix. Because I'm stuck at this for some hours, I'd like to ask you what do you think is the best way to get a warning at the call site? - Tim > > > > >=20 > >=20 > >e* attr-malloc-6.c and pr96639.c did both contain stn;cts withn;t an > > implementation. Something in the analyzer must have triggered another > > warning about the usage of those without them having an implementation. > > I changed those structs to have an empty implementation, such that the= =20 > > additional warning are gone. I think this shouldn't change the test > > case, so is this change okay? > >=20 > > What were the new warnings? > > /path/to/attr-malloc-6.c:175:15: error: invalid use of undefined type > =E2=80=98struct FILE=E2=80=99 > =C2=A0 175 |=C2=A0=C2=A0=C2=A0=C2=A0 FILE *p =3D malloc (100);=C2=A0=C2= =A0 // { dg-message "allocated here" > } > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ^~~~~~~~~~~~ For anyone in the future with the same problem. Make sure to check that TYPE_SIZE_UNIT !=3D NULL_TREE before calling size_in_bytes. > > All were like the one above. error: invalid use of undefined type > 'struct XXX' > > That error looks bogus; I'm guessing that something the new diagnostics > is calling is generating it. You can probably track it down by using > > (gdb) break-on-diagnostic > > in the debugger, and then seeing what the backtrace shows when the > breakpoint fires. > > See: > https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html > > break-on-diagnostic is one of the things in the support scripts > mentioned on that page. > > Hope this is helpful > > (BTW, I'm about to disappear for a long weekend; I'm back on Tuesday) > > Dave