From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 503983857343 for ; Tue, 21 Jun 2022 23:16:58 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 503983857343 Received: from mail-qv1-f70.google.com (mail-qv1-f70.google.com [209.85.219.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-120-_UtPTl4OPPWlJu6mWd1hmQ-1; Tue, 21 Jun 2022 19:16:56 -0400 X-MC-Unique: _UtPTl4OPPWlJu6mWd1hmQ-1 Received: by mail-qv1-f70.google.com with SMTP id w18-20020a0ce112000000b0046e7f2c5a06so14429249qvk.0 for ; Tue, 21 Jun 2022 16:16:56 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:subject:from:to:cc:date:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=ilDxUXPwVuplyQ5CwA3MXaQZW/rT6iTl7K6CqcQpvJg=; b=SCAw2nwQasRmw+eIspqc0j0wpJzsFLAbK6za2KnZDWwgSCuwWfwZfa/YaJXPpT5E+G mhiV/6uWR4V9yhr74Ph2d/MovAw8Kxm71RHViyRON6S0a5IqxGAeLFt96LoIzyZrk1k2 E8FsYmgUmyGNNr2InMQJhD/eLabtirXmHFKcp/n/sf8f5RDuJcN4G+nqcCsdbNMqMTLi L5U0S5R/sBzr+lAa5sPqjRDe+qz2XnPzho2ugnf+J4Thb2sTonlPP+cNV59VXgA1qn52 kY1tBPNMfZ5hG74hzsibvQ5JS74E4Wv0aSaREB4Nuw/UbpRlFy8/LSHs83BWyFPgyv47 oB7g== X-Gm-Message-State: AJIora/PVM9TpUGFSbwx0e+YE1RwyyCrgL5W8cwIgzV50CWyWUVCAbIJ 1J+3SD0Ga0F7CFyfp4+2I5K8NIRgLjDNg4+013EqK454JphT6cOVGnfP4w6x7KhfUWdfpAZduU2 XjvjjYy8= X-Received: by 2002:a05:620a:240a:b0:6a6:b841:a635 with SMTP id d10-20020a05620a240a00b006a6b841a635mr313776qkn.689.1655853415725; Tue, 21 Jun 2022 16:16:55 -0700 (PDT) X-Google-Smtp-Source: AGRyM1tbKkyveais9hPAaqkNM68L8yPulcKuDAMe2Jt4mo1BroIS13HLbGxX2m0rVx9Y2dhqlSM+Fw== X-Received: by 2002:a05:620a:240a:b0:6a6:b841:a635 with SMTP id d10-20020a05620a240a00b006a6b841a635mr313756qkn.689.1655853415326; Tue, 21 Jun 2022 16:16:55 -0700 (PDT) Received: from t14s.localdomain (c-73-69-212-193.hsd1.ma.comcast.net. [73.69.212.193]) by smtp.gmail.com with ESMTPSA id br31-20020a05620a461f00b006a6af4b40b1sm15267796qkb.73.2022.06.21.16.16.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 21 Jun 2022 16:16:54 -0700 (PDT) Message-ID: <5e65c86d518c6e26a970b5fb23c2ed507abd2085.camel@redhat.com> Subject: Re: [RFC] analyzer: allocation size warning From: David Malcolm To: Tim Lange Cc: GCC Mailing List Date: Tue, 21 Jun 2022 19:16:53 -0400 In-Reply-To: References: <42afee94ad4da41b87980c6b4a7cd7dcc6cb1e97.camel@redhat.com> <20220617202305.kg2otzye6k57fgyo@fedora> <77a08baeea445a0d91a8df2dec32e5d8b766bc0f.camel@redhat.com> User-Agent: Evolution 3.38.4 (3.38.4-1.fc33) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.5 required=5.0 tests=BAYES_00, BODY_8BITS, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, KAM_ASCII_DIVIDERS, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Jun 2022 23:17:00 -0000 On Tue, 2022-06-21 at 22:00 +0200, Tim Lange wrote: > On Sat Jun 18, 2022 at 12:13 AM CEST, David Malcolm wrote: > > On Fri, 2022-06-17 at 22:23 +0200, Tim Lange wrote: > > > On Fri, Jun 17, 2022 at 01:48:09PM -0400, David Malcolm wrote: > > > > On Fri, 2022-06-17 at 17:54 +0200, Tim Lange wrote: > > > > [...snip...] > > > > > > > > > > > > I have resent the patch using git send-email as a reply to my > > > original > > > message. > > > The new message looks properly formatted in the archive: > > >     https://gcc.gnu.org/pipermail/gcc/2022-June/238911.html > > > > Thanks; that's *much* more readable. > > > > > > [...snip...] > > > > > > > > > > > > > > > > > > On symbolic buffer sizes: > > > > warning: Allocated buffer size is not a multiple of the > > > > pointee's > > > > size > > > > [CWE-131] [-Wanalyzer-allocation-size] > > > >    33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line > > > > malloc3 } > > > > */ > > > >       | ^~~~~~~~~~~~~~~~~~~~~~~~ > > > >   ‘test_3’: event 1 > > > >     | > > > >     | 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line > > > > malloc3 > > > > } > > > > */ > > > >     | | ^~~~~~~~~~~~~~~~~~~~~~~~ > > > >     | | | > > > >     | | (1) Allocation is incompatible with ‘int *’; either the > > > > allocated size is bogus or the type on the left-hand side is > > > > wrong > > > >     | > > > > > > > > > > > > Is there location information for both the malloc and for the > > > > assignment, here? > > > > > > I'm not sure whether I understand your question but the warning > > > is > > > emitted at the gcall* with a ssa var lhs and the call_fndecl on > > > the > > > rhs. > > > I think that is enough to split that up into "(1) n + sizeof(int) > > > allocated here" and "(2) Allocation at (1) is incompatible > > > with..."? > > > > Probably, yes. > > > > FWIW I wrote some more notes about the events in my reply to to > > your > > reply to Prathamesh, here: > >   https://gcc.gnu.org/pipermail/gcc/2022-June/238917.html > > > > [...snip...] > > > > > > > > > > There are some things to discuss from my side: > > > > * The tests with the "toy re-implementation of CPython's object > > > > model"[2] fail due to a extra warning emitted. Because the > > > > analyzer > > > > can't know the calculation actually results in a correct buffer > > > > size > > > > when viewed as a string_obj later on, it emits a warning, e.g. > > > > at > > > > line > > > > 61 in data-model-5.c. The only mitigation would be to disable > > > > the > > > > warning for structs entirely. Now, the question is to rather > > > > have > > > > noise > > > > on these cases or disable the warning for structs entirely? > > > > > > > > Can you post the full warning please? > > > > > > /path/to/data-model-5.c: In function ‘alloc_obj’: > > > /path/to/data-model-5.c:61:31: warning: Allocated buffer size is > > > not a > > > multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation- > > > size] > > >    61 |   base_obj *obj = (base_obj *)malloc (sz); > > >       |                               ^~~~~~~~~~~ > > >   ‘new_string_obj’: events 1-2 > > >     | > > >     |   69 | base_obj *new_string_obj (const char *str) > > >     |      |           ^~~~~~~~~~~~~~ > > >     |      |           | > > >     |      |           (1) entry to ‘new_string_obj’ > > >     |...... > > >     |   75 |     = (string_obj *)alloc_obj (&str_type, sizeof > > > (string_obj) + len + 1); > > >     |      |                     > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > >     |      |                     | > > >     |      |                     (2) calling ‘alloc_obj’ from > > > ‘new_string_obj’ > > >     | > > >     +--> ‘alloc_obj’: events 3-4 > > >            | > > >            |   59 | base_obj *alloc_obj (type_obj *ob_type, > > > size_t sz) > > >            |      |           ^~~~~~~~~ > > >            |      |           | > > >            |      |           (3) entry to ‘alloc_obj’ > > >            |   60 | { > > >            |   61 |   base_obj *obj = (base_obj *)malloc (sz); > > >            |      |                               ~~~~~~~~~~~ > > >            |      |                               | > > >            |      |                               (4) Allocation > > > is > > > incompatible with ‘base_obj *’; either the allocated size is > > > bogus or > > > the type on the left-hand side is wrong > > >            | > > > > > > > > > > > These testcases exhibit a common way of faking inheritance in > > > > C, and > > > > I > > > > think it ought to be possible to support this in the warning. > > > > > > > > I thing what's happening is we have > > > > > > > > struct base > > > > {  > > > >   /* fields */ > > > > }; > > > > > > > > struct sub > > > > { > > > >   struct base m_base; > > > >   /* extra fields.  */ > > > > }; > > > > > > > > struct base *construct_base (size_t sz) > > > > { > > > >   struct base *p = (struct base *) malloc (sz); > > > > > > > >   /* set up fields of base in p  */ > > > > > > > >   return p; > > > > } > > > > > > > > Or is this on the interprocedural path as called with a > > > > specific > > > > sizeof > > > > for struct sub? > > > > > > At (4), it does not know that base_obj is later used as a "base > > > struct". > > > As it is called with sizeof(struct sub), my checker thinks the > > > buffer > > > is > > > too large for one but too small for another base_obj. > > > > > > > > > > > Maybe we can special-case these by detecting where struct sub's > > > > first > > > > field is struct base, and hence where we expect this pattern?  > > > > (and > > > > use > > > > this to suppress the warning for such cases?) > > > > > > I already excluded all structs with structs inside with > > > struct_or_union_with_inheritance_p inside sm-malloc.cc. This does > > > not > > > help > > > in the case size for struct sub is allocated but casted as base. > > > Maybe, > > > we > > > should do a special case for structs where we only warn when the > > > sizeof > > > is > > > too small to hold the base struct together with supressing > > > warnings > > > when > > > the first field is a struct? > > > > That sounds like it could work. > > > > There are several things going on in the above example: > > - fake inheritance > > - the "trailing array idiom": struct string_obj's final field is: > >      char str_buf[]; > >   meaning that the string_obj will have the char buffer trailing > > off > > the end, and the allocation is expected to support this. > > > > This is not uncommon in C; it occurs in CPython, see e.g.: > > https://github.com/python/cpython/blob/main/Include/cpython/bytesobject.h > > > > where CPython's PyBytesObject has the bytes in an ob_sval field > > trailing off the end: > > > > typedef struct { > >     PyObject_VAR_HEAD > >     Py_DEPRECATED(3.11) Py_hash_t ob_shash; > >     char ob_sval[1]; > > > >     /* Invariants: > >      *     ob_sval contains space for 'ob_size+1' elements. > >      *     ob_sval[ob_size] == 0. > >      *     ob_shash is the hash of the byte string or -1 if not > > computed yet. > >      */ > > } PyBytesObject; > > > > so it would be good for the warning to handle it gracefully, which > > I > > think your proposal above would. > > > > I try to have plenty of idiomatic C code in the analyzer test suite > > to > > try to catch this kind of thing (as well as more "unit test" kinds > > of > > test coverage); we want the warnings to have a good signal:noise > > ratio. > > > > > > > > > > > > > > > * I'm unable to emit a warning whenever the cast happens at an > > > assignment with a call as the rhs, e.g. test_1 in allocation- > > > size-4.c. > > > This is because I'm unable to access a region_svalue for the > > > returned > > > value. Even in the new_program_state, the svalue of the lhs is > > > still a > > > conjured_svalue. Maybe David can lead me to a place where I can > > > access > > > the return value's region_svalue or do I have to adapt the > > > engine? > > > > > > Please can you try reposting the patch?  I tried to read it, but > > > am > > > having trouble with the mangled indentation. > > > > See my inline answer above. Both, the test case and from where I > > want > > to access the region_svalue are commented with // FIXME. > > > > What does the dump of the state look like? e.g. via calling  > > > >   (gdb) call m_region_model->debug() > > > > from within gdb > > > > A conjured_svalue represents the result of a call to an external > > function (or a side-effect written out to a *out-style param of > > such a > > function), but we have the body of create_buffer, so the call to > > create_buffer should be analyzed interprocedurally, and we should > > have > > a region_svalue pointing at a heap_allocated_region. > > > > You might want to simplify things to just the functions of > > interest, > > and then have a look at the output of -fdump-analyzer-exploded- > > graph in > > your favorite .dot viewer (I like xdot; it's in python-xdot in > > Fedora). > > > > I wonder if my idea from the other email of moving the test from > > sm- > > malloc.cc to region-model.cc might affect this; the state machines > > run > > at a slightly different time to the region model updates. > > I've now moved the checker inside the region_model. While I got > everything working again and fixed the bogus struct error,  ...thanks! > I'm stuck at > the warning if the cast is on a return. > > Lets take the example: >   void *create_buffer(int n) >   { >     return malloc(n); >   } > >   int main (int argc, char **argv) >   { >     int *buf = create_buffer(42); >     free (buf); >     return 0; >   } > > After moving it to the region_model, it reaches >   ctxt->warn (new dubious_allocation_size(...)) > but does not emit a warning because inside > impl_region_model_context::warn > m_stmt and m_stmt_finder are both NULL. > m_stmt is null, because the exploded node at which set_value is > called > on, is the after node of create_buffer (the last bb of the function), > which doesn't have any statements. What optimization level are you trying this at? What's the output of -fdump-ipa-analyzer=stderr > > I·tried·to·add·the·call·site·to·the·context·inside > program_state::on_edge > to·get·the·warning.·While·the·warning·appears, > the·notes·in·the·diagnostic > are·not·correct·anymore,·i.e.·the·call·site > has·the·same·indentation·as > the·callee·and·the·return·note·is missing. > On the other hand, I can not split the return value set_value call > out > of pop_frame, because then I'm unable to retrieve the > as > it is already gone. > If I simply delay the pop_frame call until the 'before' supernode, I > get > the warning I wanted but break around 600 test, which can't be right > either - at least not for a simple fix. > > Because I'm stuck at this for some hours, I'd like to ask you what do > you think is the best way to get a warning at the call site? Can you post your patch please so that I can see what's going on more clearly. Thanks Dave