From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 2530E3857C71 for ; Fri, 17 Jun 2022 22:13:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2530E3857C71 Received: from mail-qv1-f69.google.com (mail-qv1-f69.google.com [209.85.219.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-653-WXtBIISdMdK9t9y3E0Xs3A-1; Fri, 17 Jun 2022 18:13:20 -0400 X-MC-Unique: WXtBIISdMdK9t9y3E0Xs3A-1 Received: by mail-qv1-f69.google.com with SMTP id g29-20020a0caadd000000b004702ed3c3f5so443461qvb.11 for ; Fri, 17 Jun 2022 15:13:20 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:subject:from:to:cc:date:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=OzV/WZF8kAx618HNTj3JeiClFWVvjISz1/6g2zrl8JI=; b=I6mHwYQbvxppBfJhzH76jslVxJ58Q2ZvqRt/p4CBK8COfaq89TcZz1r58LKV5Rj5Gt SIwmgbfUMgBBuZLo1COYZl0dm9L9tpdGUdtxIgbFenxXx7ABElVtoworH7k8NyegDpdA PJ4PfMCpnebJEM6xjmTqPMp6A87sdcQaPJ0kLAbZa+9IyyBX9I5UjwIiENh0bJ6G6hvQ fv2mB/MPBl0uNVqaLXKyl18Q/j4WI7MkL6IDGZ+fZgcqBhp4CAYUlIO23Kst6dGah2LY QW7rxkigb0R9IuTjHq0dmnEfX03qAhUTX1GptQFmwN8OW3v/2fxp7F+MsfC5UtYpW34E af1w== X-Gm-Message-State: AJIora+eVOckMxYontVELSyAWcrRFoISUiSx2M89f9WjvoqZuTFp2lwb RvSyW9wgq1+2MHzZ3Rpwz3vIC9SvgyCTm1HIlVDLMfmuvjkD1APDkZ4YaI9WbIwtD0h8wi2NwcI OWGqa720= X-Received: by 2002:a37:9ad4:0:b0:6a6:a7d3:56a2 with SMTP id c203-20020a379ad4000000b006a6a7d356a2mr8699466qke.274.1655503999514; Fri, 17 Jun 2022 15:13:19 -0700 (PDT) X-Google-Smtp-Source: AGRyM1vdS05GGGdhUYKa1wh3+ATZqHzcLD0GAwZSwOXry0ALRS8vlcxafb+qy4Q4vrcCs9QXWSY8jw== X-Received: by 2002:a37:9ad4:0:b0:6a6:a7d3:56a2 with SMTP id c203-20020a379ad4000000b006a6a7d356a2mr8699446qke.274.1655503999150; Fri, 17 Jun 2022 15:13:19 -0700 (PDT) Received: from t14s.localdomain (c-73-69-212-193.hsd1.ma.comcast.net. [73.69.212.193]) by smtp.gmail.com with ESMTPSA id l16-20020a05620a28d000b006a6cadd89efsm5921837qkp.82.2022.06.17.15.13.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Jun 2022 15:13:18 -0700 (PDT) Message-ID: <77a08baeea445a0d91a8df2dec32e5d8b766bc0f.camel@redhat.com> Subject: Re: [RFC] analyzer: allocation size warning From: David Malcolm To: Tim Lange Cc: GCC Mailing List Date: Fri, 17 Jun 2022 18:13:17 -0400 In-Reply-To: <20220617202305.kg2otzye6k57fgyo@fedora> References: <42afee94ad4da41b87980c6b4a7cd7dcc6cb1e97.camel@redhat.com> <20220617202305.kg2otzye6k57fgyo@fedora> User-Agent: Evolution 3.38.4 (3.38.4-1.fc33) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.5 required=5.0 tests=BAYES_00, BODY_8BITS, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, KAM_ASCII_DIVIDERS, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Jun 2022 22:13:24 -0000 On Fri, 2022-06-17 at 22:23 +0200, Tim Lange wrote: > On Fri, Jun 17, 2022 at 01:48:09PM -0400, David Malcolm wrote: > > On Fri, 2022-06-17 at 17:54 +0200, Tim Lange wrote: [...snip...] > > > > I have resent the patch using git send-email as a reply to my original > message. > The new message looks properly formatted in the archive: >     https://gcc.gnu.org/pipermail/gcc/2022-June/238911.html Thanks; that's *much* more readable. [...snip...] > > > > > > > > On symbolic buffer sizes: > > warning: Allocated buffer size is not a multiple of the pointee's > > size > > [CWE-131] [-Wanalyzer-allocation-size] > >    33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 } > > */ > >       | ^~~~~~~~~~~~~~~~~~~~~~~~ > >   ‘test_3’: event 1 > >     | > >     | 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 > > } > > */ > >     | | ^~~~~~~~~~~~~~~~~~~~~~~~ > >     | | | > >     | | (1) Allocation is incompatible with ‘int *’; either the > > allocated size is bogus or the type on the left-hand side is wrong > >     | > > > > > > Is there location information for both the malloc and for the > > assignment, here? > > I'm not sure whether I understand your question but the warning is > emitted at the gcall* with a ssa var lhs and the call_fndecl on the > rhs. > I think that is enough to split that up into "(1) n + sizeof(int) > allocated here" and "(2) Allocation at (1) is incompatible with..."? Probably, yes. FWIW I wrote some more notes about the events in my reply to to your reply to Prathamesh, here: https://gcc.gnu.org/pipermail/gcc/2022-June/238917.html [...snip...] > > > > There are some things to discuss from my side: > > * The tests with the "toy re-implementation of CPython's object > > model"[2] fail due to a extra warning emitted. Because the analyzer > > can't know the calculation actually results in a correct buffer size > > when viewed as a string_obj later on, it emits a warning, e.g. at > > line > > 61 in data-model-5.c. The only mitigation would be to disable the > > warning for structs entirely. Now, the question is to rather have > > noise > > on these cases or disable the warning for structs entirely? > > > > Can you post the full warning please? > > /path/to/data-model-5.c: In function ‘alloc_obj’: > /path/to/data-model-5.c:61:31: warning: Allocated buffer size is not a > multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size] >    61 |   base_obj *obj = (base_obj *)malloc (sz); >       |                               ^~~~~~~~~~~ >   ‘new_string_obj’: events 1-2 >     | >     |   69 | base_obj *new_string_obj (const char *str) >     |      |           ^~~~~~~~~~~~~~ >     |      |           | >     |      |           (1) entry to ‘new_string_obj’ >     |...... >     |   75 |     = (string_obj *)alloc_obj (&str_type, sizeof > (string_obj) + len + 1); >     |      |                     > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >     |      |                     | >     |      |                     (2) calling ‘alloc_obj’ from > ‘new_string_obj’ >     | >     +--> ‘alloc_obj’: events 3-4 >            | >            |   59 | base_obj *alloc_obj (type_obj *ob_type, size_t sz) >            |      |           ^~~~~~~~~ >            |      |           | >            |      |           (3) entry to ‘alloc_obj’ >            |   60 | { >            |   61 |   base_obj *obj = (base_obj *)malloc (sz); >            |      |                               ~~~~~~~~~~~ >            |      |                               | >            |      |                               (4) Allocation is > incompatible with ‘base_obj *’; either the allocated size is bogus or > the type on the left-hand side is wrong >            | > > > > > These testcases exhibit a common way of faking inheritance in C, and > > I > > think it ought to be possible to support this in the warning. > > > > I thing what's happening is we have > > > > struct base > > {  > >   /* fields */ > > }; > > > > struct sub > > { > >   struct base m_base; > >   /* extra fields.  */ > > }; > > > > struct base *construct_base (size_t sz) > > { > >   struct base *p = (struct base *) malloc (sz); > > > >   /* set up fields of base in p  */ > > > >   return p; > > } > > > > Or is this on the interprocedural path as called with a specific > > sizeof > > for struct sub? > > At (4), it does not know that base_obj is later used as a "base > struct". > As it is called with sizeof(struct sub), my checker thinks the buffer > is > too large for one but too small for another base_obj. > > > > > Maybe we can special-case these by detecting where struct sub's first > > field is struct base, and hence where we expect this pattern?  (and > > use > > this to suppress the warning for such cases?) > > I already excluded all structs with structs inside with > struct_or_union_with_inheritance_p inside sm-malloc.cc. This does not > help > in the case size for struct sub is allocated but casted as base. Maybe, > we > should do a special case for structs where we only warn when the sizeof > is > too small to hold the base struct together with supressing warnings > when > the first field is a struct? That sounds like it could work. There are several things going on in the above example: - fake inheritance - the "trailing array idiom": struct string_obj's final field is: char str_buf[]; meaning that the string_obj will have the char buffer trailing off the end, and the allocation is expected to support this. This is not uncommon in C; it occurs in CPython, see e.g.: https://github.com/python/cpython/blob/main/Include/cpython/bytesobject.h where CPython's PyBytesObject has the bytes in an ob_sval field trailing off the end: typedef struct { PyObject_VAR_HEAD Py_DEPRECATED(3.11) Py_hash_t ob_shash; char ob_sval[1]; /* Invariants: * ob_sval contains space for 'ob_size+1' elements. * ob_sval[ob_size] == 0. * ob_shash is the hash of the byte string or -1 if not computed yet. */ } PyBytesObject; so it would be good for the warning to handle it gracefully, which I think your proposal above would. I try to have plenty of idiomatic C code in the analyzer test suite to try to catch this kind of thing (as well as more "unit test" kinds of test coverage); we want the warnings to have a good signal:noise ratio. > > > * I'm unable to emit a warning whenever the cast happens at an > assignment with a call as the rhs, e.g. test_1 in allocation-size-4.c. > This is because I'm unable to access a region_svalue for the returned > value. Even in the new_program_state, the svalue of the lhs is still a > conjured_svalue. Maybe David can lead me to a place where I can access > the return value's region_svalue or do I have to adapt the engine? > > Please can you try reposting the patch?  I tried to read it, but am > having trouble with the mangled indentation. See my inline answer above. Both, the test case and from where I want to access the region_svalue are commented with // FIXME. What does the dump of the state look like? e.g. via calling  (gdb) call m_region_model->debug() from within gdb A conjured_svalue represents the result of a call to an external function (or a side-effect written out to a *out-style param of such a function), but we have the body of create_buffer, so the call to create_buffer should be analyzed interprocedurally, and we should have a region_svalue pointing at a heap_allocated_region. You might want to simplify things to just the functions of interest, and then have a look at the output of -fdump-analyzer-exploded-graph in your favorite .dot viewer (I like xdot; it's in python-xdot in Fedora). I wonder if my idea from the other email of moving the test from sm- malloc.cc to region-model.cc might affect this; the state machines run at a slightly different time to the region model updates. > > > * attr-malloc-6.c and pr96639.c did both contain structs without an > implementation. Something in the analyzer must have triggered another > warning about the usage of those without them having an implementation. > I changed those structs to have an empty implementation, such that the > additional warning are gone. I think this shouldn't change the test > case, so is this change okay? > > What were the new warnings? /path/to/attr-malloc-6.c:175:15: error: invalid use of undefined type ‘struct FILE’   175 |     FILE *p = malloc (100);   // { dg-message "allocated here" }       |               ^~~~~~~~~~~~ All were like the one above. error: invalid use of undefined type 'struct XXX' That error looks bogus; I'm guessing that something the new diagnostics is calling is generating it. You can probably track it down by using (gdb) break-on-diagnostic in the debugger, and then seeing what the backtrace shows when the breakpoint fires. See: https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html break-on-diagnostic is one of the things in the support scripts mentioned on that page. Hope this is helpful (BTW, I'm about to disappear for a long weekend; I'm back on Tuesday) Dave