From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id C80F0384D161 for ; Wed, 29 Jun 2022 17:39:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C80F0384D161 Received: from mail-qv1-f72.google.com (mail-qv1-f72.google.com [209.85.219.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-321-EfhcoTwENNq-ppCPmBhRCQ-1; Wed, 29 Jun 2022 13:39:55 -0400 X-MC-Unique: EfhcoTwENNq-ppCPmBhRCQ-1 Received: by mail-qv1-f72.google.com with SMTP id mz4-20020a0562142d0400b004726d99aa49so8692977qvb.10 for ; Wed, 29 Jun 2022 10:39:55 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:subject:from:to:cc:date:in-reply-to :references:user-agent:mime-version:content-transfer-encoding; bh=fCBXBdOZHGaLXskFNeZ9HaLpMf4f2dDPGOPavRcINbw=; b=m6gbTucLdgLzspcBZCIQ3PxxVJZuSNYbZVA4kvw0XILZvJlYN+kFDTpNVOEEdEN+sO uUYYBNHkFN5akFKpvHxCtoGfOmD54Qy4+kFqKO+0l6HpkuNGgyV4HyxkR8iJwXHWZR2z WbPwOimLXjAlvvqwm1f/pmWZj/F/ULv4s4r+Dr7hbWM/ZWAhSEUobqF1iYuNkYclKgPC WHD/73n8k76BjFuUgRkKK1jheKR5MD2dcONGDbjVZ2U3ta4HqXlFFxpgDUOd3kFz/IoT bdwR9SqLl1L8nk5PN6wXl78yY/M+OZVywfP+x2tO5yfp8ZaYsW2mHU8V7nuNh9qikEDi xkoA== X-Gm-Message-State: AJIora9DVeNmM7LTVmM3Pz4Qi9+AvcK/DVhTH0FiqSbWBzrGhAEOU2eM NN3yoZXURokKycnTcs/brwq4JMpvGJPQxp5+yTvllE5Hxe3OMwSDttB+D8+rwBc5c/GruOBXFOX HsNWsW8Y= X-Received: by 2002:a05:6214:2605:b0:470:2cf3:82f2 with SMTP id gu5-20020a056214260500b004702cf382f2mr7652720qvb.124.1656524394886; Wed, 29 Jun 2022 10:39:54 -0700 (PDT) X-Google-Smtp-Source: AGRyM1sy/xYivhEOVBYoXZL0gutzKfUIQxBg0I9bwJIAgQrSL2qlqOD5tKOZpJSC2WRos8aOaTi7xg== X-Received: by 2002:a05:6214:2605:b0:470:2cf3:82f2 with SMTP id gu5-20020a056214260500b004702cf382f2mr7652683qvb.124.1656524394431; Wed, 29 Jun 2022 10:39:54 -0700 (PDT) Received: from t14s.localdomain (c-73-69-212-193.hsd1.nh.comcast.net. [73.69.212.193]) by smtp.gmail.com with ESMTPSA id j14-20020ac8440e000000b003176b8f948esm11001779qtn.13.2022.06.29.10.39.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 29 Jun 2022 10:39:53 -0700 (PDT) Message-ID: Subject: Re: [PATCH v2] analyzer: add allocation size checker From: David Malcolm To: Tim Lange Cc: gcc@gcc.gnu.org Date: Wed, 29 Jun 2022 13:39:52 -0400 In-Reply-To: <20220629153949.74450-1-mail@tim-lange.me> References: <20220629153949.74450-1-mail@tim-lange.me> User-Agent: Evolution 3.38.4 (3.38.4-1.fc33) MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_ASCII_DIVIDERS, KAM_SHORT, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jun 2022 17:40:00 -0000 On Wed, 2022-06-29 at 17:39 +0200, Tim Lange wrote: > Hi, Thanks for the updated patch. Overall, looks nearly ready; various nits inline below, throughout... > > I've addressed most of the points from the review. > * The allocation size warning warns whenever region_model::get_capacity returns > something useful, i.e. also on statically-allocated regions. Thanks. Looks like you added test coverage for this in allocation- size-5.c > * I added a new virtual function to the pending-diagnostic class, so that it > is possible to emit a custom region creation description. > * The test cases should have a better coverage now. > * Conservative struct handling > > The warning now looks like this: > /path/to/main.c:9:8: warning: allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size] > 9 | int *iptr = ptr; > | ^~~~ > ‘main’: events 1-2 > | > | 8 | void *ptr = malloc((long unsigned int)n * sizeof(short)); > | | ^~~~~~~~~~~~~~~~~~~~~~~~~ > | | | > | | (1) allocated ‘(long unsigned int)n * 2’ bytes here > | 9 | int *iptr = ptr; > | | ~~~~ > | | | > | | (2) assigned to ‘int *’ here; ‘sizeof(int)’ is ‘4’ > | Looks great. > > /path/to/main.c:15:15: warning: allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size] > 15 | int *ptrw = malloc (sizeof (short)); > | ^~~~~~~~~~~~~~~~~~~~~~~ > ‘main’: events 1-2 > | > | 15 | int *ptrw = malloc (sizeof (short)); > | | ^~~~~~~~~~~~~~~~~~~~~~~ > | | | > | | (1) allocated ‘2’ bytes here Looks a bit weird to be quoting a number here; maybe whenever the expression is just a constant, print it unquoted? (though that could be fiddly to implement, so can be ignored if it turns out to be) . > | | (2) assigned to ‘int *’ here; ‘sizeof (int)’ is ‘4’ > | > The only thing I couldn't address was moving the second event toward the lhs or > assign token here. I tracked it down till get_stmt_location where it seems that > the rhs is actually the location of the statement. Is there any way to get (2) > to be focused on the lhs? Annoyingly, we've lost a lot of location information by the time the analyzer runs. In theory we could special-case for when we have the def-stmt of the SSA_NAME that's that default (i.e. initial) value of a VAR_DECL, and if we see the write is there, we could use the DECL_SOUCE_LOCATION of the VAR_DECL for the write, so that we'd get: | 15 | int *ptrw = malloc (sizeof (short)); | | ^~~~ ^~~~~~~~~~~~~~~~~~~~~~~ | | | | | | | (1) allocated ‘2’ bytes here | | (2) assigned to ‘int *’ here; ‘sizeof (int)’ is ‘4’ | which is perhaps slightly more readable. I'm not sure it's worth it though. > > Otherwise, the patch compiled coreutils, openssh, curl and httpd without any > false-positive (but none of them contained a bug found by the checker either). Great. > `make check-gcc RUNTESTFLAGS="analyzer.exp"` tests pass and as I just worked on > the event splitting, the regression tests are yet to run. > > - Tim > > > This patch adds an checker that warns about code paths in which a buffer is > assigned to a incompatible type, i.e. when the allocated buffer size is not a > multiple of the pointee's size. > > gcc/analyzer/ChangeLog: You should add a reference to the RFE bug to the top of the ChangeLog entries: PR analyzer/105900 Please also add it to the commit message, in the form " [PR105900]"; see the examples section twoards the end of https://gcc.gnu.org/contribute.html#patches > > * analyzer.opt: Added Wanalyzer-allocation-size. [...snip...] > > gcc/ChangeLog: ...and here > > * doc/invoke.texi: Added Wanalyzer-allocation-size. > > gcc/testsuite/ChangeLog: ...and here > > * gcc.dg/analyzer/pr96639.c: Changed buffer size to omit warning. > * gcc.dg/analyzer/allocation-size-1.c: New test. > * gcc.dg/analyzer/allocation-size-2.c: New test. > * gcc.dg/analyzer/allocation-size-3.c: New test. > * gcc.dg/analyzer/allocation-size-4.c: New test. > * gcc.dg/analyzer/allocation-size-5.c: New test. > > Signed-off-by: Tim Lange [...snip...] > diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt > index 4aea52d3a87..912def2faf2 100644 > --- a/gcc/analyzer/analyzer.opt > +++ b/gcc/analyzer/analyzer.opt > @@ -54,6 +54,10 @@ The minimum number of supernodes within a function for the analyzer to consider > Common Joined UInteger Var(param_analyzer_max_enodes_for_full_dump) Init(200) Param > The maximum depth of exploded nodes that should appear in a dot dump before switching to a less verbose format. > > +Wanalyzer-allocation-size > +Common Var(warn_analyzer_allocation_size) Init(1) Warning > +Warn about code paths in which a buffer is assigned to a incompatible type. Reword "buffer" to "pointer to a buffer", I think. "a incompatible" -> "an incompatible" [...snip...] > +/* Concrete subclass for casts of pointers that lead to trailing bytes. */ > + > +class dubious_allocation_size > +: public pending_diagnostic_subclass > +{ > +public: > + dubious_allocation_size (const region *lhs, const region *rhs) > + : m_lhs (lhs), m_rhs (rhs), m_expr (NULL_TREE) > + {} [...snip...] > + bool operator== (const dubious_allocation_size &other) const > + { > + return m_lhs == other.m_lhs && m_rhs == other.m_rhs; Probably should also check that: same_tree_p (m_expr, other.m_expr); [...snip...] > +/* Return true on dubious allocation sizes for constant sizes. */ > + > +static bool > +capacity_compatible_with_type (tree cst, tree pointee_size_tree, > + bool is_struct) > +{ > + unsigned HOST_WIDE_INT pointee_size = TREE_INT_CST_LOW (pointee_size_tree); > + if (pointee_size == 0) > + return 0; "false" rather than 0, given that this is bool. [...snip...] > +/* Return true if the lhs and rhs of an assignment have different types. */ > + > +static bool > +is_any_cast_p (const gimple *stmt) > +{ > + if (const gassign *assign = dyn_cast(stmt)) > + return gimple_assign_cast_p (assign) > + || (gimple_num_ops (assign) == 2 > + && !pending_diagnostic::same_tree_p ( > + TREE_TYPE (gimple_assign_lhs (assign)), > + TREE_TYPE (gimple_assign_rhs1 (assign)))); The "== 2" subclause in the above condition doesn't look quite right to me; what statements did you encounter that needed this? [...snip...] > @@ -9758,6 +9759,18 @@ By default, the analysis silently stops if the code is too > complicated for the analyzer to fully explore and it reaches an internal > limit. The @option{-Wanalyzer-too-complex} option warns if this occurs. > > +@item -Wno-analyzer-allocation-size > +@opindex Wanalyzer-allocation-size > +@opindex Wno-analyzer-allocation-size > +This warning requires @option{-fanalyzer}, which enables it; use > +@option{-Wno-analyzer-allocation-size} > +to disable it. > + > +This diagnostic warns for paths through the code in which a buffer is casted > +to a type where the buffer size is not a multiple of the pointee size. At the risk of bikeshedding, how about: This diagnostic warns for paths through the code in which a pointer is assigned to point at a buffer with a size that is not a multiple of sizeof(*pointer). See @url{https://cwe.mitre.org/data/definitions/131.html, CWE-131: Incorrect Calculation of Buffer Size}. [...snip...] > diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c > new file mode 100644 > index 00000000000..02634ae883b > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c > @@ -0,0 +1,102 @@ > +#include > +#include > + > +/* Tests with constant buffer sizes. */ > + > +void test_1 (void) > +{ > + short *ptr = malloc (21 * sizeof (short)); > + free (ptr); > +} > + > +void test_2 (void) > +{ > + int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc2 } */ > + free (ptr); > + > + /* { dg-warning "" "" { target *-*-* } malloc2 } */ > + /* { dg-message "" "" { target *-*-* } malloc2 } */ The various dg-warning and dg-message directives here (and throughout the rest of the patch) shouldn't have just "" "" for their first two args. The first arg should be a regexp that matches some (nonempty) subset of the expected text. There's a balance to be struck between: (a) terseness to avoid "gold plating" the test output (where making any change to the wording would involve lots of tedious updates to test directives) versus (b) giving us test coverage that the message is sane, so that if we accidentally break the wording due to future changes to the analyzer, then at least one test starts failing Probably best for most of these regexps to be terse, but an empty regexp is too terse. The 2nd arg helps us disambiguate with directive we're talking about, so can be "warning" and "note" for the two above. [...snip...] > +void test_5 (void) > +{ > + int user_input; > + scanf("%i", &user_input); > + int n; > + if (user_input == 0) > + n = 21 * sizeof (short); > + else > + n = 42 * sizeof (short); I see you've used scanf, presumably to get a symbolic value for the variable. If so, a simpler way to do this is to simply use a parameter to the test function. But there's no need to change these test cases. Perhaps scanf should taint its arguments, which is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106021 but obviously that would be a different patch. [...snip...] > +void test_9(void) > +{ > + // FIXME Please make this comment more descriptive about what the issue here is. > + int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } } */ > + free (buf); > +} > diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c > new file mode 100644 > index 00000000000..cb35a9d717b > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c [...snip...] > +void test_8(void) > +{ > + int n; > + scanf("%i", &n); > + // FIXME Make the comment more descriptive please. > + int *buf = create_buffer(n * sizeof(short)); /* { dg-warning "" "" { xfail *-*-* } } */ > + free (buf); > +} > + > +void test_9 (void) > +{ > + int n; > + scanf("%i", &n); > + /* n is a conjured_svalue. */ > + void *ptr = malloc (n); /* { dg-message } */ > + int *iptr = (int *)ptr; /* { dg-line assign9 } */ > + free (iptr); > + > + /* { dg-warning "" "" { target *-*-* } assign9 } */ > + /* { dg-message "" "" { target *-*-* } assign9 } */ > +} > + > +void test_11 (void) > +{ > + int n; > + scanf("%i", &n); > + void *ptr = malloc (n); > + if (n == 4) Presumably this should be a test against sizeof (int), rather than 4? Please add a testcase where the comparison is against the wrong constant. > + { > + /* n is a conjured_svalue but guarded. */ > + int *iptr = (int *)ptr; > + free (iptr); > + } > + else > + free (ptr); > +} [...snip...] > diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c > new file mode 100644 > index 00000000000..afb1782e0cd > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c > @@ -0,0 +1,40 @@ > +#include > +#include > + > +/* Tests related to structs. */ Looks like this was copied-and-pasted, and should be updated to "Tests of statically-sized buffers" or somesuch. > + > +typedef struct a { > + short s; > +} a; > + > +int *test_1 (void) > +{ > + a A; > + A.s = 1; > + int *ptr = (int *) &A; /* { dg-line assign1 } */ > + return ptr; > + > + /* { dg-warning "" "" { target *-*-* } assign1 } */ > + /* { dg-message "" "" { target *-*-* } assign1 } */ > +} > + > +int *test2 (void) > +{ > + char arr[4]; I think this needs to be sizeof(int), rather than 4. > + int *ptr = (int *)arr; > + return ptr; > +} > + > +int *test3 (void) > +{ > + char arr[2]; > + int *ptr = (int *)arr; /* { dg-line assign3 } */ > + return ptr; > + > + /* { dg-warning "" "" { target *-*-* } assign3 } */ > + /* { dg-message "" "" { target *-*-* } assign3 } */ > +} > + > +int main() { > + return 0; > +} [...snip...] Thanks again for the v2 patch; hope the above makes sense Dave