From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dmalcolm@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.133.124])
 by sourceware.org (Postfix) with ESMTPS id 800383857BA8
 for <gcc@gcc.gnu.org>; Wed, 22 Jun 2022 18:23:16 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 800383857BA8
Received: from mail-qv1-f72.google.com (mail-qv1-f72.google.com
 [209.85.219.72]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 us-mta-220-WJJropmEMSKNBIkKHZoojg-1; Wed, 22 Jun 2022 14:23:12 -0400
X-MC-Unique: WJJropmEMSKNBIkKHZoojg-1
Received: by mail-qv1-f72.google.com with SMTP id
 x18-20020a0ce252000000b004703cbb92ebso10866828qvl.21
 for <gcc@gcc.gnu.org>; Wed, 22 Jun 2022 11:23:12 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:message-id:subject:from:to:cc:date:in-reply-to
 :references:user-agent:mime-version:content-transfer-encoding;
 bh=GBoowp/09+P3u7DZbAORQ+VJlt7If7nL8lFYFoXW424=;
 b=SUWjm8So/WEq2ks0hScxUHlMftyBbScH7QTCKkFViq/fEVNXDYI0/gM+rCFHbMiQkm
 gOtamtF0TMEw6iRiviCUm/l3kRhSeBtQJOmK50ziFzssEcGsrJKR+CkLlO8g7Rhl5cGY
 pE2Z7A1XMX1RZYVcy5e81pHlhFKNEcLHIfZ9rKQEZCnD9sQOEpCc4AzpOjfGdcOih+BD
 SvGGYvhCUbphmc1bTvCbAYeJdJ3WOTFTxmGvnrfW8Pc/+IbL7ckRkuQAK215pyr15WEo
 Jhdy67IkC9mDnK/YbPXrsZxMVmTQiB3Rel+g+jez8CS1Z9bo6xizkW29pcRY23vahSsE
 qLqg==
X-Gm-Message-State: AJIora9fxWwiJlNacjs5oEC891ly2C5nzDCeE3FlPUUcVelXkNhFWQWc
 e9AOlsm2UlJsq6e74f7n2RFt7NWHe4zizCMsYA1Ll+HNGPkLMm+gWJiCHcMLjzaLcHjPiHKVwY/
 8ca7PkAE=
X-Received: by 2002:ad4:594f:0:b0:470:5b16:808 with SMTP id
 eo15-20020ad4594f000000b004705b160808mr6503146qvb.51.1655922190988; 
 Wed, 22 Jun 2022 11:23:10 -0700 (PDT)
X-Google-Smtp-Source: AGRyM1uBR1O3tYskS1CA3JmmVlzn0U9cOzFXEhb31PtIAZVwBTaZjg1NMolGL25YxZnDgQcb4RxzdQ==
X-Received: by 2002:ad4:594f:0:b0:470:5b16:808 with SMTP id
 eo15-20020ad4594f000000b004705b160808mr6503114qvb.51.1655922190435; 
 Wed, 22 Jun 2022 11:23:10 -0700 (PDT)
Received: from t14s.localdomain (c-73-69-212-193.hsd1.ma.comcast.net.
 [73.69.212.193]) by smtp.gmail.com with ESMTPSA id
 h3-20020a05622a170300b002f905347586sm16771470qtk.14.2022.06.22.11.23.09
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Wed, 22 Jun 2022 11:23:09 -0700 (PDT)
Message-ID: <15b109860a61776e87f83b0286558fcc9b7c4047.camel@redhat.com>
Subject: Re: [RFC] analyzer: allocation size warning
From: David Malcolm <dmalcolm@redhat.com>
To: Tim Lange <mail@tim-lange.me>
Cc: gcc@gcc.gnu.org
Date: Wed, 22 Jun 2022 14:23:08 -0400
In-Reply-To: <20220622145755.38734-1-mail@tim-lange.me>
References: <5e65c86d518c6e26a970b5fb23c2ed507abd2085.camel@redhat.com>
 <20220622145755.38734-1-mail@tim-lange.me>
User-Agent: Evolution 3.38.4 (3.38.4-1.fc33)
MIME-Version: 1.0
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, BODY_8BITS,
 DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF,
 GIT_PATCH_0, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE,
 WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc mailing list <gcc.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <mailto:gcc-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc>,
 <mailto:gcc-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 22 Jun 2022 18:23:20 -0000

On Wed, 2022-06-22 at 16:57 +0200, Tim Lange wrote:
> The checker reaches region-model.cc#3083 in my patch with the
>   impl_region_model_context
> on the 'after' node of create_buffer() but then discards the warning
> inside
> impl_region_model_context::warn because m_stmt is null. Even if m_stmt
> were
> not be NULL at the 'after' node, my warning would be emitted before the
> return edge was taken and thus be wrongly indented like shown below:
> /path/to/.c:10:16: warning: Allocated buffer size is not a multiple of
> the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
>    10 |     int *buf = create_buffer(42);
>       |                ^~~~~~~~~~~~~~~~~
>   ‘main’: events 1-2
>     |
>     |    9 |   int main (int argc, char **argv) {
>     |      |       ^~~~
>     |      |       |
>     |      |       (1) entry to ‘main’
>     |   10 |     int *buf = create_buffer(42);
>     |      |                ~~~~~~~~~~~~~~~~~
>     |      |                |
>     |      |                (2) calling ‘create_buffer’ from ‘main’
>     |
>     +--> ‘create_buffer’: events 3-4
>            |
>            |    4 |   void *create_buffer(int n)
>            |      |         ^~~~~~~~~~~~~
>            |      |         |
>            |      |         (3) entry to ‘create_buffer’
>            |    5 |   {
>            |    6 |     return malloc(n);
>            |      |            ~~~~~~~~~
>            |      |            |
>            |      |            (4) allocated 42 bytes here
>            |
>          ‘main’: event 5
>            |
>            |   10 |     int *buf = create_buffer(42);
>            |      |                ^~~~~~~~~~~~~~~~~
>            |      |                |
>            |      |                (5) Assigned to ‘int *’ here
>            |
>            
> The correct warning should be:
> /path/to/.c:10:16: warning: Allocated buffer size is not a multiple of
> the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
>    10 |     int *buf = create_buffer(42);
>       |                ^~~~~~~~~~~~~~~~~
>   ‘main’: events 1-2
>     |
>     |    9 |   int main (int argc, char **argv) {
>     |      |       ^~~~
>     |      |       |
>     |      |       (1) entry to ‘main’
>     |   10 |     int *buf = create_buffer(42);
>     |      |                ~~~~~~~~~~~~~~~~~
>     |      |                |
>     |      |                (2) calling ‘create_buffer’ from ‘main’
>     |
>     +------------> ‘create_buffer’: events 3-4
>                    |
>                    |    4 |   void *create_buffer(int n)
>                    |      |         ^~~~~~~~~~~~~
>                    |      |         |
>                    |      |         (3) entry to ‘create_buffer’
>                    |    5 |   {
>                    |    6 |     return malloc(n);
>                    |      |            ~~~~~~~~~
>                    |      |            |
>                    |      |            (4) allocated 42 bytes here
>                    |
> ‘main’: event 5 <--+
>    |
>    |   10 |     int *buf = create_buffer(42);
>    |      |                ^~~~~~~~~~~~~~~~~
>    |      |                |
>    |      |                (5) Assigned to ‘int *’ here
>    |
> For that, the return edge has to be visited to be part of the
> emission_path.
> This is currently not the case as the assignment of the <return_value>
> to
> the caller lhs is handled inside pop_frame, which is transitively
> called
> from program_state::on_edge of the 'after' node of the callee.
> I tried to defer the set_value(caller lhs, <return_value>) call to the
> 'before' node after the return edge but failed to do elegantly. My last
> try
> is in the patch commented out with // FIXME.
> My main problem is that I can not pop the frame and later get the
> return value easily. Deferring the whole pop_frame to the before node
> breaks the assumptions inside exploded_graph::get_or_create_node.
> 
> I don't know what's the best/elegant way of solving this. Is a solution
> to
> attach the return svalue to the return edge and then use it later in
> the
> PK_BEFORE_SUPERNODE?

The ctxt is created here:

#5  0x00000000012f5856 in ana::program_state::on_edge (this=this@entry=0x7fffffffc8c0, eg=..., enode=enode@entry=0x2d8d970, 
    succ=succ@entry=0x2d0e590, uncertainty=uncertainty@entry=0x7fffffffc990) at ../../src/gcc/analyzer/program-state.cc:1035
1035	  if (!m_region_model->maybe_update_for_edge (*succ,
(gdb) list
1030	  impl_region_model_context ctxt (eg, enode,
1031					  &enode->get_state (),
1032					  this,
1033					  uncertainty, NULL,
1034					  last_stmt);
1035	  if (!m_region_model->maybe_update_for_edge (*succ,
1036						      last_stmt,
1037						      &ctxt, NULL))

I tried another approach: to provide a custom stmt_finder for this
ctxt, which uses the "returning call" stmt for the destination
supernode:

diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc
index 7ad581c7fbd..11554de9484 100644
--- a/gcc/analyzer/program-state.cc
+++ b/gcc/analyzer/program-state.cc
@@ -996,6 +996,29 @@ program_state::get_current_function () const
   return m_region_model->get_current_function ();
 }
 
+// FIXME
+
+class returning_call_stmt_finder : public stmt_finder
+{
+public:
+  returning_call_stmt_finder (const superedge *succ): m_succ (succ) {}
+
+  stmt_finder *clone () const final override
+  {
+    return new returning_call_stmt_finder (m_succ);
+  }
+  const gimple *find_stmt (const exploded_path &) final override
+  {
+    if (m_succ->m_dest)
+      if (m_succ->m_dest->get_returning_call ())
+       return m_succ->m_dest->get_returning_call ();
+    return NULL;
+  }
+
+private:
+  const superedge *m_succ;
+};
+
 /* Determine if following edge SUCC from ENODE is valid within the graph EG
    and update this state accordingly in-place.
 
@@ -1018,6 +1041,8 @@ program_state::on_edge (exploded_graph &eg,
   const program_point &point = enode->get_point ();
   const gimple *last_stmt = point.get_supernode ()->get_last_stmt ();
 
+  returning_call_stmt_finder stmt_finder (succ);
+
   /* For conditionals and switch statements, add the
      relevant conditions (for the specific edge) to new_state;
      skip edges for which the resulting constraints
@@ -1031,7 +1056,7 @@ program_state::on_edge (exploded_graph &eg,
                                  &enode->get_state (),
                                  this,
                                  uncertainty, NULL,
-                                 last_stmt);
+                                 last_stmt, &stmt_finder);
   if (!m_region_model->maybe_update_for_edge (*succ,
                                              last_stmt,
                                              &ctxt, NULL))

Doing so leads to this output:

../../src/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c: In function ‘create_buffer’:
../../src/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c:15:14: warning: Allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   15 |   int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } } */
      |              ^~~~~~~~~~~~~~~~~
  ‘test_1’: events 1-2
    |
    |   12 | void test_1(void)
    |      |      ^~~~~~
    |      |      |
    |      |      (1) entry to ‘test_1’
    |......
    |   15 |   int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } } */
    |      |              ~~~~~~~~~~~~~~~~~
    |      |              |
    |      |              (2) calling ‘create_buffer’ from ‘test_1’
    |
    +--> ‘create_buffer’: events 3-4
           |
           |    7 | void *create_buffer(int n)
           |      |       ^~~~~~~~~~~~~
           |      |       |
           |      |       (3) entry to ‘create_buffer’
           |    8 | {
           |    9 |   return malloc(n);
           |      |          ~~~~~~~~~
           |      |          |
           |      |          (4) allocated (long unsigned int)42 here
           |
         ‘test_1’: event 5
           |
           |   15 |   int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } } */
           |      |              ^~~~~~~~~~~~~~~~~
           |      |              |
           |      |              (5) Assigned to ‘int *’ here
           |

which fixes the stmt and the enclosing function decl for event 5 (the
assignment to the "int *buf"), but annoyingly the stack depth
information is wrong; I think the saved diagnostic is being associated
with the existing exploded_node (in create_buffer), whereas I want it
to use the supernode for test_1, which doesn't yet have an exploded
node when pop_frame is called.  I have various ideas for tackling this:
- have two contexts for pop_frame: one in the old frame, the other in
the new frame (for the caller)
- generalize stmt_finder so it can also update the supernode to use
- rework pop_frame (I've had to do this before, I've run into issues
like this before).

I think it's best to keep this issue as an expected failure, and file a
bug about it, so that we can tackle it by itself, and not block you
from making further progress on this patch.

Various review comments inline below...


Signed-off-by: Tim Lange <mail@tim-lange.me>
---
 gcc/analyzer/analyzer.opt                     |   4 +
 gcc/analyzer/checker-path.cc                  |  12 +-
 gcc/analyzer/checker-path.h                   |   2 +-
 gcc/analyzer/engine.cc                        |  12 +
 gcc/analyzer/pending-diagnostic.h             |  21 ++
 gcc/analyzer/region-model.cc                  | 322 ++++++++++++++++++
 gcc/analyzer/region-model.h                   |   4 +
 .../gcc.dg/analyzer/allocation-size-1.c       |  63 ++++
 .../gcc.dg/analyzer/allocation-size-2.c       |  44 +++
 .../gcc.dg/analyzer/allocation-size-3.c       |  48 +++
 .../gcc.dg/analyzer/allocation-size-4.c       |  92 +++++
 gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c |   2 +
 gcc/testsuite/gcc.dg/analyzer/malloc-4.c      |   2 +-
 gcc/testsuite/gcc.dg/analyzer/pr96639.c       |   2 +
 14 files changed, 627 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 4aea52d3a87..f213989e0bb 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -78,6 +78,10 @@ Wanalyzer-malloc-leak
 Common Var(warn_analyzer_malloc_leak) Init(1) Warning
 Warn about code paths in which a heap-allocated pointer leaks.
 
+Wanalyzer-allocation-size
+Common Var(warn_analyzer_allocation_size) Init(1) Warning
+Warn about code paths in which a buffer is assigned to a incompatible
type.
+


Any time we add a new option to analyzer.opt we're going to need to add
corresponding documentation to gcc/doc/invoke.texi.  Grep for some of
the existing analyzer warnings to see examples.


 Wanalyzer-mismatching-deallocation
 Common Var(warn_analyzer_mismatching_deallocation) Init(1) Warning
 Warn about code paths in which the wrong deallocation function is
called.


[...snip...]

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-
model.cc
index 6b49719d521..acb8bd1bfca 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-operands.h"
 #include "ssa-iterators.h"
 #include "calls.h"
+#include "print-tree.h"
 
 #if ENABLE_ANALYZER
 
@@ -653,6 +654,66 @@ private:
   tree m_count_cst;
 };
 
+/* Concrete subclass for casts of pointers that lead to trailing
bytes.  */
+
+class dubious_allocation_size
+: public pending_diagnostic_subclass<dubious_allocation_size>
+{
+public:
+  dubious_allocation_size (const region *lhs, const region *rhs, 
+                           const svalue *capacity)
+  : m_lhs(lhs), m_rhs(rhs), m_capacity(capacity) {}
+
+  const char *get_kind () const final override 
+  { 
+    return "dubious_allocation_size"; 
+  }
+
+  bool operator== (const dubious_allocation_size &other) const
+  {
+    return m_lhs == other.m_lhs && m_rhs == other.m_rhs;;
+  }
+
+  int get_controlling_option () const final override
+  {
+    return OPT_Wanalyzer_allocation_size;
+  }
+
+  bool emit (rich_location *rich_loc) final override
+  {
+    diagnostic_metadata m;
+    m.add_cwe (131);
+    return warning_meta (rich_loc, m, get_controlling_option (),
+              "Allocated buffer size is not a multiple of the
pointee's size");

Style nit: our diagnostic messages don't start with a capital letter.

I think this would benefit from a note, via "inform", saying the
sizeof() the pointee; something like:

  bool warned = warning_meta (rich_loc, m, get_controlling_option (),
                              "allocated buffer size is not a"
                              " multiple of the pointee's size");
  if (warned)
    inform (rich_loc->get_location, "%<sizeof(%E)%> is %qE",
                                    etc, etc);
  return warned;

or somesuch.


+  }
+
+  label_text 
+  describe_region_creation_event (const evdesc::region_creation &ev)
final override
+  {
+    // TODO: better way to print the capacity
+    return ev.formatted_print ("allocated %s here", 

Maybe: "allocated here (%s bytes)" ?

+                                          m_capacity-
>get_desc(true).m_buffer);

Annoyingly, label_text doesn't have an automatically working
destructor, due to us (until recently) only being able to use C++98.  
So as written, this leaks memory.  Now that we can use C++11, maybe we
should fix label_text to have a dtor, move assignment, etc, but it's
probably simpler in the short term to simply fix the leak.

+  }
+
+  label_text describe_final_event (const evdesc::final_event &ev)
final override
+  {
+    return ev.formatted_print ("Assigned to %qT here", m_lhs->get_type
());

Style nit: make initial letter of message lower-case.

+  }
+
+  void mark_interesting_stuff (interesting_t *interest) final override
+  {
+    if (m_lhs)
+      interest->add_region_creation (m_lhs);
+    if (m_rhs)
+      interest->add_region_creation (m_rhs);
+  }
+
+private:
+  const region *m_lhs;
+  const region *m_rhs;
+  const svalue *m_capacity;
+};
+
 /* If ASSIGN is a stmt that can be modelled via
      set_value (lhs_reg, SVALUE, CTXT)
    for some SVALUE, get the SVALUE.
@@ -2799,6 +2860,241 @@ region_model::check_region_for_read (const
region *src_reg,
   check_region_access (src_reg, DIR_READ, ctxt);
 }
 
+/* Returns the trailing bytes on dubious allocation sizes.  */
+
+static unsigned HOST_WIDE_INT 
+capacity_compatible_with_type (tree cst, tree pointee_size_tree)
+{
+  unsigned HOST_WIDE_INT pointee_size = TREE_INT_CST_LOW
(pointee_size_tree);
+  if (pointee_size == 0)
+    return 0;
+  unsigned HOST_WIDE_INT alloc_size = TREE_INT_CST_LOW (cst);
+
+  return alloc_size % pointee_size;
+}
+
+/* Visits svalues and checks whether the 
+   size_cst is a operand of the svalue.  */
+
+class size_visitor : public visitor
+{
+public:
+  size_visitor(tree size_cst, const svalue *sval, constraint_manager
*cm) 
+  : m_size_cst(size_cst), m_sval(sval), m_cm(cm)
+  {
+    sval->accept(this);
+  }
+
+  bool get_result()
+  {
+    /* The result_set gradually builts from atomtic nodes upwards. If
a node is

Typo: atomtic -> atomic

+       in the result_set, itself or one/all of its children have an
operand that
+       is a multiple of the size_cst. If the root is inside, the given
sval 
+       is valid aka a multiple of the size_cst.*/
+    return result_set.contains(m_sval);
+  }
+
+  void 
+  visit_constant_svalue (const constant_svalue *sval) final override
+  {
+    unsigned HOST_WIDE_INT sval_int
+         = TREE_INT_CST_LOW (sval->get_constant ());
+    unsigned HOST_WIDE_INT size_cst_int = TREE_INT_CST_LOW
(m_size_cst);
+    if (size_cst_int == 0 || sval_int % size_cst_int == 0)
+      result_set.add (sval);
+  }
+
+  void 
+  visit_unknown_svalue (const unknown_svalue *sval ATTRIBUTE_UNUSED)
+    final override
+  {
+    result_set.add (sval);
+  }
+
+  void 
+  visit_poisoned_svalue (const poisoned_svalue *sval ATTRIBUTE_UNUSED)
+    final override
+  {
+    result_set.add (sval);
+  }
+  
+  void visit_unaryop_svalue (const unaryop_svalue *sval) 
+  {
+    const svalue *arg = sval->get_arg ();
+    arg->accept (this);
+    if (result_set.contains (arg))
+       result_set.add (sval);
+  }
+
+  void visit_binop_svalue (const binop_svalue *sval) final override
+  {
+    const svalue *arg0 = sval->get_arg0 ();
+    const svalue *arg1 = sval->get_arg1 ();
+
+    arg0->accept (this);
+    arg1->accept (this);
+    if (sval->get_op () == MULT_EXPR)
+      {
+       if (result_set.contains (arg0) || result_set.contains (arg1))
+         result_set.add (sval);
+      }
+    else
+      {
+       if (result_set.contains (arg0) && result_set.contains (arg1))
+         result_set.add (sval);
+      }
+  }
+
+  void visit_repeated_svalue (const repeated_svalue *sval) 
+  {
+    sval->get_inner_svalue ()->accept(this);
+    if (result_set.contains (sval->get_inner_svalue ()))
+      result_set.add (sval);
+  }
+
+  void visit_unmergeable_svalue (const unmergeable_svalue *sval) final
override
+  {
+    sval->get_arg ()->accept (this);
+    if (result_set.contains (sval->get_arg ()))
+      result_set.add (sval);
+  }
+
+  void visit_widening_svalue (const widening_svalue *sval) final
override
+  {
+    const svalue *base = sval->get_base_svalue ();
+    const svalue *iter = sval->get_iter_svalue ();
+
+    base->accept(this);
+    iter->accept(this);
+    if (result_set.contains (base) && result_set.contains (iter))
+      result_set.add (sval);
+  }
+
+  void visit_conjured_svalue (const conjured_svalue *sval
ATTRIBUTE_UNUSED) 
+    final override
+  {
+    if (m_cm->get_equiv_class_by_svalue (sval, NULL))
+      result_set.add (sval);
+  }
+
+  void visit_asm_output_svalue (const asm_output_svalue *sval
ATTRIBUTE_UNUSED) 
+    final override
+  {
+    // TODO: Should we do something else than assume it could be
correct
+    result_set.add (sval);

I think we just have to assume it is.

+  }
+
+  void visit_const_fn_result_svalue (const const_fn_result_svalue 
+                                     *sval ATTRIBUTE_UNUSED) final
override
+  {
+    // TODO: Should we do something else than assume it could be
correct
+    result_set.add (sval);

Probably here as well.

+  }
+
+private:
+  tree m_size_cst;
+  const svalue *m_sval;
+  constraint_manager *m_cm;
+  svalue_set result_set; /* Used as a mapping of svalue*->bool.  */
+};
+
+/* Returns true if there is a constant tree with 
+   the same constant value inside the sval.  */
+
+static bool
+const_operand_in_sval_p (tree type_size_cst, const svalue *sval,
+                         constraint_manager *cm)
+{
+  size_visitor v(type_size_cst, sval, cm);
+  // sval->accept(&v);
+  return v.get_result ();
+}
+
+/* Special handling for structs with "inheritance" or that hold an
unbounded 
+     type. Those will be skipped to prevent false positives.  */
+
+static bool
+struct_or_union_with_inheritance_p (tree maybe_struct)
+{
+  if (RECORD_OR_UNION_TYPE_P (maybe_struct))
+    {
+      tree iter = TYPE_FIELDS (maybe_struct);
+      if (iter == NULL_TREE)
+        return false;
+      if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (iter)))
+        return true;
+
+      tree last_field;
+      while (iter != NULL_TREE)
+        {
+          last_field = iter;
+          iter = DECL_CHAIN (iter);
+        }
+
+      if (last_field != NULL_TREE 
+          && COMPLETE_OR_UNBOUND_ARRAY_TYPE_P (TREE_TYPE
(last_field)))
+        return true;
+    }
+  return false;
+}
+
+void
+region_model::check_region_size (const region *lhs_reg, const svalue
*rhs_sval,
+                                                  region_model_context
*ctxt) const
+{

Please add a comment immediately before this function summarizing its
intent.

+  if (!ctxt)
+    return;
+  
+  const region_svalue *reg_sval = dyn_cast <const region_svalue *>
(rhs_sval);
+  if (!reg_sval)
+    return;
+
+  tree pointer_type = lhs_reg->get_type ();
+  if (pointer_type == NULL_TREE || !POINTER_TYPE_P (pointer_type))
+    return;
+
+  tree pointee_type = TREE_TYPE (pointer_type);
+  /* void * is always compatible and make sure that the pointee_type
actually
+     has a size, or else size_in_bytes might fail.  */
+  if (pointee_type == NULL_TREE || VOID_TYPE_P (pointee_type) 
+      || TYPE_SIZE_UNIT (pointee_type) == NULL_TREE)
+    return;
+  if (struct_or_union_with_inheritance_p (pointee_type))
+    return;
+
+  tree pointee_size_tree = size_in_bytes(pointee_type);
+  /* The size might be unknown e.g. being a array with n elements
+     or casting to char * never has any trailing bytes.  */
+  if (TREE_CODE (pointee_size_tree) != INTEGER_CST
+      || TREE_INT_CST_LOW (pointee_size_tree) == 1)
+    return;
+
+  const svalue *capacity = get_capacity (reg_sval->get_pointee ());
+  switch (capacity->get_kind ())
+    {
+    case svalue_kind::SK_CONSTANT:
+      {
+       const constant_svalue *cap_sval = capacity-
>dyn_cast_constant_svalue ();

You can use:
  as_a <const constant_svalue *> (capacity)
here since we're within the SK_CONSTANT case, avoiding an extra vfunc
call.


+       tree cap = cap_sval->get_constant ();
+       unsigned HOST_WIDE_INT size_diff
+         = capacity_compatible_with_type (cap, pointee_size_tree);
+       if (size_diff != 0)
+         {
+           ctxt->warn (new dubious_allocation_size (lhs_reg, reg_sval-
>get_pointee (), capacity));

Nit: some overlong lines here; please wrap to avoid going over 80
columns.

+         }
+      }
+      break;
+    default:
+      {
+       if (!const_operand_in_sval_p (pointee_size_tree, capacity,
m_constraints))
+         {
+           ctxt->warn (new dubious_allocation_size (lhs_reg, reg_sval-
>get_pointee (), capacity));

Another overlong line.

+         }
+      }
+      break;
+    }
+}
+
 /* Set the value of the region given by LHS_REG to the value given
    by RHS_SVAL.
    Use CTXT to report any warnings associated with writing to
LHS_REG.  */
@@ -2810,6 +3106,8 @@ region_model::set_value (const region *lhs_reg,
const svalue *rhs_sval,
   gcc_assert (lhs_reg);
   gcc_assert (rhs_sval);
 
+  check_region_size(lhs_reg, rhs_sval, ctxt);
+
   check_region_for_write (lhs_reg, ctxt);
 
   m_store.set_value (m_mgr->get_store_manager(), lhs_reg, rhs_sval,

[...snip...]

diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
new file mode 100644
index 00000000000..cb3df5516e7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
@@ -0,0 +1,63 @@
+#include <stdlib.h>
+
+/* Tests with constant buffer sizes.  */
+
+void test_1 (void)
+{
+  short *ptr = malloc (21 * sizeof(short));
+  free (ptr);
+}
+
+void test_2 (void)
+{
+  int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc2 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc2 } */
+  /* { dg-message "" "" { target *-*-* } malloc2 } */
+}
+
+void test_3 (void)
+{
+  void *ptr = malloc (21 * sizeof (short));
+  short *sptr = (short *)ptr;
+  free (sptr);
+}
+
+void test_4 (void)
+{
+  void *ptr = malloc (21 * sizeof (short)); /* { dg-message } */
+  int *iptr = (int *)ptr; /* { dg-line assign } */
+  free (iptr);
+
+  /* { dg-warning "" "" { target *-*-* } assign } */
+  /* { dg-message "" "" { target *-*-* } assign } */
+}
+
+struct s {
+  int i;
+};
+
+void test_5 (void)
+{
+  struct s *ptr = malloc (5 * sizeof (struct s));
+  free (ptr);
+}
+
+void test_6 (void)
+{
+  long *ptr = malloc (5 * sizeof (struct s));  /* { dg-line malloc6 }
*/
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc6 } */
+  /* { dg-message "" "" { target *-*-* } malloc6 } */
+}
+
+void test_7 (void)
+{
+  char buf[2];
+  int *ptr = (int *)buf; /* { dg-line malloc7 } */
+
+  /* { dg-warning "" "" { target *-*-* } malloc7 } */
+  /* { dg-message "" "" { target *-*-* } malloc7 } */
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
new file mode 100644
index 00000000000..a619a786a4e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
@@ -0,0 +1,44 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Tests with symbolic buffer sizes.  */
+
+void test_1 (void)
+{
+  int n;
+  scanf("%i", &n);
+  short *ptr = malloc (n * sizeof(short));
+  free (ptr);
+}
+
+void test_2 (void)
+{
+  int n;
+  scanf("%i", &n);
+  int *ptr = malloc (n * sizeof (short)); /* { dg-line malloc } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc } */
+  /* { dg-message "" "" { target *-*-* } malloc } */
+}
+
+void test_3 (void)
+{
+  int n;
+  scanf("%i", &n);
+  void *ptr = malloc (n * sizeof (short));
+  short *sptr = (short *)ptr;
+  free (sptr);
+}
+
+void test_4 (void)
+{
+  int n;
+  scanf("%i", &n);
+  void *ptr = malloc (n * sizeof (short)); /* { dg-message } */
+  int *iptr = (int *)ptr; /* { dg-line assign } */
+  free (iptr);
+
+  /* { dg-warning "" "" { target *-*-* } assign } */
+  /* { dg-message "" "" { target *-*-* } assign } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
new file mode 100644
index 00000000000..dafc0e73c63
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
@@ -0,0 +1,48 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* CWE-131 example 5 */
+void test_1(void) 
+{
+  int *id_sequence = (int *) malloc (3); /* { dg-line malloc1 } */
+  if (id_sequence == NULL) exit (1);
+
+  id_sequence[0] = 13579;
+  id_sequence[1] = 24680;
+  id_sequence[2] = 97531;
+
+  free (id_sequence);
+
+  /* { dg-warning "" "" { target *-*-* } malloc1 } */
+  /* { dg-message "" "" { target *-*-* } malloc1 } */
+}
+
+void test_2(void)
+{
+  int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc2 } */
+  /* { dg-message "" "" { target *-*-* } malloc2 } */
+}
+
+void test_3(void)
+{
+  int n;
+  scanf("%i", &n);
+  int *ptr = malloc (n + sizeof (int)); /* { dg-line malloc3 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc3 } */
+  /* { dg-message "" "" { target *-*-* } malloc3 } */
+}
+
+void test_4(void)
+{
+  int n;
+  scanf("%i", &n);
+  int m;
+  scanf("%i", &m);
+  int *ptr = malloc ((n + m) * sizeof (int));
+  free (ptr);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
new file mode 100644
index 00000000000..32e14bad6ec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
@@ -0,0 +1,92 @@
+#include <stddef.h>
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Flow warnings */
+
+void *create_buffer(int n)
+{
+  return malloc(n);
+}
+
+void test_1(void) 
+{
+  // FIXME
+  int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* }
} */
+  free (buf);
+}
+
+void test_2(void) 
+{
+  void *buf = create_buffer(42); /* { dg-message } */
+  int *ibuf = buf; /* { dg-line assign2 } */
+  free (ibuf);
+
+  /* { dg-warning "" "" { target *-*-* } assign2 } */
+  /* { dg-message "" "" { target *-*-* } assign2 } */
+}
+
+void test_3(void)
+{
+  void *buf = malloc(42); /* { dg-message } */
+  if (buf != NULL) /* { dg-message } */
+    {
+      int *ibuf = buf; /* { dg-line assign3 } */
+      free (ibuf);
+    }
+
+  /* { dg-warning "" "" { target *-*-* } assign3 } */
+  /* { dg-message "" "" { target *-*-* } assign3 } */
+}
+
+void test_4(void)
+{
+  int n;
+  scanf("%i", &n);
+
+  int size;
+  if (n == 0)
+    size = 1;
+  else if (n == 1)
+    size = 10;
+  else
+    size = 20;
+
+  int *buf = malloc(size); // Size should be 'unknown' at this point
+  free (buf);
+}
+
+void test_5(void)
+{
+  int n;
+  scanf("%i", &n);
+
+  int size;
+  if (n == 0)
+    size = 2;
+  else
+    size = 10;
+
+  short *buf = malloc(size); // Size should be widened to 2 and 10,
both fit
+  free (buf);
+}
+
+
+void test_6(void)
+{
+  int n;
+  scanf("%i", &n);
+
+  int size;
+  if (n == 0)
+    size = 1;
+  else
+    size = 10;
+
+  short *buf = malloc(size); /* { dg-line malloc6 } */
+  free (buf);
+  
+
+  /* { dg-warning "" "" { target *-*-* } malloc6 } */
+  /* { dg-message "" "" { target *-*-* } malloc6 } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
index bd28107d0d7..8fa6a6eb570 100644
--- a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
+++ b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
@@ -1,7 +1,9 @@
+/* { dg-additional-options -Wno-analyzer-allocation-size } */
 /* Adapted from gcc.dg/Wmismatched-dealloc.c.  */
 
 #define A(...) __attribute__ ((malloc (__VA_ARGS__)))
 
+struct FILE;
 typedef struct FILE   FILE;
 typedef __SIZE_TYPE__ size_t;
 

Hopefully this change isn't needed anymore.

diff --git a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
index 908bb28ee50..f9a73c79403 100644
--- a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
+++ b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-Wno-incompatible-pointer-types" } */
+/* { dg-additional-options "-Wno-incompatible-pointer-types -Wno-
analyzer-allocation-size" } */
 
 #include <stdlib.h>

Why is this change needed?  Is this another left-over change from
fixing that stray error?

 
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96639.c
b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
index 02ca3f084a2..6f365c3cb5d 100644
--- a/gcc/testsuite/gcc.dg/analyzer/pr96639.c
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options -Wno-analyzer-allocation-size } */
+
 void *calloc (__SIZE_TYPE__, __SIZE_TYPE__);


I added this testcase in 42c5ae5d7f0ad89b75d93c497fe44b6c66da7e76 to
fix a crash due to a NULL type.

Rather than add -Wno-analyzer-allocation-size, please fix the size
passed to the calloc call.

Thanks; hope this is constructive
Dave