public inbox for gcc@gcc.gnu.org
* [RFC] analyzer: allocation size warning
@ 2022-06-17 15:54 Tim Lange
  2022-06-17 17:15 ` Prathamesh Kulkarni
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Tim Lange @ 2022-06-17 15:54 UTC (permalink / raw)
  To: David Malcolm, GCC Mailing List

Hi everyone,

I'd like to add support for a new warning on dubious allocation sizes, 
tracked in PR105900 [0]. The new checker emits a warning when the 
allocation size is not a multiple of the pointee type's size. With the 
checker, the following mistakes are detected:
  int *arr = malloc(3); // forgot to multiply by sizeof
  arr[0] = ...;
  arr[1] = ...;
or
  int *buf = malloc (n + sizeof(int)); // probably should be * instead of +
Because it is implemented inside the analyzer, it also emits warnings 
when the buffer is first of type void* and later cast to something 
else. However, this also brings a limitation: the checker cannot 
distinguish 2 * sizeof(short) from sizeof(int), because sizeof is 
resolved and constants are folded by the time the analyzer runs. 
As a mitigation, I plan to implement a check in the frontend that emits 
a warning if sizeof(lhs pointee type) is not part of the malloc 
argument.
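
For comparison, here is a minimal sketch of an allocation that avoids both mistakes above. It uses the common `sizeof *ptr` idiom; the helper name alloc_int_array and the overflow check are illustrative assumptions and are not part of the patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the patch: allocate room for n ints.
   Returns NULL if n * sizeof(int) would overflow or if malloc fails. */
static int *alloc_int_array(size_t n)
{
    int *arr;

    /* sizeof does not evaluate its operand, so using *arr here is safe. */
    if (n > SIZE_MAX / sizeof *arr)
        return NULL;

    /* The size is a multiple of the pointee size by construction. */
    arr = malloc(n * sizeof *arr);
    return arr;
}
```

Because the size expression is derived from the pointer itself, it stays correct if the element type is changed later.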

I'm looking for initial feedback on the phrasing of the diagnostics as 
well as on the preliminary patch [1].

On constant buffer sizes, the warning looks like this:
warning: Allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
      | ^~~~~~~~~~~~~~~~~~~~~~~~~
  ‘test_2’: event 1
    |
    | 22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
    | | ^~~~~~~~~~~~~~~~~~~~~~~~~
    | | |
    | | (1) Casting a 14 byte buffer to ‘int *’ leaves 2 trailing bytes; either the allocated size is bogus or the type on the left-hand side is wrong
    |
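
The "2 trailing bytes" in the message above is the remainder of the buffer size divided by the pointee size. A minimal sketch of that arithmetic follows; the function name trailing_bytes is an illustrative assumption, and the code only mirrors the idea of the patch's capacity_compatible_with_type without using GCC's tree types.

```c
#include <stddef.h>

/* Hypothetical stand-alone version of the remainder check: how many bytes
   are left over when alloc_size bytes are viewed as an array of elements
   that are pointee_size bytes each. */
static size_t trailing_bytes(size_t alloc_size, size_t pointee_size)
{
    /* A zero-sized pointee can never leave trailing bytes. */
    if (pointee_size == 0)
        return 0;
    return alloc_size % pointee_size;
}
```

Assuming a 4-byte int, malloc (10 + sizeof(int)) requests 14 bytes, and trailing_bytes (14, 4) is 2, which matches the diagnostic.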

On symbolic buffer sizes:
warning: Allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 } */
      | ^~~~~~~~~~~~~~~~~~~~~~~~
  ‘test_3’: event 1
    |
    | 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 } */
    | | ^~~~~~~~~~~~~~~~~~~~~~~~
    | | |
    | | (1) Allocation is incompatible with ‘int *’; either the allocated size is bogus or the type on the left-hand side is wrong
    |
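
One way to picture why n + sizeof(int) is flagged while n * sizeof(int) is not is a toy expression tree. The sketch below is an assumption-laden simplification and not the patch's const_operand_in_sval_p: the struct expr type and provably_multiple are made up for illustration, and the real code works on the analyzer's svalues and is more lenient (for example, it accepts unknown values).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy expression tree; hypothetical and far simpler than the analyzer's
   svalue classes. */
enum expr_kind { EXPR_CONST, EXPR_SYMBOL, EXPR_ADD, EXPR_MUL };

struct expr {
    enum expr_kind kind;
    size_t value;                 /* used by EXPR_CONST only */
    const struct expr *lhs, *rhs; /* used by EXPR_ADD and EXPR_MUL */
};

/* Returns true if e can be shown to be a multiple of elem_size:
   a product needs one such factor, a sum needs every term to be one. */
static bool provably_multiple(const struct expr *e, size_t elem_size)
{
    if (elem_size == 0)
        return true;
    switch (e->kind) {
    case EXPR_CONST:
        return e->value % elem_size == 0;
    case EXPR_SYMBOL:
        /* Nothing is known about a bare symbol such as n. */
        return false;
    case EXPR_MUL:
        return provably_multiple(e->lhs, elem_size)
               || provably_multiple(e->rhs, elem_size);
    case EXPR_ADD:
        return provably_multiple(e->lhs, elem_size)
               && provably_multiple(e->rhs, elem_size);
    }
    return false;
}
```

With a 4-byte element, the tree for n + 4 is rejected because the term n is unknown, while n * 4 and (n + m) * 4 are accepted because one factor is a multiple of 4.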

And this is what a simple flow looks like:
warning: Allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   39 | int *iptr = (int *)ptr; /* { dg-line assign } */
      | ^~~~
  ‘test_4’: events 1-2
    |
    | 38 | void *ptr = malloc (n * sizeof (short)); /* { dg-message } */
    | | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    | | |
    | | (1) allocated here
    | 39 | int *iptr = (int *)ptr; /* { dg-line assign } */
    | | ~~~~
    | | |
    | | (2) ‘ptr’ is incompatible with ‘int *’; either the allocated size at (1) is bogus or the type on the left-hand side is wrong
    |

There are some things to discuss from my side:
* The tests with the "toy re-implementation of CPython's object 
model"[2] fail due to an extra warning being emitted. Because the 
analyzer can't know that the calculation actually results in a correct 
buffer size when viewed as a string_obj later on, it emits a warning, 
e.g. at line 61 in data-model-5.c. The only mitigation would be to 
disable the warning for structs entirely. So the question is: should we 
accept the noise in these cases, or disable the warning for structs 
entirely?
* I'm unable to emit a warning whenever the cast happens at an 
assignment with a call as the rhs, e.g. test_1 in allocation-size-4.c. 
This is because I'm unable to access a region_svalue for the returned 
value. Even in the new_program_state, the svalue of the lhs is still a 
conjured_svalue. Maybe David can point me to a place where I can access 
the return value's region_svalue, or do I have to adapt the engine?
* attr-malloc-6.c and pr96639.c both contained structs without a 
definition. Something in the analyzer must have triggered another 
warning about the usage of those structs without a definition. I 
changed them to have an empty definition, so that the additional 
warnings are gone. I think this shouldn't change the test cases, so is 
this change okay?
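
To illustrate the first point: a struct that is a header followed by a variable-length payload is legitimately allocated with a size that is not a multiple of the struct's size. The sketch below is an assumed example in that spirit; struct str_obj and str_obj_new are made-up names, not the code from data-model-5.c.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object layout: a fixed header followed by the payload. */
struct str_obj {
    size_t len;
    char data[]; /* flexible array member holding the string bytes */
};

/* Allocates header plus payload in one block; the total size is
   intentionally not a multiple of sizeof (struct str_obj). */
static struct str_obj *str_obj_new(const char *s)
{
    size_t len = strlen(s);
    struct str_obj *obj = malloc(sizeof *obj + len + 1);

    if (obj == NULL)
        return NULL;
    obj->len = len;
    memcpy(obj->data, s, len + 1); /* copies the terminating NUL too */
    return obj;
}
```

A size of the form sizeof(struct) + n is the same shape as the alloc_s function in capacity-1.c, for which the patch adds an expected warning.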

- Tim

[0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105900
[1] While all tests except the cpython ones work, I have yet to test it 
on large C projects
[2] FAIL: gcc.dg/analyzer/data-model-5.c (test for excess errors)
    FAIL: gcc.dg/analyzer/data-model-5b.c (test for excess errors)
    FAIL: gcc.dg/analyzer/data-model-5c.c (test for excess errors)
    FAIL: gcc.dg/analyzer/data-model-5d.c (test for excess errors)
    FAIL: gcc.dg/analyzer/first-field-2.c (test for excess errors)

-------

Subject: [PATCH] analyzer: add allocation size warning

This patch adds an allocation size checker to the analyzer.
The checker warns when the tracked buffer size is not a multiple of the 
size of the left-hand side's pointee type. This resolves PR analyzer/105900.

The patch is not yet fully tested.

gcc/analyzer/ChangeLog:

        * analyzer.opt: Add Wanalyzer-allocation-size.
        * sm-malloc.cc (class dubious_allocation_size): New
        pending_diagnostic subclass.
        (capacity_compatible_with_type): New.
        (const_operand_in_sval_p): New.
        (struct_or_union_with_inheritance_p): New.
        (check_capacity): New.
        (malloc_state_machine::on_stmt): Add calls to
        on_pointer_assignment.
        (malloc_state_machine::on_allocator_call): Add node to
        parameters and call to on_pointer_assignment.
        (malloc_state_machine::on_pointer_assignment): New.

gcc/testsuite/ChangeLog:

        * gcc.dg/analyzer/attr-malloc-6.c: Disabled
        Wanalyzer-allocation-size and added default implementation for FILE.
        * gcc.dg/analyzer/capacity-1.c: Added dg directives.
        * gcc.dg/analyzer/malloc-4.c: Disabled
        Wanalyzer-allocation-size.
        * gcc.dg/analyzer/pr96639.c: Disabled Wanalyzer-allocation-size
        and added default implementation for foo and bar.
        * gcc.dg/analyzer/allocation-size-1.c: New test.
        * gcc.dg/analyzer/allocation-size-2.c: New test.
        * gcc.dg/analyzer/allocation-size-3.c: New test.
        * gcc.dg/analyzer/allocation-size-4.c: New test.

Signed-off-by: Tim Lange <mail@tim-lange.me>
---
 gcc/analyzer/analyzer.opt | 4 +
 gcc/analyzer/sm-malloc.cc | 363 +++++++++++++++++-
 .../gcc.dg/analyzer/allocation-size-1.c | 54 +++
 .../gcc.dg/analyzer/allocation-size-2.c | 44 +++
 .../gcc.dg/analyzer/allocation-size-3.c | 48 +++
 .../gcc.dg/analyzer/allocation-size-4.c | 39 ++
 gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c | 2 +
 gcc/testsuite/gcc.dg/analyzer/capacity-1.c | 5 +-
 gcc/testsuite/gcc.dg/analyzer/malloc-4.c | 6 +-
 gcc/testsuite/gcc.dg/analyzer/pr96639.c | 2 +
 10 files changed, 559 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 4aea52d3a87..f213989e0bb 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -78,6 +78,10 @@ Wanalyzer-malloc-leak
 Common Var(warn_analyzer_malloc_leak) Init(1) Warning
 Warn about code paths in which a heap-allocated pointer leaks.

+Wanalyzer-allocation-size
+Common Var(warn_analyzer_allocation_size) Init(1) Warning
+Warn about code paths in which a buffer is assigned to an incompatible type.
+
 Wanalyzer-mismatching-deallocation
 Common Var(warn_analyzer_mismatching_deallocation) Init(1) Warning
 Warn about code paths in which the wrong deallocation function is called.
diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
index 3bd40425919..790c9f0e57d 100644
--- a/gcc/analyzer/sm-malloc.cc
+++ b/gcc/analyzer/sm-malloc.cc
@@ -46,6 +46,8 @@ along with GCC; see the file COPYING3. If not see
 #include "attribs.h"
 #include "analyzer/function-set.h"
 #include "analyzer/program-state.h"
+#include "print-tree.h"
+#include "gimple-pretty-print.h"

 #if ENABLE_ANALYZER

@@ -428,6 +430,7 @@ private:
   get_or_create_deallocator (tree deallocator_fndecl);

   void on_allocator_call (sm_context *sm_ctxt,
+ const supernode *node,
      const gcall *call,
      const deallocator_set *deallocators,
      bool returns_nonnull = false) const;
@@ -444,6 +447,16 @@ private:
   void on_realloc_call (sm_context *sm_ctxt,
    const supernode *node,
    const gcall *call) const;
+ void on_pointer_assignment(sm_context *sm_ctxt,
+ const supernode *node,
+ const gassign *assign_stmt,
+ tree lhs,
+ tree rhs) const;
+ void on_pointer_assignment(sm_context *sm_ctxt,
+ const supernode *node,
+ const gcall *call,
+ tree lhs,
+ tree rhs) const;
   void on_zero_assignment (sm_context *sm_ctxt,
       const gimple *stmt,
       tree lhs) const;
@@ -1432,6 +1445,117 @@ private:
   const char *m_funcname;
 };

+/* Concrete subclass for casts of pointers that lead to trailing bytes. */
+
+class dubious_allocation_size : public malloc_diagnostic
+{
+public:
+ dubious_allocation_size (const malloc_state_machine &sm, tree lhs, tree rhs,
+ tree size_tree, unsigned HOST_WIDE_INT size_diff)
+ : malloc_diagnostic(sm, rhs), m_type(dubious_allocation_type::CONSTANT_SIZE),
+ m_lhs(lhs), m_size_tree(size_tree), m_size_diff(size_diff)
+ {}
+
+ dubious_allocation_size (const malloc_state_machine &sm, tree lhs, tree rhs,
+ tree size_tree)
+ : malloc_diagnostic(sm, rhs), m_type(dubious_allocation_type::MISSING_OPERAND),
+ m_lhs(lhs), m_size_tree(size_tree), m_size_diff(0)
+ {}
+
+ const char *get_kind () const final override
+ {
+ return "dubious_allocation_size";
+ }
+
+ int get_controlling_option () const final override
+ {
+ return OPT_Wanalyzer_allocation_size;
+ }
+
+ bool subclass_equal_p (const pending_diagnostic &base_other) const
+ final override
+ {
+ const dubious_allocation_size &other = (const dubious_allocation_size &)base_other;
+ return malloc_diagnostic::subclass_equal_p(other)
+ && m_type == other.m_type
+ && same_tree_p (m_lhs, other.m_lhs)
+ && same_tree_p (m_size_tree, other.m_size_tree)
+ && m_size_diff == other.m_size_diff;
+ }
+
+ bool emit (rich_location *rich_loc) final override
+ {
+ diagnostic_metadata m;
+ m.add_cwe (131);
+ return warning_meta (rich_loc, m, get_controlling_option (),
+ "Allocated buffer size is not a multiple of the pointee's size");
+ }
+
+ label_text describe_state_change (const evdesc::state_change &change)
+ override
+ {
+ if (change.m_old_state == m_sm.get_start_state ()
+ && unchecked_p (change.m_new_state))
+ {
+ m_alloc_event = change.m_event_id;
+ if (m_type == dubious_allocation_type::CONSTANT_SIZE)
+ {
+ // TODO: verify that it's the allocation stmt, not a copy
+ return change.formatted_print ("%E bytes allocated here",
+ m_size_tree);
+ }
+ }
+ return malloc_diagnostic::describe_state_change (change);
+ }
+
+ label_text describe_final_event (const evdesc::final_event &ev) final override
+ {
+ if (m_type == dubious_allocation_type::CONSTANT_SIZE)
+ {
+ if (m_alloc_event.known_p ())
+ return ev.formatted_print (
+ "Casting %qE to %qT leaves %wu trailing bytes; either the"
+ " allocated size is bogus or the type on the left-hand side is"
+ " wrong",
+ m_arg, TREE_TYPE (m_lhs), m_size_diff);
+ else
+ return ev.formatted_print (
+ "Casting a %E byte buffer to %qT leaves %wu trailing bytes; either"
+ " the allocated size is bogus or the type on the left-hand side is"
+ " wrong",
+ m_size_tree, TREE_TYPE (m_lhs), m_size_diff);
+ }
+ else if (m_type == dubious_allocation_type::MISSING_OPERAND)
+ {
+ if (m_alloc_event.known_p ())
+ return ev.formatted_print (
+ "%qE is incompatible with %qT; either the allocated size at %@ is"
+ " bogus or the type on the left-hand side is wrong",
+ m_arg, TREE_TYPE (m_lhs), &m_alloc_event);
+ else
+ return ev.formatted_print (
+ "Allocation is incompatible with %qT; either the allocated size is"
+ " bogus or the type on the left-hand side is wrong",
+ TREE_TYPE (m_lhs));
+ }
+
+ gcc_unreachable ();
+ return label_text ();
+ }
+
+private:
+ enum dubious_allocation_type {
+ CONSTANT_SIZE,
+ MISSING_OPERAND
+ };
+
+ dubious_allocation_type m_type;
+ diagnostic_event_id_t m_alloc_event;
+ tree m_lhs;
+ tree m_size_tree;
+ unsigned HOST_WIDE_INT m_size_diff;
+};
+
 /* struct allocation_state : public state_machine::state. */

 /* Implementation of state_machine::state::dump_to_pp vfunc
@@ -1633,6 +1757,160 @@ known_allocator_p (const_tree fndecl, const gcall *call)
   return false;
 }

+/* Returns the trailing bytes on dubious allocation sizes. */
+
+static unsigned HOST_WIDE_INT
+capacity_compatible_with_type (tree cst, tree pointee_size_tree)
+{
+ unsigned HOST_WIDE_INT pointee_size = TREE_INT_CST_LOW (pointee_size_tree);
+ if (pointee_size == 0)
+ return 0;
+ unsigned HOST_WIDE_INT alloc_size = TREE_INT_CST_LOW (cst);
+
+ return alloc_size % pointee_size;
+}
+
+/* Returns true if there is a constant tree with
+ the same constant value inside the sval. */
+
+static bool
+const_operand_in_sval_p (const svalue *sval, tree size_cst)
+{
+ auto_vec<const svalue *> non_mult_expr;
+ auto_vec<const svalue *> worklist;
+ worklist.safe_push(sval);
+ while (!worklist.is_empty())
+ {
+ const svalue *curr = worklist.pop ();
+ curr = curr->unwrap_any_unmergeable ();
+
+ switch (curr->get_kind())
+ {
+ default:
+ break;
+ case svalue_kind::SK_CONSTANT:
+ {
+ const constant_svalue *cst_sval = curr->dyn_cast_constant_svalue ();
+ unsigned HOST_WIDE_INT sval_int
+ = TREE_INT_CST_LOW (cst_sval->get_constant ());
+ unsigned HOST_WIDE_INT size_cst_int = TREE_INT_CST_LOW (size_cst);
+ if (sval_int % size_cst_int == 0)
+ return true;
+ }
+ break;
+ case svalue_kind::SK_BINOP:
+ {
+ const binop_svalue *b_sval = curr->dyn_cast_binop_svalue ();
+ if (b_sval->get_op () == MULT_EXPR)
+ {
+ worklist.safe_push (b_sval->get_arg0 ());
+ worklist.safe_push (b_sval->get_arg1 ());
+ }
+ else
+ {
+ non_mult_expr.safe_push (b_sval->get_arg0 ());
+ non_mult_expr.safe_push (b_sval->get_arg1 ());
+ }
+ }
+ break;
+ case svalue_kind::SK_UNARYOP:
+ {
+ const unaryop_svalue *un_sval = curr->dyn_cast_unaryop_svalue ();
+ worklist.safe_push (un_sval->get_arg ());
+ }
+ break;
+ case svalue_kind::SK_UNKNOWN:
+ return true;
+ }
+ }
+
+ /* Each expr should be a multiple of the size.
+ E.g. used to catch n + sizeof(int) errors. */
+ bool reduce = !non_mult_expr.is_empty ();
+ while (!non_mult_expr.is_empty() && reduce)
+ {
+ const svalue *expr_sval = non_mult_expr.pop ();
+ reduce &= const_operand_in_sval_p (expr_sval, size_cst);
+ }
+ return reduce;
+}
+
+/* Returns true iff the type is a struct with another struct inside. */
+
+static bool
+struct_or_union_with_inheritance_p (tree type)
+{
+ if (!RECORD_OR_UNION_TYPE_P (type))
+ return false;
+
+ for (tree f = TYPE_FIELDS (type); f; f = TREE_CHAIN (f))
+ if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (f)))
+ return true;
+
+ return false;
+}
+
+static void
+check_capacity (sm_context *sm_ctxt,
+ const malloc_state_machine &sm,
+ const supernode *node,
+ const gimple *stmt,
+ tree lhs,
+ tree rhs,
+ const svalue *capacity)
+{
+ tree pointer_type = TREE_TYPE (lhs);
+ gcc_assert (TREE_CODE (pointer_type) == POINTER_TYPE);
+
+ tree pointee_type = TREE_TYPE (pointer_type);
+ /* void * is always compatible. */
+ if (TREE_CODE (pointee_type) == VOID_TYPE)
+ return;
+
+ if (struct_or_union_with_inheritance_p (pointee_type))
+ return;
+
+ tree pointee_size_tree = size_in_bytes(pointee_type);
+ /* The size might be unknown, e.g. for an array with n elements.
+ Also, casting to char * never leaves any trailing bytes. */
+ if (TREE_CODE (pointee_size_tree) != INTEGER_CST
+ || TREE_INT_CST_LOW (pointee_size_tree) == 1)
+ return;
+
+ switch (capacity->get_kind ())
+ {
+ default:
+ break;
+ case svalue_kind::SK_CONSTANT:
+ {
+ const constant_svalue *cst_sval = capacity->dyn_cast_constant_svalue ();
+ tree cst = cst_sval->get_constant ();
+ unsigned HOST_WIDE_INT size_diff
+ = capacity_compatible_with_type (cst, pointee_size_tree);
+ if (size_diff != 0)
+ {
+ tree diag_arg = sm_ctxt->get_diagnostic_tree (rhs);
+ sm_ctxt->warn (node, stmt, diag_arg,
+ new dubious_allocation_size (sm, lhs, diag_arg,
+ cst, size_diff));
+ }
+ }
+ break;
+ case svalue_kind::SK_BINOP:
+ case svalue_kind::SK_UNARYOP:
+ {
+ if (!const_operand_in_sval_p (capacity, pointee_size_tree))
+ {
+ tree diag_arg = sm_ctxt->get_diagnostic_tree (rhs);
+ sm_ctxt->warn (node, stmt, diag_arg,
+ new dubious_allocation_size (sm, lhs, diag_arg,
+ pointee_size_tree));
+ }
+ }
+ break;
+ }
+}
+
 /* Implementation of state_machine::on_stmt vfunc for malloc_state_machine. */

 bool
@@ -1645,14 +1923,14 @@ malloc_state_machine::on_stmt (sm_context *sm_ctxt,
       {
  if (known_allocator_p (callee_fndecl, call))
    {
- on_allocator_call (sm_ctxt, call, &m_free);
+ on_allocator_call (sm_ctxt, node, call, &m_free);
      return true;
    }

  if (is_named_call_p (callee_fndecl, "operator new", call, 1))
- on_allocator_call (sm_ctxt, call, &m_scalar_delete);
+ on_allocator_call (sm_ctxt, node, call, &m_scalar_delete);
  else if (is_named_call_p (callee_fndecl, "operator new []", call, 1))
- on_allocator_call (sm_ctxt, call, &m_vector_delete);
+ on_allocator_call (sm_ctxt, node, call, &m_vector_delete);
  else if (is_named_call_p (callee_fndecl, "operator delete", call, 1)
    || is_named_call_p (callee_fndecl, "operator delete", call, 2))
    {
@@ -1707,7 +1985,7 @@ malloc_state_machine::on_stmt (sm_context *sm_ctxt,
      tree attrs = TYPE_ATTRIBUTES (TREE_TYPE (callee_fndecl));
      bool returns_nonnull
        = lookup_attribute ("returns_nonnull", attrs);
- on_allocator_call (sm_ctxt, call, deallocators, returns_nonnull);
+ on_allocator_call (sm_ctxt, node, call, deallocators, returns_nonnull);
    }

  /* Handle "__attribute__((nonnull))". */
@@ -1763,12 +2041,31 @@ malloc_state_machine::on_stmt (sm_context *sm_ctxt,
        = mutable_this->get_or_create_deallocator (callee_fndecl);
      on_deallocator_call (sm_ctxt, node, call, d, dealloc_argno);
    }
+
+ /* Handle returns from function calls. */
+ tree lhs = gimple_call_lhs (call);
+ if (lhs && TREE_CODE (TREE_TYPE (lhs)) == POINTER_TYPE
+ && TREE_CODE (gimple_call_return_type (call)) == POINTER_TYPE)
+ on_pointer_assignment (sm_ctxt, node, call, lhs,
+ gimple_call_fn (call));
       }

   if (tree lhs = sm_ctxt->is_zero_assignment (stmt))
     if (any_pointer_p (lhs))
       on_zero_assignment (sm_ctxt, stmt,lhs);

+ /* Handle pointer assignments/casts for dubious allocation size. */
+ if (const gassign *assign_stmt = dyn_cast <const gassign *> (stmt))
+ {
+ if (gimple_num_ops (stmt) == 2)
+ {
+ tree lhs = gimple_assign_lhs (assign_stmt);
+ tree rhs = gimple_assign_rhs1 (assign_stmt);
+ if (any_pointer_p (lhs) && any_pointer_p (rhs))
+ on_pointer_assignment (sm_ctxt, node, assign_stmt, lhs, rhs);
+ }
+ }
+
   /* Handle dereferences. */
   for (unsigned i = 0; i < gimple_num_ops (stmt); i++)
     {
@@ -1818,6 +2115,7 @@ malloc_state_machine::on_stmt (sm_context *sm_ctxt,

 void
 malloc_state_machine::on_allocator_call (sm_context *sm_ctxt,
+ const supernode *node,
       const gcall *call,
       const deallocator_set *deallocators,
       bool returns_nonnull) const
@@ -1830,6 +2128,9 @@ malloc_state_machine::on_allocator_call (sm_context *sm_ctxt,
      (returns_nonnull
       ? deallocators->m_nonnull
       : deallocators->m_unchecked));
+
+ if (TREE_CODE (TREE_TYPE (lhs)) == POINTER_TYPE)
+ on_pointer_assignment (sm_ctxt, node, call, lhs, gimple_call_fn (call));
     }
   else
     {
@@ -1968,6 +2269,60 @@ malloc_state_machine::on_realloc_call (sm_context *sm_ctxt,
     }
 }

+/* Handle assignments between two pointers.
+ Check for dubious allocation sizes.
+*/
+
+void
+malloc_state_machine::on_pointer_assignment (sm_context *sm_ctxt,
+ const supernode *node,
+ const gassign *assign_stmt,
+ tree lhs,
+ tree rhs) const
+{
+ /* Do not warn if lhs and rhs are of the same type to not emit duplicate
+ warnings on assignments after the cast. */
+ if (pending_diagnostic::same_tree_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
+ return;
+
+ const program_state *state = sm_ctxt->get_old_program_state ();
+ const svalue *r_value = state->m_region_model->get_rvalue (rhs, NULL);
+ if (const region_svalue *reg = dyn_cast <const region_svalue *> (r_value))
+ {
+ const svalue *capacity = state->m_region_model->get_capacity
+ (reg->get_pointee ());
+ check_capacity(sm_ctxt, *this, node, assign_stmt, lhs, rhs, capacity);
+ }
+}
+
+void
+malloc_state_machine::on_pointer_assignment (sm_context *sm_ctxt,
+ const supernode *node,
+ const gcall *call,
+ tree lhs,
+ tree fn_decl) const
+{
+ /* Do not warn if lhs and rhs are of the same type to not emit duplicate
+ warnings on assignments after the cast. */
+ if (pending_diagnostic::same_tree_p
+ (TREE_TYPE (lhs), TREE_TYPE (gimple_call_return_type (call))))
+ return;
+
+ const program_state *state = sm_ctxt->get_new_program_state ();
+ const svalue *r_value = state->m_region_model->get_rvalue (lhs, NULL);
+ if (const region_svalue *reg = dyn_cast <const region_svalue *> (r_value))
+ {
+ const svalue *capacity = state->m_region_model->get_capacity
+ (reg->get_pointee ());
+ check_capacity (sm_ctxt, *this, node, call, lhs, fn_decl, capacity);
+ }
+ else if (const conjured_svalue *con
+ = dyn_cast <const conjured_svalue *> (r_value))
+ {
+ // FIXME: How to get a region_svalue?
+ }
+}
+
 /* Implementation of state_machine::on_phi vfunc for malloc_state_machine. */

 void
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
new file mode 100644
index 00000000000..5403c5f41f1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
@@ -0,0 +1,54 @@
+#include <stdlib.h>
+
+/* Tests with constant buffer sizes */
+
+void test_1 (void)
+{
+ short *ptr = malloc (21 * sizeof(short));
+ free (ptr);
+}
+
+void test_2 (void)
+{
+ int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc } */
+ free (ptr);
+
+ /* { dg-warning "Allocated buffer size is not a multiple of the pointee's size" "" { target *-*-* } malloc } */
+ /* { dg-message "\\(1\\) Casting a 42 byte buffer to 'int \\*' leaves 2 trailing bytes" "" { target *-*-* } malloc } */
+}
+
+void test_3 (void)
+{
+ void *ptr = malloc (21 * sizeof (short));
+ short *sptr = (short *)ptr;
+ free (sptr);
+}
+
+void test_4 (void)
+{
+ void *ptr = malloc (21 * sizeof (short)); /* { dg-message } */
+ int *iptr = (int *)ptr; /* { dg-line assign } */
+ free (iptr);
+
+ /* { dg-warning "Allocated buffer size is not a multiple of the pointee's size" "" { target *-*-* } assign } */
+ /* { dg-message "\\(2\\) Casting 'ptr' to 'int \\*' leaves 2 trailing bytes" "" { target *-*-* } assign } */
+}
+
+struct s {
+ int i;
+};
+
+void test_5 (void)
+{
+ struct s *ptr = malloc (5 * sizeof (struct s));
+ free (ptr);
+}
+
+void test_6 (void)
+{
+ long *ptr = malloc (5 * sizeof (struct s)); /* { dg-line malloc6 } */
+ free (ptr);
+
+ /* { dg-warning "" "" { target *-*-* } malloc6 } */
+ /* { dg-message "" "" { target *-*-* } malloc6 } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
new file mode 100644
index 00000000000..e66d2793f13
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
@@ -0,0 +1,44 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Tests with symbolic buffer sizes */
+
+void test_1 (void)
+{
+ int n;
+ scanf("%i", &n);
+ short *ptr = malloc (n * sizeof(short));
+ free (ptr);
+}
+
+void test_2 (void)
+{
+ int n;
+ scanf("%i", &n);
+ int *ptr = malloc (n * sizeof (short)); /* { dg-line malloc } */
+ free (ptr);
+
+ /* { dg-warning "Allocated buffer size is not a multiple of the pointee's size" "" { target *-*-* } malloc } */
+ /* { dg-message "\\(1\\) Allocation is incompatible with 'int \\*'" "" { target *-*-* } malloc } */
+}
+
+void test_3 (void)
+{
+ int n;
+ scanf("%i", &n);
+ void *ptr = malloc (n * sizeof (short));
+ short *sptr = (short *)ptr;
+ free (sptr);
+}
+
+void test_4 (void)
+{
+ int n;
+ scanf("%i", &n);
+ void *ptr = malloc (n * sizeof (short)); /* { dg-message } */
+ int *iptr = (int *)ptr; /* { dg-line assign } */
+ free (iptr);
+
+ /* { dg-warning "Allocated buffer size is not a multiple of the pointee's size" "" { target *-*-* } assign } */
+ /* { dg-message "\\(2\\) 'ptr' is incompatible with 'int \\*'; either the allocated size at \\(1\\)" "" { target *-*-* } assign } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
new file mode 100644
index 00000000000..dafc0e73c63
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
@@ -0,0 +1,48 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* CWE-131 example 5 */
+void test_1(void)
+{
+ int *id_sequence = (int *) malloc (3); /* { dg-line malloc1 } */
+ if (id_sequence == NULL) exit (1);
+
+ id_sequence[0] = 13579;
+ id_sequence[1] = 24680;
+ id_sequence[2] = 97531;
+
+ free (id_sequence);
+
+ /* { dg-warning "" "" { target *-*-* } malloc1 } */
+ /* { dg-message "" "" { target *-*-* } malloc1 } */
+}
+
+void test_2(void)
+{
+ int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
+ free (ptr);
+
+ /* { dg-warning "" "" { target *-*-* } malloc2 } */
+ /* { dg-message "" "" { target *-*-* } malloc2 } */
+}
+
+void test_3(void)
+{
+ int n;
+ scanf("%i", &n);
+ int *ptr = malloc (n + sizeof (int)); /* { dg-line malloc3 } */
+ free (ptr);
+
+ /* { dg-warning "" "" { target *-*-* } malloc3 } */
+ /* { dg-message "" "" { target *-*-* } malloc3 } */
+}
+
+void test_4(void)
+{
+ int n;
+ scanf("%i", &n);
+ int m;
+ scanf("%i", &m);
+ int *ptr = malloc ((n + m) * sizeof (int));
+ free (ptr);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
new file mode 100644
index 00000000000..4c2b31d6e0a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
@@ -0,0 +1,39 @@
+#include <stddef.h>
+#include <stdlib.h>
+
+/* Flow warnings */
+
+void *create_buffer(int n)
+{
+ return malloc(n);
+}
+
+void test_1(void)
+{
+ // FIXME
+ int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } } */
+ free (buf);
+}
+
+void test_2(void)
+{
+ void *buf = create_buffer(42); /* { dg-message } */
+ int *ibuf = buf; /* { dg-line assign2 } */
+ free (ibuf);
+
+ /* { dg-warning "" "" { target *-*-* } assign2 } */
+ /* { dg-message "" "" { target *-*-* } assign2 } */
+}
+
+void test_3(void)
+{
+ void *buf = malloc(42); /* { dg-message } */
+ if (buf != NULL) /* { dg-message } */
+ {
+ int *ibuf = buf; /* { dg-line assign3 } */
+ free (ibuf);
+ }
+
+ /* { dg-warning "" "" { target *-*-* } assign3 } */
+ /* { dg-message "" "" { target *-*-* } assign3 } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
index bd28107d0d7..809ee88cf07 100644
--- a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
+++ b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
@@ -1,7 +1,9 @@
+/* { dg-additional-options -Wno-analyzer-allocation-size } */
 /* Adapted from gcc.dg/Wmismatched-dealloc.c. */

 #define A(...) __attribute__ ((malloc (__VA_ARGS__)))

+struct FILE {};
 typedef struct FILE FILE;
 typedef __SIZE_TYPE__ size_t;

diff --git a/gcc/testsuite/gcc.dg/analyzer/capacity-1.c b/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
index 2d124833296..94f569e390b 100644
--- a/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
@@ -89,8 +89,11 @@ struct s
 static struct s * __attribute__((noinline))
 alloc_s (size_t num)
 {
- struct s *p = malloc (sizeof(struct s) + num);
+ struct s *p = malloc (sizeof(struct s) + num); /* { dg-line malloc } */
   return p;
+
+ /* { dg-warning "" "" { target *-*-* } malloc } */
+ /* { dg-message "" "" { target *-*-* } malloc } */
 }

 struct s *
diff --git a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
index 908bb28ee50..0ca94250ba2 100644
--- a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
+++ b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
@@ -1,9 +1,9 @@
-/* { dg-additional-options "-Wno-incompatible-pointer-types" } */
+/* { dg-additional-options "-Wno-incompatible-pointer-types -Wno-analyzer-allocation-size" } */

 #include <stdlib.h>

-struct foo;
-struct bar;
+struct foo {};
+struct bar {};
 void *hv (struct foo **tm)
 {
   void *p = __builtin_malloc (4);
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96639.c b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
index 02ca3f084a2..6f365c3cb5d 100644
--- a/gcc/testsuite/gcc.dg/analyzer/pr96639.c
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options -Wno-analyzer-allocation-size } */
+
 void *calloc (__SIZE_TYPE__, __SIZE_TYPE__);

 int
-- 
2.36.1





* Re: [RFC] analyzer: allocation size warning
  2022-06-17 15:54 [RFC] analyzer: allocation size warning Tim Lange
@ 2022-06-17 17:15 ` Prathamesh Kulkarni
  2022-06-17 19:23   ` Tim Lange
  2022-06-17 17:48 ` David Malcolm
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Prathamesh Kulkarni @ 2022-06-17 17:15 UTC (permalink / raw)
  To: Tim Lange; +Cc: David Malcolm, GCC Mailing List

On Fri, 17 Jun 2022 at 21:25, Tim Lange <mail@tim-lange.me> wrote:
>
> Hi everyone,
Hi Tim,
Thanks for posting the POC patch!
Just a couple of comments (inline)
>
> tracked in PR105900 [0], I'd like to add support for a new warning on
> dubious allocation sizes. The new checker emits a warning when the
> allocation size is not a multiple of the type's size. With the checker,
> following mistakes are detected:
>   int *arr = malloc(3); // forgot to multiply by sizeof
>   arr[0] = ...;
>   arr[1] = ...;
> or
>   int *buf = malloc (n + sizeof(int)); // probably should be * instead
> of +
> Because it is implemented inside the analyzer, it also emits warnings
> when the buffer is first of type void* and later on casted to something
> else. Though, this also inherits a limitation. The checker can not
> distinguish 2 * sizeof(short) from sizeof(int) because sizeof is
> resolved and constants are folded at the point when the analyzer runs.
> As a mitigation, I plan to implement a check in the frontend that emits
> a warning if sizeof(lhs pointee type) is not part of the malloc
> argument.
IMHO, warning if sizeof(lhs pointee_type) is not present inside
malloc might not be a good idea, because it would reject valid
calls to malloc.
For example:
(1)
size_t size = sizeof(int);
int *p = malloc (size);

(2)
void *p = malloc (sizeof(int));
int *q = p;
>
> I'm looking for a first feedback on the phrasing of the diagnostics as
> well on the preliminary patch [1].
>
> On constant buffer sizes, the warnings looks like this:
> warning: Allocated buffer size is not a multiple of the pointee's size
> [CWE-131] [-Wanalyzer-allocation-size]
>    22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
>       | ^~~~~~~~~~~~~~~~~~~~~~~~~
>   ‘test_2’: event 1
>     |
>     | 22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 }
> */
>     | | ^~~~~~~~~~~~~~~~~~~~~~~~~
>     | | |
>     | | (1) Casting a 14 byte buffer to ‘int *’ leaves 2 trailing
> bytes; either the allocated size is bogus or the type on the left-hand
> side is wrong
>     |
>
> On symbolic buffer sizes:
> warning: Allocated buffer size is not a multiple of the pointee's size
> [CWE-131] [-Wanalyzer-allocation-size]
>    33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 } */
>       | ^~~~~~~~~~~~~~~~~~~~~~~~
>   ‘test_3’: event 1
>     |
>     | 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 }
> */
>     | | ^~~~~~~~~~~~~~~~~~~~~~~~
>     | | |
>     | | (1) Allocation is incompatible with ‘int *’; either the
> allocated size is bogus or the type on the left-hand side is wrong
>     |
Won't the warning be incorrect if 'n' is a multiple of sizeof(int)?
I assume that with a symbolic buffer size, 'n' is not known at compile time.
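
To make the concern concrete, here is a minimal sketch of the arithmetic.
The helper names trailing_bytes and symbolic_alloc_size are made up for
illustration and are not part of the patch. Since sizeof (int) is itself
a multiple of sizeof (int), the remainder of n + sizeof (int) is the same
as the remainder of n, so whether any bytes are left over depends only on
the runtime value of n:

```c
#include <stddef.h>

/* Hypothetical helper, not part of the patch: number of bytes left over
   when a buffer of alloc_size bytes is viewed as an array of elements
   of elem_size bytes each.  */
size_t
trailing_bytes (size_t alloc_size, size_t elem_size)
{
  return elem_size ? alloc_size % elem_size : 0;
}

/* The size expression from the symbolic example: n + sizeof (int).  */
size_t
symbolic_alloc_size (size_t n)
{
  return n + sizeof (int);
}
```

With n = 3 * sizeof (int) the helper reports no trailing bytes, while
with the constant example from the original mail (14 bytes viewed as
4-byte elements) it reports 2.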

Thanks,
Prathamesh
>
> And this is how a simple flow looks like:
> warning: Allocated buffer size is not a multiple of the pointee's size
> [CWE-131] [-Wanalyzer-allocation-size]
>    39 | int *iptr = (int *)ptr; /* { dg-line assign } */
>       | ^~~~
>   ‘test_4’: events 1-2
>     |
>     | 38 | void *ptr = malloc (n * sizeof (short)); /* { dg-message } */
>     | | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
>     | | |
>     | | (1) allocated here
>     | 39 | int *iptr = (int *)ptr; /* { dg-line assign } */
>     | | ~~~~
>     | | |
>     | | (2) ‘ptr’ is incompatible with ‘int *’; either the
> allocated size at (1) is bogus or the type on the left-hand side is
> wrong
>     |
>
> There are some things to discuss from my side:
> * The tests with the "toy re-implementation of CPython's object
> model"[2] fail due to a extra warning emitted. Because the analyzer
> can't know the calculation actually results in a correct buffer size
> when viewed as a string_obj later on, it emits a warning, e.g. at line
> 61 in data-model-5.c. The only mitigation would be to disable the
> warning for structs entirely. Now, the question is to rather have noise
> on these cases or disable the warning for structs entirely?
> * I'm unable to emit a warning whenever the cast happens at an
> assignment with a call as the rhs, e.g. test_1 in allocation-size-4.c.
> This is because I'm unable to access a region_svalue for the returned
> value. Even in the new_program_state, the svalue of the lhs is still a
> conjured_svalue. Maybe David can lead me to a place where I can access
> the return value's region_svalue or do I have to adapt the engine?
> * attr-malloc-6.c and pr96639.c did both contain structs without an
> implementation. Something in the analyzer must have triggered another
> warning about the usage of those without them having an implementation.
> I changed those structs to have an empty implementation, such that the
> additional warning are gone. I think this shouldn't change the test
> case, so is this change okay?
>
> - Tim
>
> [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105900
> [1] While all tests except the cpython ones work, I have yet to test it
> on large C projects
> [2] FAIL: gcc.dg/analyzer/data-model-5.c (test for excess errors)
>     FAIL: gcc.dg/analyzer/data-model-5b.c (test for excess errors)
>     FAIL: gcc.dg/analyzer/data-model-5c.c (test for excess errors)
>     FAIL: gcc.dg/analyzer/data-model-5d.c (test for excess errors)
>     FAIL: gcc.dg/analyzer/first-field-2.c (test for excess errors)
>
> -------
>
> Subject: [PATCH] analyzer: add allocation size warning
>
> This patch adds an allocation size checker to the analyzer.
> The checker warns when the tracked buffer size is not a multiple of the
> left-hand side pointee's type. This resolves PR analyzer/105900.
>
> The patch is not yet fully tested.
>
> gcc/analyzer/ChangeLog:
>
>         * analyzer.opt: Add Wanalyzer-allocation-size.
>         * sm-malloc.cc (class dubious_allocation_size): New
> pending_diagnostic subclass.
>         (capacity_compatible_with_type): New.
>         (const_operand_in_sval_p): New.
>         (struct_or_union_with_inheritance_p): New.
>         (check_capacity): New.
>         (malloc_state_machine::on_stmt): Add calls to
> on_pointer_assignment.
>         (malloc_state_machine::on_allocator_call): Add node to
> parameters and call to on_pointer_assignment.
>         (malloc_state_machine::on_pointer_assignment): New.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/analyzer/attr-malloc-6.c: Disabled
> Wanalyzer-allocation-size and added default implementation for FILE.
>         * gcc.dg/analyzer/capacity-1.c: Added dg directives.
>         * gcc.dg/analyzer/malloc-4.c: Disabled
> Wanalyzer-allocation-size.
>         * gcc.dg/analyzer/pr96639.c: Disabled Wanalyzer-allocation-size
> and added default implementation for foo and bar.
>         * gcc.dg/analyzer/allocation-size-1.c: New test.
>         * gcc.dg/analyzer/allocation-size-2.c: New test.
>         * gcc.dg/analyzer/allocation-size-3.c: New test.
>         * gcc.dg/analyzer/allocation-size-4.c: New test.
>
> Signed-off-by: Tim Lange <mail@tim-lange.me>
> ---
>  gcc/analyzer/analyzer.opt | 4 +
>  gcc/analyzer/sm-malloc.cc | 363 +++++++++++++++++-
>  .../gcc.dg/analyzer/allocation-size-1.c | 54 +++
>  .../gcc.dg/analyzer/allocation-size-2.c | 44 +++
>  .../gcc.dg/analyzer/allocation-size-3.c | 48 +++
>  .../gcc.dg/analyzer/allocation-size-4.c | 39 ++
>  gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c | 2 +
>  gcc/testsuite/gcc.dg/analyzer/capacity-1.c | 5 +-
>  gcc/testsuite/gcc.dg/analyzer/malloc-4.c | 6 +-
>  gcc/testsuite/gcc.dg/analyzer/pr96639.c | 2 +
>  10 files changed, 559 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
>
> diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> index 4aea52d3a87..f213989e0bb 100644
> --- a/gcc/analyzer/analyzer.opt
> +++ b/gcc/analyzer/analyzer.opt
> @@ -78,6 +78,10 @@ Wanalyzer-malloc-leak
>  Common Var(warn_analyzer_malloc_leak) Init(1) Warning
>  Warn about code paths in which a heap-allocated pointer leaks.
>
> +Wanalyzer-allocation-size
> +Common Var(warn_analyzer_allocation_size) Init(1) Warning
> +Warn about code paths in which a buffer is assigned to a incompatible
> type.
> +
>  Wanalyzer-mismatching-deallocation
>  Common Var(warn_analyzer_mismatching_deallocation) Init(1) Warning
>  Warn about code paths in which the wrong deallocation function is
> called.
> diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
> index 3bd40425919..790c9f0e57d 100644
> --- a/gcc/analyzer/sm-malloc.cc
> +++ b/gcc/analyzer/sm-malloc.cc
> @@ -46,6 +46,8 @@ along with GCC; see the file COPYING3. If not see
>  #include "attribs.h"
>  #include "analyzer/function-set.h"
>  #include "analyzer/program-state.h"
> +#include "print-tree.h"
> +#include "gimple-pretty-print.h"
>
>  #if ENABLE_ANALYZER
>
> @@ -428,6 +430,7 @@ private:
>    get_or_create_deallocator (tree deallocator_fndecl);
>
>    void on_allocator_call (sm_context *sm_ctxt,
> + const supernode *node,
>       const gcall *call,
>       const deallocator_set *deallocators,
>       bool returns_nonnull = false) const;
> @@ -444,6 +447,16 @@ private:
>    void on_realloc_call (sm_context *sm_ctxt,
>     const supernode *node,
>     const gcall *call) const;
> + void on_pointer_assignment(sm_context *sm_ctxt,
> + const supernode *node,
> + const gassign *assign_stmt,
> + tree lhs,
> + tree rhs) const;
> + void on_pointer_assignment(sm_context *sm_ctxt,
> + const supernode *node,
> + const gcall *call,
> + tree lhs,
> + tree rhs) const;
>    void on_zero_assignment (sm_context *sm_ctxt,
>        const gimple *stmt,
>        tree lhs) const;
> @@ -1432,6 +1445,117 @@ private:
>    const char *m_funcname;
>  };
>
> +/* Concrete subclass for casts of pointers that lead to trailing
> bytes. */
> +
> +class dubious_allocation_size : public malloc_diagnostic
> +{
> +public:
> + dubious_allocation_size (const malloc_state_machine &sm, tree lhs,
> tree rhs,
> + tree size_tree, unsigned HOST_WIDE_INT size_diff)
> + : malloc_diagnostic(sm, rhs),
> m_type(dubious_allocation_type::CONSTANT_SIZE),
> + m_lhs(lhs), m_size_tree(size_tree), m_size_diff(size_diff)
> + {}
> +
> + dubious_allocation_size (const malloc_state_machine &sm, tree lhs,
> tree rhs,
> + tree size_tree)
> + : malloc_diagnostic(sm, rhs),
> m_type(dubious_allocation_type::MISSING_OPERAND),
> + m_lhs(lhs), m_size_tree(size_tree), m_size_diff(0)
> + {}
> +
> + const char *get_kind () const final override
> + {
> + return "dubious_allocation_size";
> + }
> +
> + int get_controlling_option () const final override
> + {
> + return OPT_Wanalyzer_allocation_size;
> + }
> +
> + bool subclass_equal_p (const pending_diagnostic &base_other) const
> + final override
> + {
> + const dubious_allocation_size &other = (const dubious_allocation_size
> &)base_other;
> + return malloc_diagnostic::subclass_equal_p(other)
> + && m_type == other.m_type
> + && same_tree_p (m_lhs, other.m_lhs)
> + && same_tree_p (m_size_tree, other.m_size_tree)
> + && m_size_diff == other.m_size_diff;
> + }
> +
> + bool emit (rich_location *rich_loc) final override
> + {
> + diagnostic_metadata m;
> + m.add_cwe (131);
> + return warning_meta (rich_loc, m, get_controlling_option (),
> + "Allocated buffer size is not a multiple of the pointee's size");
> + }
> +
> + label_text describe_state_change (const evdesc::state_change &change)
> + override
> + {
> + if (change.m_old_state == m_sm.get_start_state ()
> + && unchecked_p (change.m_new_state))
> + {
> + m_alloc_event = change.m_event_id;
> + if (m_type == dubious_allocation_type::CONSTANT_SIZE)
> + {
> + // TODO: verify that it's the allocation stmt, not a copy
> + return change.formatted_print ("%E bytes allocated here",
> + m_size_tree);
> + }
> + }
> + return malloc_diagnostic::describe_state_change (change);
> + }
> +
> + label_text describe_final_event (const evdesc::final_event &ev) final
> override
> + {
> + if (m_type == dubious_allocation_type::CONSTANT_SIZE)
> + {
> + if (m_alloc_event.known_p ())
> + return ev.formatted_print (
> + "Casting %qE to %qT leaves %wu trailing bytes; either the"
> + " allocated size is bogus or the type on the left-hand side is"
> + " wrong",
> + m_arg, TREE_TYPE (m_lhs), m_size_diff);
> + else
> + return ev.formatted_print (
> + "Casting a %E byte buffer to %qT leaves %wu trailing bytes; either"
> + " the allocated size is bogus or the type on the left-hand side is"
> + " wrong",
> + m_size_tree, TREE_TYPE (m_lhs), m_size_diff);
> + }
> + else if (m_type == dubious_allocation_type::MISSING_OPERAND)
> + {
> + if (m_alloc_event.known_p ())
> + return ev.formatted_print (
> + "%qE is incompatible with %qT; either the allocated size at %@ is"
> + " bogus or the type on the left-hand side is wrong",
> + m_arg, TREE_TYPE (m_lhs), &m_alloc_event);
> + else
> + return ev.formatted_print (
> + "Allocation is incompatible with %qT; either the allocated size is"
> + " bogus or the type on the left-hand side is wrong",
> + TREE_TYPE (m_lhs));
> + }
> +
> + gcc_unreachable ();
> + return label_text ();
> + }
> +
> +private:
> + enum dubious_allocation_type {
> + CONSTANT_SIZE,
> + MISSING_OPERAND
> + };
> +
> + dubious_allocation_type m_type;
> + diagnostic_event_id_t m_alloc_event;
> + tree m_lhs;
> + tree m_size_tree;
> + unsigned HOST_WIDE_INT m_size_diff;
> +};
> +
>  /* struct allocation_state : public state_machine::state. */
>
>  /* Implementation of state_machine::state::dump_to_pp vfunc
> @@ -1633,6 +1757,160 @@ known_allocator_p (const_tree fndecl, const
> gcall *call)
>    return false;
>  }
>
> +/* Returns the trailing bytes on dubious allocation sizes. */
> +
> +static unsigned HOST_WIDE_INT
> +capacity_compatible_with_type (tree cst, tree pointee_size_tree)
> +{
> + unsigned HOST_WIDE_INT pointee_size = TREE_INT_CST_LOW
> (pointee_size_tree);
> + if (pointee_size == 0)
> + return 0;
> + unsigned HOST_WIDE_INT alloc_size = TREE_INT_CST_LOW (cst);
> +
> + return alloc_size % pointee_size;
> +}
> +
> +/* Returns true if there is a constant tree with
> + the same constant value inside the sval. */
> +
> +static bool
> +const_operand_in_sval_p (const svalue *sval, tree size_cst)
> +{
> + auto_vec<const svalue *> non_mult_expr;
> + auto_vec<const svalue *> worklist;
> + worklist.safe_push(sval);
> + while (!worklist.is_empty())
> + {
> + const svalue *curr = worklist.pop ();
> + curr = curr->unwrap_any_unmergeable ();
> +
> + switch (curr->get_kind())
> + {
> + default:
> + break;
> + case svalue_kind::SK_CONSTANT:
> + {
> + const constant_svalue *cst_sval = curr->dyn_cast_constant_svalue ();
> + unsigned HOST_WIDE_INT sval_int
> + = TREE_INT_CST_LOW (cst_sval->get_constant ());
> + unsigned HOST_WIDE_INT size_cst_int = TREE_INT_CST_LOW (size_cst);
> + if (sval_int % size_cst_int == 0)
> + return true;
> + }
> + break;
> + case svalue_kind::SK_BINOP:
> + {
> + const binop_svalue *b_sval = curr->dyn_cast_binop_svalue ();
> + if (b_sval->get_op () == MULT_EXPR)
> + {
> + worklist.safe_push (b_sval->get_arg0 ());
> + worklist.safe_push (b_sval->get_arg1 ());
> + }
> + else
> + {
> + non_mult_expr.safe_push (b_sval->get_arg0 ());
> + non_mult_expr.safe_push (b_sval->get_arg1 ());
> + }
> + }
> + break;
> + case svalue_kind::SK_UNARYOP:
> + {
> + const unaryop_svalue *un_sval = curr->dyn_cast_unaryop_svalue ();
> + worklist.safe_push (un_sval->get_arg ());
> + }
> + break;
> + case svalue_kind::SK_UNKNOWN:
> + return true;
> + }
> + }
> +
> + /* Each expr should be a multiple of the size.
> + E.g. used to catch n + sizeof(int) errors. */
> + bool reduce = !non_mult_expr.is_empty ();
> + while (!non_mult_expr.is_empty() && reduce)
> + {
> + const svalue *expr_sval = non_mult_expr.pop ();
> + reduce &= const_operand_in_sval_p (expr_sval, size_cst);
> + }
> + return reduce;
> +}
> +
> +/* Returns true iff the type is a struct with another struct inside. */
> +
> +static bool
> +struct_or_union_with_inheritance_p (tree type)
> +{
> + if (!RECORD_OR_UNION_TYPE_P (type))
> + return false;
> +
> + for (tree f = TYPE_FIELDS (type); f; f = TREE_CHAIN (f))
> + if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (f)))
> + return true;
> +
> + return false;
> +}
> +
> +static void
> +check_capacity (sm_context *sm_ctxt,
> + const malloc_state_machine &sm,
> + const supernode *node,
> + const gimple *stmt,
> + tree lhs,
> + tree rhs,
> + const svalue *capacity)
> +{
> + tree pointer_type = TREE_TYPE (lhs);
> + gcc_assert (TREE_CODE (pointer_type) == POINTER_TYPE);
> +
> + tree pointee_type = TREE_TYPE (pointer_type);
> + /* void * is always compatible. */
> + if (TREE_CODE (pointee_type) == VOID_TYPE)
> + return;
> +
> + if (struct_or_union_with_inheritance_p (pointee_type))
> + return;
> +
> + tree pointee_size_tree = size_in_bytes(pointee_type);
> + /* The size might be unknown e.g. being a array with n elements
> + or casting to char * never has any trailing bytes. */
> + if (TREE_CODE (pointee_size_tree) != INTEGER_CST
> + || TREE_INT_CST_LOW (pointee_size_tree) == 1)
> + return;
> +
> + switch (capacity->get_kind ())
> + {
> + default:
> + break;
> + case svalue_kind::SK_CONSTANT:
> + {
> + const constant_svalue *cst_sval = capacity->dyn_cast_constant_svalue
> ();
> + tree cst = cst_sval->get_constant ();
> + unsigned HOST_WIDE_INT size_diff
> + = capacity_compatible_with_type (cst, pointee_size_tree);
> + if (size_diff != 0)
> + {
> + tree diag_arg = sm_ctxt->get_diagnostic_tree (rhs);
> + sm_ctxt->warn (node, stmt, diag_arg,
> + new dubious_allocation_size (sm, lhs, diag_arg,
> + cst, size_diff));
> + }
> + }
> + break;
> + case svalue_kind::SK_BINOP:
> + case svalue_kind::SK_UNARYOP:
> + {
> + if (!const_operand_in_sval_p (capacity, pointee_size_tree))
> + {
> + tree diag_arg = sm_ctxt->get_diagnostic_tree (rhs);
> + sm_ctxt->warn (node, stmt, diag_arg,
> + new dubious_allocation_size (sm, lhs, diag_arg,
> + pointee_size_tree));
> + }
> + }
> + break;
> + }
> +}
> +
>  /* Implementation of state_machine::on_stmt vfunc for
> malloc_state_machine. */
>
>  bool
> @@ -1645,14 +1923,14 @@ malloc_state_machine::on_stmt (sm_context
> *sm_ctxt,
>        {
>   if (known_allocator_p (callee_fndecl, call))
>     {
> - on_allocator_call (sm_ctxt, call, &m_free);
> + on_allocator_call (sm_ctxt, node, call, &m_free);
>       return true;
>     }
>
>   if (is_named_call_p (callee_fndecl, "operator new", call, 1))
> - on_allocator_call (sm_ctxt, call, &m_scalar_delete);
> + on_allocator_call (sm_ctxt, node, call, &m_scalar_delete);
>   else if (is_named_call_p (callee_fndecl, "operator new []", call, 1))
> - on_allocator_call (sm_ctxt, call, &m_vector_delete);
> + on_allocator_call (sm_ctxt, node, call, &m_vector_delete);
>   else if (is_named_call_p (callee_fndecl, "operator delete", call, 1)
>     || is_named_call_p (callee_fndecl, "operator delete", call, 2))
>     {
> @@ -1707,7 +1985,7 @@ malloc_state_machine::on_stmt (sm_context
> *sm_ctxt,
>       tree attrs = TYPE_ATTRIBUTES (TREE_TYPE (callee_fndecl));
>       bool returns_nonnull
>         = lookup_attribute ("returns_nonnull", attrs);
> - on_allocator_call (sm_ctxt, call, deallocators, returns_nonnull);
> + on_allocator_call (sm_ctxt, node, call, deallocators,
> returns_nonnull);
>     }
>
>   /* Handle "__attribute__((nonnull))". */
> @@ -1763,12 +2041,31 @@ malloc_state_machine::on_stmt (sm_context
> *sm_ctxt,
>         = mutable_this->get_or_create_deallocator (callee_fndecl);
>       on_deallocator_call (sm_ctxt, node, call, d, dealloc_argno);
>     }
> +
> + /* Handle returns from function calls. */
> + tree lhs = gimple_call_lhs (call);
> + if (lhs && TREE_CODE (TREE_TYPE (lhs)) == POINTER_TYPE
> + && TREE_CODE (gimple_call_return_type (call)) == POINTER_TYPE)
> + on_pointer_assignment (sm_ctxt, node, call, lhs,
> + gimple_call_fn (call));
>        }
>
>    if (tree lhs = sm_ctxt->is_zero_assignment (stmt))
>      if (any_pointer_p (lhs))
>        on_zero_assignment (sm_ctxt, stmt,lhs);
>
> + /* Handle pointer assignments/casts for dubious allocation size. */
> + if (const gassign *assign_stmt = dyn_cast <const gassign *> (stmt))
> + {
> + if (gimple_num_ops (stmt) == 2)
> + {
> + tree lhs = gimple_assign_lhs (assign_stmt);
> + tree rhs = gimple_assign_rhs1 (assign_stmt);
> + if (any_pointer_p (lhs) && any_pointer_p (rhs))
> + on_pointer_assignment (sm_ctxt, node, assign_stmt, lhs, rhs);
> + }
> + }
> +
>    /* Handle dereferences. */
>    for (unsigned i = 0; i < gimple_num_ops (stmt); i++)
>      {
> @@ -1818,6 +2115,7 @@ malloc_state_machine::on_stmt (sm_context
> *sm_ctxt,
>
>  void
>  malloc_state_machine::on_allocator_call (sm_context *sm_ctxt,
> + const supernode *node,
>        const gcall *call,
>        const deallocator_set *deallocators,
>        bool returns_nonnull) const
> @@ -1830,6 +2128,9 @@ malloc_state_machine::on_allocator_call
> (sm_context *sm_ctxt,
>       (returns_nonnull
>        ? deallocators->m_nonnull
>        : deallocators->m_unchecked));
> +
> + if (TREE_CODE (TREE_TYPE (lhs)) == POINTER_TYPE)
> + on_pointer_assignment (sm_ctxt, node, call, lhs, gimple_call_fn
> (call));
>      }
>    else
>      {
> @@ -1968,6 +2269,60 @@ malloc_state_machine::on_realloc_call
> (sm_context *sm_ctxt,
>      }
>  }
>
> +/* Handle assignments between two pointers.
> + Check for dubious allocation sizes.
> +*/
> +
> +void
> +malloc_state_machine::on_pointer_assignment (sm_context *sm_ctxt,
> + const supernode *node,
> + const gassign *assign_stmt,
> + tree lhs,
> + tree rhs) const
> +{
> + /* Do not warn if lhs and rhs are of the same type to not emit
> duplicate
> + warnings on assignments after the cast. */
> + if (pending_diagnostic::same_tree_p (TREE_TYPE (lhs), TREE_TYPE
> (rhs)))
> + return;
> +
> + const program_state *state = sm_ctxt->get_old_program_state ();
> + const svalue *r_value = state->m_region_model->get_rvalue (rhs, NULL);
> + if (const region_svalue *reg = dyn_cast <const region_svalue *>
> (r_value))
> + {
> + const svalue *capacity = state->m_region_model->get_capacity
> + (reg->get_pointee ());
> + check_capacity(sm_ctxt, *this, node, assign_stmt, lhs, rhs, capacity);
> + }
> +}
> +
> +void
> +malloc_state_machine::on_pointer_assignment (sm_context *sm_ctxt,
> + const supernode *node,
> + const gcall *call,
> + tree lhs,
> + tree fn_decl) const
> +{
> + /* Do not warn if lhs and rhs are of the same type to not emit
> duplicate
> + warnings on assignments after the cast. */
> + if (pending_diagnostic::same_tree_p
> + (TREE_TYPE (lhs), TREE_TYPE (gimple_call_return_type (call))))
> + return;
> +
> + const program_state *state = sm_ctxt->get_new_program_state ();
> + const svalue *r_value = state->m_region_model->get_rvalue (lhs, NULL);
> + if (const region_svalue *reg = dyn_cast <const region_svalue *>
> (r_value))
> + {
> + const svalue *capacity = state->m_region_model->get_capacity
> + (reg->get_pointee ());
> + check_capacity (sm_ctxt, *this, node, call, lhs, fn_decl, capacity);
> + }
> + else if (const conjured_svalue *con
> + = dyn_cast <const conjured_svalue *> (r_value))
> + {
> + // FIXME: How to get a region_svalue?
> + }
> +}
> +
>  /* Implementation of state_machine::on_phi vfunc for
> malloc_state_machine. */
>
>  void
> diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
> b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
> new file mode 100644
> index 00000000000..5403c5f41f1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
> @@ -0,0 +1,54 @@
> +#include <stdlib.h>
> +
> +/* Tests with constant buffer sizes */
> +
> +void test_1 (void)
> +{
> + short *ptr = malloc (21 * sizeof(short));
> + free (ptr);
> +}
> +
> +void test_2 (void)
> +{
> + int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc } */
> + free (ptr);
> +
> + /* { dg-warning "Allocated buffer size is not a multiple of the
> pointee's size" "" { target *-*-* } malloc } */
> + /* { dg-message "\\(1\\) Casting a 42 byte buffer to 'int \\*' leaves
> 2 trailing bytes" "" { target *-*-* } malloc } */
> +}
> +
> +void test_3 (void)
> +{
> + void *ptr = malloc (21 * sizeof (short));
> + short *sptr = (short *)ptr;
> + free (sptr);
> +}
> +
> +void test_4 (void)
> +{
> + void *ptr = malloc (21 * sizeof (short)); /* { dg-message } */
> + int *iptr = (int *)ptr; /* { dg-line assign } */
> + free (iptr);
> +
> + /* { dg-warning "Allocated buffer size is not a multiple of the
> pointee's size" "" { target *-*-* } assign } */
> + /* { dg-message "\\(2\\) Casting 'ptr' to 'int \\*' leaves 2 trailing
> bytes" "" { target *-*-* } assign } */
> +}
> +
> +struct s {
> + int i;
> +};
> +
> +void test_5 (void)
> +{
> + struct s *ptr = malloc (5 * sizeof (struct s));
> + free (ptr);
> +}
> +
> +void test_6 (void)
> +{
> + long *ptr = malloc (5 * sizeof (struct s)); /* { dg-line malloc6 } */
> + free (ptr);
> +
> + /* { dg-warning "" "" { target *-*-* } malloc6 } */
> + /* { dg-message "" "" { target *-*-* } malloc6 } */
> +}
> diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
> b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
> new file mode 100644
> index 00000000000..e66d2793f13
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
> @@ -0,0 +1,44 @@
> +#include <stdlib.h>
> +#include <stdio.h>
> +
> +/* Tests with symbolic buffer sizes */
> +
> +void test_1 (void)
> +{
> + int n;
> + scanf("%i", &n);
> + short *ptr = malloc (n * sizeof(short));
> + free (ptr);
> +}
> +
> +void test_2 (void)
> +{
> + int n;
> + scanf("%i", &n);
> + int *ptr = malloc (n * sizeof (short)); /* { dg-line malloc } */
> + free (ptr);
> +
> + /* { dg-warning "Allocated buffer size is not a multiple of the
> pointee's size" "" { target *-*-* } malloc } */
> + /* { dg-message "\\(1\\) Allocation is incompatible with 'int \\*'"
> "" { target *-*-* } malloc } */
> +}
> +
> +void test_3 (void)
> +{
> + int n;
> + scanf("%i", &n);
> + void *ptr = malloc (n * sizeof (short));
> + short *sptr = (short *)ptr;
> + free (sptr);
> +}
> +
> +void test_4 (void)
> +{
> + int n;
> + scanf("%i", &n);
> + void *ptr = malloc (n * sizeof (short)); /* { dg-message } */
> + int *iptr = (int *)ptr; /* { dg-line assign } */
> + free (iptr);
> +
> + /* { dg-warning "Allocated buffer size is not a multiple of the
> pointee's size" "" { target *-*-* } assign } */
> + /* { dg-message "\\(2\\) 'ptr' is incompatible with 'int \\*'; either
> the allocated size at \\(1\\)" "" { target *-*-* } assign } */
> +}
> diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
> b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
> new file mode 100644
> index 00000000000..dafc0e73c63
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
> @@ -0,0 +1,48 @@
> +#include <stdlib.h>
> +#include <stdio.h>
> +
> +/* CWE-131 example 5 */
> +void test_1(void)
> +{
> + int *id_sequence = (int *) malloc (3); /* { dg-line malloc1 } */
> + if (id_sequence == NULL) exit (1);
> +
> + id_sequence[0] = 13579;
> + id_sequence[1] = 24680;
> + id_sequence[2] = 97531;
> +
> + free (id_sequence);
> +
> + /* { dg-warning "" "" { target *-*-* } malloc1 } */
> + /* { dg-message "" "" { target *-*-* } malloc1 } */
> +}
> +
> +void test_2(void)
> +{
> + int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
> + free (ptr);
> +
> + /* { dg-warning "" "" { target *-*-* } malloc2 } */
> + /* { dg-message "" "" { target *-*-* } malloc2 } */
> +}
> +
> +void test_3(void)
> +{
> + int n;
> + scanf("%i", &n);
> + int *ptr = malloc (n + sizeof (int)); /* { dg-line malloc3 } */
> + free (ptr);
> +
> + /* { dg-warning "" "" { target *-*-* } malloc3 } */
> + /* { dg-message "" "" { target *-*-* } malloc3 } */
> +}
> +
> +void test_4(void)
> +{
> + int n;
> + scanf("%i", &n);
> + int m;
> + scanf("%i", &m);
> + int *ptr = malloc ((n + m) * sizeof (int));
> + free (ptr);
> +}
> diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
> b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
> new file mode 100644
> index 00000000000..4c2b31d6e0a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
> @@ -0,0 +1,39 @@
> +#include <stddef.h>
> +#include <stdlib.h>
> +
> +/* Flow warnings */
> +
> +void *create_buffer(int n)
> +{
> + return malloc(n);
> +}
> +
> +void test_1(void)
> +{
> + // FIXME
> + int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } }
> */
> + free (buf);
> +}
> +
> +void test_2(void)
> +{
> + void *buf = create_buffer(42); /* { dg-message } */
> + int *ibuf = buf; /* { dg-line assign2 } */
> + free (ibuf);
> +
> + /* { dg-warning "" "" { target *-*-* } assign2 } */
> + /* { dg-message "" "" { target *-*-* } assign2 } */
> +}
> +
> +void test_3(void)
> +{
> + void *buf = malloc(42); /* { dg-message } */
> + if (buf != NULL) /* { dg-message } */
> + {
> + int *ibuf = buf; /* { dg-line assign3 } */
> + free (ibuf);
> + }
> +
> + /* { dg-warning "" "" { target *-*-* } assign3 } */
> + /* { dg-message "" "" { target *-*-* } assign3 } */
> +}
> diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
> b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
> index bd28107d0d7..809ee88cf07 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
> @@ -1,7 +1,9 @@
> +/* { dg-additional-options -Wno-analyzer-allocation-size } */
>  /* Adapted from gcc.dg/Wmismatched-dealloc.c. */
>
>  #define A(...) __attribute__ ((malloc (__VA_ARGS__)))
>
> +struct FILE {};
>  typedef struct FILE FILE;
>  typedef __SIZE_TYPE__ size_t;
>
> diff --git a/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
> b/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
> index 2d124833296..94f569e390b 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
> @@ -89,8 +89,11 @@ struct s
>  static struct s * __attribute__((noinline))
>  alloc_s (size_t num)
>  {
> - struct s *p = malloc (sizeof(struct s) + num);
> + struct s *p = malloc (sizeof(struct s) + num); /* { dg-line malloc }
> */
>    return p;
> +
> + /* { dg-warning "" "" { target *-*-* } malloc } */
> + /* { dg-message "" "" { target *-*-* } malloc } */
>  }
>
>  struct s *
> diff --git a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
> b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
> index 908bb28ee50..0ca94250ba2 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
> @@ -1,9 +1,9 @@
> -/* { dg-additional-options "-Wno-incompatible-pointer-types" } */
> +/* { dg-additional-options "-Wno-incompatible-pointer-types
> -Wno-analyzer-allocation-size" } */
>
>  #include <stdlib.h>
>
> -struct foo;
> -struct bar;
> +struct foo {};
> +struct bar {};
>  void *hv (struct foo **tm)
>  {
>    void *p = __builtin_malloc (4);
> diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96639.c
> b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
> index 02ca3f084a2..6f365c3cb5d 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/pr96639.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
> @@ -1,3 +1,5 @@
> +/* { dg-additional-options -Wno-analyzer-allocation-size } */
> +
>  void *calloc (__SIZE_TYPE__, __SIZE_TYPE__);
>
>  int
> --
> 2.36.1
>
>
>


* Re: [RFC] analyzer: allocation size warning
  2022-06-17 15:54 [RFC] analyzer: allocation size warning Tim Lange
  2022-06-17 17:15 ` Prathamesh Kulkarni
@ 2022-06-17 17:48 ` David Malcolm
  2022-06-17 20:23   ` Tim Lange
  2022-06-17 18:34 ` [RFC] analyzer: add " Tim Lange
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: David Malcolm @ 2022-06-17 17:48 UTC (permalink / raw)
  To: Tim Lange, GCC Mailing List

On Fri, 2022-06-17 at 17:54 +0200, Tim Lange wrote:
> Hi everyone,

Hi Tim.

Thanks for the patch.

Various comments inline below, throughout...

> 
> tracked in PR105900 [0], I'd like to add support for a new warning on
> dubious allocation sizes. The new checker emits a warning when the 
> allocation size is not a multiple of the type's size. With the checker,
> following mistakes are detected:
>   int *arr = malloc(3); // forgot to multiply by sizeof
>   arr[0] = ...;
>   arr[1] = ...;
> or
>   int *buf = malloc (n + sizeof(int)); // probably should be * instead 
> of +
> Because it is implemented inside the analyzer, it also emits warnings
> when the buffer is first of type void* and later on casted to something
> else. Though, this also inherits a limitation. The checker can not 
> distinguish 2 * sizeof(short) from sizeof(int) because sizeof is 
> resolved and constants are folded at the point when the analyzer runs. 
> As a mitigation, I plan to implement a check in the frontend that emits
> a warning if sizeof(lhs pointee type) is not part of the malloc 
> argument.
> 
> I'm looking for a first feedback on the phrasing of the diagnostics as 
> well on the preliminary patch [1].
> 
> On constant buffer sizes, the warnings looks like this:
> warning: Allocated buffer size is not a multiple of the pointee's size 
> [CWE-131] [-Wanalyzer-allocation-size]
>    22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
>       | ^~~~~~~~~~~~~~~~~~~~~~~~~
>   ‘test_2’: event 1
>     |
>     | 22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 }
> */
>     | | ^~~~~~~~~~~~~~~~~~~~~~~~~
>     | | |
>     | | (1) Casting a 14 byte buffer to ‘int *’ leaves 2 trailing 
> bytes; either the allocated size is bogus or the type on the left-hand 
> side is wrong
>     |

Something strange seems to have happened with the indentation in your
email; the code in the patch seems to me to be strangely indented, and
looking at the archive here:
  https://gcc.gnu.org/pipermail/gcc/2022-June/238907.html
I see the same thing, so I think it's a problem with what the mailing
list received, rather than just in my mail client.  Maybe something 

FWIW I normally use "git send-email" to send patches.

The underlinings in the above look strange; I see this in your email:

warning: Allocated buffer size is not a multiple of the pointee's size 
[CWE-131] [-Wanalyzer-allocation-size]
   22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
      | ^~~~~~~~~~~~~~~~~~~~~~~~~
  ‘test_2’: event 1
    |
    | 22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } 
*/
    | | ^~~~~~~~~~~~~~~~~~~~~~~~~
    | | |
    | | (1) Casting a 14 byte buffer to ‘int *’ leaves 2 trailing 
bytes; either the allocated size is bogus or the type on the left-hand 
side is wrong
    |

Should it have been (omitting the dg-line directives for clarity):

warning: Allocated buffer size is not a multiple of the pointee's size  [CWE-131] [-Wanalyzer-allocation-size]
   22 | int *ptr = malloc (10 + sizeof(int));
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~
  ‘test_2’: event 1
    |
    | 22 | int *ptr = malloc (10 + sizeof(int));
    |    |            ^~~~~~~~~~~~~~~~~~~~~~~~~
    |    |            |
    |    |            (1) Casting a 14 byte buffer to ‘int *’ leaves 2 trailing bytes; either the allocated size is bogus or the type on the left-hand side is wrong
    |

?

It looks like something somewhere has collapsed repeated whitespace in
the message down to single spaces, which has broken the ASCII art in
your examples, and the indentation in your code.


It would probably be helpful for the message to tell the user what
sizeof(*ptr) is, i.e. sizeof(int) in this case (this is much more
helpful when it's a struct).

Maybe something like:

note: a buffer of 14 bytes is allocated...
note: ...but sizeof (int) is 4 bytes...
note: ...leaving 2 trailing bytes for an array of 3 'int's (which would
occupy 12 bytes)

or somesuch???

I'm brainstorming here, my ideas above aren't necessarily good. 
Sometimes it's good to chop up messages like this, to minimize
combinatorial explosion for all the different cases.
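
To make the arithmetic behind the suggested notes concrete, here is a
minimal sketch in C.  The helper names are made up for illustration and
are not part of the patch; the numbers are the ones from the example
above (a 14 byte buffer viewed as an array of 4 byte elements).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the patch: bytes left over when a
   buffer of ALLOC_SIZE bytes is viewed as an array of elements that
   are ELEM_SIZE bytes each.  */
size_t
trailing_bytes (size_t alloc_size, size_t elem_size)
{
  assert (elem_size != 0);
  return alloc_size % elem_size;
}

/* Hypothetical helper, not from the patch: number of whole elements
   of ELEM_SIZE bytes that fit into ALLOC_SIZE bytes.  */
size_t
whole_elements (size_t alloc_size, size_t elem_size)
{
  assert (elem_size != 0);
  return alloc_size / elem_size;
}
```

With alloc_size 14 and elem_size 4 this gives 3 whole elements
(occupying 12 bytes) and 2 trailing bytes, which are the three numbers
the notes would report.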



On symbolic buffer sizes:
warning: Allocated buffer size is not a multiple of the pointee's size 
[CWE-131] [-Wanalyzer-allocation-size]
   33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 } */
      | ^~~~~~~~~~~~~~~~~~~~~~~~
  ‘test_3’: event 1
    |
    | 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 } 
*/
    | | ^~~~~~~~~~~~~~~~~~~~~~~~
    | | |
    | | (1) Allocation is incompatible with ‘int *’; either the 
allocated size is bogus or the type on the left-hand side is wrong
    |


Is there location information for both the malloc and for the
assignment, here?

If so, then maybe two events:

warning: Allocated buffer size is not a multiple of the pointee's size 
[CWE-131] [-Wanalyzer-allocation-size]
   33 | int *ptr = malloc (n + sizeof(int));
      |            ^~~~~~~~~~~~~~~~~~~~~~~~
  ‘test_3’: events 1-2 
    |
    | 33 | int *ptr = malloc (n + sizeof(int));
    |    |          ^ ^~~~~~~~~~~~~~~~~~~~~~~~
    |    |          | |
    |    |          | (1) buffer allocated here with size 'n + 4'
    |    |          | 
    |    |          (2) sizeof(*ptr) is 4
    |

or somesuch.



And this is what a simple flow looks like:
warning: Allocated buffer size is not a multiple of the pointee's size 
[CWE-131] [-Wanalyzer-allocation-size]
   39 | int *iptr = (int *)ptr; /* { dg-line assign } */
      | ^~~~
  ‘test_4’: events 1-2
    |
    | 38 | void *ptr = malloc (n * sizeof (short)); /* { dg-message }
*/
    | | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    | | |
    | | (1) allocated here
    | 39 | int *iptr = (int *)ptr; /* { dg-line assign } */
    | | ~~~~
    | | |
    | | (2) ‘ptr’ is incompatible with ‘int *’; either the 
allocated size at (1) is bogus or the type on the left-hand side is 
wrong
    |



I think it would make the diagnostic more readable if the "allocated
here" event's message expresses how big the buffer is e.g.:

warning: Allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   39 | int *iptr = (int *)ptr;
      |      ^~~~
  ‘test_4’: events 1-2
    |
    | 38 | void *ptr = malloc (n * sizeof (short));
    |    |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    |    |             |
    |    |             (1) allocated here with size '(n * 2)'
    | 39 | int *iptr = (int *)ptr;
    |    |      ~~~~
    |    |      |
    |    |      (2) ‘ptr’ is incompatible with ‘int *’; sizeof(int) is 4
    |
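
As a rough illustration of why the symbolic case is worth flagging,
here is a small C sketch.  The function name is hypothetical, and
fixed-width types stand in for short and int so that the sizes (2 and 4
bytes) do not depend on the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the patch: returns true if a buffer
   sized for N 2-byte elements is also a whole number of 4-byte
   elements.  */
bool
short_array_fits_int_array (size_t n)
{
  /* Guard against overflow in the size computation.  */
  assert (n <= SIZE_MAX / sizeof (int16_t));
  size_t alloc_size = n * sizeof (int16_t);
  return alloc_size % sizeof (int32_t) == 0;
}
```

The result depends on n: it holds for even n and fails for odd n, so a
size of '(n * 2)' cannot be shown to be a multiple of 4 in general.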


There are some things to discuss from my side:
* The tests with the "toy re-implementation of CPython's object 
model"[2] fail due to an extra warning being emitted. Because the analyzer 
can't know that the calculation actually results in a correct buffer size 
when viewed as a string_obj later on, it emits a warning, e.g. at line 
61 in data-model-5.c. The only mitigation would be to disable the 
warning for structs entirely. So the question is whether to accept noise
in these cases or to disable the warning for structs entirely.

Can you post the full warning please?

These testcases exhibit a common way of faking inheritance in C, and I
think it ought to be possible to support this in the warning.

I think what's happening is we have

struct base
{ 
  /* fields */
};

struct sub
{
  struct base m_base;
  /* extra fields.  */
};

struct base *construct_base (size_t sz)
{
  struct base *p = (struct base *) malloc (sz);

  /* set up fields of base in p  */

  return p;
}

Or is this on the interprocedural path as called with a specific sizeof
for struct sub?

Maybe we can special-case these by detecting where struct sub's first
field is struct base, and hence where we expect this pattern?  (and use
this to suppress the warning for such cases?)
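
For reference, a self-contained sketch of that pattern in C.  The field
names and the construct_sub helper are invented for illustration (they
are not taken from the testcases), and fixed-width fields are used so
that, on typical ABIs, struct base is 12 bytes and struct sub is 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative layouts only.  */
struct base
{
  int32_t refcount;
  int32_t type_id;
  int32_t flags;
};

struct sub
{
  struct base m_base;   /* Must be the first field.  */
  int32_t length;       /* Extra field.  */
};

/* Allocates SZ bytes and sets up the fields of base.  SZ may be the
   size of a "derived" struct, so it need not be a multiple of
   sizeof (struct base).  */
struct base *
construct_base (size_t sz)
{
  assert (sz >= sizeof (struct base));
  struct base *p = (struct base *) malloc (sz);
  if (p)
    {
      p->refcount = 1;
      p->type_id = 0;
      p->flags = 0;
    }
  return p;
}

/* Hypothetical "derived" constructor.  The cast is valid because
   m_base is the first field of struct sub.  */
struct sub *
construct_sub (int32_t length)
{
  struct sub *s = (struct sub *) construct_base (sizeof (struct sub));
  if (s)
    s->length = length;
  return s;
}
```

Here the buffer assigned to 'struct base *' inside construct_base has
the size of struct sub, which is not a multiple of sizeof (struct base),
yet the code is fine.  A check along the lines of
offsetof (struct sub, m_base) == 0 is the property such a special case
would rely on.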


* I'm unable to emit a warning whenever the cast happens at an 
assignment with a call as the rhs, e.g. test_1 in allocation-size-4.c. 
This is because I'm unable to access a region_svalue for the returned
value. Even in the new_program_state, the svalue of the lhs is still a 
conjured_svalue. Maybe David can point me to a place where I can access 
the return value's region_svalue, or do I have to adapt the engine?

Please can you try reposting the patch?  I tried to read it, but am
having trouble with the mangled indentation.


* attr-malloc-6.c and pr96639.c both contained structs without a 
definition. Something in the analyzer must have triggered another
warning about the use of those incomplete structs.
I changed those structs to have an empty definition, so that the 
additional warnings are gone. I think this shouldn't change the test 
case, so is this change okay?

What were the new warnings?

Thanks for the patch; sorry if this seems nitpicky; the patch seems
promising.

Dave



- Tim

[0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105900
[1] While all tests except the cpython ones work, I have yet to test it
on large C projects
[2] FAIL: gcc.dg/analyzer/data-model-5.c (test for excess errors)
    FAIL: gcc.dg/analyzer/data-model-5b.c (test for excess errors)
    FAIL: gcc.dg/analyzer/data-model-5c.c (test for excess errors)
    FAIL: gcc.dg/analyzer/data-model-5d.c (test for excess errors)
    FAIL: gcc.dg/analyzer/first-field-2.c (test for excess errors)

-------

Subject: [PATCH] analyzer: add allocation size warning

This patch adds an allocation size checker to the analyzer.
The checker warns when the tracked buffer size is not a multiple of the
size of the left-hand side pointee's type. This resolves PR analyzer/105900.

The patch is not yet fully tested.

gcc/analyzer/ChangeLog:

        * analyzer.opt: Add Wanalyzer-allocation-size.
        * sm-malloc.cc (class dubious_allocation_size): New 
pending_diagnostic subclass.
        (capacity_compatible_with_type): New.
        (const_operand_in_sval_p): New.
        (struct_or_union_with_inheritance_p): New.
        (check_capacity): New.
        (malloc_state_machine::on_stmt): Add calls to 
on_pointer_assignment.
        (malloc_state_machine::on_allocator_call): Add node to 
parameters and call to on_pointer_assignment.
        (malloc_state_machine::on_pointer_assignment): New.

gcc/testsuite/ChangeLog:

        * gcc.dg/analyzer/attr-malloc-6.c: Disabled 
Wanalyzer-allocation-size and added default implementation for FILE.
        * gcc.dg/analyzer/capacity-1.c: Added dg directives.
        * gcc.dg/analyzer/malloc-4.c: Disabled 
Wanalyzer-allocation-size.
        * gcc.dg/analyzer/pr96639.c: Disabled Wanalyzer-allocation-size
and added default implementation for foo and bar.
        * gcc.dg/analyzer/allocation-size-1.c: New test.
        * gcc.dg/analyzer/allocation-size-2.c: New test.
        * gcc.dg/analyzer/allocation-size-3.c: New test.
        * gcc.dg/analyzer/allocation-size-4.c: New test.

Signed-off-by: Tim Lange <mail@tim-lange.me>
---
 gcc/analyzer/analyzer.opt | 4 +
 gcc/analyzer/sm-malloc.cc | 363 +++++++++++++++++-
 .../gcc.dg/analyzer/allocation-size-1.c | 54 +++
 .../gcc.dg/analyzer/allocation-size-2.c | 44 +++
 .../gcc.dg/analyzer/allocation-size-3.c | 48 +++
 .../gcc.dg/analyzer/allocation-size-4.c | 39 ++
 gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c | 2 +
 gcc/testsuite/gcc.dg/analyzer/capacity-1.c | 5 +-
 gcc/testsuite/gcc.dg/analyzer/malloc-4.c | 6 +-
 gcc/testsuite/gcc.dg/analyzer/pr96639.c | 2 +
 10 files changed, 559 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 4aea52d3a87..f213989e0bb 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -78,6 +78,10 @@ Wanalyzer-malloc-leak
 Common Var(warn_analyzer_malloc_leak) Init(1) Warning
 Warn about code paths in which a heap-allocated pointer leaks.

+Wanalyzer-allocation-size
+Common Var(warn_analyzer_allocation_size) Init(1) Warning
+Warn about code paths in which a buffer is assigned to a incompatible 
type.
+
 Wanalyzer-mismatching-deallocation
 Common Var(warn_analyzer_mismatching_deallocation) Init(1) Warning
 Warn about code paths in which the wrong deallocation function is 
called.
diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
index 3bd40425919..790c9f0e57d 100644
--- a/gcc/analyzer/sm-malloc.cc
+++ b/gcc/analyzer/sm-malloc.cc
@@ -46,6 +46,8 @@ along with GCC; see the file COPYING3. If not see
 #include "attribs.h"
 #include "analyzer/function-set.h"
 #include "analyzer/program-state.h"
+#include "print-tree.h"
+#include "gimple-pretty-print.h"

 #if ENABLE_ANALYZER

@@ -428,6 +430,7 @@ private:
   get_or_create_deallocator (tree deallocator_fndecl);

   void on_allocator_call (sm_context *sm_ctxt,
+ const supernode *node,
      const gcall *call,
      const deallocator_set *deallocators,
      bool returns_nonnull = false) const;
@@ -444,6 +447,16 @@ private:
   void on_realloc_call (sm_context *sm_ctxt,
    const supernode *node,
    const gcall *call) const;
+ void on_pointer_assignment(sm_context *sm_ctxt,
+ const supernode *node,
+ const gassign *assign_stmt,
+ tree lhs,
+ tree rhs) const;
+ void on_pointer_assignment(sm_context *sm_ctxt,
+ const supernode *node,
+ const gcall *call,
+ tree lhs,
+ tree rhs) const;
   void on_zero_assignment (sm_context *sm_ctxt,
       const gimple *stmt,
       tree lhs) const;
@@ -1432,6 +1445,117 @@ private:
   const char *m_funcname;
 };

+/* Concrete subclass for casts of pointers that lead to trailing 
bytes. */
+
+class dubious_allocation_size : public malloc_diagnostic
+{
+public:
+ dubious_allocation_size (const malloc_state_machine &sm, tree lhs, 
tree rhs,
+ tree size_tree, unsigned HOST_WIDE_INT size_diff)
+ : malloc_diagnostic(sm, rhs), 
m_type(dubious_allocation_type::CONSTANT_SIZE),
+ m_lhs(lhs), m_size_tree(size_tree), m_size_diff(size_diff)
+ {}
+
+ dubious_allocation_size (const malloc_state_machine &sm, tree lhs, 
tree rhs,
+ tree size_tree)
+ : malloc_diagnostic(sm, rhs), 
m_type(dubious_allocation_type::MISSING_OPERAND),
+ m_lhs(lhs), m_size_tree(size_tree), m_size_diff(0)
+ {}
+
+ const char *get_kind () const final override
+ {
+ return "dubious_allocation_size";
+ }
+
+ int get_controlling_option () const final override
+ {
+ return OPT_Wanalyzer_allocation_size;
+ }
+
+ bool subclass_equal_p (const pending_diagnostic &base_other) const
+ final override
+ {
+ const dubious_allocation_size &other = (const dubious_allocation_size
&)base_other;
+ return malloc_diagnostic::subclass_equal_p(other)
+ && m_type == other.m_type
+ && same_tree_p (m_lhs, other.m_lhs)
+ && same_tree_p (m_size_tree, other.m_size_tree)
+ && m_size_diff == other.m_size_diff;
+ }
+
+ bool emit (rich_location *rich_loc) final override
+ {
+ diagnostic_metadata m;
+ m.add_cwe (131);
+ return warning_meta (rich_loc, m, get_controlling_option (),
+ "Allocated buffer size is not a multiple of the pointee's size");
+ }
+
+ label_text describe_state_change (const evdesc::state_change &change)
+ override
+ {
+ if (change.m_old_state == m_sm.get_start_state ()
+ && unchecked_p (change.m_new_state))
+ {
+ m_alloc_event = change.m_event_id;
+ if (m_type == dubious_allocation_type::CONSTANT_SIZE)
+ {
+ // TODO: verify that it's the allocation stmt, not a copy
+ return change.formatted_print ("%E bytes allocated here",
+ m_size_tree);
+ }
+ }
+ return malloc_diagnostic::describe_state_change (change);
+ }
+
+ label_text describe_final_event (const evdesc::final_event &ev) final
override
+ {
+ if (m_type == dubious_allocation_type::CONSTANT_SIZE)
+ {
+ if (m_alloc_event.known_p ())
+ return ev.formatted_print (
+ "Casting %qE to %qT leaves %wu trailing bytes; either the"
+ " allocated size is bogus or the type on the left-hand side is"
+ " wrong",
+ m_arg, TREE_TYPE (m_lhs), m_size_diff);
+ else
+ return ev.formatted_print (
+ "Casting a %E byte buffer to %qT leaves %wu trailing bytes; either"
+ " the allocated size is bogus or the type on the left-hand side is"
+ " wrong",
+ m_size_tree, TREE_TYPE (m_lhs), m_size_diff);
+ }
+ else if (m_type == dubious_allocation_type::MISSING_OPERAND)
+ {
+ if (m_alloc_event.known_p ())
+ return ev.formatted_print (
+ "%qE is incompatible with %qT; either the allocated size at %@ is"
+ " bogus or the type on the left-hand side is wrong",
+ m_arg, TREE_TYPE (m_lhs), &m_alloc_event);
+ else
+ return ev.formatted_print (
+ "Allocation is incompatible with %qT; either the allocated size is"
+ " bogus or the type on the left-hand side is wrong",
+ TREE_TYPE (m_lhs));
+ }
+
+ gcc_unreachable ();
+ return label_text ();
+ }
+
+private:
+ enum dubious_allocation_type {
+ CONSTANT_SIZE,
+ MISSING_OPERAND
+ };
+
+ dubious_allocation_type m_type;
+ diagnostic_event_id_t m_alloc_event;
+ tree m_lhs;
+ tree m_size_tree;
+ unsigned HOST_WIDE_INT m_size_diff;
+};
+
 /* struct allocation_state : public state_machine::state. */

 /* Implementation of state_machine::state::dump_to_pp vfunc
@@ -1633,6 +1757,160 @@ known_allocator_p (const_tree fndecl, const 
gcall *call)
   return false;
 }

+/* Returns the trailing bytes on dubious allocation sizes. */
+
+static unsigned HOST_WIDE_INT
+capacity_compatible_with_type (tree cst, tree pointee_size_tree)
+{
+ unsigned HOST_WIDE_INT pointee_size = TREE_INT_CST_LOW 
(pointee_size_tree);
+ if (pointee_size == 0)
+ return 0;
+ unsigned HOST_WIDE_INT alloc_size = TREE_INT_CST_LOW (cst);
+
+ return alloc_size % pointee_size;
+}
+
+/* Returns true if there is a constant tree with
+ the same constant value inside the sval. */
+
+static bool
+const_operand_in_sval_p (const svalue *sval, tree size_cst)
+{
+ auto_vec<const svalue *> non_mult_expr;
+ auto_vec<const svalue *> worklist;
+ worklist.safe_push(sval);
+ while (!worklist.is_empty())
+ {
+ const svalue *curr = worklist.pop ();
+ curr = curr->unwrap_any_unmergeable ();
+
+ switch (curr->get_kind())
+ {
+ default:
+ break;
+ case svalue_kind::SK_CONSTANT:
+ {
+ const constant_svalue *cst_sval = curr->dyn_cast_constant_svalue ();
+ unsigned HOST_WIDE_INT sval_int
+ = TREE_INT_CST_LOW (cst_sval->get_constant ());
+ unsigned HOST_WIDE_INT size_cst_int = TREE_INT_CST_LOW (size_cst);
+ if (sval_int % size_cst_int == 0)
+ return true;
+ }
+ break;
+ case svalue_kind::SK_BINOP:
+ {
+ const binop_svalue *b_sval = curr->dyn_cast_binop_svalue ();
+ if (b_sval->get_op () == MULT_EXPR)
+ {
+ worklist.safe_push (b_sval->get_arg0 ());
+ worklist.safe_push (b_sval->get_arg1 ());
+ }
+ else
+ {
+ non_mult_expr.safe_push (b_sval->get_arg0 ());
+ non_mult_expr.safe_push (b_sval->get_arg1 ());
+ }
+ }
+ break;
+ case svalue_kind::SK_UNARYOP:
+ {
+ const unaryop_svalue *un_sval = curr->dyn_cast_unaryop_svalue ();
+ worklist.safe_push (un_sval->get_arg ());
+ }
+ break;
+ case svalue_kind::SK_UNKNOWN:
+ return true;
+ }
+ }
+
+ /* Each expr should be a multiple of the size.
+ E.g. used to catch n + sizeof(int) errors. */
+ bool reduce = !non_mult_expr.is_empty ();
+ while (!non_mult_expr.is_empty() && reduce)
+ {
+ const svalue *expr_sval = non_mult_expr.pop ();
+ reduce &= const_operand_in_sval_p (expr_sval, size_cst);
+ }
+ return reduce;
+}
+
+/* Returns true iff the type is a struct with another struct inside.
*/
+
+static bool
+struct_or_union_with_inheritance_p (tree type)
+{
+ if (!RECORD_OR_UNION_TYPE_P (type))
+ return false;
+
+ for (tree f = TYPE_FIELDS (type); f; f = TREE_CHAIN (f))
+ if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (f)))
+ return true;
+
+ return false;
+}
+
+static void
+check_capacity (sm_context *sm_ctxt,
+ const malloc_state_machine &sm,
+ const supernode *node,
+ const gimple *stmt,
+ tree lhs,
+ tree rhs,
+ const svalue *capacity)
+{
+ tree pointer_type = TREE_TYPE (lhs);
+ gcc_assert (TREE_CODE (pointer_type) == POINTER_TYPE);
+
+ tree pointee_type = TREE_TYPE (pointer_type);
+ /* void * is always compatible. */
+ if (TREE_CODE (pointee_type) == VOID_TYPE)
+ return;
+
+ if (struct_or_union_with_inheritance_p (pointee_type))
+ return;
+
+ tree pointee_size_tree = size_in_bytes(pointee_type);
+ /* The size might be unknown e.g. being a array with n elements
+ or casting to char * never has any trailing bytes. */
+ if (TREE_CODE (pointee_size_tree) != INTEGER_CST
+ || TREE_INT_CST_LOW (pointee_size_tree) == 1)
+ return;
+
+ switch (capacity->get_kind ())
+ {
+ default:
+ break;
+ case svalue_kind::SK_CONSTANT:
+ {
+ const constant_svalue *cst_sval = capacity->dyn_cast_constant_svalue 
();
+ tree cst = cst_sval->get_constant ();
+ unsigned HOST_WIDE_INT size_diff
+ = capacity_compatible_with_type (cst, pointee_size_tree);
+ if (size_diff != 0)
+ {
+ tree diag_arg = sm_ctxt->get_diagnostic_tree (rhs);
+ sm_ctxt->warn (node, stmt, diag_arg,
+ new dubious_allocation_size (sm, lhs, diag_arg,
+ cst, size_diff));
+ }
+ }
+ break;
+ case svalue_kind::SK_BINOP:
+ case svalue_kind::SK_UNARYOP:
+ {
+ if (!const_operand_in_sval_p (capacity, pointee_size_tree))
+ {
+ tree diag_arg = sm_ctxt->get_diagnostic_tree (rhs);
+ sm_ctxt->warn (node, stmt, diag_arg,
+ new dubious_allocation_size (sm, lhs, diag_arg,
+ pointee_size_tree));
+ }
+ }
+ break;
+ }
+}
+
 /* Implementation of state_machine::on_stmt vfunc for 
malloc_state_machine. */

 bool
@@ -1645,14 +1923,14 @@ malloc_state_machine::on_stmt (sm_context 
*sm_ctxt,
       {
  if (known_allocator_p (callee_fndecl, call))
    {
- on_allocator_call (sm_ctxt, call, &m_free);
+ on_allocator_call (sm_ctxt, node, call, &m_free);
      return true;
    }

  if (is_named_call_p (callee_fndecl, "operator new", call, 1))
- on_allocator_call (sm_ctxt, call, &m_scalar_delete);
+ on_allocator_call (sm_ctxt, node, call, &m_scalar_delete);
  else if (is_named_call_p (callee_fndecl, "operator new []", call, 1))
- on_allocator_call (sm_ctxt, call, &m_vector_delete);
+ on_allocator_call (sm_ctxt, node, call, &m_vector_delete);
  else if (is_named_call_p (callee_fndecl, "operator delete", call, 1)
    || is_named_call_p (callee_fndecl, "operator delete", call, 2))
    {
@@ -1707,7 +1985,7 @@ malloc_state_machine::on_stmt (sm_context 
*sm_ctxt,
      tree attrs = TYPE_ATTRIBUTES (TREE_TYPE (callee_fndecl));
      bool returns_nonnull
        = lookup_attribute ("returns_nonnull", attrs);
- on_allocator_call (sm_ctxt, call, deallocators, returns_nonnull);
+ on_allocator_call (sm_ctxt, node, call, deallocators, 
returns_nonnull);
    }

  /* Handle "__attribute__((nonnull))". */
@@ -1763,12 +2041,31 @@ malloc_state_machine::on_stmt (sm_context 
*sm_ctxt,
        = mutable_this->get_or_create_deallocator (callee_fndecl);
      on_deallocator_call (sm_ctxt, node, call, d, dealloc_argno);
    }
+
+ /* Handle returns from function calls. */
+ tree lhs = gimple_call_lhs (call);
+ if (lhs && TREE_CODE (TREE_TYPE (lhs)) == POINTER_TYPE
+ && TREE_CODE (gimple_call_return_type (call)) == POINTER_TYPE)
+ on_pointer_assignment (sm_ctxt, node, call, lhs,
+ gimple_call_fn (call));
       }

   if (tree lhs = sm_ctxt->is_zero_assignment (stmt))
     if (any_pointer_p (lhs))
       on_zero_assignment (sm_ctxt, stmt,lhs);

+ /* Handle pointer assignments/casts for dubious allocation size. */
+ if (const gassign *assign_stmt = dyn_cast <const gassign *> (stmt))
+ {
+ if (gimple_num_ops (stmt) == 2)
+ {
+ tree lhs = gimple_assign_lhs (assign_stmt);
+ tree rhs = gimple_assign_rhs1 (assign_stmt);
+ if (any_pointer_p (lhs) && any_pointer_p (rhs))
+ on_pointer_assignment (sm_ctxt, node, assign_stmt, lhs, rhs);
+ }
+ }
+
   /* Handle dereferences. */
   for (unsigned i = 0; i < gimple_num_ops (stmt); i++)
     {
@@ -1818,6 +2115,7 @@ malloc_state_machine::on_stmt (sm_context 
*sm_ctxt,

 void
 malloc_state_machine::on_allocator_call (sm_context *sm_ctxt,
+ const supernode *node,
       const gcall *call,
       const deallocator_set *deallocators,
       bool returns_nonnull) const
@@ -1830,6 +2128,9 @@ malloc_state_machine::on_allocator_call 
(sm_context *sm_ctxt,
      (returns_nonnull
       ? deallocators->m_nonnull
       : deallocators->m_unchecked));
+
+ if (TREE_CODE (TREE_TYPE (lhs)) == POINTER_TYPE)
+ on_pointer_assignment (sm_ctxt, node, call, lhs, gimple_call_fn 
(call));
     }
   else
     {
@@ -1968,6 +2269,60 @@ malloc_state_machine::on_realloc_call 
(sm_context *sm_ctxt,
     }
 }

+/* Handle assignments between two pointers.
+ Check for dubious allocation sizes.
+*/
+
+void
+malloc_state_machine::on_pointer_assignment (sm_context *sm_ctxt,
+ const supernode *node,
+ const gassign *assign_stmt,
+ tree lhs,
+ tree rhs) const
+{
+ /* Do not warn if lhs and rhs are of the same type to not emit 
duplicate
+ warnings on assignments after the cast. */
+ if (pending_diagnostic::same_tree_p (TREE_TYPE (lhs), TREE_TYPE 
(rhs)))
+ return;
+
+ const program_state *state = sm_ctxt->get_old_program_state ();
+ const svalue *r_value = state->m_region_model->get_rvalue (rhs,
NULL);
+ if (const region_svalue *reg = dyn_cast <const region_svalue *> 
(r_value))
+ {
+ const svalue *capacity = state->m_region_model->get_capacity
+ (reg->get_pointee ());
+ check_capacity(sm_ctxt, *this, node, assign_stmt, lhs, rhs,
capacity);
+ }
+}
+
+void
+malloc_state_machine::on_pointer_assignment (sm_context *sm_ctxt,
+ const supernode *node,
+ const gcall *call,
+ tree lhs,
+ tree fn_decl) const
+{
+ /* Do not warn if lhs and rhs are of the same type to not emit 
duplicate
+ warnings on assignments after the cast. */
+ if (pending_diagnostic::same_tree_p
+ (TREE_TYPE (lhs), TREE_TYPE (gimple_call_return_type (call))))
+ return;
+
+ const program_state *state = sm_ctxt->get_new_program_state ();
+ const svalue *r_value = state->m_region_model->get_rvalue (lhs,
NULL);
+ if (const region_svalue *reg = dyn_cast <const region_svalue *> 
(r_value))
+ {
+ const svalue *capacity = state->m_region_model->get_capacity
+ (reg->get_pointee ());
+ check_capacity (sm_ctxt, *this, node, call, lhs, fn_decl, capacity);
+ }
+ else if (const conjured_svalue *con
+ = dyn_cast <const conjured_svalue *> (r_value))
+ {
+ // FIXME: How to get a region_svalue?
+ }
+}
+
 /* Implementation of state_machine::on_phi vfunc for 
malloc_state_machine. */

 void
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c 
b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
new file mode 100644
index 00000000000..5403c5f41f1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
@@ -0,0 +1,54 @@
+#include <stdlib.h>
+
+/* Tests with constant buffer sizes */
+
+void test_1 (void)
+{
+ short *ptr = malloc (21 * sizeof(short));
+ free (ptr);
+}
+
+void test_2 (void)
+{
+ int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc } */
+ free (ptr);
+
+ /* { dg-warning "Allocated buffer size is not a multiple of the 
pointee's size" "" { target *-*-* } malloc } */
+ /* { dg-message "\\(1\\) Casting a 42 byte buffer to 'int \\*' leaves
2 trailing bytes" "" { target *-*-* } malloc } */
+}
+
+void test_3 (void)
+{
+ void *ptr = malloc (21 * sizeof (short));
+ short *sptr = (short *)ptr;
+ free (sptr);
+}
+
+void test_4 (void)
+{
+ void *ptr = malloc (21 * sizeof (short)); /* { dg-message } */
+ int *iptr = (int *)ptr; /* { dg-line assign } */
+ free (iptr);
+
+ /* { dg-warning "Allocated buffer size is not a multiple of the 
pointee's size" "" { target *-*-* } assign } */
+ /* { dg-message "\\(2\\) Casting 'ptr' to 'int \\*' leaves 2 trailing
bytes" "" { target *-*-* } assign } */
+}
+
+struct s {
+ int i;
+};
+
+void test_5 (void)
+{
+ struct s *ptr = malloc (5 * sizeof (struct s));
+ free (ptr);
+}
+
+void test_6 (void)
+{
+ long *ptr = malloc (5 * sizeof (struct s)); /* { dg-line malloc6 } */
+ free (ptr);
+
+ /* { dg-warning "" "" { target *-*-* } malloc6 } */
+ /* { dg-message "" "" { target *-*-* } malloc6 } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c 
b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
new file mode 100644
index 00000000000..e66d2793f13
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
@@ -0,0 +1,44 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Tests with symbolic buffer sizes */
+
+void test_1 (void)
+{
+ int n;
+ scanf("%i", &n);
+ short *ptr = malloc (n * sizeof(short));
+ free (ptr);
+}
+
+void test_2 (void)
+{
+ int n;
+ scanf("%i", &n);
+ int *ptr = malloc (n * sizeof (short)); /* { dg-line malloc } */
+ free (ptr);
+
+ /* { dg-warning "Allocated buffer size is not a multiple of the 
pointee's size" "" { target *-*-* } malloc } */
+ /* { dg-message "\\(1\\) Allocation is incompatible with 'int \\*'"
"" { target *-*-* } malloc } */
+}
+
+void test_3 (void)
+{
+ int n;
+ scanf("%i", &n);
+ void *ptr = malloc (n * sizeof (short));
+ short *sptr = (short *)ptr;
+ free (sptr);
+}
+
+void test_4 (void)
+{
+ int n;
+ scanf("%i", &n);
+ void *ptr = malloc (n * sizeof (short)); /* { dg-message } */
+ int *iptr = (int *)ptr; /* { dg-line assign } */
+ free (iptr);
+
+ /* { dg-warning "Allocated buffer size is not a multiple of the 
pointee's size" "" { target *-*-* } assign } */
+ /* { dg-message "\\(2\\) 'ptr' is incompatible with 'int \\*'; either
the allocated size at \\(1\\)" "" { target *-*-* } assign } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c 
b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
new file mode 100644
index 00000000000..dafc0e73c63
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
@@ -0,0 +1,48 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* CWE-131 example 5 */
+void test_1(void)
+{
+ int *id_sequence = (int *) malloc (3); /* { dg-line malloc1 } */
+ if (id_sequence == NULL) exit (1);
+
+ id_sequence[0] = 13579;
+ id_sequence[1] = 24680;
+ id_sequence[2] = 97531;
+
+ free (id_sequence);
+
+ /* { dg-warning "" "" { target *-*-* } malloc1 } */
+ /* { dg-message "" "" { target *-*-* } malloc1 } */
+}
+
+void test_2(void)
+{
+ int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
+ free (ptr);
+
+ /* { dg-warning "" "" { target *-*-* } malloc2 } */
+ /* { dg-message "" "" { target *-*-* } malloc2 } */
+}
+
+void test_3(void)
+{
+ int n;
+ scanf("%i", &n);
+ int *ptr = malloc (n + sizeof (int)); /* { dg-line malloc3 } */
+ free (ptr);
+
+ /* { dg-warning "" "" { target *-*-* } malloc3 } */
+ /* { dg-message "" "" { target *-*-* } malloc3 } */
+}
+
+void test_4(void)
+{
+ int n;
+ scanf("%i", &n);
+ int m;
+ scanf("%i", &m);
+ int *ptr = malloc ((n + m) * sizeof (int));
+ free (ptr);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c 
b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
new file mode 100644
index 00000000000..4c2b31d6e0a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
@@ -0,0 +1,39 @@
+#include <stddef.h>
+#include <stdlib.h>
+
+/* Flow warnings */
+
+void *create_buffer(int n)
+{
+ return malloc(n);
+}
+
+void test_1(void)
+{
+ // FIXME
+ int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } }
*/
+ free (buf);
+}
+
+void test_2(void)
+{
+ void *buf = create_buffer(42); /* { dg-message } */
+ int *ibuf = buf; /* { dg-line assign2 } */
+ free (ibuf);
+
+ /* { dg-warning "" "" { target *-*-* } assign2 } */
+ /* { dg-message "" "" { target *-*-* } assign2 } */
+}
+
+void test_3(void)
+{
+ void *buf = malloc(42); /* { dg-message } */
+ if (buf != NULL) /* { dg-message } */
+ {
+ int *ibuf = buf; /* { dg-line assign3 } */
+ free (ibuf);
+ }
+
+ /* { dg-warning "" "" { target *-*-* } assign3 } */
+ /* { dg-message "" "" { target *-*-* } assign3 } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c 
b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
index bd28107d0d7..809ee88cf07 100644
--- a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
+++ b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
@@ -1,7 +1,9 @@
+/* { dg-additional-options -Wno-analyzer-allocation-size } */
 /* Adapted from gcc.dg/Wmismatched-dealloc.c. */

 #define A(...) __attribute__ ((malloc (__VA_ARGS__)))

+struct FILE {};
 typedef struct FILE FILE;
 typedef __SIZE_TYPE__ size_t;

diff --git a/gcc/testsuite/gcc.dg/analyzer/capacity-1.c 
b/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
index 2d124833296..94f569e390b 100644
--- a/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
@@ -89,8 +89,11 @@ struct s
 static struct s * __attribute__((noinline))
 alloc_s (size_t num)
 {
- struct s *p = malloc (sizeof(struct s) + num);
+ struct s *p = malloc (sizeof(struct s) + num); /* { dg-line malloc } 
*/
   return p;
+
+ /* { dg-warning "" "" { target *-*-* } malloc } */
+ /* { dg-message "" "" { target *-*-* } malloc } */
 }

 struct s *
diff --git a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c 
b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
index 908bb28ee50..0ca94250ba2 100644
--- a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
+++ b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
@@ -1,9 +1,9 @@
-/* { dg-additional-options "-Wno-incompatible-pointer-types" } */
+/* { dg-additional-options "-Wno-incompatible-pointer-types 
-Wno-analyzer-allocation-size" } */

 #include <stdlib.h>

-struct foo;
-struct bar;
+struct foo {};
+struct bar {};
 void *hv (struct foo **tm)
 {
   void *p = __builtin_malloc (4);
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96639.c 
b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
index 02ca3f084a2..6f365c3cb5d 100644
--- a/gcc/testsuite/gcc.dg/analyzer/pr96639.c
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options -Wno-analyzer-allocation-size } */
+
 void *calloc (__SIZE_TYPE__, __SIZE_TYPE__);

 int




* [RFC] analyzer: add allocation size warning
  2022-06-17 15:54 [RFC] analyzer: allocation size warning Tim Lange
  2022-06-17 17:15 ` Prathamesh Kulkarni
  2022-06-17 17:48 ` David Malcolm
@ 2022-06-17 18:34 ` Tim Lange
  2022-06-29 15:39 ` [PATCH v2] analyzer: add allocation size checker Tim Lange
  2022-06-30 22:11 ` [PATCH v3] analyzer: add allocation size checker [PR105900] Tim Lange
  4 siblings, 0 replies; 17+ messages in thread
From: Tim Lange @ 2022-06-17 18:34 UTC (permalink / raw)
  To: gcc

I think my mail client applied auto-wrap and collapsed runs of spaces into a 
single one while doing so. Here again is the full patch, along with the ASCII
diagnostics. This should look better now.

On constant size allocations:
/path/to/allocation-size-3.c:22:14: warning: Allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   22 |   int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
      |              ^~~~~~~~~~~~~~~~~~~~~~~~~
  ‘test_2’: event 1
    |
    |   22 |   int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
    |      |              ^~~~~~~~~~~~~~~~~~~~~~~~~
    |      |              |
    |      |              (1) Casting a 14 byte buffer to ‘int *’ leaves 2 trailing bytes; either the allocated size is bogus or the type on the left-hand side is wrong
    |
    
On symbolic buffer sizes:
/path/to/allocation-size-3.c:33:14: warning: Allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   33 |   int *ptr = malloc (n + sizeof (int)); /* { dg-line malloc3 } */
      |              ^~~~~~~~~~~~~~~~~~~~~~~~~
  ‘test_3’: event 1
    |
    |   33 |   int *ptr = malloc (n + sizeof (int)); /* { dg-line malloc3 } */
    |      |              ^~~~~~~~~~~~~~~~~~~~~~~~~
    |      |              |
    |      |              (1) Allocation is incompatible with ‘int *’; either the allocated size is bogus or the type on the left-hand side is wrong
    |
   
And this is what a simple flow looks like:
/path/to/allocation-size-2.c:39:8: warning: Allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   39 |   int *iptr = (int *)ptr; /* { dg-line assign } */
      |        ^~~~
  ‘test_4’: events 1-2
    |
    |   38 |   void *ptr = malloc (n * sizeof (short)); /* { dg-message } */
    |      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |               |
    |      |               (1) allocated here
    |   39 |   int *iptr = (int *)ptr; /* { dg-line assign } */
    |      |        ~~~~    
    |      |        |
    |      |        (2) ‘ptr’ is incompatible with ‘int *’; either the allocated size at (1) is bogus or the type on the left-hand side is wrong
    |

This patch adds an allocation size checker to the analyzer.
The checker warns when the tracked buffer size is not a multiple of the size of the left-hand side's pointee type. This resolves PR analyzer/105900.

The patch is not yet fully tested.
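
As a minimal sketch of the arithmetic behind the constant-size case (not part
of the patch itself), the following standalone C helper computes the number of
trailing bytes. The name `trailing_bytes` is hypothetical and used only for
illustration; the pointee size is passed in explicitly instead of being taken
from a tree node.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: number of bytes left over
   when a buffer of ALLOC_SIZE bytes is viewed as an array of elements of
   POINTEE_SIZE bytes.  A non-zero result corresponds to the situation
   the checker warns about.  */
static size_t
trailing_bytes (size_t alloc_size, size_t pointee_size)
{
  /* Zero-sized pointees (e.g. GNU C empty structs) never leave a
     remainder, and must not be used as a divisor.  */
  if (pointee_size == 0)
    return 0;
  return alloc_size % pointee_size;
}
```

Assuming a 4-byte int, malloc (10 + sizeof(int)) yields 14 bytes and therefore
2 trailing bytes, which matches the "14 byte buffer ... leaves 2 trailing
bytes" event shown above.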

gcc/analyzer/ChangeLog:

       * analyzer.opt: Add Wanalyzer-allocation-size.
       * sm-malloc.cc (class dubious_allocation_size): New pending_diagnostic subclass.
       (capacity_compatible_with_type): New.
       (const_operand_in_sval_p): New.
       (struct_or_union_with_inheritance_p): New.
       (check_capacity): New.
       (malloc_state_machine::on_stmt): Add calls to on_pointer_assignment.
       (malloc_state_machine::on_allocator_call): Add node to parameters and call to on_pointer_assignment.
       (malloc_state_machine::on_pointer_assignment): New.

gcc/testsuite/ChangeLog:

       * gcc.dg/analyzer/attr-malloc-6.c: Disabled Wanalyzer-allocation-size and added default implementation for FILE.
       * gcc.dg/analyzer/capacity-1.c: Added dg directives.
       * gcc.dg/analyzer/malloc-4.c: Disabled Wanalyzer-allocation-size and added default implementation for foo and bar.
       * gcc.dg/analyzer/pr96639.c: Disabled Wanalyzer-allocation-size.
       * gcc.dg/analyzer/allocation-size-1.c: New test.
       * gcc.dg/analyzer/allocation-size-2.c: New test.
       * gcc.dg/analyzer/allocation-size-3.c: New test.
       * gcc.dg/analyzer/allocation-size-4.c: New test.

Signed-off-by: Tim Lange <mail@tim-lange.me>
---
 gcc/analyzer/analyzer.opt                     |   4 +
 gcc/analyzer/sm-malloc.cc                     | 363 +++++++++++++++++-
 .../gcc.dg/analyzer/allocation-size-1.c       |  54 +++
 .../gcc.dg/analyzer/allocation-size-2.c       |  44 +++
 .../gcc.dg/analyzer/allocation-size-3.c       |  48 +++
 .../gcc.dg/analyzer/allocation-size-4.c       |  39 ++
 gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c |   2 +
 gcc/testsuite/gcc.dg/analyzer/capacity-1.c    |   5 +-
 gcc/testsuite/gcc.dg/analyzer/malloc-4.c      |   6 +-
 gcc/testsuite/gcc.dg/analyzer/pr96639.c       |   2 +
 10 files changed, 559 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 4aea52d3a87..f213989e0bb 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -78,6 +78,10 @@ Wanalyzer-malloc-leak
 Common Var(warn_analyzer_malloc_leak) Init(1) Warning
 Warn about code paths in which a heap-allocated pointer leaks.
 
+Wanalyzer-allocation-size
+Common Var(warn_analyzer_allocation_size) Init(1) Warning
+Warn about code paths in which a buffer is assigned to an incompatible type.
+
 Wanalyzer-mismatching-deallocation
 Common Var(warn_analyzer_mismatching_deallocation) Init(1) Warning
 Warn about code paths in which the wrong deallocation function is called.
diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
index 3bd40425919..790c9f0e57d 100644
--- a/gcc/analyzer/sm-malloc.cc
+++ b/gcc/analyzer/sm-malloc.cc
@@ -46,6 +46,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "attribs.h"
 #include "analyzer/function-set.h"
 #include "analyzer/program-state.h"
+#include "print-tree.h"
+#include "gimple-pretty-print.h"
 
 #if ENABLE_ANALYZER
 
@@ -428,6 +430,7 @@ private:
   get_or_create_deallocator (tree deallocator_fndecl);
 
   void on_allocator_call (sm_context *sm_ctxt,
+	const supernode *node,
 			  const gcall *call,
 			  const deallocator_set *deallocators,
 			  bool returns_nonnull = false) const;
@@ -444,6 +447,16 @@ private:
   void on_realloc_call (sm_context *sm_ctxt,
 			const supernode *node,
 			const gcall *call) const;
+  void on_pointer_assignment(sm_context *sm_ctxt,
+			     const supernode *node,
+			     const gassign *assign_stmt,
+			     tree lhs,
+			     tree rhs) const;
+  void on_pointer_assignment(sm_context *sm_ctxt,
+			     const supernode *node,
+			     const gcall *call,
+			     tree lhs,
+			     tree rhs) const;
   void on_zero_assignment (sm_context *sm_ctxt,
 			   const gimple *stmt,
 			   tree lhs) const;
@@ -1432,6 +1445,117 @@ private:
   const char *m_funcname;
 };
 
+/* Concrete subclass for casts of pointers that lead to trailing bytes.  */
+
+class dubious_allocation_size : public malloc_diagnostic
+{
+public:
+  dubious_allocation_size (const malloc_state_machine &sm, tree lhs, tree rhs,
+			   tree size_tree, unsigned HOST_WIDE_INT size_diff)
+  : malloc_diagnostic(sm, rhs), m_type(dubious_allocation_type::CONSTANT_SIZE), 
+    m_lhs(lhs), m_size_tree(size_tree), m_size_diff(size_diff) 
+    {}
+  
+  dubious_allocation_size (const malloc_state_machine &sm, tree lhs, tree rhs,
+			   tree size_tree)
+  : malloc_diagnostic(sm, rhs), m_type(dubious_allocation_type::MISSING_OPERAND), 
+    m_lhs(lhs), m_size_tree(size_tree), m_size_diff(0) 
+    {}
+
+  const char *get_kind () const final override 
+  { 
+    return "dubious_allocation_size"; 
+  }
+
+  int get_controlling_option () const final override
+  {
+    return OPT_Wanalyzer_allocation_size;
+  }
+
+  bool subclass_equal_p (const pending_diagnostic &base_other) const
+  final override
+  {
+    const dubious_allocation_size &other = (const dubious_allocation_size &)base_other;
+    return malloc_diagnostic::subclass_equal_p(other)
+	   && m_type == other.m_type
+	   && same_tree_p (m_lhs, other.m_lhs)
+	   && same_tree_p (m_size_tree, other.m_size_tree)
+	   && m_size_diff == other.m_size_diff;
+  }
+
+  bool emit (rich_location *rich_loc) final override
+  {
+    diagnostic_metadata m;
+    m.add_cwe (131);
+    return warning_meta (rich_loc, m, get_controlling_option (),
+	       "Allocated buffer size is not a multiple of the pointee's size");
+  }
+
+  label_text describe_state_change (const evdesc::state_change &change)
+    override
+  {
+    if (change.m_old_state == m_sm.get_start_state ()
+	&& unchecked_p (change.m_new_state))
+      {
+	m_alloc_event = change.m_event_id;
+	if (m_type == dubious_allocation_type::CONSTANT_SIZE)
+	  {
+	    // TODO: verify that it's the allocation stmt, not a copy
+	    return change.formatted_print ("%E bytes allocated here", 
+					   m_size_tree);
+	  }
+      }
+    return malloc_diagnostic::describe_state_change (change);
+  }
+
+  label_text describe_final_event (const evdesc::final_event &ev) final override
+  {
+    if (m_type == dubious_allocation_type::CONSTANT_SIZE)
+      {
+	if (m_alloc_event.known_p ())
+	  return ev.formatted_print (
+	    "Casting %qE to %qT leaves %wu trailing bytes; either the"
+           " allocated size is bogus or the type on the left-hand side is"
+           " wrong",
+	    m_arg, TREE_TYPE (m_lhs), m_size_diff);
+	else
+	  return ev.formatted_print (
+	    "Casting a %E byte buffer to %qT leaves %wu trailing bytes; either"
+            " the allocated size is bogus or the type on the left-hand side is"
+            " wrong",
+	    m_size_tree, TREE_TYPE (m_lhs), m_size_diff);
+      }
+    else if (m_type == dubious_allocation_type::MISSING_OPERAND)
+      {
+	if (m_alloc_event.known_p ())
+	  return ev.formatted_print (
+	    "%qE is incompatible with %qT; either the allocated size at %@ is"
+           " bogus or the type on the left-hand side is wrong",
+	    m_arg, TREE_TYPE (m_lhs), &m_alloc_event);
+	else
+	  return ev.formatted_print (
+	    "Allocation is incompatible with %qT; either the allocated size is"
+           " bogus or the type on the left-hand side is wrong",
+	    TREE_TYPE (m_lhs));
+      }
+
+    gcc_unreachable ();
+    return label_text ();
+  }
+
+private:
+  enum dubious_allocation_type {
+    CONSTANT_SIZE,
+    MISSING_OPERAND
+  };
+
+  dubious_allocation_type m_type;
+  diagnostic_event_id_t m_alloc_event;
+  tree m_lhs;
+  tree m_size_tree;
+  unsigned HOST_WIDE_INT m_size_diff;
+};
+
 /* struct allocation_state : public state_machine::state.  */
 
 /* Implementation of state_machine::state::dump_to_pp vfunc
@@ -1633,6 +1757,160 @@ known_allocator_p (const_tree fndecl, const gcall *call)
   return false;
 }
 
+/* Returns the trailing bytes on dubious allocation sizes.  */
+
+static unsigned HOST_WIDE_INT 
+capacity_compatible_with_type (tree cst, tree pointee_size_tree)
+{
+  unsigned HOST_WIDE_INT pointee_size = TREE_INT_CST_LOW (pointee_size_tree);
+  if (pointee_size == 0)
+    return 0;
+  unsigned HOST_WIDE_INT alloc_size = TREE_INT_CST_LOW (cst);
+
+  return alloc_size % pointee_size;
+}
+
+/* Returns true if SVAL has a constant factor that is a multiple of
+   SIZE_CST, or contains an unknown value.  */
+
+static bool
+const_operand_in_sval_p (const svalue *sval, tree size_cst)
+{
+  auto_vec<const svalue *> non_mult_expr;
+  auto_vec<const svalue *> worklist;
+  worklist.safe_push(sval);
+  while (!worklist.is_empty())
+    {
+      const svalue *curr = worklist.pop ();
+      curr = curr->unwrap_any_unmergeable ();
+
+      switch (curr->get_kind())
+	{
+	default:
+	  break;
+	case svalue_kind::SK_CONSTANT:
+	  {
+	    const constant_svalue *cst_sval = curr->dyn_cast_constant_svalue ();
+	    unsigned HOST_WIDE_INT sval_int
+	      = TREE_INT_CST_LOW (cst_sval->get_constant ());
+	    unsigned HOST_WIDE_INT size_cst_int = TREE_INT_CST_LOW (size_cst);
+	    if (size_cst_int == 0 || sval_int % size_cst_int == 0)
+	      return true;
+	  }
+	  break;
+	case svalue_kind::SK_BINOP:
+	  {
+	    const binop_svalue *b_sval = curr->dyn_cast_binop_svalue ();
+	    if (b_sval->get_op () == MULT_EXPR)
+	      {
+		worklist.safe_push (b_sval->get_arg0 ());
+		worklist.safe_push (b_sval->get_arg1 ());
+	      }
+	    else
+	      {
+		non_mult_expr.safe_push (b_sval->get_arg0 ());
+		non_mult_expr.safe_push (b_sval->get_arg1 ());
+	      }
+	  }
+	  break;
+	case svalue_kind::SK_UNARYOP:
+	  {
+	    const unaryop_svalue *un_sval = curr->dyn_cast_unaryop_svalue ();
+	    worklist.safe_push (un_sval->get_arg ());
+	  }
+	  break;
+	case svalue_kind::SK_UNKNOWN:
+	  return true;
+	}
+    }
+
+  /* Each expr should be a multiple of the size. 
+     E.g. used to catch n + sizeof(int) errors.  */
+  bool reduce = !non_mult_expr.is_empty ();
+  while (!non_mult_expr.is_empty() && reduce)
+    {
+      const svalue *expr_sval = non_mult_expr.pop ();
+      reduce &= const_operand_in_sval_p (expr_sval, size_cst);
+    }
+  return reduce;
+}
+
+/* Returns true iff the type is a struct with another struct inside.  */
+
+static bool
+struct_or_union_with_inheritance_p (tree type)
+{
+  if (!RECORD_OR_UNION_TYPE_P (type))
+    return false;
+
+  for (tree f = TYPE_FIELDS (type); f; f = TREE_CHAIN (f))
+    if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (f)))
+      return true;
+
+  return false;
+}
+
+static void
+check_capacity (sm_context *sm_ctxt, 
+		const malloc_state_machine &sm,
+		const supernode *node,
+		const gimple *stmt,
+		tree lhs,
+		tree rhs,
+		const svalue *capacity)
+{
+  tree pointer_type = TREE_TYPE (lhs);
+  gcc_assert (TREE_CODE (pointer_type) == POINTER_TYPE);
+
+  tree pointee_type = TREE_TYPE (pointer_type);
+  /* void * is always compatible.  */
+  if (TREE_CODE (pointee_type) == VOID_TYPE)
+    return;
+
+  if (struct_or_union_with_inheritance_p (pointee_type))
+    return;
+
+  tree pointee_size_tree = size_in_bytes(pointee_type);
+  /* The size might be unknown, e.g. for an array with n elements, and
+     casting to char * never leaves any trailing bytes.  */
+  if (TREE_CODE (pointee_size_tree) != INTEGER_CST
+      || TREE_INT_CST_LOW (pointee_size_tree) == 1)
+    return;
+
+  switch (capacity->get_kind ())
+    {
+    default:
+      break;
+    case svalue_kind::SK_CONSTANT:
+      {
+	const constant_svalue *cst_sval = capacity->dyn_cast_constant_svalue ();
+	tree cst = cst_sval->get_constant ();
+	unsigned HOST_WIDE_INT size_diff
+	  = capacity_compatible_with_type (cst, pointee_size_tree);
+	if (size_diff != 0)
+	  {
+	    tree diag_arg = sm_ctxt->get_diagnostic_tree (rhs);
+	    sm_ctxt->warn (node, stmt, diag_arg, 
+			  new dubious_allocation_size (sm, lhs, diag_arg,
+						       cst, size_diff));
+	  }
+      }
+      break;
+    case svalue_kind::SK_BINOP:
+    case svalue_kind::SK_UNARYOP:
+      {
+	if (!const_operand_in_sval_p (capacity, pointee_size_tree))
+	  {
+	    tree diag_arg = sm_ctxt->get_diagnostic_tree (rhs);
+	    sm_ctxt->warn (node, stmt, diag_arg, 
+			  new dubious_allocation_size (sm, lhs, diag_arg,
+						       pointee_size_tree));
+	  }
+      }
+      break;
+    }
+}
+
 /* Implementation of state_machine::on_stmt vfunc for malloc_state_machine.  */
 
 bool
@@ -1645,14 +1923,14 @@ malloc_state_machine::on_stmt (sm_context *sm_ctxt,
       {
 	if (known_allocator_p (callee_fndecl, call))
 	  {
-	    on_allocator_call (sm_ctxt, call, &m_free);
+	    on_allocator_call (sm_ctxt, node, call, &m_free);
 	    return true;
 	  }
 
 	if (is_named_call_p (callee_fndecl, "operator new", call, 1))
-	  on_allocator_call (sm_ctxt, call, &m_scalar_delete);
+	  on_allocator_call (sm_ctxt, node, call, &m_scalar_delete);
 	else if (is_named_call_p (callee_fndecl, "operator new []", call, 1))
-	  on_allocator_call (sm_ctxt, call, &m_vector_delete);
+	  on_allocator_call (sm_ctxt, node, call, &m_vector_delete);
 	else if (is_named_call_p (callee_fndecl, "operator delete", call, 1)
 		 || is_named_call_p (callee_fndecl, "operator delete", call, 2))
 	  {
@@ -1707,7 +1985,7 @@ malloc_state_machine::on_stmt (sm_context *sm_ctxt,
 	    tree attrs = TYPE_ATTRIBUTES (TREE_TYPE (callee_fndecl));
 	    bool returns_nonnull
 	      = lookup_attribute ("returns_nonnull", attrs);
-	    on_allocator_call (sm_ctxt, call, deallocators, returns_nonnull);
+	    on_allocator_call (sm_ctxt, node, call, deallocators, returns_nonnull);
 	  }
 
 	/* Handle "__attribute__((nonnull))".   */
@@ -1763,12 +2041,31 @@ malloc_state_machine::on_stmt (sm_context *sm_ctxt,
 	      = mutable_this->get_or_create_deallocator (callee_fndecl);
 	    on_deallocator_call (sm_ctxt, node, call, d, dealloc_argno);
 	  }
+
+  /* Handle returns from function calls.  */ 
+	tree lhs = gimple_call_lhs (call);
+	if (lhs && TREE_CODE (TREE_TYPE (lhs)) == POINTER_TYPE
+		&& TREE_CODE (gimple_call_return_type (call)) == POINTER_TYPE)
+	  on_pointer_assignment (sm_ctxt, node, call, lhs, 
+				gimple_call_fn (call));
       }
 
   if (tree lhs = sm_ctxt->is_zero_assignment (stmt))
     if (any_pointer_p (lhs))
       on_zero_assignment (sm_ctxt, stmt,lhs);
 
+  /* Handle pointer assignments/casts for dubious allocation size.  */
+  if (const gassign *assign_stmt = dyn_cast <const gassign *> (stmt)) 
+    {
+      if (gimple_num_ops (stmt) == 2) 
+	{
+	  tree lhs = gimple_assign_lhs (assign_stmt);
+	  tree rhs = gimple_assign_rhs1 (assign_stmt);
+	  if (any_pointer_p (lhs) && any_pointer_p (rhs))
+	      on_pointer_assignment (sm_ctxt, node, assign_stmt, lhs, rhs);
+	}
+    }
+
   /* Handle dereferences.  */
   for (unsigned i = 0; i < gimple_num_ops (stmt); i++)
     {
@@ -1818,6 +2115,7 @@ malloc_state_machine::on_stmt (sm_context *sm_ctxt,
 
 void
 malloc_state_machine::on_allocator_call (sm_context *sm_ctxt,
+					 const supernode *node,
 					 const gcall *call,
 					 const deallocator_set *deallocators,
 					 bool returns_nonnull) const
@@ -1830,6 +2128,9 @@ malloc_state_machine::on_allocator_call (sm_context *sm_ctxt,
 				 (returns_nonnull
 				  ? deallocators->m_nonnull
 				  : deallocators->m_unchecked));
+      
+      if (TREE_CODE (TREE_TYPE (lhs)) == POINTER_TYPE)
+	on_pointer_assignment (sm_ctxt, node, call, lhs, gimple_call_fn (call));
     }
   else
     {
@@ -1968,6 +2269,60 @@ malloc_state_machine::on_realloc_call (sm_context *sm_ctxt,
     }
 }
 
+/* Handle assignments between two pointers.
+   Check for dubious allocation sizes.
+*/
+
+void
+malloc_state_machine::on_pointer_assignment (sm_context *sm_ctxt,
+		      const supernode *node,
+		      const gassign *assign_stmt,
+		      tree lhs,
+		      tree rhs) const
+{
+  /* Do not warn if lhs and rhs are of the same type to not emit duplicate
+      warnings on assignments after the cast.  */
+  if (pending_diagnostic::same_tree_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
+    return;
+
+  const program_state *state = sm_ctxt->get_old_program_state ();
+  const svalue *r_value = state->m_region_model->get_rvalue (rhs, NULL);
+  if (const region_svalue *reg = dyn_cast <const region_svalue *> (r_value))
+    {
+      const svalue *capacity = state->m_region_model->get_capacity 
+	    (reg->get_pointee ());
+      check_capacity(sm_ctxt, *this, node, assign_stmt, lhs, rhs, capacity);
+    }
+}
+
+void
+malloc_state_machine::on_pointer_assignment (sm_context *sm_ctxt,
+		      const supernode *node,
+		      const gcall *call,
+		      tree lhs,
+		      tree fn_decl) const
+{
+  /* Do not warn if lhs and rhs are of the same type to not emit duplicate
+      warnings on assignments after the cast.  */
+  if (pending_diagnostic::same_tree_p 
+	(TREE_TYPE (lhs), TREE_TYPE (gimple_call_return_type (call))))
+    return;
+
+  const program_state *state = sm_ctxt->get_new_program_state ();
+  const svalue *r_value = state->m_region_model->get_rvalue (lhs, NULL);
+  if (const region_svalue *reg = dyn_cast <const region_svalue *> (r_value))
+    {
+      const svalue *capacity = state->m_region_model->get_capacity 
+	    (reg->get_pointee ());
+      check_capacity (sm_ctxt, *this, node, call, lhs, fn_decl, capacity);
+    }
+  else if (const conjured_svalue *con
+	     = dyn_cast <const conjured_svalue *> (r_value))
+    {
+      // FIXME: How to get a region_svalue? 
+    }
+}
+
 /* Implementation of state_machine::on_phi vfunc for malloc_state_machine.  */
 
 void
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
new file mode 100644
index 00000000000..5403c5f41f1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
@@ -0,0 +1,54 @@
+#include <stdlib.h>
+
+/* Tests with constant buffer sizes */
+
+void test_1 (void)
+{
+  short *ptr = malloc (21 * sizeof(short));
+  free (ptr);
+}
+
+void test_2 (void)
+{
+  int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc } */
+  free (ptr);
+
+  /* { dg-warning "Allocated buffer size is not a multiple of the pointee's size" "" { target *-*-* } malloc } */
+  /* { dg-message "\\(1\\) Casting a 42 byte buffer to 'int \\*' leaves 2 trailing bytes" "" { target *-*-* } malloc } */
+}
+
+void test_3 (void)
+{
+  void *ptr = malloc (21 * sizeof (short));
+  short *sptr = (short *)ptr;
+  free (sptr);
+}
+
+void test_4 (void)
+{
+  void *ptr = malloc (21 * sizeof (short)); /* { dg-message } */
+  int *iptr = (int *)ptr; /* { dg-line assign } */
+  free (iptr);
+
+  /* { dg-warning "Allocated buffer size is not a multiple of the pointee's size" "" { target *-*-* } assign } */
+  /* { dg-message "\\(2\\) Casting 'ptr' to 'int \\*' leaves 2 trailing bytes" "" { target *-*-* } assign } */
+}
+
+struct s {
+  int i;
+};
+
+void test_5 (void)
+{
+  struct s *ptr = malloc (5 * sizeof (struct s));
+  free (ptr);
+}
+
+void test_6 (void)
+{
+  long *ptr = malloc (5 * sizeof (struct s));  /* { dg-line malloc6 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc6 } */
+  /* { dg-message "" "" { target *-*-* } malloc6 } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
new file mode 100644
index 00000000000..e66d2793f13
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
@@ -0,0 +1,44 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Tests with symbolic buffer sizes */
+
+void test_1 (void)
+{
+  int n;
+  scanf("%i", &n);
+  short *ptr = malloc (n * sizeof(short));
+  free (ptr);
+}
+
+void test_2 (void)
+{
+  int n;
+  scanf("%i", &n);
+  int *ptr = malloc (n * sizeof (short)); /* { dg-line malloc } */
+  free (ptr);
+
+  /* { dg-warning "Allocated buffer size is not a multiple of the pointee's size" "" { target *-*-* } malloc } */
+  /* { dg-message "\\(1\\) Allocation is incompatible with 'int \\*'" "" { target *-*-* } malloc } */
+}
+
+void test_3 (void)
+{
+  int n;
+  scanf("%i", &n);
+  void *ptr = malloc (n * sizeof (short));
+  short *sptr = (short *)ptr;
+  free (sptr);
+}
+
+void test_4 (void)
+{
+  int n;
+  scanf("%i", &n);
+  void *ptr = malloc (n * sizeof (short)); /* { dg-message } */
+  int *iptr = (int *)ptr; /* { dg-line assign } */
+  free (iptr);
+
+  /* { dg-warning "Allocated buffer size is not a multiple of the pointee's size" "" { target *-*-* } assign } */
+  /* { dg-message "\\(2\\) 'ptr' is incompatible with 'int \\*'; either the allocated size at \\(1\\)" "" { target *-*-* } assign } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
new file mode 100644
index 00000000000..dafc0e73c63
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
@@ -0,0 +1,48 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* CWE-131 example 5 */
+void test_1(void) 
+{
+  int *id_sequence = (int *) malloc (3); /* { dg-line malloc1 } */
+  if (id_sequence == NULL) exit (1);
+
+  id_sequence[0] = 13579;
+  id_sequence[1] = 24680;
+  id_sequence[2] = 97531;
+
+  free (id_sequence);
+
+  /* { dg-warning "" "" { target *-*-* } malloc1 } */
+  /* { dg-message "" "" { target *-*-* } malloc1 } */
+}
+
+void test_2(void)
+{
+  int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc2 } */
+  /* { dg-message "" "" { target *-*-* } malloc2 } */
+}
+
+void test_3(void)
+{
+  int n;
+  scanf("%i", &n);
+  int *ptr = malloc (n + sizeof (int)); /* { dg-line malloc3 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc3 } */
+  /* { dg-message "" "" { target *-*-* } malloc3 } */
+}
+
+void test_4(void)
+{
+  int n;
+  scanf("%i", &n);
+  int m;
+  scanf("%i", &m);
+  int *ptr = malloc ((n + m) * sizeof (int));
+  free (ptr);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
new file mode 100644
index 00000000000..4c2b31d6e0a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
@@ -0,0 +1,39 @@
+#include <stddef.h>
+#include <stdlib.h>
+
+/* Flow warnings */
+
+void *create_buffer(int n)
+{
+  return malloc(n);
+}
+
+void test_1(void) 
+{
+  // FIXME
+  int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } } */
+  free (buf);
+}
+
+void test_2(void) 
+{
+  void *buf = create_buffer(42); /* { dg-message } */
+  int *ibuf = buf; /* { dg-line assign2 } */
+  free (ibuf);
+
+  /* { dg-warning "" "" { target *-*-* } assign2 } */
+  /* { dg-message "" "" { target *-*-* } assign2 } */
+}
+
+void test_3(void)
+{
+  void *buf = malloc(42); /* { dg-message } */
+  if (buf != NULL) /* { dg-message } */
+    {
+      int *ibuf = buf; /* { dg-line assign3 } */
+      free (ibuf);
+    }
+
+  /* { dg-warning "" "" { target *-*-* } assign3 } */
+  /* { dg-message "" "" { target *-*-* } assign3 } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
index bd28107d0d7..809ee88cf07 100644
--- a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
+++ b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
@@ -1,7 +1,9 @@
+/* { dg-additional-options -Wno-analyzer-allocation-size } */
 /* Adapted from gcc.dg/Wmismatched-dealloc.c.  */
 
 #define A(...) __attribute__ ((malloc (__VA_ARGS__)))
 
+struct FILE {};
 typedef struct FILE   FILE;
 typedef __SIZE_TYPE__ size_t;
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/capacity-1.c b/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
index 2d124833296..94f569e390b 100644
--- a/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
@@ -89,8 +89,11 @@ struct s
 static struct s * __attribute__((noinline))
 alloc_s (size_t num)
 {
-  struct s *p = malloc (sizeof(struct s) + num);
+  struct s *p = malloc (sizeof(struct s) + num); /* { dg-line malloc } */
   return p;
+
+  /* { dg-warning "" "" { target *-*-* } malloc } */
+  /* { dg-message "" "" { target *-*-* } malloc } */
 }
 
 struct s *
diff --git a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
index 908bb28ee50..0ca94250ba2 100644
--- a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
+++ b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
@@ -1,9 +1,9 @@
-/* { dg-additional-options "-Wno-incompatible-pointer-types" } */
+/* { dg-additional-options "-Wno-incompatible-pointer-types -Wno-analyzer-allocation-size" } */
 
 #include <stdlib.h>
 
-struct foo;
-struct bar;
+struct foo {};
+struct bar {};
 void *hv (struct foo **tm)
 {
   void *p = __builtin_malloc (4);
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96639.c b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
index 02ca3f084a2..6f365c3cb5d 100644
--- a/gcc/testsuite/gcc.dg/analyzer/pr96639.c
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options -Wno-analyzer-allocation-size } */
+
 void *calloc (__SIZE_TYPE__, __SIZE_TYPE__);
 
 int
-- 
2.36.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] analyzer: allocation size warning
  2022-06-17 17:15 ` Prathamesh Kulkarni
@ 2022-06-17 19:23   ` Tim Lange
  2022-06-17 21:39     ` David Malcolm
  0 siblings, 1 reply; 17+ messages in thread
From: Tim Lange @ 2022-06-17 19:23 UTC (permalink / raw)
  To: Prathamesh Kulkarni; +Cc: David Malcolm, GCC Mailing List



On Fri, Jun 17 2022 at 22:45:42 +0530, Prathamesh Kulkarni 
<prathamesh.kulkarni@linaro.org> wrote:
> On Fri, 17 Jun 2022 at 21:25, Tim Lange <mail@tim-lange.me> wrote:
>> 
>>  Hi everyone,
> Hi Tim,
> Thanks for posting the POC patch!
> Just a couple of comments (inline)
Hi Prathamesh,
thanks for looking at it.
>> 
>>  tracked in PR105900 [0], I'd like to add support for a new warning on
>>  dubious allocation sizes. The new checker emits a warning when the
>>  allocation size is not a multiple of the type's size. With the checker,
>>  following mistakes are detected:
>>    int *arr = malloc(3); // forgot to multiply by sizeof
>>    arr[0] = ...;
>>    arr[1] = ...;
>>  or
>>    int *buf = malloc (n + sizeof(int)); // probably should be * instead of +
>>  Because it is implemented inside the analyzer, it also emits warnings
>>  when the buffer is first of type void* and later on casted to something
>>  else. Though, this also inherits a limitation. The checker can not
>>  distinguish 2 * sizeof(short) from sizeof(int) because sizeof is
>>  resolved and constants are folded at the point when the analyzer runs.
>>  As a mitigation, I plan to implement a check in the frontend that emits
>>  a warning if sizeof(lhs pointee type) is not part of the malloc
>>  argument.
> IMHO, warning if sizeof(lhs pointee_type) is not present inside
> malloc, might not be a good idea because it
> would reject valid calls to malloc.
> For eg:
> (1)
> size_t size = sizeof(int);
> int *p = malloc (size);
> 
> (2)
> void *p = malloc (sizeof(int));
> int *q = p;
Hm, that's right. Maybe only warn when there is a sizeof(type) in the 
argument and the lhs pointee_type != type (except for void*, maybe 
char* and "inherited" structs)?
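
To make that idea concrete, here is a rough sketch of the rule as a plain C
predicate. It is only a model of the suggestion above, not front-end code:
`would_warn` is a hypothetical name, types are represented as strings purely
for illustration, and the "inherited" struct exception is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the refined rule.  SIZEOF_TYPE is the type named
   in a sizeof inside the malloc argument, or NULL if the argument
   contains no sizeof (type).  POINTEE_TYPE is the pointee type of the
   left-hand side.  */
static bool
would_warn (const char *sizeof_type, const char *pointee_type)
{
  /* No sizeof (type) in the argument: nothing to compare against.  */
  if (sizeof_type == NULL)
    return false;

  /* void * and char * accept any size.  */
  if (strcmp (pointee_type, "void") == 0
      || strcmp (pointee_type, "char") == 0)
    return false;

  /* Warn only if the sizeof names a different type than the pointee.  */
  return strcmp (sizeof_type, pointee_type) != 0;
}
```

Under this model both of your examples stay quiet: in (1) the malloc argument
contains no sizeof, and in (2) the left-hand side is a void *.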
>> 
>>  I'm looking for a first feedback on the phrasing of the diagnostics as
>>  well on the preliminary patch [1].
>> 
>>  On constant buffer sizes, the warnings looks like this:
>>  warning: Allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
>>     22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
>>        | ^~~~~~~~~~~~~~~~~~~~~~~~~
>>    ‘test_2’: event 1
>>      |
>>      | 22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
>>      | | ^~~~~~~~~~~~~~~~~~~~~~~~~
>>      | | |
>>      | | (1) Casting a 14 byte buffer to ‘int *’ leaves 2 trailing bytes; either the allocated size is bogus or the type on the left-hand side is wrong
>>      |
>> 
>>  On symbolic buffer sizes:
>>  warning: Allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
>>     33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 } */
>>        | ^~~~~~~~~~~~~~~~~~~~~~~~
>>    ‘test_3’: event 1
>>      |
>>      | 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 } */
>>      | | ^~~~~~~~~~~~~~~~~~~~~~~~
>>      | | |
>>      | | (1) Allocation is incompatible with ‘int *’; either the allocated size is bogus or the type on the left-hand side is wrong
>>      |
> Won't the warning be incorrect if 'n' is a multiple of sizeof(int) ?
> I assume by symbolic buffer size, 'n' is not known at compile time.
* VLAs are resolved to n * sizeof(type) when the analyzer runs and work 
fine.
* Flows with if (cond) n = ...; else n = ...; are tracked by the 
analyzer with a widening_svalue and can be handled (While thinking 
about this answer, I noticed my patch is missing this case. Thanks!)
* In case of more complicated flows, the analyzer's buffer size 
tracking resorts to unknown_svalue. If any variable in an expression is 
unknown, no warning will be emitted.
* Generally, when requesting memory for a variable type, accepting an 
arbitrary number doesn't sound right. I do warn, e.g., if 'n' is a 
conjured_svalue (e.g. from a scanf call).
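
A small C sketch of the kinds of sizes I mean. The function names are made up
for illustration, and the comments describe what I would expect from the
checker based on the cases above, not verified output:

```c
#include <stdlib.h>

/* The size is n * sizeof (int), so it is a multiple of the pointee size
   by construction; no warning would be expected here.  */
int *
alloc_elems (size_t n)
{
  return malloc (n * sizeof (int));
}

/* Both branches assign a multiple of sizeof (int), so the merged value
   of n is still compatible with int *.  */
int *
alloc_branch (int cond)
{
  size_t n;
  if (cond)
    n = 4 * sizeof (int);
  else
    n = 8 * sizeof (int);
  return malloc (n);
}

/* n is added instead of multiplied, so the total is a multiple of
   sizeof (int) only by coincidence; this is the n + sizeof (int)
   pattern the warning is aimed at.  */
int *
alloc_sum (size_t n)
{
  return malloc (n + sizeof (int));
}
```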

I think only the last case could in theory be a false positive. I've 
noticed that this happens when 'n' is guarded by an if that makes sure 
n is a multiple of sizeof(type). In theory, I can fix this case 
too as the analysis is path-sensitive.
Do you know of some other case where 'n' might be an unknown value that 
is neither guarded by an if condition nor resolved to 'unknown' by a 
complicated flow but is still correct?
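
For reference, the guarded case I have in mind looks roughly like the sketch
below (`alloc_guarded` is just an illustrative name). The allocation is only
reached when n is a multiple of sizeof (int), so a warning on the malloc line
would be a false positive unless the guard is taken into account:

```c
#include <stdlib.h>

/* Returns NULL unless n is a multiple of sizeof (int).  On the path that
   reaches malloc, the buffer size is therefore compatible with int *.  */
int *
alloc_guarded (size_t n)
{
  if (n % sizeof (int) != 0)
    return NULL;
  return malloc (n);
}
```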

- Tim
> 
> Thanks,
> Prathamesh
>> 
>>  And this is how a simple flow looks like:
>>  warning: Allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
>>     39 | int *iptr = (int *)ptr; /* { dg-line assign } */
>>        | ^~~~
>>    ‘test_4’: events 1-2
>>      |
>>      | 38 | void *ptr = malloc (n * sizeof (short)); /* { dg-message } */
>>      | | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
>>      | | |
>>      | | (1) allocated here
>>      | 39 | int *iptr = (int *)ptr; /* { dg-line assign } */
>>      | | ~~~~
>>      | | |
>>      | | (2) ‘ptr’ is incompatible with ‘int *’; either the allocated size at (1) is bogus or the type on the left-hand side is wrong
>>      |
>> 
>>  There are some things to discuss from my side:
>>  * The tests with the "toy re-implementation of CPython's object
>>  model"[2] fail due to a extra warning emitted. Because the analyzer
>>  can't know the calculation actually results in a correct buffer size
>>  when viewed as a string_obj later on, it emits a warning, e.g. at 
>> line
>>  61 in data-model-5.c. The only mitigation would be to disable the
>>  warning for structs entirely. Now, the question is to rather have 
>> noise
>>  on these cases or disable the warning for structs entirely?
>>  * I'm unable to emit a warning whenever the cast happens at an
>>  assignment with a call as the rhs, e.g. test_1 in 
>> allocation-size-4.c.
>>  This is because I'm unable to access a region_svalue for the 
>> returned
>>  value. Even in the new_program_state, the svalue of the lhs is 
>> still a
>>  conjured_svalue. Maybe David can lead me to a place where I can 
>> access
>>  the return value's region_svalue or do I have to adapt the engine?
>>  * attr-malloc-6.c and pr96639.c did both contain structs without an
>>  implementation. Something in the analyzer must have triggered 
>> another
>>  warning about the usage of those without them having an 
>> implementation.
>>  I changed those structs to have an empty implementation, such that 
>> the
>>  additional warning are gone. I think this shouldn't change the test
>>  case, so is this change okay?
>> 
>>  - Tim
>> 
>>  [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105900
>>  [1] While all tests except the cpython ones work, I have yet to 
>> test it
>>  on large C projects
>>  [2] FAIL: gcc.dg/analyzer/data-model-5.c (test for excess errors)
>>      FAIL: gcc.dg/analyzer/data-model-5b.c (test for excess errors)
>>      FAIL: gcc.dg/analyzer/data-model-5c.c (test for excess errors)
>>      FAIL: gcc.dg/analyzer/data-model-5d.c (test for excess errors)
>>      FAIL: gcc.dg/analyzer/first-field-2.c (test for excess errors)
>> 
>>  -------
>> 
>>  Subject: [PATCH] analyzer: add allocation size warning
>> 
>>  This patch adds an allocation size checker to the analyzer.
>>  The checker warns when the tracked buffer size is not a multiple of 
>> the
>>  left-hand side pointee's type. This resolves PR analyzer/105900.
>> 
>>  The patch is not yet fully tested.
>> 
>>  gcc/analyzer/ChangeLog:
>> 
>>          * analyzer.opt: Add Wanalyzer-allocation-size.
>>          * sm-malloc.cc (class dubious_allocation_size): New
>>  pending_diagnostic subclass.
>>          (capacity_compatible_with_type): New.
>>          (const_operand_in_sval_p): New.
>>          (struct_or_union_with_inheritance_p): New.
>>          (check_capacity): New.
>>          (malloc_state_machine::on_stmt): Add calls to
>>  on_pointer_assignment.
>>          (malloc_state_machine::on_allocator_call): Add node to
>>  parameters and call to on_pointer_assignment.
>>          (malloc_state_machine::on_pointer_assignment): New.
>> 
>>  gcc/testsuite/ChangeLog:
>> 
>>          * gcc.dg/analyzer/attr-malloc-6.c: Disabled
>>  Wanalyzer-allocation-size and added default implementation for FILE.
>>          * gcc.dg/analyzer/capacity-1.c: Added dg directives.
>>          * gcc.dg/analyzer/malloc-4.c: Disabled
>>  Wanalyzer-allocation-size.
>>          * gcc.dg/analyzer/pr96639.c: Disabled 
>> Wanalyzer-allocation-size
>>  and added default implementation for foo and bar.
>>          * gcc.dg/analyzer/allocation-size-1.c: New test.
>>          * gcc.dg/analyzer/allocation-size-2.c: New test.
>>          * gcc.dg/analyzer/allocation-size-3.c: New test.
>>          * gcc.dg/analyzer/allocation-size-4.c: New test.
>> 
>>  Signed-off-by: Tim Lange <mail@tim-lange.me>
>>  ---
>>   gcc/analyzer/analyzer.opt | 4 +
>>   gcc/analyzer/sm-malloc.cc | 363 +++++++++++++++++-
>>   .../gcc.dg/analyzer/allocation-size-1.c | 54 +++
>>   .../gcc.dg/analyzer/allocation-size-2.c | 44 +++
>>   .../gcc.dg/analyzer/allocation-size-3.c | 48 +++
>>   .../gcc.dg/analyzer/allocation-size-4.c | 39 ++
>>   gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c | 2 +
>>   gcc/testsuite/gcc.dg/analyzer/capacity-1.c | 5 +-
>>   gcc/testsuite/gcc.dg/analyzer/malloc-4.c | 6 +-
>>   gcc/testsuite/gcc.dg/analyzer/pr96639.c | 2 +
>>   10 files changed, 559 insertions(+), 8 deletions(-)
>>   create mode 100644 
>> gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
>>   create mode 100644 
>> gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
>>   create mode 100644 
>> gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
>>   create mode 100644 
>> gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
>> 
>>  diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
>>  index 4aea52d3a87..f213989e0bb 100644
>>  --- a/gcc/analyzer/analyzer.opt
>>  +++ b/gcc/analyzer/analyzer.opt
>>  @@ -78,6 +78,10 @@ Wanalyzer-malloc-leak
>>   Common Var(warn_analyzer_malloc_leak) Init(1) Warning
>>   Warn about code paths in which a heap-allocated pointer leaks.
>> 
>>  +Wanalyzer-allocation-size
>>  +Common Var(warn_analyzer_allocation_size) Init(1) Warning
>>  +Warn about code paths in which a buffer is assigned to a 
>> incompatible
>>  type.
>>  +
>>   Wanalyzer-mismatching-deallocation
>>   Common Var(warn_analyzer_mismatching_deallocation) Init(1) Warning
>>   Warn about code paths in which the wrong deallocation function is
>>  called.
>>  diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
>>  index 3bd40425919..790c9f0e57d 100644
>>  --- a/gcc/analyzer/sm-malloc.cc
>>  +++ b/gcc/analyzer/sm-malloc.cc
>>  @@ -46,6 +46,8 @@ along with GCC; see the file COPYING3. If not see
>>   #include "attribs.h"
>>   #include "analyzer/function-set.h"
>>   #include "analyzer/program-state.h"
>>  +#include "print-tree.h"
>>  +#include "gimple-pretty-print.h"
>> 
>>   #if ENABLE_ANALYZER
>> 
>>  @@ -428,6 +430,7 @@ private:
>>     get_or_create_deallocator (tree deallocator_fndecl);
>> 
>>     void on_allocator_call (sm_context *sm_ctxt,
>>  + const supernode *node,
>>        const gcall *call,
>>        const deallocator_set *deallocators,
>>        bool returns_nonnull = false) const;
>>  @@ -444,6 +447,16 @@ private:
>>     void on_realloc_call (sm_context *sm_ctxt,
>>      const supernode *node,
>>      const gcall *call) const;
>>  + void on_pointer_assignment(sm_context *sm_ctxt,
>>  + const supernode *node,
>>  + const gassign *assign_stmt,
>>  + tree lhs,
>>  + tree rhs) const;
>>  + void on_pointer_assignment(sm_context *sm_ctxt,
>>  + const supernode *node,
>>  + const gcall *call,
>>  + tree lhs,
>>  + tree rhs) const;
>>     void on_zero_assignment (sm_context *sm_ctxt,
>>         const gimple *stmt,
>>         tree lhs) const;
>>  @@ -1432,6 +1445,117 @@ private:
>>     const char *m_funcname;
>>   };
>> 
>>  +/* Concrete subclass for casts of pointers that lead to trailing
>>  bytes. */
>>  +
>>  +class dubious_allocation_size : public malloc_diagnostic
>>  +{
>>  +public:
>>  + dubious_allocation_size (const malloc_state_machine &sm, tree lhs,
>>  tree rhs,
>>  + tree size_tree, unsigned HOST_WIDE_INT size_diff)
>>  + : malloc_diagnostic(sm, rhs),
>>  m_type(dubious_allocation_type::CONSTANT_SIZE),
>>  + m_lhs(lhs), m_size_tree(size_tree), m_size_diff(size_diff)
>>  + {}
>>  +
>>  + dubious_allocation_size (const malloc_state_machine &sm, tree lhs,
>>  tree rhs,
>>  + tree size_tree)
>>  + : malloc_diagnostic(sm, rhs),
>>  m_type(dubious_allocation_type::MISSING_OPERAND),
>>  + m_lhs(lhs), m_size_tree(size_tree), m_size_diff(0)
>>  + {}
>>  +
>>  + const char *get_kind () const final override
>>  + {
>>  + return "dubious_allocation_size";
>>  + }
>>  +
>>  + int get_controlling_option () const final override
>>  + {
>>  + return OPT_Wanalyzer_allocation_size;
>>  + }
>>  +
>>  + bool subclass_equal_p (const pending_diagnostic &base_other) const
>>  + final override
>>  + {
>>  + const dubious_allocation_size &other = (const 
>> dubious_allocation_size
>>  &)base_other;
>>  + return malloc_diagnostic::subclass_equal_p(other)
>>  + && m_type == other.m_type
>>  + && same_tree_p (m_lhs, other.m_lhs)
>>  + && same_tree_p (m_size_tree, other.m_size_tree)
>>  + && m_size_diff == other.m_size_diff;
>>  + }
>>  +
>>  + bool emit (rich_location *rich_loc) final override
>>  + {
>>  + diagnostic_metadata m;
>>  + m.add_cwe (131);
>>  + return warning_meta (rich_loc, m, get_controlling_option (),
>>  + "Allocated buffer size is not a multiple of the pointee's size");
>>  + }
>>  +
>>  + label_text describe_state_change (const evdesc::state_change 
>> &change)
>>  + override
>>  + {
>>  + if (change.m_old_state == m_sm.get_start_state ()
>>  + && unchecked_p (change.m_new_state))
>>  + {
>>  + m_alloc_event = change.m_event_id;
>>  + if (m_type == dubious_allocation_type::CONSTANT_SIZE)
>>  + {
>>  + // TODO: verify that it's the allocation stmt, not a copy
>>  + return change.formatted_print ("%E bytes allocated here",
>>  + m_size_tree);
>>  + }
>>  + }
>>  + return malloc_diagnostic::describe_state_change (change);
>>  + }
>>  +
>>  + label_text describe_final_event (const evdesc::final_event &ev) 
>> final
>>  override
>>  + {
>>  + if (m_type == dubious_allocation_type::CONSTANT_SIZE)
>>  + {
>>  + if (m_alloc_event.known_p ())
>>  + return ev.formatted_print (
>>  + "Casting %qE to %qT leaves %wu trailing bytes; either the"
>>  + " allocated size is bogus or the type on the left-hand side is"
>>  + " wrong",
>>  + m_arg, TREE_TYPE (m_lhs), m_size_diff);
>>  + else
>>  + return ev.formatted_print (
>>  + "Casting a %E byte buffer to %qT leaves %wu trailing bytes; 
>> either"
>>  + " the allocated size is bogus or the type on the left-hand side 
>> is"
>>  + " wrong",
>>  + m_size_tree, TREE_TYPE (m_lhs), m_size_diff);
>>  + }
>>  + else if (m_type == dubious_allocation_type::MISSING_OPERAND)
>>  + {
>>  + if (m_alloc_event.known_p ())
>>  + return ev.formatted_print (
>>  + "%qE is incompatible with %qT; either the allocated size at %@ is"
>>  + " bogus or the type on the left-hand side is wrong",
>>  + m_arg, TREE_TYPE (m_lhs), &m_alloc_event);
>>  + else
>>  + return ev.formatted_print (
>>  + "Allocation is incompatible with %qT; either the allocated size 
>> is"
>>  + " bogus or the type on the left-hand side is wrong",
>>  + TREE_TYPE (m_lhs));
>>  + }
>>  +
>>  + gcc_unreachable ();
>>  + return label_text ();
>>  + }
>>  +
>>  +private:
>>  + enum dubious_allocation_type {
>>  + CONSTANT_SIZE,
>>  + MISSING_OPERAND
>>  + };
>>  +
>>  + dubious_allocation_type m_type;
>>  + diagnostic_event_id_t m_alloc_event;
>>  + tree m_lhs;
>>  + tree m_size_tree;
>>  + unsigned HOST_WIDE_INT m_size_diff;
>>  +};
>>  +
>>   /* struct allocation_state : public state_machine::state. */
>> 
>>   /* Implementation of state_machine::state::dump_to_pp vfunc
>>  @@ -1633,6 +1757,160 @@ known_allocator_p (const_tree fndecl, const
>>  gcall *call)
>>     return false;
>>   }
>> 
>>  +/* Returns the trailing bytes on dubious allocation sizes. */
>>  +
>>  +static unsigned HOST_WIDE_INT
>>  +capacity_compatible_with_type (tree cst, tree pointee_size_tree)
>>  +{
>>  + unsigned HOST_WIDE_INT pointee_size = TREE_INT_CST_LOW
>>  (pointee_size_tree);
>>  + if (pointee_size == 0)
>>  + return 0;
>>  + unsigned HOST_WIDE_INT alloc_size = TREE_INT_CST_LOW (cst);
>>  +
>>  + return alloc_size % pointee_size;
>>  +}
>>  +
>>  +/* Returns true if there is a constant tree with
>>  + the same constant value inside the sval. */
>>  +
>>  +static bool
>>  +const_operand_in_sval_p (const svalue *sval, tree size_cst)
>>  +{
>>  + auto_vec<const svalue *> non_mult_expr;
>>  + auto_vec<const svalue *> worklist;
>>  + worklist.safe_push(sval);
>>  + while (!worklist.is_empty())
>>  + {
>>  + const svalue *curr = worklist.pop ();
>>  + curr = curr->unwrap_any_unmergeable ();
>>  +
>>  + switch (curr->get_kind())
>>  + {
>>  + default:
>>  + break;
>>  + case svalue_kind::SK_CONSTANT:
>>  + {
>>  + const constant_svalue *cst_sval = curr->dyn_cast_constant_svalue 
>> ();
>>  + unsigned HOST_WIDE_INT sval_int
>>  + = TREE_INT_CST_LOW (cst_sval->get_constant ());
>>  + unsigned HOST_WIDE_INT size_cst_int = TREE_INT_CST_LOW (size_cst);
>>  + if (sval_int % size_cst_int == 0)
>>  + return true;
>>  + }
>>  + break;
>>  + case svalue_kind::SK_BINOP:
>>  + {
>>  + const binop_svalue *b_sval = curr->dyn_cast_binop_svalue ();
>>  + if (b_sval->get_op () == MULT_EXPR)
>>  + {
>>  + worklist.safe_push (b_sval->get_arg0 ());
>>  + worklist.safe_push (b_sval->get_arg1 ());
>>  + }
>>  + else
>>  + {
>>  + non_mult_expr.safe_push (b_sval->get_arg0 ());
>>  + non_mult_expr.safe_push (b_sval->get_arg1 ());
>>  + }
>>  + }
>>  + break;
>>  + case svalue_kind::SK_UNARYOP:
>>  + {
>>  + const unaryop_svalue *un_sval = curr->dyn_cast_unaryop_svalue ();
>>  + worklist.safe_push (un_sval->get_arg ());
>>  + }
>>  + break;
>>  + case svalue_kind::SK_UNKNOWN:
>>  + return true;
>>  + }
>>  + }
>>  +
>>  + /* Each expr should be a multiple of the size.
>>  + E.g. used to catch n + sizeof(int) errors. */
>>  + bool reduce = !non_mult_expr.is_empty ();
>>  + while (!non_mult_expr.is_empty() && reduce)
>>  + {
>>  + const svalue *expr_sval = non_mult_expr.pop ();
>>  + reduce &= const_operand_in_sval_p (expr_sval, size_cst);
>>  + }
>>  + return reduce;
>>  +}
>>  +
>>  +/* Returns true iff the type is a struct with another struct 
>> inside. */
>>  +
>>  +static bool
>>  +struct_or_union_with_inheritance_p (tree type)
>>  +{
>>  + if (!RECORD_OR_UNION_TYPE_P (type))
>>  + return false;
>>  +
>>  + for (tree f = TYPE_FIELDS (type); f; f = TREE_CHAIN (f))
>>  + if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (f)))
>>  + return true;
>>  +
>>  + return false;
>>  +}
>>  +
>>  +static void
>>  +check_capacity (sm_context *sm_ctxt,
>>  + const malloc_state_machine &sm,
>>  + const supernode *node,
>>  + const gimple *stmt,
>>  + tree lhs,
>>  + tree rhs,
>>  + const svalue *capacity)
>>  +{
>>  + tree pointer_type = TREE_TYPE (lhs);
>>  + gcc_assert (TREE_CODE (pointer_type) == POINTER_TYPE);
>>  +
>>  + tree pointee_type = TREE_TYPE (pointer_type);
>>  + /* void * is always compatible. */
>>  + if (TREE_CODE (pointee_type) == VOID_TYPE)
>>  + return;
>>  +
>>  + if (struct_or_union_with_inheritance_p (pointee_type))
>>  + return;
>>  +
>>  + tree pointee_size_tree = size_in_bytes(pointee_type);
>>  + /* The size might be unknown e.g. being a array with n elements
>>  + or casting to char * never has any trailing bytes. */
>>  + if (TREE_CODE (pointee_size_tree) != INTEGER_CST
>>  + || TREE_INT_CST_LOW (pointee_size_tree) == 1)
>>  + return;
>>  +
>>  + switch (capacity->get_kind ())
>>  + {
>>  + default:
>>  + break;
>>  + case svalue_kind::SK_CONSTANT:
>>  + {
>>  + const constant_svalue *cst_sval = 
>> capacity->dyn_cast_constant_svalue
>>  ();
>>  + tree cst = cst_sval->get_constant ();
>>  + unsigned HOST_WIDE_INT size_diff
>>  + = capacity_compatible_with_type (cst, pointee_size_tree);
>>  + if (size_diff != 0)
>>  + {
>>  + tree diag_arg = sm_ctxt->get_diagnostic_tree (rhs);
>>  + sm_ctxt->warn (node, stmt, diag_arg,
>>  + new dubious_allocation_size (sm, lhs, diag_arg,
>>  + cst, size_diff));
>>  + }
>>  + }
>>  + break;
>>  + case svalue_kind::SK_BINOP:
>>  + case svalue_kind::SK_UNARYOP:
>>  + {
>>  + if (!const_operand_in_sval_p (capacity, pointee_size_tree))
>>  + {
>>  + tree diag_arg = sm_ctxt->get_diagnostic_tree (rhs);
>>  + sm_ctxt->warn (node, stmt, diag_arg,
>>  + new dubious_allocation_size (sm, lhs, diag_arg,
>>  + pointee_size_tree));
>>  + }
>>  + }
>>  + break;
>>  + }
>>  +}
>>  +
>>   /* Implementation of state_machine::on_stmt vfunc for
>>  malloc_state_machine. */
>> 
>>   bool
>>  @@ -1645,14 +1923,14 @@ malloc_state_machine::on_stmt (sm_context
>>  *sm_ctxt,
>>         {
>>    if (known_allocator_p (callee_fndecl, call))
>>      {
>>  - on_allocator_call (sm_ctxt, call, &m_free);
>>  + on_allocator_call (sm_ctxt, node, call, &m_free);
>>        return true;
>>      }
>> 
>>    if (is_named_call_p (callee_fndecl, "operator new", call, 1))
>>  - on_allocator_call (sm_ctxt, call, &m_scalar_delete);
>>  + on_allocator_call (sm_ctxt, node, call, &m_scalar_delete);
>>    else if (is_named_call_p (callee_fndecl, "operator new []", call, 
>> 1))
>>  - on_allocator_call (sm_ctxt, call, &m_vector_delete);
>>  + on_allocator_call (sm_ctxt, node, call, &m_vector_delete);
>>    else if (is_named_call_p (callee_fndecl, "operator delete", call, 
>> 1)
>>      || is_named_call_p (callee_fndecl, "operator delete", call, 2))
>>      {
>>  @@ -1707,7 +1985,7 @@ malloc_state_machine::on_stmt (sm_context
>>  *sm_ctxt,
>>        tree attrs = TYPE_ATTRIBUTES (TREE_TYPE (callee_fndecl));
>>        bool returns_nonnull
>>          = lookup_attribute ("returns_nonnull", attrs);
>>  - on_allocator_call (sm_ctxt, call, deallocators, returns_nonnull);
>>  + on_allocator_call (sm_ctxt, node, call, deallocators,
>>  returns_nonnull);
>>      }
>> 
>>    /* Handle "__attribute__((nonnull))". */
>>  @@ -1763,12 +2041,31 @@ malloc_state_machine::on_stmt (sm_context
>>  *sm_ctxt,
>>          = mutable_this->get_or_create_deallocator (callee_fndecl);
>>        on_deallocator_call (sm_ctxt, node, call, d, dealloc_argno);
>>      }
>>  +
>>  + /* Handle returns from function calls. */
>>  + tree lhs = gimple_call_lhs (call);
>>  + if (lhs && TREE_CODE (TREE_TYPE (lhs)) == POINTER_TYPE
>>  + && TREE_CODE (gimple_call_return_type (call)) == POINTER_TYPE)
>>  + on_pointer_assignment (sm_ctxt, node, call, lhs,
>>  + gimple_call_fn (call));
>>         }
>> 
>>     if (tree lhs = sm_ctxt->is_zero_assignment (stmt))
>>       if (any_pointer_p (lhs))
>>         on_zero_assignment (sm_ctxt, stmt,lhs);
>> 
>>  + /* Handle pointer assignments/casts for dubious allocation size. 
>> */
>>  + if (const gassign *assign_stmt = dyn_cast <const gassign *> 
>> (stmt))
>>  + {
>>  + if (gimple_num_ops (stmt) == 2)
>>  + {
>>  + tree lhs = gimple_assign_lhs (assign_stmt);
>>  + tree rhs = gimple_assign_rhs1 (assign_stmt);
>>  + if (any_pointer_p (lhs) && any_pointer_p (rhs))
>>  + on_pointer_assignment (sm_ctxt, node, assign_stmt, lhs, rhs);
>>  + }
>>  + }
>>  +
>>     /* Handle dereferences. */
>>     for (unsigned i = 0; i < gimple_num_ops (stmt); i++)
>>       {
>>  @@ -1818,6 +2115,7 @@ malloc_state_machine::on_stmt (sm_context
>>  *sm_ctxt,
>> 
>>   void
>>   malloc_state_machine::on_allocator_call (sm_context *sm_ctxt,
>>  + const supernode *node,
>>         const gcall *call,
>>         const deallocator_set *deallocators,
>>         bool returns_nonnull) const
>>  @@ -1830,6 +2128,9 @@ malloc_state_machine::on_allocator_call
>>  (sm_context *sm_ctxt,
>>        (returns_nonnull
>>         ? deallocators->m_nonnull
>>         : deallocators->m_unchecked));
>>  +
>>  + if (TREE_CODE (TREE_TYPE (lhs)) == POINTER_TYPE)
>>  + on_pointer_assignment (sm_ctxt, node, call, lhs, gimple_call_fn
>>  (call));
>>       }
>>     else
>>       {
>>  @@ -1968,6 +2269,60 @@ malloc_state_machine::on_realloc_call
>>  (sm_context *sm_ctxt,
>>       }
>>   }
>> 
>>  +/* Handle assignments between two pointers.
>>  + Check for dubious allocation sizes.
>>  +*/
>>  +
>>  +void
>>  +malloc_state_machine::on_pointer_assignment (sm_context *sm_ctxt,
>>  + const supernode *node,
>>  + const gassign *assign_stmt,
>>  + tree lhs,
>>  + tree rhs) const
>>  +{
>>  + /* Do not warn if lhs and rhs are of the same type to not emit
>>  duplicate
>>  + warnings on assignments after the cast. */
>>  + if (pending_diagnostic::same_tree_p (TREE_TYPE (lhs), TREE_TYPE
>>  (rhs)))
>>  + return;
>>  +
>>  + const program_state *state = sm_ctxt->get_old_program_state ();
>>  + const svalue *r_value = state->m_region_model->get_rvalue (rhs, 
>> NULL);
>>  + if (const region_svalue *reg = dyn_cast <const region_svalue *>
>>  (r_value))
>>  + {
>>  + const svalue *capacity = state->m_region_model->get_capacity
>>  + (reg->get_pointee ());
>>  + check_capacity(sm_ctxt, *this, node, assign_stmt, lhs, rhs, 
>> capacity);
>>  + }
>>  +}
>>  +
>>  +void
>>  +malloc_state_machine::on_pointer_assignment (sm_context *sm_ctxt,
>>  + const supernode *node,
>>  + const gcall *call,
>>  + tree lhs,
>>  + tree fn_decl) const
>>  +{
>>  + /* Do not warn if lhs and rhs are of the same type to not emit
>>  duplicate
>>  + warnings on assignments after the cast. */
>>  + if (pending_diagnostic::same_tree_p
>>  + (TREE_TYPE (lhs), TREE_TYPE (gimple_call_return_type (call))))
>>  + return;
>>  +
>>  + const program_state *state = sm_ctxt->get_new_program_state ();
>>  + const svalue *r_value = state->m_region_model->get_rvalue (lhs, 
>> NULL);
>>  + if (const region_svalue *reg = dyn_cast <const region_svalue *>
>>  (r_value))
>>  + {
>>  + const svalue *capacity = state->m_region_model->get_capacity
>>  + (reg->get_pointee ());
>>  + check_capacity (sm_ctxt, *this, node, call, lhs, fn_decl, 
>> capacity);
>>  + }
>>  + else if (const conjured_svalue *con
>>  + = dyn_cast <const conjured_svalue *> (r_value))
>>  + {
>>  + // FIXME: How to get a region_svalue?
>>  + }
>>  +}
>>  +
>>   /* Implementation of state_machine::on_phi vfunc for
>>  malloc_state_machine. */
>> 
>>   void
>>  diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
>>  b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
>>  new file mode 100644
>>  index 00000000000..5403c5f41f1
>>  --- /dev/null
>>  +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
>>  @@ -0,0 +1,54 @@
>>  +#include <stdlib.h>
>>  +
>>  +/* Tests with constant buffer sizes */
>>  +
>>  +void test_1 (void)
>>  +{
>>  + short *ptr = malloc (21 * sizeof(short));
>>  + free (ptr);
>>  +}
>>  +
>>  +void test_2 (void)
>>  +{
>>  + int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc } */
>>  + free (ptr);
>>  +
>>  + /* { dg-warning "Allocated buffer size is not a multiple of the
>>  pointee's size" "" { target *-*-* } malloc } */
>>  + /* { dg-message "\\(1\\) Casting a 42 byte buffer to 'int \\*' 
>> leaves
>>  2 trailing bytes" "" { target *-*-* } malloc } */
>>  +}
>>  +
>>  +void test_3 (void)
>>  +{
>>  + void *ptr = malloc (21 * sizeof (short));
>>  + short *sptr = (short *)ptr;
>>  + free (sptr);
>>  +}
>>  +
>>  +void test_4 (void)
>>  +{
>>  + void *ptr = malloc (21 * sizeof (short)); /* { dg-message } */
>>  + int *iptr = (int *)ptr; /* { dg-line assign } */
>>  + free (iptr);
>>  +
>>  + /* { dg-warning "Allocated buffer size is not a multiple of the
>>  pointee's size" "" { target *-*-* } assign } */
>>  + /* { dg-message "\\(2\\) Casting 'ptr' to 'int \\*' leaves 2 
>> trailing
>>  bytes" "" { target *-*-* } assign } */
>>  +}
>>  +
>>  +struct s {
>>  + int i;
>>  +};
>>  +
>>  +void test_5 (void)
>>  +{
>>  + struct s *ptr = malloc (5 * sizeof (struct s));
>>  + free (ptr);
>>  +}
>>  +
>>  +void test_6 (void)
>>  +{
>>  + long *ptr = malloc (5 * sizeof (struct s)); /* { dg-line malloc6 
>> } */
>>  + free (ptr);
>>  +
>>  + /* { dg-warning "" "" { target *-*-* } malloc6 } */
>>  + /* { dg-message "" "" { target *-*-* } malloc6 } */
>>  +}
>>  diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
>>  b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
>>  new file mode 100644
>>  index 00000000000..e66d2793f13
>>  --- /dev/null
>>  +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
>>  @@ -0,0 +1,44 @@
>>  +#include <stdlib.h>
>>  +#include <stdio.h>
>>  +
>>  +/* Tests with symbolic buffer sizes */
>>  +
>>  +void test_1 (void)
>>  +{
>>  + int n;
>>  + scanf("%i", &n);
>>  + short *ptr = malloc (n * sizeof(short));
>>  + free (ptr);
>>  +}
>>  +
>>  +void test_2 (void)
>>  +{
>>  + int n;
>>  + scanf("%i", &n);
>>  + int *ptr = malloc (n * sizeof (short)); /* { dg-line malloc } */
>>  + free (ptr);
>>  +
>>  + /* { dg-warning "Allocated buffer size is not a multiple of the
>>  pointee's size" "" { target *-*-* } malloc } */
>>  + /* { dg-message "\\(1\\) Allocation is incompatible with 'int 
>> \\*'"
>>  "" { target *-*-* } malloc } */
>>  +}
>>  +
>>  +void test_3 (void)
>>  +{
>>  + int n;
>>  + scanf("%i", &n);
>>  + void *ptr = malloc (n * sizeof (short));
>>  + short *sptr = (short *)ptr;
>>  + free (sptr);
>>  +}
>>  +
>>  +void test_4 (void)
>>  +{
>>  + int n;
>>  + scanf("%i", &n);
>>  + void *ptr = malloc (n * sizeof (short)); /* { dg-message } */
>>  + int *iptr = (int *)ptr; /* { dg-line assign } */
>>  + free (iptr);
>>  +
>>  + /* { dg-warning "Allocated buffer size is not a multiple of the
>>  pointee's size" "" { target *-*-* } assign } */
>>  + /* { dg-message "\\(2\\) 'ptr' is incompatible with 'int \\*'; 
>> either
>>  the allocated size at \\(1\\)" "" { target *-*-* } assign } */
>>  +}
>>  diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
>>  b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
>>  new file mode 100644
>>  index 00000000000..dafc0e73c63
>>  --- /dev/null
>>  +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
>>  @@ -0,0 +1,48 @@
>>  +#include <stdlib.h>
>>  +#include <stdio.h>
>>  +
>>  +/* CWE-131 example 5 */
>>  +void test_1(void)
>>  +{
>>  + int *id_sequence = (int *) malloc (3); /* { dg-line malloc1 } */
>>  + if (id_sequence == NULL) exit (1);
>>  +
>>  + id_sequence[0] = 13579;
>>  + id_sequence[1] = 24680;
>>  + id_sequence[2] = 97531;
>>  +
>>  + free (id_sequence);
>>  +
>>  + /* { dg-warning "" "" { target *-*-* } malloc1 } */
>>  + /* { dg-message "" "" { target *-*-* } malloc1 } */
>>  +}
>>  +
>>  +void test_2(void)
>>  +{
>>  + int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
>>  + free (ptr);
>>  +
>>  + /* { dg-warning "" "" { target *-*-* } malloc2 } */
>>  + /* { dg-message "" "" { target *-*-* } malloc2 } */
>>  +}
>>  +
>>  +void test_3(void)
>>  +{
>>  + int n;
>>  + scanf("%i", &n);
>>  + int *ptr = malloc (n + sizeof (int)); /* { dg-line malloc3 } */
>>  + free (ptr);
>>  +
>>  + /* { dg-warning "" "" { target *-*-* } malloc3 } */
>>  + /* { dg-message "" "" { target *-*-* } malloc3 } */
>>  +}
>>  +
>>  +void test_4(void)
>>  +{
>>  + int n;
>>  + scanf("%i", &n);
>>  + int m;
>>  + scanf("%i", &m);
>>  + int *ptr = malloc ((n + m) * sizeof (int));
>>  + free (ptr);
>>  +}
>>  diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
>>  b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
>>  new file mode 100644
>>  index 00000000000..4c2b31d6e0a
>>  --- /dev/null
>>  +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
>>  @@ -0,0 +1,39 @@
>>  +#include <stddef.h>
>>  +#include <stdlib.h>
>>  +
>>  +/* Flow warnings */
>>  +
>>  +void *create_buffer(int n)
>>  +{
>>  + return malloc(n);
>>  +}
>>  +
>>  +void test_1(void)
>>  +{
>>  + // FIXME
>>  + int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* 
>> } }
>>  */
>>  + free (buf);
>>  +}
>>  +
>>  +void test_2(void)
>>  +{
>>  + void *buf = create_buffer(42); /* { dg-message } */
>>  + int *ibuf = buf; /* { dg-line assign2 } */
>>  + free (ibuf);
>>  +
>>  + /* { dg-warning "" "" { target *-*-* } assign2 } */
>>  + /* { dg-message "" "" { target *-*-* } assign2 } */
>>  +}
>>  +
>>  +void test_3(void)
>>  +{
>>  + void *buf = malloc(42); /* { dg-message } */
>>  + if (buf != NULL) /* { dg-message } */
>>  + {
>>  + int *ibuf = buf; /* { dg-line assign3 } */
>>  + free (ibuf);
>>  + }
>>  +
>>  + /* { dg-warning "" "" { target *-*-* } assign3 } */
>>  + /* { dg-message "" "" { target *-*-* } assign3 } */
>>  +}
>>  diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
>>  b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
>>  index bd28107d0d7..809ee88cf07 100644
>>  --- a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
>>  +++ b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
>>  @@ -1,7 +1,9 @@
>>  +/* { dg-additional-options -Wno-analyzer-allocation-size } */
>>   /* Adapted from gcc.dg/Wmismatched-dealloc.c. */
>> 
>>   #define A(...) __attribute__ ((malloc (__VA_ARGS__)))
>> 
>>  +struct FILE {};
>>   typedef struct FILE FILE;
>>   typedef __SIZE_TYPE__ size_t;
>> 
>>  diff --git a/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
>>  b/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
>>  index 2d124833296..94f569e390b 100644
>>  --- a/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
>>  +++ b/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
>>  @@ -89,8 +89,11 @@ struct s
>>   static struct s * __attribute__((noinline))
>>   alloc_s (size_t num)
>>   {
>>  - struct s *p = malloc (sizeof(struct s) + num);
>>  + struct s *p = malloc (sizeof(struct s) + num); /* { dg-line 
>> malloc }
>>  */
>>     return p;
>>  +
>>  + /* { dg-warning "" "" { target *-*-* } malloc } */
>>  + /* { dg-message "" "" { target *-*-* } malloc } */
>>   }
>> 
>>   struct s *
>>  diff --git a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
>>  b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
>>  index 908bb28ee50..0ca94250ba2 100644
>>  --- a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
>>  +++ b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
>>  @@ -1,9 +1,9 @@
>>  -/* { dg-additional-options "-Wno-incompatible-pointer-types" } */
>>  +/* { dg-additional-options "-Wno-incompatible-pointer-types
>>  -Wno-analyzer-allocation-size" } */
>> 
>>   #include <stdlib.h>
>> 
>>  -struct foo;
>>  -struct bar;
>>  +struct foo {};
>>  +struct bar {};
>>   void *hv (struct foo **tm)
>>   {
>>     void *p = __builtin_malloc (4);
>>  diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96639.c
>>  b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
>>  index 02ca3f084a2..6f365c3cb5d 100644
>>  --- a/gcc/testsuite/gcc.dg/analyzer/pr96639.c
>>  +++ b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
>>  @@ -1,3 +1,5 @@
>>  +/* { dg-additional-options -Wno-analyzer-allocation-size } */
>>  +
>>   void *calloc (__SIZE_TYPE__, __SIZE_TYPE__);
>> 
>>   int
>>  --
>>  2.36.1
>> 
>> 
>> 




* Re: [RFC] analyzer: allocation size warning
  2022-06-17 17:48 ` David Malcolm
@ 2022-06-17 20:23   ` Tim Lange
  2022-06-17 22:13     ` David Malcolm
  0 siblings, 1 reply; 17+ messages in thread
From: Tim Lange @ 2022-06-17 20:23 UTC (permalink / raw)
  To: David Malcolm; +Cc: GCC Mailing List

On Fri, Jun 17, 2022 at 01:48:09PM -0400, David Malcolm wrote:
> On Fri, 2022-06-17 at 17:54 +0200, Tim Lange wrote:
> > Hi everyone,
> 
> Hi Tim.
> 
> Thanks for the patch.
> 
> Various comments inline below, throughout...
> 
> > 
> > tracked in PR105900 [0], I'd like to add support for a new warning on
> > dubious allocation sizes. The new checker emits a warning when the 
> > allocation size is not a multiple of the type's size. With the checker,
> > following mistakes are detected:
> >   int *arr = malloc(3); // forgot to multiply by sizeof
> >   arr[0] = ...;
> >   arr[1] = ...;
> > or
> >   int *buf = malloc (n + sizeof(int)); // probably should be * instead 
> > of +
> > Because it is implemented inside the analyzer, it also emits warnings
> > when the buffer is first of type void* and later on casted to something
> > else. Though, this also inherits a limitation. The checker can not 
> > distinguish 2 * sizeof(short) from sizeof(int) because sizeof is 
> > resolved and constants are folded at the point when the analyzer runs. 
> > As a mitigation, I plan to implement a check in the frontend that emits
> > a warning if sizeof(lhs pointee type) is not part of the malloc 
> > argument.
> > 
> > I'm looking for a first feedback on the phrasing of the diagnostics as 
> > well on the preliminary patch [1].
> > 
> > On constant buffer sizes, the warnings looks like this:
> > warning: Allocated buffer size is not a multiple of the pointee's size 
> > [CWE-131] [-Wanalyzer-allocation-size]
> >    22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
> >       | ^~~~~~~~~~~~~~~~~~~~~~~~~
> >   ‘test_2’: event 1
> >     |
> >     | 22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 }
> > */
> >     | | ^~~~~~~~~~~~~~~~~~~~~~~~~
> >     | | |
> >     | | (1) Casting a 14 byte buffer to ‘int *’ leaves 2 trailing 
> > bytes; either the allocated size is bogus or the type on the left-hand 
> > side is wrong
> >     |
> 
> Something strange seems to have happened with the indentation in your
> email; the code in the patch seems to me to be strangely indented, and
> looking at the archive here:
>   https://gcc.gnu.org/pipermail/gcc/2022-June/238907.html
> I see the same thing, so I think it's a problem with what the mailing
> list received, rather than just in my mail client.  Maybe something 
> 
> FWIW I normally use "git send-email" to send patches.
> 
> The underlinings in the above look strange; I see this in your email:

I have resent the patch using git send-email as a reply to my original message. 
The new message looks properly formatted in the archive:
    https://gcc.gnu.org/pipermail/gcc/2022-June/238911.html

> 
> warning: Allocated buffer size is not a multiple of the pointee's size 
> [CWE-131] [-Wanalyzer-allocation-size]
>    22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
>       | ^~~~~~~~~~~~~~~~~~~~~~~~~
>   ‘test_2’: event 1
>     |
>     | 22 | int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } 
> */
>     | | ^~~~~~~~~~~~~~~~~~~~~~~~~
>     | | |
>     | | (1) Casting a 14 byte buffer to ‘int *’ leaves 2 trailing 
> bytes; either the allocated size is bogus or the type on the left-hand 
> side is wrong
>     |
> 
> Should it have been (omitting the dg-line directives for clarity):
> 
> warning: Allocated buffer size is not a multiple of the pointee's size  [CWE-131] [-Wanalyzer-allocation-size]
>    22 | int *ptr = malloc (10 + sizeof(int));
>       |            ^~~~~~~~~~~~~~~~~~~~~~~~~
>   ‘test_2’: event 1
>     |
>     | 22 | int *ptr = malloc (10 + sizeof(int));
>     |    |            ^~~~~~~~~~~~~~~~~~~~~~~~~
>     |    |            |
>     |    |            (1) Casting a 14 byte buffer to ‘int *’ leaves 2 trailing bytes; either the allocated size is bogus or the type on the left-hand side is wrong
>     |
> 
> ?
> 
> It looks like something somewhere has collapsed repeated whitespace in
> the message down to single spaces, which has broken the ASCII art in
> your examples, and the indentation in your code.
> 
> 
> It would probably be helpful for the message to tell the user what
> sizeof(*ptr) is,  sizeof(int) in this case (much more helpful when it's
> a struct)
> 
> Maybe something like:
> 
> note: a buffer of 14 bytes is allocated...
> note: ...but sizeof (int) is 4 bytes...
> note: ...leaving 2 trailing bytes for an array of 3 'int's (which would
> occupy 12 bytes)
> 
> or somesuch???
> 
> I'm brainstorming here, my ideas above aren't necessarily good. 
> Sometimes it's good to chop up messages like this, to minimize
> combinatorial explosion for all the different cases.
> 
> 
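
The arithmetic behind the three suggested notes can be sketched in plain C.
This is only an illustration, not code from the patch: split_allocation and
struct size_split are hypothetical names, and the sizes are passed in as
plain numbers rather than taken from the analyzer's svalues.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: describes how an
   allocation of ALLOC_SIZE bytes splits into whole elements of
   ELEM_SIZE bytes plus leftover bytes.  */
struct size_split
{
  size_t n_elems;   /* whole elements that fit */
  size_t used;      /* bytes occupied by those elements */
  size_t trailing;  /* leftover bytes after the last whole element */
};

static struct size_split
split_allocation (size_t alloc_size, size_t elem_size)
{
  struct size_split s = { 0, 0, 0 };

  /* A zero-sized element never leaves trailing bytes; report nothing.  */
  if (elem_size == 0)
    return s;

  s.n_elems = alloc_size / elem_size;
  s.used = s.n_elems * elem_size;
  s.trailing = alloc_size % elem_size;

  /* Sanity check: the two parts always add up to the allocation.  */
  assert (s.used + s.trailing == alloc_size);
  return s;
}
```

For a 14 byte buffer and a 4 byte element this yields 3 elements, 12 bytes
used and 2 trailing bytes, which are the numbers in the notes above.
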
> 
> On symbolic buffer sizes:
> warning: Allocated buffer size is not a multiple of the pointee's size 
> [CWE-131] [-Wanalyzer-allocation-size]
>    33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 } */
>       | ^~~~~~~~~~~~~~~~~~~~~~~~
>   ‘test_3’: event 1
>     |
>     | 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 } 
> */
>     | | ^~~~~~~~~~~~~~~~~~~~~~~~
>     | | |
>     | | (1) Allocation is incompatible with ‘int *’; either the 
> allocated size is bogus or the type on the left-hand side is wrong
>     |
> 
> 
> Is there location information for both the malloc and for the
> assignment, here?

I'm not sure I understand the question, but the warning is emitted at the
gcall *, which has an SSA variable as the lhs and the call_fndecl on the rhs.
I think that is enough to split it up into "(1) n + sizeof(int)
allocated here" and "(2) Allocation at (1) is incompatible with..."?

> 
> If so, then maybe two events:
> 
> warning: Allocated buffer size is not a multiple of the pointee's size 
> [CWE-131] [-Wanalyzer-allocation-size]
>    33 | int *ptr = malloc (n + sizeof(int));
>       |            ^~~~~~~~~~~~~~~~~~~~~~~~
>   ‘test_3’: events 1-2 
>     |
>     | 33 | int *ptr = malloc (n + sizeof(int));
>     |    |          ^ ^~~~~~~~~~~~~~~~~~~~~~~~
>     |    |          | |
>     |    |          | (1) buffer allocated here with size 'n + 4'
>     |    |          | 
>     |    |          (2) sizeof(*ptr) is 4
>     |
> 
> or somesuch.
> 
> 
> 
> And this is what a simple flow looks like:
> warning: Allocated buffer size is not a multiple of the pointee's size 
> [CWE-131] [-Wanalyzer-allocation-size]
>    39 | int *iptr = (int *)ptr; /* { dg-line assign } */
>       | ^~~~
>   ‘test_4’: events 1-2
>     |
>     | 38 | void *ptr = malloc (n * sizeof (short)); /* { dg-message }
> */
>     | | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
>     | | |
>     | | (1) allocated here
>     | 39 | int *iptr = (int *)ptr; /* { dg-line assign } */
>     | | ~~~~
>     | | |
>     | | (2) ‘ptr’ is incompatible with ‘int *’; either the 
> allocated size at (1) is bogus or the type on the left-hand side is 
> wrong
>     |
> 
> 
> 
> I think it would make the diagnostic more readable if the "allocated
> here" event's message expresses how big the buffer is e.g.:
> 
> warning: Allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
>    39 | int *iptr = (int *)ptr;
>       | ^~~~
>   ‘test_4’: events 1-2
>     |
>     | 38 | void *ptr = malloc (n * sizeof (short));
>     |    |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~
>     |    |             |
>     |    |            (1) allocated here with size '(n * 2)'
>     | 39 | int *iptr = (int *)ptr;
>     |    |      ~~~~
>     |    |      |
>     |    |      (2) ‘ptr’ is incompatible with ‘int *’; sizeof(int) is 4
>     |
> 
> 
> There are some things to discuss from my side:
> * The tests with the "toy re-implementation of CPython's object 
> model"[2] fail due to an extra warning being emitted. Because the analyzer 
> can't know the calculation actually results in a correct buffer size 
> when viewed as a string_obj later on, it emits a warning, e.g. at line 
> 61 in data-model-5.c. The only mitigation would be to disable the 
> warning for structs entirely. Now, the question is to rather have noise
> on these cases or disable the warning for structs entirely?
> 
> Can you post the full warning please?

/path/to/data-model-5.c: In function ‘alloc_obj’:
/path/to/data-model-5.c:61:31: warning: Allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   61 |   base_obj *obj = (base_obj *)malloc (sz);
      |                               ^~~~~~~~~~~
  ‘new_string_obj’: events 1-2
    |
    |   69 | base_obj *new_string_obj (const char *str)
    |      |           ^~~~~~~~~~~~~~
    |      |           |
    |      |           (1) entry to ‘new_string_obj’
    |......
    |   75 |     = (string_obj *)alloc_obj (&str_type, sizeof (string_obj) + len + 1);
    |      |                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |                     |
    |      |                     (2) calling ‘alloc_obj’ from ‘new_string_obj’
    |
    +--> ‘alloc_obj’: events 3-4
           |
           |   59 | base_obj *alloc_obj (type_obj *ob_type, size_t sz)
           |      |           ^~~~~~~~~
           |      |           |
           |      |           (3) entry to ‘alloc_obj’
           |   60 | {
           |   61 |   base_obj *obj = (base_obj *)malloc (sz);
           |      |                               ~~~~~~~~~~~
           |      |                               |
           |      |                               (4) Allocation is incompatible with ‘base_obj *’; either the allocated size is bogus or the type on the left-hand side is wrong
           |

> 
> These testcases exhibit a common way of faking inheritance in C, and I
> think it ought to be possible to support this in the warning.
> 
> I think what's happening is we have
> 
> struct base
> { 
>   /* fields */
> };
> 
> struct sub
> {
>   struct base m_base;
>   /* extra fields.  */
> };
> 
> struct base *construct_base (size_t sz)
> {
>   struct base *p = (struct base *) malloc (sz);
> 
>   /* set up fields of base in p  */
> 
>   return p;
> }
> 
> Or is this on the interprocedural path as called with a specific sizeof
> for struct sub?

At (4), the checker does not know that base_obj is later used as a
"base struct". Because alloc_obj is called with sizeof(struct sub), the
checker thinks the buffer is too large for one base_obj but too small for a
second one.
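
To make the problem concrete, here is a small C sketch. It is not taken from
data-model-5.c: struct base, struct sub and both helpers are hypothetical
stand-ins, and the check is reduced to plain size arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types that fake inheritance by embedding the base
   struct as the first field of the derived struct.  */
struct base
{
  int refcnt;
};

struct sub
{
  struct base m_base;
  int len;
};

/* Bytes left over when SZ bytes are viewed as an array of struct base.
   A nonzero result is what a "multiple of the pointee's size" check
   would complain about.  */
static size_t
trailing_as_base (size_t sz)
{
  assert (sizeof (struct base) != 0);
  return sz % sizeof (struct base);
}

/* Nonzero if SZ bytes can hold at least one struct base.  */
static int
holds_one_base (size_t sz)
{
  return sz >= sizeof (struct base);
}
```

With sz = sizeof (struct sub) + len + 1, as in new_string_obj,
trailing_as_base (sz) is nonzero for most values of len even though
holds_one_base (sz) is always true, so a pure "multiple of" check warns on
code that is fine.
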

> 
> Maybe we can special-case these by detecting where struct sub's first
> field is struct base, and hence where we expect this pattern?  (and use
> this to suppress the warning for such cases?)

I already exclude all structs that contain another struct, via
struct_or_union_with_inheritance_p in sm-malloc.cc. That does not help
when the size of struct sub is allocated but the result is cast to base.
Maybe we should special-case structs so that we only warn when the allocated
size is too small to hold the base struct, in addition to suppressing
warnings when the first field is a struct?
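
As a rough sketch of that rule for constant sizes, written in plain C rather
than against the analyzer's data structures (should_warn and its parameters
are hypothetical, and this is only one possible reading of the idea):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate, not the patch's code: decide whether an
   allocation of ALLOC_SIZE bytes that is cast to a pointee of
   POINTEE_SIZE bytes deserves a warning.  */
static bool
should_warn (size_t alloc_size, size_t pointee_size,
             bool pointee_is_struct, bool first_field_is_struct)
{
  /* Zero-sized and byte-sized pointees never leave trailing bytes.  */
  if (pointee_size <= 1)
    return false;

  if (pointee_is_struct)
    {
      /* The pointee itself looks like a "derived" struct: stay quiet.  */
      if (first_field_is_struct)
        return false;

      /* Extra bytes are tolerated, because the buffer may really belong
         to a larger struct.  Only warn if one pointee does not fit.  */
      return alloc_size < pointee_size;
    }

  /* Non-struct pointee: the size must be an exact multiple.  */
  return alloc_size % pointee_size != 0;
}
```

Under this rule a 14 byte buffer cast to a 4 byte scalar still warns, while a
20 byte buffer cast to an 8 byte struct does not.
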

> 
> 
> * I'm unable to emit a warning whenever the cast happens at an 
> assignment with a call as the rhs, e.g. test_1 in allocation-size-4.c. 
> This is because I'm unable to access a region_svalue for the returned
> value. Even in the new_program_state, the svalue of the lhs is still a 
> conjured_svalue. Maybe David can lead me to a place where I can access 
> the return value's region_svalue or do I have to adapt the engine?
> 
> Please can you try reposting the patch?  I tried to read it, but am
> having trouble with the mangled indentation.

See my inline answer above. Both the test case and the place where I want
to access the region_svalue are marked with // FIXME.

> 
> 
> * attr-malloc-6.c and pr96639.c did both contain structs without an 
> implementation. Something in the analyzer must have triggered another
> warning about the usage of those without them having an implementation.
> I changed those structs to have an empty implementation, such that the 
> additional warning are gone. I think this shouldn't change the test 
> case, so is this change okay?
> 
> What were the new warnings?

/path/to/attr-malloc-6.c:175:15: error: invalid use of undefined type ‘struct FILE’
  175 |     FILE *p = malloc (100);   // { dg-message "allocated here" }
      |               ^~~~~~~~~~~~

All of them were like the one above: error: invalid use of undefined type 'struct XXX'

> 
> Thanks for the patch; sorry if this seems nitpicky; the patch seems
> promising

Thanks for the fast reply. I'll try out all the suggestions regarding
splitting up the allocation and assignment and see how they look.

- Tim

> 
> Dave
> 
> 
> 
> - Tim
> 
> [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105900
> [1] While all tests except the cpython ones work, I have yet to test it
> on large C projects
> [2] FAIL: gcc.dg/analyzer/data-model-5.c (test for excess errors)
>     FAIL: gcc.dg/analyzer/data-model-5b.c (test for excess errors)
>     FAIL: gcc.dg/analyzer/data-model-5c.c (test for excess errors)
>     FAIL: gcc.dg/analyzer/data-model-5d.c (test for excess errors)
>     FAIL: gcc.dg/analyzer/first-field-2.c (test for excess errors)
> 
> -------
> 
> Subject: [PATCH] analyzer: add allocation size warning
> 
> This patch adds an allocation size checker to the analyzer.
> The checker warns when the tracked buffer size is not a multiple of the
> left-hand side pointee's type. This resolves PR analyzer/105900.
> 
> The patch is not yet fully tested.
> 
> gcc/analyzer/ChangeLog:
> 
>         * analyzer.opt: Add Wanalyzer-allocation-size.
>         * sm-malloc.cc (class dubious_allocation_size): New 
> pending_diagnostic subclass.
>         (capacity_compatible_with_type): New.
>         (const_operand_in_sval_p): New.
>         (struct_or_union_with_inheritance_p): New.
>         (check_capacity): New.
>         (malloc_state_machine::on_stmt): Add calls to 
> on_pointer_assignment.
>         (malloc_state_machine::on_allocator_call): Add node to 
> parameters and call to on_pointer_assignment.
>         (malloc_state_machine::on_pointer_assignment): New.
> 
> gcc/testsuite/ChangeLog:
> 
>         * gcc.dg/analyzer/attr-malloc-6.c: Disabled 
> Wanalyzer-allocation-size and added default implementation for FILE.
>         * gcc.dg/analyzer/capacity-1.c: Added dg directives.
>         * gcc.dg/analyzer/malloc-4.c: Disabled 
> Wanalyzer-allocation-size.
>         * gcc.dg/analyzer/pr96639.c: Disabled Wanalyzer-allocation-size
> and added default implementation for foo and bar.
>         * gcc.dg/analyzer/allocation-size-1.c: New test.
>         * gcc.dg/analyzer/allocation-size-2.c: New test.
>         * gcc.dg/analyzer/allocation-size-3.c: New test.
>         * gcc.dg/analyzer/allocation-size-4.c: New test.
> 
> Signed-off-by: Tim Lange <mail@tim-lange.me>
> ---
>  gcc/analyzer/analyzer.opt | 4 +
>  gcc/analyzer/sm-malloc.cc | 363 +++++++++++++++++-
>  .../gcc.dg/analyzer/allocation-size-1.c | 54 +++
>  .../gcc.dg/analyzer/allocation-size-2.c | 44 +++
>  .../gcc.dg/analyzer/allocation-size-3.c | 48 +++
>  .../gcc.dg/analyzer/allocation-size-4.c | 39 ++
>  gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c | 2 +
>  gcc/testsuite/gcc.dg/analyzer/capacity-1.c | 5 +-
>  gcc/testsuite/gcc.dg/analyzer/malloc-4.c | 6 +-
>  gcc/testsuite/gcc.dg/analyzer/pr96639.c | 2 +
>  10 files changed, 559 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
> 
> diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> index 4aea52d3a87..f213989e0bb 100644
> --- a/gcc/analyzer/analyzer.opt
> +++ b/gcc/analyzer/analyzer.opt
> @@ -78,6 +78,10 @@ Wanalyzer-malloc-leak
>  Common Var(warn_analyzer_malloc_leak) Init(1) Warning
>  Warn about code paths in which a heap-allocated pointer leaks.
> 
> +Wanalyzer-allocation-size
> +Common Var(warn_analyzer_allocation_size) Init(1) Warning
> +Warn about code paths in which a buffer is assigned to a incompatible 
> type.
> +
>  Wanalyzer-mismatching-deallocation
>  Common Var(warn_analyzer_mismatching_deallocation) Init(1) Warning
>  Warn about code paths in which the wrong deallocation function is 
> called.
> diff --git a/gcc/analyzer/sm-malloc.cc b/gcc/analyzer/sm-malloc.cc
> index 3bd40425919..790c9f0e57d 100644
> --- a/gcc/analyzer/sm-malloc.cc
> +++ b/gcc/analyzer/sm-malloc.cc
> @@ -46,6 +46,8 @@ along with GCC; see the file COPYING3. If not see
>  #include "attribs.h"
>  #include "analyzer/function-set.h"
>  #include "analyzer/program-state.h"
> +#include "print-tree.h"
> +#include "gimple-pretty-print.h"
> 
>  #if ENABLE_ANALYZER
> 
> @@ -428,6 +430,7 @@ private:
>    get_or_create_deallocator (tree deallocator_fndecl);
> 
>    void on_allocator_call (sm_context *sm_ctxt,
> + const supernode *node,
>       const gcall *call,
>       const deallocator_set *deallocators,
>       bool returns_nonnull = false) const;
> @@ -444,6 +447,16 @@ private:
>    void on_realloc_call (sm_context *sm_ctxt,
>     const supernode *node,
>     const gcall *call) const;
> + void on_pointer_assignment(sm_context *sm_ctxt,
> + const supernode *node,
> + const gassign *assign_stmt,
> + tree lhs,
> + tree rhs) const;
> + void on_pointer_assignment(sm_context *sm_ctxt,
> + const supernode *node,
> + const gcall *call,
> + tree lhs,
> + tree rhs) const;
>    void on_zero_assignment (sm_context *sm_ctxt,
>        const gimple *stmt,
>        tree lhs) const;
> @@ -1432,6 +1445,117 @@ private:
>    const char *m_funcname;
>  };
> 
> +/* Concrete subclass for casts of pointers that lead to trailing 
> bytes. */
> +
> +class dubious_allocation_size : public malloc_diagnostic
> +{
> +public:
> + dubious_allocation_size (const malloc_state_machine &sm, tree lhs, 
> tree rhs,
> + tree size_tree, unsigned HOST_WIDE_INT size_diff)
> + : malloc_diagnostic(sm, rhs), 
> m_type(dubious_allocation_type::CONSTANT_SIZE),
> + m_lhs(lhs), m_size_tree(size_tree), m_size_diff(size_diff)
> + {}
> +
> + dubious_allocation_size (const malloc_state_machine &sm, tree lhs, 
> tree rhs,
> + tree size_tree)
> + : malloc_diagnostic(sm, rhs), 
> m_type(dubious_allocation_type::MISSING_OPERAND),
> + m_lhs(lhs), m_size_tree(size_tree), m_size_diff(0)
> + {}
> +
> + const char *get_kind () const final override
> + {
> + return "dubious_allocation_size";
> + }
> +
> + int get_controlling_option () const final override
> + {
> + return OPT_Wanalyzer_allocation_size;
> + }
> +
> + bool subclass_equal_p (const pending_diagnostic &base_other) const
> + final override
> + {
> + const dubious_allocation_size &other = (const dubious_allocation_size
> &)base_other;
> + return malloc_diagnostic::subclass_equal_p(other)
> + && m_type == other.m_type
> + && same_tree_p (m_lhs, other.m_lhs)
> + && same_tree_p (m_size_tree, other.m_size_tree)
> + && m_size_diff == other.m_size_diff;
> + }
> +
> + bool emit (rich_location *rich_loc) final override
> + {
> + diagnostic_metadata m;
> + m.add_cwe (131);
> + return warning_meta (rich_loc, m, get_controlling_option (),
> + "Allocated buffer size is not a multiple of the pointee's size");
> + }
> +
> + label_text describe_state_change (const evdesc::state_change &change)
> + override
> + {
> + if (change.m_old_state == m_sm.get_start_state ()
> + && unchecked_p (change.m_new_state))
> + {
> + m_alloc_event = change.m_event_id;
> + if (m_type == dubious_allocation_type::CONSTANT_SIZE)
> + {
> + // TODO: verify that it's the allocation stmt, not a copy
> + return change.formatted_print ("%E bytes allocated here",
> + m_size_tree);
> + }
> + }
> + return malloc_diagnostic::describe_state_change (change);
> + }
> +
> + label_text describe_final_event (const evdesc::final_event &ev) final
> override
> + {
> + if (m_type == dubious_allocation_type::CONSTANT_SIZE)
> + {
> + if (m_alloc_event.known_p ())
> + return ev.formatted_print (
> + "Casting %qE to %qT leaves %wu trailing bytes; either the"
> + " allocated size is bogus or the type on the left-hand side is"
> + " wrong",
> + m_arg, TREE_TYPE (m_lhs), m_size_diff);
> + else
> + return ev.formatted_print (
> + "Casting a %E byte buffer to %qT leaves %wu trailing bytes; either"
> + " the allocated size is bogus or the type on the left-hand side is"
> + " wrong",
> + m_size_tree, TREE_TYPE (m_lhs), m_size_diff);
> + }
> + else if (m_type == dubious_allocation_type::MISSING_OPERAND)
> + {
> + if (m_alloc_event.known_p ())
> + return ev.formatted_print (
> + "%qE is incompatible with %qT; either the allocated size at %@ is"
> + " bogus or the type on the left-hand side is wrong",
> + m_arg, TREE_TYPE (m_lhs), &m_alloc_event);
> + else
> + return ev.formatted_print (
> + "Allocation is incompatible with %qT; either the allocated size is"
> + " bogus or the type on the left-hand side is wrong",
> + TREE_TYPE (m_lhs));
> + }
> +
> + gcc_unreachable ();
> + return label_text ();
> + }
> +
> +private:
> + enum dubious_allocation_type {
> + CONSTANT_SIZE,
> + MISSING_OPERAND
> + };
> +
> + dubious_allocation_type m_type;
> + diagnostic_event_id_t m_alloc_event;
> + tree m_lhs;
> + tree m_size_tree;
> + unsigned HOST_WIDE_INT m_size_diff;
> +};
> +
>  /* struct allocation_state : public state_machine::state. */
> 
>  /* Implementation of state_machine::state::dump_to_pp vfunc
> @@ -1633,6 +1757,160 @@ known_allocator_p (const_tree fndecl, const 
> gcall *call)
>    return false;
>  }
> 
> +/* Returns the trailing bytes on dubious allocation sizes. */
> +
> +static unsigned HOST_WIDE_INT
> +capacity_compatible_with_type (tree cst, tree pointee_size_tree)
> +{
> + unsigned HOST_WIDE_INT pointee_size = TREE_INT_CST_LOW 
> (pointee_size_tree);
> + if (pointee_size == 0)
> + return 0;
> + unsigned HOST_WIDE_INT alloc_size = TREE_INT_CST_LOW (cst);
> +
> + return alloc_size % pointee_size;
> +}
> +
> +/* Returns true if there is a constant tree with
> + the same constant value inside the sval. */
> +
> +static bool
> +const_operand_in_sval_p (const svalue *sval, tree size_cst)
> +{
> + auto_vec<const svalue *> non_mult_expr;
> + auto_vec<const svalue *> worklist;
> + worklist.safe_push(sval);
> + while (!worklist.is_empty())
> + {
> + const svalue *curr = worklist.pop ();
> + curr = curr->unwrap_any_unmergeable ();
> +
> + switch (curr->get_kind())
> + {
> + default:
> + break;
> + case svalue_kind::SK_CONSTANT:
> + {
> + const constant_svalue *cst_sval = curr->dyn_cast_constant_svalue ();
> + unsigned HOST_WIDE_INT sval_int
> + = TREE_INT_CST_LOW (cst_sval->get_constant ());
> + unsigned HOST_WIDE_INT size_cst_int = TREE_INT_CST_LOW (size_cst);
> + if (sval_int % size_cst_int == 0)
> + return true;
> + }
> + break;
> + case svalue_kind::SK_BINOP:
> + {
> + const binop_svalue *b_sval = curr->dyn_cast_binop_svalue ();
> + if (b_sval->get_op () == MULT_EXPR)
> + {
> + worklist.safe_push (b_sval->get_arg0 ());
> + worklist.safe_push (b_sval->get_arg1 ());
> + }
> + else
> + {
> + non_mult_expr.safe_push (b_sval->get_arg0 ());
> + non_mult_expr.safe_push (b_sval->get_arg1 ());
> + }
> + }
> + break;
> + case svalue_kind::SK_UNARYOP:
> + {
> + const unaryop_svalue *un_sval = curr->dyn_cast_unaryop_svalue ();
> + worklist.safe_push (un_sval->get_arg ());
> + }
> + break;
> + case svalue_kind::SK_UNKNOWN:
> + return true;
> + }
> + }
> +
> + /* Each expr should be a multiple of the size.
> + E.g. used to catch n + sizeof(int) errors. */
> + bool reduce = !non_mult_expr.is_empty ();
> + while (!non_mult_expr.is_empty() && reduce)
> + {
> + const svalue *expr_sval = non_mult_expr.pop ();
> + reduce &= const_operand_in_sval_p (expr_sval, size_cst);
> + }
> + return reduce;
> +}
> +
> +/* Returns true iff the type is a struct with another struct inside.
> */
> +
> +static bool
> +struct_or_union_with_inheritance_p (tree type)
> +{
> + if (!RECORD_OR_UNION_TYPE_P (type))
> + return false;
> +
> + for (tree f = TYPE_FIELDS (type); f; f = TREE_CHAIN (f))
> + if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (f)))
> + return true;
> +
> + return false;
> +}
> +
> +static void
> +check_capacity (sm_context *sm_ctxt,
> + const malloc_state_machine &sm,
> + const supernode *node,
> + const gimple *stmt,
> + tree lhs,
> + tree rhs,
> + const svalue *capacity)
> +{
> + tree pointer_type = TREE_TYPE (lhs);
> + gcc_assert (TREE_CODE (pointer_type) == POINTER_TYPE);
> +
> + tree pointee_type = TREE_TYPE (pointer_type);
> + /* void * is always compatible. */
> + if (TREE_CODE (pointee_type) == VOID_TYPE)
> + return;
> +
> + if (struct_or_union_with_inheritance_p (pointee_type))
> + return;
> +
> + tree pointee_size_tree = size_in_bytes(pointee_type);
> + /* The size might be unknown e.g. being a array with n elements
> + or casting to char * never has any trailing bytes. */
> + if (TREE_CODE (pointee_size_tree) != INTEGER_CST
> + || TREE_INT_CST_LOW (pointee_size_tree) == 1)
> + return;
> +
> + switch (capacity->get_kind ())
> + {
> + default:
> + break;
> + case svalue_kind::SK_CONSTANT:
> + {
> + const constant_svalue *cst_sval = capacity->dyn_cast_constant_svalue 
> ();
> + tree cst = cst_sval->get_constant ();
> + unsigned HOST_WIDE_INT size_diff
> + = capacity_compatible_with_type (cst, pointee_size_tree);
> + if (size_diff != 0)
> + {
> + tree diag_arg = sm_ctxt->get_diagnostic_tree (rhs);
> + sm_ctxt->warn (node, stmt, diag_arg,
> + new dubious_allocation_size (sm, lhs, diag_arg,
> + cst, size_diff));
> + }
> + }
> + break;
> + case svalue_kind::SK_BINOP:
> + case svalue_kind::SK_UNARYOP:
> + {
> + if (!const_operand_in_sval_p (capacity, pointee_size_tree))
> + {
> + tree diag_arg = sm_ctxt->get_diagnostic_tree (rhs);
> + sm_ctxt->warn (node, stmt, diag_arg,
> + new dubious_allocation_size (sm, lhs, diag_arg,
> + pointee_size_tree));
> + }
> + }
> + break;
> + }
> +}
> +
>  /* Implementation of state_machine::on_stmt vfunc for 
> malloc_state_machine. */
> 
>  bool
> @@ -1645,14 +1923,14 @@ malloc_state_machine::on_stmt (sm_context 
> *sm_ctxt,
>        {
>   if (known_allocator_p (callee_fndecl, call))
>     {
> - on_allocator_call (sm_ctxt, call, &m_free);
> + on_allocator_call (sm_ctxt, node, call, &m_free);
>       return true;
>     }
> 
>   if (is_named_call_p (callee_fndecl, "operator new", call, 1))
> - on_allocator_call (sm_ctxt, call, &m_scalar_delete);
> + on_allocator_call (sm_ctxt, node, call, &m_scalar_delete);
>   else if (is_named_call_p (callee_fndecl, "operator new []", call, 1))
> - on_allocator_call (sm_ctxt, call, &m_vector_delete);
> + on_allocator_call (sm_ctxt, node, call, &m_vector_delete);
>   else if (is_named_call_p (callee_fndecl, "operator delete", call, 1)
>     || is_named_call_p (callee_fndecl, "operator delete", call, 2))
>     {
> @@ -1707,7 +1985,7 @@ malloc_state_machine::on_stmt (sm_context 
> *sm_ctxt,
>       tree attrs = TYPE_ATTRIBUTES (TREE_TYPE (callee_fndecl));
>       bool returns_nonnull
>         = lookup_attribute ("returns_nonnull", attrs);
> - on_allocator_call (sm_ctxt, call, deallocators, returns_nonnull);
> + on_allocator_call (sm_ctxt, node, call, deallocators, 
> returns_nonnull);
>     }
> 
>   /* Handle "__attribute__((nonnull))". */
> @@ -1763,12 +2041,31 @@ malloc_state_machine::on_stmt (sm_context 
> *sm_ctxt,
>         = mutable_this->get_or_create_deallocator (callee_fndecl);
>       on_deallocator_call (sm_ctxt, node, call, d, dealloc_argno);
>     }
> +
> + /* Handle returns from function calls. */
> + tree lhs = gimple_call_lhs (call);
> + if (lhs && TREE_CODE (TREE_TYPE (lhs)) == POINTER_TYPE
> + && TREE_CODE (gimple_call_return_type (call)) == POINTER_TYPE)
> + on_pointer_assignment (sm_ctxt, node, call, lhs,
> + gimple_call_fn (call));
>        }
> 
>    if (tree lhs = sm_ctxt->is_zero_assignment (stmt))
>      if (any_pointer_p (lhs))
>        on_zero_assignment (sm_ctxt, stmt,lhs);
> 
> + /* Handle pointer assignments/casts for dubious allocation size. */
> + if (const gassign *assign_stmt = dyn_cast <const gassign *> (stmt))
> + {
> + if (gimple_num_ops (stmt) == 2)
> + {
> + tree lhs = gimple_assign_lhs (assign_stmt);
> + tree rhs = gimple_assign_rhs1 (assign_stmt);
> + if (any_pointer_p (lhs) && any_pointer_p (rhs))
> + on_pointer_assignment (sm_ctxt, node, assign_stmt, lhs, rhs);
> + }
> + }
> +
>    /* Handle dereferences. */
>    for (unsigned i = 0; i < gimple_num_ops (stmt); i++)
>      {
> @@ -1818,6 +2115,7 @@ malloc_state_machine::on_stmt (sm_context 
> *sm_ctxt,
> 
>  void
>  malloc_state_machine::on_allocator_call (sm_context *sm_ctxt,
> + const supernode *node,
>        const gcall *call,
>        const deallocator_set *deallocators,
>        bool returns_nonnull) const
> @@ -1830,6 +2128,9 @@ malloc_state_machine::on_allocator_call 
> (sm_context *sm_ctxt,
>       (returns_nonnull
>        ? deallocators->m_nonnull
>        : deallocators->m_unchecked));
> +
> + if (TREE_CODE (TREE_TYPE (lhs)) == POINTER_TYPE)
> + on_pointer_assignment (sm_ctxt, node, call, lhs, gimple_call_fn 
> (call));
>      }
>    else
>      {
> @@ -1968,6 +2269,60 @@ malloc_state_machine::on_realloc_call 
> (sm_context *sm_ctxt,
>      }
>  }
> 
> +/* Handle assignments between two pointers.
> + Check for dubious allocation sizes.
> +*/
> +
> +void
> +malloc_state_machine::on_pointer_assignment (sm_context *sm_ctxt,
> + const supernode *node,
> + const gassign *assign_stmt,
> + tree lhs,
> + tree rhs) const
> +{
> + /* Do not warn if lhs and rhs are of the same type to not emit 
> duplicate
> + warnings on assignments after the cast. */
> + if (pending_diagnostic::same_tree_p (TREE_TYPE (lhs), TREE_TYPE 
> (rhs)))
> + return;
> +
> + const program_state *state = sm_ctxt->get_old_program_state ();
> + const svalue *r_value = state->m_region_model->get_rvalue (rhs,
> NULL);
> + if (const region_svalue *reg = dyn_cast <const region_svalue *> 
> (r_value))
> + {
> + const svalue *capacity = state->m_region_model->get_capacity
> + (reg->get_pointee ());
> + check_capacity(sm_ctxt, *this, node, assign_stmt, lhs, rhs,
> capacity);
> + }
> +}
> +
> +void
> +malloc_state_machine::on_pointer_assignment (sm_context *sm_ctxt,
> + const supernode *node,
> + const gcall *call,
> + tree lhs,
> + tree fn_decl) const
> +{
> + /* Do not warn if lhs and rhs are of the same type to not emit 
> duplicate
> + warnings on assignments after the cast. */
> + if (pending_diagnostic::same_tree_p
> + (TREE_TYPE (lhs), TREE_TYPE (gimple_call_return_type (call))))
> + return;
> +
> + const program_state *state = sm_ctxt->get_new_program_state ();
> + const svalue *r_value = state->m_region_model->get_rvalue (lhs,
> NULL);
> + if (const region_svalue *reg = dyn_cast <const region_svalue *> 
> (r_value))
> + {
> + const svalue *capacity = state->m_region_model->get_capacity
> + (reg->get_pointee ());
> + check_capacity (sm_ctxt, *this, node, call, lhs, fn_decl, capacity);
> + }
> + else if (const conjured_svalue *con
> + = dyn_cast <const conjured_svalue *> (r_value))
> + {
> + // FIXME: How to get a region_svalue?
> + }
> +}
> +
>  /* Implementation of state_machine::on_phi vfunc for 
> malloc_state_machine. */
> 
>  void
> diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c 
> b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
> new file mode 100644
> index 00000000000..5403c5f41f1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
> @@ -0,0 +1,54 @@
> +#include <stdlib.h>
> +
> +/* Tests with constant buffer sizes */
> +
> +void test_1 (void)
> +{
> + short *ptr = malloc (21 * sizeof(short));
> + free (ptr);
> +}
> +
> +void test_2 (void)
> +{
> + int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc } */
> + free (ptr);
> +
> + /* { dg-warning "Allocated buffer size is not a multiple of the 
> pointee's size" "" { target *-*-* } malloc } */
> + /* { dg-message "\\(1\\) Casting a 42 byte buffer to 'int \\*' leaves
> 2 trailing bytes" "" { target *-*-* } malloc } */
> +}
> +
> +void test_3 (void)
> +{
> + void *ptr = malloc (21 * sizeof (short));
> + short *sptr = (short *)ptr;
> + free (sptr);
> +}
> +
> +void test_4 (void)
> +{
> + void *ptr = malloc (21 * sizeof (short)); /* { dg-message } */
> + int *iptr = (int *)ptr; /* { dg-line assign } */
> + free (iptr);
> +
> + /* { dg-warning "Allocated buffer size is not a multiple of the 
> pointee's size" "" { target *-*-* } assign } */
> + /* { dg-message "\\(2\\) Casting 'ptr' to 'int \\*' leaves 2 trailing
> bytes" "" { target *-*-* } assign } */
> +}
> +
> +struct s {
> + int i;
> +};
> +
> +void test_5 (void)
> +{
> + struct s *ptr = malloc (5 * sizeof (struct s));
> + free (ptr);
> +}
> +
> +void test_6 (void)
> +{
> + long *ptr = malloc (5 * sizeof (struct s)); /* { dg-line malloc6 } */
> + free (ptr);
> +
> + /* { dg-warning "" "" { target *-*-* } malloc6 } */
> + /* { dg-message "" "" { target *-*-* } malloc6 } */
> +}
> diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c 
> b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
> new file mode 100644
> index 00000000000..e66d2793f13
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
> @@ -0,0 +1,44 @@
> +#include <stdlib.h>
> +#include <stdio.h>
> +
> +/* Tests with symbolic buffer sizes */
> +
> +void test_1 (void)
> +{
> + int n;
> + scanf("%i", &n);
> + short *ptr = malloc (n * sizeof(short));
> + free (ptr);
> +}
> +
> +void test_2 (void)
> +{
> + int n;
> + scanf("%i", &n);
> + int *ptr = malloc (n * sizeof (short)); /* { dg-line malloc } */
> + free (ptr);
> +
> + /* { dg-warning "Allocated buffer size is not a multiple of the 
> pointee's size" "" { target *-*-* } malloc } */
> + /* { dg-message "\\(1\\) Allocation is incompatible with 'int \\*'"
> "" { target *-*-* } malloc } */
> +}
> +
> +void test_3 (void)
> +{
> + int n;
> + scanf("%i", &n);
> + void *ptr = malloc (n * sizeof (short));
> + short *sptr = (short *)ptr;
> + free (sptr);
> +}
> +
> +void test_4 (void)
> +{
> + int n;
> + scanf("%i", &n);
> + void *ptr = malloc (n * sizeof (short)); /* { dg-message } */
> + int *iptr = (int *)ptr; /* { dg-line assign } */
> + free (iptr);
> +
> + /* { dg-warning "Allocated buffer size is not a multiple of the 
> pointee's size" "" { target *-*-* } assign } */
> + /* { dg-message "\\(2\\) 'ptr' is incompatible with 'int \\*'; either
> the allocated size at \\(1\\)" "" { target *-*-* } assign } */
> +}
> diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c 
> b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
> new file mode 100644
> index 00000000000..dafc0e73c63
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
> @@ -0,0 +1,48 @@
> +#include <stdlib.h>
> +#include <stdio.h>
> +
> +/* CWE-131 example 5 */
> +void test_1(void)
> +{
> + int *id_sequence = (int *) malloc (3); /* { dg-line malloc1 } */
> + if (id_sequence == NULL) exit (1);
> +
> + id_sequence[0] = 13579;
> + id_sequence[1] = 24680;
> + id_sequence[2] = 97531;
> +
> + free (id_sequence);
> +
> + /* { dg-warning "" "" { target *-*-* } malloc1 } */
> + /* { dg-message "" "" { target *-*-* } malloc1 } */
> +}
> +
> +void test_2(void)
> +{
> + int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
> + free (ptr);
> +
> + /* { dg-warning "" "" { target *-*-* } malloc2 } */
> + /* { dg-message "" "" { target *-*-* } malloc2 } */
> +}
> +
> +void test_3(void)
> +{
> + int n;
> + scanf("%i", &n);
> + int *ptr = malloc (n + sizeof (int)); /* { dg-line malloc3 } */
> + free (ptr);
> +
> + /* { dg-warning "" "" { target *-*-* } malloc3 } */
> + /* { dg-message "" "" { target *-*-* } malloc3 } */
> +}
> +
> +void test_4(void)
> +{
> + int n;
> + scanf("%i", &n);
> + int m;
> + scanf("%i", &m);
> + int *ptr = malloc ((n + m) * sizeof (int));
> + free (ptr);
> +}
> diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c 
> b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
> new file mode 100644
> index 00000000000..4c2b31d6e0a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
> @@ -0,0 +1,39 @@
> +#include <stddef.h>
> +#include <stdlib.h>
> +
> +/* Flow warnings */
> +
> +void *create_buffer(int n)
> +{
> + return malloc(n);
> +}
> +
> +void test_1(void)
> +{
> + // FIXME
> + int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } }
> */
> + free (buf);
> +}
> +
> +void test_2(void)
> +{
> + void *buf = create_buffer(42); /* { dg-message } */
> + int *ibuf = buf; /* { dg-line assign2 } */
> + free (ibuf);
> +
> + /* { dg-warning "" "" { target *-*-* } assign2 } */
> + /* { dg-message "" "" { target *-*-* } assign2 } */
> +}
> +
> +void test_3(void)
> +{
> + void *buf = malloc(42); /* { dg-message } */
> + if (buf != NULL) /* { dg-message } */
> + {
> + int *ibuf = buf; /* { dg-line assign3 } */
> + free (ibuf);
> + }
> +
> + /* { dg-warning "" "" { target *-*-* } assign3 } */
> + /* { dg-message "" "" { target *-*-* } assign3 } */
> +}
> diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c 
> b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
> index bd28107d0d7..809ee88cf07 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
> @@ -1,7 +1,9 @@
> +/* { dg-additional-options -Wno-analyzer-allocation-size } */
>  /* Adapted from gcc.dg/Wmismatched-dealloc.c. */
> 
>  #define A(...) __attribute__ ((malloc (__VA_ARGS__)))
> 
> +struct FILE {};
>  typedef struct FILE FILE;
>  typedef __SIZE_TYPE__ size_t;
> 
> diff --git a/gcc/testsuite/gcc.dg/analyzer/capacity-1.c 
> b/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
> index 2d124833296..94f569e390b 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/capacity-1.c
> @@ -89,8 +89,11 @@ struct s
>  static struct s * __attribute__((noinline))
>  alloc_s (size_t num)
>  {
> - struct s *p = malloc (sizeof(struct s) + num);
> + struct s *p = malloc (sizeof(struct s) + num); /* { dg-line malloc } 
> */
>    return p;
> +
> + /* { dg-warning "" "" { target *-*-* } malloc } */
> + /* { dg-message "" "" { target *-*-* } malloc } */
>  }
> 
>  struct s *
> diff --git a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c 
> b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
> index 908bb28ee50..0ca94250ba2 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
> @@ -1,9 +1,9 @@
> -/* { dg-additional-options "-Wno-incompatible-pointer-types" } */
> +/* { dg-additional-options "-Wno-incompatible-pointer-types 
> -Wno-analyzer-allocation-size" } */
> 
>  #include <stdlib.h>
> 
> -struct foo;
> -struct bar;
> +struct foo {};
> +struct bar {};
>  void *hv (struct foo **tm)
>  {
>    void *p = __builtin_malloc (4);
> diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96639.c 
> b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
> index 02ca3f084a2..6f365c3cb5d 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/pr96639.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
> @@ -1,3 +1,5 @@
> +/* { dg-additional-options -Wno-analyzer-allocation-size } */
> +
>  void *calloc (__SIZE_TYPE__, __SIZE_TYPE__);
> 
>  int
> 
> 


* Re: [RFC] analyzer: allocation size warning
  2022-06-17 19:23   ` Tim Lange
@ 2022-06-17 21:39     ` David Malcolm
  0 siblings, 0 replies; 17+ messages in thread
From: David Malcolm @ 2022-06-17 21:39 UTC (permalink / raw)
  To: Tim Lange, Prathamesh Kulkarni; +Cc: GCC Mailing List

On Fri, 2022-06-17 at 21:23 +0200, Tim Lange wrote:
> 
> 
> On Fr, Jun 17 2022 at 22:45:42 +0530, Prathamesh Kulkarni 
> <prathamesh.kulkarni@linaro.org> wrote:
> > On Fri, 17 Jun 2022 at 21:25, Tim Lange <mail@tim-lange.me> wrote:
> > > 
> > >  Hi everyone,
> > Hi Tim,
> > Thanks for posting the POC patch!
> > Just a couple of comments (inline)
> Hi Prathamesh,
> thanks for looking at it.
> > > 
> > >  tracked in PR105900 [0], I'd like to add support for a new warning
> > > on
> > >  dubious allocation sizes. The new checker emits a warning when the
> > >  allocation size is not a multiple of the type's size. With the 
> > > checker,
> > >  following mistakes are detected:
> > >    int *arr = malloc(3); // forgot to multiply by sizeof
> > >    arr[0] = ...;
> > >    arr[1] = ...;
> > >  or
> > >    int *buf = malloc (n + sizeof(int)); // probably should be * 
> > > instead
> > >  of +
> > >  Because it is implemented inside the analyzer, it also emits 
> > > warnings
> > >  when the buffer is first of type void* and later on casted to 
> > > something
> > >  else. Though, this also inherits a limitation. The checker can not
> > >  distinguish 2 * sizeof(short) from sizeof(int) because sizeof is
> > >  resolved and constants are folded at the point when the analyzer
> > > runs.
> > >  As a mitigation, I plan to implement a check in the frontend that 
> > > emits
> > >  a warning if sizeof(lhs pointee type) is not part of the malloc
> > >  argument.
> > IMHO, warning if sizeof(lhs pointee_type) is not present inside
> > malloc, might not be a good idea because it
> > would reject valid calls to malloc.
> > For eg:
> > (1)
> > size_t size = sizeof(int);
> > int *p = malloc (size);
> > 
> > (2)
> > void *p = malloc (sizeof(int));
> > int *q = p;
> Hm, that's right. Maybe only warn when there is a sizeof(type) in the
> argument and the lhs pointee_type != type (except for void*, maybe 
> char* and "inherited" structs)?

That sounds plausible.


[...snip...]

> > > 
> > Won't the warning be incorrect if 'n' is a multiple of sizeof(int) ?
> > I assume by symbolic buffer size, 'n' is not known at compile time.
> * VLAs are resolved to n * sizeof(type) when the analyzer runs and
> work fine.

Great - and please make sure the test suite has test coverage for
anything we're talking about!


IIRC, VLAs work using __builtin_alloca, rather than malloc, and the new
diagnostic is currently implemented as an extension of the sm-malloc.cc
code, so I don't think it could fire anyway for a __builtin_alloca. 
Does it fire for:

  int *ptr = alloca (sizeof (short)); // BUG: sizeof(short) != sizeof(int), probably


There are two places in the analyzer that are tracking memory
allocations:
(a) the sm-malloc.cc code, and
(b) in the region_model's m_dynamic_extents hash_map, which tracks a
symbolic value for the size of each reachable dynamically-allocated
region (and inhibits merging of exploded_nodes that have different
dynamic extents).

See
  region_model::set_dynamic_extents
and
  region_model::get_dynamic_extents

These are called by:
  region_model::create_region_for_alloca
and
  region_model::create_region_for_heap_alloc
and:
  region_model::impl_call_realloc

My gut feeling is that the diagnostic would work better implemented in
terms of (b), rather than (a).  If you test in region_model::set_value,
rather than in a state machine, the test could use the dynamic-extent
tracking of region_model (and thus work e.g. with alloca and realloc,
see the example above), and could also be extended to catch assignments
to pointers from statically-allocated regions of the wrong size e.g.:

  char buf[2];
  int *ptr = (int *)buf; // BUG: sizeof(buf) is only 2 bytes

  short s;
  int *ptr = (int *)&s; // BUG: sizeof(short) != sizeof(int), probably
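
To make the size rule under discussion concrete, here is a minimal
standalone sketch. It is not analyzer code: the helper name
trailing_bytes is hypothetical, and it only restates the "capacity is
not a multiple of the pointee's size" condition in plain C, independent
of how the region was created (malloc, alloca or a static object).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC: number of bytes left over when
   a region of `capacity` bytes is viewed as an array of elements that
   are `pointee_size` bytes each.  A nonzero result corresponds to the
   "dubious size" condition the proposed warning is about.  */
static size_t
trailing_bytes (size_t capacity, size_t pointee_size)
{
  assert (pointee_size != 0);
  return capacity % pointee_size;
}
```

Assuming a 4-byte int, trailing_bytes (sizeof buf, sizeof (int)) is 2
for the char buf[2] case above, and trailing_bytes (42, 4) is 2, which
is the "42 byte buffer ... leaves 2 trailing bytes" situation from the
patch's tests.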


diagnostic_manager::add_events_for_eedge already has some code that
will add a region_creation_event to the checker_path for the case where
a region is determined to be the one of interest to the diagnostic,

See e.g.
poisoned_value_diagnostic::mark_interesting_stuff, which indicates the
region of interest, which is how we get event (1) in the following for
an alloca:

../../src/gcc/testsuite/gcc.dg/analyzer/uninit-alloca.c: In function ‘test_1’:
../../src/gcc/testsuite/gcc.dg/analyzer/uninit-alloca.c:6:10: warning: use of uninitialized 
  value ‘*p’ [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
    6 |   return *p; /* { dg-warning "use of uninitialized value '\\*p'" } */
      |          ^~
  ‘test_1’: events 1-2
    |
    |    5 |   int *p = __builtin_alloca (sizeof (int));
    |      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |            |
    |      |            (1) region created on stack here
    |    6 |   return *p;
    |      |          ~~ 
    |      |          |
    |      |          (2) use of uninitialized value ‘*p’ here
    |

and event (1) for a statically-sized variable here:

../../src/gcc/testsuite/gcc.dg/analyzer/uninit-1.c: In function ‘test_1’:
../../src/gcc/testsuite/gcc.dg/analyzer/uninit-1.c:7:10: warning: use of uninitialized 
  value ‘i’ [CWE-457] [-Wanalyzer-use-of-uninitialized-value]
    7 |   return i;
      |          ^
  ‘test_1’: events 1-2
    |
    |    6 |   int i;
    |      |       ^
    |      |       |
    |      |       (1) region created on stack here
    |    7 |   return i;
    |      |          ~
    |      |          |
    |      |          (2) use of uninitialized value ‘i’ here
    |


So I think you could implement this warning inside region-model.cc
instead of sm-malloc.cc, and by implementing the mark_interesting_stuff
vfunc to set the region of interest to the buffer with the wrong size,
you'd get the allocation event "for free", and could handle assignment
to pointers from the address of statically-sized objects with
incompatible size.

That said, I haven't tried implementing this, and there may be some
snag I've forgotten :/


* Flows with if (cond) n = ...; else n = ...; are tracked by the 
analyzer with a widening_svalue and can be handled (While thinking 
about this answer, I noticed my patch is missing this case. Thanks!)

Great!  Please add test coverage :)


* In case of more complicated flows, the analyzer's buffer size 
tracking resorts to unknown_svalue. If any variable in an expression is
unknown, no warning will be emitted.

That's OK, I think.

* Generally, when requesting memory for a variable type, accepting an
arbitrary number doesn't sound right. I do warn, e.g. if 'n' is a 
conjured_svalue (e.g. from a scanf call).

scanf should probably mark its arguments as tainted, and thus 
-Wanalyzer-tainted-allocation-size should complain if you do a malloc
based on scanf-supplied data without sanitization (otherwise it's a
denial-of-service attack waiting to happen, I think).

I've filed this as:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106021
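
For illustration, a sketch of what sanitizing a scanf-supplied count
before allocating could look like. The helper name alloc_ints_checked
and the MAX_ELEMS limit are made up for this example; whether the
analyzer's taint tracking would accept exactly this range check as
sanitization has not been verified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Arbitrary illustrative upper bound; not taken from the thread.  */
#define MAX_ELEMS 1024

/* Hypothetical helper: allocate room for `n` ints, where `n` comes from
   untrusted input.  Returns NULL if `n` is out of range or if malloc
   fails.  */
static int *
alloc_ints_checked (int n)
{
  if (n <= 0 || n > MAX_ELEMS)
    return NULL;

  size_t size = (size_t) n * sizeof (int);
  /* The range check above rules out overflow in the multiplication.  */
  assert (size / sizeof (int) == (size_t) n);
  return malloc (size);
}
```

A caller would read n with scanf ("%i", &n), pass it to
alloc_ints_checked, and treat a NULL result as an error. Because the
size is n * sizeof (int), it is also a multiple of the pointee's size.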



I think only the last case could in theory be a false-positive. I've 
noticed that this is the case when 'n' is guarded by an if making sure 
n is only a multiple of sizeof(type). In theory, I can fix this case 
too as the analysis is path-sensitive.

Sounds like this should be in the test suite also :)
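
A possible shape for such a test case, as a sketch only: the function
name alloc_guarded is hypothetical and no dg- directives are given,
since the expected analyzer behavior is exactly what is being
discussed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Sketch of a guarded allocation: on the only path that reaches malloc,
   `n` is known to be a multiple of sizeof (int), so a path-sensitive
   check has enough information to stay quiet.  */
static int *
alloc_guarded (size_t n)
{
  if (n % sizeof (int) != 0)
    return NULL;

  /* Invariant established by the guard above.  */
  assert (n % sizeof (int) == 0);
  int *ptr = malloc (n);
  return ptr;
}
```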

Do you know of some other case where 'n' might be an unknown value 
neither guarded by an if condition nor reduced to 'unknown' by a 
complicated flow but still correct?

- Tim
> 
> Thanks,
> Prathamesh
> 

Hope this makes sense and is constructive

Thanks
Dave



* Re: [RFC] analyzer: allocation size warning
  2022-06-17 20:23   ` Tim Lange
@ 2022-06-17 22:13     ` David Malcolm
  2022-06-21 20:00       ` Tim Lange
  0 siblings, 1 reply; 17+ messages in thread
From: David Malcolm @ 2022-06-17 22:13 UTC (permalink / raw)
  To: Tim Lange; +Cc: GCC Mailing List

On Fri, 2022-06-17 at 22:23 +0200, Tim Lange wrote:
> On Fri, Jun 17, 2022 at 01:48:09PM -0400, David Malcolm wrote:
> > On Fri, 2022-06-17 at 17:54 +0200, Tim Lange wrote:

[...snip...]

> > 
> 
> I have resent the patch using git send-email as a reply to my original
> message. 
> The new message looks properly formatted in the archive:
>     https://gcc.gnu.org/pipermail/gcc/2022-June/238911.html

Thanks; that's *much* more readable.


[...snip...]

> > 
> > 
> > 
> > On symbolic buffer sizes:
> > warning: Allocated buffer size is not a multiple of the pointee's
> > size 
> > [CWE-131] [-Wanalyzer-allocation-size]
> >    33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 }
> > */
> >       | ^~~~~~~~~~~~~~~~~~~~~~~~
> >   ‘test_3’: event 1
> >     |
> >     | 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3
> > } 
> > */
> >     | | ^~~~~~~~~~~~~~~~~~~~~~~~
> >     | | |
> >     | | (1) Allocation is incompatible with ‘int *’; either the 
> > allocated size is bogus or the type on the left-hand side is wrong
> >     |
> > 
> > 
> > Is there location information for both the malloc and for the
> > assignment, here?
> 
> I'm not sure whether I understand your question but the warning is 
> emitted at the gcall* with a ssa var lhs and the call_fndecl on the
> rhs.
> I think that is enough to split that up into "(1) n + sizeof(int) 
> allocated here" and "(2) Allocation at (1) is incompatible with..."? 

Probably, yes.

FWIW I wrote some more notes about the events in my reply to your
reply to Prathamesh, here:
  https://gcc.gnu.org/pipermail/gcc/2022-June/238917.html

[...snip...]

> > 
> > There are some things to discuss from my side:
> > * The tests with the "toy re-implementation of CPython's object 
> > model"[2] fail due to an extra warning being emitted. Because the analyzer
> > can't know the calculation actually results in a correct buffer size 
> > when viewed as a string_obj later on, it emits a warning, e.g. at
> > line 
> > 61 in data-model-5.c. The only mitigation would be to disable the 
> > warning for structs entirely. Now, the question is to rather have
> > noise
> > on these cases or disable the warning for structs entirely?
> > 
> > Can you post the full warning please?
> 
> /path/to/data-model-5.c: In function ‘alloc_obj’:
> /path/to/data-model-5.c:61:31: warning: Allocated buffer size is not a
> multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
>    61 |   base_obj *obj = (base_obj *)malloc (sz);
>       |                               ^~~~~~~~~~~
>   ‘new_string_obj’: events 1-2
>     |
>     |   69 | base_obj *new_string_obj (const char *str)
>     |      |           ^~~~~~~~~~~~~~
>     |      |           |
>     |      |           (1) entry to ‘new_string_obj’
>     |......
>     |   75 |     = (string_obj *)alloc_obj (&str_type, sizeof
> (string_obj) + len + 1);
>     |      |                    
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     |      |                     |
>     |      |                     (2) calling ‘alloc_obj’ from
> ‘new_string_obj’
>     |
>     +--> ‘alloc_obj’: events 3-4
>            |
>            |   59 | base_obj *alloc_obj (type_obj *ob_type, size_t sz)
>            |      |           ^~~~~~~~~
>            |      |           |
>            |      |           (3) entry to ‘alloc_obj’
>            |   60 | {
>            |   61 |   base_obj *obj = (base_obj *)malloc (sz);
>            |      |                               ~~~~~~~~~~~
>            |      |                               |
>            |      |                               (4) Allocation is
> incompatible with ‘base_obj *’; either the allocated size is bogus or
> the type on the left-hand side is wrong
>            |
> 
> > 
> > These testcases exhibit a common way of faking inheritance in C, and
> > I
> > think it ought to be possible to support this in the warning.
> > 
> > I think what's happening is we have
> > 
> > struct base
> > { 
> >   /* fields */
> > };
> > 
> > struct sub
> > {
> >   struct base m_base;
> >   /* extra fields.  */
> > };
> > 
> > struct base *construct_base (size_t sz)
> > {
> >   struct base *p = (struct base *) malloc (sz);
> > 
> >   /* set up fields of base in p  */
> > 
> >   return p;
> > }
> > 
> > Or is this on the interprocedural path as called with a specific
> > sizeof
> > for struct sub?
> 
> At (4), it does not know that base_obj is later used as a "base
> struct". 
> As it is called with sizeof(struct sub), my checker thinks the buffer
> is
> too large for one but too small for another base_obj.
> 
> > 
> > Maybe we can special-case these by detecting where struct sub's first
> > field is struct base, and hence where we expect this pattern?  (and
> > use
> > this to suppress the warning for such cases?)
> 
> I already excluded all structs with structs inside with 
> struct_or_union_with_inheritance_p inside sm-malloc.cc. This does not
> help 
> in the case size for struct sub is allocated but casted as base. Maybe,
> we
> should do a special case for structs where we only warn when the sizeof
> is
> too small to hold the base struct together with suppressing warnings
> when
> the first field is a struct? 

That sounds like it could work.

There are several things going on in the above example:
- fake inheritance
- the "trailing array idiom": struct string_obj's final field is:
     char str_buf[];
  meaning that the string_obj will have the char buffer trailing off
the end, and the allocation is expected to support this.

This is not uncommon in C; it occurs in CPython, see e.g.:
https://github.com/python/cpython/blob/main/Include/cpython/bytesobject.h

where CPython's PyBytesObject has the bytes in an ob_sval field
trailing off the end:

typedef struct {
    PyObject_VAR_HEAD
    Py_DEPRECATED(3.11) Py_hash_t ob_shash;
    char ob_sval[1];

    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the byte string or -1 if not computed yet.
     */
} PyBytesObject;

so it would be good for the warning to handle it gracefully, which I
think your proposal above would.
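
The two idioms and the relaxed rule proposed above can be sketched in
plain C as follows. This is only an illustration under assumptions: the
field layout is invented, and struct_capacity_ok and construct_sub are
hypothetical names, not functions from the patch or from the test
suite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct base
{
  int kind;               /* invented field, for illustration only */
};

struct sub
{
  struct base m_base;     /* fake inheritance: base is the first field */
  size_t len;
  char buf[];             /* trailing array idiom */
};

/* Hypothetical predicate for the relaxed rule: for a struct pointee,
   only a capacity smaller than the struct itself is suspicious.  */
static int
struct_capacity_ok (size_t capacity, size_t struct_size)
{
  return capacity >= struct_size;
}

/* Allocates a struct sub with room for a copy of `str` in the trailing
   array and returns it through a pointer to its base.  */
static struct base *
construct_sub (const char *str)
{
  size_t len = strlen (str);
  size_t sz = sizeof (struct sub) + len + 1;

  /* sz is not a multiple of sizeof (struct base) in general, but it is
     large enough to hold one, which is all the relaxed rule asks for.  */
  assert (struct_capacity_ok (sz, sizeof (struct base)));

  struct sub *obj = malloc (sz);
  if (obj == NULL)
    return NULL;
  obj->m_base.kind = 1;
  obj->len = len;
  memcpy (obj->buf, str, len + 1);
  return &obj->m_base;
}
```

Under the strict "multiple of the pointee's size" rule this allocation
would look dubious when viewed as a struct base; under the relaxed rule
it would only be flagged if sz were smaller than sizeof (struct base).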

I try to have plenty of idiomatic C code in the analyzer test suite to
try to catch this kind of thing (as well as more "unit test" kinds of
test coverage); we want the warnings to have a good signal:noise ratio.



> 
> 
> * I'm unable to emit a warning whenever the cast happens at an 
> assignment with a call as the rhs, e.g. test_1 in allocation-size-4.c. 
> This is because I'm unable to access a region_svalue for the returned
> value. Even in the new_program_state, the svalue of the lhs is still a 
> conjured_svalue. Maybe David can lead me to a place where I can access 
> the return value's region_svalue or do I have to adapt the engine?
> 
> Please can you try reposting the patch?  I tried to read it, but am
> having trouble with the mangled indentation.

See my inline answer above. Both, the test case and from where I want
to access the region_svalue are commented with // FIXME.

What does the dump of the state look like? e.g. via calling 

  (gdb) call m_region_model->debug()

from within gdb

A conjured_svalue represents the result of a call to an external
function (or a side-effect written out to a *out-style param of such a
function), but we have the body of create_buffer, so the call to
create_buffer should be analyzed interprocedurally, and we should have
a region_svalue pointing at a heap_allocated_region.

You might want to simplify things to just the functions of interest,
and then have a look at the output of -fdump-analyzer-exploded-graph in
your favorite .dot viewer (I like xdot; it's in python-xdot in Fedora).

I wonder if my idea from the other email of moving the test from sm-
malloc.cc to region-model.cc might affect this; the state machines run
at a slightly different time to the region model updates.



> 
> 
> * attr-malloc-6.c and pr96639.c did both contain structs without an
> implementation. Something in the analyzer must have triggered another
> warning about the usage of those without them having an implementation.
> I changed those structs to have an empty implementation, such that the 
> additional warnings are gone. I think this shouldn't change the test
> case, so is this change okay?
> 
> What were the new warnings?

/path/to/attr-malloc-6.c:175:15: error: invalid use of undefined type
‘struct FILE’
  175 |     FILE *p = malloc (100);   // { dg-message "allocated here"
}
      |               ^~~~~~~~~~~~

All were like the one above. error: invalid use of undefined type
'struct XXX'

That error looks bogus; I'm guessing that something the new diagnostics
is calling is generating it.  You can probably track it down by using

  (gdb) break-on-diagnostic

in the debugger, and then seeing what the backtrace shows when the
breakpoint fires.

See:
  https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html

break-on-diagnostic is one of the things in the support scripts
mentioned on that page.

Hope this is helpful

(BTW, I'm about to disappear for a long weekend; I'm back on Tuesday)

Dave




* Re: [RFC] analyzer: allocation size warning
  2022-06-17 22:13     ` David Malcolm
@ 2022-06-21 20:00       ` Tim Lange
  2022-06-21 23:16         ` David Malcolm
  0 siblings, 1 reply; 17+ messages in thread
From: Tim Lange @ 2022-06-21 20:00 UTC (permalink / raw)
  To: David Malcolm; +Cc: GCC Mailing List

On Sat Jun 18, 2022 at 12:13 AM CEST, David Malcolm wrote:
> On Fri, 2022-06-17 at 22:23 +0200, Tim Lange wrote:
> > On Fri, Jun 17, 2022 at 01:48:09PM -0400, David Malcolm wrote:
> > > On Fri, 2022-06-17 at 17:54 +0200, Tim Lange wrote:
>
> [...snip...]
>
> > > 
> > 
> > I have resent the patch using git send-email as a reply to my original
> > message. 
> > The new message looks properly formatted in the archive:
> >     https://gcc.gnu.org/pipermail/gcc/2022-June/238911.html
>
> Thanks; that's *much* more readable.
>
>
> [...snip...]
>
> > > 
> > > 
> > > 
> > > On symbolic buffer sizes:
> > > warning: Allocated buffer size is not a multiple of the pointee's
> > > size 
> > > [CWE-131] [-Wanalyzer-allocation-size]
> > >    33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3 }
> > > */
> > >       | ^~~~~~~~~~~~~~~~~~~~~~~~
> > >   ‘test_3’: event 1
> > >     |
> > >     | 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line malloc3
> > > } 
> > > */
> > >     | | ^~~~~~~~~~~~~~~~~~~~~~~~
> > >     | | |
> > >     | | (1) Allocation is incompatible with ‘int *’; either the 
> > > allocated size is bogus or the type on the left-hand side is wrong
> > >     |
> > > 
> > > 
> > > Is there location information for both the malloc and for the
> > > assignment, here?
> > 
> > I'm not sure whether I understand your question but the warning is 
> > emitted at the gcall* with a ssa var lhs and the call_fndecl on the
> > rhs.
> > I think that is enough to split that up into "(1) n + sizeof(int) 
> > allocated here" and "(2) Allocation at (1) is incompatible with..."? 
>
> Probably, yes.
>
> FWIW I wrote some more notes about the events in my reply to your
> reply to Prathamesh, here:
>   https://gcc.gnu.org/pipermail/gcc/2022-June/238917.html
>
> [...snip...]
>
> > > 
> > > There are some things to discuss from my side:
> > > * The tests with the "toy re-implementation of CPython's object 
> > > model"[2] fail due to an extra warning being emitted. Because the analyzer
> > > can't know the calculation actually results in a correct buffer size 
> > > when viewed as a string_obj later on, it emits a warning, e.g. at
> > > line 
> > > 61 in data-model-5.c. The only mitigation would be to disable the 
> > > warning for structs entirely. Now, the question is to rather have
> > > noise
> > > on these cases or disable the warning for structs entirely?
> > > 
> > > Can you post the full warning please?
> > 
> > /path/to/data-model-5.c: In function ‘alloc_obj’:
> > /path/to/data-model-5.c:61:31: warning: Allocated buffer size is not a
> > multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
> >    61 |   base_obj *obj = (base_obj *)malloc (sz);
> >       |                               ^~~~~~~~~~~
> >   ‘new_string_obj’: events 1-2
> >     |
> >     |   69 | base_obj *new_string_obj (const char *str)
> >     |      |           ^~~~~~~~~~~~~~
> >     |      |           |
> >     |      |           (1) entry to ‘new_string_obj’
> >     |......
> >     |   75 |     = (string_obj *)alloc_obj (&str_type, sizeof
> > (string_obj) + len + 1);
> >     |      |                    
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >     |      |                     |
> >     |      |                     (2) calling ‘alloc_obj’ from
> > ‘new_string_obj’
> >     |
> >     +--> ‘alloc_obj’: events 3-4
> >            |
> >            |   59 | base_obj *alloc_obj (type_obj *ob_type, size_t sz)
> >            |      |           ^~~~~~~~~
> >            |      |           |
> >            |      |           (3) entry to ‘alloc_obj’
> >            |   60 | {
> >            |   61 |   base_obj *obj = (base_obj *)malloc (sz);
> >            |      |                               ~~~~~~~~~~~
> >            |      |                               |
> >            |      |                               (4) Allocation is
> > incompatible with ‘base_obj *’; either the allocated size is bogus or
> > the type on the left-hand side is wrong
> >            |
> > 
> > > 
> > > These testcases exhibit a common way of faking inheritance in C, and
> > > I
> > > think it ought to be possible to support this in the warning.
> > > 
> > > I think what's happening is we have
> > > 
> > > struct base
> > > { 
> > >   /* fields */
> > > };
> > > 
> > > struct sub
> > > {
> > >   struct base m_base;
> > >   /* extra fields.  */
> > > };
> > > 
> > > struct base *construct_base (size_t sz)
> > > {
> > >   struct base *p = (struct base *) malloc (sz);
> > > 
> > >   /* set up fields of base in p  */
> > > 
> > >   return p;
> > > }
> > > 
> > > Or is this on the interprocedural path as called with a specific
> > > sizeof
> > > for struct sub?
> > 
> > At (4), it does not know that base_obj is later used as a "base
> > struct". 
> > As it is called with sizeof(struct sub), my checker thinks the buffer
> > is
> > too large for one but too small for another base_obj.
> > 
> > > 
> > > Maybe we can special-case these by detecting where struct sub's first
> > > field is struct base, and hence where we expect this pattern?  (and
> > > use
> > > this to suppress the warning for such cases?)
> > 
> > I already excluded all structs with structs inside with 
> > struct_or_union_with_inheritance_p inside sm-malloc.cc. This does not
> > help 
> > in the case size for struct sub is allocated but casted as base. Maybe,
> > we
> > should do a special case for structs where we only warn when the sizeof
> > is
> > too small to hold the base struct together with suppressing warnings
> > when
> > the first field is a struct? 
>
> That sounds like it could work.
>
> There are several things going on in the above example:
> - fake inheritance
> - the "trailing array idiom": struct string_obj's final field is:
>      char str_buf[];
>   meaning that the string_obj will have the char buffer trailing off
> the end, and the allocation is expected to support this.
>
> This is not uncommon in C; it occurs in CPython, see e.g.:
> https://github.com/python/cpython/blob/main/Include/cpython/bytesobject.h
>
> where CPython's PyBytesObject has the bytes in an ob_sval field
> trailing off the end:
>
> typedef struct {
>     PyObject_VAR_HEAD
>     Py_DEPRECATED(3.11) Py_hash_t ob_shash;
>     char ob_sval[1];
>
>     /* Invariants:
>      *     ob_sval contains space for 'ob_size+1' elements.
>      *     ob_sval[ob_size] == 0.
>      *     ob_shash is the hash of the byte string or -1 if not computed yet.
>      */
> } PyBytesObject;
>
> so it would be good for the warning to handle it gracefully, which I
> think your proposal above would.
>
> I try to have plenty of idiomatic C code in the analyzer test suite to
> try to catch this kind of thing (as well as more "unit test" kinds of
> test coverage); we want the warnings to have a good signal:noise ratio.
>
>
>
> > 
> > 
> > * I'm unable to emit a warning whenever the cast happens at an 
> > assignment with a call as the rhs, e.g. test_1 in allocation-size-4.c. 
> > This is because I'm unable to access a region_svalue for the returned
> > value. Even in the new_program_state, the svalue of the lhs is still a 
> > conjured_svalue. Maybe David can lead me to a place where I can access 
> > the return value's region_svalue or do I have to adapt the engine?
> > 
> > Please can you try reposting the patch?  I tried to read it, but am
> > having trouble with the mangled indentation.
>
> See my inline answer above. Both, the test case and from where I want
> to access the region_svalue are commented with // FIXME.
>
> What does the dump of the state look like? e.g. via calling 
>
>   (gdb) call m_region_model->debug()
>
> from within gdb
>
> A conjured_svalue represents the result of a call to an external
> function (or a side-effect written out to a *out-style param of such a
> function), but we have the body of create_buffer, so the call to
> create_buffer should be analyzed interprocedurally, and we should have
> a region_svalue pointing at a heap_allocated_region.
>
> You might want to simplify things to just the functions of interest,
> and then have a look at the output of -fdump-analyzer-exploded-graph in
> your favorite .dot viewer (I like xdot; it's in python-xdot in Fedora).
>
> I wonder if my idea from the other email of moving the test from sm-
> malloc.cc to region-model.cc might affect this; the state machines run
> at a slightly different time to the region model updates.

I've now moved the checker inside the region_model. While I got
everything working again and fixed the bogus struct error, I'm stuck at
the warning if the cast is on a return.

Let's take the example:
  void *create_buffer(int n)
  {
    return malloc(n);
  }

  int main (int argc, char **argv)
  {
    int *buf = create_buffer(42);
    free (buf);
    return 0;
  }

After moving it to the region_model, it reaches
  ctxt->warn (new dubious_allocation_size(...))
but does not emit a warning because inside impl_region_model_context::warn
m_stmt and m_stmt_finder are both NULL.
m_stmt is NULL because the exploded node on which set_value is called
is the 'after' node of create_buffer (the last bb of the function),
which doesn't have any statements.

I tried to add the call site to the context inside program_state::on_edge
to get the warning. While the warning appears, the notes in the diagnostic
are not correct anymore, i.e. the call site has the same indentation as
the callee and the return note is missing.
On the other hand, I cannot split the return value's set_value call out
of pop_frame, because then I'm unable to retrieve the <return_value> as
it is already gone.
If I simply delay the pop_frame call until the 'before' supernode, I get
the warning I wanted but break around 600 tests, which can't be right
either - at least not for a simple fix.

Because I've been stuck on this for some hours, I'd like to ask what
you think is the best way to get a warning at the call site?

- Tim
>
>
>
> > 
> > 
> > * attr-malloc-6.c and pr96639.c did both contain structs without an
> > implementation. Something in the analyzer must have triggered another
> > warning about the usage of those without them having an implementation.
> > I changed those structs to have an empty implementation, such that the 
> > additional warning are gone. I think this shouldn't change the test
> > case, so is this change okay?
> > 
> > What were the new warnings?
>
> /path/to/attr-malloc-6.c:175:15: error: invalid use of undefined type
> ‘struct FILE’
>   175 |     FILE *p = malloc (100);   // { dg-message "allocated here"
> }
>       |               ^~~~~~~~~~~~

For anyone in the future with the same problem: make sure to check that
TYPE_SIZE_UNIT != NULL_TREE before calling size_in_bytes.
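
To illustrate the situation (a sketch, not taken from the test suite): a
pointer to a struct that is only declared and never defined is valid C
and can hold a malloc result, but the pointee has no size to check
against. struct opaque and make_opaque are made-up names.

```c
#include <stdlib.h>

/* Declared but never defined: an incomplete type.  sizeof (struct opaque)
   would be rejected by the compiler, yet pointers to it are fine.  */
struct opaque;

/* Hypothetical example: the allocation is legal even though the pointee
   size is unknown, so a size check has nothing to compare against and
   has to be skipped for such types.  */
static struct opaque *
make_opaque (size_t n)
{
  return malloc (n);
}
```

This is the same shape as the FILE *p = malloc (100) line in
attr-malloc-6.c quoted above.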

>
> All were like the one above. error: invalid use of undefined type
> 'struct XXX'
>
> That error looks bogus; I'm guessing that something the new diagnostics
> is calling is generating it.  You can probably track it down by using
>
>   (gdb) break-on-diagnostic
>
> in the debugger, and then seeing what the backtrace shows when the
> breakpoint fires.
>
> See:
>   https://gcc-newbies-guide.readthedocs.io/en/latest/debugging.html
>
> break-on-diagnostic is one of the things in the support scripts
> mentioned on that page.
>
> Hope this is helpful
>
> (BTW, I'm about to disappear for a long weekend; I'm back on Tuesday)
>
> Dave


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] analyzer: allocation size warning
  2022-06-21 20:00       ` Tim Lange
@ 2022-06-21 23:16         ` David Malcolm
  2022-06-22 14:57           ` Tim Lange
  0 siblings, 1 reply; 17+ messages in thread
From: David Malcolm @ 2022-06-21 23:16 UTC (permalink / raw)
  To: Tim Lange; +Cc: GCC Mailing List

On Tue, 2022-06-21 at 22:00 +0200, Tim Lange wrote:
> On Sat Jun 18, 2022 at 12:13 AM CEST, David Malcolm wrote:
> > On Fri, 2022-06-17 at 22:23 +0200, Tim Lange wrote:
> > > On Fri, Jun 17, 2022 at 01:48:09PM -0400, David Malcolm wrote:
> > > > On Fri, 2022-06-17 at 17:54 +0200, Tim Lange wrote:
> > 
> > [...snip...]
> > 
> > > > 
> > > 
> > > I have resent the patch using git send-email as a reply to my
> > > original
> > > message. 
> > > The new message looks properly formatted in the archive:
> > >     https://gcc.gnu.org/pipermail/gcc/2022-June/238911.html
> > 
> > Thanks; that's *much* more readable.
> > 
> > 
> > [...snip...]
> > 
> > > > 
> > > > 
> > > > 
> > > > On symbolic buffer sizes:
> > > > warning: Allocated buffer size is not a multiple of the
> > > > pointee's
> > > > size 
> > > > [CWE-131] [-Wanalyzer-allocation-size]
> > > >    33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line
> > > > malloc3 }
> > > > */
> > > >       | ^~~~~~~~~~~~~~~~~~~~~~~~
> > > >   ‘test_3’: event 1
> > > >     |
> > > >     | 33 | int *ptr = malloc (n + sizeof(int)); /* { dg-line
> > > > malloc3
> > > > } 
> > > > */
> > > >     | | ^~~~~~~~~~~~~~~~~~~~~~~~
> > > >     | | |
> > > >     | | (1) Allocation is incompatible with ‘int *’; either the
> > > > allocated size is bogus or the type on the left-hand side is
> > > > wrong
> > > >     |
> > > > 
> > > > 
> > > > Is there location information for both the malloc and for the
> > > > assignment, here?
> > > 
> > > I'm not sure whether I understand your question but the warning
> > > is 
> > > emitted at the gcall* with a ssa var lhs and the call_fndecl on
> > > the
> > > rhs.
> > > I think that is enough to split that up into "(1) n + sizeof(int)
> > > allocated here" and "(2) Allocation at (1) is incompatible
> > > with..."? 
> > 
> > Probably, yes.
> > 
> > FWIW I wrote some more notes about the events in my reply to
> > your
> > reply to Prathamesh, here:
> >   https://gcc.gnu.org/pipermail/gcc/2022-June/238917.html
> > 
> > [...snip...]
> > 
> > > > 
> > > > There are some things to discuss from my side:
> > > > * The tests with the "toy re-implementation of CPython's object
> > > > model"[2] fail due to a extra warning emitted. Because the
> > > > analyzer
> > > > can't know the calculation actually results in a correct buffer
> > > > size 
> > > > when viewed as a string_obj later on, it emits a warning, e.g.
> > > > at
> > > > line 
> > > > 61 in data-model-5.c. The only mitigation would be to disable
> > > > the 
> > > > warning for structs entirely. Now, the question is to rather
> > > > have
> > > > noise
> > > > on these cases or disable the warning for structs entirely?
> > > > 
> > > > Can you post the full warning please?
> > > 
> > > /path/to/data-model-5.c: In function ‘alloc_obj’:
> > > /path/to/data-model-5.c:61:31: warning: Allocated buffer size is
> > > not a
> > > multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-
> > > size]
> > >    61 |   base_obj *obj = (base_obj *)malloc (sz);
> > >       |                               ^~~~~~~~~~~
> > >   ‘new_string_obj’: events 1-2
> > >     |
> > >     |   69 | base_obj *new_string_obj (const char *str)
> > >     |      |           ^~~~~~~~~~~~~~
> > >     |      |           |
> > >     |      |           (1) entry to ‘new_string_obj’
> > >     |......
> > >     |   75 |     = (string_obj *)alloc_obj (&str_type, sizeof
> > > (string_obj) + len + 1);
> > >     |      |                    
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >     |      |                     |
> > >     |      |                     (2) calling ‘alloc_obj’ from
> > > ‘new_string_obj’
> > >     |
> > >     +--> ‘alloc_obj’: events 3-4
> > >            |
> > >            |   59 | base_obj *alloc_obj (type_obj *ob_type,
> > > size_t sz)
> > >            |      |           ^~~~~~~~~
> > >            |      |           |
> > >            |      |           (3) entry to ‘alloc_obj’
> > >            |   60 | {
> > >            |   61 |   base_obj *obj = (base_obj *)malloc (sz);
> > >            |      |                               ~~~~~~~~~~~
> > >            |      |                               |
> > >            |      |                               (4) Allocation
> > > is
> > > incompatible with ‘base_obj *’; either the allocated size is
> > > bogus or
> > > the type on the left-hand side is wrong
> > >            |
> > > 
> > > > 
> > > > These testcases exhibit a common way of faking inheritance in
> > > > C, and
> > > > I
> > > > think it ought to be possible to support this in the warning.
> > > > 
> > > > I think what's happening is we have
> > > > 
> > > > struct base
> > > > { 
> > > >   /* fields */
> > > > };
> > > > 
> > > > struct sub
> > > > {
> > > >   struct base m_base;
> > > >   /* extra fields.  */
> > > > };
> > > > 
> > > > struct base *construct_base (size_t sz)
> > > > {
> > > >   struct base *p = (struct base *) malloc (sz);
> > > > 
> > > >   /* set up fields of base in p  */
> > > > 
> > > >   return p;
> > > > }
> > > > 
> > > > Or is this on the interprocedural path as called with a
> > > > specific
> > > > sizeof
> > > > for struct sub?
> > > 
> > > At (4), it does not know that base_obj is later used as a "base
> > > struct". 
> > > As it is called with sizeof(struct sub), my checker thinks the
> > > buffer
> > > is
> > > too large for one but too small for another base_obj.
> > > 
> > > > 
> > > > Maybe we can special-case these by detecting where struct sub's
> > > > first
> > > > field is struct base, and hence where we expect this pattern? 
> > > > (and
> > > > use
> > > > this to suppress the warning for such cases?)
> > > 
> > > I already excluded all structs with structs inside with 
> > > struct_or_union_with_inheritance_p inside sm-malloc.cc. This does
> > > not
> > > help 
> > > in the case size for struct sub is allocated but casted as base.
> > > Maybe,
> > > we
> > > should do a special case for structs where we only warn when the
> > > sizeof
> > > is
> > > too small to hold the base struct together with suppressing
> > > warnings
> > > when
> > > the first field is a struct? 
> > 
> > That sounds like it could work.
> > 
> > There are several things going on in the above example:
> > - fake inheritance
> > - the "trailing array idiom": struct string_obj's final field is:
> >      char str_buf[];
> >   meaning that the string_obj will have the char buffer trailing
> > off
> > the end, and the allocation is expected to support this.
> > 
> > This is not uncommon in C; it occurs in CPython, see e.g.:
> > https://github.com/python/cpython/blob/main/Include/cpython/bytesobject.h
> > 
> > where CPython's PyBytesObject has the bytes in an ob_sval field
> > trailing off the end:
> > 
> > typedef struct {
> >     PyObject_VAR_HEAD
> >     Py_DEPRECATED(3.11) Py_hash_t ob_shash;
> >     char ob_sval[1];
> > 
> >     /* Invariants:
> >      *     ob_sval contains space for 'ob_size+1' elements.
> >      *     ob_sval[ob_size] == 0.
> >      *     ob_shash is the hash of the byte string or -1 if not
> > computed yet.
> >      */
> > } PyBytesObject;
> > 
> > so it would be good for the warning to handle it gracefully, which
> > I
> > think your proposal above would.
> > 
> > I try to have plenty of idiomatic C code in the analyzer test suite
> > to
> > try to catch this kind of thing (as well as more "unit test" kinds
> > of
> > test coverage); we want the warnings to have a good signal:noise
> > ratio.
> > 
> > 
> > 
> > > 
> > > 
> > > * I'm unable to emit a warning whenever the cast happens at an 
> > > assignment with a call as the rhs, e.g. test_1 in allocation-
> > > size-4.c. 
> > > This is because I'm unable to access a region_svalue for the
> > > returned
> > > value. Even in the new_program_state, the svalue of the lhs is
> > > still a 
> > > conjured_svalue. Maybe David can lead me to a place where I can
> > > access 
> > > the return value's region_svalue or do I have to adapt the
> > > engine?
> > > 
> > > Please can you try reposting the patch?  I tried to read it, but
> > > am
> > > having trouble with the mangled indentation.
> > 
> > See my inline answer above. Both, the test case and from where I
> > want
> > to access the region_svalue are commented with // FIXME.
> > 
> > What does the dump of the state look like? e.g. via calling 
> > 
> >   (gdb) call m_region_model->debug()
> > 
> > from within gdb
> > 
> > A conjured_svalue represents the result of a call to an external
> > function (or a side-effect written out to a *out-style param of
> > such a
> > function), but we have the body of create_buffer, so the call to
> > create_buffer should be analyzed interprocedurally, and we should
> > have
> > a region_svalue pointing at a heap_allocated_region.
> > 
> > You might want to simplify things to just the functions of
> > interest,
> > and then have a look at the output of -fdump-analyzer-exploded-
> > graph in
> > your favorite .dot viewer (I like xdot; it's in python-xdot in
> > Fedora).
> > 
> > I wonder if my idea from the other email of moving the test from
> > sm-
> > malloc.cc to region-model.cc might affect this; the state machines
> > run
> > at a slightly different time to the region model updates.
> 
> I've now moved the checker inside the region_model. While I got
> everything working again and fixed the bogus struct error, 

...thanks!

> I'm stuck at
> the warning if the cast is on a return.
> 
> Let's take the example:
>   void *create_buffer(int n)
>   {
>     return malloc(n);
>   }
> 
>   int main (int argc, char **argv)
>   {
>     int *buf = create_buffer(42);
>     free (buf);
>     return 0;
>   }
> 
> After moving it to the region_model, it reaches
>   ctxt->warn (new dubious_allocation_size(...))
> but does not emit a warning because inside
> impl_region_model_context::warn
> m_stmt and m_stmt_finder are both NULL.
> m_stmt is null, because the exploded node at which set_value is
> called
> on, is the after node of create_buffer (the last bb of the function),
> which doesn't have any statements.

What optimization level are you trying this at?

What's the output of -fdump-ipa-analyzer=stderr

> 
> I tried to add the call site to the context inside
> program_state::on_edge
> to get the warning. While the warning appears,
> the notes in the diagnostic
> are not correct anymore, i.e. the call site
> has the same indentation as
> the callee and the return note is missing.
> On the other hand, I can not split the return value set_value call
> out
> of pop_frame, because then I'm unable to retrieve the <return_value>
> as
> it is already gone.
> If I simply delay the pop_frame call until the 'before' supernode, I
> get
> the warning I wanted but break around 600 tests, which can't be right
> either - at least not for a simple fix.
> 
> Because I'm stuck at this for some hours, I'd like to ask you what do
> you think is the best way to get a warning at the call site?

Can you post your patch please so that I can see what's going on more
clearly.

Thanks
Dave



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] analyzer: allocation size warning
  2022-06-21 23:16         ` David Malcolm
@ 2022-06-22 14:57           ` Tim Lange
  2022-06-22 18:23             ` David Malcolm
  0 siblings, 1 reply; 17+ messages in thread
From: Tim Lange @ 2022-06-22 14:57 UTC (permalink / raw)
  To: dmalcolm; +Cc: gcc, Tim Lange

The checker reaches region-model.cc#3083 in my patch with the
  impl_region_model_context
on the 'after' node of create_buffer() but then discards the warning inside
impl_region_model_context::warn because m_stmt is NULL. Even if m_stmt were
not NULL at the 'after' node, my warning would be emitted before the
return edge was taken and thus be wrongly indented, as shown below:
/path/to/.c:10:16: warning: Allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   10 |     int *buf = create_buffer(42);
      |                ^~~~~~~~~~~~~~~~~
  ‘main’: events 1-2
    |
    |    9 |   int main (int argc, char **argv) {
    |      |       ^~~~
    |      |       |
    |      |       (1) entry to ‘main’
    |   10 |     int *buf = create_buffer(42);
    |      |                ~~~~~~~~~~~~~~~~~
    |      |                |
    |      |                (2) calling ‘create_buffer’ from ‘main’
    |
    +--> ‘create_buffer’: events 3-4
           |
           |    4 |   void *create_buffer(int n)
           |      |         ^~~~~~~~~~~~~
           |      |         |
           |      |         (3) entry to ‘create_buffer’
           |    5 |   {
           |    6 |     return malloc(n);
           |      |            ~~~~~~~~~
           |      |            |
           |      |            (4) allocated 42 bytes here
           |
         ‘main’: event 5
           |
           |   10 |     int *buf = create_buffer(42);
           |      |                ^~~~~~~~~~~~~~~~~
           |      |                |
           |      |                (5) Assigned to ‘int *’ here
           |
           
The correct warning should be:
/path/to/.c:10:16: warning: Allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   10 |     int *buf = create_buffer(42);
      |                ^~~~~~~~~~~~~~~~~
  ‘main’: events 1-2
    |
    |    9 |   int main (int argc, char **argv) {
    |      |       ^~~~
    |      |       |
    |      |       (1) entry to ‘main’
    |   10 |     int *buf = create_buffer(42);
    |      |                ~~~~~~~~~~~~~~~~~
    |      |                |
    |      |                (2) calling ‘create_buffer’ from ‘main’
    |
    +------------> ‘create_buffer’: events 3-4
                   |
                   |    4 |   void *create_buffer(int n)
                   |      |         ^~~~~~~~~~~~~
                   |      |         |
                   |      |         (3) entry to ‘create_buffer’
                   |    5 |   {
                   |    6 |     return malloc(n);
                   |      |            ~~~~~~~~~
                   |      |            |
                   |      |            (4) allocated 42 bytes here
                   |
‘main’: event 5 <--+
   |
   |   10 |     int *buf = create_buffer(42);
   |      |                ^~~~~~~~~~~~~~~~~
   |      |                |
   |      |                (5) Assigned to ‘int *’ here
   |
For that, the return edge has to be visited to be part of the emission_path.
This is currently not the case as the assignment of the <return_value> to
the caller lhs is handled inside pop_frame, which is transitively called
from program_state::on_edge of the 'after' node of the callee.
I tried to defer the set_value(caller lhs, <return_value>) call to the
'before' node after the return edge but failed to do so elegantly. My last try
is in the patch, commented out with // FIXME.
My main problem is that I cannot pop the frame and later get the
return value easily. Deferring the whole pop_frame to the 'before' node
breaks the assumptions inside exploded_graph::get_or_create_node.

I don't know what the best/most elegant way of solving this is. Would one
solution be to attach the return svalue to the return edge and then use it
later in the PK_BEFORE_SUPERNODE?

Signed-off-by: Tim Lange <mail@tim-lange.me>
---
 gcc/analyzer/analyzer.opt                     |   4 +
 gcc/analyzer/checker-path.cc                  |  12 +-
 gcc/analyzer/checker-path.h                   |   2 +-
 gcc/analyzer/engine.cc                        |  12 +
 gcc/analyzer/pending-diagnostic.h             |  21 ++
 gcc/analyzer/region-model.cc                  | 322 ++++++++++++++++++
 gcc/analyzer/region-model.h                   |   4 +
 .../gcc.dg/analyzer/allocation-size-1.c       |  63 ++++
 .../gcc.dg/analyzer/allocation-size-2.c       |  44 +++
 .../gcc.dg/analyzer/allocation-size-3.c       |  48 +++
 .../gcc.dg/analyzer/allocation-size-4.c       |  92 +++++
 gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c |   2 +
 gcc/testsuite/gcc.dg/analyzer/malloc-4.c      |   2 +-
 gcc/testsuite/gcc.dg/analyzer/pr96639.c       |   2 +
 14 files changed, 627 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 4aea52d3a87..f213989e0bb 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -78,6 +78,10 @@ Wanalyzer-malloc-leak
 Common Var(warn_analyzer_malloc_leak) Init(1) Warning
 Warn about code paths in which a heap-allocated pointer leaks.
 
+Wanalyzer-allocation-size
+Common Var(warn_analyzer_allocation_size) Init(1) Warning
+Warn about code paths in which a buffer is assigned to an incompatible type.
+
 Wanalyzer-mismatching-deallocation
 Common Var(warn_analyzer_mismatching_deallocation) Init(1) Warning
 Warn about code paths in which the wrong deallocation function is called.
diff --git a/gcc/analyzer/checker-path.cc b/gcc/analyzer/checker-path.cc
index 0133dc94137..4ad75e636c1 100644
--- a/gcc/analyzer/checker-path.cc
+++ b/gcc/analyzer/checker-path.cc
@@ -302,8 +302,18 @@ region_creation_event::region_creation_event (const region *reg,
    region_creation_event.  */
 
 label_text
-region_creation_event::get_desc (bool) const
+region_creation_event::get_desc (bool can_colorize) const
 {
+  if (m_pending_diagnostic)
+    {
+      label_text custom_desc
+            = m_pending_diagnostic->describe_region_creation_event 
+                (evdesc::region_creation(can_colorize, m_reg));
+      if (custom_desc.m_buffer)
+        return custom_desc;
+    }
+
+
   switch (m_reg->get_memory_space ())
     {
     default:
diff --git a/gcc/analyzer/checker-path.h b/gcc/analyzer/checker-path.h
index 24decf5ce3d..8e48d8a07ab 100644
--- a/gcc/analyzer/checker-path.h
+++ b/gcc/analyzer/checker-path.h
@@ -219,7 +219,7 @@ public:
   region_creation_event (const region *reg,
 			 location_t loc, tree fndecl, int depth);
 
-  label_text get_desc (bool) const final override;
+  label_text get_desc (bool can_colorize) const final override;
 
 private:
   const region *m_reg;
diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 7237cc1a1ca..5bf8697d8f7 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -3740,6 +3740,18 @@ exploded_graph::process_node (exploded_node *node)
 	    program_state::detect_leaks (state, next_state, NULL,
 					 get_ext_state (), &ctxt);
 	  }
+  // FIXME: Other way than calling return again here?
+  // else 
+  //   {
+  //     const supernode *snode = point.get_supernode ();
+  //     if (snode->m_returning_call)
+  //     {
+  //       impl_region_model_context ctxt (*this, node,
+  //           &state, &next_state,
+  //           &uncertainty, NULL, snode->m_returning_call);
+  //       next_state.m_region_model->set_return (snode->m_returning_call, &ctxt);
+  //     }
+  //   }
 
 	program_point next_point (point.get_next ());
 	exploded_node *next = get_or_create_node (next_point, next_state, node);
diff --git a/gcc/analyzer/pending-diagnostic.h b/gcc/analyzer/pending-diagnostic.h
index 9e1c656bf0a..3f79444ef40 100644
--- a/gcc/analyzer/pending-diagnostic.h
+++ b/gcc/analyzer/pending-diagnostic.h
@@ -1,3 +1,4 @@
+
 /* Classes for analyzer diagnostics.
    Copyright (C) 2019-2022 Free Software Foundation, Inc.
    Contributed by David Malcolm <dmalcolm@redhat.com>.
@@ -58,6 +59,17 @@ struct event_desc
   bool m_colorize;
 };
 
+/* For use by pending_diagnostic::describe_region_creation_event.  */
+
+struct region_creation : public event_desc
+{
+  region_creation (bool colorize, const region *reg)
+  : event_desc (colorize), m_reg(reg)
+  {}
+
+  const region *m_reg;
+};
+
 /* For use by pending_diagnostic::describe_state_change.  */
 
 struct state_change : public event_desc
@@ -215,6 +227,15 @@ class pending_diagnostic
      description; NULL otherwise (falling back on a more generic
      description).  */
 
+  /* Precision-of-wording vfunc for describing a region creation event
+     triggered by the mark_interesting_stuff vfunc.  */
+  virtual label_text 
+  describe_region_creation_event (const evdesc::region_creation &)
+  {
+    /* Default no-op implementation.  */
+    return label_text ();
+  }
+
   /* Precision-of-wording vfunc for describing a critical state change
      within the diagnostic_path.
 
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 6b49719d521..acb8bd1bfca 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-operands.h"
 #include "ssa-iterators.h"
 #include "calls.h"
+#include "print-tree.h"
 
 #if ENABLE_ANALYZER
 
@@ -653,6 +654,66 @@ private:
   tree m_count_cst;
 };
 
+/* Concrete subclass for casts of pointers that lead to trailing bytes.  */
+
+class dubious_allocation_size
+: public pending_diagnostic_subclass<dubious_allocation_size>
+{
+public:
+  dubious_allocation_size (const region *lhs, const region *rhs, 
+                           const svalue *capacity)
+  : m_lhs(lhs), m_rhs(rhs), m_capacity(capacity) {}
+
+  const char *get_kind () const final override 
+  { 
+    return "dubious_allocation_size"; 
+  }
+
+  bool operator== (const dubious_allocation_size &other) const
+  {
+    return m_lhs == other.m_lhs && m_rhs == other.m_rhs;
+  }
+
+  int get_controlling_option () const final override
+  {
+    return OPT_Wanalyzer_allocation_size;
+  }
+
+  bool emit (rich_location *rich_loc) final override
+  {
+    diagnostic_metadata m;
+    m.add_cwe (131);
+    return warning_meta (rich_loc, m, get_controlling_option (),
+	       "Allocated buffer size is not a multiple of the pointee's size");
+  }
+
+  label_text 
+  describe_region_creation_event (const evdesc::region_creation &ev) final override
+  {
+    // TODO: better way to print the capacity
+    return ev.formatted_print ("allocated %s here", 
+                                          m_capacity->get_desc(true).m_buffer);
+  }
+
+  label_text describe_final_event (const evdesc::final_event &ev) final override
+  {
+    return ev.formatted_print ("Assigned to %qT here", m_lhs->get_type ());
+  }
+
+  void mark_interesting_stuff (interesting_t *interest) final override
+  {
+    if (m_lhs)
+      interest->add_region_creation (m_lhs);
+    if (m_rhs)
+      interest->add_region_creation (m_rhs);
+  }
+
+private:
+  const region *m_lhs;
+  const region *m_rhs;
+  const svalue *m_capacity;
+};
+
 /* If ASSIGN is a stmt that can be modelled via
      set_value (lhs_reg, SVALUE, CTXT)
    for some SVALUE, get the SVALUE.
@@ -2799,6 +2860,241 @@ region_model::check_region_for_read (const region *src_reg,
   check_region_access (src_reg, DIR_READ, ctxt);
 }
 
+/* Returns the trailing bytes on dubious allocation sizes.  */
+
+static unsigned HOST_WIDE_INT 
+capacity_compatible_with_type (tree cst, tree pointee_size_tree)
+{
+  unsigned HOST_WIDE_INT pointee_size = TREE_INT_CST_LOW (pointee_size_tree);
+  if (pointee_size == 0)
+    return 0;
+  unsigned HOST_WIDE_INT alloc_size = TREE_INT_CST_LOW (cst);
+
+  return alloc_size % pointee_size;
+}
+
+/* Visits svalues and checks whether the
+   size_cst is an operand of the svalue.  */
+
+class size_visitor : public visitor
+{
+public:
+  size_visitor(tree size_cst, const svalue *sval, constraint_manager *cm) 
+  : m_size_cst(size_cst), m_sval(sval), m_cm(cm)
+  {
+    sval->accept(this);
+  }
+
+  bool get_result()
+  {
+    /* The result_set is gradually built from atomic nodes upwards. If a node is
+       in the result_set, it or one/all of its children have an operand that
+       is a multiple of the size_cst. If the root is inside, the given sval
+       is valid, i.e. a multiple of the size_cst.  */
+    return result_set.contains(m_sval);
+  }
+
+  void 
+  visit_constant_svalue (const constant_svalue *sval) final override
+  {
+    unsigned HOST_WIDE_INT sval_int
+	  = TREE_INT_CST_LOW (sval->get_constant ());
+    unsigned HOST_WIDE_INT size_cst_int = TREE_INT_CST_LOW (m_size_cst);
+    if (size_cst_int == 0 || sval_int % size_cst_int == 0)
+      result_set.add (sval);
+  }
+
+  void 
+  visit_unknown_svalue (const unknown_svalue *sval ATTRIBUTE_UNUSED) 
+    final override
+  {
+    result_set.add (sval);
+  }
+
+  void 
+  visit_poisoned_svalue (const poisoned_svalue *sval ATTRIBUTE_UNUSED) 
+    final override
+  {
+    result_set.add (sval);
+  }
+  
+  void visit_unaryop_svalue (const unaryop_svalue *sval) 
+  {
+    const svalue *arg = sval->get_arg ();
+    arg->accept (this);
+    if (result_set.contains (arg))
+	result_set.add (sval);
+  }
+
+  void visit_binop_svalue (const binop_svalue *sval) final override
+  {
+    const svalue *arg0 = sval->get_arg0 ();
+    const svalue *arg1 = sval->get_arg1 ();
+
+    arg0->accept (this);
+    arg1->accept (this);
+    if (sval->get_op () == MULT_EXPR)
+      {
+	if (result_set.contains (arg0) || result_set.contains (arg1))
+	  result_set.add (sval);
+      }
+    else
+      {
+	if (result_set.contains (arg0) && result_set.contains (arg1))
+	  result_set.add (sval);
+      }
+  }
+
+  void visit_repeated_svalue (const repeated_svalue *sval) 
+  {
+    sval->get_inner_svalue ()->accept(this);
+    if (result_set.contains (sval->get_inner_svalue ()))
+      result_set.add (sval);
+  }
+
+  void visit_unmergeable_svalue (const unmergeable_svalue *sval) final override
+  {
+    sval->get_arg ()->accept (this);
+    if (result_set.contains (sval->get_arg ()))
+      result_set.add (sval);
+  }
+
+  void visit_widening_svalue (const widening_svalue *sval) final override
+  {
+    const svalue *base = sval->get_base_svalue ();
+    const svalue *iter = sval->get_iter_svalue ();
+
+    base->accept(this);
+    iter->accept(this);
+    if (result_set.contains (base) && result_set.contains (iter))
+      result_set.add (sval);
+  }
+
+  void visit_conjured_svalue (const conjured_svalue *sval ATTRIBUTE_UNUSED) 
+    final override
+  {
+    if (m_cm->get_equiv_class_by_svalue (sval, NULL))
+      result_set.add (sval);
+  }
+
+  void visit_asm_output_svalue (const asm_output_svalue *sval ATTRIBUTE_UNUSED) 
+    final override
+  {
+    // TODO: Should we do something else than assume it could be correct
+    result_set.add (sval);
+  }
+
+  void visit_const_fn_result_svalue (const const_fn_result_svalue 
+				      *sval ATTRIBUTE_UNUSED) final override
+  {
+    // TODO: Should we do something else than assume it could be correct
+    result_set.add (sval);
+  }
+
+private:
+  tree m_size_cst;
+  const svalue *m_sval;
+  constraint_manager *m_cm;
+  svalue_set result_set; /* Used as a mapping of svalue*->bool.  */
+};
+
+/* Returns true if there is a constant tree with 
+   the same constant value inside the sval.  */
+
+static bool
+const_operand_in_sval_p (tree type_size_cst, const svalue *sval,
+                         constraint_manager *cm)
+{
+  size_visitor v(type_size_cst, sval, cm);
+  // sval->accept(&v);
+  return v.get_result ();
+}
+
+/* Special handling for structs with "inheritance" or that hold an unbounded 
+     type. Those will be skipped to prevent false positives.  */
+
+static bool
+struct_or_union_with_inheritance_p (tree maybe_struct)
+{
+  if (RECORD_OR_UNION_TYPE_P (maybe_struct))
+    {
+      tree iter = TYPE_FIELDS (maybe_struct);
+      if (iter == NULL_TREE)
+        return false;
+      if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (iter)))
+        return true;
+
+      tree last_field;
+      while (iter != NULL_TREE)
+        {
+          last_field = iter;
+          iter = DECL_CHAIN (iter);
+        }
+
+      if (last_field != NULL_TREE 
+          && COMPLETE_OR_UNBOUND_ARRAY_TYPE_P (TREE_TYPE (last_field)))
+        return true;
+    }
+  return false;
+}
+
+void
+region_model::check_region_size (const region *lhs_reg, const svalue *rhs_sval,
+			                           region_model_context *ctxt) const
+{
+  if (!ctxt)
+    return;
+  
+  const region_svalue *reg_sval = dyn_cast <const region_svalue *> (rhs_sval);
+  if (!reg_sval)
+    return;
+
+  tree pointer_type = lhs_reg->get_type ();
+  if (pointer_type == NULL_TREE || !POINTER_TYPE_P (pointer_type))
+    return;
+
+  tree pointee_type = TREE_TYPE (pointer_type);
+  /* void * is always compatible and make sure that the pointee_type actually
+     has a size, or else size_in_bytes might fail.  */
+  if (pointee_type == NULL_TREE || VOID_TYPE_P (pointee_type) 
+      || TYPE_SIZE_UNIT (pointee_type) == NULL_TREE)
+    return;
+  if (struct_or_union_with_inheritance_p (pointee_type))
+    return;
+
+  tree pointee_size_tree = size_in_bytes(pointee_type);
+  /* The size might be unknown, e.g. for an array with n elements,
+     and casting to char * never leaves any trailing bytes.  */
+  if (TREE_CODE (pointee_size_tree) != INTEGER_CST
+      || TREE_INT_CST_LOW (pointee_size_tree) == 1)
+    return;
+
+  const svalue *capacity = get_capacity (reg_sval->get_pointee ());
+  switch (capacity->get_kind ())
+    {
+    case svalue_kind::SK_CONSTANT:
+      {
+	const constant_svalue *cap_sval = capacity->dyn_cast_constant_svalue ();
+	tree cap = cap_sval->get_constant ();
+	unsigned HOST_WIDE_INT size_diff
+	  = capacity_compatible_with_type (cap, pointee_size_tree);
+	if (size_diff != 0)
+	  {
+	    ctxt->warn (new dubious_allocation_size (lhs_reg, reg_sval->get_pointee (), capacity));
+	  }
+      }
+      break;
+    default:
+      {
+	if (!const_operand_in_sval_p (pointee_size_tree, capacity, m_constraints))
+	  {
+	    ctxt->warn (new dubious_allocation_size (lhs_reg, reg_sval->get_pointee (), capacity));
+	  }
+      }
+      break;
+    }
+}
+
 /* Set the value of the region given by LHS_REG to the value given
    by RHS_SVAL.
    Use CTXT to report any warnings associated with writing to LHS_REG.  */
@@ -2810,6 +3106,8 @@ region_model::set_value (const region *lhs_reg, const svalue *rhs_sval,
   gcc_assert (lhs_reg);
   gcc_assert (rhs_sval);
 
+  check_region_size(lhs_reg, rhs_sval, ctxt);
+
   check_region_for_write (lhs_reg, ctxt);
 
   m_store.set_value (m_mgr->get_store_manager(), lhs_reg, rhs_sval,
@@ -3990,6 +4288,30 @@ region_model::pop_frame (tree result_lvalue,
   unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK);
 }
 
+// FIXME: How to call the call_lhs <- <return_value> on the caller context
+//        with the call site set as a stmt?
+void
+region_model::set_return (const gcall *call, region_model_context *ctxt)
+{
+  if (0)
+    {
+      tree lhs = gimple_call_lhs (call);
+      tree fndecl = gimple_call_fndecl (call);
+      tree result = DECL_RESULT (fndecl);
+      const svalue *retval = NULL;
+      if (result && TREE_TYPE (result) != void_type_node)
+          retval = get_rvalue (result, ctxt);
+
+      if (lhs && retval)
+      {
+        /* Compute result_dst_reg using RESULT_LVALUE *after* popping
+      the frame, but before poisoning pointers into the old frame.  */
+        const region *result_dst_reg = get_lvalue (lhs, ctxt);
+        set_value (result_dst_reg, retval, ctxt);
+      }
+    }
+}
+
 /* Get the number of frames in this region_model's stack.  */
 
 int
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 1bfa56a8cd2..5dbbf2cd546 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -668,6 +668,8 @@ class region_model
   void update_for_return_gcall (const gcall *call_stmt,
                                 region_model_context *ctxt);
 
+  void set_return (const gcall *call_stmt, region_model_context *ctxt);
+
   const region *push_frame (function *fun, const vec<const svalue *> *arg_sids,
 			    region_model_context *ctxt);
   const frame_region *get_current_frame () const { return m_current_frame; }
@@ -857,6 +859,8 @@ class region_model
 			       region_model_context *ctxt) const;
   void check_region_for_read (const region *src_reg,
 			      region_model_context *ctxt) const;
+  void check_region_size (const region *lhs_reg, const svalue *rhs_sval,
+                          region_model_context *ctxt) const;
 
   void check_call_args (const call_details &cd) const;
   void check_external_function_for_access_attr (const gcall *call,
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
new file mode 100644
index 00000000000..cb3df5516e7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
@@ -0,0 +1,63 @@
+#include <stdlib.h>
+
+/* Tests with constant buffer sizes.  */
+
+void test_1 (void)
+{
+  short *ptr = malloc (21 * sizeof(short));
+  free (ptr);
+}
+
+void test_2 (void)
+{
+  int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc2 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc2 } */
+  /* { dg-message "" "" { target *-*-* } malloc2 } */
+}
+
+void test_3 (void)
+{
+  void *ptr = malloc (21 * sizeof (short));
+  short *sptr = (short *)ptr;
+  free (sptr);
+}
+
+void test_4 (void)
+{
+  void *ptr = malloc (21 * sizeof (short)); /* { dg-message } */
+  int *iptr = (int *)ptr; /* { dg-line assign } */
+  free (iptr);
+
+  /* { dg-warning "" "" { target *-*-* } assign } */
+  /* { dg-message "" "" { target *-*-* } assign } */
+}
+
+struct s {
+  int i;
+};
+
+void test_5 (void)
+{
+  struct s *ptr = malloc (5 * sizeof (struct s));
+  free (ptr);
+}
+
+void test_6 (void)
+{
+  long *ptr = malloc (5 * sizeof (struct s));  /* { dg-line malloc6 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc6 } */
+  /* { dg-message "" "" { target *-*-* } malloc6 } */
+}
+
+void test_7 (void)
+{
+  char buf[2];
+  int *ptr = (int *)buf; /* { dg-line malloc7 } */
+
+  /* { dg-warning "" "" { target *-*-* } malloc7 } */
+  /* { dg-message "" "" { target *-*-* } malloc7 } */
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
new file mode 100644
index 00000000000..a619a786a4e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
@@ -0,0 +1,44 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Tests with symbolic buffer sizes.  */
+
+void test_1 (void)
+{
+  int n;
+  scanf("%i", &n);
+  short *ptr = malloc (n * sizeof(short));
+  free (ptr);
+}
+
+void test_2 (void)
+{
+  int n;
+  scanf("%i", &n);
+  int *ptr = malloc (n * sizeof (short)); /* { dg-line malloc } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc } */
+  /* { dg-message "" "" { target *-*-* } malloc } */
+}
+
+void test_3 (void)
+{
+  int n;
+  scanf("%i", &n);
+  void *ptr = malloc (n * sizeof (short));
+  short *sptr = (short *)ptr;
+  free (sptr);
+}
+
+void test_4 (void)
+{
+  int n;
+  scanf("%i", &n);
+  void *ptr = malloc (n * sizeof (short)); /* { dg-message } */
+  int *iptr = (int *)ptr; /* { dg-line assign } */
+  free (iptr);
+
+  /* { dg-warning "" "" { target *-*-* } assign } */
+  /* { dg-message "" "" { target *-*-* } assign } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
new file mode 100644
index 00000000000..dafc0e73c63
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
@@ -0,0 +1,48 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* CWE-131 example 5 */
+void test_1(void) 
+{
+  int *id_sequence = (int *) malloc (3); /* { dg-line malloc1 } */
+  if (id_sequence == NULL) exit (1);
+
+  id_sequence[0] = 13579;
+  id_sequence[1] = 24680;
+  id_sequence[2] = 97531;
+
+  free (id_sequence);
+
+  /* { dg-warning "" "" { target *-*-* } malloc1 } */
+  /* { dg-message "" "" { target *-*-* } malloc1 } */
+}
+
+void test_2(void)
+{
+  int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc2 } */
+  /* { dg-message "" "" { target *-*-* } malloc2 } */
+}
+
+void test_3(void)
+{
+  int n;
+  scanf("%i", &n);
+  int *ptr = malloc (n + sizeof (int)); /* { dg-line malloc3 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc3 } */
+  /* { dg-message "" "" { target *-*-* } malloc3 } */
+}
+
+void test_4(void)
+{
+  int n;
+  scanf("%i", &n);
+  int m;
+  scanf("%i", &m);
+  int *ptr = malloc ((n + m) * sizeof (int));
+  free (ptr);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
new file mode 100644
index 00000000000..32e14bad6ec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
@@ -0,0 +1,92 @@
+#include <stddef.h>
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Flow warnings */
+
+void *create_buffer(int n)
+{
+  return malloc(n);
+}
+
+void test_1(void) 
+{
+  // FIXME
+  int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } } */
+  free (buf);
+}
+
+void test_2(void) 
+{
+  void *buf = create_buffer(42); /* { dg-message } */
+  int *ibuf = buf; /* { dg-line assign2 } */
+  free (ibuf);
+
+  /* { dg-warning "" "" { target *-*-* } assign2 } */
+  /* { dg-message "" "" { target *-*-* } assign2 } */
+}
+
+void test_3(void)
+{
+  void *buf = malloc(42); /* { dg-message } */
+  if (buf != NULL) /* { dg-message } */
+    {
+      int *ibuf = buf; /* { dg-line assign3 } */
+      free (ibuf);
+    }
+
+  /* { dg-warning "" "" { target *-*-* } assign3 } */
+  /* { dg-message "" "" { target *-*-* } assign3 } */
+}
+
+void test_4(void)
+{
+  int n;
+  scanf("%i", &n);
+
+  int size;
+  if (n == 0)
+    size = 1;
+  else if (n == 1)
+    size = 10;
+  else
+    size = 20;
+
+  int *buf = malloc(size); // Size should be 'unknown' at this point
+  free (buf);
+}
+
+void test_5(void)
+{
+  int n;
+  scanf("%i", &n);
+
+  int size;
+  if (n == 0)
+    size = 2;
+  else
+    size = 10;
+
+  short *buf = malloc(size); // Size should be widened to 2 and 10, both fit
+  free (buf);
+}
+
+
+void test_6(void)
+{
+  int n;
+  scanf("%i", &n);
+
+  int size;
+  if (n == 0)
+    size = 1;
+  else
+    size = 10;
+
+  short *buf = malloc(size); /* { dg-line malloc6 } */
+  free (buf);
+  
+
+  /* { dg-warning "" "" { target *-*-* } malloc6 } */
+  /* { dg-message "" "" { target *-*-* } malloc6 } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
index bd28107d0d7..8fa6a6eb570 100644
--- a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
+++ b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
@@ -1,7 +1,9 @@
+/* { dg-additional-options -Wno-analyzer-allocation-size } */
 /* Adapted from gcc.dg/Wmismatched-dealloc.c.  */
 
 #define A(...) __attribute__ ((malloc (__VA_ARGS__)))
 
+struct FILE;
 typedef struct FILE   FILE;
 typedef __SIZE_TYPE__ size_t;
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
index 908bb28ee50..f9a73c79403 100644
--- a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
+++ b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-Wno-incompatible-pointer-types" } */
+/* { dg-additional-options "-Wno-incompatible-pointer-types -Wno-analyzer-allocation-size" } */
 
 #include <stdlib.h>
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96639.c b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
index 02ca3f084a2..6f365c3cb5d 100644
--- a/gcc/testsuite/gcc.dg/analyzer/pr96639.c
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options -Wno-analyzer-allocation-size } */
+
 void *calloc (__SIZE_TYPE__, __SIZE_TYPE__);
 
 int
-- 
2.36.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] analyzer: allocation size warning
  2022-06-22 14:57           ` Tim Lange
@ 2022-06-22 18:23             ` David Malcolm
  0 siblings, 0 replies; 17+ messages in thread
From: David Malcolm @ 2022-06-22 18:23 UTC (permalink / raw)
  To: Tim Lange; +Cc: gcc

On Wed, 2022-06-22 at 16:57 +0200, Tim Lange wrote:
> The checker reaches region-model.cc#3083 in my patch with the
> impl_region_model_context on the 'after' node of create_buffer() but
> then discards the warning inside impl_region_model_context::warn
> because m_stmt is null. Even if m_stmt were not NULL at the 'after'
> node, my warning would be emitted before the return edge was taken and
> thus be wrongly indented, as shown below:
> /path/to/.c:10:16: warning: Allocated buffer size is not a multiple of
> the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
>    10 |     int *buf = create_buffer(42);
>       |                ^~~~~~~~~~~~~~~~~
>   ‘main’: events 1-2
>     |
>     |    9 |   int main (int argc, char **argv) {
>     |      |       ^~~~
>     |      |       |
>     |      |       (1) entry to ‘main’
>     |   10 |     int *buf = create_buffer(42);
>     |      |                ~~~~~~~~~~~~~~~~~
>     |      |                |
>     |      |                (2) calling ‘create_buffer’ from ‘main’
>     |
>     +--> ‘create_buffer’: events 3-4
>            |
>            |    4 |   void *create_buffer(int n)
>            |      |         ^~~~~~~~~~~~~
>            |      |         |
>            |      |         (3) entry to ‘create_buffer’
>            |    5 |   {
>            |    6 |     return malloc(n);
>            |      |            ~~~~~~~~~
>            |      |            |
>            |      |            (4) allocated 42 bytes here
>            |
>          ‘main’: event 5
>            |
>            |   10 |     int *buf = create_buffer(42);
>            |      |                ^~~~~~~~~~~~~~~~~
>            |      |                |
>            |      |                (5) Assigned to ‘int *’ here
>            |
>            
> The correct warning should be:
> /path/to/.c:10:16: warning: Allocated buffer size is not a multiple of
> the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
>    10 |     int *buf = create_buffer(42);
>       |                ^~~~~~~~~~~~~~~~~
>   ‘main’: events 1-2
>     |
>     |    9 |   int main (int argc, char **argv) {
>     |      |       ^~~~
>     |      |       |
>     |      |       (1) entry to ‘main’
>     |   10 |     int *buf = create_buffer(42);
>     |      |                ~~~~~~~~~~~~~~~~~
>     |      |                |
>     |      |                (2) calling ‘create_buffer’ from ‘main’
>     |
>     +------------> ‘create_buffer’: events 3-4
>                    |
>                    |    4 |   void *create_buffer(int n)
>                    |      |         ^~~~~~~~~~~~~
>                    |      |         |
>                    |      |         (3) entry to ‘create_buffer’
>                    |    5 |   {
>                    |    6 |     return malloc(n);
>                    |      |            ~~~~~~~~~
>                    |      |            |
>                    |      |            (4) allocated 42 bytes here
>                    |
> ‘main’: event 5 <--+
>    |
>    |   10 |     int *buf = create_buffer(42);
>    |      |                ^~~~~~~~~~~~~~~~~
>    |      |                |
>    |      |                (5) Assigned to ‘int *’ here
>    |
> For that, the return edge has to be visited to be part of the
> emission_path. This is currently not the case, as the assignment of the
> <return_value> to the caller lhs is handled inside pop_frame, which is
> transitively called from program_state::on_edge of the 'after' node of
> the callee.
> I tried to defer the set_value(caller lhs, <return_value>) call to the
> 'before' node after the return edge but failed to do so elegantly. My
> last try is in the patch commented out with // FIXME.
> My main problem is that I cannot pop the frame and later get the
> return value easily. Deferring the whole pop_frame to the before node
> breaks the assumptions inside exploded_graph::get_or_create_node.
> 
> I don't know the best way of solving this. Would one solution be to
> attach the return svalue to the return edge and then use it later in
> the PK_BEFORE_SUPERNODE?

The ctxt is created here:

#5  0x00000000012f5856 in ana::program_state::on_edge (this=this@entry=0x7fffffffc8c0, eg=..., enode=enode@entry=0x2d8d970, 
    succ=succ@entry=0x2d0e590, uncertainty=uncertainty@entry=0x7fffffffc990) at ../../src/gcc/analyzer/program-state.cc:1035
1035	  if (!m_region_model->maybe_update_for_edge (*succ,
(gdb) list
1030	  impl_region_model_context ctxt (eg, enode,
1031					  &enode->get_state (),
1032					  this,
1033					  uncertainty, NULL,
1034					  last_stmt);
1035	  if (!m_region_model->maybe_update_for_edge (*succ,
1036						      last_stmt,
1037						      &ctxt, NULL))

I tried another approach: to provide a custom stmt_finder for this
ctxt, which uses the "returning call" stmt for the destination
supernode:

diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc
index 7ad581c7fbd..11554de9484 100644
--- a/gcc/analyzer/program-state.cc
+++ b/gcc/analyzer/program-state.cc
@@ -996,6 +996,29 @@ program_state::get_current_function () const
   return m_region_model->get_current_function ();
 }
 
+// FIXME
+
+class returning_call_stmt_finder : public stmt_finder
+{
+public:
+  returning_call_stmt_finder (const superedge *succ): m_succ (succ) {}
+
+  stmt_finder *clone () const final override
+  {
+    return new returning_call_stmt_finder (m_succ);
+  }
+  const gimple *find_stmt (const exploded_path &) final override
+  {
+    if (m_succ->m_dest)
+      if (m_succ->m_dest->get_returning_call ())
+       return m_succ->m_dest->get_returning_call ();
+    return NULL;
+  }
+
+private:
+  const superedge *m_succ;
+};
+
 /* Determine if following edge SUCC from ENODE is valid within the graph EG
    and update this state accordingly in-place.
 
@@ -1018,6 +1041,8 @@ program_state::on_edge (exploded_graph &eg,
   const program_point &point = enode->get_point ();
   const gimple *last_stmt = point.get_supernode ()->get_last_stmt ();
 
+  returning_call_stmt_finder stmt_finder (succ);
+
   /* For conditionals and switch statements, add the
      relevant conditions (for the specific edge) to new_state;
      skip edges for which the resulting constraints
@@ -1031,7 +1056,7 @@ program_state::on_edge (exploded_graph &eg,
                                  &enode->get_state (),
                                  this,
                                  uncertainty, NULL,
-                                 last_stmt);
+                                 last_stmt, &stmt_finder);
   if (!m_region_model->maybe_update_for_edge (*succ,
                                              last_stmt,
                                              &ctxt, NULL))

Doing so leads to this output:

../../src/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c: In function ‘create_buffer’:
../../src/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c:15:14: warning: Allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   15 |   int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } } */
      |              ^~~~~~~~~~~~~~~~~
  ‘test_1’: events 1-2
    |
    |   12 | void test_1(void)
    |      |      ^~~~~~
    |      |      |
    |      |      (1) entry to ‘test_1’
    |......
    |   15 |   int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } } */
    |      |              ~~~~~~~~~~~~~~~~~
    |      |              |
    |      |              (2) calling ‘create_buffer’ from ‘test_1’
    |
    +--> ‘create_buffer’: events 3-4
           |
           |    7 | void *create_buffer(int n)
           |      |       ^~~~~~~~~~~~~
           |      |       |
           |      |       (3) entry to ‘create_buffer’
           |    8 | {
           |    9 |   return malloc(n);
           |      |          ~~~~~~~~~
           |      |          |
           |      |          (4) allocated (long unsigned int)42 here
           |
         ‘test_1’: event 5
           |
           |   15 |   int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } } */
           |      |              ^~~~~~~~~~~~~~~~~
           |      |              |
           |      |              (5) Assigned to ‘int *’ here
           |

which fixes the stmt and the enclosing function decl for event 5 (the
assignment to the "int *buf"), but annoyingly the stack depth
information is wrong; I think the saved diagnostic is being associated
with the existing exploded_node (in create_buffer), whereas I want it
to use the supernode for test_1, which doesn't yet have an exploded
node when pop_frame is called.  I have various ideas for tackling this:
- have two contexts for pop_frame: one in the old frame, the other in
the new frame (for the caller)
- generalize stmt_finder so it can also update the supernode to use
- rework pop_frame (I've had to do this before, having run into issues
like this).

I think it's best to keep this issue as an expected failure, and file a
bug about it, so that we can tackle it by itself, and not block you
from making further progress on this patch.

Various review comments inline below...


Signed-off-by: Tim Lange <mail@tim-lange.me>
---
 gcc/analyzer/analyzer.opt                     |   4 +
 gcc/analyzer/checker-path.cc                  |  12 +-
 gcc/analyzer/checker-path.h                   |   2 +-
 gcc/analyzer/engine.cc                        |  12 +
 gcc/analyzer/pending-diagnostic.h             |  21 ++
 gcc/analyzer/region-model.cc                  | 322 ++++++++++++++++++
 gcc/analyzer/region-model.h                   |   4 +
 .../gcc.dg/analyzer/allocation-size-1.c       |  63 ++++
 .../gcc.dg/analyzer/allocation-size-2.c       |  44 +++
 .../gcc.dg/analyzer/allocation-size-3.c       |  48 +++
 .../gcc.dg/analyzer/allocation-size-4.c       |  92 +++++
 gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c |   2 +
 gcc/testsuite/gcc.dg/analyzer/malloc-4.c      |   2 +-
 gcc/testsuite/gcc.dg/analyzer/pr96639.c       |   2 +
 14 files changed, 627 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 4aea52d3a87..f213989e0bb 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -78,6 +78,10 @@ Wanalyzer-malloc-leak
 Common Var(warn_analyzer_malloc_leak) Init(1) Warning
 Warn about code paths in which a heap-allocated pointer leaks.
 
+Wanalyzer-allocation-size
+Common Var(warn_analyzer_allocation_size) Init(1) Warning
+Warn about code paths in which a buffer is assigned to a incompatible type.
+


Any time we add a new option to analyzer.opt we're going to need to add
corresponding documentation to gcc/doc/invoke.texi.  Grep for some of
the existing analyzer warnings to see examples.


 Wanalyzer-mismatching-deallocation
 Common Var(warn_analyzer_mismatching_deallocation) Init(1) Warning
 Warn about code paths in which the wrong deallocation function is called.


[...snip...]

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 6b49719d521..acb8bd1bfca 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-operands.h"
 #include "ssa-iterators.h"
 #include "calls.h"
+#include "print-tree.h"
 
 #if ENABLE_ANALYZER
 
@@ -653,6 +654,66 @@ private:
   tree m_count_cst;
 };
 
+/* Concrete subclass for casts of pointers that lead to trailing bytes.  */
+
+class dubious_allocation_size
+: public pending_diagnostic_subclass<dubious_allocation_size>
+{
+public:
+  dubious_allocation_size (const region *lhs, const region *rhs, 
+                           const svalue *capacity)
+  : m_lhs(lhs), m_rhs(rhs), m_capacity(capacity) {}
+
+  const char *get_kind () const final override 
+  { 
+    return "dubious_allocation_size"; 
+  }
+
+  bool operator== (const dubious_allocation_size &other) const
+  {
+    return m_lhs == other.m_lhs && m_rhs == other.m_rhs;;
+  }
+
+  int get_controlling_option () const final override
+  {
+    return OPT_Wanalyzer_allocation_size;
+  }
+
+  bool emit (rich_location *rich_loc) final override
+  {
+    diagnostic_metadata m;
+    m.add_cwe (131);
+    return warning_meta (rich_loc, m, get_controlling_option (),
+              "Allocated buffer size is not a multiple of the pointee's size");

Style nit: our diagnostic messages don't start with a capital letter.

I think this would benefit from a note, via "inform", stating the
sizeof() of the pointee; something like:

  bool warned = warning_meta (rich_loc, m, get_controlling_option (),
                              "allocated buffer size is not a"
                              " multiple of the pointee's size");
  if (warned)
    inform (rich_loc->get_loc (), "%<sizeof(%E)%> is %qE",
                                    etc, etc);
  return warned;

or somesuch.


+  }
+
+  label_text 
+  describe_region_creation_event (const evdesc::region_creation &ev) final override
+  {
+    // TODO: better way to print the capacity
+    return ev.formatted_print ("allocated %s here", 

Maybe: "allocated here (%s bytes)" ?

+                                          m_capacity->get_desc(true).m_buffer);

Annoyingly, label_text doesn't have an automatically working
destructor, due to us (until recently) only being able to use C++98.  
So as written, this leaks memory.  Now that we can use C++11, maybe we
should fix label_text to have a dtor, move assignment, etc, but it's
probably simpler in the short term to simply fix the leak.

+  }
+
+  label_text describe_final_event (const evdesc::final_event &ev) final override
+  {
+    return ev.formatted_print ("Assigned to %qT here", m_lhs->get_type ());

Style nit: make initial letter of message lower-case.

+  }
+
+  void mark_interesting_stuff (interesting_t *interest) final override
+  {
+    if (m_lhs)
+      interest->add_region_creation (m_lhs);
+    if (m_rhs)
+      interest->add_region_creation (m_rhs);
+  }
+
+private:
+  const region *m_lhs;
+  const region *m_rhs;
+  const svalue *m_capacity;
+};
+
 /* If ASSIGN is a stmt that can be modelled via
      set_value (lhs_reg, SVALUE, CTXT)
    for some SVALUE, get the SVALUE.
@@ -2799,6 +2860,241 @@ region_model::check_region_for_read (const region *src_reg,
   check_region_access (src_reg, DIR_READ, ctxt);
 }
 
+/* Returns the trailing bytes on dubious allocation sizes.  */
+
+static unsigned HOST_WIDE_INT 
+capacity_compatible_with_type (tree cst, tree pointee_size_tree)
+{
+  unsigned HOST_WIDE_INT pointee_size = TREE_INT_CST_LOW (pointee_size_tree);
+  if (pointee_size == 0)
+    return 0;
+  unsigned HOST_WIDE_INT alloc_size = TREE_INT_CST_LOW (cst);
+
+  return alloc_size % pointee_size;
+}
+
+/* Visits svalues and checks whether the 
+   size_cst is a operand of the svalue.  */
+
+class size_visitor : public visitor
+{
+public:
+  size_visitor(tree size_cst, const svalue *sval, constraint_manager *cm)
+  : m_size_cst(size_cst), m_sval(sval), m_cm(cm)
+  {
+    sval->accept(this);
+  }
+
+  bool get_result()
+  {
+    /* The result_set gradually builts from atomtic nodes upwards. If a node is

Typo: atomtic -> atomic

+       in the result_set, itself or one/all of its children have an operand that
+       is a multiple of the size_cst. If the root is inside, the given sval
+       is valid aka a multiple of the size_cst.*/
+    return result_set.contains(m_sval);
+  }
+
+  void 
+  visit_constant_svalue (const constant_svalue *sval) final override
+  {
+    unsigned HOST_WIDE_INT sval_int
+         = TREE_INT_CST_LOW (sval->get_constant ());
+    unsigned HOST_WIDE_INT size_cst_int = TREE_INT_CST_LOW (m_size_cst);
+    if (size_cst_int == 0 || sval_int % size_cst_int == 0)
+      result_set.add (sval);
+  }
+
+  void 
+  visit_unknown_svalue (const unknown_svalue *sval ATTRIBUTE_UNUSED)
+    final override
+  {
+    result_set.add (sval);
+  }
+
+  void 
+  visit_poisoned_svalue (const poisoned_svalue *sval ATTRIBUTE_UNUSED)
+    final override
+  {
+    result_set.add (sval);
+  }
+  
+  void visit_unaryop_svalue (const unaryop_svalue *sval) 
+  {
+    const svalue *arg = sval->get_arg ();
+    arg->accept (this);
+    if (result_set.contains (arg))
+       result_set.add (sval);
+  }
+
+  void visit_binop_svalue (const binop_svalue *sval) final override
+  {
+    const svalue *arg0 = sval->get_arg0 ();
+    const svalue *arg1 = sval->get_arg1 ();
+
+    arg0->accept (this);
+    arg1->accept (this);
+    if (sval->get_op () == MULT_EXPR)
+      {
+       if (result_set.contains (arg0) || result_set.contains (arg1))
+         result_set.add (sval);
+      }
+    else
+      {
+       if (result_set.contains (arg0) && result_set.contains (arg1))
+         result_set.add (sval);
+      }
+  }
+
+  void visit_repeated_svalue (const repeated_svalue *sval) 
+  {
+    sval->get_inner_svalue ()->accept(this);
+    if (result_set.contains (sval->get_inner_svalue ()))
+      result_set.add (sval);
+  }
+
+  void visit_unmergeable_svalue (const unmergeable_svalue *sval) final override
+  {
+    sval->get_arg ()->accept (this);
+    if (result_set.contains (sval->get_arg ()))
+      result_set.add (sval);
+  }
+
+  void visit_widening_svalue (const widening_svalue *sval) final override
+  {
+    const svalue *base = sval->get_base_svalue ();
+    const svalue *iter = sval->get_iter_svalue ();
+
+    base->accept(this);
+    iter->accept(this);
+    if (result_set.contains (base) && result_set.contains (iter))
+      result_set.add (sval);
+  }
+
+  void visit_conjured_svalue (const conjured_svalue *sval ATTRIBUTE_UNUSED)
+    final override
+  {
+    if (m_cm->get_equiv_class_by_svalue (sval, NULL))
+      result_set.add (sval);
+  }
+
+  void visit_asm_output_svalue (const asm_output_svalue *sval ATTRIBUTE_UNUSED)
+    final override
+  {
+    // TODO: Should we do something else than assume it could be correct
+    result_set.add (sval);

I think we just have to assume it is.

+  }
+
+  void visit_const_fn_result_svalue (const const_fn_result_svalue 
+                                     *sval ATTRIBUTE_UNUSED) final override
+  {
+    // TODO: Should we do something else than assume it could be correct
+    result_set.add (sval);

Probably here as well.

+  }
+
+private:
+  tree m_size_cst;
+  const svalue *m_sval;
+  constraint_manager *m_cm;
+  svalue_set result_set; /* Used as a mapping of svalue*->bool.  */
+};
+
+/* Returns true if there is a constant tree with 
+   the same constant value inside the sval.  */
+
+static bool
+const_operand_in_sval_p (tree type_size_cst, const svalue *sval,
+                         constraint_manager *cm)
+{
+  size_visitor v(type_size_cst, sval, cm);
+  // sval->accept(&v);
+  return v.get_result ();
+}
+
+/* Special handling for structs with "inheritance" or that hold an unbounded
+     type. Those will be skipped to prevent false positives.  */
+
+static bool
+struct_or_union_with_inheritance_p (tree maybe_struct)
+{
+  if (RECORD_OR_UNION_TYPE_P (maybe_struct))
+    {
+      tree iter = TYPE_FIELDS (maybe_struct);
+      if (iter == NULL_TREE)
+        return false;
+      if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (iter)))
+        return true;
+
+      tree last_field;
+      while (iter != NULL_TREE)
+        {
+          last_field = iter;
+          iter = DECL_CHAIN (iter);
+        }
+
+      if (last_field != NULL_TREE 
+          && COMPLETE_OR_UNBOUND_ARRAY_TYPE_P (TREE_TYPE (last_field)))
+        return true;
+    }
+  return false;
+}
+
+void
+region_model::check_region_size (const region *lhs_reg, const svalue *rhs_sval,
+                                                  region_model_context *ctxt) const
+{

Please add a comment immediately before this function summarizing its
intent.

+  if (!ctxt)
+    return;
+  
+  const region_svalue *reg_sval = dyn_cast <const region_svalue *> (rhs_sval);
+  if (!reg_sval)
+    return;
+
+  tree pointer_type = lhs_reg->get_type ();
+  if (pointer_type == NULL_TREE || !POINTER_TYPE_P (pointer_type))
+    return;
+
+  tree pointee_type = TREE_TYPE (pointer_type);
+  /* void * is always compatible and make sure that the pointee_type actually
+     has a size, or else size_in_bytes might fail.  */
+  if (pointee_type == NULL_TREE || VOID_TYPE_P (pointee_type) 
+      || TYPE_SIZE_UNIT (pointee_type) == NULL_TREE)
+    return;
+  if (struct_or_union_with_inheritance_p (pointee_type))
+    return;
+
+  tree pointee_size_tree = size_in_bytes(pointee_type);
+  /* The size might be unknown e.g. being a array with n elements
+     or casting to char * never has any trailing bytes.  */
+  if (TREE_CODE (pointee_size_tree) != INTEGER_CST
+      || TREE_INT_CST_LOW (pointee_size_tree) == 1)
+    return;
+
+  const svalue *capacity = get_capacity (reg_sval->get_pointee ());
+  switch (capacity->get_kind ())
+    {
+    case svalue_kind::SK_CONSTANT:
+      {
+       const constant_svalue *cap_sval = capacity->dyn_cast_constant_svalue ();

You can use:
  as_a <const constant_svalue *> (capacity)
here since we're within the SK_CONSTANT case, avoiding an extra vfunc
call.


+       tree cap = cap_sval->get_constant ();
+       unsigned HOST_WIDE_INT size_diff
+         = capacity_compatible_with_type (cap, pointee_size_tree);
+       if (size_diff != 0)
+         {
+           ctxt->warn (new dubious_allocation_size (lhs_reg, reg_sval->get_pointee (), capacity));

Nit: some overlong lines here; please wrap to avoid going over 80
columns.

+         }
+      }
+      break;
+    default:
+      {
+       if (!const_operand_in_sval_p (pointee_size_tree, capacity, m_constraints))
+         {
+           ctxt->warn (new dubious_allocation_size (lhs_reg, reg_sval->get_pointee (), capacity));

Another overlong line.

+         }
+      }
+      break;
+    }
+}
+
 /* Set the value of the region given by LHS_REG to the value given
    by RHS_SVAL.
    Use CTXT to report any warnings associated with writing to LHS_REG.  */
@@ -2810,6 +3106,8 @@ region_model::set_value (const region *lhs_reg, const svalue *rhs_sval,
   gcc_assert (lhs_reg);
   gcc_assert (rhs_sval);
 
+  check_region_size(lhs_reg, rhs_sval, ctxt);
+
   check_region_for_write (lhs_reg, ctxt);
 
   m_store.set_value (m_mgr->get_store_manager(), lhs_reg, rhs_sval,

[...snip...]

diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
new file mode 100644
index 00000000000..cb3df5516e7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
@@ -0,0 +1,63 @@
+#include <stdlib.h>
+
+/* Tests with constant buffer sizes.  */
+
+void test_1 (void)
+{
+  short *ptr = malloc (21 * sizeof(short));
+  free (ptr);
+}
+
+void test_2 (void)
+{
+  int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc2 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc2 } */
+  /* { dg-message "" "" { target *-*-* } malloc2 } */
+}
+
+void test_3 (void)
+{
+  void *ptr = malloc (21 * sizeof (short));
+  short *sptr = (short *)ptr;
+  free (sptr);
+}
+
+void test_4 (void)
+{
+  void *ptr = malloc (21 * sizeof (short)); /* { dg-message } */
+  int *iptr = (int *)ptr; /* { dg-line assign } */
+  free (iptr);
+
+  /* { dg-warning "" "" { target *-*-* } assign } */
+  /* { dg-message "" "" { target *-*-* } assign } */
+}
+
+struct s {
+  int i;
+};
+
+void test_5 (void)
+{
+  struct s *ptr = malloc (5 * sizeof (struct s));
+  free (ptr);
+}
+
+void test_6 (void)
+{
+  long *ptr = malloc (5 * sizeof (struct s));  /* { dg-line malloc6 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc6 } */
+  /* { dg-message "" "" { target *-*-* } malloc6 } */
+}
+
+void test_7 (void)
+{
+  char buf[2];
+  int *ptr = (int *)buf; /* { dg-line malloc7 } */
+
+  /* { dg-warning "" "" { target *-*-* } malloc7 } */
+  /* { dg-message "" "" { target *-*-* } malloc7 } */
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
new file mode 100644
index 00000000000..a619a786a4e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
@@ -0,0 +1,44 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Tests with symbolic buffer sizes.  */
+
+void test_1 (void)
+{
+  int n;
+  scanf("%i", &n);
+  short *ptr = malloc (n * sizeof(short));
+  free (ptr);
+}
+
+void test_2 (void)
+{
+  int n;
+  scanf("%i", &n);
+  int *ptr = malloc (n * sizeof (short)); /* { dg-line malloc } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc } */
+  /* { dg-message "" "" { target *-*-* } malloc } */
+}
+
+void test_3 (void)
+{
+  int n;
+  scanf("%i", &n);
+  void *ptr = malloc (n * sizeof (short));
+  short *sptr = (short *)ptr;
+  free (sptr);
+}
+
+void test_4 (void)
+{
+  int n;
+  scanf("%i", &n);
+  void *ptr = malloc (n * sizeof (short)); /* { dg-message } */
+  int *iptr = (int *)ptr; /* { dg-line assign } */
+  free (iptr);
+
+  /* { dg-warning "" "" { target *-*-* } assign } */
+  /* { dg-message "" "" { target *-*-* } assign } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
new file mode 100644
index 00000000000..dafc0e73c63
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
@@ -0,0 +1,48 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* CWE-131 example 5 */
+void test_1(void) 
+{
+  int *id_sequence = (int *) malloc (3); /* { dg-line malloc1 } */
+  if (id_sequence == NULL) exit (1);
+
+  id_sequence[0] = 13579;
+  id_sequence[1] = 24680;
+  id_sequence[2] = 97531;
+
+  free (id_sequence);
+
+  /* { dg-warning "" "" { target *-*-* } malloc1 } */
+  /* { dg-message "" "" { target *-*-* } malloc1 } */
+}
+
+void test_2(void)
+{
+  int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc2 } */
+  /* { dg-message "" "" { target *-*-* } malloc2 } */
+}
+
+void test_3(void)
+{
+  int n;
+  scanf("%i", &n);
+  int *ptr = malloc (n + sizeof (int)); /* { dg-line malloc3 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc3 } */
+  /* { dg-message "" "" { target *-*-* } malloc3 } */
+}
+
+void test_4(void)
+{
+  int n;
+  scanf("%i", &n);
+  int m;
+  scanf("%i", &m);
+  int *ptr = malloc ((n + m) * sizeof (int));
+  free (ptr);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
new file mode 100644
index 00000000000..32e14bad6ec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
@@ -0,0 +1,92 @@
+#include <stddef.h>
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Flow warnings */
+
+void *create_buffer(int n)
+{
+  return malloc(n);
+}
+
+void test_1(void) 
+{
+  // FIXME
+  int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } } */
+  free (buf);
+}
+
+void test_2(void) 
+{
+  void *buf = create_buffer(42); /* { dg-message } */
+  int *ibuf = buf; /* { dg-line assign2 } */
+  free (ibuf);
+
+  /* { dg-warning "" "" { target *-*-* } assign2 } */
+  /* { dg-message "" "" { target *-*-* } assign2 } */
+}
+
+void test_3(void)
+{
+  void *buf = malloc(42); /* { dg-message } */
+  if (buf != NULL) /* { dg-message } */
+    {
+      int *ibuf = buf; /* { dg-line assign3 } */
+      free (ibuf);
+    }
+
+  /* { dg-warning "" "" { target *-*-* } assign3 } */
+  /* { dg-message "" "" { target *-*-* } assign3 } */
+}
+
+void test_4(void)
+{
+  int n;
+  scanf("%i", &n);
+
+  int size;
+  if (n == 0)
+    size = 1;
+  else if (n == 1)
+    size = 10;
+  else
+    size = 20;
+
+  int *buf = malloc(size); // Size should be 'unknown' at this point
+  free (buf);
+}
+
+void test_5(void)
+{
+  int n;
+  scanf("%i", &n);
+
+  int size;
+  if (n == 0)
+    size = 2;
+  else
+    size = 10;
+
+  short *buf = malloc(size); // Size should be widened to 2 and 10, both fit
+  free (buf);
+}
+
+
+void test_6(void)
+{
+  int n;
+  scanf("%i", &n);
+
+  int size;
+  if (n == 0)
+    size = 1;
+  else
+    size = 10;
+
+  short *buf = malloc(size); /* { dg-line malloc6 } */
+  free (buf);
+  
+
+  /* { dg-warning "" "" { target *-*-* } malloc6 } */
+  /* { dg-message "" "" { target *-*-* } malloc6 } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
index bd28107d0d7..8fa6a6eb570 100644
--- a/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
+++ b/gcc/testsuite/gcc.dg/analyzer/attr-malloc-6.c
@@ -1,7 +1,9 @@
+/* { dg-additional-options -Wno-analyzer-allocation-size } */
 /* Adapted from gcc.dg/Wmismatched-dealloc.c.  */
 
 #define A(...) __attribute__ ((malloc (__VA_ARGS__)))
 
+struct FILE;
 typedef struct FILE   FILE;
 typedef __SIZE_TYPE__ size_t;
 

Hopefully this change isn't needed anymore.

diff --git a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
index 908bb28ee50..f9a73c79403 100644
--- a/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
+++ b/gcc/testsuite/gcc.dg/analyzer/malloc-4.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-Wno-incompatible-pointer-types" } */
+/* { dg-additional-options "-Wno-incompatible-pointer-types -Wno-analyzer-allocation-size" } */
 
 #include <stdlib.h>

Why is this change needed?  Is this another left-over change from
fixing that stray error?

 
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96639.c b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
index 02ca3f084a2..6f365c3cb5d 100644
--- a/gcc/testsuite/gcc.dg/analyzer/pr96639.c
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options -Wno-analyzer-allocation-size } */
+
 void *calloc (__SIZE_TYPE__, __SIZE_TYPE__);


I added this testcase in 42c5ae5d7f0ad89b75d93c497fe44b6c66da7e76 to
fix a crash due to a NULL type.

Rather than add -Wno-analyzer-allocation-size, please fix the size
passed to the calloc call.

Thanks; hope this is constructive
Dave



* [PATCH v2] analyzer: add allocation size checker
  2022-06-17 15:54 [RFC] analyzer: allocation size warning Tim Lange
                   ` (2 preceding siblings ...)
  2022-06-17 18:34 ` [RFC] analyzer: add " Tim Lange
@ 2022-06-29 15:39 ` Tim Lange
  2022-06-29 17:39   ` David Malcolm
  2022-06-30 22:11 ` [PATCH v3] analyzer: add allocation size checker [PR105900] Tim Lange
  4 siblings, 1 reply; 17+ messages in thread
From: Tim Lange @ 2022-06-29 15:39 UTC (permalink / raw)
  To: dmalcolm; +Cc: gcc, Tim Lange

Hi,

I've addressed most of the points from the review.
* The allocation size warning is now emitted whenever
region_model::get_capacity returns a usable capacity, i.e. also for
statically allocated regions.
* I added a new virtual function to the pending_diagnostic class, so that it
is possible to emit a custom region creation description.
* The test cases should have better coverage now.
* Structs are now handled conservatively.
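
To make the first and last points concrete: for a constant capacity, whether
it comes from malloc or from a statically allocated buffer, the check in this
version boils down to roughly the following. This is only a stand-alone sketch
with made-up names (capacity_ok and its parameters are not in the patch); the
real helper works on trees rather than plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-alone model of the constant-capacity check.
   CAPACITY is the buffer size in bytes, POINTEE_SIZE is the size of the
   pointed-to type, IS_STRUCT says whether that type is a struct or union.  */
static bool
capacity_ok (size_t capacity, size_t pointee_size, bool is_struct)
{
  /* As in the patch, a zero-sized pointee is never treated as compatible.  */
  if (pointee_size == 0)
    return false;

  /* Conservative struct handling: only require room for one whole struct,
     since trailing bytes are common with struct-based idioms.  */
  if (is_struct)
    return capacity >= pointee_size;

  /* Everything else must be an exact multiple of the pointee's size.  */
  return capacity % pointee_size == 0;
}
```

Assuming a 4-byte int, capacity_ok (14, 4, false) is false, so assigning such a
buffer to an int * would be flagged, while a buffer that holds at least one
whole struct passes.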

The warning now looks like this:
/path/to/main.c:9:8: warning: allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
    9 |   int *iptr = ptr;
      |        ^~~~
  ‘main’: events 1-2
    |
    |    8 |   void *ptr = malloc((long unsigned int)n * sizeof(short));
    |      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
    |      |               |
    |      |               (1) allocated ‘(long unsigned int)n * 2’ bytes here
    |    9 |   int *iptr = ptr;
    |      |        ~~~~    
    |      |        |
    |      |        (2) assigned to ‘int *’ here; ‘sizeof(int)’ is ‘4’
    |

/path/to/main.c:15:15: warning: allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   15 |   int *ptrw = malloc (sizeof (short));
      |               ^~~~~~~~~~~~~~~~~~~~~~~
  ‘main’: events 1-2
    |
    |   15 |   int *ptrw = malloc (sizeof (short));
    |      |               ^~~~~~~~~~~~~~~~~~~~~~~
    |      |               |
    |      |               (1) allocated ‘2’ bytes here
    |      |               (2) assigned to ‘int *’ here; ‘sizeof (int)’ is ‘4’
    |
The only thing I couldn't address was moving the second event toward the lhs or
the assignment token here. I tracked it down to get_stmt_location, where it
seems that the location of the rhs is used as the location of the statement. Is
there any way to get (2) to be focused on the lhs?

Otherwise, with the patch applied, coreutils, openssh, curl and httpd compiled
without any false positives (but none of them contained a bug found by the
checker either). The `make check-gcc RUNTESTFLAGS="analyzer.exp"` tests pass;
since I only just finished working on the event splitting, the full regression
tests have yet to be run.

- Tim


This patch adds a checker that warns about code paths in which a buffer is
assigned to an incompatible type, i.e. when the allocated buffer size is not a
multiple of the pointee's size.
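
For sizes that are not compile-time constants, the check asks whether the size
expression could be a multiple of the pointee's size. A rough stand-alone model
of that idea follows. The expr type and may_be_multiple_of are invented for
illustration and cover only a few expression kinds, whereas the patch itself
visits svalues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-simplified size expression.  */
enum expr_kind
{
  EXPR_CONST,    /* a literal such as sizeof (short) after folding */
  EXPR_SYMBOL,   /* a plain variable about which nothing is known */
  EXPR_UNKNOWN,  /* a value the model has given up on */
  EXPR_MULT,     /* lhs * rhs */
  EXPR_PLUS      /* lhs + rhs */
};

struct expr
{
  enum expr_kind kind;
  unsigned long value;           /* used by EXPR_CONST */
  const struct expr *lhs, *rhs;  /* used by EXPR_MULT and EXPR_PLUS */
};

/* Return true if E might be a multiple of SIZE.  */
static bool
may_be_multiple_of (const struct expr *e, unsigned long size)
{
  assert (e != NULL);
  switch (e->kind)
    {
    case EXPR_CONST:
      return size == 0 || e->value % size == 0;
    case EXPR_SYMBOL:
      return false;  /* not known to be a multiple */
    case EXPR_UNKNOWN:
      return true;   /* stay quiet rather than risk a false positive */
    case EXPR_MULT:
      /* One suitable factor is enough for a product.  */
      return (may_be_multiple_of (e->lhs, size)
	      || may_be_multiple_of (e->rhs, size));
    case EXPR_PLUS:
      /* A sum needs both operands to qualify.  */
      return (may_be_multiple_of (e->lhs, size)
	      && may_be_multiple_of (e->rhs, size));
    }
  return true;
}
```

Under this model, n * 2 is rejected for a 4-byte pointee but accepted for a
2-byte one, and n + 4 is rejected for a 4-byte pointee because nothing is known
about n.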

gcc/analyzer/ChangeLog:

        * analyzer.opt: Added Wanalyzer-allocation-size.
        * checker-path.cc (region_creation_event::get_desc): Added call to new
        virtual function pending_diagnostic::describe_region_creation_event.
        * checker-path.h: Added region_creation_event::get_desc.
        * diagnostic-manager.cc (diagnostic_manager::add_event_on_final_node):
        New function.
        * diagnostic-manager.h:
        Added diagnostic_manager::add_event_on_final_node.
        * pending-diagnostic.h (struct region_creation): New event_desc struct.
        (pending_diagnostic::describe_region_creation_event): Added virtual
        function to overwrite description of a region creation.
        * region-model.cc (class dubious_allocation_size): New class.
        (capacity_compatible_with_type): New helper function.
        (class size_visitor): New class.
        (struct_or_union_with_inheritance_p): New helper function.
        (is_any_cast_p): New helper function.
        (region_model::check_region_size): New function.
        (region_model::set_value): Added call to
        region_model::check_region_size.
        * region-model.h (class region_model): New function check_region_size.
        * svalue.cc (region_svalue::accept): Changed to post-order traversal.
        (initial_svalue::accept): Likewise.
        (unaryop_svalue::accept): Likewise.
        (binop_svalue::accept): Likewise.
        (sub_svalue::accept): Likewise.
        (repeated_svalue::accept): Likewise.
        (bits_within_svalue::accept): Likewise.
        (widening_svalue::accept): Likewise.
        (unmergeable_svalue::accept): Likewise.
        (compound_svalue::accept): Likewise.
        (conjured_svalue::accept): Likewise.
        (asm_output_svalue::accept): Likewise.
        (const_fn_result_svalue::accept): Likewise.

gcc/ChangeLog:

        * doc/invoke.texi: Added Wanalyzer-allocation-size.

gcc/testsuite/ChangeLog:

        * gcc.dg/analyzer/pr96639.c: Changed buffer size to omit warning.
        * gcc.dg/analyzer/allocation-size-1.c: New test.
        * gcc.dg/analyzer/allocation-size-2.c: New test.
        * gcc.dg/analyzer/allocation-size-3.c: New test.
        * gcc.dg/analyzer/allocation-size-4.c: New test.
        * gcc.dg/analyzer/allocation-size-5.c: New test.

Signed-off-by: Tim Lange <mail@tim-lange.me>
---
 gcc/analyzer/analyzer.opt                     |   4 +
 gcc/analyzer/checker-path.cc                  |  11 +-
 gcc/analyzer/checker-path.h                   |   2 +-
 gcc/analyzer/diagnostic-manager.cc            |  61 ++++
 gcc/analyzer/diagnostic-manager.h             |   4 +
 gcc/analyzer/pending-diagnostic.h             |  20 +
 gcc/analyzer/region-model.cc                  | 344 ++++++++++++++++++
 gcc/analyzer/region-model.h                   |   2 +
 gcc/analyzer/svalue.cc                        |  26 +-
 gcc/doc/invoke.texi                           |  13 +
 .../gcc.dg/analyzer/allocation-size-1.c       | 102 ++++++
 .../gcc.dg/analyzer/allocation-size-2.c       | 125 +++++++
 .../gcc.dg/analyzer/allocation-size-3.c       |  48 +++
 .../gcc.dg/analyzer/allocation-size-4.c       |  58 +++
 .../gcc.dg/analyzer/allocation-size-5.c       |  40 ++
 gcc/testsuite/gcc.dg/analyzer/pr96639.c       |   2 +-
 16 files changed, 846 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 4aea52d3a87..912def2faf2 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -54,6 +54,10 @@ The minimum number of supernodes within a function for the analyzer to consider
 Common Joined UInteger Var(param_analyzer_max_enodes_for_full_dump) Init(200) Param
 The maximum depth of exploded nodes that should appear in a dot dump before switching to a less verbose format.
 
+Wanalyzer-allocation-size
+Common Var(warn_analyzer_allocation_size) Init(1) Warning
+Warn about code paths in which a buffer is assigned to an incompatible type.
+
 Wanalyzer-double-fclose
 Common Var(warn_analyzer_double_fclose) Init(1) Warning
 Warn about code paths in which a stdio FILE can be closed more than once.
diff --git a/gcc/analyzer/checker-path.cc b/gcc/analyzer/checker-path.cc
index 0133dc94137..953e192cd55 100644
--- a/gcc/analyzer/checker-path.cc
+++ b/gcc/analyzer/checker-path.cc
@@ -302,8 +302,17 @@ region_creation_event::region_creation_event (const region *reg,
    region_creation_event.  */
 
 label_text
-region_creation_event::get_desc (bool) const
+region_creation_event::get_desc (bool can_colorize) const
 {
+  if (m_pending_diagnostic)
+    {
+      label_text custom_desc
+	    = m_pending_diagnostic->describe_region_creation_event
+		(evdesc::region_creation (can_colorize, m_reg));
+      if (custom_desc.m_buffer)
+	return custom_desc;
+    }
+
   switch (m_reg->get_memory_space ())
     {
     default:
diff --git a/gcc/analyzer/checker-path.h b/gcc/analyzer/checker-path.h
index 24decf5ce3d..8e48d8a07ab 100644
--- a/gcc/analyzer/checker-path.h
+++ b/gcc/analyzer/checker-path.h
@@ -219,7 +219,7 @@ public:
   region_creation_event (const region *reg,
 			 location_t loc, tree fndecl, int depth);
 
-  label_text get_desc (bool) const final override;
+  label_text get_desc (bool can_colorize) const final override;
 
 private:
   const region *m_reg;
diff --git a/gcc/analyzer/diagnostic-manager.cc b/gcc/analyzer/diagnostic-manager.cc
index 8ea1f61776e..4adfda1af65 100644
--- a/gcc/analyzer/diagnostic-manager.cc
+++ b/gcc/analyzer/diagnostic-manager.cc
@@ -1476,6 +1476,67 @@ diagnostic_manager::build_emission_path (const path_builder &pb,
       const exploded_edge *eedge = epath.m_edges[i];
       add_events_for_eedge (pb, *eedge, emission_path, &interest);
     }
+  add_event_on_final_node (epath.get_final_enode (), emission_path, &interest);
+}
+
+/* Emit a region_creation_event when requested on the last statement in
+   the path.
+
+   If a region_creation_event should be emitted on the last statement of the
+   path, we need to peek to the successors to get whether the final enode
+   created a region.
+*/
+
+void
+diagnostic_manager::add_event_on_final_node (const exploded_node *final_enode,
+					     checker_path *emission_path,
+					     interesting_t *interest) const
+{
+  const program_point &src_point = final_enode->get_point ();
+  const int src_stack_depth = src_point.get_stack_depth ();
+  const program_state &src_state = final_enode->get_state ();
+  const region_model *src_model = src_state.m_region_model;
+
+  unsigned j;
+  exploded_edge *e;
+  FOR_EACH_VEC_ELT (final_enode->m_succs, j, e)
+  {
+    exploded_node *dst = e->m_dest;
+    const program_state &dst_state = dst->get_state ();
+    const region_model *dst_model = dst_state.m_region_model;
+    if (src_model->get_dynamic_extents ()
+	!= dst_model->get_dynamic_extents ())
+      {
+	unsigned i;
+	const region *reg;
+	bool emitted = false;
+	FOR_EACH_VEC_ELT (interest->m_region_creation, i, reg)
+	  {
+	    const region *base_reg = reg->get_base_region ();
+	    const svalue *old_extents
+	= src_model->get_dynamic_extents (base_reg);
+	    const svalue *new_extents
+	= dst_model->get_dynamic_extents (base_reg);
+	    if (old_extents == NULL && new_extents != NULL)
+	      switch (base_reg->get_kind ())
+		{
+		default:
+		  break;
+		case RK_HEAP_ALLOCATED:
+		case RK_ALLOCA:
+		  emission_path->add_region_creation_event
+		    (reg,
+		    src_point.get_location (),
+		    src_point.get_fndecl (),
+		    src_stack_depth);
+		  emitted = true;
+		  break;
+		}
+	  }
+	if (emitted)
+	  break;
+      }
+  }
 }
 
 /* Subclass of state_change_visitor that creates state_change_event
diff --git a/gcc/analyzer/diagnostic-manager.h b/gcc/analyzer/diagnostic-manager.h
index b9bb7c8c254..266eed8f9cb 100644
--- a/gcc/analyzer/diagnostic-manager.h
+++ b/gcc/analyzer/diagnostic-manager.h
@@ -149,6 +149,10 @@ private:
 			    const exploded_path &epath,
 			    checker_path *emission_path) const;
 
+  void add_event_on_final_node (const exploded_node *final_enode,
+				checker_path *emission_path,
+				interesting_t *interest) const;
+
   void add_events_for_eedge (const path_builder &pb,
 			     const exploded_edge &eedge,
 			     checker_path *emission_path,
diff --git a/gcc/analyzer/pending-diagnostic.h b/gcc/analyzer/pending-diagnostic.h
index 9e1c656bf0a..4ea469e1879 100644
--- a/gcc/analyzer/pending-diagnostic.h
+++ b/gcc/analyzer/pending-diagnostic.h
@@ -58,6 +58,17 @@ struct event_desc
   bool m_colorize;
 };
 
+/* For use by pending_diagnostic::describe_region_creation_event.  */
+
+struct region_creation : public event_desc
+{
+  region_creation (bool colorize, const region *reg)
+  : event_desc (colorize), m_reg (reg)
+  {}
+
+  const region *m_reg;
+};
+
 /* For use by pending_diagnostic::describe_state_change.  */
 
 struct state_change : public event_desc
@@ -215,6 +226,15 @@ class pending_diagnostic
      description; NULL otherwise (falling back on a more generic
      description).  */
 
+  /* Precision-of-wording vfunc for describing a region creation event
+     triggered by the mark_interesting_stuff vfunc.  */
+  virtual label_text
+  describe_region_creation_event (const evdesc::region_creation &)
+  {
+    /* Default no-op implementation.  */
+    return label_text ();
+  }
+
   /* Precision-of-wording vfunc for describing a critical state change
      within the diagnostic_path.
 
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 6b49719d521..7805aa26752 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-operands.h"
 #include "ssa-iterators.h"
 #include "calls.h"
+#include "is-a.h"
 
 #if ENABLE_ANALYZER
 
@@ -2799,6 +2800,347 @@ region_model::check_region_for_read (const region *src_reg,
   check_region_access (src_reg, DIR_READ, ctxt);
 }
 
+/* Concrete subclass for casts of pointers that lead to trailing bytes.  */
+
+class dubious_allocation_size
+: public pending_diagnostic_subclass<dubious_allocation_size>
+{
+public:
+  dubious_allocation_size (const region *lhs, const region *rhs)
+  : m_lhs (lhs), m_rhs (rhs), m_expr (NULL_TREE)
+  {}
+
+  dubious_allocation_size (const region *lhs, const region *rhs,
+			   tree expr)
+  : m_lhs (lhs), m_rhs (rhs), m_expr (expr)
+  {}
+
+  const char *get_kind () const final override
+  {
+    return "dubious_allocation_size";
+  }
+
+  bool operator== (const dubious_allocation_size &other) const
+  {
+    return m_lhs == other.m_lhs && m_rhs == other.m_rhs;
+  }
+
+  int get_controlling_option () const final override
+  {
+    return OPT_Wanalyzer_allocation_size;
+  }
+
+  bool emit (rich_location *rich_loc) final override
+  {
+    diagnostic_metadata m;
+    m.add_cwe (131);
+
+    return warning_meta (rich_loc, m, get_controlling_option (),
+	       "allocated buffer size is not a multiple of the pointee's size");
+  }
+
+  label_text
+  describe_region_creation_event (const evdesc::region_creation &ev) final
+  override
+  {
+    m_allocation_event = &ev;
+    if (m_expr)
+      return ev.formatted_print ("allocated %qE bytes here", m_expr);
+    return ev.formatted_print ("allocated here");
+  }
+
+  label_text describe_final_event (const evdesc::final_event &ev) final
+  override
+  {
+    tree pointee_type = TREE_TYPE (m_lhs->get_type ());
+    if (m_allocation_event)
+      /* Fallback: Typically we should always
+         see an m_allocation_event before.  */
+      return ev.formatted_print ("assigned to %qT here;"
+				 " %<sizeof (%T)%> is %qE",
+				 m_lhs->get_type (), pointee_type,
+				 size_in_bytes (pointee_type));
+    else
+      if (m_expr)
+	return ev.formatted_print ("allocated %qE bytes and assigned to"
+				   " %qT here; %<sizeof (%T)%> is %qE",
+				   m_expr,
+				   m_lhs->get_type (), pointee_type,
+				   size_in_bytes (pointee_type));
+      else
+	return ev.formatted_print ("allocated and assigned to %qT here;"
+				   " %<sizeof (%T)%> is %qE",
+				   m_lhs->get_type (), pointee_type,
+				   size_in_bytes (pointee_type));
+  }
+
+  void mark_interesting_stuff (interesting_t *interest) final override
+  {
+    interest->add_region_creation (m_rhs);
+  }
+
+private:
+  const region *m_lhs;
+  const region *m_rhs;
+  const tree m_expr;
+  const evdesc::region_creation *m_allocation_event;
+};
+
+/* Return false on dubious allocation sizes for constant sizes.  */
+
+static bool
+capacity_compatible_with_type (tree cst, tree pointee_size_tree,
+			       bool is_struct)
+{
+  unsigned HOST_WIDE_INT pointee_size = TREE_INT_CST_LOW (pointee_size_tree);
+  if (pointee_size == 0)
+    return 0;
+  unsigned HOST_WIDE_INT alloc_size = TREE_INT_CST_LOW (cst);
+
+  if (is_struct)
+    return alloc_size >= pointee_size;
+  return alloc_size % pointee_size == 0;
+}
+
+/* Checks whether SVAL could be a multiple of SIZE_CST.
+
+   It works by visiting all svalues inside SVAL until it reaches
+   atomic nodes.  From those, it goes back up again and adds each
+   node that might be a multiple of SIZE_CST to the RESULT_SET.  */
+
+class size_visitor : public visitor
+{
+public:
+  size_visitor (tree size_cst, const svalue *sval, constraint_manager *cm)
+  : m_size_cst (size_cst), m_sval (sval), m_cm (cm)
+  {
+    sval->accept (this);
+  }
+
+  bool get_result ()
+  {
+    return result_set.contains (m_sval);
+  }
+
+  void
+  visit_constant_svalue (const constant_svalue *sval) final override
+  {
+    unsigned HOST_WIDE_INT sval_int
+	  = TREE_INT_CST_LOW (sval->get_constant ());
+    unsigned HOST_WIDE_INT size_cst_int = TREE_INT_CST_LOW (m_size_cst);
+    if (size_cst_int == 0 || sval_int % size_cst_int == 0)
+      result_set.add (sval);
+  }
+
+  void
+  visit_unknown_svalue (const unknown_svalue *sval ATTRIBUTE_UNUSED)
+    final override
+  {
+    result_set.add (sval);
+  }
+
+  void
+  visit_poisoned_svalue (const poisoned_svalue *sval ATTRIBUTE_UNUSED)
+    final override
+  {
+    result_set.add (sval);
+  }
+
+  void visit_unaryop_svalue (const unaryop_svalue *sval)
+  {
+    const svalue *arg = sval->get_arg ();
+    if (result_set.contains (arg))
+      result_set.add (sval);
+  }
+
+  void visit_binop_svalue (const binop_svalue *sval) final override
+  {
+    const svalue *arg0 = sval->get_arg0 ();
+    const svalue *arg1 = sval->get_arg1 ();
+
+    if (sval->get_op () == MULT_EXPR)
+      {
+	if (result_set.contains (arg0) || result_set.contains (arg1))
+	  result_set.add (sval);
+      }
+    else
+      {
+	if (result_set.contains (arg0) && result_set.contains (arg1))
+	  result_set.add (sval);
+      }
+  }
+
+  void visit_repeated_svalue (const repeated_svalue *sval)
+  {
+    sval->get_inner_svalue ()->accept (this);
+    if (result_set.contains (sval->get_inner_svalue ()))
+      result_set.add (sval);
+  }
+
+  void visit_unmergeable_svalue (const unmergeable_svalue *sval) final override
+  {
+    sval->get_arg ()->accept (this);
+    if (result_set.contains (sval->get_arg ()))
+      result_set.add (sval);
+  }
+
+  void visit_widening_svalue (const widening_svalue *sval) final override
+  {
+    const svalue *base = sval->get_base_svalue ();
+    const svalue *iter = sval->get_iter_svalue ();
+
+    if (result_set.contains (base) && result_set.contains (iter))
+      result_set.add (sval);
+  }
+
+  void visit_conjured_svalue (const conjured_svalue *sval ATTRIBUTE_UNUSED)
+    final override
+  {
+    if (m_cm->get_equiv_class_by_svalue (sval, NULL))
+      result_set.add (sval);
+  }
+
+  void visit_asm_output_svalue (const asm_output_svalue *sval ATTRIBUTE_UNUSED)
+    final override
+  {
+    result_set.add (sval);
+  }
+
+  void visit_const_fn_result_svalue (const const_fn_result_svalue
+				      *sval ATTRIBUTE_UNUSED) final override
+  {
+    result_set.add (sval);
+  }
+
+private:
+  tree m_size_cst;
+  const svalue *m_sval;
+  constraint_manager *m_cm;
+  svalue_set result_set; /* Used as a mapping of svalue*->bool.  */
+};
+
+/* Return true if a struct or union either uses the inheritance pattern,
+   where the first field is a base struct, or the flexible array member
+   pattern, where the last field is an array without a specified size.  */
+
+static bool
+struct_or_union_with_inheritance_p (tree struc)
+{
+  tree iter = TYPE_FIELDS (struc);
+  if (iter == NULL_TREE)
+	  return false;
+  if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (iter)))
+	  return true;
+
+  tree last_field;
+  while (iter != NULL_TREE)
+    {
+      last_field = iter;
+      iter = DECL_CHAIN (iter);
+    }
+
+  if (last_field != NULL_TREE
+      && TREE_CODE (TREE_TYPE (last_field)) == ARRAY_TYPE)
+	  return true;
+
+  return false;
+}
+
+/* Return true if the lhs and rhs of an assignment have different types.  */
+
+static bool
+is_any_cast_p (const gimple *stmt)
+{
+  if (const gassign *assign = dyn_cast<const gassign *>(stmt))
+    return gimple_assign_cast_p (assign)
+	  || (gimple_num_ops (assign) == 2
+	      && !pending_diagnostic::same_tree_p (
+				    TREE_TYPE (gimple_assign_lhs (assign)),
+				    TREE_TYPE (gimple_assign_rhs1 (assign))));
+  else if (const gcall *call = dyn_cast<const gcall *>(stmt))
+    {
+      tree lhs = gimple_call_lhs (call);
+      return lhs != NULL_TREE && !pending_diagnostic::same_tree_p (
+				    TREE_TYPE (gimple_call_lhs (call)),
+				    gimple_call_return_type (call));
+    }
+
+  return false;
+}
+
+/* On pointer assignments, check whether the buffer size of
+   RHS_SVAL is compatible with the type of the LHS_REG.
+   Use a non-null CTXT to report allocation size warnings.  */
+
+void
+region_model::check_region_size (const region *lhs_reg, const svalue *rhs_sval,
+				 region_model_context *ctxt) const
+{
+  if (!ctxt || ctxt->get_stmt () == NULL)
+    return;
+  /* Only report warnings on assignments that actually change the type.  */
+  if (!is_any_cast_p (ctxt->get_stmt ()))
+    return;
+
+  const region_svalue *reg_sval = dyn_cast <const region_svalue *> (rhs_sval);
+  if (!reg_sval)
+    return;
+
+  tree pointer_type = lhs_reg->get_type ();
+  if (pointer_type == NULL_TREE || !POINTER_TYPE_P (pointer_type))
+    return;
+
+  tree pointee_type = TREE_TYPE (pointer_type);
+  /* Make sure that the type on the left-hand side actually has a size.  */
+  if (pointee_type == NULL_TREE || VOID_TYPE_P (pointee_type)
+      || TYPE_SIZE_UNIT (pointee_type) == NULL_TREE)
+    return;
+
+  /* Bail out early on pointers to structs where we can
+     not deduce whether the buffer size is compatible.  */
+  bool is_struct = RECORD_OR_UNION_TYPE_P (pointee_type);
+  if (is_struct && struct_or_union_with_inheritance_p (pointee_type))
+    return;
+
+  tree pointee_size_tree = size_in_bytes (pointee_type);
+  /* We give up if the type size is not known at compile-time or the
+     type size is always compatible regardless of the buffer size.  */
+  if (TREE_CODE (pointee_size_tree) != INTEGER_CST
+      || TREE_INT_CST_LOW (pointee_size_tree) == 1)
+    return;
+
+  const region *rhs_reg = reg_sval->get_pointee ();
+  const svalue *capacity = get_capacity (rhs_reg);
+  switch (capacity->get_kind ())
+    {
+    case svalue_kind::SK_CONSTANT:
+      {
+	const constant_svalue *cst_cap_sval
+		= as_a <const constant_svalue *> (capacity);
+	tree cst_cap = cst_cap_sval->get_constant ();
+	if (!capacity_compatible_with_type (cst_cap, pointee_size_tree,
+					    is_struct))
+	  ctxt->warn (new dubious_allocation_size (lhs_reg, rhs_reg,
+						   cst_cap));
+      }
+      break;
+    default:
+      {
+	if (!is_struct)
+	  {
+	    size_visitor v (pointee_size_tree, capacity, m_constraints);
+	    if (!v.get_result ())
+	      {
+		tree expr = get_representative_tree (capacity);
+		ctxt->warn (new dubious_allocation_size (lhs_reg, rhs_reg,
+			    expr));
+	      }
+	  }
+      break;
+      }
+    }
+}
+
 /* Set the value of the region given by LHS_REG to the value given
    by RHS_SVAL.
    Use CTXT to report any warnings associated with writing to LHS_REG.  */
@@ -2810,6 +3152,8 @@ region_model::set_value (const region *lhs_reg, const svalue *rhs_sval,
   gcc_assert (lhs_reg);
   gcc_assert (rhs_sval);
 
+  check_region_size (lhs_reg, rhs_sval, ctxt);
+
   check_region_for_write (lhs_reg, ctxt);
 
   m_store.set_value (m_mgr->get_store_manager(), lhs_reg, rhs_sval,
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 1bfa56a8cd2..91b7b370b81 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -857,6 +857,8 @@ class region_model
 			       region_model_context *ctxt) const;
   void check_region_for_read (const region *src_reg,
 			      region_model_context *ctxt) const;
+  void check_region_size (const region *lhs_reg, const svalue *rhs_sval,
+			  region_model_context *ctxt) const;
 
   void check_call_args (const call_details &cd) const;
   void check_external_function_for_access_attr (const gcall *call,
diff --git a/gcc/analyzer/svalue.cc b/gcc/analyzer/svalue.cc
index 2f9149412b9..7bad3cea31b 100644
--- a/gcc/analyzer/svalue.cc
+++ b/gcc/analyzer/svalue.cc
@@ -732,8 +732,8 @@ region_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 region_svalue::accept (visitor *v) const
 {
-  v->visit_region_svalue (this);
   m_reg->accept (v);
+  v->visit_region_svalue (this);
 }
 
 /* Implementation of svalue::implicitly_live_p vfunc for region_svalue.  */
@@ -1031,8 +1031,8 @@ initial_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 initial_svalue::accept (visitor *v) const
 {
-  v->visit_initial_svalue (this);
   m_reg->accept (v);
+  v->visit_initial_svalue (this);
 }
 
 /* Implementation of svalue::implicitly_live_p vfunc for initial_svalue.  */
@@ -1123,8 +1123,8 @@ unaryop_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 unaryop_svalue::accept (visitor *v) const
 {
-  v->visit_unaryop_svalue (this);
   m_arg->accept (v);
+  v->visit_unaryop_svalue (this);
 }
 
 /* Implementation of svalue::implicitly_live_p vfunc for unaryop_svalue.  */
@@ -1225,9 +1225,9 @@ binop_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 binop_svalue::accept (visitor *v) const
 {
-  v->visit_binop_svalue (this);
   m_arg0->accept (v);
   m_arg1->accept (v);
+  v->visit_binop_svalue (this);
 }
 
 /* Implementation of svalue::implicitly_live_p vfunc for binop_svalue.  */
@@ -1283,9 +1283,9 @@ sub_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 sub_svalue::accept (visitor *v) const
 {
-  v->visit_sub_svalue (this);
   m_parent_svalue->accept (v);
   m_subregion->accept (v);
+  v->visit_sub_svalue (this);
 }
 
 /* Implementation of svalue::implicitly_live_p vfunc for sub_svalue.  */
@@ -1352,8 +1352,8 @@ repeated_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 repeated_svalue::accept (visitor *v) const
 {
-  v->visit_repeated_svalue (this);
   m_inner_svalue->accept (v);
+  v->visit_repeated_svalue (this);
 }
 
 /* Implementation of svalue::all_zeroes_p for repeated_svalue.  */
@@ -1494,8 +1494,8 @@ bits_within_svalue::maybe_fold_bits_within (tree type,
 void
 bits_within_svalue::accept (visitor *v) const
 {
-  v->visit_bits_within_svalue (this);
   m_inner_svalue->accept (v);
+  v->visit_bits_within_svalue (this);
 }
 
 /* Implementation of svalue::implicitly_live_p vfunc for bits_within_svalue.  */
@@ -1544,9 +1544,9 @@ widening_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 widening_svalue::accept (visitor *v) const
 {
-  v->visit_widening_svalue (this);
   m_base_sval->accept (v);
   m_iter_sval->accept (v);
+  v->visit_widening_svalue (this);
 }
 
 /* Attempt to determine in which direction this value is changing
@@ -1711,8 +1711,8 @@ unmergeable_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 unmergeable_svalue::accept (visitor *v) const
 {
-  v->visit_unmergeable_svalue (this);
   m_arg->accept (v);
+  v->visit_unmergeable_svalue (this);
 }
 
 /* Implementation of svalue::implicitly_live_p vfunc for unmergeable_svalue.  */
@@ -1776,13 +1776,13 @@ compound_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 compound_svalue::accept (visitor *v) const
 {
-  v->visit_compound_svalue (this);
   for (binding_map::iterator_t iter = m_map.begin ();
        iter != m_map.end (); ++iter)
     {
       //(*iter).first.accept (v);
       (*iter).second->accept (v);
     }
+  v->visit_compound_svalue (this);
 }
 
 /* Calculate what the complexity of a compound_svalue instance for MAP
@@ -1903,8 +1903,8 @@ conjured_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 conjured_svalue::accept (visitor *v) const
 {
-  v->visit_conjured_svalue (this);
   m_id_reg->accept (v);
+  v->visit_conjured_svalue (this);
 }
 
 /* class asm_output_svalue : public svalue.  */
@@ -1968,9 +1968,9 @@ asm_output_svalue::input_idx_to_asm_idx (unsigned input_idx) const
 void
 asm_output_svalue::accept (visitor *v) const
 {
-  v->visit_asm_output_svalue (this);
   for (unsigned i = 0; i < m_num_inputs; i++)
     m_input_arr[i]->accept (v);
+  v->visit_asm_output_svalue (this);
 }
 
 /* class const_fn_result_svalue : public svalue.  */
@@ -2021,9 +2021,9 @@ const_fn_result_svalue::dump_input (pretty_printer *pp,
 void
 const_fn_result_svalue::accept (visitor *v) const
 {
-  v->visit_const_fn_result_svalue (this);
   for (unsigned i = 0; i < m_num_inputs; i++)
     m_input_arr[i]->accept (v);
+  v->visit_const_fn_result_svalue (this);
 }
 
 } // namespace ana
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 60b7b5a26bb..cf77e9b3a43 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9711,6 +9711,7 @@ This analysis is much more expensive than other GCC warnings.
 Enabling this option effectively enables the following warnings:
 
 @gccoptlist{ @gol
+-Wanalyzer-allocation-size @gol
 -Wanalyzer-double-fclose @gol
 -Wanalyzer-double-free @gol
 -Wanalyzer-exposure-through-output-file @gol
@@ -9758,6 +9759,18 @@ By default, the analysis silently stops if the code is too
 complicated for the analyzer to fully explore and it reaches an internal
 limit.  The @option{-Wanalyzer-too-complex} option warns if this occurs.
 
+@item -Wno-analyzer-allocation-size
+@opindex Wanalyzer-allocation-size
+@opindex Wno-analyzer-allocation-size
+This warning requires @option{-fanalyzer}, which enables it; use
+@option{-Wno-analyzer-allocation-size}
+to disable it.
+
+This diagnostic warns for paths through the code in which a buffer is casted
+to a type where the buffer size is not a multiple of the pointee size.
+
+See @url{https://cwe.mitre.org/data/definitions/131.html, CWE-131: Incorrect Calculation of Buffer Size}.
+
 @item -Wno-analyzer-double-fclose
 @opindex Wanalyzer-double-fclose
 @opindex Wno-analyzer-double-fclose
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
new file mode 100644
index 00000000000..02634ae883b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
@@ -0,0 +1,102 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Tests with constant buffer sizes.  */
+
+void test_1 (void)
+{
+  short *ptr = malloc (21 * sizeof (short));
+  free (ptr);
+}
+
+void test_2 (void)
+{
+  int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc2 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc2 } */
+  /* { dg-message "" "" { target *-*-* } malloc2 } */
+}
+
+void test_3 (void)
+{
+  void *ptr = malloc (21 * sizeof (short));
+  short *sptr = (short *)ptr;
+  free (sptr);
+}
+
+void test_4 (void)
+{
+  void *ptr = malloc (21 * sizeof (short)); /* { dg-message } */
+  int *iptr = (int *)ptr; /* { dg-line assign4 } */
+  free (iptr);
+
+  /* { dg-warning "" "" { target *-*-* } assign4 } */
+  /* { dg-message "" "" { target *-*-* } assign4 } */
+}
+
+void test_5 (void)
+{
+  int user_input;
+  scanf("%i", &user_input);
+  int n;
+  if (user_input == 0)
+    n = 21 * sizeof (short);
+  else
+    n = 42 * sizeof (short);
+  void *ptr = malloc (n);
+  short *sptr = (short *)ptr;
+  free (sptr);
+}
+
+void test_6 (void)
+{
+  int user_input;
+  scanf("%i", &user_input);
+  int n;
+  if (user_input == 0)
+    n = 21 * sizeof (short);
+  else
+    n = 42 * sizeof (short);
+  void *ptr = malloc (n); /* { dg-message } */
+  int *iptr = (int *)ptr; /* { dg-line assign6 } */
+  free (iptr);
+
+  /* { dg-warning "" "" { target *-*-* } assign6 } */
+  /* { dg-message "" "" { target *-*-* } assign6 } */
+}
+
+void test_7 (void)
+{
+  int user_input;
+  scanf("%i", &user_input);
+  int n;
+  if (user_input == 0)
+    n = 1;
+  else if (user_input == 2)
+    n = 5;
+  else
+    n = 7;
+  /* n is an unknown_svalue at this point.  */
+  void *ptr = malloc (n);
+  int *iptr = (int *)ptr;
+  free (iptr);
+}
+
+void *create_buffer(int n)
+{
+  return malloc(n);
+}
+
+void test_8(void) 
+{
+  int *buf = create_buffer(4 * sizeof (int));
+  free (buf);
+}
+
+void test_9(void) 
+{
+  // FIXME
+  int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } } */
+  free (buf);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
new file mode 100644
index 00000000000..cb35a9d717b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
@@ -0,0 +1,125 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Tests with symbolic buffer sizes.  */
+
+void test_1 (void)
+{  
+  int n;
+  scanf("%i", &n);
+  short *ptr = malloc (n * sizeof (short));
+  free (ptr);
+}
+
+void test_2 (void)
+{
+  int n;
+  scanf("%i", &n);
+  int *ptr = malloc (n * sizeof (short)); /* { dg-line malloc2 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc2 } */
+  /* { dg-message "" "" { target *-*-* } malloc2 } */
+}
+
+void test_3 (void)
+{
+  int n;
+  scanf("%i", &n);
+  void *ptr = malloc (n * sizeof (short));
+  short *sptr = (short *)ptr;
+  free (sptr);
+}
+
+void test_4 (void)
+{
+  int n;
+  scanf("%i", &n);
+  void *ptr = malloc (n * sizeof (short)); /* { dg-message } */
+  int *iptr = (int *)ptr; /* { dg-line assign4 } */
+  free (iptr);
+
+  /* { dg-warning "" "" { target *-*-* } assign4 } */
+  /* { dg-message "" "" { target *-*-* } assign4 } */
+}
+
+void test_5 (void)
+{
+  int user_input;
+  scanf("%i", &user_input);
+  int n;
+  if (user_input == 0)
+    n = 3 * user_input * sizeof (short);
+  else
+    n = 5 * user_input * sizeof (short);
+  void *ptr = malloc (n);
+  short *sptr = (short *)ptr;
+  free (sptr);
+}
+
+void test_6 (void)
+{
+  int user_input;
+  scanf("%i", &user_input);
+  int n;
+  if (user_input == 0)
+    n = user_input;
+  else if (user_input == 2)
+    n = user_input * 3;
+  else
+    n = user_input * 5;
+  /* n is an unknown_svalue at this point.  */
+  void *ptr = malloc (n);
+  int *iptr = (int *)ptr;
+  free (iptr);
+}
+
+void *create_buffer(int n)
+{
+  return malloc(n);
+}
+
+void test_7(void) 
+{
+  int n;
+  scanf("%i", &n);
+  int *buf = create_buffer(n * sizeof (int));
+  free (buf);
+}
+
+void test_8(void) 
+{
+  int n;
+  scanf("%i", &n);
+  // FIXME
+  int *buf = create_buffer(n * sizeof(short)); /* { dg-warning "" "" { xfail *-*-* } } */
+  free (buf);
+}
+
+void test_9 (void)
+{
+  int n;
+  scanf("%i", &n);
+  /* n is a conjured_svalue.  */
+  void *ptr = malloc (n); /* { dg-message } */
+  int *iptr = (int *)ptr; /* { dg-line assign9 } */
+  free (iptr);
+
+  /* { dg-warning "" "" { target *-*-* } assign9 } */
+  /* { dg-message "" "" { target *-*-* } assign9 } */
+}
+
+void test_11 (void)
+{
+  int n;
+  scanf("%i", &n);
+  void *ptr = malloc (n);
+  if (n == 4)
+    {
+      /* n is a conjured_svalue but guarded.  */
+      int *iptr = (int *)ptr;
+      free (iptr);
+    }
+  else
+    free (ptr);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
new file mode 100644
index 00000000000..dafc0e73c63
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
@@ -0,0 +1,48 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* CWE-131 example 5 */
+void test_1(void) 
+{
+  int *id_sequence = (int *) malloc (3); /* { dg-line malloc1 } */
+  if (id_sequence == NULL) exit (1);
+
+  id_sequence[0] = 13579;
+  id_sequence[1] = 24680;
+  id_sequence[2] = 97531;
+
+  free (id_sequence);
+
+  /* { dg-warning "" "" { target *-*-* } malloc1 } */
+  /* { dg-message "" "" { target *-*-* } malloc1 } */
+}
+
+void test_2(void)
+{
+  int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc2 } */
+  /* { dg-message "" "" { target *-*-* } malloc2 } */
+}
+
+void test_3(void)
+{
+  int n;
+  scanf("%i", &n);
+  int *ptr = malloc (n + sizeof (int)); /* { dg-line malloc3 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc3 } */
+  /* { dg-message "" "" { target *-*-* } malloc3 } */
+}
+
+void test_4(void)
+{
+  int n;
+  scanf("%i", &n);
+  int m;
+  scanf("%i", &m);
+  int *ptr = malloc ((n + m) * sizeof (int));
+  free (ptr);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
new file mode 100644
index 00000000000..7f992eb8c3e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
@@ -0,0 +1,58 @@
+#include <stdlib.h>
+
+/* Tests related to structs.  */
+
+struct base {
+  int i;
+};
+
+struct sub {
+  struct base b;
+  int j;
+};
+
+struct var_len {
+  int i;
+  char arr[];
+};
+
+
+void test_1 (void)
+{
+  struct base *ptr = malloc (5 * sizeof (struct base));
+  free (ptr);
+}
+
+void test_2 (void)
+{
+  long *ptr = malloc (5 * sizeof (struct base));  /* { dg-line malloc2 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc2 } */
+  /* { dg-message "" "" { target *-*-* } malloc2 } */
+}
+
+void test_3 (void)
+{
+  /* Even though 10 bytes is not a multiple of 4, we do not warn to prevent
+     a false positive in case s is the base struct of a struct inheritance.  */
+  struct base *ptr = malloc (10);
+  free (ptr);
+}
+
+void test_4 (void)
+{
+  struct var_len *ptr = malloc (10);
+  free (ptr);
+}
+
+void test_5 (void)
+{
+  /* For constant sizes, we warn if the buffer
+     is too small to hold a single struct.  */
+  struct base *ptr = malloc (2);  /* { dg-line malloc5 } */
+  free (ptr);
+
+  /* { dg-warning "" "" { target *-*-* } malloc5 } */
+  /* { dg-message "" "" { target *-*-* } malloc5 } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c
new file mode 100644
index 00000000000..afb1782e0cd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c
@@ -0,0 +1,40 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Tests related to structs.  */
+
+typedef struct a {
+  short s;
+} a;
+
+int *test_1 (void)
+{
+  a A;
+  A.s = 1;
+  int *ptr = (int *) &A; /* { dg-line assign1 } */
+  return ptr;
+
+  /* { dg-warning "" "" { target *-*-* } assign1 } */
+  /* { dg-message "" "" { target *-*-* } assign1 } */
+}
+
+int *test2 (void)
+{
+  char arr[4];
+  int *ptr = (int *)arr;
+  return ptr;
+}
+
+int *test3 (void)
+{
+  char arr[2];
+  int *ptr = (int *)arr; /* { dg-line assign3 } */
+  return ptr;
+
+  /* { dg-warning "" "" { target *-*-* } assign3 } */
+  /* { dg-message "" "" { target *-*-* } assign3 } */
+}
+
+int main() {
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96639.c b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
index 02ca3f084a2..aedf0464dc9 100644
--- a/gcc/testsuite/gcc.dg/analyzer/pr96639.c
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
@@ -3,7 +3,7 @@ void *calloc (__SIZE_TYPE__, __SIZE_TYPE__);
 int
 x7 (void)
 {
-  int **md = calloc (1, 1);
+  int **md = calloc (1, sizeof (void *));
 
   return md[0][0]; /* { dg-warning "possibly-NULL" "unchecked deref" } */
   /* { dg-warning "leak of 'md'" "leak" { target *-*-* } .-1 } */
-- 
2.36.1



* Re: [PATCH v2] analyzer: add allocation size checker
  2022-06-29 15:39 ` [PATCH v2] analyzer: add allocation size checker Tim Lange
@ 2022-06-29 17:39   ` David Malcolm
  2022-06-30 20:40     ` Tim Lange
  0 siblings, 1 reply; 17+ messages in thread
From: David Malcolm @ 2022-06-29 17:39 UTC (permalink / raw)
  To: Tim Lange; +Cc: gcc

On Wed, 2022-06-29 at 17:39 +0200, Tim Lange wrote:

> Hi,

Thanks for the updated patch.

Overall, looks nearly ready; various nits inline below, throughout...

> 
> I've addressed most of the points from the review.
> * The allocation size warning warns whenever region_model::get_capacity returns
> something useful, i.e. also on statically-allocated regions.

Thanks.  Looks like you added test coverage for this in
allocation-size-5.c

> * I added a new virtual function to the pending-diagnostic class, so that it
> is possible to emit a custom region creation description.
> * The test cases should have a better coverage now.
> * Conservative struct handling
> 
> The warning now looks like this:
> /path/to/main.c:9:8: warning: allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
>     9 |   int *iptr = ptr;
>       |        ^~~~
>   ‘main’: events 1-2
>     |
>     |    8 |   void *ptr = malloc((long unsigned int)n * sizeof(short));
>     |      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
>     |      |               |
>     |      |               (1) allocated ‘(long unsigned int)n * 2’ bytes here
>     |    9 |   int *iptr = ptr;
>     |      |        ~~~~    
>     |      |        |
>     |      |        (2) assigned to ‘int *’ here; ‘sizeof(int)’ is ‘4’
>     |

Looks great.

> 
> /path/to/main.c:15:15: warning: allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
>    15 |   int *ptrw = malloc (sizeof (short));
>       |               ^~~~~~~~~~~~~~~~~~~~~~~
>   ‘main’: events 1-2
>     |
>     |   15 |   int *ptrw = malloc (sizeof (short));
>     |      |               ^~~~~~~~~~~~~~~~~~~~~~~
>     |      |               |
>     |      |               (1) allocated ‘2’ bytes here

Looks a bit weird to be quoting a number here; maybe whenever the
expression is just a constant, print it unquoted?  (though that could
be fiddly to implement, so can be ignored if it turns out to be).


>     |      |               (2) assigned to ‘int *’ here; ‘sizeof (int)’ is ‘4’
>     |
> The only thing I couldn't address was moving the second event toward the lhs or
> assign token here. I tracked it down till get_stmt_location where it seems that
> the rhs is actually the location of the statement. Is there any way to get (2)
> to be focused on the lhs?

Annoyingly, we've lost a lot of location information by the time the
analyzer runs.

In theory we could special-case for when we have the def-stmt of the
SSA_NAME that's the default (i.e. initial) value of a VAR_DECL, and if
we see the write is there, we could use the DECL_SOURCE_LOCATION of the
VAR_DECL for the write, so that we'd get:

    |   15 |   int *ptrw = malloc (sizeof (short));
    |      |        ^~~~   ^~~~~~~~~~~~~~~~~~~~~~~
    |      |        |      |
    |      |        |      (1) allocated ‘2’ bytes here
    |      |        (2) assigned to ‘int *’ here; ‘sizeof (int)’ is ‘4’
    |

which is perhaps slightly more readable.  I'm not sure it's worth it
though.

> 
> Otherwise, the patch compiled coreutils, openssh, curl and httpd without any
> false-positive (but none of them contained a bug found by the checker either).

Great.

> `make check-gcc RUNTESTFLAGS="analyzer.exp"` tests pass and as I just worked on
> the event splitting, the regression tests are yet to run.
> 
> - Tim
> 
> 
> This patch adds a checker that warns about code paths in which a buffer is
> assigned to an incompatible type, i.e. when the allocated buffer size is not a
> multiple of the pointee's size.
> 
> gcc/analyzer/ChangeLog:

You should add a reference to the RFE bug to the top of the ChangeLog entries:
          PR analyzer/105900

Please also add it to the commit message, in the form " [PR105900]";
see the examples section towards the end of
https://gcc.gnu.org/contribute.html#patches


> 
>         * analyzer.opt: Added Wanalyzer-allocation-size.

[...snip...]

> 
> gcc/ChangeLog:

...and here

> 
>         * doc/invoke.texi: Added Wanalyzer-allocation-size.
> 
> gcc/testsuite/ChangeLog:

...and here

> 
>         * gcc.dg/analyzer/pr96639.c: Changed buffer size to omit warning.
>         * gcc.dg/analyzer/allocation-size-1.c: New test.
>         * gcc.dg/analyzer/allocation-size-2.c: New test.
>         * gcc.dg/analyzer/allocation-size-3.c: New test.
>         * gcc.dg/analyzer/allocation-size-4.c: New test.
>         * gcc.dg/analyzer/allocation-size-5.c: New test.
> 
> Signed-off-by: Tim Lange <mail@tim-lange.me>

[...snip...]

> diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> index 4aea52d3a87..912def2faf2 100644
> --- a/gcc/analyzer/analyzer.opt
> +++ b/gcc/analyzer/analyzer.opt
> @@ -54,6 +54,10 @@ The minimum number of supernodes within a function for the analyzer to consider
>  Common Joined UInteger Var(param_analyzer_max_enodes_for_full_dump) Init(200) Param
>  The maximum depth of exploded nodes that should appear in a dot dump before switching to a less verbose format.
>  
> +Wanalyzer-allocation-size
> +Common Var(warn_analyzer_allocation_size) Init(1) Warning
> +Warn about code paths in which a buffer is assigned to a incompatible type.

Reword "buffer" to "pointer to a buffer", I think.

"a incompatible" -> "an incompatible"

[...snip...]

> +/* Concrete subclass for casts of pointers that lead to trailing bytes.  */
> +
> +class dubious_allocation_size
> +: public pending_diagnostic_subclass<dubious_allocation_size>
> +{
> +public:
> +  dubious_allocation_size (const region *lhs, const region *rhs)
> +  : m_lhs (lhs), m_rhs (rhs), m_expr (NULL_TREE)
> +  {}

[...snip...]

> +  bool operator== (const dubious_allocation_size &other) const
> +  {
> +    return m_lhs == other.m_lhs && m_rhs == other.m_rhs;

Probably should also check that:
  same_tree_p (m_expr, other.m_expr);

[...snip...]

> +/* Return true on dubious allocation sizes for constant sizes.  */
> +
> +static bool
> +capacity_compatible_with_type (tree cst, tree pointee_size_tree,
> +			       bool is_struct)
> +{
> +  unsigned HOST_WIDE_INT pointee_size = TREE_INT_CST_LOW (pointee_size_tree);
> +  if (pointee_size == 0)
> +    return 0;

"false" rather than 0, given that this is bool.

[...snip...]

> +/* Return true if the lhs and rhs of an assignment have different types.  */
> +
> +static bool
> +is_any_cast_p (const gimple *stmt)
> +{
> +  if (const gassign *assign = dyn_cast<const gassign *>(stmt))
> +    return gimple_assign_cast_p (assign)
> +	  || (gimple_num_ops (assign) == 2
> +	      && !pending_diagnostic::same_tree_p (
> +				    TREE_TYPE (gimple_assign_lhs (assign)),
> +				    TREE_TYPE (gimple_assign_rhs1 (assign))));

The "== 2" subclause in the above condition doesn't look quite right to
me; what statements did you encounter that needed this?

[...snip...]

> @@ -9758,6 +9759,18 @@ By default, the analysis silently stops if the code is too
>  complicated for the analyzer to fully explore and it reaches an internal
>  limit.  The @option{-Wanalyzer-too-complex} option warns if this occurs.
>  
> +@item -Wno-analyzer-allocation-size
> +@opindex Wanalyzer-allocation-size
> +@opindex Wno-analyzer-allocation-size
> +This warning requires @option{-fanalyzer}, which enables it; use
> +@option{-Wno-analyzer-allocation-size}
> +to disable it.
> +
> +This diagnostic warns for paths through the code in which a buffer is casted
> +to a type where the buffer size is not a multiple of the pointee size.

At the risk of bikeshedding, how about:

This diagnostic warns for paths through the code in which a pointer
is assigned to point at a buffer with a size that is not a multiple
of sizeof(*pointer).

See @url{https://cwe.mitre.org/data/definitions/131.html, CWE-131: Incorrect Calculation of Buffer Size}.
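
As a rough illustration of the condition that wording describes, here is a
standalone C sketch.  It is not the analyzer's code: size_is_compatible and
trailing_bytes are hypothetical helpers, and the struct rule (only complain
when the buffer cannot hold a single struct) is an assumption taken from the
comments in allocation-size-4.c.

```c
#include <stddef.h>

/* Hypothetical helper: bytes left over when CAPACITY bytes are viewed
   as an array of elements of POINTEE_SIZE bytes each.  */
size_t
trailing_bytes (size_t capacity, size_t pointee_size)
{
  return pointee_size ? capacity % pointee_size : 0;
}

/* Hypothetical helper: nonzero if a buffer of CAPACITY bytes looks
   compatible with a pointee of POINTEE_SIZE bytes.  For structs the
   check is deliberately conservative (assumption): only a buffer too
   small for one struct is rejected.  */
int
size_is_compatible (size_t capacity, size_t pointee_size, int is_struct)
{
  if (pointee_size == 0)
    return 0;
  if (is_struct)
    return capacity >= pointee_size;
  return capacity % pointee_size == 0;
}
```

With a 4-byte int, a 14-byte buffer such as the one from
malloc (10 + sizeof (int)) leaves 2 trailing bytes, so it would be flagged.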


[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
> new file mode 100644
> index 00000000000..02634ae883b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
> @@ -0,0 +1,102 @@
> +#include <stdlib.h>
> +#include <stdio.h>
> +
> +/* Tests with constant buffer sizes.  */
> +
> +void test_1 (void)
> +{
> +  short *ptr = malloc (21 * sizeof (short));
> +  free (ptr);
> +}
> +
> +void test_2 (void)
> +{
> +  int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc2 } */
> +  free (ptr);
> +
> +  /* { dg-warning "" "" { target *-*-* } malloc2 } */
> +  /* { dg-message "" "" { target *-*-* } malloc2 } */

The various dg-warning and dg-message directives here (and throughout
the rest of the patch) shouldn't have just "" "" for their first two
args.

The first arg should be a regexp that matches some (nonempty) subset of
the expected text.  There's a balance to be struck between:
(a) terseness to avoid "gold plating" the test output (where making any
change to the wording would involve lots of tedious updates to test
directives)
versus
(b) giving us test coverage that the message is sane, so that if we
accidentally break the wording due to future changes to the analyzer,
then at least one test starts failing

Probably best for most of these regexps to be terse, but an empty
regexp is too terse.

The 2nd arg helps us disambiguate which directive we're talking about,
so can be "warning" and "note" for the two above.
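
To make that concrete, test_2 might end up looking something like the sketch
below.  The regexps are only a suggestion, picked from the diagnostic text
quoted earlier in this mail; the exact wording to match is an assumption and
should follow whatever the final messages are.

```c
#include <stdlib.h>

void test_2 (void)
{
  int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc2 } */
  free (ptr);

  /* Terse, non-empty regexps; the 2nd arg names each directive.  */
  /* { dg-warning "allocated buffer size is not a multiple" "warning" { target *-*-* } malloc2 } */
  /* { dg-message "assigned to 'int \\*' here" "note" { target *-*-* } malloc2 } */
}
```

The patterns match only a short, stable part of each message, so small
rewordings elsewhere in the diagnostic would not require touching the test.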

[...snip...]

> +void test_5 (void)
> +{
> +  int user_input;
> +  scanf("%i", &user_input);
> +  int n;
> +  if (user_input == 0)
> +    n = 21 * sizeof (short);
> +  else
> +    n = 42 * sizeof (short);

I see you've used scanf, presumably to get a symbolic value for the
variable.  If so, a simpler way to do this is to simply use a parameter
to the test function.  But there's no need to change these test cases.
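
For reference, a parameter-based variant of the test above could look like
this sketch.  The int return value is not part of the original test; it is
added here only so the function can be exercised outside the testsuite.

```c
#include <stdlib.h>

/* USER_INPUT is an unconstrained parameter, so no scanf call is needed
   to obtain a value the analyzer cannot know.  */
int test_5 (int user_input)
{
  int n;
  if (user_input == 0)
    n = 21 * sizeof (short);
  else
    n = 42 * sizeof (short);
  void *ptr = malloc (n);
  short *sptr = (short *)ptr;
  free (sptr);
  return n;
}
```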

Perhaps scanf should taint its arguments, which is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106021 but obviously that
would be a different patch.

[...snip...]

> +void test_9(void) 
> +{
> +  // FIXME

Please make this comment more descriptive about what the issue here is.

> +  int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } } */
> +  free (buf);
> +}
> diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
> new file mode 100644
> index 00000000000..cb35a9d717b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c

[...snip...]

> +void test_8(void) 
> +{
> +  int n;
> +  scanf("%i", &n);
> +  // FIXME

Make the comment more descriptive please.

> +  int *buf = create_buffer(n * sizeof(short)); /* { dg-warning "" "" { xfail *-*-* } } */
> +  free (buf);
> +}
> +
> +void test_9 (void)
> +{
> +  int n;
> +  scanf("%i", &n);
> +  /* n is a conjured_svalue.  */
> +  void *ptr = malloc (n); /* { dg-message } */
> +  int *iptr = (int *)ptr; /* { dg-line assign9 } */
> +  free (iptr);
> +
> +  /* { dg-warning "" "" { target *-*-* } assign9 } */
> +  /* { dg-message "" "" { target *-*-* } assign9 } */
> +}
> +
> +void test_11 (void)
> +{
> +  int n;
> +  scanf("%i", &n);
> +  void *ptr = malloc (n);
> +  if (n == 4)

Presumably this should be a test against sizeof (int), rather than 4?

Please add a testcase where the comparison is against the wrong
constant.

> +    {
> +      /* n is a conjured_svalue but guarded.  */
> +      int *iptr = (int *)ptr;
> +      free (iptr);
> +    }
> +  else
> +    free (ptr);
> +}
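
A sketch of the pair of tests meant by the two comments above, assuming the
checker warns once the guard pins n to a value that is not a multiple of
sizeof (int).  The function names, the dg-warning regexp and the int return
values are illustrative only; the return values exist so the sketch can be
run standalone.

```c
#include <stdlib.h>

/* Guard matches the pointee size: no warning expected.
   Returns nonzero if the int * view was taken.  */
int test_guard_ok (int n)
{
  void *ptr = malloc (n);
  if (n == (int) sizeof (int))
    {
      int *iptr = (int *)ptr;
      free (iptr);
      return 1;
    }
  free (ptr);
  return 0;
}

/* Guard compares against the wrong constant, so N is known not to be
   a multiple of sizeof (int) inside the branch.  */
int test_guard_wrong (int n)
{
  void *ptr = malloc (n);
  if (n == (int) sizeof (int) + 1)
    {
      int *iptr = (int *)ptr; /* { dg-warning "allocated buffer size is not a multiple" "warning" } */
      free (iptr);
      return 1;
    }
  free (ptr);
  return 0;
}
```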

[...snip...]

> diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c
> new file mode 100644
> index 00000000000..afb1782e0cd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c
> @@ -0,0 +1,40 @@
> +#include <stdlib.h>
> +#include <stdio.h>
> +
> +/* Tests related to structs.  */

Looks like this was copied-and-pasted, and should be updated to "Tests
of statically-sized buffers" or somesuch.

> +
> +typedef struct a {
> +  short s;
> +} a;
> +
> +int *test_1 (void)
> +{
> +  a A;
> +  A.s = 1;
> +  int *ptr = (int *) &A; /* { dg-line assign1 } */
> +  return ptr;
> +
> +  /* { dg-warning "" "" { target *-*-* } assign1 } */
> +  /* { dg-message "" "" { target *-*-* } assign1 } */
> +}
> +
> +int *test2 (void)
> +{
> +  char arr[4];

I think this needs to be sizeof(int), rather than 4.

> +  int *ptr = (int *)arr;
> +  return ptr;
> +}
> +
> +int *test3 (void)
> +{
> +  char arr[2];
> +  int *ptr = (int *)arr; /* { dg-line assign3 } */
> +  return ptr;
> +
> +  /* { dg-warning "" "" { target *-*-* } assign3 } */
> +  /* { dg-message "" "" { target *-*-* } assign3 } */
> +}
> +
> +int main() {
> +  return 0;
> +}

[...snip...]

Thanks again for the v2 patch; hope the above makes sense.
Dave



* Re: [PATCH v2] analyzer: add allocation size checker
  2022-06-29 17:39   ` David Malcolm
@ 2022-06-30 20:40     ` Tim Lange
  0 siblings, 0 replies; 17+ messages in thread
From: Tim Lange @ 2022-06-30 20:40 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc

On Wed Jun 29, 2022 at 7:39 PM CEST, David Malcolm wrote:
> On Wed, 2022-06-29 at 17:39 +0200, Tim Lange wrote:
>
> > Hi,
>
> Thanks for the updated patch.
>
> Overall, looks nearly ready; various nits inline below, throughout...
>
> > 
> > I've addressed most of the points from the review.
> > * The allocation size warning warns whenever region_model::get_capacity returns
> > something useful, i.e. also on statically-allocated regions.
>
> Thanks.  Looks like you added test coverage for this in
> allocation-size-5.c
>
> > * I added a new virtual function to the pending-diagnostic class, so that it
> > is possible to emit a custom region creation description.
> > * The test cases should have a better coverage now.
> > * Conservative struct handling
> > 
> > The warning now looks like this:
> > /path/to/main.c:9:8: warning: allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
> >     9 |   int *iptr = ptr;
> >       |        ^~~~
> >   ‘main’: events 1-2
> >     |
> >     |    8 |   void *ptr = malloc((long unsigned int)n * sizeof(short));
> >     |      |               ^~~~~~~~~~~~~~~~~~~~~~~~~
> >     |      |               |
> >     |      |               (1) allocated ‘(long unsigned int)n * 2’ bytes here
> >     |    9 |   int *iptr = ptr;
> >     |      |        ~~~~    
> >     |      |        |
> >     |      |        (2) assigned to ‘int *’ here; ‘sizeof(int)’ is ‘4’
> >     |
>
> Looks great.
>
> > 
> > /path/to/main.c:15:15: warning: allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
> >    15 |   int *ptrw = malloc (sizeof (short));
> >       |               ^~~~~~~~~~~~~~~~~~~~~~~
> >   ‘main’: events 1-2
> >     |
> >     |   15 |   int *ptrw = malloc (sizeof (short));
> >     |      |               ^~~~~~~~~~~~~~~~~~~~~~~
> >     |      |               |
> >     |      |               (1) allocated ‘2’ bytes here
>
> Looks a bit weird to be quoting a number here; maybe whenever the
> expression is just a constant, print it unquoted?  (though that could
> be fiddly to implement, so can be ignored if it turns out to be).

Isn't the 'q' in '%qE' responsible for quoting? Using '%E' instead when
m_expr is an INTEGER_CST works.

Otherwise, I've left the quoting on the "'sizeof (int)' is '4'" note.
I do think thata looks better than without.

/path/to/main.c:16:15: warning: allocated buffer size is not a multiple of the pointee's size [CWE-131] [-Wanalyzer-allocation-size]
   16 |   int *ptrw = malloc (21 * sizeof (short));
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ‘main’: events 1-2
    |
    |   16 |   int *ptrw = malloc (21 * sizeof (short));
    |      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |               |
    |      |               (1) allocated 42 bytes here
    |      |               (2) assigned to ‘int *’ here; ‘sizeof (int)’ is ‘4’
    |

>
>
> >     |      |               (2) assigned to ‘int *’ here; ‘sizeof (int)’ is ‘4’
> >     |
> > The only thing I couldn't address was moving the second event toward the lhs or
> > assign token here. I tracked it down till get_stmt_location where it seems that
> > the rhs is actually the location of the statement. Is there any way to get (2)
> > to be focused on the lhs?
>
> Annoyingly, we've lost a lot of location information by the time the
> analyzer runs.
>
> In theory we could special-case for when we have the def-stmt of the
> SSA_NAME that's the default (i.e. initial) value of a VAR_DECL, and if
> we see the write is there, we could use the DECL_SOURCE_LOCATION of the
> VAR_DECL for the write, so that we'd get:
>
>     |   15 |   int *ptrw = malloc (sizeof (short));
>     |      |        ^~~~   ^~~~~~~~~~~~~~~~~~~~~~~
>     |      |        |      |
>     |      |        |      (1) allocated ‘2’ bytes here
>     |      |        (2) assigned to ‘int *’ here; ‘sizeof (int)’ is ‘4’
>     |
>
> which is perhaps slightly more readable.  I'm not sure it's worth it
> though.

Hm, okay. I've left that out for now.

>
> > 
> > Otherwise, the patch compiled coreutils, openssh, curl and httpd without any
> > false-positive (but none of them contained a bug found by the checker either).
>
> Great.
>
> > `make check-gcc RUNTESTFLAGS="analyzer.exp"` tests pass and as I just worked on
> > the event splitting, the regression tests are yet to run.
> > 
> > - Tim
> > 
> > 
> > This patch adds an checker that warns about code paths in which a buffer is
> > assigned to a incompatible type, i.e. when the allocated buffer size is not a
> > multiple of the pointee's size.
> > 
> > gcc/analyzer/ChangeLog:
>
> You should add a reference to the RFE bug to the top of the ChangeLog entries:
>           PR analyzer/105900
>
> Please also add it to the commit message, in the form " [PR105900]";
> see the examples section towards the end of
> https://gcc.gnu.org/contribute.html#patches
>
>
> > 
> >         * analyzer.opt: Added Wanalyzer-allocation-size.
>
> [...snip...]
>
> > 
> > gcc/ChangeLog:
>
> ...and here
>
> > 
> >         * doc/invoke.texi: Added Wanalyzer-allocation-size.
> > 
> > gcc/testsuite/ChangeLog:
>
> ...and here
>
> > 
> >         * gcc.dg/analyzer/pr96639.c: Changed buffer size to omit warning.
> >         * gcc.dg/analyzer/allocation-size-1.c: New test.
> >         * gcc.dg/analyzer/allocation-size-2.c: New test.
> >         * gcc.dg/analyzer/allocation-size-3.c: New test.
> >         * gcc.dg/analyzer/allocation-size-4.c: New test.
> >         * gcc.dg/analyzer/allocation-size-5.c: New test.
> > 
> > Signed-off-by: Tim Lange <mail@tim-lange.me>
>
> [...snip...]
>
> > diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> > index 4aea52d3a87..912def2faf2 100644
> > --- a/gcc/analyzer/analyzer.opt
> > +++ b/gcc/analyzer/analyzer.opt
> > @@ -54,6 +54,10 @@ The minimum number of supernodes within a function for the analyzer to consider
> >  Common Joined UInteger Var(param_analyzer_max_enodes_for_full_dump) Init(200) Param
> >  The maximum depth of exploded nodes that should appear in a dot dump before switching to a less verbose format.
> >  
> > +Wanalyzer-allocation-size
> > +Common Var(warn_analyzer_allocation_size) Init(1) Warning
> > +Warn about code paths in which a buffer is assigned to a incompatible type.
>
> Reword "buffer" to "pointer to a buffer", I think.
>
> "a incompatible" -> "an incompatible"
>
> [...snip...]
>
> > +/* Concrete subclass for casts of pointers that lead to trailing bytes.  */
> > +
> > +class dubious_allocation_size
> > +: public pending_diagnostic_subclass<dubious_allocation_size>
> > +{
> > +public:
> > +  dubious_allocation_size (const region *lhs, const region *rhs)
> > +  : m_lhs (lhs), m_rhs (rhs), m_expr (NULL_TREE)
> > +  {}
>
> [...snip...]
>
> > +  bool operator== (const dubious_allocation_size &other) const
> > +  {
> > +    return m_lhs == other.m_lhs && m_rhs == other.m_rhs;
>
> Probably should also check that:
>   same_tree_p (m_expr, other.m_expr);
>
> [...snip...]
>
> > +/* Return true on dubious allocation sizes for constant sizes.  */
> > +
> > +static bool
> > +capacity_compatible_with_type (tree cst, tree pointee_size_tree,
> > +			       bool is_struct)
> > +{
> > +  unsigned HOST_WIDE_INT pointee_size = TREE_INT_CST_LOW (pointee_size_tree);
> > +  if (pointee_size == 0)
> > +    return 0;
>
> "false" rather than 0, given that this is bool.
>
> [...snip...]
>
> > +/* Return true if the lhs and rhs of an assignment have different types.  */
> > +
> > +static bool
> > +is_any_cast_p (const gimple *stmt)
> > +{
> > +  if (const gassign *assign = dyn_cast<const gassign *>(stmt))
> > +    return gimple_assign_cast_p (assign)
> > +	  || (gimple_num_ops (assign) == 2
> > +	      && !pending_diagnostic::same_tree_p (
> > +				    TREE_TYPE (gimple_assign_lhs (assign)),
> > +				    TREE_TYPE (gimple_assign_rhs1 (assign))));
>
> The "== 2" subclause in the above condition doesn't look quite right to
> me; what statements did you encounter that needed this?

I wanted to ensure that I do actually have a pointer on the rhs. In the
patch this is already ensured by only handling region_svals. Thus, I
removed the check.

>
> [...snip...]
>
> > @@ -9758,6 +9759,18 @@ By default, the analysis silently stops if the code is too
> >  complicated for the analyzer to fully explore and it reaches an internal
> >  limit.  The @option{-Wanalyzer-too-complex} option warns if this occurs.
> >  
> > +@item -Wno-analyzer-allocation-size
> > +@opindex Wanalyzer-allocation-size
> > +@opindex Wno-analyzer-allocation-size
> > +This warning requires @option{-fanalyzer}, which enables it; use
> > +@option{-Wno-analyzer-allocation-size}
> > +to disable it.
> > +
> > +This diagnostic warns for paths through the code in which a buffer is casted
> > +to a type where the buffer size is not a multiple of the pointee size.
>
> At the risk of bikeshedding, how about:
>
> This diagnostic warns for paths through the code in which a pointer
> is assigned to point at a buffer with a size that is not a multiple
> of sizeof(*pointer).
>
> See @url{https://cwe.mitre.org/data/definitions/131.html, CWE-131: Incorrect Calculation of Buffer Size}.
>
>
> [...snip...]
>
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
> > new file mode 100644
> > index 00000000000..02634ae883b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
> > @@ -0,0 +1,102 @@
> > +#include <stdlib.h>
> > +#include <stdio.h>
> > +
> > +/* Tests with constant buffer sizes.  */
> > +
> > +void test_1 (void)
> > +{
> > +  short *ptr = malloc (21 * sizeof (short));
> > +  free (ptr);
> > +}
> > +
> > +void test_2 (void)
> > +{
> > +  int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc2 } */
> > +  free (ptr);
> > +
> > +  /* { dg-warning "" "" { target *-*-* } malloc2 } */
> > +  /* { dg-message "" "" { target *-*-* } malloc2 } */
>
> The various dg-warning and dg-message directives here (and throughout
> the rest of the patch) shouldn't have just "" "" for their first two
> args.
>
> The first arg should be a regexp that matches some (nonempty) subset of
> the expected text.  There's a balance to be struck between:
> (a) terseness to avoid "gold plating" the test output (where making any
> change to the wording would involve lots of tedious updates to test
> directives)
> versus
> (b) giving us test coverage that the message is sane, so that if we
> accidentally break the wording due to future changes to the analyzer,
> then at least one test starts failing
>
> Probably best for most of these regexps to be terse, but an empty
> regexp is too terse.
>
> The 2nd arg helps us disambiguate which directive we're talking about,
> so can be "warning" and "note" for the two above.

I've looked at other tests and added regexps similar to those. The
notes are terse and only make sure that the size is printed
whenever it's possible. Otherwise, I couldn't think of a way to provide a
terse regexp for the warning. But that's easily replaceable because it is
just a constant string.
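
To illustrate the kind of directives discussed here, a sketch of test_2 with
non-empty regexps could look roughly like this. The exact regexps are
assumptions for illustration and may differ from the ones in the patch; the
note pattern only checks that some byte count is printed, so it does not
depend on sizeof (short).

```c
#include <stdlib.h>

void test_2 (void)
{
  int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc2 } */
  free (ptr);

  /* First arg: a regexp matching part of the expected text.
     Second arg: a short label telling the two directives apart.  */
  /* { dg-warning "allocated buffer size is not a multiple of the pointee's size" "warning" { target *-*-* } malloc2 } */
  /* { dg-message "\\d+ bytes" "note" { target *-*-* } malloc2 } */
}
```

The directives are plain comments, so the file still compiles as ordinary C;
only the DejaGnu harness interprets them.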

>
> [...snip...]
>
> > +void test_5 (void)
> > +{
> > +  int user_input;
> > +  scanf("%i", &user_input);
> > +  int n;
> > +  if (user_input == 0)
> > +    n = 21 * sizeof (short);
> > +  else
> > +    n = 42 * sizeof (short);
>
> I see you've used scanf, presumably to get a symbolic value for the
> variable.  If so, a simpler way to do this is to simply use a parameter
> to the test function.  But there's no need to change these test cases.
>
> Perhaps scanf should taint its arguments, which is
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106021 but obviously that
> would be a different patch.
>
> [...snip...]
>
> > +void test_9(void) 
> > +{
> > +  // FIXME
>
> Please make this comment more descriptive about what the issue here is.
>
> > +  int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } } */
> > +  free (buf);
> > +}
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
> > new file mode 100644
> > index 00000000000..cb35a9d717b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
>
> [...snip...]
>
> > +void test_8(void) 
> > +{
> > +  int n;
> > +  scanf("%i", &n);
> > +  // FIXME
>
> Make the comment more descriptive please.
>
> > +  int *buf = create_buffer(n * sizeof(short)); /* { dg-warning "" "" { xfail *-*-* } } */
> > +  free (buf);
> > +}
> > +
> > +void test_9 (void)
> > +{
> > +  int n;
> > +  scanf("%i", &n);
> > +  /* n is a conjured_svalue.  */
> > +  void *ptr = malloc (n); /* { dg-message } */
> > +  int *iptr = (int *)ptr; /* { dg-line assign9 } */
> > +  free (iptr);
> > +
> > +  /* { dg-warning "" "" { target *-*-* } assign9 } */
> > +  /* { dg-message "" "" { target *-*-* } assign9 } */
> > +}
> > +
> > +void test_11 (void)
> > +{
> > +  int n;
> > +  scanf("%i", &n);
> > +  void *ptr = malloc (n);
> > +  if (n == 4)
>
> Presumably this should be a test against sizeof (int), rather than 4?

In the patch, it didn't matter. As it seemed impossible to evaluate
whether 'n % sizeof (type) == 0' is true, I fell back to assuming that
a guarded variable is okay. In the new patch, I weakened the
approximation and actually use the equivalence classes, so that this
should indeed be sizeof (int).
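
As a rough sketch of the two cases the guard handling is meant to tell apart
(function names are hypothetical, and a parameter is used for the symbolic
size as suggested in the review instead of scanf). The comments describe the
behaviour I'd expect from the discussion above; they are not verified analyzer
output.

```c
#include <stdlib.h>

/* The guard pins n to sizeof (int), so the buffer holds exactly one int.
   No allocation-size warning would be expected here.  */
int *alloc_guarded_ok (size_t n)
{
  void *ptr = malloc (n);
  if (n == sizeof (int))
    return (int *) ptr;
  free (ptr);
  return NULL;
}

/* The guard pins n to a constant that is not a multiple of sizeof (int)
   (assuming sizeof (int) > 1), i.e. the "wrong constant" case.
   A warning would be expected on the cast.  */
int *alloc_guarded_bad (size_t n)
{
  void *ptr = malloc (n);
  if (n == sizeof (int) + 1)
    return (int *) ptr;
  free (ptr);
  return NULL;
}
```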

I will send the updated patch in another mail in some time after I have
formatted it.

- Tim

>
> Please add a testcase where the comparison is against the wrong
> constant.
>
> > +    {
> > +      /* n is a conjured_svalue but guarded.  */
> > +      int *iptr = (int *)ptr;
> > +      free (iptr);
> > +    }
> > +  else
> > +    free (ptr);
> > +}
>
> [...snip...]
>
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c
> > new file mode 100644
> > index 00000000000..afb1782e0cd
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c
> > @@ -0,0 +1,40 @@
> > +#include <stdlib.h>
> > +#include <stdio.h>
> > +
> > +/* Tests related to structs.  */
>
> Looks like this was copied-and-pasted, and should be updated to "Tests
> of statically-sized buffers" or somesuch.
>
> > +
> > +typedef struct a {
> > +  short s;
> > +} a;
> > +
> > +int *test_1 (void)
> > +{
> > +  a A;
> > +  A.s = 1;
> > +  int *ptr = (int *) &A; /* { dg-line assign1 } */
> > +  return ptr;
> > +
> > +  /* { dg-warning "" "" { target *-*-* } assign1 } */
> > +  /* { dg-message "" "" { target *-*-* } assign1 } */
> > +}
> > +
> > +int *test2 (void)
> > +{
> > +  char arr[4];
>
> I think this needs to be sizeof(int), rather than 4.
>
> > +  int *ptr = (int *)arr;
> > +  return ptr;
> > +}
> > +
> > +int *test3 (void)
> > +{
> > +  char arr[2];
> > +  int *ptr = (int *)arr; /* { dg-line assign3 } */
> > +  return ptr;
> > +
> > +  /* { dg-warning "" "" { target *-*-* } assign3 } */
> > +  /* { dg-message "" "" { target *-*-* } assign3 } */
> > +}
> > +
> > +int main() {
> > +  return 0;
> > +}
>
> [...snip...]
>
> Thanks again for the v2 patch; hope the above makes sense
> Dave


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3] analyzer: add allocation size checker [PR105900]
  2022-06-17 15:54 [RFC] analyzer: allocation size warning Tim Lange
                   ` (3 preceding siblings ...)
  2022-06-29 15:39 ` [PATCH v2] analyzer: add allocation size checker Tim Lange
@ 2022-06-30 22:11 ` Tim Lange
  2022-06-30 22:47   ` David Malcolm
  4 siblings, 1 reply; 17+ messages in thread
From: Tim Lange @ 2022-06-30 22:11 UTC (permalink / raw)
  To: dmalcolm; +Cc: gcc, Tim Lange

Hi,

here's the updated patch that should address all the comments from the v2 review.

- Tim


This patch adds a checker that warns about code paths in which a buffer is
assigned to an incompatible type, i.e. when the allocated buffer size is not a
multiple of the pointee's size.
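
For constant sizes, the rule boils down to a modulo check, with a more
conservative "at least as large" check for structs. A minimal stand-alone
model of that rule, assuming plain size_t values instead of GCC trees (the
function name and the handling of a zero pointee size are assumptions for
illustration, not code from the patch):

```c
#include <stdbool.h>
#include <stddef.h>

/* Model of the constant-size rule: returns true if a buffer of
   ALLOC_SIZE bytes is compatible with a pointee of POINTEE_SIZE bytes.  */
bool
capacity_compatible (size_t alloc_size, size_t pointee_size, bool is_struct)
{
  /* Assumption: treat a zero-sized pointee as "nothing to check",
     mirroring the early bail-out the checker does before this rule.  */
  if (pointee_size == 0)
    return true;
  /* Structs are handled conservatively: the buffer only has to be
     large enough to hold one object.  */
  if (is_struct)
    return alloc_size >= pointee_size;
  /* Otherwise the buffer must hold a whole number of elements.  */
  return alloc_size % pointee_size == 0;
}
```

With sizeof (int) == 4, a 42 byte buffer (21 * sizeof (short)) is rejected
for an 'int *', while an 84 byte buffer is accepted.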

2022-07-30  Tim Lange  <mail@tim-lange.me>

gcc/analyzer/ChangeLog:

	PR analyzer/105900
	* analyzer.opt: Added Wanalyzer-allocation-size.
	* checker-path.cc (region_creation_event::get_desc): Added call to new
	virtual function pending_diagnostic::describe_region_creation_event.
	* checker-path.h: Added region_creation_event::get_desc.
	* diagnostic-manager.cc (diagnostic_manager::add_event_on_final_node):
	New function.
	* diagnostic-manager.h:
	Added diagnostic_manager::add_event_on_final_node.
	* pending-diagnostic.h (struct region_creation): New event_desc struct.
	(pending_diagnostic::describe_region_creation_event): Added virtual
	function to overwrite description of a region creation.
	* region-model.cc (class dubious_allocation_size): New class.
	(capacity_compatible_with_type): New helper function.
	(class size_visitor): New class.
	(struct_or_union_with_inheritance_p): New helper function.
	(is_any_cast_p): New helper function.
	(region_model::check_region_size): New function.
	(region_model::set_value): Added call to
	region_model::check_region_size.
	* region-model.h (class region_model): New function check_region_size.
	* svalue.cc (region_svalue::accept): Changed to post-order traversal.
	(initial_svalue::accept): Likewise.
	(unaryop_svalue::accept): Likewise.
	(binop_svalue::accept): Likewise.
	(sub_svalue::accept): Likewise.
	(repeated_svalue::accept): Likewise.
	(bits_within_svalue::accept): Likewise.
	(widening_svalue::accept): Likewise.
	(unmergeable_svalue::accept): Likewise.
	(compound_svalue::accept): Likewise.
	(conjured_svalue::accept): Likewise.
	(asm_output_svalue::accept): Likewise.
	(const_fn_result_svalue::accept): Likewise.

gcc/ChangeLog:

	PR analyzer/105900
	* doc/invoke.texi: Added Wanalyzer-allocation-size.

gcc/testsuite/ChangeLog:

	PR analyzer/105900
	* gcc.dg/analyzer/pr96639.c: Changed buffer size to omit warning.
	* gcc.dg/analyzer/allocation-size-1.c: New test.
	* gcc.dg/analyzer/allocation-size-2.c: New test.
	* gcc.dg/analyzer/allocation-size-3.c: New test.
	* gcc.dg/analyzer/allocation-size-4.c: New test.
	* gcc.dg/analyzer/allocation-size-5.c: New test.

Signed-off-by: Tim Lange <mail@tim-lange.me>
---
 gcc/analyzer/analyzer.opt                     |   4 +
 gcc/analyzer/checker-path.cc                  |  11 +-
 gcc/analyzer/checker-path.h                   |   2 +-
 gcc/analyzer/diagnostic-manager.cc            |  61 +++
 gcc/analyzer/diagnostic-manager.h             |   4 +
 gcc/analyzer/pending-diagnostic.h             |  20 +
 gcc/analyzer/region-model.cc                  | 370 ++++++++++++++++++
 gcc/analyzer/region-model.h                   |   2 +
 gcc/analyzer/svalue.cc                        |  26 +-
 gcc/doc/invoke.texi                           |  14 +
 .../gcc.dg/analyzer/allocation-size-1.c       | 116 ++++++
 .../gcc.dg/analyzer/allocation-size-2.c       | 155 ++++++++
 .../gcc.dg/analyzer/allocation-size-3.c       |  45 +++
 .../gcc.dg/analyzer/allocation-size-4.c       |  60 +++
 .../gcc.dg/analyzer/allocation-size-5.c       |  36 ++
 gcc/testsuite/gcc.dg/analyzer/pr96639.c       |   2 +-
 16 files changed, 912 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c

diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 4aea52d3a87..1d612246a30 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -54,6 +54,10 @@ The minimum number of supernodes within a function for the analyzer to consider
 Common Joined UInteger Var(param_analyzer_max_enodes_for_full_dump) Init(200) Param
 The maximum depth of exploded nodes that should appear in a dot dump before switching to a less verbose format.
 
+Wanalyzer-allocation-size
+Common Var(warn_analyzer_allocation_size) Init(1) Warning
+Warn about code paths in which a pointer to a buffer is assigned to an incompatible type.
+
 Wanalyzer-double-fclose
 Common Var(warn_analyzer_double_fclose) Init(1) Warning
 Warn about code paths in which a stdio FILE can be closed more than once.
diff --git a/gcc/analyzer/checker-path.cc b/gcc/analyzer/checker-path.cc
index 0133dc94137..953e192cd55 100644
--- a/gcc/analyzer/checker-path.cc
+++ b/gcc/analyzer/checker-path.cc
@@ -302,8 +302,17 @@ region_creation_event::region_creation_event (const region *reg,
    region_creation_event.  */
 
 label_text
-region_creation_event::get_desc (bool) const
+region_creation_event::get_desc (bool can_colorize) const
 {
+  if (m_pending_diagnostic)
+    {
+      label_text custom_desc
+	    = m_pending_diagnostic->describe_region_creation_event
+		(evdesc::region_creation (can_colorize, m_reg));
+      if (custom_desc.m_buffer)
+	return custom_desc;
+    }
+
   switch (m_reg->get_memory_space ())
     {
     default:
diff --git a/gcc/analyzer/checker-path.h b/gcc/analyzer/checker-path.h
index 24decf5ce3d..8e48d8a07ab 100644
--- a/gcc/analyzer/checker-path.h
+++ b/gcc/analyzer/checker-path.h
@@ -219,7 +219,7 @@ public:
   region_creation_event (const region *reg,
 			 location_t loc, tree fndecl, int depth);
 
-  label_text get_desc (bool) const final override;
+  label_text get_desc (bool can_colorize) const final override;
 
 private:
   const region *m_reg;
diff --git a/gcc/analyzer/diagnostic-manager.cc b/gcc/analyzer/diagnostic-manager.cc
index 8ea1f61776e..4adfda1af65 100644
--- a/gcc/analyzer/diagnostic-manager.cc
+++ b/gcc/analyzer/diagnostic-manager.cc
@@ -1476,6 +1476,67 @@ diagnostic_manager::build_emission_path (const path_builder &pb,
       const exploded_edge *eedge = epath.m_edges[i];
       add_events_for_eedge (pb, *eedge, emission_path, &interest);
     }
+  add_event_on_final_node (epath.get_final_enode (), emission_path, &interest);
+}
+
+/* Emit a region_creation_event when requested on the last statement in
+   the path.
+
+   If a region_creation_event should be emitted on the last statement of the
+   path, we need to peek to the successors to get whether the final enode
+   created a region.
+*/
+
+void
+diagnostic_manager::add_event_on_final_node (const exploded_node *final_enode,
+					     checker_path *emission_path,
+					     interesting_t *interest) const
+{
+  const program_point &src_point = final_enode->get_point ();
+  const int src_stack_depth = src_point.get_stack_depth ();
+  const program_state &src_state = final_enode->get_state ();
+  const region_model *src_model = src_state.m_region_model;
+
+  unsigned j;
+  exploded_edge *e;
+  FOR_EACH_VEC_ELT (final_enode->m_succs, j, e)
+  {
+    exploded_node *dst = e->m_dest;
+    const program_state &dst_state = dst->get_state ();
+    const region_model *dst_model = dst_state.m_region_model;
+    if (src_model->get_dynamic_extents ()
+	!= dst_model->get_dynamic_extents ())
+      {
+	unsigned i;
+	const region *reg;
+	bool emitted = false;
+	FOR_EACH_VEC_ELT (interest->m_region_creation, i, reg)
+	  {
+	    const region *base_reg = reg->get_base_region ();
+	    const svalue *old_extents
+	= src_model->get_dynamic_extents (base_reg);
+	    const svalue *new_extents
+	= dst_model->get_dynamic_extents (base_reg);
+	    if (old_extents == NULL && new_extents != NULL)
+	      switch (base_reg->get_kind ())
+		{
+		default:
+		  break;
+		case RK_HEAP_ALLOCATED:
+		case RK_ALLOCA:
+		  emission_path->add_region_creation_event
+		    (reg,
+		    src_point.get_location (),
+		    src_point.get_fndecl (),
+		    src_stack_depth);
+		  emitted = true;
+		  break;
+		}
+	  }
+	if (emitted)
+	  break;
+      }
+  }
 }
 
 /* Subclass of state_change_visitor that creates state_change_event
diff --git a/gcc/analyzer/diagnostic-manager.h b/gcc/analyzer/diagnostic-manager.h
index b9bb7c8c254..266eed8f9cb 100644
--- a/gcc/analyzer/diagnostic-manager.h
+++ b/gcc/analyzer/diagnostic-manager.h
@@ -149,6 +149,10 @@ private:
 			    const exploded_path &epath,
 			    checker_path *emission_path) const;
 
+  void add_event_on_final_node (const exploded_node *final_enode,
+				checker_path *emission_path,
+				interesting_t *interest) const;
+
   void add_events_for_eedge (const path_builder &pb,
 			     const exploded_edge &eedge,
 			     checker_path *emission_path,
diff --git a/gcc/analyzer/pending-diagnostic.h b/gcc/analyzer/pending-diagnostic.h
index 9e1c656bf0a..4ea469e1879 100644
--- a/gcc/analyzer/pending-diagnostic.h
+++ b/gcc/analyzer/pending-diagnostic.h
@@ -58,6 +58,17 @@ struct event_desc
   bool m_colorize;
 };
 
+/* For use by pending_diagnostic::describe_region_creation_event.  */
+
+struct region_creation : public event_desc
+{
+  region_creation (bool colorize, const region *reg)
+  : event_desc (colorize), m_reg (reg)
+  {}
+
+  const region *m_reg;
+};
+
 /* For use by pending_diagnostic::describe_state_change.  */
 
 struct state_change : public event_desc
@@ -215,6 +226,15 @@ class pending_diagnostic
      description; NULL otherwise (falling back on a more generic
      description).  */
 
+  /* Precision-of-wording vfunc for describing a region creation event
+     triggered by the mark_interesting_stuff vfunc.  */
+  virtual label_text
+  describe_region_creation_event (const evdesc::region_creation &)
+  {
+    /* Default no-op implementation.  */
+    return label_text ();
+  }
+
   /* Precision-of-wording vfunc for describing a critical state change
      within the diagnostic_path.
 
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 6b49719d521..893566a811b 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-operands.h"
 #include "ssa-iterators.h"
 #include "calls.h"
+#include "is-a.h"
 
 #if ENABLE_ANALYZER
 
@@ -2799,6 +2800,373 @@ region_model::check_region_for_read (const region *src_reg,
   check_region_access (src_reg, DIR_READ, ctxt);
 }
 
+/* Concrete subclass for casts of pointers that lead to trailing bytes.  */
+
+class dubious_allocation_size
+: public pending_diagnostic_subclass<dubious_allocation_size>
+{
+public:
+  dubious_allocation_size (const region *lhs, const region *rhs)
+  : m_lhs (lhs), m_rhs (rhs), m_expr (NULL_TREE)
+  {}
+
+  dubious_allocation_size (const region *lhs, const region *rhs,
+			   tree expr)
+  : m_lhs (lhs), m_rhs (rhs), m_expr (expr)
+  {}
+
+  const char *get_kind () const final override
+  {
+    return "dubious_allocation_size";
+  }
+
+  bool operator== (const dubious_allocation_size &other) const
+  {
+    return m_lhs == other.m_lhs && m_rhs == other.m_rhs
+	   && pending_diagnostic::same_tree_p (m_expr, other.m_expr);
+  }
+
+  int get_controlling_option () const final override
+  {
+    return OPT_Wanalyzer_allocation_size;
+  }
+
+  bool emit (rich_location *rich_loc) final override
+  {
+    diagnostic_metadata m;
+    m.add_cwe (131);
+
+    return warning_meta (rich_loc, m, get_controlling_option (),
+	       "allocated buffer size is not a multiple of the pointee's size");
+  }
+
+  label_text
+  describe_region_creation_event (const evdesc::region_creation &ev) final
+  override
+  {
+    m_allocation_event = &ev;
+    if (m_expr)
+      {
+	if (TREE_CODE (m_expr) == INTEGER_CST)
+	  return ev.formatted_print ("allocated %E bytes here", m_expr);
+	else
+	  return ev.formatted_print ("allocated %qE bytes here", m_expr);
+      }
+
+    return ev.formatted_print ("allocated here");
+  }
+
+  label_text describe_final_event (const evdesc::final_event &ev) final
+  override
+  {
+    tree pointee_type = TREE_TYPE (m_lhs->get_type ());
+    if (m_allocation_event)
+      /* Typically, we should always see an m_allocation_event
+	 before; the paths below are the fallback.  */
+      return ev.formatted_print ("assigned to %qT here;"
+				 " %<sizeof (%T)%> is %qE",
+				 m_lhs->get_type (), pointee_type,
+				 size_in_bytes (pointee_type));
+
+    if (m_expr)
+      {
+	if (TREE_CODE (m_expr) == INTEGER_CST)
+	  return ev.formatted_print ("allocated %E bytes and assigned to"
+				    " %qT here; %<sizeof (%T)%> is %qE",
+				    m_expr, m_lhs->get_type (), pointee_type,
+				    size_in_bytes (pointee_type));
+	else
+	  return ev.formatted_print ("allocated %qE bytes and assigned to"
+				    " %qT here; %<sizeof (%T)%> is %qE",
+				    m_expr, m_lhs->get_type (), pointee_type,
+				    size_in_bytes (pointee_type));
+      }
+
+    return ev.formatted_print ("allocated and assigned to %qT here;"
+			       " %<sizeof (%T)%> is %qE",
+			       m_lhs->get_type (), pointee_type,
+			       size_in_bytes (pointee_type));
+  }
+
+  void mark_interesting_stuff (interesting_t *interest) final override
+  {
+    interest->add_region_creation (m_rhs);
+  }
+
+private:
+  const region *m_lhs;
+  const region *m_rhs;
+  const tree m_expr;
+  const evdesc::region_creation *m_allocation_event;
+};
+
+/* Return true if constant capacity CST fits the pointee's size.  */
+
+static bool
+capacity_compatible_with_type (tree cst, tree pointee_size_tree,
+			       bool is_struct)
+{
+  gcc_assert (TREE_CODE (cst) == INTEGER_CST);
+  gcc_assert (TREE_CODE (pointee_size_tree) == INTEGER_CST);
+
+  unsigned HOST_WIDE_INT pointee_size = TREE_INT_CST_LOW (pointee_size_tree);
+  unsigned HOST_WIDE_INT alloc_size = TREE_INT_CST_LOW (cst);
+
+  if (is_struct)
+    return alloc_size >= pointee_size;
+  return alloc_size % pointee_size == 0;
+}
+
+static bool
+capacity_compatible_with_type (tree cst, tree pointee_size_tree)
+{
+  return capacity_compatible_with_type (cst, pointee_size_tree, false);
+}
+
+/* Checks whether SVAL could be a multiple of SIZE_CST.
+
+   It works by visiting all svalues inside SVAL until it reaches
+   atomic nodes.  From those, it goes back up again and adds each
+   node that might be a multiple of SIZE_CST to the RESULT_SET.  */
+
+class size_visitor : public visitor
+{
+public:
+  size_visitor (tree size_cst, const svalue *sval, constraint_manager *cm)
+  : m_size_cst (size_cst), m_sval (sval), m_cm (cm)
+  {
+    sval->accept (this);
+  }
+
+  bool get_result ()
+  {
+    return result_set.contains (m_sval);
+  }
+
+  void visit_constant_svalue (const constant_svalue *sval) final override
+  {
+    if (capacity_compatible_with_type (sval->get_constant (), m_size_cst))
+      result_set.add (sval);
+  }
+
+  void visit_unknown_svalue (const unknown_svalue *sval ATTRIBUTE_UNUSED)
+    final override
+  {
+    result_set.add (sval);
+  }
+
+  void visit_poisoned_svalue (const poisoned_svalue *sval ATTRIBUTE_UNUSED)
+    final override
+  {
+    result_set.add (sval);
+  }
+
+  void visit_unaryop_svalue (const unaryop_svalue *sval)
+  {
+    const svalue *arg = sval->get_arg ();
+    if (result_set.contains (arg))
+      result_set.add (sval);
+  }
+
+  void visit_binop_svalue (const binop_svalue *sval) final override
+  {
+    const svalue *arg0 = sval->get_arg0 ();
+    const svalue *arg1 = sval->get_arg1 ();
+
+    if (sval->get_op () == MULT_EXPR)
+      {
+	if (result_set.contains (arg0) || result_set.contains (arg1))
+	  result_set.add (sval);
+      }
+    else
+      {
+	if (result_set.contains (arg0) && result_set.contains (arg1))
+	  result_set.add (sval);
+      }
+  }
+
+  void visit_repeated_svalue (const repeated_svalue *sval)
+  {
+    sval->get_inner_svalue ()->accept (this);
+    if (result_set.contains (sval->get_inner_svalue ()))
+      result_set.add (sval);
+  }
+
+  void visit_unmergeable_svalue (const unmergeable_svalue *sval) final override
+  {
+    sval->get_arg ()->accept (this);
+    if (result_set.contains (sval->get_arg ()))
+      result_set.add (sval);
+  }
+
+  void visit_widening_svalue (const widening_svalue *sval) final override
+  {
+    const svalue *base = sval->get_base_svalue ();
+    const svalue *iter = sval->get_iter_svalue ();
+
+    if (result_set.contains (base) && result_set.contains (iter))
+      result_set.add (sval);
+  }
+
+  void visit_conjured_svalue (const conjured_svalue *sval ATTRIBUTE_UNUSED)
+    final override
+  {
+    equiv_class_id id (-1);
+    if (m_cm->get_equiv_class_by_svalue (sval, &id))
+      {
+	if (tree cst_val = id.get_obj (*m_cm).get_any_constant ())
+	  {
+	    if (capacity_compatible_with_type (cst_val, m_size_cst))
+	      result_set.add (sval);
+	  }
+	else
+	  {
+	    result_set.add (sval);
+	  }
+      }
+  }
+
+  void visit_asm_output_svalue (const asm_output_svalue *sval ATTRIBUTE_UNUSED)
+    final override
+  {
+    result_set.add (sval);
+  }
+
+  void visit_const_fn_result_svalue (const const_fn_result_svalue
+				      *sval ATTRIBUTE_UNUSED) final override
+  {
+    result_set.add (sval);
+  }
+
+private:
+  tree m_size_cst;
+  const svalue *m_sval;
+  constraint_manager *m_cm;
+  svalue_set result_set; /* Used as a mapping of svalue*->bool.  */
+};
+
+/* Return true if a struct or union either uses the inheritance pattern,
+   where the first field is a base struct, or the flexible array member
+   pattern, where the last field is an array without a specified size.  */
+
+static bool
+struct_or_union_with_inheritance_p (tree struc)
+{
+  tree iter = TYPE_FIELDS (struc);
+  if (iter == NULL_TREE)
+	  return false;
+  if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (iter)))
+	  return true;
+
+  tree last_field;
+  while (iter != NULL_TREE)
+    {
+      last_field = iter;
+      iter = DECL_CHAIN (iter);
+    }
+
+  if (last_field != NULL_TREE
+      && TREE_CODE (TREE_TYPE (last_field)) == ARRAY_TYPE)
+	  return true;
+
+  return false;
+}
+
+/* Return true if the lhs and rhs of an assignment have different types.  */
+
+static bool
+is_any_cast_p (const gimple *stmt)
+{
+  if (const gassign *assign = dyn_cast<const gassign *>(stmt))
+    return gimple_assign_cast_p (assign)
+	   || !pending_diagnostic::same_tree_p (
+		  TREE_TYPE (gimple_assign_lhs (assign)),
+		  TREE_TYPE (gimple_assign_rhs1 (assign)));
+  else if (const gcall *call = dyn_cast<const gcall *>(stmt))
+    {
+      tree lhs = gimple_call_lhs (call);
+      return lhs != NULL_TREE && !pending_diagnostic::same_tree_p (
+				    TREE_TYPE (gimple_call_lhs (call)),
+				    gimple_call_return_type (call));
+    }
+
+  return false;
+}
+
+/* On pointer assignments, check whether the buffer size of
+   RHS_SVAL is compatible with the type of the LHS_REG.
+   Use a non-null CTXT to report allocation size warnings.  */
+
+void
+region_model::check_region_size (const region *lhs_reg, const svalue *rhs_sval,
+				 region_model_context *ctxt) const
+{
+  if (!ctxt || ctxt->get_stmt () == NULL)
+    return;
+  /* Only report warnings on assignments that actually change the type.  */
+  if (!is_any_cast_p (ctxt->get_stmt ()))
+    return;
+
+  const region_svalue *reg_sval = dyn_cast <const region_svalue *> (rhs_sval);
+  if (!reg_sval)
+    return;
+
+  tree pointer_type = lhs_reg->get_type ();
+  if (pointer_type == NULL_TREE || !POINTER_TYPE_P (pointer_type))
+    return;
+
+  tree pointee_type = TREE_TYPE (pointer_type);
+  /* Make sure that the type on the left-hand side actually has a size.  */
+  if (pointee_type == NULL_TREE || VOID_TYPE_P (pointee_type)
+      || TYPE_SIZE_UNIT (pointee_type) == NULL_TREE)
+    return;
+
+  /* Bail out early on pointers to structs where we cannot
+     deduce whether the buffer size is compatible.  */
+  bool is_struct = RECORD_OR_UNION_TYPE_P (pointee_type);
+  if (is_struct && struct_or_union_with_inheritance_p (pointee_type))
+    return;
+
+  tree pointee_size_tree = size_in_bytes (pointee_type);
+  /* We give up if the type size is not known at compile-time or the
+     type size is always compatible regardless of the buffer size.  */
+  if (TREE_CODE (pointee_size_tree) != INTEGER_CST
+      || integer_zerop (pointee_size_tree)
+      || integer_onep (pointee_size_tree))
+    return;
+
+  const region *rhs_reg = reg_sval->get_pointee ();
+  const svalue *capacity = get_capacity (rhs_reg);
+  switch (capacity->get_kind ())
+    {
+    case svalue_kind::SK_CONSTANT:
+      {
+	const constant_svalue *cst_cap_sval
+		= as_a <const constant_svalue *> (capacity);
+	tree cst_cap = cst_cap_sval->get_constant ();
+	if (!capacity_compatible_with_type (cst_cap, pointee_size_tree,
+					    is_struct))
+	  ctxt->warn (new dubious_allocation_size (lhs_reg, rhs_reg,
+						   cst_cap));
+      }
+      break;
+    default:
+      {
+	if (!is_struct)
+	  {
+	    size_visitor v (pointee_size_tree, capacity, m_constraints);
+	    if (!v.get_result ())
+	      {
+		tree expr = get_representative_tree (capacity);
+		ctxt->warn (new dubious_allocation_size (lhs_reg, rhs_reg,
+			    expr));
+	      }
+	  }
+      break;
+      }
+    }
+}
+
 /* Set the value of the region given by LHS_REG to the value given
    by RHS_SVAL.
    Use CTXT to report any warnings associated with writing to LHS_REG.  */
@@ -2810,6 +3178,8 @@ region_model::set_value (const region *lhs_reg, const svalue *rhs_sval,
   gcc_assert (lhs_reg);
   gcc_assert (rhs_sval);
 
+  check_region_size (lhs_reg, rhs_sval, ctxt);
+
   check_region_for_write (lhs_reg, ctxt);
 
   m_store.set_value (m_mgr->get_store_manager(), lhs_reg, rhs_sval,
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 1bfa56a8cd2..91b7b370b81 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -857,6 +857,8 @@ class region_model
 			       region_model_context *ctxt) const;
   void check_region_for_read (const region *src_reg,
 			      region_model_context *ctxt) const;
+  void check_region_size (const region *lhs_reg, const svalue *rhs_sval,
+			  region_model_context *ctxt) const;
 
   void check_call_args (const call_details &cd) const;
   void check_external_function_for_access_attr (const gcall *call,
diff --git a/gcc/analyzer/svalue.cc b/gcc/analyzer/svalue.cc
index 2f9149412b9..7bad3cea31b 100644
--- a/gcc/analyzer/svalue.cc
+++ b/gcc/analyzer/svalue.cc
@@ -732,8 +732,8 @@ region_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 region_svalue::accept (visitor *v) const
 {
-  v->visit_region_svalue (this);
   m_reg->accept (v);
+  v->visit_region_svalue (this);
 }
 
 /* Implementation of svalue::implicitly_live_p vfunc for region_svalue.  */
@@ -1031,8 +1031,8 @@ initial_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 initial_svalue::accept (visitor *v) const
 {
-  v->visit_initial_svalue (this);
   m_reg->accept (v);
+  v->visit_initial_svalue (this);
 }
 
 /* Implementation of svalue::implicitly_live_p vfunc for initial_svalue.  */
@@ -1123,8 +1123,8 @@ unaryop_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 unaryop_svalue::accept (visitor *v) const
 {
-  v->visit_unaryop_svalue (this);
   m_arg->accept (v);
+  v->visit_unaryop_svalue (this);
 }
 
 /* Implementation of svalue::implicitly_live_p vfunc for unaryop_svalue.  */
@@ -1225,9 +1225,9 @@ binop_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 binop_svalue::accept (visitor *v) const
 {
-  v->visit_binop_svalue (this);
   m_arg0->accept (v);
   m_arg1->accept (v);
+  v->visit_binop_svalue (this);
 }
 
 /* Implementation of svalue::implicitly_live_p vfunc for binop_svalue.  */
@@ -1283,9 +1283,9 @@ sub_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 sub_svalue::accept (visitor *v) const
 {
-  v->visit_sub_svalue (this);
   m_parent_svalue->accept (v);
   m_subregion->accept (v);
+  v->visit_sub_svalue (this);
 }
 
 /* Implementation of svalue::implicitly_live_p vfunc for sub_svalue.  */
@@ -1352,8 +1352,8 @@ repeated_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 repeated_svalue::accept (visitor *v) const
 {
-  v->visit_repeated_svalue (this);
   m_inner_svalue->accept (v);
+  v->visit_repeated_svalue (this);
 }
 
 /* Implementation of svalue::all_zeroes_p for repeated_svalue.  */
@@ -1494,8 +1494,8 @@ bits_within_svalue::maybe_fold_bits_within (tree type,
 void
 bits_within_svalue::accept (visitor *v) const
 {
-  v->visit_bits_within_svalue (this);
   m_inner_svalue->accept (v);
+  v->visit_bits_within_svalue (this);
 }
 
 /* Implementation of svalue::implicitly_live_p vfunc for bits_within_svalue.  */
@@ -1544,9 +1544,9 @@ widening_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 widening_svalue::accept (visitor *v) const
 {
-  v->visit_widening_svalue (this);
   m_base_sval->accept (v);
   m_iter_sval->accept (v);
+  v->visit_widening_svalue (this);
 }
 
 /* Attempt to determine in which direction this value is changing
@@ -1711,8 +1711,8 @@ unmergeable_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 unmergeable_svalue::accept (visitor *v) const
 {
-  v->visit_unmergeable_svalue (this);
   m_arg->accept (v);
+  v->visit_unmergeable_svalue (this);
 }
 
 /* Implementation of svalue::implicitly_live_p vfunc for unmergeable_svalue.  */
@@ -1776,13 +1776,13 @@ compound_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 compound_svalue::accept (visitor *v) const
 {
-  v->visit_compound_svalue (this);
   for (binding_map::iterator_t iter = m_map.begin ();
        iter != m_map.end (); ++iter)
     {
       //(*iter).first.accept (v);
       (*iter).second->accept (v);
     }
+  v->visit_compound_svalue (this);
 }
 
 /* Calculate what the complexity of a compound_svalue instance for MAP
@@ -1903,8 +1903,8 @@ conjured_svalue::dump_to_pp (pretty_printer *pp, bool simple) const
 void
 conjured_svalue::accept (visitor *v) const
 {
-  v->visit_conjured_svalue (this);
   m_id_reg->accept (v);
+  v->visit_conjured_svalue (this);
 }
 
 /* class asm_output_svalue : public svalue.  */
@@ -1968,9 +1968,9 @@ asm_output_svalue::input_idx_to_asm_idx (unsigned input_idx) const
 void
 asm_output_svalue::accept (visitor *v) const
 {
-  v->visit_asm_output_svalue (this);
   for (unsigned i = 0; i < m_num_inputs; i++)
     m_input_arr[i]->accept (v);
+  v->visit_asm_output_svalue (this);
 }
 
 /* class const_fn_result_svalue : public svalue.  */
@@ -2021,9 +2021,9 @@ const_fn_result_svalue::dump_input (pretty_printer *pp,
 void
 const_fn_result_svalue::accept (visitor *v) const
 {
-  v->visit_const_fn_result_svalue (this);
   for (unsigned i = 0; i < m_num_inputs; i++)
     m_input_arr[i]->accept (v);
+  v->visit_const_fn_result_svalue (this);
 }
 
 } // namespace ana
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 60b7b5a26bb..ddf5125a2b1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9711,6 +9711,7 @@ This analysis is much more expensive than other GCC warnings.
 Enabling this option effectively enables the following warnings:
 
 @gccoptlist{ @gol
+-Wanalyzer-allocation-size @gol
 -Wanalyzer-double-fclose @gol
 -Wanalyzer-double-free @gol
 -Wanalyzer-exposure-through-output-file @gol
@@ -9758,6 +9759,19 @@ By default, the analysis silently stops if the code is too
 complicated for the analyzer to fully explore and it reaches an internal
 limit.  The @option{-Wanalyzer-too-complex} option warns if this occurs.
 
+@item -Wno-analyzer-allocation-size
+@opindex Wanalyzer-allocation-size
+@opindex Wno-analyzer-allocation-size
+This warning requires @option{-fanalyzer}, which enables it; use
+@option{-Wno-analyzer-allocation-size}
+to disable it.
+
+This diagnostic warns for paths through the code in which a pointer is
+assigned to point at a buffer whose size is not a multiple of
+@code{sizeof (*pointer)}.
+
+See @url{https://cwe.mitre.org/data/definitions/131.html, CWE-131: Incorrect Calculation of Buffer Size}.
+
 @item -Wno-analyzer-double-fclose
 @opindex Wanalyzer-double-fclose
 @opindex Wno-analyzer-double-fclose
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
new file mode 100644
index 00000000000..4fc2bf75d6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-1.c
@@ -0,0 +1,116 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Tests with constant buffer sizes.  */
+
+void test_1 (void)
+{
+  short *ptr = malloc (21 * sizeof (short));
+  free (ptr);
+}
+
+void test_2 (void)
+{
+  int *ptr = malloc (21 * sizeof (short)); /* { dg-line malloc2 } */
+  free (ptr);
+
+  /* { dg-warning "allocated buffer size is not a multiple of the pointee's size \\\[CWE-131\\\]" "warning" { target *-*-* } malloc2 } */
+  /* { dg-message "\\d+ bytes" "note" { target *-*-* } malloc2 } */
+  /* { dg-message "'int \\*' here; 'sizeof \\(int\\)' is '\\d+'" "note" { target *-*-* } malloc2 } */
+}
+
+void test_3 (void)
+{
+  void *ptr = malloc (21 * sizeof (short));
+  short *sptr = (short *)ptr;
+  free (sptr);
+}
+
+void test_4 (void)
+{
+  void *ptr = malloc (21 * sizeof (short)); /* { dg-message "\\d+ bytes" } */
+  int *iptr = (int *)ptr; /* { dg-line assign4 } */
+  free (iptr);
+
+  /* { dg-warning "allocated buffer size is not a multiple of the pointee's size \\\[CWE-131\\\]" "warning" { target *-*-* } assign4 } */
+  /* { dg-message "'int \\*' here; 'sizeof \\(int\\)' is '\\d+'" "note" { target *-*-* } assign4 } */
+}
+
+void test_5 (void)
+{
+  int user_input;
+  scanf("%i", &user_input);
+  int n;
+  if (user_input == 0)
+    n = 21 * sizeof (short);
+  else
+    n = 42 * sizeof (short);
+  void *ptr = malloc (n);
+  short *sptr = (short *)ptr;
+  free (sptr);
+}
+
+void test_6 (void)
+{
+  int user_input;
+  scanf("%i", &user_input);
+  int n;
+  if (user_input == 0)
+    n = 21 * sizeof (short);
+  else
+    n = 42 * sizeof (short);
+  void *ptr = malloc (n); /* { dg-message "" "note" } */
+                          /* ^^^ on widening_svalues no expr is returned
+                                 by get_representative_tree at the moment.  */ 
+  int *iptr = (int *)ptr; /* { dg-line assign6 } */
+  free (iptr);
+
+  /* { dg-warning "allocated buffer size is not a multiple of the pointee's size \\\[CWE-131\\\]" "warning" { target *-*-* } assign6 } */
+  /* { dg-message "'int \\*' here; 'sizeof \\(int\\)' is '\\d+'" "note" { target *-*-* } assign6 } */
+}
+
+void test_7 (void)
+{
+  int user_input;
+  scanf("%i", &user_input);
+  int n;
+  if (user_input == 0)
+    n = 1;
+  else if (user_input == 2)
+    n = 5;
+  else
+    n = 7;
+  /* n is an unknown_svalue at this point.  */
+  void *ptr = malloc (n);
+  int *iptr = (int *)ptr;
+  free (iptr);
+}
+
+void *create_buffer (int n)
+{
+  return malloc(n);
+}
+
+void test_8 (void) 
+{
+  int *buf = create_buffer(4 * sizeof (int));
+  free (buf);
+}
+
+void test_9 (void) 
+{
+  /* FIXME: At the moment, region_model::set_value (lhs, <return_value>)
+     is called at the src_node of the return edge. This edge has no stmts
+     associated with it, leading to a rejection of the warning inside
+     impl_region_model_context::warn. To ensure that the indentation
+     in the diagnostic is right, the warning has to be emitted on an EN
+     that is after the return edge.  */
+  int *buf = create_buffer(42); /* { dg-warning "" "" { xfail *-*-* } } */
+  free (buf);
+}
+
+void test_10 (int n)
+{
+  char *ptr = malloc (7 * n);
+  free (ptr);
+}
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
new file mode 100644
index 00000000000..37bbbac87c5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-2.c
@@ -0,0 +1,155 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Tests with symbolic buffer sizes.  */
+
+void test_1 (int n)
+{
+  short *ptr = malloc (n * sizeof (short));
+  free (ptr);
+}
+
+void test_2 (int n)
+{
+  int *ptr = malloc (n * sizeof (short)); /* { dg-line malloc2 } */
+  free (ptr);
+
+  /* { dg-warning "allocated buffer size is not a multiple of the pointee's size \\\[CWE-131\\\]" "warning" { target *-*-* } malloc2 } */
+  /* { dg-message "'\[a-z0-9\\*\\(\\)\\s\]*' bytes" "note" { target *-*-* } malloc2 } */
+  /* { dg-message "'int \\*' here; 'sizeof \\(int\\)' is '\\d+'" "note" { target *-*-* } malloc2 } */
+}
+
+void test_3 (int n)
+{
+  void *ptr = malloc (n * sizeof (short));
+  short *sptr = (short *)ptr;
+  free (sptr);
+}
+
+void test_4 (int n)
+{
+  void *ptr = malloc (n * sizeof (short)); /* { dg-message "'\[a-z0-9\\*\\(\\)\\s\]*'" "note" } */
+  int *iptr = (int *)ptr; /* { dg-line assign4 } */
+  free (iptr);
+
+  /* { dg-warning "allocated buffer size is not a multiple of the pointee's size \\\[CWE-131\\\]" "warning" { target *-*-* } assign4 } */
+  /* { dg-message "'int \\*' here; 'sizeof \\(int\\)' is '\\d+'" "note" { target *-*-* } assign4 } */
+}
+
+void test_5 (void)
+{
+  int user_input;
+  scanf("%i", &user_input);
+  int n;
+  if (user_input == 0)
+    n = 3 * user_input * sizeof (short);
+  else
+    n = 5 * user_input * sizeof (short);
+  void *ptr = malloc (n);
+  short *sptr = (short *)ptr;
+  free (sptr);
+}
+
+void test_6 (void)
+{
+  int user_input;
+  scanf("%i", &user_input);
+  int n;
+  if (user_input == 0)
+    n = user_input;
+  else if (user_input == 2)
+    n = user_input * 3;
+  else
+    n = user_input * 5;
+  /* n is an unknown_svalue at this point.  */
+  void *ptr = malloc (n);
+  int *iptr = (int *)ptr;
+  free (iptr);
+}
+
+void *create_buffer(int n)
+{
+  return malloc(n);
+}
+
+void test_7(int n) 
+{
+  int *buf = create_buffer(n * sizeof (int));
+  free (buf);
+}
+
+void test_8(int n) 
+{
+  /* FIXME: At the moment, region_model::set_value (lhs, <return_value>)
+     is called at the src_node of the return edge. This edge has no stmts
+     associated with it, leading to a rejection of the warning inside
+     impl_region_model_context::warn. To ensure that the indentation
+     in the diagnostic is right, the warning has to be emitted on an EN
+     that is after the return edge.  */
+  int *buf = create_buffer(n * sizeof(short)); /* { dg-warning "" "" { xfail *-*-* } } */
+  free (buf);
+}
+
+void test_9 (void)
+{
+  int n;
+  scanf("%i", &n);
+  /* n is a conjured_svalue.  */
+  void *ptr = malloc (n); /* { dg-message "'n' bytes" "note" } */
+  int *iptr = (int *)ptr; /* { dg-line assign9 } */
+  free (iptr);
+
+  /* { dg-warning "allocated buffer size is not a multiple of the pointee's size \\\[CWE-131\\\]" "warning" { target *-*-* } assign9 } */
+  /* { dg-message "'int \\*' here; 'sizeof \\(int\\)' is '\\d+'" "note" { target *-*-* } assign9 } */
+}
+
+void test_11 (void)
+{
+  int n;
+  scanf("%i", &n);
+  void *ptr = malloc (n);
+  if (n == sizeof (int))
+    {
+      /* n is a conjured_svalue but guarded such that we
+         know the value is a multiple of sizeof (*iptr).  */
+      int *iptr = (int *)ptr;
+      free (iptr);
+    }
+  else
+    free (ptr);
+}
+
+void test_12 (void)
+{
+  int n;
+  scanf("%i", &n);
+  void *ptr = malloc (n); /* { dg-message "'n' bytes" } */
+  if (n == 5)
+    {
+      /* n is a conjured_svalue but guarded such that we
+         know the value isn't a multiple of sizeof (*iptr).  */
+      int *iptr = (int *)ptr; /* { dg-line assign12 } */
+      free (iptr);
+    }
+  else
+    free (ptr);
+  /* { dg-warning "allocated buffer size is not a multiple of the pointee's size \\\[CWE-131\\\]" "warning" { target *-*-* } assign12 } */
+  /* { dg-message "'int \\*' here; 'sizeof \\(int\\)' is '\\d+'" "note" { target *-*-* } assign12 } */
+}
+
+void test_13 (void)
+{
+  int n;
+  scanf("%i", &n);
+  void *ptr = malloc (n);
+  if (n == n * n)
+    {
+      /* n is a conjured_svalue but guarded such that we don't have an
+         equivalence class for it. In such cases, we assume that the
+         condition ensures that the value is okay.  */
+      int *iptr = (int *)ptr;
+      free (iptr);
+    }
+  else
+    free (ptr);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
new file mode 100644
index 00000000000..fdc1c56b7ee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-3.c
@@ -0,0 +1,45 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* CWE-131 example 5 */
+void test_1 (void) 
+{
+  int *id_sequence = (int *) malloc (3); /* { dg-line malloc1 } */
+  if (id_sequence == NULL) exit (1);
+
+  id_sequence[0] = 13579;
+  id_sequence[1] = 24680;
+  id_sequence[2] = 97531;
+
+  free (id_sequence);
+
+  /* { dg-warning "allocated buffer size is not a multiple of the pointee's size \\\[CWE-131\\\]" "warning" { target *-*-* } malloc1 } */
+  /* { dg-message "\\d+ bytes" "note" { target *-*-* } malloc1 } */
+  /* { dg-message "'int \\*' here; 'sizeof \\(int\\)' is '\\d+'" "note" { target *-*-* } malloc1 } */
+}
+
+void test_2 (void)
+{
+  int *ptr = malloc (10 + sizeof(int)); /* { dg-line malloc2 } */
+  free (ptr);
+
+  /* { dg-warning "allocated buffer size is not a multiple of the pointee's size \\\[CWE-131\\\]" "warning" { target *-*-* } malloc2 } */
+  /* { dg-message "\\d+ bytes" "note" { target *-*-* } malloc2 } */
+  /* { dg-message "'int \\*' here; 'sizeof \\(int\\)' is '\\d+'" "note" { target *-*-* } malloc2 } */
+}
+
+void test_3 (int n)
+{
+  int *ptr = malloc (n + sizeof (int)); /* { dg-line malloc3 } */
+  free (ptr);
+
+  /* { dg-warning "allocated buffer size is not a multiple of the pointee's size \\\[CWE-131\\\]" "warning" { target *-*-* } malloc3 } */
+  /* { dg-message "'\[a-z0-9\\+\\(\\)\\s\]*' bytes" "note" { target *-*-* } malloc3 } */
+  /* { dg-message "'int \\*' here; 'sizeof \\(int\\)' is '\\d+'" "note" { target *-*-* } malloc3 } */
+}
+
+void test_4 (int n, int m)
+{
+  int *ptr = malloc ((n + m) * sizeof (int));
+  free (ptr);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
new file mode 100644
index 00000000000..e475c1586a3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-4.c
@@ -0,0 +1,60 @@
+#include <stdlib.h>
+
+/* Tests related to structs.  */
+
+struct base {
+  int i;
+};
+
+struct sub {
+  struct base b;
+  int j;
+};
+
+struct var_len {
+  int i;
+  char arr[];
+};
+
+
+void test_1 (void)
+{
+  struct base *ptr = malloc (5 * sizeof (struct base));
+  free (ptr);
+}
+
+void test_2 (void)
+{
+  long *ptr = malloc (5 * sizeof (struct base));  /* { dg-line malloc2 } */
+  free (ptr);
+
+  /* { dg-warning "allocated buffer size is not a multiple of the pointee's size \\\[CWE-131\\\]" "warning" { target *-*-* } malloc2 } */
+  /* { dg-message "\\d+ bytes" "note" { target *-*-* } malloc2 } */
+  /* { dg-message "'long (int)? \\*' here; 'sizeof \\(long (int)?\\)' is '\\d+'" "note" { target *-*-* } malloc2 } */
+}
+
+void test_3 (void)
+{
+  /* Even though 10 bytes is not a multiple of 4, we do not warn, to avoid a
+     false positive in case struct base is used as the base of another struct.  */
+  struct base *ptr = malloc (10);
+  free (ptr);
+}
+
+void test_4 (void)
+{
+  struct var_len *ptr = malloc (10);
+  free (ptr);
+}
+
+void test_5 (void)
+{
+  /* For constant sizes, we warn if the buffer
+     is too small to hold a single struct.  */
+  struct base *ptr = malloc (2);  /* { dg-line malloc5 } */
+  free (ptr);
+
+  /* { dg-warning "allocated buffer size is not a multiple of the pointee's size \\\[CWE-131\\\]" "warning" { target *-*-* } malloc5 } */
+  /* { dg-message "\\d+ bytes" "note" { target *-*-* } malloc5 } */
+  /* { dg-message "'struct base \\*' here; 'sizeof \\(struct base\\)' is '\\d+'" "note" { target *-*-* } malloc5 } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c
new file mode 100644
index 00000000000..ae7e1074ebb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-5.c
@@ -0,0 +1,36 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+/* Tests related to statically allocated buffers.  */
+
+typedef struct a {
+  short s;
+} a;
+
+int *test_1 (void)
+{
+  a A; /* { dg-message "\\d+ bytes" "note" } */
+  A.s = 1;
+  int *ptr = (int *) &A; /* { dg-line assign1 } */
+  return ptr;
+
+  /* { dg-warning "allocated buffer size is not a multiple of the pointee's size \\\[CWE-131\\\]" "warning" { target *-*-* } assign1 } */
+  /* { dg-message "assigned to 'int \\*' here; 'sizeof \\(int\\)' is '\\d+'" "note" { target *-*-* } assign1 } */
+}
+
+int *test2 (void)
+{
+  char arr[sizeof (int)];
+  int *ptr = (int *)arr;
+  return ptr;
+}
+
+int *test3 (void)
+{
+  char arr[sizeof (short)]; /* { dg-message "\\d+ bytes" "note" } */
+  int *ptr = (int *)arr; /* { dg-line assign3 } */
+  return ptr;
+
+  /* { dg-warning "allocated buffer size is not a multiple of the pointee's size \\\[CWE-131\\\]" "warning" { target *-*-* } assign3 } */
+  /* { dg-message "assigned to 'int \\*' here; 'sizeof \\(int\\)' is '\\d+'" "note" { target *-*-* } assign3 } */
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr96639.c b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
index 02ca3f084a2..aedf0464dc9 100644
--- a/gcc/testsuite/gcc.dg/analyzer/pr96639.c
+++ b/gcc/testsuite/gcc.dg/analyzer/pr96639.c
@@ -3,7 +3,7 @@ void *calloc (__SIZE_TYPE__, __SIZE_TYPE__);
 int
 x7 (void)
 {
-  int **md = calloc (1, 1);
+  int **md = calloc (1, sizeof (void *));
 
   return md[0][0]; /* { dg-warning "possibly-NULL" "unchecked deref" } */
   /* { dg-warning "leak of 'md'" "leak" { target *-*-* } .-1 } */
-- 
2.36.1
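
For readers following the tests, the rule that check_region_size applies to
constant capacities can be sketched as below. This is only a rough model:
the body of capacity_compatible_with_type is not shown in this part of the
thread, so the function here (capacity_compatible, a hypothetical name
working on plain integers instead of GCC trees) is an assumption inferred
from the call site and from the expectations in allocation-size-1.c and
allocation-size-4.c.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the decision made in check_region_size for
// constant capacities.  The real helper, capacity_compatible_with_type,
// operates on GCC trees; its body is an assumption here.
bool
capacity_compatible (std::uint64_t alloc_size, std::uint64_t pointee_size,
                     bool is_struct)
{
  // check_region_size returns early for pointee sizes 0 and 1, so the
  // comparison is only ever reached with larger sizes.
  assert (pointee_size > 1);

  // Structs: only require room for one object (assumed from
  // allocation-size-4.c, where malloc (10) passes and malloc (2) warns).
  if (is_struct)
    return alloc_size >= pointee_size;

  // Everything else: the buffer must hold a whole number of elements.
  return alloc_size % pointee_size == 0;
}
```

With a 2-byte short and a 4-byte int, 21 * sizeof (short) is 42 bytes, which
is fine for short * but leaves 2 trailing bytes for int *, matching test_1
and test_2 in allocation-size-1.c.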



* Re: [PATCH v3] analyzer: add allocation size checker [PR105900]
  2022-06-30 22:11 ` [PATCH v3] analyzer: add allocation size checker [PR105900] Tim Lange
@ 2022-06-30 22:47   ` David Malcolm
  0 siblings, 0 replies; 17+ messages in thread
From: David Malcolm @ 2022-06-30 22:47 UTC (permalink / raw)
  To: Tim Lange; +Cc: gcc

On Fri, 2022-07-01 at 00:11 +0200, Tim Lange wrote:
> Hi,
> 
> here's the updated patch that should address all the comments from the
> v2.
> 
> - Tim
> 
> This patch adds a checker that warns about code paths in which a
> buffer is assigned to an incompatible type, i.e. when the allocated
> buffer size is not a multiple of the pointee's size.
> 
> 2022-07-30  Tim Lange  <mail@tim-lange.me>
> 
> gcc/analyzer/ChangeLog:
> 
>         PR analyzer/105900
>         * analyzer.opt: Added Wanalyzer-allocation-size.
>         * checker-path.cc (region_creation_event::get_desc): Added call
> to new
>         virtual function
> pending_diagnostic::describe_region_creation_event.
>         * checker-path.h: Added region_creation_event::get_desc.
>         * diagnostic-manager.cc
> (diagnostic_manager::add_event_on_final_node):
>         New function.
>         * diagnostic-manager.h:
>         Added diagnostic_manager::add_event_on_final_node.
>         * pending-diagnostic.h (struct region_creation): New event_desc
> struct.
>         (pending_diagnostic::describe_region_creation_event): Added
> virtual
>         function to overwrite description of a region creation.
>         * region-model.cc (class dubious_allocation_size): New class.
>         (capacity_compatible_with_type): New helper function.
>         (class size_visitor): New class.
>         (struct_or_union_with_inheritance_p): New helper function.
>         (is_any_cast_p): New helper function.
>         (region_model::check_region_size): New function.
>         (region_model::set_value): Added call to
>         region_model::check_region_size.
>         * region-model.h (class region_model): New function
> check_region_size.
>         * svalue.cc (region_svalue::accept): Changed to post-order
> traversal.
>         (initial_svalue::accept): Likewise.
>         (unaryop_svalue::accept): Likewise.
>         (binop_svalue::accept): Likewise.
>         (sub_svalue::accept): Likewise.
>         (repeated_svalue::accept): Likewise.
>         (bits_within_svalue::accept): Likewise.
>         (widening_svalue::accept): Likewise.
>         (unmergeable_svalue::accept): Likewise.
>         (compound_svalue::accept): Likewise.
>         (conjured_svalue::accept): Likewise.
>         (asm_output_svalue::accept): Likewise.
>         (const_fn_result_svalue::accept): Likewise.
> 
> gcc/ChangeLog:
> 
>         PR analyzer/105900
>         * doc/invoke.texi: Added Wanalyzer-allocation-size.
> 
> gcc/testsuite/ChangeLog:
> 
>         PR analyzer/105900
> * gcc.dg/analyzer/pr96639.c: Changed buffer size to omit warning.
>         * gcc.dg/analyzer/allocation-size-1.c: New test.
>         * gcc.dg/analyzer/allocation-size-2.c: New test.
>         * gcc.dg/analyzer/allocation-size-3.c: New test.
>         * gcc.dg/analyzer/allocation-size-4.c: New test.
>         * gcc.dg/analyzer/allocation-size-5.c: New test.
> 
> Signed-off-by: Tim Lange <mail@tim-lange.me>


Thanks for the v3 patch.

Content-wise, the v3 patch looks ready to me, though there's something
weird with the formatting of the ChangeLog entry for pr96639.c in the
commit message - does the patch pass:
  ./contrib/gcc-changelog/git_check_commit.py HEAD
?  (this script gets run server-side on our git repository, and it
won't let you push a patch unless the script passes)

You didn't specify to what extent you've tested it.  If you've
successfully bootstrapped gcc with this patch applied, and run the test
suite with no regressions, then this is OK to push to trunk.

[...snip...]

Thanks
Dave
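
As a side note on the svalue.cc entries in the ChangeLog above: moving each
visit_* call after the recursion turns the traversal into a post-order walk,
so a visitor sees the operands of a value before the value itself. The
sketch below is a toy model of that ordering only. The names node, visitor,
order_recorder and visit_order_of_product are hypothetical and do not exist
in GCC, and that size_visitor relies on operand results being available
first is an assumption based on the patch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct node;

// Toy visitor interface; GCC's visitor has one visit_* hook per svalue kind.
struct visitor
{
  virtual ~visitor () = default;
  virtual void visit (const node &n) = 0;
};

// Toy expression node standing in for an svalue with operands.
struct node
{
  explicit node (std::string n) : name (std::move (n)) {}

  // Post-order: recurse into the operands first, then visit this node.
  void accept (visitor &v) const
  {
    for (const auto &child : children)
      child->accept (v);
    v.visit (*this);
  }

  std::string name;
  std::vector<std::unique_ptr<node>> children;
};

// Records the order in which nodes are visited.
struct order_recorder : visitor
{
  void visit (const node &n) override { order.push_back (n.name); }
  std::vector<std::string> order;
};

// Builds the tree for "n * 2" and returns the visit order.
std::vector<std::string>
visit_order_of_product ()
{
  node product ("n * 2");
  product.children.push_back (std::make_unique<node> ("n"));
  product.children.push_back (std::make_unique<node> ("2"));

  order_recorder rec;
  product.accept (rec);
  assert (rec.order.size () == 3);
  return rec.order;
}
```

Here the operands "n" and "2" are recorded before "n * 2", so a visitor that
stores a per-operand verdict can look it up by the time it reaches the
parent.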




Thread overview: 17+ messages
2022-06-17 15:54 [RFC] analyzer: allocation size warning Tim Lange
2022-06-17 17:15 ` Prathamesh Kulkarni
2022-06-17 19:23   ` Tim Lange
2022-06-17 21:39     ` David Malcolm
2022-06-17 17:48 ` David Malcolm
2022-06-17 20:23   ` Tim Lange
2022-06-17 22:13     ` David Malcolm
2022-06-21 20:00       ` Tim Lange
2022-06-21 23:16         ` David Malcolm
2022-06-22 14:57           ` Tim Lange
2022-06-22 18:23             ` David Malcolm
2022-06-17 18:34 ` [RFC] analyzer: add " Tim Lange
2022-06-29 15:39 ` [PATCH v2] analyzer: add allocation size checker Tim Lange
2022-06-29 17:39   ` David Malcolm
2022-06-30 20:40     ` Tim Lange
2022-06-30 22:11 ` [PATCH v3] analyzer: add allocation size checker [PR105900] Tim Lange
2022-06-30 22:47   ` David Malcolm
