public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] warn for integer overflow in allocation calls (PR 96838)
@ 2020-09-15 19:47 Martin Sebor
  2020-09-16  9:39 ` Bernhard Reutner-Fischer
  2020-09-17  3:23 ` Jeff Law
  0 siblings, 2 replies; 18+ messages in thread
From: Martin Sebor @ 2020-09-15 19:47 UTC (permalink / raw)
  To: gcc-patches, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 411 bytes --]

Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
can lead to a subsequent buffer overflow corrupting the heap or
stack.  The attached patch diagnoses a subset of these cases where
the overflow/wraparound is still detectable.

Besides regtesting GCC on x86_64-linux I also verified the warning
doesn't introduce any false positives into Glibc or Binutils/GDB
builds on the same target.

Martin

[-- Attachment #2: gcc-96838.diff --]
[-- Type: text/x-patch, Size: 15880 bytes --]

PR middle-end/96838 - missing warning on integer overflow in calls to allocation functions

gcc/ChangeLog:

	PR middle-end/96838
	* calls.c (eval_size_vflow): New function.
	(get_size_range): Call it.  Add argument.
	(maybe_warn_alloc_args_overflow): Diagnose overflow/wraparound.
	* calls.h (get_size_range): Add argument.

gcc/testsuite/ChangeLog:

	PR middle-end/96838
	* gcc.dg/Walloc-size-larger-than-19.c: New test.
	* gcc.dg/Walloc-size-larger-than-20.c: New test.

diff --git a/gcc/calls.c b/gcc/calls.c
index 8ac94db6817..a5acff301e0 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1237,6 +1237,139 @@ alloc_max_size (void)
   return alloc_object_size_limit;
 }
 
+/* Try to evaluate the artithmetic EXPresssion representing the size of
+   an object in the widest possible type and set RANGE[] to the result.
+   Return the overflow status for the type of the expression (which may
+   be OVF_UNKNOWN if it cannot be determined from the ranges of its
+   operands).  Used to detect calls to allocation functions with
+   an argument that either overflows or wraps around zero.  */
+
+static wi::overflow_type
+eval_size_vflow (tree exp, wide_int range[2])
+{
+  const int prec = WIDE_INT_MAX_PRECISION;
+
+  if (TREE_CODE (exp) == INTEGER_CST)
+    {
+      range[0] = range[1] = wi::to_wide (exp, prec);
+      return wi::OVF_NONE;
+    }
+
+  if (TREE_CODE (exp) != SSA_NAME)
+    return wi::OVF_UNKNOWN;
+
+  gimple *def = SSA_NAME_DEF_STMT (exp);
+  if (!is_gimple_assign (def))
+    return wi::OVF_UNKNOWN;
+
+  tree_code code = gimple_assign_rhs_code (def);
+  tree optype = NULL_TREE;
+  wide_int op1r[2], op2r[2];
+  if (code == MULT_EXPR
+      || code == MINUS_EXPR
+      || code == PLUS_EXPR)
+    {
+      /* Ignore the overflow on the operands.  */
+      tree rhs1 = gimple_assign_rhs1 (def);
+      wi::overflow_type ovf = eval_size_vflow (rhs1, op1r);
+      if (ovf == wi::OVF_UNKNOWN)
+	return ovf;
+
+      optype = TREE_TYPE (rhs1);
+      tree rhs2 = gimple_assign_rhs2 (def);
+      ovf = eval_size_vflow (rhs2, op2r);
+      if (ovf == wi::OVF_UNKNOWN)
+	return ovf;
+
+      if (code == PLUS_EXPR
+	  && TYPE_UNSIGNED (optype)
+	  && TREE_CODE (rhs2) == INTEGER_CST)
+	{
+	  /* A - CST is transformed into A + (-CST).  Undo that to avoid
+	     false reports of overflow (this means overflow due to very
+	     large constants in the source code isn't detected.)  */
+	  tree sgn_type = signed_type_for (optype);
+	  tree max = TYPE_MAX_VALUE (sgn_type);
+	  wide_int smax = wi::to_wide (max, prec);
+	  if (wi::ltu_p (smax, op2r[0]))
+	    {
+	      op2r[0] = wi::neg (wi::sub (op2r[0], smax, UNSIGNED, &ovf));
+	      op2r[1] = op2r[0];
+	    }
+	}
+    }
+  else if (code == NOP_EXPR)
+    {
+      /* Strip (implicit) conversions.  Explicit conversions are stripped
+	 as well which may result in reporting overflow despite a cast.
+	 Those cases should be rare.  */
+      tree rhs1 = gimple_assign_rhs1 (def);
+      wi::overflow_type ovf = eval_size_vflow (rhs1, op1r);
+      if (ovf == wi::OVF_UNKNOWN)
+	return ovf;
+      optype = TREE_TYPE (rhs1);
+    }
+  else
+    {
+      wide_int min, max;
+      if (determine_value_range (exp, &min, &max) != VR_RANGE)
+	return wi::OVF_UNKNOWN;
+      optype = TREE_TYPE (exp);
+      op1r[0] = wide_int::from (min, prec, TYPE_SIGN (optype));
+      op1r[1] = wide_int::from (max, prec, TYPE_SIGN (optype));
+    }
+
+  wide_int umax = wi::to_wide (TYPE_MAX_VALUE (optype), prec);
+  tree sgn_type = signed_type_for (optype);
+  wide_int smax = wi::to_wide (TYPE_MAX_VALUE (sgn_type), prec);
+
+  wi::overflow_type ovf = wi::OVF_NONE;
+  if (code == MULT_EXPR)
+    {
+      /* Compute the upper bound of the result first, discarding any
+	 overflow.  Only the overflow in the lower bound matters.  */
+      range[1] = wi::mul (op1r[1], op2r[1], UNSIGNED, &ovf);
+      range[0] = wi::mul (op1r[0], op2r[0], UNSIGNED, &ovf);
+    }
+  else if (code == MINUS_EXPR)
+    {
+      range[1] = wi::sub (op1r[1], op2r[1], UNSIGNED, &ovf);
+      range[0] = wi::sub (op1r[0], op2r[0], UNSIGNED, &ovf);
+    }
+  else if (code == PLUS_EXPR)
+    {
+      signop sgn = UNSIGNED;
+      if (op2r[0] == op2r[1] && wi::ltu_p (smax, op2r[0]))
+	sgn = SIGNED;
+
+      range[1] = wi::add (op1r[1], op2r[1], sgn, &ovf);
+      range[0] = wi::add (op1r[0], op2r[0], sgn, &ovf);
+    }
+  else
+    {
+      range[0] = op1r[0];
+      range[1] = op1r[1];
+    }
+
+  if (ovf != wi::OVF_NONE)
+    return ovf;
+
+  /* Nothing can be determined from a range that spans zero.
+     TO DO: Assume a range with a negative lower bound a zero upper
+     bound overflows?  */
+  if (wi::sign_mask (range[0]) && !wi::sign_mask (range[1]))
+    return wi::OVF_UNKNOWN;
+
+  if (wi::ltu_p (umax, range[0]))
+    return wi::OVF_OVERFLOW;
+
+  umax = wi::to_wide (max_object_size (), prec);
+  if (wi::ltu_p (umax, range[0]))
+    return wi::OVF_OVERFLOW;
+
+  return wi::OVF_NONE;
+}
+
 /* Return true when EXP's range can be determined and set RANGE[] to it
    after adjusting it if necessary to make EXP a represents a valid size
    of object, or a valid size argument to an allocation function declared
@@ -1244,11 +1377,13 @@ alloc_max_size (void)
    manipulation function like memset.  When ALLOW_ZERO is true, allow
    returning a range of [0, 0] for a size in an anti-range [1, N] where
    N > PTRDIFF_MAX.  A zero range is a (nearly) invalid argument to
-   allocation functions like malloc but it is a valid argument to
-   functions like memset.  */
+   allocation functions like malloc but it is a valid argument to functions
+   like memset.  When nonnull, set *VFLOW_RESULT to the overflowing result
+   if the evaluating EXP results in overflow/wraparound, otherwise to null.  */
 
 bool
-get_size_range (tree exp, tree range[2], bool allow_zero /* = false */)
+get_size_range (tree exp, tree range[2], bool allow_zero /* = false */,
+		tree vflow_range[2] /* = NULL_TREE */)
 {
   if (!exp)
     return false;
@@ -1266,10 +1401,27 @@ get_size_range (tree exp, tree range[2], bool allow_zero /* = false */)
   wide_int min, max;
   enum value_range_kind range_type;
 
+  if (vflow_range)
+    {
+      /* Check for overflow or wraparound first.  */
+      wide_int vfr[2];
+      wi::overflow_type vflow = eval_size_vflow (exp, vfr);
+      if (vflow == wi::OVF_OVERFLOW || vflow == wi::OVF_UNDERFLOW)
+	{
+	  tree type = uint128_type_node ? uint128_type_node : sizetype;
+	  vflow_range[0] = wide_int_to_tree (type, vfr[0]);
+	  vflow_range[1] = wide_int_to_tree (type, vfr[1]);
+	}
+    }
+
   if (integral)
     range_type = determine_value_range (exp, &min, &max);
   else
-    range_type = VR_VARYING;
+    {
+      if (vflow_range)
+	vflow_range[0] = vflow_range[1] = NULL_TREE;
+      range_type = VR_VARYING;
+    }
 
   if (range_type == VR_VARYING)
     {
@@ -1373,6 +1525,8 @@ maybe_warn_alloc_args_overflow (tree fn, tree exp, tree args[2], int idx[2])
   /* Validate each argument individually.  */
   for (unsigned i = 0; i != 2 && args[i]; ++i)
     {
+      tree vflow_range[2] = { NULL_TREE, NULL_TREE };
+
       if (TREE_CODE (args[i]) == INTEGER_CST)
 	{
 	  argrange[i][0] = args[i];
@@ -1422,12 +1576,12 @@ maybe_warn_alloc_args_overflow (tree fn, tree exp, tree args[2], int idx[2])
 	    }
 	}
       else if (TREE_CODE (args[i]) == SSA_NAME
-	       && get_size_range (args[i], argrange[i]))
+	       && get_size_range (args[i], argrange[i], false, vflow_range))
 	{
 	  /* Verify that the argument's range is not negative (including
 	     upper bound of zero).  */
 	  if (tree_int_cst_lt (argrange[i][0], integer_zero_node)
-	      && tree_int_cst_le (argrange[i][1], integer_zero_node))
+		   && tree_int_cst_le (argrange[i][1], integer_zero_node))
 	    {
 	      warned = warning_at (loc, OPT_Walloc_size_larger_than_,
 				   "%Kargument %i range [%E, %E] is negative",
@@ -1443,6 +1597,23 @@ maybe_warn_alloc_args_overflow (tree fn, tree exp, tree args[2], int idx[2])
 				   argrange[i][0], argrange[i][1],
 				   maxobjsize);
 	    }
+	  /* Finally, diagnose integer overflow/wraparound. */
+	  else if (vflow_range[0])
+	    {
+	      if (tree_int_cst_equal (vflow_range[0], vflow_range[1]))
+		warned = warning_at (loc, OPT_Walloc_size_larger_than_,
+				     "%Kargument %i value %E is too large "
+				     "to represent in %qT",
+				     exp, idx[i] + 1, vflow_range[0],
+				     TREE_TYPE (args[i]));
+	      else
+		warned = warning_at (loc, OPT_Walloc_size_larger_than_,
+				     "%Kargument %i range [%E, %E] is too large "
+				     "to represent in %qT",
+				     exp, idx[i] + 1,
+				     vflow_range[0], vflow_range[1],
+				     TREE_TYPE (args[i]));
+	    }
 	}
     }
 
diff --git a/gcc/calls.h b/gcc/calls.h
index dfb951ca45b..cf0ac0c3eee 100644
--- a/gcc/calls.h
+++ b/gcc/calls.h
@@ -133,7 +133,7 @@ extern bool reference_callee_copied (CUMULATIVE_ARGS *,
 extern void maybe_warn_alloc_args_overflow (tree, tree, tree[2], int[2]);
 extern tree get_attr_nonstring_decl (tree, tree * = NULL);
 extern bool maybe_warn_nonstring_arg (tree, tree);
-extern bool get_size_range (tree, tree[2], bool = false);
+extern bool get_size_range (tree, tree[2], bool = false, tree[2] = NULL);
 extern rtx rtx_for_static_chain (const_tree, bool);
 extern bool cxx17_empty_base_field_p (const_tree);
 
diff --git a/gcc/testsuite/gcc.dg/Walloc-size-larger-than-19.c b/gcc/testsuite/gcc.dg/Walloc-size-larger-than-19.c
new file mode 100644
index 00000000000..d906354442b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Walloc-size-larger-than-19.c
@@ -0,0 +1,143 @@
+/* PR middle-end/96838 - missing warning on integer overflow in calls
+   to allocation functions
+   { dg-do compile }
+   { dg-options "-O1 -Wall -ftrack-macro-expansion=0" } */
+
+#define INT_MAX   __INT_MAX__
+#define UINT_MAX  (__INT_MAX__ * 2U + 1)
+#define SIZE_MAX  __SIZE_MAX__
+
+#define ATTR(...) __attribute__ ((__VA_ARGS__))
+
+typedef __SIZE_TYPE__   size_t;
+typedef __UINT32_TYPE__ uin32_t;
+
+void* malloc (size_t);
+
+void sink (void*);
+
+#define T(call) sink (call)
+
+ATTR (alloc_size (1))    void* fui1 (unsigned);
+ATTR (alloc_size (1, 2)) void* fui1_2 (unsigned, unsigned);
+
+void alloc_ui_mul_ui (unsigned n)
+{
+  if (n < UINT_MAX / 2)
+    n = UINT_MAX / 2;
+
+  T (fui1 (n));
+  T (fui1 (n * 2));
+  T (fui1 (n * 3));                 // { dg-warning "argument 1 range \\\[6442450941, 12884901885] is too large to represent in 'unsigned int'" }
+  T (fui1 (n * n));                 // { dg-warning "argument 1 range \\\[4611686014132420609, 18446744065119617025] is too large to represent in 'unsigned int'" }
+}
+
+void alloc_ui_mul_ui_plus (unsigned n, unsigned p1, int m1)
+{
+  if (n < 32)
+    n = 32;
+
+  T (fui1 (n));
+  T (fui1 (n * 2));
+  T (fui1 (n * 3));
+  T (fui1 (n * 4));
+  T (fui1 (n * (UINT_MAX / 32)));
+  T (fui1 (n * (UINT_MAX / 32) + 31));
+  T (fui1 (n * (UINT_MAX / 32) + 32));  // { dg-warning "argument 1 range \\\[4294967296, 576460747874238497] is too large to represent in 'unsigned int'" }
+
+  if (n < UINT_MAX / 4)
+    n = UINT_MAX / 4;
+
+  T (fui1 (n));
+  T (fui1 (n * 2));
+  T (fui1 (n * 3));
+  T (fui1 (n * 4));
+  T (fui1 (n * 4 + 1));
+  T (fui1 (n * 4 + 2));
+  T (fui1 (n * 4 + 3));
+  T (fui1 (n * 4 + 4));             // { dg-warning "argument 1 range \\\[4294967296, 17179869184] is too large to represent in 'unsigned int'" }
+
+  if (p1 < 1)
+    p1 = 1;
+
+  T (fui1 (n * 4 + p1));
+  T (fui1 (n * 4 + p1 + 1));
+  T (fui1 (n * 4 + p1 + 2));
+  T (fui1 (n * 4 + p1 + 3));        // { dg-warning "argument 1 range \\\[4294967296, 21474836478] is too large to represent in 'unsigned int'" }
+
+  T (fui1 (n * 4 - p1 + 1));
+  T (fui1 (n * 4 - p1 + 2));
+  T (fui1 (n * 4 - p1 + 3));
+  T (fui1 (n * 4 - p1 + 4));
+  T (fui1 (n * 4 - p1 + 5));        // { dg-warning "argument 1 range \\\[4294967296, 12884901890] is too large to represent in 'unsigned int'" }
+
+  if (m1 > -1)
+    m1 = -1;
+
+  T (fui1 (n * 1 + m1));
+  T (fui1 (n * 2 + m1));
+  T (fui1 (n * 3 + m1));
+  T (fui1 (n * 4 + m1));
+  T (fui1 (n * 4 + m1 + 1));
+  T (fui1 (n * 4 + m1 + 2));
+  T (fui1 (n * 4 + m1 + 4));
+}
+
+void alloc_ui_plus_ui (unsigned n)
+{
+  if (n < UINT_MAX - 2)
+    n = UINT_MAX - 2;
+
+  T (fui1 (n - 2));
+  T (fui1 (n - 1));
+  T (fui1 (n));
+  T (fui1 (n + 1));
+  T (fui1 (n + 2));
+  T (fui1 (n + 3));                 // { dg-warning "argument 1 range \\\[4294967296, 4294967298] is too large to represent in 'unsigned int'" }
+
+  T (fui1 (n + sizeof (char)));
+  T (fui1 (n + sizeof (char[2])));
+  T (fui1 (n + sizeof (char[3])));  // { dg-warning "argument 1 range \\\[4294967296, 4294967298] is too large to represent in 'unsigned int'" }
+  T (fui1 (n + sizeof (char[n])));  // { dg-warning "argument 1 range \\\[8589934586, 8589934590] is too large to represent in 'unsigned int'" }
+}
+
+void alloc_ui_plus_sz (size_t n)
+{
+  if (n < SIZE_MAX - 2)
+    n = SIZE_MAX - 2;
+
+  T (fui1 (n));                     // { dg-warning "argument 1 range \\\[18446744073709551613, 18446744073709551615] is too large to represent in 'unsigned int'" }
+  T (fui1 (n + 1));                 // { dg-warning "argument 1 range \\\[18446744073709551614, 0x10000000000000000] is too large to represent in 'unsigned int'" }
+  T (fui1 (n + 2));                 // { dg-warning "argument 1 range \\\[18446744073709551615, 0x10000000000000001] is too large to represent in 'unsigned int'" }
+  T (fui1 (n + 3));                 // { dg-warning "argument 1 range \\\[0x10000000000000000, 0x10000000000000002] is too large to represent in 'unsigned int'" }
+  T (fui1 (n + n));                 // { dg-warning "argument 1 range \\\[0x1fffffffffffffffa, 0x1fffffffffffffffe] is too large to represent in 'unsigned int'" }
+}
+
+void alloc_ui_mul_sz (size_t n, size_t nx)
+{
+  if (n < SIZE_MAX / 4)
+    n = SIZE_MAX / 4;
+
+  T (fui1 (n));
+  T (fui1 (n * 2));                 // { dg-warning "argument 1 range \\\[9223372036854775806, 0x1fffffffffffffffe] is too large to represent in 'unsigned int'" }
+  T (fui1 (n * 3));                 // { dg-warning "argument 1 range \\\[13835058055282163709, 0x2fffffffffffffffd] is too large to represent in 'unsigned int'" }
+
+  T (fui1 (sizeof (int) * (nx - 2)));
+  T (fui1 (sizeof (int) * (nx - 1)));
+  T (fui1 (sizeof (int) * nx));
+  T (fui1 (sizeof (int) * (nx + 1)));
+  T (fui1 (sizeof (int) * (nx + 2)));
+}
+
+void malloc_mul (int n, size_t nx)
+{
+  if (n > -1)
+    n = -1;
+
+  T (malloc (n * sizeof (int)));    // { dg-warning "argument 1 range \\\[18446744065119617024, 18446744073709551612] exceeds maximum object size" }
+
+  T (malloc (nx));
+  T (malloc (nx * 2));
+  T (malloc (nx * 3));
+  T (malloc (nx * 4));
+}
diff --git a/gcc/testsuite/gcc.dg/Walloc-size-larger-than-20.c b/gcc/testsuite/gcc.dg/Walloc-size-larger-than-20.c
new file mode 100644
index 00000000000..8353291e4bd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Walloc-size-larger-than-20.c
@@ -0,0 +1,38 @@
+/* PR middle-end/96838 - missing warning on integer overflow in calls
+   to allocation functions
+   Test case reduced from binutils/gas/sb.c.
+   { dg-do compile }
+   { dg-options "-O1 -Wall -ftrack-macro-expansion=0" } */
+
+typedef __SIZE_TYPE__    size_t;
+typedef __PTRDIFF_TYPE__ ssize_t;
+
+extern void* malloc (size_t);
+
+typedef struct sb
+{
+  size_t len;
+  size_t max;
+}
+sb;
+
+static void*
+sb_check (sb *ptr, size_t len)
+{
+  size_t want = ptr->len + len + (2 * sizeof (size_t)) + 1;
+  if ((ssize_t) want < 0) return 0;
+  size_t max = (size_t)1 << (8 * sizeof (want)
+			     - (sizeof (want) <= sizeof (long)
+				? __builtin_clzl ((long) want)
+				: __builtin_clzll ((long long) want)));
+
+  max -= (2 * sizeof (size_t)) + 1;
+  return malloc (max + 1);    // { dg-bogus "-Walloc-size-larger-than" }
+}
+
+
+
+void* sb_add_char (sb *ptr)
+{
+  return sb_check (ptr, 1);
+}

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
  2020-09-15 19:47 [PATCH] warn for integer overflow in allocation calls (PR 96838) Martin Sebor
@ 2020-09-16  9:39 ` Bernhard Reutner-Fischer
  2020-09-17  3:23 ` Jeff Law
  1 sibling, 0 replies; 18+ messages in thread
From: Bernhard Reutner-Fischer @ 2020-09-16  9:39 UTC (permalink / raw)
  To: Martin Sebor, Martin Sebor via Gcc-patches, gcc-patches, Jeff Law

On 15 September 2020 21:47:46 CEST, Martin Sebor via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
>can lead to a subsequent buffer overflow corrupting the heap or
>stack.  The attached patch diagnoses a subset of these cases where
>the overflow/wraparound is still detectable.
>
>Besides regtesting GCC on x86_64-linux I also verified the warning
>doesn't introduce any false positives into Glibc or Binutils/GDB
>builds on the same target.

+/* Try to evaluate the artithmetic EXPresssion representing the size of

s/EXPresssion/expression EXP/

You had a bit more s than strictly necessary..
thanks,

>
>Martin


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
  2020-09-15 19:47 [PATCH] warn for integer overflow in allocation calls (PR 96838) Martin Sebor
  2020-09-16  9:39 ` Bernhard Reutner-Fischer
@ 2020-09-17  3:23 ` Jeff Law
  2020-09-17 16:08   ` Martin Sebor
  1 sibling, 1 reply; 18+ messages in thread
From: Jeff Law @ 2020-09-17  3:23 UTC (permalink / raw)
  To: Martin Sebor, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1655 bytes --]


On 9/15/20 1:47 PM, Martin Sebor wrote:
> Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
> can lead to a subsequent buffer overflow corrupting the heap or
> stack.  The attached patch diagnoses a subset of these cases where
> the overflow/wraparound is still detectable.
>
> Besides regtesting GCC on x86_64-linux I also verified the warning
> doesn't introduce any false positives into Glibc or Binutils/GDB
> builds on the same target.
>
> Martin
>
> gcc-96838.diff
>
> PR middle-end/96838 - missing warning on integer overflow in calls to allocation functions
>
> gcc/ChangeLog:
>
> 	PR middle-end/96838
> 	* calls.c (eval_size_vflow): New function.
> 	(get_size_range): Call it.  Add argument.
> 	(maybe_warn_alloc_args_overflow): Diagnose overflow/wraparound.
> 	* calls.h (get_size_range): Add argument.
>
> gcc/testsuite/ChangeLog:
>
> 	PR middle-end/96838
> 	* gcc.dg/Walloc-size-larger-than-19.c: New test.
> 	* gcc.dg/Walloc-size-larger-than-20.c: New test.

If an attacker can control an integer overflow that feeds an allocation,
then they can do all kinds of bad things.  In fact, when my son was
asking me attack vectors, this is one I said I'd look at if I were a bad
guy.


I'm a bit surprised you can't just query the range of the argument and
get the overflow status out of that range, but I don't see that in the
APIs.  How painful would it be to make that part of the API?  The
conceptual model would be to just ask for the range of the argument to
malloc which would include the range and a status bit indicating the
computation might have overflowed.


jeff



[-- Attachment #2: pEpkey.asc --]
[-- Type: application/pgp-keys, Size: 1763 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
  2020-09-17  3:23 ` Jeff Law
@ 2020-09-17 16:08   ` Martin Sebor
  2020-09-17 18:39     ` Andrew MacLeod
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Sebor @ 2020-09-17 16:08 UTC (permalink / raw)
  To: Jeff Law, gcc-patches, Aldy Hernandez

On 9/16/20 9:23 PM, Jeff Law wrote:
> 
> On 9/15/20 1:47 PM, Martin Sebor wrote:
>> Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
>> can lead to a subsequent buffer overflow corrupting the heap or
>> stack.  The attached patch diagnoses a subset of these cases where
>> the overflow/wraparound is still detectable.
>>
>> Besides regtesting GCC on x86_64-linux I also verified the warning
>> doesn't introduce any false positives into Glibc or Binutils/GDB
>> builds on the same target.
>>
>> Martin
>>
>> gcc-96838.diff
>>
>> PR middle-end/96838 - missing warning on integer overflow in calls to allocation functions
>>
>> gcc/ChangeLog:
>>
>> 	PR middle-end/96838
>> 	* calls.c (eval_size_vflow): New function.
>> 	(get_size_range): Call it.  Add argument.
>> 	(maybe_warn_alloc_args_overflow): Diagnose overflow/wraparound.
>> 	* calls.h (get_size_range): Add argument.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR middle-end/96838
>> 	* gcc.dg/Walloc-size-larger-than-19.c: New test.
>> 	* gcc.dg/Walloc-size-larger-than-20.c: New test.
> 
> If an attacker can control an integer overflow that feeds an allocation, 
> then they can do all kinds of bad things.  In fact, when my son was 
> asking me attack vectors, this is one I said I'd look at if I were a bad 
> guy.
> 
> 
> I'm a bit surprised you can't just query the range of the argument and 
> get the overflow status out of that range, but I don't see that in the 
> APIs.  How painful would it be to make that part of the API?  The 
> conceptual model would be to just ask for the range of the argument to 
> malloc which would include the range and a status bit indicating the 
> computation might have overflowed.

I agree that being able to evaluate an expression in an as-if
infinite precision (in addition to its type) would be helpful.
I mentioned it to Aldy in response to the irange best practices
document.  He says there's no way to do that and no plans to
make it possible.

But just to make sure I understood correctly, let me ask again
using an example:

   void* f (size_t n)
   {
     if (n < PTRDIFF_MAX / 2)
       n = PTRDIFF_MAX / 2;

     return malloc (n * sizeof (int));
   }

Can the unsigned wraparound in the argument be readily detected?

On trunk, this ends up with the following:

   # RANGE [4611686018427387903, 18446744073709551615]
   _6 = MAX_EXPR <n_2(D), 4611686018427387903>;
   # RANGE [0, 18446744073709551615] NONZERO 18446744073709551612
   _1 = _6 * 4;
   ...
   p_5 = mallocD.1206 (_1); [tail call]
   ...
   return p_5;

so _1's range reflects the wraparound in size_t, but _6's range
has enough information to uncover it.  So detecting it is possible
and is done in the patch so we get a warning:

warning: argument 1 range [18446744073709551612, 0x3fffffffffffffffc] is 
too large to represent in ‘long unsigned int’ [-Walloc-size-larger-than=]
     6 |   return malloc (n * sizeof (int));
       |          ^~~~~~~~~~~~~~~~~~~~~~~~~

The code is very simplistic and only handles a small subset of cases.
It could be generalized and exposed by a more generic API but it does
seem like the ranger must already have all the logic built into it so
if it isn't exposed now it should be a matter of opening it up.

Martin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
  2020-09-17 16:08   ` Martin Sebor
@ 2020-09-17 18:39     ` Andrew MacLeod
  2020-09-17 20:18       ` Martin Sebor
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew MacLeod @ 2020-09-17 18:39 UTC (permalink / raw)
  To: Martin Sebor, Jeff Law, gcc-patches, Aldy Hernandez

On 9/17/20 12:08 PM, Martin Sebor via Gcc-patches wrote:
> On 9/16/20 9:23 PM, Jeff Law wrote:
>>
>> On 9/15/20 1:47 PM, Martin Sebor wrote:
>>> Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
>>> can lead to a subsequent buffer overflow corrupting the heap or
>>> stack.  The attached patch diagnoses a subset of these cases where
>>> the overflow/wraparound is still detectable.
>>>
>>> Besides regtesting GCC on x86_64-linux I also verified the warning
>>> doesn't introduce any false positives into Glibc or Binutils/GDB
>>> builds on the same target.
>>>
>>> Martin
>>>
>>> gcc-96838.diff
>>>
>>> PR middle-end/96838 - missing warning on integer overflow in calls 
>>> to allocation functions
>>>
>>> gcc/ChangeLog:
>>>
>>>     PR middle-end/96838
>>>     * calls.c (eval_size_vflow): New function.
>>>     (get_size_range): Call it.  Add argument.
>>>     (maybe_warn_alloc_args_overflow): Diagnose overflow/wraparound.
>>>     * calls.h (get_size_range): Add argument.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>     PR middle-end/96838
>>>     * gcc.dg/Walloc-size-larger-than-19.c: New test.
>>>     * gcc.dg/Walloc-size-larger-than-20.c: New test.
>>
>> If an attacker can control an integer overflow that feeds an 
>> allocation, then they can do all kinds of bad things.  In fact, when 
>> my son was asking me attack vectors, this is one I said I'd look at 
>> if I were a bad guy.
>>
>>
>> I'm a bit surprised you can't just query the range of the argument 
>> and get the overflow status out of that range, but I don't see that 
>> in the APIs.  How painful would it be to make that part of the API?  
>> The conceptual model would be to just ask for the range of the 
>> argument to malloc which would include the range and a status bit 
>> indicating the computation might have overflowed.
>
  Do we know if it did/would have wrapped? sure.  since we have to do 
the math.    so you are correct in that the information is there. but is 
it useful?

We are in the very annoying habit of subtracting one by adding 
0xFFFFFFF.  which means you get an overflow for unsigned when you 
subtract one.   From what I have seen of unsigned math, we would be 
flagging very many operations as overflows, so you would still have the 
difficulty of figuring out whether its a "real" overflow or a fake one 
because of the way we do unsigned math

At the very start, I did have an overflow flag in the range class...  
but it was turning out to be fairly useless so it was removed.
.

> I agree that being able to evaluate an expression in an as-if
> infinite precision (in addition to its type) would be helpful.

SO again, we get back to adding 0x0fffff when we are trying to subtract 
one...  now, with infinite precision you are going to see

  [2,10]  - 1      we end up with [2,10]+0xFFFFFF, which will now give 
you  [0x100000001, 0x100000009]    so its going to look like it overflowed?



> But just to make sure I understood correctly, let me ask again
> using an example:
>
>   void* f (size_t n)
>   {
>     if (n < PTRDIFF_MAX / 2)
>       n = PTRDIFF_MAX / 2;
>
>     return malloc (n * sizeof (int));
>   }
>
> Can the unsigned wraparound in the argument be readily detected?
>
> On trunk, this ends up with the following:
>
>   # RANGE [4611686018427387903, 18446744073709551615]
>   _6 = MAX_EXPR <n_2(D), 4611686018427387903>;
>   # RANGE [0, 18446744073709551615] NONZERO 18446744073709551612
>   _1 = _6 * 4;
>   ...
>   p_5 = mallocD.1206 (_1); [tail call]
>   ...
>   return p_5;
>
> so _1's range reflects the wraparound in size_t, but _6's range
> has enough information to uncover it.  So detecting it is possible
> and is done in the patch so we get a warning:
>
> warning: argument 1 range [18446744073709551612, 0x3fffffffffffffffc] 
> is too large to represent in ‘long unsigned int’ 
> [-Walloc-size-larger-than=]
>     6 |   return malloc (n * sizeof (int));
>       |          ^~~~~~~~~~~~~~~~~~~~~~~~~
>
> The code is very simplistic and only handles a small subset of cases.
> It could be generalized and exposed by a more generic API but it does
> seem like the ranger must already have all the logic built into it so
> if it isn't exposed now it should be a matter of opening it up.

everything is exposed in range-ops.  well, mostly.
if we have _1 = _6 * 4

if one wanted to do that infinite precision, you query the range for _6, 
and the range for 4 (which would be [4,4] :-)
range_of_expr (op1r, _6, stmt)
range_of_expr (op2r, 4, stmt)

you could take their current types, and cast those ranges to whatever 
the next higher precsion is,
range_cast  (op1r, highertype)
range_cast (op2r, highertype)
then invoke the operation on those parameters

gimple_range_fold (r, stmt,  op1r, op2r)

and that will do your operation in the higher precision.  you could 
compare that to the value in regular precision too i suppose.

The ranger is designed to track ranges as they are represented in the 
IL.  You are asking for us to interpret the IL as something other than 
what is there, and increase the precision.  Thats a different task.

could that be done?    maybe. It might involve parallel tracking of 
ssa-Name ranges in a higher precision... and recalculating every 
expression using those values.   Im not convinced that a generalized 
ranger which works in higher precision is really going to solve the 
problem the way you hope it would.. i think we'd get a lot of false info 
when unsigned values are involved

If I were to see a clear solution/description as to what is a 
problematic overflow and what isnt, there might be something more 
general that could be done...

> I mentioned it to Aldy in response to the irange best practices
> document.  He says there's no way to do that and no plans to
> make it possible.
>
Thats not quite what he said.   . He said  "If the operation may wrap or 
overflow, it will show up as so in the range.  I don't think we have any 
plans of changing this behavior."

In fact. looking back, he said the same thing I just did....  There IS a 
mechanism there for working in higher precision should one desire to do 
so, but as I pointed out above, Im not convinced as a general thing in 
the ranger it will work how you imagine.

My guess is what you really want is to be invoking range-ops on stmts 
you care about whether they might overflow or not with higher precision 
versions and then identify the cases which trigger that you actually 
care about.

So maybe you want an "overflow calculator" which is similar to the 
ranger, and  identifies which kinds of stmts expect certain range of 
results, and if they get results outside of  the expected ranges, 
triggers a flag.
maybe it could convert all unsigned and signed  values to signed, and 
then work in a higher precision,  and then if any intermediate result 
cant be accurately cast back to the original type,  it gets flagged?  is 
that the kind of overflow that is cared about?

I dont know... this isnt my space :-)       Im just telling you that I'm 
not sure the current range query mechanism is the right place to be 
asking if an "overflow" occurred because I think the general answer wont 
be useful.. too many kinds of overflows thrown together.

I think this requires more specifoc detailed information, and maybe a 
simple overflow analyzer will do the trick based on range-ops.... I 
think range-ops has the required abilities.  whether it needs some 
enhancing, or something else exposed,  I'm not sure yet


Andrew

PS  One could always take a ranger and overload the range_of_stmt() 
routine that calculates results, so that it calls current functionality, 
then try converting the arguements to  higher precision ranges and 
re-invoking range-ops on those arguments, get a new higher precision 
range back and then look at the results vs the normal ones.  Might be a 
place to start experimenting...   then pass a flag along sayign an 
overflow occurred.
.




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
  2020-09-17 18:39     ` Andrew MacLeod
@ 2020-09-17 20:18       ` Martin Sebor
  2020-09-18  6:29         ` Aldy Hernandez
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Sebor @ 2020-09-17 20:18 UTC (permalink / raw)
  To: Andrew MacLeod, Jeff Law, gcc-patches, Aldy Hernandez

On 9/17/20 12:39 PM, Andrew MacLeod wrote:
> On 9/17/20 12:08 PM, Martin Sebor via Gcc-patches wrote:
>> On 9/16/20 9:23 PM, Jeff Law wrote:
>>>
>>> On 9/15/20 1:47 PM, Martin Sebor wrote:
>>>> Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
>>>> can lead to a subsequent buffer overflow corrupting the heap or
>>>> stack.  The attached patch diagnoses a subset of these cases where
>>>> the overflow/wraparound is still detectable.
>>>>
>>>> Besides regtesting GCC on x86_64-linux I also verified the warning
>>>> doesn't introduce any false positives into Glibc or Binutils/GDB
>>>> builds on the same target.
>>>>
>>>> Martin
>>>>
>>>> gcc-96838.diff
>>>>
>>>> PR middle-end/96838 - missing warning on integer overflow in calls 
>>>> to allocation functions
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>     PR middle-end/96838
>>>>     * calls.c (eval_size_vflow): New function.
>>>>     (get_size_range): Call it.  Add argument.
>>>>     (maybe_warn_alloc_args_overflow): Diagnose overflow/wraparound.
>>>>     * calls.h (get_size_range): Add argument.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>     PR middle-end/96838
>>>>     * gcc.dg/Walloc-size-larger-than-19.c: New test.
>>>>     * gcc.dg/Walloc-size-larger-than-20.c: New test.
>>>
>>> If an attacker can control an integer overflow that feeds an 
>>> allocation, then they can do all kinds of bad things.  In fact, when 
>>> my son was asking me attack vectors, this is one I said I'd look at 
>>> if I were a bad guy.
>>>
>>>
>>> I'm a bit surprised you can't just query the range of the argument 
>>> and get the overflow status out of that range, but I don't see that 
>>> in the APIs.  How painful would it be to make that part of the API? 
>>> The conceptual model would be to just ask for the range of the 
>>> argument to malloc which would include the range and a status bit 
>>> indicating the computation might have overflowed.
>>
>   Do we know if it did/would have wrapped? sure.  since we have to do 
> the math.    so you are correct in that the information is there. but is 
> it useful?
> 
> We are in the very annoying habit of subtracting one by adding 
> 0xFFFFFFF.  which means you get an overflow for unsigned when you 
> subtract one.   From what I have seen of unsigned math, we would be 
> flagging very many operations as overflows, so you would still have the 
> difficulty of figuring out whether its a "real" overflow or a fake one 
> because of the way we do unsigned math

You and me both :)

> 
> At the very start, I did have an overflow flag in the range class... but 
> it was turning out to be fairly useless so it was removed.
> .
> 
>> I agree that being able to evaluate an expression in an as-if
>> infinite precision (in addition to its type) would be helpful.
> 
> SO again, we get back to adding 0x0fffff when we are trying to subtract 
> one...  now, with infinite precision you are going to see
> 
>   [2,10]  - 1      we end up with [2,10]+0xFFFFFF, which will now give 
> you  [0x100000001, 0x100000009]    so its going to look like it overflowed?
> 
> 
> 
>> But just to make sure I understood correctly, let me ask again
>> using an example:
>>
>>   void* f (size_t n)
>>   {
>>     if (n < PTRDIFF_MAX / 2)
>>       n = PTRDIFF_MAX / 2;
>>
>>     return malloc (n * sizeof (int));
>>   }
>>
>> Can the unsigned wraparound in the argument be readily detected?
>>
>> On trunk, this ends up with the following:
>>
>>   # RANGE [4611686018427387903, 18446744073709551615]
>>   _6 = MAX_EXPR <n_2(D), 4611686018427387903>;
>>   # RANGE [0, 18446744073709551615] NONZERO 18446744073709551612
>>   _1 = _6 * 4;
>>   ...
>>   p_5 = mallocD.1206 (_1); [tail call]
>>   ...
>>   return p_5;
>>
>> so _1's range reflects the wraparound in size_t, but _6's range
>> has enough information to uncover it.  So detecting it is possible
>> and is done in the patch so we get a warning:
>>
>> warning: argument 1 range [18446744073709551612, 0x3fffffffffffffffc] 
>> is too large to represent in ‘long unsigned int’ 
>> [-Walloc-size-larger-than=]
>>     6 |   return malloc (n * sizeof (int));
>>       |          ^~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> The code is very simplistic and only handles a small subset of cases.
>> It could be generalized and exposed by a more generic API but it does
>> seem like the ranger must already have all the logic built into it so
>> if it isn't exposed now it should be a matter of opening it up.
> 
> everything is exposed in range-ops.  well, mostly.
> if we have _1 = _6 * 4
> 
> if one wanted to do that infinite precision, you query the range for _6, 
> and the range for 4 (which would be [4,4] :-)
> range_of_expr (op1r, _6, stmt)
> range_of_expr (op2r, 4, stmt)
> 
> you could take their current types, and cast those ranges to whatever 
> the next higher precsion is,
> range_cast  (op1r, highertype)
> range_cast (op2r, highertype)
> then invoke the operation on those parameters
> 
> gimple_range_fold (r, stmt,  op1r, op2r)
> 
> and that will do your operation in the higher precision.  you could 
> compare that to the value in regular precision too i suppose.

The patch does pretty much exactly what you described, except in
offset_int, and only for a limited set of arithmetic operations.
It figures out if an unsigned expression wrapped simply by checking
to see if the mathematically correct result fits in its type.

It sounds like I should be able to use the range APIs above instead
and get the same result, but for arbitrary expressions.  I don't
need to (and very well can't) compare the infinitely precise result
to the regular result because when it's a range that wraps the result
is the full range of the type (i.e., [0, SIZE_MAX] in the test above).

> 
> The ranger is designed to track ranges as they are represented in the 
> IL.  You are asking for us to interpret the IL as something other than 
> what is there, and increase the precision.  Thats a different task.
> 
> could that be done?    maybe. It might involve parallel tracking of 
> ssa-Name ranges in a higher precision... and recalculating every 
> expression using those values.   Im not convinced that a generalized 
> ranger which works in higher precision is really going to solve the 
> problem the way you hope it would.. i think we'd get a lot of false info 
> when unsigned values are involved

I don't think that's necessary for unsigned wrapping.  Most of
the time the (possibly) wrapped result is what we need because
that's what the languages say is supposed to happen.  The cases
when we need to check for the dangerous wraparound (or overflow)
should be rare (just allocation calls) and can be handled on
demand by walking the IL and doing the computation in the infinite
precision.

For signed overflow, though, computing the mathematically correct
result seems preferable.  Or at least tracking (and propagating)
the overflow bit should be.  That way we could tell when
an expression is definitely invalid.

> 
> If I were to see a clear solution/description as to what is a 
> problematic overflow and what isnt, there might be something more 
> general that could be done...

Does the distinction above (unsigned wrapping/signed overflow) help?

> 
>> I mentioned it to Aldy in response to the irange best practices
>> document.  He says there's no way to do that and no plans to
>> make it possible.
>>
> Thats not quite what he said.   . He said  "If the operation may wrap or 
> overflow, it will show up as so in the range.  I don't think we have any 
> plans of changing this behavior."
> 
> In fact. looking back, he said the same thing I just did....  There IS a 
> mechanism there for working in higher precision should one desire to do 
> so, but as I pointed out above, Im not convinced as a general thing in 
> the ranger it will work how you imagine.
> 
> My guess is what you really want is to be invoking range-ops on stmts 
> you care about whether they might overflow or not with higher precision 
> versions and then identify the cases which trigger that you actually 
> care about.

Yes, it looks like that's the way to go.  I was hoping the ranger
had an API that would do all that for me (i.e., walk the IL and
compute the range of the result on demand in the precision of
my choice), since it has to do that as it is, except that it uses
the precision of the expression.  But if not, it sounds like with
range_ops I should be able to fairly easily write one myself.

> So maybe you want an "overflow calculator" which is similar to the 
> ranger, and  identifies which kinds of stmts expect certain range of 
> results, and if they get results outside of  the expected ranges, 
> triggers a flag.
> maybe it could convert all unsigned and signed  values to signed, and 
> then work in a higher precision,  and then if any intermediate result 
> cant be accurately cast back to the original type,  it gets flagged?  is 
> that the kind of overflow that is cared about?

There are two kinds of "overflow:" unsigned wrapping and signed
overflow.  Both are a problem in allocation calls like:

   int *p = malloc (nelts * size);

If nelts and size are signed, the compile-time result seems to
be done in saturation arithmetic (its range looks to be capped
at TYPE_MAX).  If they are unsigned, they wrap around zero.
Neither is usually intended by the user but the wraparound is
the more dangerous of the two because it's well-defined and
because it shrinks the size to a very small number.

> 
> I dont know... this isnt my space :-)       Im just telling you that I'm 
> not sure the current range query mechanism is the right place to be 
> asking if an "overflow" occurred because I think the general answer wont 
> be useful.. too many kinds of overflows thrown together.
> 
> I think this requires more specifoc detailed information, and maybe a 
> simple overflow analyzer will do the trick based on range-ops.... I 
> think range-ops has the required abilities.  whether it needs some 
> enhancing, or something else exposed,  I'm not sure yet

Thank you!  This was a useful clarification to what Aldy said.
I'd been meaning to follow up with him on this point when I was
done with what I'm working on but Jeff's question just got me
the answers I was looking for quicker.  Let me experiment with
range_ops and get back to you.

> 
> 
> Andrew
> 
> PS  One could always take a ranger and overload the range_of_stmt() 
> routine that calculates results, so that it calls current functionality, 
> then try converting the arguements to  higher precision ranges and 
> re-invoking range-ops on those arguments, get a new higher precision 
> range back and then look at the results vs the normal ones.  Might be a 
> place to start experimenting...   then pass a flag along sayign an 
> overflow occurred.

That sounds like something to look into.  I don't see range_of_stmt
on trunk.  Is that the right name or has it not been merged yet?

Thanks again.
Martin


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
  2020-09-17 20:18       ` Martin Sebor
@ 2020-09-18  6:29         ` Aldy Hernandez
  2020-09-19 21:22           ` Martin Sebor
  0 siblings, 1 reply; 18+ messages in thread
From: Aldy Hernandez @ 2020-09-18  6:29 UTC (permalink / raw)
  To: Martin Sebor, Andrew MacLeod, Jeff Law, gcc-patches



On 9/17/20 10:18 PM, Martin Sebor wrote:
> On 9/17/20 12:39 PM, Andrew MacLeod wrote:
>> On 9/17/20 12:08 PM, Martin Sebor via Gcc-patches wrote:
>>> On 9/16/20 9:23 PM, Jeff Law wrote:
>>>>
>>>> On 9/15/20 1:47 PM, Martin Sebor wrote:
>>>>> Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
>>>>> can lead to a subsequent buffer overflow corrupting the heap or
>>>>> stack.  The attached patch diagnoses a subset of these cases where
>>>>> the overflow/wraparound is still detectable.
>>>>>
>>>>> Besides regtesting GCC on x86_64-linux I also verified the warning
>>>>> doesn't introduce any false positives into Glibc or Binutils/GDB
>>>>> builds on the same target.
>>>>>
>>>>> Martin
>>>>>
>>>>> gcc-96838.diff
>>>>>
>>>>> PR middle-end/96838 - missing warning on integer overflow in calls 
>>>>> to allocation functions
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>     PR middle-end/96838
>>>>>     * calls.c (eval_size_vflow): New function.
>>>>>     (get_size_range): Call it.  Add argument.
>>>>>     (maybe_warn_alloc_args_overflow): Diagnose overflow/wraparound.
>>>>>     * calls.h (get_size_range): Add argument.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>     PR middle-end/96838
>>>>>     * gcc.dg/Walloc-size-larger-than-19.c: New test.
>>>>>     * gcc.dg/Walloc-size-larger-than-20.c: New test.
>>>>
>>>> If an attacker can control an integer overflow that feeds an 
>>>> allocation, then they can do all kinds of bad things.  In fact, when 
>>>> my son was asking me attack vectors, this is one I said I'd look at 
>>>> if I were a bad guy.
>>>>
>>>>
>>>> I'm a bit surprised you can't just query the range of the argument 
>>>> and get the overflow status out of that range, but I don't see that 
>>>> in the APIs.  How painful would it be to make that part of the API? 
>>>> The conceptual model would be to just ask for the range of the 
>>>> argument to malloc which would include the range and a status bit 
>>>> indicating the computation might have overflowed.
>>>
>>   Do we know if it did/would have wrapped? sure.  since we have to do 
>> the math.    so you are correct in that the information is there. but 
>> is it useful?
>>
>> We are in the very annoying habit of subtracting one by adding 
>> 0xFFFFFFF.  which means you get an overflow for unsigned when you 
>> subtract one.   From what I have seen of unsigned math, we would be 
>> flagging very many operations as overflows, so you would still have 
>> the difficulty of figuring out whether its a "real" overflow or a fake 
>> one because of the way we do unsigned math
> 
> You and me both :)
> 
>>
>> At the very start, I did have an overflow flag in the range class... 
>> but it was turning out to be fairly useless so it was removed.
>> .
>>
>>> I agree that being able to evaluate an expression in an as-if
>>> infinite precision (in addition to its type) would be helpful.
>>
>> SO again, we get back to adding 0x0fffff when we are trying to 
>> subtract one...  now, with infinite precision you are going to see
>>
>>   [2,10]  - 1      we end up with [2,10]+0xFFFFFF, which will now give 
>> you  [0x100000001, 0x100000009]    so its going to look like it 
>> overflowed?
>>
>>
>>
>>> But just to make sure I understood correctly, let me ask again
>>> using an example:
>>>
>>>   void* f (size_t n)
>>>   {
>>>     if (n < PTRDIFF_MAX / 2)
>>>       n = PTRDIFF_MAX / 2;
>>>
>>>     return malloc (n * sizeof (int));
>>>   }
>>>
>>> Can the unsigned wraparound in the argument be readily detected?
>>>
>>> On trunk, this ends up with the following:
>>>
>>>   # RANGE [4611686018427387903, 18446744073709551615]
>>>   _6 = MAX_EXPR <n_2(D), 4611686018427387903>;
>>>   # RANGE [0, 18446744073709551615] NONZERO 18446744073709551612
>>>   _1 = _6 * 4;
>>>   ...
>>>   p_5 = mallocD.1206 (_1); [tail call]
>>>   ...
>>>   return p_5;
>>>
>>> so _1's range reflects the wraparound in size_t, but _6's range
>>> has enough information to uncover it.  So detecting it is possible
>>> and is done in the patch so we get a warning:
>>>
>>> warning: argument 1 range [18446744073709551612, 0x3fffffffffffffffc] 
>>> is too large to represent in ‘long unsigned int’ 
>>> [-Walloc-size-larger-than=]
>>>     6 |   return malloc (n * sizeof (int));
>>>       |          ^~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> The code is very simplistic and only handles a small subset of cases.
>>> It could be generalized and exposed by a more generic API but it does
>>> seem like the ranger must already have all the logic built into it so
>>> if it isn't exposed now it should be a matter of opening it up.
>>
>> everything is exposed in range-ops.  well, mostly.
>> if we have _1 = _6 * 4
>>
>> if one wanted to do that infinite precision, you query the range for 
>> _6, and the range for 4 (which would be [4,4] :-)
>> range_of_expr (op1r, _6, stmt)
>> range_of_expr (op2r, 4, stmt)
>>
>> you could take their current types, and cast those ranges to whatever 
>> the next higher precsion is,
>> range_cast  (op1r, highertype)
>> range_cast (op2r, highertype)
>> then invoke the operation on those parameters
>>
>> gimple_range_fold (r, stmt,  op1r, op2r)
>>
>> and that will do your operation in the higher precision.  you could 
>> compare that to the value in regular precision too i suppose.
> 
> The patch does pretty much exactly what you described, except in
> offset_int, and only for a limited set of arithmetic operations.
> It figures out if an unsigned expression wrapped simply by checking
> to see if the mathematically correct result fits in its type.
> 
> It sounds like I should be able to use the range APIs above instead
> and get the same result, but for arbitrary expressions.  I don't
> need to (and very well can't) compare the infinitely precise result
> to the regular result because when it's a range that wraps the result
> is the full range of the type (i.e., [0, SIZE_MAX] in the test above).
> 
>>
>> The ranger is designed to track ranges as they are represented in the 
>> IL.  You are asking for us to interpret the IL as something other than 
>> what is there, and increase the precision.  Thats a different task.
>>
>> could that be done?    maybe. It might involve parallel tracking of 
>> ssa-Name ranges in a higher precision... and recalculating every 
>> expression using those values.   Im not convinced that a generalized 
>> ranger which works in higher precision is really going to solve the 
>> problem the way you hope it would.. i think we'd get a lot of false 
>> info when unsigned values are involved
> 
> I don't think that's necessary for unsigned wrapping.  Most of
> the time the (possibly) wrapped result is what we need because
> that's what the languages say is supposed to happen.  The cases
> when we need to check for the dangerous wraparound (or overflow)
> should be rare (just allocation calls) and can be handled on
> demand by walking the IL and doing the computation in the infinite
> precision.
> 
> For signed overflow, though, computing the mathematically correct
> result seems preferable.  Or at least tracking (and propagating)
> the overflow bit should be.  That way we could tell when
> an expression is definitely invalid.
> 
>>
>> If I were to see a clear solution/description as to what is a 
>> problematic overflow and what isnt, there might be something more 
>> general that could be done...
> 
> Does the distinction above (unsigned wrapping/signed overflow) help?
> 
>>
>>> I mentioned it to Aldy in response to the irange best practices
>>> document.  He says there's no way to do that and no plans to
>>> make it possible.
>>>
>> Thats not quite what he said.   . He said  "If the operation may wrap 
>> or overflow, it will show up as so in the range.  I don't think we 
>> have any plans of changing this behavior."
>>
>> In fact. looking back, he said the same thing I just did....  There IS 
>> a mechanism there for working in higher precision should one desire to 
>> do so, but as I pointed out above, Im not convinced as a general thing 
>> in the ranger it will work how you imagine.
>>
>> My guess is what you really want is to be invoking range-ops on stmts 
>> you care about whether they might overflow or not with higher 
>> precision versions and then identify the cases which trigger that you 
>> actually care about.
> 
> Yes, it looks like that's the way to go.  I was hoping the ranger
> had an API that would do all that for me (i.e., walk the IL and
> compute the range of the result on demand in the precision of
> my choice), since it has to do that as it is, except that it uses
> the precision of the expression.  But if not, it sounds like with
> range_ops I should be able to fairly easily write one myself.

The range_ops class is too low-level for this.  It folds pairs of 
ranges, and has no concept of gimple or tree expressions.  To roll your 
own solution you'd have to walk the IL, parse the gimple statements and 
then call range-ops on your adjusted arguments.  A better approach would 
be to overload range_of_stmt as Andrew suggested (see below).

> 
>> So maybe you want an "overflow calculator" which is similar to the 
>> ranger, and  identifies which kinds of stmts expect certain range of 
>> results, and if they get results outside of  the expected ranges, 
>> triggers a flag.
>> maybe it could convert all unsigned and signed  values to signed, and 
>> then work in a higher precision,  and then if any intermediate result 
>> cant be accurately cast back to the original type,  it gets flagged?  
>> is that the kind of overflow that is cared about?
> 
> There are two kinds of "overflow:" unsigned wrapping and signed
> overflow.  Both are a problem in allocation calls like:
> 
>    int *p = malloc (nelts * size);
> 
> If nelts and size are signed, the compile-time result seems to
> be done in saturation arithmetic (its range looks to be capped
> at TYPE_MAX).  If they are unsigned, they wrap around zero.
> Neither is usually intended by the user but the wraparound is
> the more dangerous of the two because it's well-defined and
> because it shrinks the size to a very small number.
> 
>>
>> I dont know... this isnt my space :-)       Im just telling you that 
>> I'm not sure the current range query mechanism is the right place to 
>> be asking if an "overflow" occurred because I think the general answer 
>> wont be useful.. too many kinds of overflows thrown together.
>>
>> I think this requires more specifoc detailed information, and maybe a 
>> simple overflow analyzer will do the trick based on range-ops.... I 
>> think range-ops has the required abilities.  whether it needs some 
>> enhancing, or something else exposed,  I'm not sure yet
> 
> Thank you!  This was a useful clarification to what Aldy said.
> I'd been meaning to follow up with him on this point when I was
> done with what I'm working on but Jeff's question just got me
> the answers I was looking for quicker.  Let me experiment with
> range_ops and get back to you.
> 
>>
>>
>> Andrew
>>
>> PS  One could always take a ranger and overload the range_of_stmt() 
>> routine that calculates results, so that it calls current 
>> functionality, then try converting the arguements to  higher precision 
>> ranges and re-invoking range-ops on those arguments, get a new higher 
>> precision range back and then look at the results vs the normal ones.  
>> Might be a place to start experimenting...   then pass a flag along 
>> sayign an overflow occurred.
> 
> That sounds like something to look into.  I don't see range_of_stmt

This would be my preferred approach.  It's clean and builds on top of 
the current infrastructure without polluting other users of the ranger. 
It definitely sounds like something worth pursuing.

 > on trunk.  Is that the right name or has it not been merged yet?

This is in the ranger proper which should come in, in the next week or 
two (??).  You can see it in our staging branch 
(users/aldyh/ranger-staging):

class gimple_ranger : public range_query
{
public:
   gimple_ranger (bool use_loop_info) : m_use_loop_info (use_loop_info) { }
   virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL) 
OVERRIDE;
   virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) 
OVERRIDE;
   virtual bool range_on_edge (irange &r, edge e, tree name) OVERRIDE;
   virtual void range_on_entry (irange &r, basic_block bb, tree name);
   virtual void range_on_exit (irange &r, basic_block bb, tree name);
...
...
};

The idea is that range_of_stmt() is called for all statements as the IL 
is walked backwards.  You could just overload this, build a throwaway 
statement you could pass to gimple_ranger::range_of_stmt, and work from 
there.

Aldy


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
  2020-09-18  6:29         ` Aldy Hernandez
@ 2020-09-19 21:22           ` Martin Sebor
  2020-09-20  6:39             ` Aldy Hernandez
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Sebor @ 2020-09-19 21:22 UTC (permalink / raw)
  To: Aldy Hernandez, Andrew MacLeod, Jeff Law, gcc-patches

On 9/18/20 12:29 AM, Aldy Hernandez wrote:
> 
> 
> On 9/17/20 10:18 PM, Martin Sebor wrote:
>> On 9/17/20 12:39 PM, Andrew MacLeod wrote:
>>> On 9/17/20 12:08 PM, Martin Sebor via Gcc-patches wrote:
>>>> On 9/16/20 9:23 PM, Jeff Law wrote:
>>>>>
>>>>> On 9/15/20 1:47 PM, Martin Sebor wrote:
>>>>>> Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
>>>>>> can lead to a subsequent buffer overflow corrupting the heap or
>>>>>> stack.  The attached patch diagnoses a subset of these cases where
>>>>>> the overflow/wraparound is still detectable.
>>>>>>
>>>>>> Besides regtesting GCC on x86_64-linux I also verified the warning
>>>>>> doesn't introduce any false positives into Glibc or Binutils/GDB
>>>>>> builds on the same target.
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> gcc-96838.diff
>>>>>>
>>>>>> PR middle-end/96838 - missing warning on integer overflow in calls 
>>>>>> to allocation functions
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>>     PR middle-end/96838
>>>>>>     * calls.c (eval_size_vflow): New function.
>>>>>>     (get_size_range): Call it.  Add argument.
>>>>>>     (maybe_warn_alloc_args_overflow): Diagnose overflow/wraparound.
>>>>>>     * calls.h (get_size_range): Add argument.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>>
>>>>>>     PR middle-end/96838
>>>>>>     * gcc.dg/Walloc-size-larger-than-19.c: New test.
>>>>>>     * gcc.dg/Walloc-size-larger-than-20.c: New test.
>>>>>
>>>>> If an attacker can control an integer overflow that feeds an 
>>>>> allocation, then they can do all kinds of bad things.  In fact, 
>>>>> when my son was asking me attack vectors, this is one I said I'd 
>>>>> look at if I were a bad guy.
>>>>>
>>>>>
>>>>> I'm a bit surprised you can't just query the range of the argument 
>>>>> and get the overflow status out of that range, but I don't see that 
>>>>> in the APIs.  How painful would it be to make that part of the API? 
>>>>> The conceptual model would be to just ask for the range of the 
>>>>> argument to malloc which would include the range and a status bit 
>>>>> indicating the computation might have overflowed.
>>>>
>>>   Do we know if it did/would have wrapped? sure.  since we have to do 
>>> the math.    so you are correct in that the information is there. but 
>>> is it useful?
>>>
>>> We are in the very annoying habit of subtracting one by adding 
>>> 0xFFFFFFF.  which means you get an overflow for unsigned when you 
>>> subtract one.   From what I have seen of unsigned math, we would be 
>>> flagging very many operations as overflows, so you would still have 
>>> the difficulty of figuring out whether its a "real" overflow or a 
>>> fake one because of the way we do unsigned math
>>
>> You and me both :)
>>
>>>
>>> At the very start, I did have an overflow flag in the range class... 
>>> but it was turning out to be fairly useless so it was removed.
>>> .
>>>
>>>> I agree that being able to evaluate an expression in an as-if
>>>> infinite precision (in addition to its type) would be helpful.
>>>
>>> SO again, we get back to adding 0x0fffff when we are trying to 
>>> subtract one...  now, with infinite precision you are going to see
>>>
>>>   [2,10]  - 1      we end up with [2,10]+0xFFFFFF, which will now 
>>> give you  [0x100000001, 0x100000009]    so its going to look like it 
>>> overflowed?
>>>
>>>
>>>
>>>> But just to make sure I understood correctly, let me ask again
>>>> using an example:
>>>>
>>>>   void* f (size_t n)
>>>>   {
>>>>     if (n < PTRDIFF_MAX / 2)
>>>>       n = PTRDIFF_MAX / 2;
>>>>
>>>>     return malloc (n * sizeof (int));
>>>>   }
>>>>
>>>> Can the unsigned wraparound in the argument be readily detected?
>>>>
>>>> On trunk, this ends up with the following:
>>>>
>>>>   # RANGE [4611686018427387903, 18446744073709551615]
>>>>   _6 = MAX_EXPR <n_2(D), 4611686018427387903>;
>>>>   # RANGE [0, 18446744073709551615] NONZERO 18446744073709551612
>>>>   _1 = _6 * 4;
>>>>   ...
>>>>   p_5 = mallocD.1206 (_1); [tail call]
>>>>   ...
>>>>   return p_5;
>>>>
>>>> so _1's range reflects the wraparound in size_t, but _6's range
>>>> has enough information to uncover it.  So detecting it is possible
>>>> and is done in the patch so we get a warning:
>>>>
>>>> warning: argument 1 range [18446744073709551612, 
>>>> 0x3fffffffffffffffc] is too large to represent in ‘long unsigned 
>>>> int’ [-Walloc-size-larger-than=]
>>>>     6 |   return malloc (n * sizeof (int));
>>>>       |          ^~~~~~~~~~~~~~~~~~~~~~~~~
>>>>
>>>> The code is very simplistic and only handles a small subset of cases.
>>>> It could be generalized and exposed by a more generic API but it does
>>>> seem like the ranger must already have all the logic built into it so
>>>> if it isn't exposed now it should be a matter of opening it up.
>>>
>>> everything is exposed in range-ops.  well, mostly.
>>> if we have _1 = _6 * 4
>>>
>>> if one wanted to do that infinite precision, you query the range for 
>>> _6, and the range for 4 (which would be [4,4] :-)
>>> range_of_expr (op1r, _6, stmt)
>>> range_of_expr (op2r, 4, stmt)
>>>
>>> you could take their current types, and cast those ranges to whatever 
>>> the next higher precsion is,
>>> range_cast  (op1r, highertype)
>>> range_cast (op2r, highertype)
>>> then invoke the operation on those parameters
>>>
>>> gimple_range_fold (r, stmt,  op1r, op2r)
>>>
>>> and that will do your operation in the higher precision.  you could 
>>> compare that to the value in regular precision too i suppose.
>>
>> The patch does pretty much exactly what you described, except in
>> offset_int, and only for a limited set of arithmetic operations.
>> It figures out if an unsigned expression wrapped simply by checking
>> to see if the mathematically correct result fits in its type.
>>
>> It sounds like I should be able to use the range APIs above instead
>> and get the same result, but for arbitrary expressions.  I don't
>> need to (and very well can't) compare the infinitely precise result
>> to the regular result because when it's a range that wraps the result
>> is the full range of the type (i.e., [0, SIZE_MAX] in the test above).
>>
>>>
>>> The ranger is designed to track ranges as they are represented in the 
>>> IL.  You are asking for us to interpret the IL as something other 
>>> than what is there, and increase the precision.  Thats a different task.
>>>
>>> could that be done?    maybe. It might involve parallel tracking of 
>>> ssa-Name ranges in a higher precision... and recalculating every 
>>> expression using those values.   Im not convinced that a generalized 
>>> ranger which works in higher precision is really going to solve the 
>>> problem the way you hope it would.. i think we'd get a lot of false 
>>> info when unsigned values are involved
>>
>> I don't think that's necessary for unsigned wrapping.  Most of
>> the time the (possibly) wrapped result is what we need because
>> that's what the languages say is supposed to happen.  The cases
>> when we need to check for the dangerous wraparound (or overflow)
>> should be rare (just allocation calls) and can be handled on
>> demand by walking the IL and doing the computation in the infinite
>> precision.
>>
>> For signed overflow, though, computing the mathematically correct
>> result seems preferable.  Or at least tracking (and propagating)
>> the overflow bit should be.  That way we could tell when
>> an expression is definitely invalid.
>>
>>>
>>> If I were to see a clear solution/description as to what is a 
>>> problematic overflow and what isnt, there might be something more 
>>> general that could be done...
>>
>> Does the distinction above (unsigned wrapping/signed overflow) help?
>>
>>>
>>>> I mentioned it to Aldy in response to the irange best practices
>>>> document.  He says there's no way to do that and no plans to
>>>> make it possible.
>>>>
>>> Thats not quite what he said.   . He said  "If the operation may wrap 
>>> or overflow, it will show up as so in the range.  I don't think we 
>>> have any plans of changing this behavior."
>>>
>>> In fact. looking back, he said the same thing I just did....  There 
>>> IS a mechanism there for working in higher precision should one 
>>> desire to do so, but as I pointed out above, Im not convinced as a 
>>> general thing in the ranger it will work how you imagine.
>>>
>>> My guess is what you really want is to be invoking range-ops on stmts 
>>> you care about whether they might overflow or not with higher 
>>> precision versions and then identify the cases which trigger that you 
>>> actually care about.
>>
>> Yes, it looks like that's the way to go.  I was hoping the ranger
>> had an API that would do all that for me (i.e., walk the IL and
>> compute the range of the result on demand in the precision of
>> my choice), since it has to do that as it is, except that it uses
>> the precision of the expression.  But if not, it sounds like with
>> range_ops I should be able to fairly easily write one myself.
> 
> The range_ops class is too low-level for this.  It folds pairs of 
> ranges, and has no concept of gimple or tree expressions.  To roll your 
> own solution you'd have to walk the IL, parse the gimple statements and 
> then call range-ops on your adjusted arguments.  A better approach would 
> be to overload range_of_stmt as Andrew suggested (see below).
> 
>>
>>> So maybe you want an "overflow calculator" which is similar to the 
>>> ranger, and  identifies which kinds of stmts expect certain range of 
>>> results, and if they get results outside of  the expected ranges, 
>>> triggers a flag.
>>> maybe it could convert all unsigned and signed  values to signed, and 
>>> then work in a higher precision,  and then if any intermediate result 
>>> cant be accurately cast back to the original type,  it gets flagged? 
>>> is that the kind of overflow that is cared about?
>>
>> There are two kinds of "overflow:" unsigned wrapping and signed
>> overflow.  Both are a problem in allocation calls like:
>>
>>    int *p = malloc (nelts * size);
>>
>> If nelts and size are signed, the compile-time result seems to
>> be done in saturation arithmetic (its range looks to be capped
>> at TYPE_MAX).  If they are unsigned, they wrap around zero.
>> Neither is usually intended by the user but the wraparound is
>> the more dangerous of the two because it's well-defined and
>> because it shrinks the size to a very small number.
>>
>>>
>>> I dont know... this isnt my space :-)       Im just telling you that 
>>> I'm not sure the current range query mechanism is the right place to 
>>> be asking if an "overflow" occurred because I think the general 
>>> answer wont be useful.. too many kinds of overflows thrown together.
>>>
>>> I think this requires more specifoc detailed information, and maybe a 
>>> simple overflow analyzer will do the trick based on range-ops.... I 
>>> think range-ops has the required abilities.  whether it needs some 
>>> enhancing, or something else exposed,  I'm not sure yet
>>
>> Thank you!  This was a useful clarification to what Aldy said.
>> I'd been meaning to follow up with him on this point when I was
>> done with what I'm working on but Jeff's question just got me
>> the answers I was looking for quicker.  Let me experiment with
>> range_ops and get back to you.
>>
>>>
>>>
>>> Andrew
>>>
>>> PS  One could always take a ranger and overload the range_of_stmt() 
>>> routine that calculates results, so that it calls current 
>>> functionality, then try converting the arguements to  higher 
>>> precision ranges and re-invoking range-ops on those arguments, get a 
>>> new higher precision range back and then look at the results vs the 
>>> normal ones. Might be a place to start experimenting...   then pass a 
>>> flag along sayign an overflow occurred.
>>
>> That sounds like something to look into.  I don't see range_of_stmt
> 
> This would be my preferred approach.  It's clean and builds on top of 
> the current infrastructure without polluting other users of the ranger. 
> It definitely sounds like something worth pursuing.
> 
>  > on trunk.  Is that the right name or has it not been merged yet?
> 
> This is in the ranger proper which should come in, in the next week or 
> two (??).  You can see it in our staging branch 
> (users/aldyh/ranger-staging):
> 
> class gimple_ranger : public range_query
> {
> public:
>    gimple_ranger (bool use_loop_info) : m_use_loop_info (use_loop_info) { }
>    virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL) 
> OVERRIDE;
>    virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) 
> OVERRIDE;
>    virtual bool range_on_edge (irange &r, edge e, tree name) OVERRIDE;
>    virtual void range_on_entry (irange &r, basic_block bb, tree name);
>    virtual void range_on_exit (irange &r, basic_block bb, tree name);
> ...
> ...
> };
> 
> The idea is that range_of_stmt() is called for all statements as the IL 
> is walked backwards.  You could just overload this, build a throwaway 
> statement you could pass to gimple_ranger::range_of_stmt, and work from 
> there.

Thanks for the tutorial!  I see the patch you posted with this API
so I'll give it a try once it's in.

Martin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
  2020-09-19 21:22           ` Martin Sebor
@ 2020-09-20  6:39             ` Aldy Hernandez
  2020-09-21 15:13               ` Martin Sebor
  0 siblings, 1 reply; 18+ messages in thread
From: Aldy Hernandez @ 2020-09-20  6:39 UTC (permalink / raw)
  To: Martin Sebor, Andrew MacLeod, Jeff Law, gcc-patches



On 9/19/20 11:22 PM, Martin Sebor wrote:
> On 9/18/20 12:29 AM, Aldy Hernandez wrote:
>>
>>
>> On 9/17/20 10:18 PM, Martin Sebor wrote:
>>> On 9/17/20 12:39 PM, Andrew MacLeod wrote:
>>>> On 9/17/20 12:08 PM, Martin Sebor via Gcc-patches wrote:
>>>>> On 9/16/20 9:23 PM, Jeff Law wrote:
>>>>>>
>>>>>> On 9/15/20 1:47 PM, Martin Sebor wrote:
>>>>>>> Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
>>>>>>> can lead to a subsequent buffer overflow corrupting the heap or
>>>>>>> stack.  The attached patch diagnoses a subset of these cases where
>>>>>>> the overflow/wraparound is still detectable.
>>>>>>>
>>>>>>> Besides regtesting GCC on x86_64-linux I also verified the warning
>>>>>>> doesn't introduce any false positives into Glibc or Binutils/GDB
>>>>>>> builds on the same target.
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>> gcc-96838.diff
>>>>>>>
>>>>>>> PR middle-end/96838 - missing warning on integer overflow in 
>>>>>>> calls to allocation functions
>>>>>>>
>>>>>>> gcc/ChangeLog:
>>>>>>>
>>>>>>>     PR middle-end/96838
>>>>>>>     * calls.c (eval_size_vflow): New function.
>>>>>>>     (get_size_range): Call it.  Add argument.
>>>>>>>     (maybe_warn_alloc_args_overflow): Diagnose overflow/wraparound.
>>>>>>>     * calls.h (get_size_range): Add argument.
>>>>>>>
>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>
>>>>>>>     PR middle-end/96838
>>>>>>>     * gcc.dg/Walloc-size-larger-than-19.c: New test.
>>>>>>>     * gcc.dg/Walloc-size-larger-than-20.c: New test.
>>>>>>
>>>>>> If an attacker can control an integer overflow that feeds an 
>>>>>> allocation, then they can do all kinds of bad things.  In fact, 
>>>>>> when my son was asking me attack vectors, this is one I said I'd 
>>>>>> look at if I were a bad guy.
>>>>>>
>>>>>>
>>>>>> I'm a bit surprised you can't just query the range of the argument 
>>>>>> and get the overflow status out of that range, but I don't see 
>>>>>> that in the APIs.  How painful would it be to make that part of 
>>>>>> the API? The conceptual model would be to just ask for the range 
>>>>>> of the argument to malloc which would include the range and a 
>>>>>> status bit indicating the computation might have overflowed.
>>>>>
>>>>   Do we know if it did/would have wrapped? sure.  since we have to 
>>>> do the math.    so you are correct in that the information is there. 
>>>> but is it useful?
>>>>
>>>> We are in the very annoying habit of subtracting one by adding 
>>>> 0xFFFFFFF.  which means you get an overflow for unsigned when you 
>>>> subtract one.   From what I have seen of unsigned math, we would be 
>>>> flagging very many operations as overflows, so you would still have 
>>>> the difficulty of figuring out whether its a "real" overflow or a 
>>>> fake one because of the way we do unsigned math
>>>
>>> You and me both :)
>>>
>>>>
>>>> At the very start, I did have an overflow flag in the range class... 
>>>> but it was turning out to be fairly useless so it was removed.
>>>> .
>>>>
>>>>> I agree that being able to evaluate an expression in an as-if
>>>>> infinite precision (in addition to its type) would be helpful.
>>>>
>>>> SO again, we get back to adding 0x0fffff when we are trying to 
>>>> subtract one...  now, with infinite precision you are going to see
>>>>
>>>>   [2,10]  - 1      we end up with [2,10]+0xFFFFFF, which will now 
>>>> give you  [0x100000001, 0x100000009]    so its going to look like it 
>>>> overflowed?
>>>>
>>>>
>>>>
>>>>> But just to make sure I understood correctly, let me ask again
>>>>> using an example:
>>>>>
>>>>>   void* f (size_t n)
>>>>>   {
>>>>>     if (n < PTRDIFF_MAX / 2)
>>>>>       n = PTRDIFF_MAX / 2;
>>>>>
>>>>>     return malloc (n * sizeof (int));
>>>>>   }
>>>>>
>>>>> Can the unsigned wraparound in the argument be readily detected?
>>>>>
>>>>> On trunk, this ends up with the following:
>>>>>
>>>>>   # RANGE [4611686018427387903, 18446744073709551615]
>>>>>   _6 = MAX_EXPR <n_2(D), 4611686018427387903>;
>>>>>   # RANGE [0, 18446744073709551615] NONZERO 18446744073709551612
>>>>>   _1 = _6 * 4;
>>>>>   ...
>>>>>   p_5 = mallocD.1206 (_1); [tail call]
>>>>>   ...
>>>>>   return p_5;
>>>>>
>>>>> so _1's range reflects the wraparound in size_t, but _6's range
>>>>> has enough information to uncover it.  So detecting it is possible
>>>>> and is done in the patch so we get a warning:
>>>>>
>>>>> warning: argument 1 range [18446744073709551612, 
>>>>> 0x3fffffffffffffffc] is too large to represent in ‘long unsigned 
>>>>> int’ [-Walloc-size-larger-than=]
>>>>>     6 |   return malloc (n * sizeof (int));
>>>>>       |          ^~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>
>>>>> The code is very simplistic and only handles a small subset of cases.
>>>>> It could be generalized and exposed by a more generic API but it does
>>>>> seem like the ranger must already have all the logic built into it so
>>>>> if it isn't exposed now it should be a matter of opening it up.
>>>>
>>>> everything is exposed in range-ops.  well, mostly.
>>>> if we have _1 = _6 * 4
>>>>
>>>> if one wanted to do that infinite precision, you query the range for 
>>>> _6, and the range for 4 (which would be [4,4] :-)
>>>> range_of_expr (op1r, _6, stmt)
>>>> range_of_expr (op2r, 4, stmt)
>>>>
>>>> you could take their current types, and cast those ranges to 
>>>> whatever the next higher precsion is,
>>>> range_cast  (op1r, highertype)
>>>> range_cast (op2r, highertype)
>>>> then invoke the operation on those parameters
>>>>
>>>> gimple_range_fold (r, stmt,  op1r, op2r)
>>>>
>>>> and that will do your operation in the higher precision.  you could 
>>>> compare that to the value in regular precision too i suppose.
>>>
>>> The patch does pretty much exactly what you described, except in
>>> offset_int, and only for a limited set of arithmetic operations.
>>> It figures out if an unsigned expression wrapped simply by checking
>>> to see if the mathematically correct result fits in its type.
>>>
>>> It sounds like I should be able to use the range APIs above instead
>>> and get the same result, but for arbitrary expressions.  I don't
>>> need to (and very well can't) compare the infinitely precise result
>>> to the regular result because when it's a range that wraps the result
>>> is the full range of the type (i.e., [0, SIZE_MAX] in the test above).
>>>
>>>>
>>>> The ranger is designed to track ranges as they are represented in 
>>>> the IL.  You are asking for us to interpret the IL as something 
>>>> other than what is there, and increase the precision.  Thats a 
>>>> different task.
>>>>
>>>> could that be done?    maybe. It might involve parallel tracking of 
>>>> ssa-Name ranges in a higher precision... and recalculating every 
>>>> expression using those values.   Im not convinced that a generalized 
>>>> ranger which works in higher precision is really going to solve the 
>>>> problem the way you hope it would.. i think we'd get a lot of false 
>>>> info when unsigned values are involved
>>>
>>> I don't think that's necessary for unsigned wrapping.  Most of
>>> the time the (possibly) wrapped result is what we need because
>>> that's what the languages say is supposed to happen.  The cases
>>> when we need to check for the dangerous wraparound (or overflow)
>>> should be rare (just allocation calls) and can be handled on
>>> demand by walking the IL and doing the computation in the infinite
>>> precision.
>>>
>>> For signed overflow, though, computing the mathematically correct
>>> result seems preferable.  Or at least tracking (and propagating)
>>> the overflow bit should be.  That way we could tell when
>>> an expression is definitely invalid.
>>>
>>>>
>>>> If I were to see a clear solution/description as to what is a 
>>>> problematic overflow and what isnt, there might be something more 
>>>> general that could be done...
>>>
>>> Does the distinction above (unsigned wrapping/signed overflow) help?
>>>
>>>>
>>>>> I mentioned it to Aldy in response to the irange best practices
>>>>> document.  He says there's no way to do that and no plans to
>>>>> make it possible.
>>>>>
>>>> Thats not quite what he said.   . He said  "If the operation may 
>>>> wrap or overflow, it will show up as so in the range.  I don't think 
>>>> we have any plans of changing this behavior."
>>>>
>>>> In fact. looking back, he said the same thing I just did....  There 
>>>> IS a mechanism there for working in higher precision should one 
>>>> desire to do so, but as I pointed out above, Im not convinced as a 
>>>> general thing in the ranger it will work how you imagine.
>>>>
>>>> My guess is what you really want is to be invoking range-ops on 
>>>> stmts you care about whether they might overflow or not with higher 
>>>> precision versions and then identify the cases which trigger that 
>>>> you actually care about.
>>>
>>> Yes, it looks like that's the way to go.  I was hoping the ranger
>>> had an API that would do all that for me (i.e., walk the IL and
>>> compute the range of the result on demand in the precision of
>>> my choice), since it has to do that as it is, except that it uses
>>> the precision of the expression.  But if not, it sounds like with
>>> range_ops I should be able to fairly easily write one myself.
>>
>> The range_ops class is too low-level for this.  It folds pairs of 
>> ranges, and has no concept of gimple or tree expressions.  To roll 
>> your own solution you'd have to walk the IL, parse the gimple 
>> statements and then call range-ops on your adjusted arguments.  A 
>> better approach would be to overload range_of_stmt as Andrew suggested 
>> (see below).
>>
>>>
>>>> So maybe you want an "overflow calculator" which is similar to the 
>>>> ranger, and  identifies which kinds of stmts expect certain range of 
>>>> results, and if they get results outside of  the expected ranges, 
>>>> triggers a flag.
>>>> maybe it could convert all unsigned and signed  values to signed, 
>>>> and then work in a higher precision,  and then if any intermediate 
>>>> result cant be accurately cast back to the original type,  it gets 
>>>> flagged? is that the kind of overflow that is cared about?
>>>
>>> There are two kinds of "overflow:" unsigned wrapping and signed
>>> overflow.  Both are a problem in allocation calls like:
>>>
>>>    int *p = malloc (nelts * size);
>>>
>>> If nelts and size are signed, the compile-time result seems to
>>> be done in saturation arithmetic (its range looks to be capped
>>> at TYPE_MAX).  If they are unsigned, they wrap around zero.
>>> Neither is usually intended by the user but the wraparound is
>>> the more dangerous of the two because it's well-defined and
>>> because it shrinks the size to a very small number.
>>>
>>>>
>>>> I dont know... this isnt my space :-)       Im just telling you that 
>>>> I'm not sure the current range query mechanism is the right place to 
>>>> be asking if an "overflow" occurred because I think the general 
>>>> answer wont be useful.. too many kinds of overflows thrown together.
>>>>
>>>> I think this requires more specifoc detailed information, and maybe 
>>>> a simple overflow analyzer will do the trick based on range-ops.... 
>>>> I think range-ops has the required abilities.  whether it needs some 
>>>> enhancing, or something else exposed,  I'm not sure yet
>>>
>>> Thank you!  This was a useful clarification to what Aldy said.
>>> I'd been meaning to follow up with him on this point when I was
>>> done with what I'm working on but Jeff's question just got me
>>> the answers I was looking for quicker.  Let me experiment with
>>> range_ops and get back to you.
>>>
>>>>
>>>>
>>>> Andrew
>>>>
>>>> PS  One could always take a ranger and overload the range_of_stmt() 
>>>> routine that calculates results, so that it calls current 
>>>> functionality, then try converting the arguements to  higher 
>>>> precision ranges and re-invoking range-ops on those arguments, get a 
>>>> new higher precision range back and then look at the results vs the 
>>>> normal ones. Might be a place to start experimenting...   then pass 
>>>> a flag along sayign an overflow occurred.
>>>
>>> That sounds like something to look into.  I don't see range_of_stmt
>>
>> This would be my preferred approach.  It's clean and builds on top of 
>> the current infrastructure without polluting other users of the 
>> ranger. It definitely sounds like something worth pursuing.
>>
>>  > on trunk.  Is that the right name or has it not been merged yet?
>>
>> This is in the ranger proper which should come in, in the next week or 
>> two (??).  You can see it in our staging branch 
>> (users/aldyh/ranger-staging):
>>
>> class gimple_ranger : public range_query
>> {
>> public:
>>    gimple_ranger (bool use_loop_info) : m_use_loop_info 
>> (use_loop_info) { }
>>    virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL) 
>> OVERRIDE;
>>    virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) 
>> OVERRIDE;
>>    virtual bool range_on_edge (irange &r, edge e, tree name) OVERRIDE;
>>    virtual void range_on_entry (irange &r, basic_block bb, tree name);
>>    virtual void range_on_exit (irange &r, basic_block bb, tree name);
>> ...
>> ...
>> };
>>
>> The idea is that range_of_stmt() is called for all statements as the 
>> IL is walked backwards.  You could just overload this, build a 
>> throwaway statement you could pass to gimple_ranger::range_of_stmt, 
>> and work from there.
> 
> Thanks for the tutorial!  I see the patch you posted with this API
> so I'll give it a try once it's in.

Note that the patch I posted was to standardize valuation throughout the 
compiler.  It's not the ranger per se, but the underlying API for 
querying ranges that ranger, vr_values, etc will use.  So you need to 
wait for the full ranger to work on this.

Aldy


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
  2020-09-20  6:39             ` Aldy Hernandez
@ 2020-09-21 15:13               ` Martin Sebor
  2020-11-09 16:00                 ` Martin Sebor
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Sebor @ 2020-09-21 15:13 UTC (permalink / raw)
  To: Aldy Hernandez, Andrew MacLeod, Jeff Law, gcc-patches

On 9/20/20 12:39 AM, Aldy Hernandez wrote:
> 
> 
> On 9/19/20 11:22 PM, Martin Sebor wrote:
>> On 9/18/20 12:29 AM, Aldy Hernandez wrote:
>>>
>>>
>>> On 9/17/20 10:18 PM, Martin Sebor wrote:
>>>> On 9/17/20 12:39 PM, Andrew MacLeod wrote:
>>>>> On 9/17/20 12:08 PM, Martin Sebor via Gcc-patches wrote:
>>>>>> On 9/16/20 9:23 PM, Jeff Law wrote:
>>>>>>>
>>>>>>> On 9/15/20 1:47 PM, Martin Sebor wrote:
>>>>>>>> Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
>>>>>>>> can lead to a subsequent buffer overflow corrupting the heap or
>>>>>>>> stack.  The attached patch diagnoses a subset of these cases where
>>>>>>>> the overflow/wraparound is still detectable.
>>>>>>>>
>>>>>>>> Besides regtesting GCC on x86_64-linux I also verified the warning
>>>>>>>> doesn't introduce any false positives into Glibc or Binutils/GDB
>>>>>>>> builds on the same target.
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>> gcc-96838.diff
>>>>>>>>
>>>>>>>> PR middle-end/96838 - missing warning on integer overflow in 
>>>>>>>> calls to allocation functions
>>>>>>>>
>>>>>>>> gcc/ChangeLog:
>>>>>>>>
>>>>>>>>     PR middle-end/96838
>>>>>>>>     * calls.c (eval_size_vflow): New function.
>>>>>>>>     (get_size_range): Call it.  Add argument.
>>>>>>>>     (maybe_warn_alloc_args_overflow): Diagnose overflow/wraparound.
>>>>>>>>     * calls.h (get_size_range): Add argument.
>>>>>>>>
>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>
>>>>>>>>     PR middle-end/96838
>>>>>>>>     * gcc.dg/Walloc-size-larger-than-19.c: New test.
>>>>>>>>     * gcc.dg/Walloc-size-larger-than-20.c: New test.
>>>>>>>
>>>>>>> If an attacker can control an integer overflow that feeds an 
>>>>>>> allocation, then they can do all kinds of bad things.  In fact, 
>>>>>>> when my son was asking me attack vectors, this is one I said I'd 
>>>>>>> look at if I were a bad guy.
>>>>>>>
>>>>>>>
>>>>>>> I'm a bit surprised you can't just query the range of the 
>>>>>>> argument and get the overflow status out of that range, but I 
>>>>>>> don't see that in the APIs.  How painful would it be to make that 
>>>>>>> part of the API? The conceptual model would be to just ask for 
>>>>>>> the range of the argument to malloc which would include the range 
>>>>>>> and a status bit indicating the computation might have overflowed.
>>>>>>
>>>>>   Do we know if it did/would have wrapped? sure.  since we have to 
>>>>> do the math.    so you are correct in that the information is 
>>>>> there. but is it useful?
>>>>>
>>>>> We are in the very annoying habit of subtracting one by adding 
>>>>> 0xFFFFFFF.  which means you get an overflow for unsigned when you 
>>>>> subtract one.   From what I have seen of unsigned math, we would be 
>>>>> flagging very many operations as overflows, so you would still have 
>>>>> the difficulty of figuring out whether its a "real" overflow or a 
>>>>> fake one because of the way we do unsigned math
>>>>
>>>> You and me both :)
>>>>
>>>>>
>>>>> At the very start, I did have an overflow flag in the range 
>>>>> class... but it was turning out to be fairly useless so it was 
>>>>> removed.
>>>>> .
>>>>>
>>>>>> I agree that being able to evaluate an expression in an as-if
>>>>>> infinite precision (in addition to its type) would be helpful.
>>>>>
>>>>> SO again, we get back to adding 0x0fffff when we are trying to 
>>>>> subtract one...  now, with infinite precision you are going to see
>>>>>
>>>>>   [2,10]  - 1      we end up with [2,10]+0xFFFFFF, which will now 
>>>>> give you  [0x100000001, 0x100000009]    so its going to look like 
>>>>> it overflowed?
>>>>>
>>>>>
>>>>>
>>>>>> But just to make sure I understood correctly, let me ask again
>>>>>> using an example:
>>>>>>
>>>>>>   void* f (size_t n)
>>>>>>   {
>>>>>>     if (n < PTRDIFF_MAX / 2)
>>>>>>       n = PTRDIFF_MAX / 2;
>>>>>>
>>>>>>     return malloc (n * sizeof (int));
>>>>>>   }
>>>>>>
>>>>>> Can the unsigned wraparound in the argument be readily detected?
>>>>>>
>>>>>> On trunk, this ends up with the following:
>>>>>>
>>>>>>   # RANGE [4611686018427387903, 18446744073709551615]
>>>>>>   _6 = MAX_EXPR <n_2(D), 4611686018427387903>;
>>>>>>   # RANGE [0, 18446744073709551615] NONZERO 18446744073709551612
>>>>>>   _1 = _6 * 4;
>>>>>>   ...
>>>>>>   p_5 = mallocD.1206 (_1); [tail call]
>>>>>>   ...
>>>>>>   return p_5;
>>>>>>
>>>>>> so _1's range reflects the wraparound in size_t, but _6's range
>>>>>> has enough information to uncover it.  So detecting it is possible
>>>>>> and is done in the patch so we get a warning:
>>>>>>
>>>>>> warning: argument 1 range [18446744073709551612, 
>>>>>> 0x3fffffffffffffffc] is too large to represent in ‘long unsigned 
>>>>>> int’ [-Walloc-size-larger-than=]
>>>>>>     6 |   return malloc (n * sizeof (int));
>>>>>>       |          ^~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>
>>>>>> The code is very simplistic and only handles a small subset of cases.
>>>>>> It could be generalized and exposed by a more generic API but it does
>>>>>> seem like the ranger must already have all the logic built into it so
>>>>>> if it isn't exposed now it should be a matter of opening it up.
>>>>>
>>>>> everything is exposed in range-ops.  well, mostly.
>>>>> if we have _1 = _6 * 4
>>>>>
>>>>> if one wanted to do that infinite precision, you query the range 
>>>>> for _6, and the range for 4 (which would be [4,4] :-)
>>>>> range_of_expr (op1r, _6, stmt)
>>>>> range_of_expr (op2r, 4, stmt)
>>>>>
>>>>> you could take their current types, and cast those ranges to 
>>>>> whatever the next higher precsion is,
>>>>> range_cast  (op1r, highertype)
>>>>> range_cast (op2r, highertype)
>>>>> then invoke the operation on those parameters
>>>>>
>>>>> gimple_range_fold (r, stmt,  op1r, op2r)
>>>>>
>>>>> and that will do your operation in the higher precision.  you could 
>>>>> compare that to the value in regular precision too i suppose.
>>>>
>>>> The patch does pretty much exactly what you described, except in
>>>> offset_int, and only for a limited set of arithmetic operations.
>>>> It figures out if an unsigned expression wrapped simply by checking
>>>> to see if the mathematically correct result fits in its type.
>>>>
>>>> It sounds like I should be able to use the range APIs above instead
>>>> and get the same result, but for arbitrary expressions.  I don't
>>>> need to (and very well can't) compare the infinitely precise result
>>>> to the regular result because when it's a range that wraps the result
>>>> is the full range of the type (i.e., [0, SIZE_MAX] in the test above).
>>>>
>>>>>
>>>>> The ranger is designed to track ranges as they are represented in 
>>>>> the IL.  You are asking for us to interpret the IL as something 
>>>>> other than what is there, and increase the precision.  Thats a 
>>>>> different task.
>>>>>
>>>>> could that be done?    maybe. It might involve parallel tracking of 
>>>>> ssa-Name ranges in a higher precision... and recalculating every 
>>>>> expression using those values.   Im not convinced that a 
>>>>> generalized ranger which works in higher precision is really going 
>>>>> to solve the problem the way you hope it would.. i think we'd get a 
>>>>> lot of false info when unsigned values are involved
>>>>
>>>> I don't think that's necessary for unsigned wrapping.  Most of
>>>> the time the (possibly) wrapped result is what we need because
>>>> that's what the languages say is supposed to happen.  The cases
>>>> when we need to check for the dangerous wraparound (or overflow)
>>>> should be rare (just allocation calls) and can be handled on
>>>> demand by walking the IL and doing the computation in the infinite
>>>> precision.
>>>>
>>>> For signed overflow, though, computing the mathematically correct
>>>> result seems preferable.  Or at least tracking (and propagating)
>>>> the overflow bit should be.  That way we could tell when
>>>> an expression is definitely invalid.
>>>>
>>>>>
>>>>> If I were to see a clear solution/description as to what is a 
>>>>> problematic overflow and what isnt, there might be something more 
>>>>> general that could be done...
>>>>
>>>> Does the distinction above (unsigned wrapping/signed overflow) help?
>>>>
>>>>>
>>>>>> I mentioned it to Aldy in response to the irange best practices
>>>>>> document.  He says there's no way to do that and no plans to
>>>>>> make it possible.
>>>>>>
>>>>> Thats not quite what he said.   . He said  "If the operation may 
>>>>> wrap or overflow, it will show up as so in the range.  I don't 
>>>>> think we have any plans of changing this behavior."
>>>>>
>>>>> In fact. looking back, he said the same thing I just did....  There 
>>>>> IS a mechanism there for working in higher precision should one 
>>>>> desire to do so, but as I pointed out above, Im not convinced as a 
>>>>> general thing in the ranger it will work how you imagine.
>>>>>
>>>>> My guess is what you really want is to be invoking range-ops on 
>>>>> stmts you care about whether they might overflow or not with higher 
>>>>> precision versions and then identify the cases which trigger that 
>>>>> you actually care about.
>>>>
>>>> Yes, it looks like that's the way to go.  I was hoping the ranger
>>>> had an API that would do all that for me (i.e., walk the IL and
>>>> compute the range of the result on demand in the precision of
>>>> my choice), since it has to do that as it is, except that it uses
>>>> the precision of the expression.  But if not, it sounds like with
>>>> range_ops I should be able to fairly easily write one myself.
>>>
>>> The range_ops class is too low-level for this.  It folds pairs of 
>>> ranges, and has no concept of gimple or tree expressions.  To roll 
>>> your own solution you'd have to walk the IL, parse the gimple 
>>> statements and then call range-ops on your adjusted arguments.  A 
>>> better approach would be to overload range_of_stmt as Andrew 
>>> suggested (see below).
>>>
>>>>
>>>>> So maybe you want an "overflow calculator" which is similar to the 
>>>>> ranger, and  identifies which kinds of stmts expect certain range 
>>>>> of results, and if they get results outside of  the expected 
>>>>> ranges, triggers a flag.
>>>>> maybe it could convert all unsigned and signed  values to signed, 
>>>>> and then work in a higher precision,  and then if any intermediate 
>>>>> result cant be accurately cast back to the original type,  it gets 
>>>>> flagged? is that the kind of overflow that is cared about?
>>>>
>>>> There are two kinds of "overflow:" unsigned wrapping and signed
>>>> overflow.  Both are a problem in allocation calls like:
>>>>
>>>>    int *p = malloc (nelts * size);
>>>>
>>>> If nelts and size are signed, the compile-time result seems to
>>>> be done in saturation arithmetic (its range looks to be capped
>>>> at TYPE_MAX).  If they are unsigned, they wrap around zero.
>>>> Neither is usually intended by the user but the wraparound is
>>>> the more dangerous of the two because it's well-defined and
>>>> because it shrinks the size to a very small number.
>>>>
>>>>>
>>>>> I dont know... this isnt my space :-)       Im just telling you 
>>>>> that I'm not sure the current range query mechanism is the right 
>>>>> place to be asking if an "overflow" occurred because I think the 
>>>>> general answer wont be useful.. too many kinds of overflows thrown 
>>>>> together.
>>>>>
>>>>> I think this requires more specifoc detailed information, and maybe 
>>>>> a simple overflow analyzer will do the trick based on range-ops.... 
>>>>> I think range-ops has the required abilities.  whether it needs 
>>>>> some enhancing, or something else exposed,  I'm not sure yet
>>>>
>>>> Thank you!  This was a useful clarification to what Aldy said.
>>>> I'd been meaning to follow up with him on this point when I was
>>>> done with what I'm working on but Jeff's question just got me
>>>> the answers I was looking for quicker.  Let me experiment with
>>>> range_ops and get back to you.
>>>>
>>>>>
>>>>>
>>>>> Andrew
>>>>>
>>>>> PS  One could always take a ranger and overload the range_of_stmt() 
>>>>> routine that calculates results, so that it calls current 
>>>>> functionality, then try converting the arguements to  higher 
>>>>> precision ranges and re-invoking range-ops on those arguments, get 
>>>>> a new higher precision range back and then look at the results vs 
>>>>> the normal ones. Might be a place to start experimenting...   then 
>>>>> pass a flag along sayign an overflow occurred.
>>>>
>>>> That sounds like something to look into.  I don't see range_of_stmt
>>>
>>> This would be my preferred approach.  It's clean and builds on top of 
>>> the current infrastructure without polluting other users of the 
>>> ranger. It definitely sounds like something worth pursuing.
>>>
>>>  > on trunk.  Is that the right name or has it not been merged yet?
>>>
>>> This is in the ranger proper which should come in, in the next week 
>>> or two (??).  You can see it in our staging branch 
>>> (users/aldyh/ranger-staging):
>>>
>>> class gimple_ranger : public range_query
>>> {
>>> public:
>>>    gimple_ranger (bool use_loop_info) : m_use_loop_info 
>>> (use_loop_info) { }
>>>    virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL) 
>>> OVERRIDE;
>>>    virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) 
>>> OVERRIDE;
>>>    virtual bool range_on_edge (irange &r, edge e, tree name) OVERRIDE;
>>>    virtual void range_on_entry (irange &r, basic_block bb, tree name);
>>>    virtual void range_on_exit (irange &r, basic_block bb, tree name);
>>> ...
>>> ...
>>> };
>>>
>>> The idea is that range_of_stmt() is called for all statements as the 
>>> IL is walked backwards.  You could just overload this, build a 
>>> throwaway statement you could pass to gimple_ranger::range_of_stmt, 
>>> and work from there.
>>
>> Thanks for the tutorial!  I see the patch you posted with this API
>> so I'll give it a try once it's in.
> 
> Note that the patch I posted was to standardize valuation throughout the 
> compiler.  It's not the ranger per se, but the underlying API for 
> querying ranges that ranger, vr_values, etc will use.  So you need to 
> wait for the full ranger to work on this.

I see.  In that case it might make sense to commit the patch as is
and switch it over to the ranger API when it's in.  Jeff, what's
your preference?

Martin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
  2020-09-21 15:13               ` Martin Sebor
@ 2020-11-09 16:00                 ` Martin Sebor
  2020-11-21  5:07                   ` Jeff Law
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Sebor @ 2020-11-09 16:00 UTC (permalink / raw)
  To: Jeff Law, gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554000.html

Jeff, I don't expect to have the cycles to reimplement this patch
using the Ranger APIs before stage 1 closes.  I'm open to giving
it a try in stage 3 if it's still in scope for GCC 11.  Otherwise,
is this patch okay to commit?

On 9/21/20 9:13 AM, Martin Sebor wrote:
> On 9/20/20 12:39 AM, Aldy Hernandez wrote:
>>
>>
>> On 9/19/20 11:22 PM, Martin Sebor wrote:
>>> On 9/18/20 12:29 AM, Aldy Hernandez wrote:
>>>>
>>>>
>>>> On 9/17/20 10:18 PM, Martin Sebor wrote:
>>>>> On 9/17/20 12:39 PM, Andrew MacLeod wrote:
>>>>>> On 9/17/20 12:08 PM, Martin Sebor via Gcc-patches wrote:
>>>>>>> On 9/16/20 9:23 PM, Jeff Law wrote:
>>>>>>>>
>>>>>>>> On 9/15/20 1:47 PM, Martin Sebor wrote:
>>>>>>>>> Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
>>>>>>>>> can lead to a subsequent buffer overflow corrupting the heap or
>>>>>>>>> stack.  The attached patch diagnoses a subset of these cases where
>>>>>>>>> the overflow/wraparound is still detectable.
>>>>>>>>>
>>>>>>>>> Besides regtesting GCC on x86_64-linux I also verified the warning
>>>>>>>>> doesn't introduce any false positives into Glibc or Binutils/GDB
>>>>>>>>> builds on the same target.
>>>>>>>>>
>>>>>>>>> Martin
>>>>>>>>>
>>>>>>>>> gcc-96838.diff
>>>>>>>>>
>>>>>>>>> PR middle-end/96838 - missing warning on integer overflow in 
>>>>>>>>> calls to allocation functions
>>>>>>>>>
>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>
>>>>>>>>>     PR middle-end/96838
>>>>>>>>>     * calls.c (eval_size_vflow): New function.
>>>>>>>>>     (get_size_range): Call it.  Add argument.
>>>>>>>>>     (maybe_warn_alloc_args_overflow): Diagnose 
>>>>>>>>> overflow/wraparound.
>>>>>>>>>     * calls.h (get_size_range): Add argument.
>>>>>>>>>
>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>
>>>>>>>>>     PR middle-end/96838
>>>>>>>>>     * gcc.dg/Walloc-size-larger-than-19.c: New test.
>>>>>>>>>     * gcc.dg/Walloc-size-larger-than-20.c: New test.
>>>>>>>>
>>>>>>>> If an attacker can control an integer overflow that feeds an 
>>>>>>>> allocation, then they can do all kinds of bad things.  In fact, 
>>>>>>>> when my son was asking me attack vectors, this is one I said I'd 
>>>>>>>> look at if I were a bad guy.
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm a bit surprised you can't just query the range of the 
>>>>>>>> argument and get the overflow status out of that range, but I 
>>>>>>>> don't see that in the APIs.  How painful would it be to make 
>>>>>>>> that part of the API? The conceptual model would be to just ask 
>>>>>>>> for the range of the argument to malloc which would include the 
>>>>>>>> range and a status bit indicating the computation might have 
>>>>>>>> overflowed.
>>>>>>>
>>>>>>   Do we know if it did/would have wrapped? sure.  since we have to 
>>>>>> do the math.    so you are correct in that the information is 
>>>>>> there. but is it useful?
>>>>>>
>>>>>> We are in the very annoying habit of subtracting one by adding 
>>>>>> 0xFFFFFFF.  which means you get an overflow for unsigned when you 
>>>>>> subtract one.   From what I have seen of unsigned math, we would 
>>>>>> be flagging very many operations as overflows, so you would still 
>>>>>> have the difficulty of figuring out whether its a "real" overflow 
>>>>>> or a fake one because of the way we do unsigned math
>>>>>
>>>>> You and me both :)
>>>>>
>>>>>>
>>>>>> At the very start, I did have an overflow flag in the range 
>>>>>> class... but it was turning out to be fairly useless so it was 
>>>>>> removed.
>>>>>> .
>>>>>>
>>>>>>> I agree that being able to evaluate an expression in an as-if
>>>>>>> infinite precision (in addition to its type) would be helpful.
>>>>>>
>>>>>> SO again, we get back to adding 0x0fffff when we are trying to 
>>>>>> subtract one...  now, with infinite precision you are going to see
>>>>>>
>>>>>>   [2,10]  - 1      we end up with [2,10]+0xFFFFFF, which will now 
>>>>>> give you  [0x100000001, 0x100000009]    so its going to look like 
>>>>>> it overflowed?
>>>>>>
>>>>>>
>>>>>>
>>>>>>> But just to make sure I understood correctly, let me ask again
>>>>>>> using an example:
>>>>>>>
>>>>>>>   void* f (size_t n)
>>>>>>>   {
>>>>>>>     if (n < PTRDIFF_MAX / 2)
>>>>>>>       n = PTRDIFF_MAX / 2;
>>>>>>>
>>>>>>>     return malloc (n * sizeof (int));
>>>>>>>   }
>>>>>>>
>>>>>>> Can the unsigned wraparound in the argument be readily detected?
>>>>>>>
>>>>>>> On trunk, this ends up with the following:
>>>>>>>
>>>>>>>   # RANGE [4611686018427387903, 18446744073709551615]
>>>>>>>   _6 = MAX_EXPR <n_2(D), 4611686018427387903>;
>>>>>>>   # RANGE [0, 18446744073709551615] NONZERO 18446744073709551612
>>>>>>>   _1 = _6 * 4;
>>>>>>>   ...
>>>>>>>   p_5 = mallocD.1206 (_1); [tail call]
>>>>>>>   ...
>>>>>>>   return p_5;
>>>>>>>
>>>>>>> so _1's range reflects the wraparound in size_t, but _6's range
>>>>>>> has enough information to uncover it.  So detecting it is possible
>>>>>>> and is done in the patch so we get a warning:
>>>>>>>
>>>>>>> warning: argument 1 range [18446744073709551612, 
>>>>>>> 0x3fffffffffffffffc] is too large to represent in ‘long unsigned 
>>>>>>> int’ [-Walloc-size-larger-than=]
>>>>>>>     6 |   return malloc (n * sizeof (int));
>>>>>>>       |          ^~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>
>>>>>>> The code is very simplistic and only handles a small subset of 
>>>>>>> cases.
>>>>>>> It could be generalized and exposed by a more generic API but it 
>>>>>>> does
>>>>>>> seem like the ranger must already have all the logic built into 
>>>>>>> it so
>>>>>>> if it isn't exposed now it should be a matter of opening it up.
>>>>>>
>>>>>> everything is exposed in range-ops.  well, mostly.
>>>>>> if we have _1 = _6 * 4
>>>>>>
>>>>>> if one wanted to do that infinite precision, you query the range 
>>>>>> for _6, and the range for 4 (which would be [4,4] :-)
>>>>>> range_of_expr (op1r, _6, stmt)
>>>>>> range_of_expr (op2r, 4, stmt)
>>>>>>
>>>>>> you could take their current types, and cast those ranges to 
>>>>>> whatever the next higher precsion is,
>>>>>> range_cast  (op1r, highertype)
>>>>>> range_cast (op2r, highertype)
>>>>>> then invoke the operation on those parameters
>>>>>>
>>>>>> gimple_range_fold (r, stmt,  op1r, op2r)
>>>>>>
>>>>>> and that will do your operation in the higher precision.  you 
>>>>>> could compare that to the value in regular precision too i suppose.
>>>>>
>>>>> The patch does pretty much exactly what you described, except in
>>>>> offset_int, and only for a limited set of arithmetic operations.
>>>>> It figures out if an unsigned expression wrapped simply by checking
>>>>> to see if the mathematically correct result fits in its type.
>>>>>
>>>>> It sounds like I should be able to use the range APIs above instead
>>>>> and get the same result, but for arbitrary expressions.  I don't
>>>>> need to (and very well can't) compare the infinitely precise result
>>>>> to the regular result because when it's a range that wraps the result
>>>>> is the full range of the type (i.e., [0, SIZE_MAX] in the test above).
>>>>>
>>>>>>
>>>>>> The ranger is designed to track ranges as they are represented in 
>>>>>> the IL.  You are asking for us to interpret the IL as something 
>>>>>> other than what is there, and increase the precision.  Thats a 
>>>>>> different task.
>>>>>>
>>>>>> could that be done?    maybe. It might involve parallel tracking 
>>>>>> of ssa-Name ranges in a higher precision... and recalculating 
>>>>>> every expression using those values.   Im not convinced that a 
>>>>>> generalized ranger which works in higher precision is really going 
>>>>>> to solve the problem the way you hope it would.. i think we'd get 
>>>>>> a lot of false info when unsigned values are involved
>>>>>
>>>>> I don't think that's necessary for unsigned wrapping.  Most of
>>>>> the time the (possibly) wrapped result is what we need because
>>>>> that's what the languages say is supposed to happen.  The cases
>>>>> when we need to check for the dangerous wraparound (or overflow)
>>>>> should be rare (just allocation calls) and can be handled on
>>>>> demand by walking the IL and doing the computation in the infinite
>>>>> precision.
>>>>>
>>>>> For signed overflow, though, computing the mathematically correct
>>>>> result seems preferable.  Or at least tracking (and propagating)
>>>>> the overflow bit should be.  That way we could tell when
>>>>> an expression is definitely invalid.
>>>>>
>>>>>>
>>>>>> If I were to see a clear solution/description as to what is a 
>>>>>> problematic overflow and what isnt, there might be something more 
>>>>>> general that could be done...
>>>>>
>>>>> Does the distinction above (unsigned wrapping/signed overflow) help?
>>>>>
>>>>>>
>>>>>>> I mentioned it to Aldy in response to the irange best practices
>>>>>>> document.  He says there's no way to do that and no plans to
>>>>>>> make it possible.
>>>>>>>
>>>>>> Thats not quite what he said.   . He said  "If the operation may 
>>>>>> wrap or overflow, it will show up as so in the range.  I don't 
>>>>>> think we have any plans of changing this behavior."
>>>>>>
>>>>>> In fact. looking back, he said the same thing I just did....  
>>>>>> There IS a mechanism there for working in higher precision should 
>>>>>> one desire to do so, but as I pointed out above, Im not convinced 
>>>>>> as a general thing in the ranger it will work how you imagine.
>>>>>>
>>>>>> My guess is what you really want is to be invoking range-ops on 
>>>>>> stmts you care about whether they might overflow or not with 
>>>>>> higher precision versions and then identify the cases which 
>>>>>> trigger that you actually care about.
>>>>>
>>>>> Yes, it looks like that's the way to go.  I was hoping the ranger
>>>>> had an API that would do all that for me (i.e., walk the IL and
>>>>> compute the range of the result on demand in the precision of
>>>>> my choice), since it has to do that as it is, except that it uses
>>>>> the precision of the expression.  But if not, it sounds like with
>>>>> range_ops I should be able to fairly easily write one myself.
>>>>
>>>> The range_ops class is too low-level for this.  It folds pairs of 
>>>> ranges, and has no concept of gimple or tree expressions.  To roll 
>>>> your own solution you'd have to walk the IL, parse the gimple 
>>>> statements and then call range-ops on your adjusted arguments.  A 
>>>> better approach would be to overload range_of_stmt as Andrew 
>>>> suggested (see below).
>>>>
>>>>>
>>>>>> So maybe you want an "overflow calculator" which is similar to the 
>>>>>> ranger, and  identifies which kinds of stmts expect certain range 
>>>>>> of results, and if they get results outside of  the expected 
>>>>>> ranges, triggers a flag.
>>>>>> maybe it could convert all unsigned and signed  values to signed, 
>>>>>> and then work in a higher precision,  and then if any intermediate 
>>>>>> result cant be accurately cast back to the original type,  it gets 
>>>>>> flagged? is that the kind of overflow that is cared about?
>>>>>
>>>>> There are two kinds of "overflow:" unsigned wrapping and signed
>>>>> overflow.  Both are a problem in allocation calls like:
>>>>>
>>>>>    int *p = malloc (nelts * size);
>>>>>
>>>>> If nelts and size are signed, the compile-time result seems to
>>>>> be done in saturation arithmetic (its range looks to be capped
>>>>> at TYPE_MAX).  If they are unsigned, they wrap around zero.
>>>>> Neither is usually intended by the user but the wraparound is
>>>>> the more dangerous of the two because it's well-defined and
>>>>> because it shrinks the size to a very small number.
>>>>>
>>>>>>
>>>>>> I dont know... this isnt my space :-)       Im just telling you 
>>>>>> that I'm not sure the current range query mechanism is the right 
>>>>>> place to be asking if an "overflow" occurred because I think the 
>>>>>> general answer wont be useful.. too many kinds of overflows thrown 
>>>>>> together.
>>>>>>
>>>>>> I think this requires more specifoc detailed information, and 
>>>>>> maybe a simple overflow analyzer will do the trick based on 
>>>>>> range-ops.... I think range-ops has the required abilities.  
>>>>>> whether it needs some enhancing, or something else exposed,  I'm 
>>>>>> not sure yet
>>>>>
>>>>> Thank you!  This was a useful clarification to what Aldy said.
>>>>> I'd been meaning to follow up with him on this point when I was
>>>>> done with what I'm working on but Jeff's question just got me
>>>>> the answers I was looking for quicker.  Let me experiment with
>>>>> range_ops and get back to you.
>>>>>
>>>>>>
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>> PS  One could always take a ranger and overload the 
>>>>>> range_of_stmt() routine that calculates results, so that it calls 
>>>>>> current functionality, then try converting the arguements to  
>>>>>> higher precision ranges and re-invoking range-ops on those 
>>>>>> arguments, get a new higher precision range back and then look at 
>>>>>> the results vs the normal ones. Might be a place to start 
>>>>>> experimenting...   then pass a flag along sayign an overflow 
>>>>>> occurred.
>>>>>
>>>>> That sounds like something to look into.  I don't see range_of_stmt
>>>>
>>>> This would be my preferred approach.  It's clean and builds on top 
>>>> of the current infrastructure without polluting other users of the 
>>>> ranger. It definitely sounds like something worth pursuing.
>>>>
>>>>  > on trunk.  Is that the right name or has it not been merged yet?
>>>>
>>>> This is in the ranger proper which should come in, in the next week 
>>>> or two (??).  You can see it in our staging branch 
>>>> (users/aldyh/ranger-staging):
>>>>
>>>> class gimple_ranger : public range_query
>>>> {
>>>> public:
>>>>    gimple_ranger (bool use_loop_info) : m_use_loop_info 
>>>> (use_loop_info) { }
>>>>    virtual bool range_of_stmt (irange &r, gimple *, tree name = 
>>>> NULL) OVERRIDE;
>>>>    virtual bool range_of_expr (irange &r, tree name, gimple * = 
>>>> NULL) OVERRIDE;
>>>>    virtual bool range_on_edge (irange &r, edge e, tree name) OVERRIDE;
>>>>    virtual void range_on_entry (irange &r, basic_block bb, tree name);
>>>>    virtual void range_on_exit (irange &r, basic_block bb, tree name);
>>>> ...
>>>> ...
>>>> };
>>>>
>>>> The idea is that range_of_stmt() is called for all statements as the 
>>>> IL is walked backwards.  You could just overload this, build a 
>>>> throwaway statement you could pass to gimple_ranger::range_of_stmt, 
>>>> and work from there.
>>>
>>> Thanks for the tutorial!  I see the patch you posted with this API
>>> so I'll give it a try once it's in.
>>
>> Note that the patch I posted was to standardize valuation throughout 
>> the compiler.  It's not the ranger per se, but the underlying API for 
>> querying ranges that ranger, vr_values, etc will use.  So you need to 
>> wait for the full ranger to work on this.
> 
> I see.  In that case it might make sense to commit the patch as is
> and switch it over to the ranger API when it's in.  Jeff, what's
> your preference?
> 
> Martin


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
  2020-11-09 16:00                 ` Martin Sebor
@ 2020-11-21  5:07                   ` Jeff Law
  2020-11-21 13:26                     ` Andrew MacLeod
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Law @ 2020-11-21  5:07 UTC (permalink / raw)
  To: Martin Sebor, gcc-patches



On 11/9/20 9:00 AM, Martin Sebor wrote:
> Ping:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554000.html
>
> Jeff, I don't expect to have the cycles to reimplement this patch
> using the Ranger APIs before stage 1 closes.  I'm open to giving
> it a try in stage 3 if it's still in scope for GCC 11.  Otherwise,
> is this patch okay to commit?
So all we're going to get from the ranger is ranges of operands, right? 
Meaning that we still need to either roll our own evaluator
(eval_size_vflow) or overload range_for_stmt with our own, which likely
looks like eval_size_vflow anyway, right?

My hope was to avoid the roll our own evaluator, but that doesn't look
like it's in the cards in the reasonably near future.

If my summary of the current state is correct, then I'd suggest we go
with the patch as-is.  If we want to override eval_size_vflow in the
future, we can still do that.  And if we want to replace the call to
determine_value_range with a ranger API, that seems like a fairly
straightforward change to make in gcc-12.



Jeff


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
  2020-11-21  5:07                   ` Jeff Law
@ 2020-11-21 13:26                     ` Andrew MacLeod
  2020-11-23 21:38                       ` Martin Sebor
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew MacLeod @ 2020-11-21 13:26 UTC (permalink / raw)
  To: Jeff Law, Martin Sebor, gcc-patches

On 11/21/20 12:07 AM, Jeff Law wrote:
>
> On 11/9/20 9:00 AM, Martin Sebor wrote:
>> Ping:
>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554000.html
>>
>> Jeff, I don't expect to have the cycles to reimplement this patch
>> using the Ranger APIs before stage 1 closes.  I'm open to giving
>> it a try in stage 3 if it's still in scope for GCC 11.  Otherwise,
>> is this patch okay to commit?
> So all we're going to get from the ranger is ranges of operands, right?
> Meaning that we still need to either roll our own evaluator
> (eval_size_vflow) or overload range_for_stmt with our own, which likely
> looks like eval_size_vflow anyway, right?
>
> My hope was to avoid the roll our own evaluator, but that doesn't look
> like it's in the cards in the reasonably near future.

Is there a PR open showing what exactly you are looking for?
I'm using open PRs to track enhancement requests, and they will all feed 
back into the development roadmap  I am working on.



> If my summary of the current state is correct, then I'd suggest we go
> with the patch as-is.  If we want to override eval_size_vflow in the
> future, we can still do that.  And if we want to replace the call to
> determine_value_range with a ranger API, that seems like a fairly
> straightforward change to make in gcc-12.
>
>
>
> Jeff
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
  2020-11-21 13:26                     ` Andrew MacLeod
@ 2020-11-23 21:38                       ` Martin Sebor
  2020-11-24 17:42                         ` Andrew MacLeod
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Sebor @ 2020-11-23 21:38 UTC (permalink / raw)
  To: Andrew MacLeod, Jeff Law, gcc-patches

On 11/21/20 6:26 AM, Andrew MacLeod wrote:
> On 11/21/20 12:07 AM, Jeff Law wrote:
>>
>> On 11/9/20 9:00 AM, Martin Sebor wrote:
>>> Ping:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554000.html
>>>
>>> Jeff, I don't expect to have the cycles to reimplement this patch
>>> using the Ranger APIs before stage 1 closes.  I'm open to giving
>>> it a try in stage 3 if it's still in scope for GCC 11.  Otherwise,
>>> is this patch okay to commit?
>> So all we're going to get from the ranger is ranges of operands, right?
>> Meaning that we still need to either roll our own evaluator
>> (eval_size_vflow) or overload range_for_stmt with our own, which likely
>> looks like eval_size_vflow anyway, right?
>>
>> My hope was to avoid the roll our own evaluator, but that doesn't look
>> like it's in the cards in the reasonably near future.
> 
> Is there a PR open showing what exactly you are looking for?
> I'm using open PRs to track enhancement requests, and they will all feed 
> back into the development roadmap  I am working on.

Not that I know of.  The background is upthread, in particular in
Aldy's response here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554242.html

I like the suggestion and if/when I have the time I'd like to give
it a try.  Until then, I think the patch is useful on its own so
I'll go with it for now.

Longer term, I do hope we can revisit the idea of computing either
mathematically correct ranges alongside those required by the language
semantics, or tracking signed overflow or unsigned wraparound.  E.g.,
in:

   void* f (int n)
   {
     if (n < INT_MAX / 3)
       n = INT_MAX / 3;

     n *= sizeof (int);
     // n is [(INT_MAX / 3) * 4, INF] mathematically
     // undefined due to overflow in C
     // but [INT_MIN, INT_MAX] according to VRP

     return malloc (n);
   }

and in

   void* g (unsigned n)
   {
     if (n < UINT_MAX / 3)
       n = UINT_MAX / 3;

     n *= sizeof (int);
     // n is [(UINT_MAX / 3) * 4, INF] mathematically
     // but [0, UINT_MAX] in C

     return malloc (n);
   }

In neither case is the overflow (or wraparound) easy to detect
without the Ranger's help.

Martin

>> If my summary of the current state is correct, then I'd suggest we go
>> with the patch as-is.  If we want to override eval_size_vflow in the
>> future, we can still do that.  And if we want to replace the call to
>> determine_value_range with a ranger API, that seems like a fairly
>> straightforward change to make in gcc-12.
>>
>>
>>
>> Jeff
>>
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
  2020-11-23 21:38                       ` Martin Sebor
@ 2020-11-24 17:42                         ` Andrew MacLeod
  2020-11-24 17:44                           ` Andrew MacLeod
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew MacLeod @ 2020-11-24 17:42 UTC (permalink / raw)
  To: Martin Sebor, Jeff Law, gcc-patches

On 11/23/20 4:38 PM, Martin Sebor wrote:
> On 11/21/20 6:26 AM, Andrew MacLeod wrote:
>> On 11/21/20 12:07 AM, Jeff Law wrote:
>>>
>>> On 11/9/20 9:00 AM, Martin Sebor wrote:
>>>> Ping:
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554000.html
>>>>
>>>> Jeff, I don't expect to have the cycles to reimplement this patch
>>>> using the Ranger APIs before stage 1 closes.  I'm open to giving
>>>> it a try in stage 3 if it's still in scope for GCC 11. Otherwise,
>>>> is this patch okay to commit?
>>> So all we're going to get from the ranger is ranges of operands, right?
>>> Meaning that we still need to either roll our own evaluator
>>> (eval_size_vflow) or overload range_for_stmt with our own, which likely
>>> looks like eval_size_vflow anyway, right?
>>>
>>> My hope was to avoid the roll our own evaluator, but that doesn't look
>>> like it's in the cards in the reasonably near future.
>>
>> Is there a PR open showing what exactly you are looking for?
>> I'm using open PRs to track enhancement requests, and they will all 
>> feed back into the development roadmap  I am working on.
>
> Not that I know of.  The background is upthread, in particular in
> Aldy's response here:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554242.html
>
> I like the suggestion and if/when I have the time I'd like to give
> it a try.  Until then, I think the patch is useful on its own so
> I'll go with it for now.
>
> Longer term, I do hope we can revisit the idea of computing either
> mathematically correct ranges alongside those required by the language
> semantics, or tracking signed overflow or unsigned wraparound. E.g.,
> in:
>
>   void* f (int n)
>   {
>     if (n < INT_MAX / 3)
>       n = INT_MAX / 3;
>
>     n *= sizeof (int);
>     // n is [(INT_MAX / 3) * 4, INF] mathematically
>     // undefined due to overflow in C
>     // but [INT_MIN, INT_MAX] according to VRP

but sizeof returns a size_t.. which is an unsigned. thus the multiply is 
promoted to an unsigned multiply  which means there is lots of wrapping 
and I don't see how you can conclude those ranges?    [INT_MIN, INT_MAX] 
are all possible outcomes based on the code that is generated.

If I change that to
   n *= (int) sizeof (int) to keep it as signed arithmetic, I see:


Folding statement: n_4 = n_1 * 4;
EVRP:hybrid: RVRP found singleton 2147483647
Queued stmt for removal.  Folds to: 2147483647
evrp visiting stmt _7 = malloc (n_4);

extract_range_from_stmt visiting:
_7 = foo (n_4);
Folding statement: _7 = foo (n_4);
EVRP:hybrid: RVRP found singleton 2147483647
Folded into: _7 = malloc (2147483647);

So I'm not sure what exactly you want to do?  We are calculating what 
the program can produce?

Why do we care about alternative calculations?

Andrew


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
  2020-11-24 17:42                         ` Andrew MacLeod
@ 2020-11-24 17:44                           ` Andrew MacLeod
  2020-11-24 18:39                             ` Martin Sebor
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew MacLeod @ 2020-11-24 17:44 UTC (permalink / raw)
  To: Martin Sebor, Jeff Law, gcc-patches

On 11/24/20 12:42 PM, Andrew MacLeod wrote:
> On 11/23/20 4:38 PM, Martin Sebor wrote:
>> On 11/21/20 6:26 AM, Andrew MacLeod wrote:
>>> On 11/21/20 12:07 AM, Jeff Law wrote:
>>>>
>>>> On 11/9/20 9:00 AM, Martin Sebor wrote:
>>>>> Ping:
>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554000.html
>>>>>
>>>>> Jeff, I don't expect to have the cycles to reimplement this patch
>>>>> using the Ranger APIs before stage 1 closes.  I'm open to giving
>>>>> it a try in stage 3 if it's still in scope for GCC 11. Otherwise,
>>>>> is this patch okay to commit?
>>>> So all we're going to get from the ranger is ranges of operands, 
>>>> right?
>>>> Meaning that we still need to either roll our own evaluator
>>>> (eval_size_vflow) or overload range_for_stmt with our own, which 
>>>> likely
>>>> looks like eval_size_vflow anyway, right?
>>>>
>>>> My hope was to avoid the roll our own evaluator, but that doesn't look
>>>> like it's in the cards in the reasonably near future.
>>>
>>> Is there a PR open showing what exactly you are looking for?
>>> I'm using open PRs to track enhancement requests, and they will all 
>>> feed back into the development roadmap  I am working on.
>>
>> Not that I know of.  The background is upthread, in particular in
>> Aldy's response here:
>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554242.html
>>
>> I like the suggestion and if/when I have the time I'd like to give
>> it a try.  Until then, I think the patch is useful on its own so
>> I'll go with it for now.
>>
>> Longer term, I do hope we can revisit the idea of computing either
>> mathematically correct ranges alongside those required by the language
>> semantics, or tracking signed overflow or unsigned wraparound. E.g.,
>> in:
>>
>>   void* f (int n)
>>   {
>>     if (n < INT_MAX / 3)
>>       n = INT_MAX / 3;
>>
>>     n *= sizeof (int);
>>     // n is [(INT_MAX / 3) * 4, INF] mathematically
>>     // undefined due to overflow in C
>>     // but [INT_MIN, INT_MAX] according to VRP
>
> but sizeof returns a size_t.. which is an unsigned. thus the multiply 
> is promoted to an unsigned multiply  which means there is lots of 
> wrapping and I don't see how you can conclude those ranges?    
> [INT_MIN, INT_MAX] are all possible outcomes based on the code that is 
> generated.
>
> If I change that to
>   n *= (int) sizeof (int) to keep it as signed arithmetic, I see:
>
>
> Folding statement: n_4 = n_1 * 4;
> EVRP:hybrid: RVRP found singleton 2147483647
> Queued stmt for removal.  Folds to: 2147483647
> evrp visiting stmt _7 = malloc (n_4);
>
> extract_range_from_stmt visiting:
> _7 = foo (n_4);
> Folding statement: _7 = foo (n_4);
> EVRP:hybrid: RVRP found singleton 2147483647
> Folded into: _7 = malloc (2147483647);
>
> So I'm not sure what exactly you want to do?  We are calculating what 
> the program can produce?
>
> Why do we care about alternative calculations?
>
Or rather, why do we want to do this?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
  2020-11-24 17:44                           ` Andrew MacLeod
@ 2020-11-24 18:39                             ` Martin Sebor
  2020-12-04 19:14                               ` Jeff Law
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Sebor @ 2020-11-24 18:39 UTC (permalink / raw)
  To: Andrew MacLeod, Jeff Law, gcc-patches

On 11/24/20 10:44 AM, Andrew MacLeod wrote:
> On 11/24/20 12:42 PM, Andrew MacLeod wrote:
>> On 11/23/20 4:38 PM, Martin Sebor wrote:
>>> On 11/21/20 6:26 AM, Andrew MacLeod wrote:
>>>> On 11/21/20 12:07 AM, Jeff Law wrote:
>>>>>
>>>>> On 11/9/20 9:00 AM, Martin Sebor wrote:
>>>>>> Ping:
>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554000.html
>>>>>>
>>>>>> Jeff, I don't expect to have the cycles to reimplement this patch
>>>>>> using the Ranger APIs before stage 1 closes.  I'm open to giving
>>>>>> it a try in stage 3 if it's still in scope for GCC 11. Otherwise,
>>>>>> is this patch okay to commit?
>>>>> So all we're going to get from the ranger is ranges of operands, 
>>>>> right?
>>>>> Meaning that we still need to either roll our own evaluator
>>>>> (eval_size_vflow) or overload range_for_stmt with our own, which 
>>>>> likely
>>>>> looks like eval_size_vflow anyway, right?
>>>>>
>>>>> My hope was to avoid the roll our own evaluator, but that doesn't look
>>>>> like it's in the cards in the reasonably near future.
>>>>
>>>> Is there a PR open showing what exactly you are looking for?
>>>> I'm using open PRs to track enhancement requests, and they will all 
>>>> feed back into the development roadmap  I am working on.
>>>
>>> Not that I know of.  The background is upthread, in particular in
>>> Aldy's response here:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554242.html
>>>
>>> I like the suggestion and if/when I have the time I'd like to give
>>> it a try.  Until then, I think the patch is useful on its own so
>>> I'll go with it for now.
>>>
>>> Longer term, I do hope we can revisit the idea of computing either
>>> mathematically correct ranges alongside those required by the language
>>> semantics, or tracking signed overflow or unsigned wraparound. E.g.,
>>> in:
>>>
>>>   void* f (int n)
>>>   {
>>>     if (n < INT_MAX / 3)
>>>       n = INT_MAX / 3;
>>>
>>>     n *= sizeof (int);
>>>     // n is [(INT_MAX / 3) * 4, INF] mathematically
>>>     // undefined due to overflow in C
>>>     // but [INT_MIN, INT_MAX] according to VRP
>>
>> but sizeof returns a size_t.. which is an unsigned. thus the multiply 
>> is promoted to an unsigned multiply  which means there is lots of 
>> wrapping and I don't see how you can conclude those ranges? [INT_MIN, 
>> INT_MAX] are all possible outcomes based on the code that is generated.
>>
>> If I change that to
>>   n *= (int) sizeof (int) to keep it as signed arithmetic, I see:
>>
>>
>> Folding statement: n_4 = n_1 * 4;
>> EVRP:hybrid: RVRP found singleton 2147483647
>> Queued stmt for removal.  Folds to: 2147483647
>> evrp visiting stmt _7 = malloc (n_4);
>>
>> extract_range_from_stmt visiting:
>> _7 = foo (n_4);
>> Folding statement: _7 = foo (n_4);
>> EVRP:hybrid: RVRP found singleton 2147483647
>> Folded into: _7 = malloc (2147483647);
>>
>> So I'm not sure what exactly you want to do?  We are calculating what 
>> the program can produce?
>>
>> Why do we care about alternative calculations?
>>
> Or rather, why do we want to do this?

When computing the sizes of things, programmers commonly forget
to consider unsigned wrapping (or signed overflow).  We simply
assume it can't happen and that (for instance) N * sizeof (X)
is necessarily big enough for N elements of type X.  (Grepping
any code base for the pattern '\* sizeof' and looking for code
that tests that the result doesn't wrap is revealing.)

When overflow or wrapping does happen (typically because of poor
precondition checking) it often leads to bugs when we end up
allocating less space than we need and use.  A simple example
to help illustrate what I mean:

   void* g (int *a, int n)
   {
     a = realloc (a, n * sizeof (int) + 32);
     for (int i = n; i != n + 32; ++i)
       a[i] = f ();
   }

In ILP32, if (n > INT_MAX / 4 - 32) holds, n * sizeof(int) will
wrap around zero.  The realloc call will end up allocating less
space than expected, and the loop will write past the end of
the allocated block.

(The bug above can only be detected if we know n's range.
I left that part out.)

Historically, bugs caused by integer overflow and wrapping have
been among the most serious security weaknesses.  Detecting these
mistakes will help prevent some of these.

The problem is that according to C/C++, nothing in the function
above is undefined except for the buffer overflow in the loop,
and the buffer overflow only happens because of the well-defined
integer wrapping.  To detect the wrapping, we either need to do
the computation in as-if infinite math and compare the final result
to the result we get under C's truncating rules, or we need to set
and propagate the "wraparound" bit throughout the computation.

Martin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
  2020-11-24 18:39                             ` Martin Sebor
@ 2020-12-04 19:14                               ` Jeff Law
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Law @ 2020-12-04 19:14 UTC (permalink / raw)
  To: Martin Sebor, Andrew MacLeod, gcc-patches



On 11/24/20 11:39 AM, Martin Sebor wrote:
> On 11/24/20 10:44 AM, Andrew MacLeod wrote:
>> On 11/24/20 12:42 PM, Andrew MacLeod wrote:
>>> On 11/23/20 4:38 PM, Martin Sebor wrote:
>>>> On 11/21/20 6:26 AM, Andrew MacLeod wrote:
>>>>> On 11/21/20 12:07 AM, Jeff Law wrote:
>>>>>>
>>>>>> On 11/9/20 9:00 AM, Martin Sebor wrote:
>>>>>>> Ping:
>>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554000.html
>>>>>>>
>>>>>>>
>>>>>>> Jeff, I don't expect to have the cycles to reimplement this patch
>>>>>>> using the Ranger APIs before stage 1 closes.  I'm open to giving
>>>>>>> it a try in stage 3 if it's still in scope for GCC 11. Otherwise,
>>>>>>> is this patch okay to commit?
>>>>>> So all we're going to get from the ranger is ranges of operands,
>>>>>> right?
>>>>>> Meaning that we still need to either roll our own evaluator
>>>>>> (eval_size_vflow) or overload range_for_stmt with our own, which
>>>>>> likely
>>>>>> looks like eval_size_vflow anyway, right?
>>>>>>
>>>>>> My hope was to avoid the roll our own evaluator, but that doesn't
>>>>>> look
>>>>>> like it's in the cards in the reasonably near future.
>>>>>
>>>>> Is there a PR open showing what exactly you are looking for?
>>>>> I'm using open PRs to track enhancement requests, and they will
>>>>> all feed back into the development roadmap  I am working on.
>>>>
>>>> Not that I know of.  The background is upthread, in particular in
>>>> Aldy's response here:
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554242.html
>>>>
>>>> I like the suggestion and if/when I have the time I'd like to give
>>>> it a try.  Until then, I think the patch is useful on its own so
>>>> I'll go with it for now.
>>>>
>>>> Longer term, I do hope we can revisit the idea of computing either
>>>> mathematically correct ranges alongside those required by the language
>>>> semantics, or tracking signed overflow or unsigned wraparound. E.g.,
>>>> in:
>>>>
>>>>   void* f (int n)
>>>>   {
>>>>     if (n < INT_MAX / 3)
>>>>       n = INT_MAX / 3;
>>>>
>>>>     n *= sizeof (int);
>>>>     // n is [(INT_MAX / 3) * 4, INF] mathematically
>>>>     // undefined due to overflow in C
>>>>     // but [INT_MIN, INT_MAX] according to VRP
>>>
>>> but sizeof returns a size_t.. which is an unsigned. thus the
>>> multiply is promoted to an unsigned multiply  which means there is
>>> lots of wrapping and I don't see how you can conclude those ranges?
>>> [INT_MIN, INT_MAX] are all possible outcomes based on the code that
>>> is generated.
>>>
>>> If I change that to
>>>   n *= (int) sizeof (int) to keep it as signed arithmetic, I see:
>>>
>>>
>>> Folding statement: n_4 = n_1 * 4;
>>> EVRP:hybrid: RVRP found singleton 2147483647
>>> Queued stmt for removal.  Folds to: 2147483647
>>> evrp visiting stmt _7 = malloc (n_4);
>>>
>>> extract_range_from_stmt visiting:
>>> _7 = foo (n_4);
>>> Folding statement: _7 = foo (n_4);
>>> EVRP:hybrid: RVRP found singleton 2147483647
>>> Folded into: _7 = malloc (2147483647);
>>>
>>> So I'm not sure what exactly you want to do?  We are calculating
>>> what the program can produce?
>>>
>>> Why do we care about alternative calculations?
>>>
>> Or rather, why do we want to do this?
>
> When computing the sizes of things, programmers commonly forget
> to consider unsigned wrapping (or signed overflow).  We simply
> assume it can't happen and that (for instance) N * sizeof (X)
> is necessarily big enough for N elements of type X.  (Grepping
> any code base for the pattern '\* sizeof' and looking for code
> that tests that the result doesn't wrap is revealing.)
>
> When overflow or wrapping does happen (typically because of poor
> precondition checking) it often leads to bugs when we end up
> allocating less space than we need and use.  A simple example
> to help illustrate what I mean:
>
>   void* g (int *a, int n)
>   {
>     a = realloc (a, n * sizeof (int) + 32);
>     for (int i = n; i != n + 32; ++i)
>       a[i] = f ();
>   }
>
> In ILP32, if (n > INT_MAX / 4 - 32) holds, n * sizeof(int) will
> wrap around zero.  The realloc call will end up allocating less
> space than expected, and the loop will write past the end of
> the allocated block.
>
> (The bug above can only be detected if we know n's range.
> I left that part out.)
>
> Historically, bugs caused by integer overflow and wrapping have
> been among the most serious security weaknesses.  Detecting these
> mistakes will help prevent some of these.
>
> The problem is that according to C/C++, nothing in the function
> above is undefined except for the buffer overflow in the loop,
> and the buffer overflow only happens because of the well-defined
> integer wrapping.  To detect the wrapping, we either need to do
> the computation in as-if infinite math and compare the final result
> to the result we get under C's truncating rules, or we need to set
> and propagate the "wraparound" bit throughout the computation.
Just to add a bit to Martin's note.  Yes, an overflow of the size passed
to an allocation routine is a significant security risk and these have
been actively exploited through the years.  In simpest terms of an
attacker can control the argument to malloc, then they can make
subsequent writes scribble into fairly arbitrary chunks of memory.

There was a great example a while back where there was an overflow in an
allocation call from the low level glibc printf support.  By
manipulating the exact overflow result the attacker was able to then
control a write of 0 into what should have been in the allocated buffer
to instead write the 0 into a private field of a FILE* pointer that
controlled whether or not the output string fortification hardening was
applied to the file stream.  Turning off the fortification on the stream
in turn allowed them to use a positional printf argument attack to
perform arbitrary writes and ultimately control an application (cups
daemon IIRC).


Jeff


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-12-04 19:14 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-15 19:47 [PATCH] warn for integer overflow in allocation calls (PR 96838) Martin Sebor
2020-09-16  9:39 ` Bernhard Reutner-Fischer
2020-09-17  3:23 ` Jeff Law
2020-09-17 16:08   ` Martin Sebor
2020-09-17 18:39     ` Andrew MacLeod
2020-09-17 20:18       ` Martin Sebor
2020-09-18  6:29         ` Aldy Hernandez
2020-09-19 21:22           ` Martin Sebor
2020-09-20  6:39             ` Aldy Hernandez
2020-09-21 15:13               ` Martin Sebor
2020-11-09 16:00                 ` Martin Sebor
2020-11-21  5:07                   ` Jeff Law
2020-11-21 13:26                     ` Andrew MacLeod
2020-11-23 21:38                       ` Martin Sebor
2020-11-24 17:42                         ` Andrew MacLeod
2020-11-24 17:44                           ` Andrew MacLeod
2020-11-24 18:39                             ` Martin Sebor
2020-12-04 19:14                               ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).