PR analyzer/94362 Partial Fix

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* PR analyzer/94362 Partial Fix
@ 2021-02-27 10:04 brian.sobulefsky
  2021-03-02  2:30 ` David Malcolm
  0 siblings, 1 reply; 9+ messages in thread
From: brian.sobulefsky @ 2021-02-27 10:04 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2977 bytes --]

Hi,

Please find a patch to fix part of the bug PR analyzer/94362. This bug is a
false positive for a null dereference found when compiling openssl. The cause
is the constraint_manager not knowing that i >= 0 within the for block:

for ( ; i-- > 0; )

The bug can be further reduced to the constraint manager not knowing that i >= 0
within the if block:

if (i-- > 0)

which is not replicated for other operators, such as prefix decrement. The
cause is that the constraint is applied to the initial_svalue of i, while it
is a binop_svalue of i that enters the block (with op PLUS and arg1 -1). The
constraint_manager does not have any constraints for this svalue and has no
handler. A handler has been added that essentially recurs on the remaining arg
if the other arg and other side of the condition are both constants and the op
is PLUS_EXPR or MINUS_EXPR.

This in essence fixed the problem, except an off by one error had been hiding
in range::eval_condition. This error is hard to notice, because, for example,
the code

if(idx > 0)
  __analyzer_eval(idx >= 1);

will compile as (check -fdump-ipa-analyzer to see)

void test (int idx)
{
  _Bool _1;
  int _2;

  <bb 2> :
  if (idx_4(D) > 0)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  _1 = idx_4(D) > 0;
  _2 = (int) _1;
  __analyzer_eval (_2);

  <bb 4> :
  return;

}

and will print "TRUE" to the screen, but as you can see, it is for the wrong
reason, because we are not really testing the condition we wanted to test.

You can force the failure (i.e. "UNKNOWN") for yourself with the following:

void test(int idx)
{
  int check = 1;
  if(idx > 0)
    __analyzer_eval(idx >= check);
}

which the compiler will not "fix" on us. An examination of range::eval_condition
should convince you that there is an off by one error. Incidentally, I might
recommend doing away with "open intervals of integers" entirely.

When running the initial bug (the for loop), you will find that the analyzer
prints "UNKNOWN" twice for the postfix operator, and "TRUE" "UNKNOWN" for other
operators. This patch reduces postfix to the same state as the other operators.
The second "UNKNOWN" seems to be due to a second "iterated" pass through the
loop with a widening_svalue. A full fix of the bug will need a handler for the
widening svalue, and much more critically, a correct merge of the constraints
at the loop entrance. That, unfortunately, looks to be a hard problem.

This patch fixes a few xfails as noted in the commit message. These were tests
that were evidently devised to test whether the analyzer would understand
arithmetic being done on constrained values. Addition and subtraction is now
working as expected, a handler for multiplication and division can be added.

As was noted in those test files, consideration at some point should be given to
overflow.


Thank you,
Brian

Sent with ProtonMail Secure Email.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: pr94362-v00.patch --]
[-- Type: text/x-patch; name=pr94362-v00.patch, Size: 17544 bytes --]

commit d4052e8c273ca267f6dcf782084d60acfc50a609
Author: Brian Sobulefsky <brian.sobulefsky@protonmail.com>
Date:   Sat Feb 27 00:36:40 2021 -0800

    Changes to support eventual solution to bug PR analyzer/94362. This bug
    originated with a false positive null dereference during compilation of
    openssl. The bug is in effect caused by an error in constraint handling,
    specifically that within the for block:
    
    for ( ; i-- > 0; )
      {
      }
    
    the constraint_manager should know i >= 0 but does not. A reduced form of
    this bug was found where the constraint manager did not know within the if
    block:
    
    if (i-- > 0)
      {
      }
    
    that i >= 0. This latter error was only present for the postfix
    operators, and not for other forms, like --i > 0. It was due to the
    constraint being set for the initial_svalue associated with i, but a
    binop_svalue being what entered the if block for which no constraint
    rules existed.
    
    By adding handling logic for a binop_svalue that adds or
    subtracts a constant, this problem was solved. This logic was added to
    a new method, constraint_manager::maybe_fold_condition, with the
    intent of eventually adding more cases there (unary_svalue and
    widening_svalue for example). Additionally, an off by one error was
    found in range::eval_condition that needed to be corrected to get
    the expected result. Correction of this error was done in that
    subroutine, resulting in no more calls to below_lower_bound and
    above_upper_bound. As such, these functions were commented out and may
    be removed if not needed for anything else.
    
    This change does not entirely fix the initial bug pr94362, but it
    reduces the postfix operator to the same state as other operators. In the
    case of the for loop, there appears to be an "initial pass" through the
    loop, which the analyzer will now understand for postfix, and then an
    "iterated pass" with a widening_svalue that the analyzer does not
    understand for any condition found. This seems to be due to the merging
    of constraints and is under investigation.
    
    While not the original intent of this change, during run of the
    testsuite it was found that some errors that were marked xfail had been
    corrected in the files testsuite/gcc.dg/analyzer/operations.c and
    testsuite/gcc.dg/analyzer/params.c. These corrected errors were for
    addition and subtraction manipulations of variables within a
    conditional. Other manipulations, like multiply and divide, are still
    failing as expected because no handler has been added.
    
    gcc/analyzer/ChangeLog:
            PR analyzer/94362
            * analyzer/constraint-manager.cc
            (constraint_manager::eval_condition(svalue *, op, svalue *)):
            Add call to new function maybe_fold_condition.
            * analyzer/constraint-manager.cc
            (constraint_manager::maybe_fold_condition): New function.
            * analyzer/constraint-manager.cc (range::eval_condition): Update
            logic to correct off by one error
            * analyzer/constraint-manager.cc (range::below_lower_bound):
            Remove function as there are no longer any calls to it.
            * analyzer/constraint-manager.cc (range::above_upper_bound):
            Remove function as there are no longer any calls to it.
            * analyzer/constraint-manager.h (class constraint_manager): Add
            new function declaration maybe_fold_condition.
            * analyzer/constraint-manager.h (class range): Remove
            declarations of functions no longer called.
    
    gcc/testsuite/ChangeLog:
            PR analyzer/94362
            * gcc.dg/analyzer/operations.c: remove "desired", "xfail" and
            entire call to "bogus".
            * gcc.dg/analyzer/params.c: remove "desired", "xfail" and
            entire call to "bogus".

diff --git a/gcc/analyzer/constraint-manager.cc b/gcc/analyzer/constraint-manager.cc
index 4dadd200bee..5184ead2b3f 100644
--- a/gcc/analyzer/constraint-manager.cc
+++ b/gcc/analyzer/constraint-manager.cc
@@ -181,24 +181,56 @@ range::eval_condition (enum tree_code op, tree rhs_const) const
   if (tree single_element = copy.constrained_to_single_element ())
     return compare_constants (single_element, op, rhs_const);
 
+  /* Corner case in which constrained_to_single_element will not convert the
+     bounds to closure (i.e., a bound with only one end point) */
+  if(copy.m_lower_bound.m_constant == NULL_TREE
+      && copy.m_upper_bound.m_constant != NULL_TREE
+      && INTEGRAL_TYPE_P (TREE_TYPE (copy.m_upper_bound.m_constant)))
+    copy.m_upper_bound.ensure_closed(true);
+  if(copy.m_upper_bound.m_constant == NULL_TREE
+      && copy.m_lower_bound.m_constant != NULL_TREE
+      && INTEGRAL_TYPE_P (TREE_TYPE (copy.m_lower_bound.m_constant)))
+    copy.m_lower_bound.ensure_closed(false);
+
   switch (op)
     {
+
     case EQ_EXPR:
-      if (below_lower_bound (rhs_const))
+      if (copy.m_lower_bound.m_constant
+	       && compare_constants (rhs_const, LT_EXPR,
+		     copy.m_lower_bound.m_constant).is_true())
 	return tristate (tristate::TS_FALSE);
-      if (above_upper_bound (rhs_const))
+      if (copy.m_upper_bound.m_constant
+	  && compare_constants (rhs_const, GT_EXPR,
+		copy.m_upper_bound.m_constant).is_true())
 	return tristate (tristate::TS_FALSE);
       break;
 
     case LT_EXPR:
-    case LE_EXPR:
-      /* Qn: "X </<= RHS_CONST".  */
+     /* Qn: "X </<= RHS_CONST".  */
       /* If RHS_CONST > upper bound, then it's true.
 	 If RHS_CONST < lower bound, then it's false.
 	 Otherwise unknown.  */
-      if (above_upper_bound (rhs_const))
+
+      if (copy.m_upper_bound.m_constant
+	  && compare_constants (rhs_const, GT_EXPR,
+		copy.m_upper_bound.m_constant).is_true())
+	return tristate (tristate::TS_TRUE);
+      else if (copy.m_lower_bound.m_constant
+	       && compare_constants (rhs_const, LE_EXPR,
+		     copy.m_lower_bound.m_constant).is_true())
+	return tristate (tristate::TS_FALSE);
+      break;
+
+    case LE_EXPR:
+
+      if (copy.m_upper_bound.m_constant
+	  && compare_constants (rhs_const, GE_EXPR,
+		copy.m_upper_bound.m_constant).is_true())
 	return tristate (tristate::TS_TRUE);
-      if (below_lower_bound (rhs_const))
+      else if (copy.m_lower_bound.m_constant
+	       && compare_constants (rhs_const, LT_EXPR,
+		     copy.m_lower_bound.m_constant).is_true())
 	return tristate (tristate::TS_FALSE);
       break;
 
@@ -207,18 +239,37 @@ range::eval_condition (enum tree_code op, tree rhs_const) const
       /* If RHS_CONST < lower bound, then it's true.
 	 If RHS_CONST > upper bound, then it's false.
 	 Otherwise unknown.  */
-      if (below_lower_bound (rhs_const))
+      if (copy.m_lower_bound.m_constant
+	       && compare_constants (rhs_const, LT_EXPR,
+		     copy.m_lower_bound.m_constant).is_true())
 	return tristate (tristate::TS_TRUE);
-      if (above_upper_bound (rhs_const))
+      if (copy.m_upper_bound.m_constant
+	  && compare_constants (rhs_const, GT_EXPR,
+		copy.m_upper_bound.m_constant).is_true())
 	return tristate (tristate::TS_TRUE);
       break;
 
     case GE_EXPR:
+
+      if (copy.m_upper_bound.m_constant
+	  && compare_constants (rhs_const, GT_EXPR,
+		copy.m_upper_bound.m_constant).is_true())
+	return tristate (tristate::TS_FALSE);
+      else if (copy.m_lower_bound.m_constant
+	       && compare_constants (rhs_const, LE_EXPR,
+		     copy.m_lower_bound.m_constant).is_true())
+	return tristate (tristate::TS_TRUE);
+      break;
+
     case GT_EXPR:
-      /* Qn: "X >=/> RHS_CONST".  */
-      if (above_upper_bound (rhs_const))
+
+      if (copy.m_upper_bound.m_constant
+	  && compare_constants (rhs_const, GE_EXPR,
+		copy.m_upper_bound.m_constant).is_true())
 	return tristate (tristate::TS_FALSE);
-      if (below_lower_bound (rhs_const))
+      else if (copy.m_lower_bound.m_constant
+	       && compare_constants (rhs_const, LT_EXPR,
+		     copy.m_lower_bound.m_constant).is_true())
 	return tristate (tristate::TS_TRUE);
       break;
 
@@ -229,9 +280,15 @@ range::eval_condition (enum tree_code op, tree rhs_const) const
   return tristate (tristate::TS_UNKNOWN);
 }
 
+
+/* Since updating range::eval_condition to fix the off by one errors, these two
+   subroutines are no longer called anywhere else so they are commented out to
+   suppress any compiler warnings. If they will not be used anymore, it can be
+   completely removed. */
+
 /* Return true if RHS_CONST is below the lower bound of this range.  */
 
-bool
+/*bool
 range::below_lower_bound (tree rhs_const) const
 {
   if (!m_lower_bound.m_constant)
@@ -240,11 +297,11 @@ range::below_lower_bound (tree rhs_const) const
   return compare_constants (rhs_const,
 			    m_lower_bound.m_closed ? LT_EXPR : LE_EXPR,
 			    m_lower_bound.m_constant).is_true ();
-}
+}*/
 
 /* Return true if RHS_CONST is above the upper bound of this range.  */
 
-bool
+/*bool
 range::above_upper_bound (tree rhs_const) const
 {
   if (!m_upper_bound.m_constant)
@@ -253,7 +310,7 @@ range::above_upper_bound (tree rhs_const) const
   return compare_constants (rhs_const,
 			    m_upper_bound.m_closed ? GT_EXPR : GE_EXPR,
 			    m_upper_bound.m_constant).is_true ();
-}
+}*/
 
 /* class equiv_class.  */
 
@@ -1499,6 +1556,167 @@ constraint_manager::eval_condition (const svalue *lhs,
 	}
     }
 
+  /* Try folding the condition with any sub svalues */
+  {
+    tristate result_for_folding = maybe_fold_condition (lhs, op, rhs);
+    if (result_for_folding.is_known ())
+      return result_for_folding;
+  }
+
+  return tristate (tristate::TS_UNKNOWN);
+}
+
+/* Method to put logic for folding conditions. Currently contains logic for
+   folding a binop_svalue where one arg is a constant and the other side of
+   the condition is a constant. Inspiration was to handle the conditional
+   if (i-- > 0) __analyzer_eval (i >= 0);
+   
+   Other cases may be added, including perhaps logic to handle unaryop_svalue
+   and widening_svalue. */
+
+tristate
+constraint_manager::maybe_fold_condition (const svalue *lhs,
+				    enum tree_code op,
+				    const svalue *rhs) const
+{
+
+  if (const binop_svalue *binop_sval = lhs->dyn_cast_binop_svalue ())
+    {
+      const constant_svalue *cst_arg;
+      const svalue *other_arg = NULL;
+      const constant_svalue *cst_cond;
+      const svalue *new_cond = NULL;
+      tree_code bin_op;
+      if ((cst_arg = binop_sval->get_arg0 ()->dyn_cast_constant_svalue ()))
+	{
+	  if((cst_cond = rhs->dyn_cast_constant_svalue ()))
+	    {
+	      other_arg = binop_sval->get_arg0 ();
+	      bin_op = binop_sval->get_op();
+	      /* (CST_ARG +/- OTHER_ARG) OP CST_COND ==>
+	           +: OTHER_ARG OP (CST_COND - CST_ARG)
+		   -: (CST_ARG - CST_COND) OP OTHER_ARG */
+	      if(bin_op == PLUS_EXPR)
+		{
+		  new_cond
+		      = m_mgr->get_or_create_constant_svalue (fold_build2 (
+				    MINUS_EXPR, TREE_TYPE (
+				      cst_cond->get_constant ()),
+				    cst_cond->get_constant (),
+				    cst_arg->get_constant ()));
+		  return eval_condition (other_arg, op, new_cond);
+		}
+	      else if(bin_op == MINUS_EXPR)
+		{
+		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
+					  MINUS_EXPR, TREE_TYPE (
+					    cst_cond->get_constant ()),
+					  cst_arg->get_constant(),
+					  cst_cond->get_constant()));
+		  return eval_condition (new_cond, op, other_arg);
+		}
+	    }
+	}
+      else if((cst_arg = binop_sval->get_arg1 ()->dyn_cast_constant_svalue ()))
+	{
+	  if((cst_cond = rhs->dyn_cast_constant_svalue ()))
+	    {
+	      other_arg = binop_sval->get_arg0 ();
+	      bin_op = binop_sval->get_op();
+	      /* (OTHER_ARG +/- CST_ARG) OP CST_COND ==>
+	         OTHER_ARG OP (CST_COND -/+ CST_ARG) */
+	      if(bin_op == PLUS_EXPR)
+		{
+		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
+					  MINUS_EXPR, TREE_TYPE (
+					    cst_cond->get_constant ()),
+					  cst_cond->get_constant(),
+					  cst_arg->get_constant()));
+		}
+	      else if(bin_op == MINUS_EXPR)
+		{
+		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
+					  PLUS_EXPR, TREE_TYPE (
+					    cst_cond->get_constant ()),
+					  cst_cond->get_constant(),
+					  cst_arg->get_constant()));
+		}
+	      if (other_arg && new_cond){
+		return eval_condition (other_arg, op, new_cond);
+	      }
+
+	    }
+	} 
+    }
+  else if (const binop_svalue *binop_sval = rhs->dyn_cast_binop_svalue ())
+    {
+      const constant_svalue *cst_arg;
+      const svalue *other_arg = NULL;
+      const constant_svalue *cst_cond;
+      const svalue *new_cond = NULL;
+      tree_code bin_op;
+      if ((cst_arg = binop_sval->get_arg0 ()->dyn_cast_constant_svalue ()))
+	{
+	  if((cst_cond = rhs->dyn_cast_constant_svalue ()))
+	    {
+	      other_arg = binop_sval->get_arg0 ();
+	      bin_op = binop_sval->get_op();
+	      /* CST_COND OP (CST_ARG +/- OTHER_ARG) ==>
+	         +: (CST_COND - CST_ARG) OP OTHER_ARG
+		 -: OTHER_ARG OP (CST_ARG - CST_COND) */
+	      if(bin_op == PLUS_EXPR)
+		{
+		  new_cond
+		      = m_mgr->get_or_create_constant_svalue (fold_build2 (
+				    MINUS_EXPR, TREE_TYPE (
+				      cst_cond->get_constant ()),
+				    cst_cond->get_constant (),
+				    cst_arg->get_constant ()));
+		  return eval_condition (new_cond, op, other_arg);
+		}
+	      else if(bin_op == MINUS_EXPR)
+		{
+		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
+					  MINUS_EXPR, TREE_TYPE (
+					    cst_cond->get_constant ()),
+					  cst_arg->get_constant(),
+					  cst_cond->get_constant()));
+		  return eval_condition (other_arg, op, new_cond);
+		}
+
+
+	    }
+	}
+      else if((cst_arg = binop_sval->get_arg1 ()->dyn_cast_constant_svalue ()))
+	{
+	  if((cst_cond = rhs->dyn_cast_constant_svalue ()))
+	    {
+	      other_arg = binop_sval->get_arg0 ();
+	      bin_op = binop_sval->get_op();
+	      /* CST_COND OP (OTHER_ARG +/- CST_ARG) ==>
+	         (CST_COND -/+ CST_ARG) OP OTHER_ARG */
+	      if(bin_op == PLUS_EXPR)
+		{
+		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
+					  MINUS_EXPR, TREE_TYPE (
+					    cst_cond->get_constant ()),
+					  cst_cond->get_constant(),
+					  cst_arg->get_constant()));
+		}
+	      else if(bin_op == MINUS_EXPR)
+		{
+		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
+					  PLUS_EXPR, TREE_TYPE (
+					    cst_cond->get_constant ()),
+					  cst_cond->get_constant(),
+					  cst_arg->get_constant()));
+		}
+	      if (other_arg && new_cond)
+		return eval_condition (new_cond, op, other_arg);
+
+	    }
+	} 
+    }
   return tristate (tristate::TS_UNKNOWN);
 }
 
diff --git a/gcc/analyzer/constraint-manager.h b/gcc/analyzer/constraint-manager.h
index 3173610a723..9017a88ecac 100644
--- a/gcc/analyzer/constraint-manager.h
+++ b/gcc/analyzer/constraint-manager.h
@@ -57,8 +57,10 @@ struct range
 
   tristate eval_condition (enum tree_code op,
 			   tree rhs_const) const;
-  bool below_lower_bound (tree rhs_const) const;
-  bool above_upper_bound (tree rhs_const) const;
+
+  /* Commented out because no longer called from anywhere in program */
+  //bool below_lower_bound (tree rhs_const) const;
+  //bool above_upper_bound (tree rhs_const) const;
 
   bound m_lower_bound;
   bound m_upper_bound;
@@ -284,6 +286,9 @@ public:
   auto_vec<constraint> m_constraints;
 
  private:
+  tristate maybe_fold_condition (const svalue *lhs,
+			   	  enum tree_code op,
+			   	  const svalue *rhs) const;
   void add_constraint_internal (equiv_class_id lhs_id,
 				enum constraint_op c_op,
 				equiv_class_id rhs_id);
diff --git a/gcc/testsuite/gcc.dg/analyzer/operations.c b/gcc/testsuite/gcc.dg/analyzer/operations.c
index 79e76bccc66..87b3703a3d9 100644
--- a/gcc/testsuite/gcc.dg/analyzer/operations.c
+++ b/gcc/testsuite/gcc.dg/analyzer/operations.c
@@ -9,14 +9,14 @@ void test (int i, int j)
 
     i += 3;
 
-    __analyzer_eval (i > 45); /* { dg-warning "TRUE" "desired" { xfail *-*-* } } */
-    /* { dg-warning "UNKNOWN" "status quo" { target *-*-* } .-1 } */
+    __analyzer_eval (i > 45); /* { dg-warning "TRUE" } */
+ 
     /* TODO(xfail): do we really know this?  what about overflow?  */
 
     i -= 1;
 
-    __analyzer_eval (i > 44); /* { dg-warning "TRUE" "desired" { xfail *-*-* } } */
-    /* { dg-warning "UNKNOWN" "status quo" { target *-*-* } .-1 } */
+    __analyzer_eval (i > 44); /* { dg-warning "TRUE" } */
+
     /* TODO(xfail): do we really know this?  what about overflow?  */
 
     i = 3 * i;
diff --git a/gcc/testsuite/gcc.dg/analyzer/params.c b/gcc/testsuite/gcc.dg/analyzer/params.c
index 12bba70d6e4..a1c443b39eb 100644
--- a/gcc/testsuite/gcc.dg/analyzer/params.c
+++ b/gcc/testsuite/gcc.dg/analyzer/params.c
@@ -8,8 +8,8 @@ static int __analyzer_called_function(int j)
 
   k = j - 1;
 
-  __analyzer_eval (k > 3); /* { dg-warning "TRUE" "desired" { xfail *-*-* } } */
-  /* { dg-warning "UNKNOWN" "status quo" { target *-*-* } .-1 } */
+  __analyzer_eval (k > 3); /* { dg-warning "TRUE" } */
+
   /* TODO(xfail): we're not then updating based on the assignment.  */
 
   return k;
@@ -25,8 +25,8 @@ void test(int i)
 
     i = __analyzer_called_function(i);
 
-    __analyzer_eval (i > 3); /* { dg-warning "TRUE" "desired" { xfail *-*-* } } */
-    /* { dg-warning "UNKNOWN" "status quo" { target *-*-* } .-1 } */
+    __analyzer_eval (i > 3); /* { dg-warning "TRUE" } */
+
     /* TODO(xfail): we're not updating from the returned value.  */
   }
 

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: pr94362-v00.patch --]
[-- Type: text/x-patch; name=pr94362-v00.patch, Size: 17544 bytes --]

commit d4052e8c273ca267f6dcf782084d60acfc50a609
Author: Brian Sobulefsky <brian.sobulefsky@protonmail.com>
Date:   Sat Feb 27 00:36:40 2021 -0800

    Changes to support eventual solution to bug PR analyzer/94362. This bug
    originated with a false positive null dereference during compilation of
    openssl. The bug is in effect caused by an error in constraint handling,
    specifically that within the for block:
    
    for ( ; i-- > 0; )
      {
      }
    
    the constraint_manager should know i >= 0 but does not. A reduced form of
    this bug was found where the constraint manager did not know within the if
    block:
    
    if (i-- > 0)
      {
      }
    
    that i >= 0. This latter error was only present for the postfix
    operators, and not for other forms, like --i > 0. It was due to the
    constraint being set for the initial_svalue associated with i, but a
    binop_svalue being what entered the if block for which no constraint
    rules existed.
    
    By adding handling logic for a binop_svalue that adds or
    subtracts a constant, this problem was solved. This logic was added to
    a new method, constraint_manager::maybe_fold_condition, with the
    intent of eventually adding more cases there (unary_svalue and
    widening_svalue for example). Additionally, an off by one error was
    found in range::eval_condition that needed to be corrected to get
    the expected result. Correction of this error was done in that
    subroutine, resulting in no more calls to below_lower_bound and
    above_upper_bound. As such, these functions were commented out and may
    be removed if not needed for anything else.
    
    This change does not entirely fix the initial bug pr94362, but it
    reduces the postfix operator to the same state as other operators. In the
    case of the for loop, there appears to be an "initial pass" through the
    loop, which the analyzer will now understand for postfix, and then an
    "iterated pass" with a widening_svalue that the analyzer does not
    understand for any condition found. This seems to be due to the merging
    of constraints and is under investigation.
    
    While not the original intent of this change, during run of the
    testsuite it was found that some errors that were marked xfail had been
    corrected in the files testsuite/gcc.dg/analyzer/operations.c and
    testsuite/gcc.dg/analyzer/params.c. These corrected errors were for
    addition and subtraction manipulations of variables within a
    conditional. Other manipulations, like multiply and divide, are still
    failing as expected because no handler has been added.
    
    gcc/analyzer/ChangeLog:
            PR analyzer/94362
            * analyzer/constraint-manager.cc
            (constraint_manager::eval_condition(svalue *, op, svalue *)):
            Add call to new function maybe_fold_condition.
            * analyzer/constraint-manager.cc
            (constraint_manager::maybe_fold_condition): New function.
            * analyzer/constraint-manager.cc (range::eval_condition): Update
            logic to correct off by one error
            * analyzer/constraint-manager.cc (range::below_lower_bound):
            Remove function as there are no longer any calls to it.
            * analyzer/constraint-manager.cc (range::above_upper_bound):
            Remove function as there are no longer any calls to it.
            * analyzer/constraint-manager.h (class constraint_manager): Add
            new function declaration maybe_fold_condition.
            * analyzer/constraint-manager.h (class range): Remove
            declarations of functions no longer called.
    
    gcc/testsuite/ChangeLog:
            PR analyzer/94362
            * gcc.dg/analyzer/operations.c: remove "desired", "xfail" and
            entire call to "bogus".
            * gcc.dg/analyzer/params.c: remove "desired", "xfail" and
            entire call to "bogus".

diff --git a/gcc/analyzer/constraint-manager.cc b/gcc/analyzer/constraint-manager.cc
index 4dadd200bee..5184ead2b3f 100644
--- a/gcc/analyzer/constraint-manager.cc
+++ b/gcc/analyzer/constraint-manager.cc
@@ -181,24 +181,56 @@ range::eval_condition (enum tree_code op, tree rhs_const) const
   if (tree single_element = copy.constrained_to_single_element ())
     return compare_constants (single_element, op, rhs_const);
 
+  /* Corner case in which constrained_to_single_element will not convert the
+     bounds to closure (i.e., a bound with only one end point) */
+  if(copy.m_lower_bound.m_constant == NULL_TREE
+      && copy.m_upper_bound.m_constant != NULL_TREE
+      && INTEGRAL_TYPE_P (TREE_TYPE (copy.m_upper_bound.m_constant)))
+    copy.m_upper_bound.ensure_closed(true);
+  if(copy.m_upper_bound.m_constant == NULL_TREE
+      && copy.m_lower_bound.m_constant != NULL_TREE
+      && INTEGRAL_TYPE_P (TREE_TYPE (copy.m_lower_bound.m_constant)))
+    copy.m_lower_bound.ensure_closed(false);
+
   switch (op)
     {
+
     case EQ_EXPR:
-      if (below_lower_bound (rhs_const))
+      if (copy.m_lower_bound.m_constant
+	       && compare_constants (rhs_const, LT_EXPR,
+		     copy.m_lower_bound.m_constant).is_true())
 	return tristate (tristate::TS_FALSE);
-      if (above_upper_bound (rhs_const))
+      if (copy.m_upper_bound.m_constant
+	  && compare_constants (rhs_const, GT_EXPR,
+		copy.m_upper_bound.m_constant).is_true())
 	return tristate (tristate::TS_FALSE);
       break;
 
     case LT_EXPR:
-    case LE_EXPR:
-      /* Qn: "X </<= RHS_CONST".  */
+     /* Qn: "X </<= RHS_CONST".  */
       /* If RHS_CONST > upper bound, then it's true.
 	 If RHS_CONST < lower bound, then it's false.
 	 Otherwise unknown.  */
-      if (above_upper_bound (rhs_const))
+
+      if (copy.m_upper_bound.m_constant
+	  && compare_constants (rhs_const, GT_EXPR,
+		copy.m_upper_bound.m_constant).is_true())
+	return tristate (tristate::TS_TRUE);
+      else if (copy.m_lower_bound.m_constant
+	       && compare_constants (rhs_const, LE_EXPR,
+		     copy.m_lower_bound.m_constant).is_true())
+	return tristate (tristate::TS_FALSE);
+      break;
+
+    case LE_EXPR:
+
+      if (copy.m_upper_bound.m_constant
+	  && compare_constants (rhs_const, GE_EXPR,
+		copy.m_upper_bound.m_constant).is_true())
 	return tristate (tristate::TS_TRUE);
-      if (below_lower_bound (rhs_const))
+      else if (copy.m_lower_bound.m_constant
+	       && compare_constants (rhs_const, LT_EXPR,
+		     copy.m_lower_bound.m_constant).is_true())
 	return tristate (tristate::TS_FALSE);
       break;
 
@@ -207,18 +239,37 @@ range::eval_condition (enum tree_code op, tree rhs_const) const
       /* If RHS_CONST < lower bound, then it's true.
 	 If RHS_CONST > upper bound, then it's false.
 	 Otherwise unknown.  */
-      if (below_lower_bound (rhs_const))
+      if (copy.m_lower_bound.m_constant
+	       && compare_constants (rhs_const, LT_EXPR,
+		     copy.m_lower_bound.m_constant).is_true())
 	return tristate (tristate::TS_TRUE);
-      if (above_upper_bound (rhs_const))
+      if (copy.m_upper_bound.m_constant
+	  && compare_constants (rhs_const, GT_EXPR,
+		copy.m_upper_bound.m_constant).is_true())
 	return tristate (tristate::TS_TRUE);
       break;
 
     case GE_EXPR:
+
+      if (copy.m_upper_bound.m_constant
+	  && compare_constants (rhs_const, GT_EXPR,
+		copy.m_upper_bound.m_constant).is_true())
+	return tristate (tristate::TS_FALSE);
+      else if (copy.m_lower_bound.m_constant
+	       && compare_constants (rhs_const, LE_EXPR,
+		     copy.m_lower_bound.m_constant).is_true())
+	return tristate (tristate::TS_TRUE);
+      break;
+
     case GT_EXPR:
-      /* Qn: "X >=/> RHS_CONST".  */
-      if (above_upper_bound (rhs_const))
+
+      if (copy.m_upper_bound.m_constant
+	  && compare_constants (rhs_const, GE_EXPR,
+		copy.m_upper_bound.m_constant).is_true())
 	return tristate (tristate::TS_FALSE);
-      if (below_lower_bound (rhs_const))
+      else if (copy.m_lower_bound.m_constant
+	       && compare_constants (rhs_const, LT_EXPR,
+		     copy.m_lower_bound.m_constant).is_true())
 	return tristate (tristate::TS_TRUE);
       break;
 
@@ -229,9 +280,15 @@ range::eval_condition (enum tree_code op, tree rhs_const) const
   return tristate (tristate::TS_UNKNOWN);
 }
 
+
+/* Since updating range::eval_condition to fix the off by one errors, these two
+   subroutines are no longer called anywhere else so they are commented out to
+   suppress any compiler warnings. If they will not be used anymore, it can be
+   completely removed. */
+
 /* Return true if RHS_CONST is below the lower bound of this range.  */
 
-bool
+/*bool
 range::below_lower_bound (tree rhs_const) const
 {
   if (!m_lower_bound.m_constant)
@@ -240,11 +297,11 @@ range::below_lower_bound (tree rhs_const) const
   return compare_constants (rhs_const,
 			    m_lower_bound.m_closed ? LT_EXPR : LE_EXPR,
 			    m_lower_bound.m_constant).is_true ();
-}
+}*/
 
 /* Return true if RHS_CONST is above the upper bound of this range.  */
 
-bool
+/*bool
 range::above_upper_bound (tree rhs_const) const
 {
   if (!m_upper_bound.m_constant)
@@ -253,7 +310,7 @@ range::above_upper_bound (tree rhs_const) const
   return compare_constants (rhs_const,
 			    m_upper_bound.m_closed ? GT_EXPR : GE_EXPR,
 			    m_upper_bound.m_constant).is_true ();
-}
+}*/
 
 /* class equiv_class.  */
 
@@ -1499,6 +1556,167 @@ constraint_manager::eval_condition (const svalue *lhs,
 	}
     }
 
+  /* Try folding the condition with any sub svalues */
+  {
+    tristate result_for_folding = maybe_fold_condition (lhs, op, rhs);
+    if (result_for_folding.is_known ())
+      return result_for_folding;
+  }
+
+  return tristate (tristate::TS_UNKNOWN);
+}
+
+/* Method to put logic for folding conditions. Currently contains logic for
+   folding a binop_svalue where one arg is a constant and the other side of
+   the condition is a constant. Inspiration was to handle the conditional
+   if (i-- > 0) __analyzer_eval (i >= 0);
+   
+   Other cases may be added, including perhaps logic to handle unaryop_svalue
+   and widening_svalue. */
+
+tristate
+constraint_manager::maybe_fold_condition (const svalue *lhs,
+				    enum tree_code op,
+				    const svalue *rhs) const
+{
+
+  if (const binop_svalue *binop_sval = lhs->dyn_cast_binop_svalue ())
+    {
+      const constant_svalue *cst_arg;
+      const svalue *other_arg = NULL;
+      const constant_svalue *cst_cond;
+      const svalue *new_cond = NULL;
+      tree_code bin_op;
+      if ((cst_arg = binop_sval->get_arg0 ()->dyn_cast_constant_svalue ()))
+	{
+	  if((cst_cond = rhs->dyn_cast_constant_svalue ()))
+	    {
+	      other_arg = binop_sval->get_arg0 ();
+	      bin_op = binop_sval->get_op();
+	      /* (CST_ARG +/- OTHER_ARG) OP CST_COND ==>
+	           +: OTHER_ARG OP (CST_COND - CST_ARG)
+		   -: (CST_ARG - CST_COND) OP OTHER_ARG */
+	      if(bin_op == PLUS_EXPR)
+		{
+		  new_cond
+		      = m_mgr->get_or_create_constant_svalue (fold_build2 (
+				    MINUS_EXPR, TREE_TYPE (
+				      cst_cond->get_constant ()),
+				    cst_cond->get_constant (),
+				    cst_arg->get_constant ()));
+		  return eval_condition (other_arg, op, new_cond);
+		}
+	      else if(bin_op == MINUS_EXPR)
+		{
+		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
+					  MINUS_EXPR, TREE_TYPE (
+					    cst_cond->get_constant ()),
+					  cst_arg->get_constant(),
+					  cst_cond->get_constant()));
+		  return eval_condition (new_cond, op, other_arg);
+		}
+	    }
+	}
+      else if((cst_arg = binop_sval->get_arg1 ()->dyn_cast_constant_svalue ()))
+	{
+	  if((cst_cond = rhs->dyn_cast_constant_svalue ()))
+	    {
+	      other_arg = binop_sval->get_arg0 ();
+	      bin_op = binop_sval->get_op();
+	      /* (OTHER_ARG +/- CST_ARG) OP CST_COND ==>
+	         OTHER_ARG OP (CST_COND -/+ CST_ARG) */
+	      if(bin_op == PLUS_EXPR)
+		{
+		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
+					  MINUS_EXPR, TREE_TYPE (
+					    cst_cond->get_constant ()),
+					  cst_cond->get_constant(),
+					  cst_arg->get_constant()));
+		}
+	      else if(bin_op == MINUS_EXPR)
+		{
+		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
+					  PLUS_EXPR, TREE_TYPE (
+					    cst_cond->get_constant ()),
+					  cst_cond->get_constant(),
+					  cst_arg->get_constant()));
+		}
+	      if (other_arg && new_cond){
+		return eval_condition (other_arg, op, new_cond);
+	      }
+
+	    }
+	} 
+    }
+  else if (const binop_svalue *binop_sval = rhs->dyn_cast_binop_svalue ())
+    {
+      const constant_svalue *cst_arg;
+      const svalue *other_arg = NULL;
+      const constant_svalue *cst_cond;
+      const svalue *new_cond = NULL;
+      tree_code bin_op;
+      if ((cst_arg = binop_sval->get_arg0 ()->dyn_cast_constant_svalue ()))
+	{
+	  if((cst_cond = rhs->dyn_cast_constant_svalue ()))
+	    {
+	      other_arg = binop_sval->get_arg0 ();
+	      bin_op = binop_sval->get_op();
+	      /* CST_COND OP (CST_ARG +/- OTHER_ARG) ==>
+	         +: (CST_COND - CST_ARG) OP OTHER_ARG
+		 -: OTHER_ARG OP (CST_ARG - CST_COND) */
+	      if(bin_op == PLUS_EXPR)
+		{
+		  new_cond
+		      = m_mgr->get_or_create_constant_svalue (fold_build2 (
+				    MINUS_EXPR, TREE_TYPE (
+				      cst_cond->get_constant ()),
+				    cst_cond->get_constant (),
+				    cst_arg->get_constant ()));
+		  return eval_condition (new_cond, op, other_arg);
+		}
+	      else if(bin_op == MINUS_EXPR)
+		{
+		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
+					  MINUS_EXPR, TREE_TYPE (
+					    cst_cond->get_constant ()),
+					  cst_arg->get_constant(),
+					  cst_cond->get_constant()));
+		  return eval_condition (other_arg, op, new_cond);
+		}
+
+
+	    }
+	}
+      else if((cst_arg = binop_sval->get_arg1 ()->dyn_cast_constant_svalue ()))
+	{
+	  if((cst_cond = rhs->dyn_cast_constant_svalue ()))
+	    {
+	      other_arg = binop_sval->get_arg0 ();
+	      bin_op = binop_sval->get_op();
+	      /* CST_COND OP (OTHER_ARG +/- CST_ARG) ==>
+	         (CST_COND -/+ CST_ARG) OP OTHER_ARG */
+	      if(bin_op == PLUS_EXPR)
+		{
+		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
+					  MINUS_EXPR, TREE_TYPE (
+					    cst_cond->get_constant ()),
+					  cst_cond->get_constant(),
+					  cst_arg->get_constant()));
+		}
+	      else if(bin_op == MINUS_EXPR)
+		{
+		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
+					  PLUS_EXPR, TREE_TYPE (
+					    cst_cond->get_constant ()),
+					  cst_cond->get_constant(),
+					  cst_arg->get_constant()));
+		}
+	      if (other_arg && new_cond)
+		return eval_condition (new_cond, op, other_arg);
+
+	    }
+	} 
+    }
   return tristate (tristate::TS_UNKNOWN);
 }
 
diff --git a/gcc/analyzer/constraint-manager.h b/gcc/analyzer/constraint-manager.h
index 3173610a723..9017a88ecac 100644
--- a/gcc/analyzer/constraint-manager.h
+++ b/gcc/analyzer/constraint-manager.h
@@ -57,8 +57,10 @@ struct range
 
   tristate eval_condition (enum tree_code op,
 			   tree rhs_const) const;
-  bool below_lower_bound (tree rhs_const) const;
-  bool above_upper_bound (tree rhs_const) const;
+
+  /* Commented out because no longer called from anywhere in program */
+  //bool below_lower_bound (tree rhs_const) const;
+  //bool above_upper_bound (tree rhs_const) const;
 
   bound m_lower_bound;
   bound m_upper_bound;
@@ -284,6 +286,9 @@ public:
   auto_vec<constraint> m_constraints;
 
  private:
+  tristate maybe_fold_condition (const svalue *lhs,
+			   	  enum tree_code op,
+			   	  const svalue *rhs) const;
   void add_constraint_internal (equiv_class_id lhs_id,
 				enum constraint_op c_op,
 				equiv_class_id rhs_id);
diff --git a/gcc/testsuite/gcc.dg/analyzer/operations.c b/gcc/testsuite/gcc.dg/analyzer/operations.c
index 79e76bccc66..87b3703a3d9 100644
--- a/gcc/testsuite/gcc.dg/analyzer/operations.c
+++ b/gcc/testsuite/gcc.dg/analyzer/operations.c
@@ -9,14 +9,14 @@ void test (int i, int j)
 
     i += 3;
 
-    __analyzer_eval (i > 45); /* { dg-warning "TRUE" "desired" { xfail *-*-* } } */
-    /* { dg-warning "UNKNOWN" "status quo" { target *-*-* } .-1 } */
+    __analyzer_eval (i > 45); /* { dg-warning "TRUE" } */
+ 
     /* TODO(xfail): do we really know this?  what about overflow?  */
 
     i -= 1;
 
-    __analyzer_eval (i > 44); /* { dg-warning "TRUE" "desired" { xfail *-*-* } } */
-    /* { dg-warning "UNKNOWN" "status quo" { target *-*-* } .-1 } */
+    __analyzer_eval (i > 44); /* { dg-warning "TRUE" } */
+
     /* TODO(xfail): do we really know this?  what about overflow?  */
 
     i = 3 * i;
diff --git a/gcc/testsuite/gcc.dg/analyzer/params.c b/gcc/testsuite/gcc.dg/analyzer/params.c
index 12bba70d6e4..a1c443b39eb 100644
--- a/gcc/testsuite/gcc.dg/analyzer/params.c
+++ b/gcc/testsuite/gcc.dg/analyzer/params.c
@@ -8,8 +8,8 @@ static int __analyzer_called_function(int j)
 
   k = j - 1;
 
-  __analyzer_eval (k > 3); /* { dg-warning "TRUE" "desired" { xfail *-*-* } } */
-  /* { dg-warning "UNKNOWN" "status quo" { target *-*-* } .-1 } */
+  __analyzer_eval (k > 3); /* { dg-warning "TRUE" } */
+
   /* TODO(xfail): we're not then updating based on the assignment.  */
 
   return k;
@@ -25,8 +25,8 @@ void test(int i)
 
     i = __analyzer_called_function(i);
 
-    __analyzer_eval (i > 3); /* { dg-warning "TRUE" "desired" { xfail *-*-* } } */
-    /* { dg-warning "UNKNOWN" "status quo" { target *-*-* } .-1 } */
+    __analyzer_eval (i > 3); /* { dg-warning "TRUE" } */
+
     /* TODO(xfail): we're not updating from the returned value.  */
   }
 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PR analyzer/94362 Partial Fix
  2021-02-27 10:04 PR analyzer/94362 Partial Fix brian.sobulefsky
@ 2021-03-02  2:30 ` David Malcolm
  2021-03-02  7:09   ` brian.sobulefsky
  0 siblings, 1 reply; 9+ messages in thread
From: David Malcolm @ 2021-03-02  2:30 UTC (permalink / raw)
  To: brian.sobulefsky, gcc-patches

On Sat, 2021-02-27 at 10:04 +0000, brian.sobulefsky wrote:
> Hi,
> 
> Please find a patch to fix part of the bug PR analyzer/94362.

Thanks.  Various comments inline below.

>  This bug is a
> false positive for a null dereference found when compiling openssl.
> The cause
> is the constraint_manager not knowing that i >= 0 within the for
> block:
> 
> for ( ; i-- > 0; )
> 
> The bug can be further reduced to the constraint manager not knowing
> that i >= 0
> within the if block:
> 
> if (i-- > 0)
> 
> which is not replicated for other operators, such as prefix
> decrement. The
> cause is that the constraint is applied to the initial_svalue of i,
> while it
> is a binop_svalue of i that enters the block (with op PLUS and arg1 -
> 1). The
> constraint_manager does not have any constraints for this svalue and
> has no
> handler. A handler has been added that essentially recurs on the
> remaining arg
> if the other arg and other side of the condition are both constants
> and the op
> is PLUS_EXPR or MINUS_EXPR.
> 
> This in essence fixed the problem, except an off by one error had
> been hiding
> in range::eval_condition. This error is hard to notice, because, for
> example,
> the code
> 
> if(idx > 0)
>   __analyzer_eval(idx >= 1);
> 
> will compile as (check -fdump-ipa-analyzer to see)
> 
> void test (int idx)
> {
>   _Bool _1;
>   int _2;
> 
>   <bb 2> :
>   if (idx_4(D) > 0)
>     goto <bb 3>; [INV]
>   else
>     goto <bb 4>; [INV]
> 
>   <bb 3> :
>   _1 = idx_4(D) > 0;
>   _2 = (int) _1;
>   __analyzer_eval (_2);
> 
>   <bb 4> :
>   return;
> 
> }
> 
> and will print "TRUE" to the screen, but as you can see, it is for
> the wrong
> reason, because we are not really testing the condition we wanted to
> test.
> 
> You can force the failure (i.e. "UNKNOWN") for yourself with the
> following:
> 
> void test(int idx)
> {
>   int check = 1;
>   if(idx > 0)
>     __analyzer_eval(idx >= check);
> }
> 
> which the compiler will not "fix" on us. 

Thank.  This looks like a good way to create DejaGnu tests that verify
the constraint_manager code, rather than accidentally testing the
optimizer.


> An examination of range::eval_condition
> should convince you that there is an off by one error. 

Yes, looking at the switch statement, the fact that LT_EXPR and LE_EXPR
share the same case suggests the boundaries aren't properly handled
(and the same for GT_EXPR and GE_EXPR)


> Incidentally, I might
> recommend doing away with "open intervals of integers" entirely.

What would the alternative be?

Note that in the range class a bound can have a NULL m_constant, in
which case that bound is a kind of null bound (the comments should
probably spell this out).

> When running the initial bug (the for loop), you will find that the
> analyzer
> prints "UNKNOWN" twice for the postfix operator, and "TRUE" "UNKNOWN"
> for other
> operators. This patch reduces postfix to the same state as the other
> operators.
> The second "UNKNOWN" seems to be due to a second "iterated" pass
> through the
> loop with a widening_svalue. A full fix of the bug will need a
> handler for the
> widening svalue, and much more critically, a correct merge of the
> constraints
> at the loop entrance. 

Sounds correct to me.

> That, unfortunately, looks to be a hard problem.

I think it's worth cleaning up this patch and getting this into trunk,
and leave the second part as a followup.

> This patch fixes a few xfails as noted in the commit message. These
> were tests
> that were evidently devised to test whether the analyzer would
> understand
> arithmetic being done on constrained values. Addition and subtraction
> is now
> working as expected, a handler for multiplication and division can be
> added.
> 
> As was noted in those test files, consideration at some point should
> be given to
> overflow.

Indeed.  I think the patch needs to take that into account when
updating bounds in eval_condition.

Various comments inline below throughout.


> commit d4052e8c273ca267f6dcf782084d60acfc50a609
> Author: Brian Sobulefsky <brian.sobulefsky@protonmail.com>
> Date:   Sat Feb 27 00:36:40 2021 -0800
> 
>     Changes to support eventual solution to bug PR analyzer/94362. This bug
>     originated with a false positive null dereference during compilation of
>     openssl. The bug is in effect caused by an error in constraint handling,
>     specifically that within the for block:
>     
>     for ( ; i-- > 0; )
>       {
>       }
>     
>     the constraint_manager should know i >= 0 but does not. A reduced form of
>     this bug was found where the constraint manager did not know within the if
>     block:
>     
>     if (i-- > 0)
>       {
>       }
>     
>     that i >= 0. This latter error was only present for the postfix
>     operators, and not for other forms, like --i > 0. It was due to the
>     constraint being set for the initial_svalue associated with i, but a
>     binop_svalue being what entered the if block for which no constraint
>     rules existed.
>     
>     By adding handling logic for a binop_svalue that adds or
>     subtracts a constant, this problem was solved. This logic was added to
>     a new method, constraint_manager::maybe_fold_condition, with the
>     intent of eventually adding more cases there (unary_svalue and
>     widening_svalue for example). Additionally, an off by one error was
>     found in range::eval_condition that needed to be corrected to get
>     the expected result. Correction of this error was done in that
>     subroutine, resulting in no more calls to below_lower_bound and
>     above_upper_bound. As such, these functions were commented out and may
>     be removed if not needed for anything else.
>     
>     This change does not entirely fix the initial bug pr94362, but it
>     reduces the postfix operator to the same state as other operators. In the
>     case of the for loop, there appears to be an "initial pass" through the
>     loop, which the analyzer will now understand for postfix, and then an
>     "iterated pass" with a widening_svalue that the analyzer does not
>     understand for any condition found. This seems to be due to the merging
>     of constraints and is under investigation.
>     
>     While not the original intent of this change, during run of the
>     testsuite it was found that some errors that were marked xfail had been
>     corrected in the files testsuite/gcc.dg/analyzer/operations.c and
>     testsuite/gcc.dg/analyzer/params.c. These corrected errors were for
>     addition and subtraction manipulations of variables within a
>     conditional. Other manipulations, like multiply and divide, are still
>     failing as expected because no handler has been added.

Excellent.

>     
>     gcc/analyzer/ChangeLog:
>             PR analyzer/94362
>             * analyzer/constraint-manager.cc
>             (constraint_manager::eval_condition(svalue *, op, svalue *)):
>             Add call to new function maybe_fold_condition.
>             * analyzer/constraint-manager.cc
>             (constraint_manager::maybe_fold_condition): New function.
>             * analyzer/constraint-manager.cc (range::eval_condition): Update
>             logic to correct off by one error
>             * analyzer/constraint-manager.cc (range::below_lower_bound):
>             Remove function as there are no longer any calls to it.
>             * analyzer/constraint-manager.cc (range::above_upper_bound):
>             Remove function as there are no longer any calls to it.
>             * analyzer/constraint-manager.h (class constraint_manager): Add
>             new function declaration maybe_fold_condition.
>             * analyzer/constraint-manager.h (class range): Remove
>             declarations of functions no longer called.
>     
>     gcc/testsuite/ChangeLog:
>             PR analyzer/94362
>             * gcc.dg/analyzer/operations.c: remove "desired", "xfail" and
>             entire call to "bogus".
>             * gcc.dg/analyzer/params.c: remove "desired", "xfail" and
>             entire call to "bogus".
> 
> diff --git a/gcc/analyzer/constraint-manager.cc b/gcc/analyzer/constraint-manager.cc
> index 4dadd200bee..5184ead2b3f 100644
> --- a/gcc/analyzer/constraint-manager.cc
> +++ b/gcc/analyzer/constraint-manager.cc
> @@ -181,24 +181,56 @@ range::eval_condition (enum tree_code op, tree rhs_const) const
>    if (tree single_element = copy.constrained_to_single_element ())
>      return compare_constants (single_element, op, rhs_const);
>  
> +  /* Corner case in which constrained_to_single_element will not convert the
> +     bounds to closure (i.e., a bound with only one end point) */
> +  if(copy.m_lower_bound.m_constant == NULL_TREE
> +      && copy.m_upper_bound.m_constant != NULL_TREE
> +      && INTEGRAL_TYPE_P (TREE_TYPE (copy.m_upper_bound.m_constant)))
> +    copy.m_upper_bound.ensure_closed(true);
> +  if(copy.m_upper_bound.m_constant == NULL_TREE
> +      && copy.m_lower_bound.m_constant != NULL_TREE
> +      && INTEGRAL_TYPE_P (TREE_TYPE (copy.m_lower_bound.m_constant)))
> +    copy.m_lower_bound.ensure_closed(false);
> +

Maybe it would be cleaner to invent a bound::canonicalize method, with
something like:

/* If this bound has an integral constant, put it into closed form.
   Return true if it does have such an integer constant.  */

bool bound::canonicalize (bool is_upper)
{
  if (m_constant == NULL_TREE)
    return false;

  if (!INTEGRAL_TYPE_P (TREE_TYPE (m_constant)))
    return false;

  /* Set bounds accordingly.  */
  ensure_closed (is_upper);
  return true;
}

and convert the above into:

   copy.m_upper_bound.canonicalize (true);
   copy.m_lower_bound.canonicalize (false);

and maybe even have a range::canonicalize method to do this.


>    switch (op)
>      {
> +
>      case EQ_EXPR:
> -      if (below_lower_bound (rhs_const))
> +      if (copy.m_lower_bound.m_constant
> +	       && compare_constants (rhs_const, LT_EXPR,
> +		     copy.m_lower_bound.m_constant).is_true())

The patch breaks out the various cases, with lots of tests that seem to
be of the form:

  if (copy.ONE_OF_THE_BOUNDS.m_constant
      && compare_constants (rhs_const, SOME_COMPARISON_EXPR,
         copy.THE_BOUND.m_constant).is_true())
  
I think the code could be neater by making this a method of class
bound, giving something like:

  if (copy.m_lower_bound.comparison_p (rhs_const, LT_EXPR))

with something like:

bool
bound::comparison_p (tree cst, enum tree_code op)
{
  if (m_constant)
     return false;
  tristate ts = compare_constants (cst, op, m_constant);
  return ts.is_true ();
}

though I'm not sure exactly what the name should be and the ordering of
the args (since this expresses the sense of the comparison).


>  	return tristate (tristate::TS_FALSE);
> -      if (above_upper_bound (rhs_const))
> +      if (copy.m_upper_bound.m_constant
> +	  && compare_constants (rhs_const, GT_EXPR,
> +		copy.m_upper_bound.m_constant).is_true())
>  	return tristate (tristate::TS_FALSE);
>        break;
>  
>      case LT_EXPR:
> -    case LE_EXPR:
> -      /* Qn: "X </<= RHS_CONST".  */
> +     /* Qn: "X </<= RHS_CONST".  */

Looks like you're splitting up the LT and LE cases, so the comment
needs updating (and copying/updating to the other case).

>        /* If RHS_CONST > upper bound, then it's true.
>  	 If RHS_CONST < lower bound, then it's false.
>  	 Otherwise unknown.  */
> -      if (above_upper_bound (rhs_const))
> +
> +      if (copy.m_upper_bound.m_constant
> +	  && compare_constants (rhs_const, GT_EXPR,
> +		copy.m_upper_bound.m_constant).is_true())
> +	return tristate (tristate::TS_TRUE);
> +      else if (copy.m_lower_bound.m_constant
> +	       && compare_constants (rhs_const, LE_EXPR,
> +		     copy.m_lower_bound.m_constant).is_true())
> +	return tristate (tristate::TS_FALSE);
> +      break;
> +
> +    case LE_EXPR:
> +
> +      if (copy.m_upper_bound.m_constant
> +	  && compare_constants (rhs_const, GE_EXPR,
> +		copy.m_upper_bound.m_constant).is_true())
>  	return tristate (tristate::TS_TRUE);
> -      if (below_lower_bound (rhs_const))
> +      else if (copy.m_lower_bound.m_constant
> +	       && compare_constants (rhs_const, LT_EXPR,
> +		     copy.m_lower_bound.m_constant).is_true())
>  	return tristate (tristate::TS_FALSE);
>        break;
>  
> @@ -207,18 +239,37 @@ range::eval_condition (enum tree_code op, tree rhs_const) const
>        /* If RHS_CONST < lower bound, then it's true.
>  	 If RHS_CONST > upper bound, then it's false.
>  	 Otherwise unknown.  */
> -      if (below_lower_bound (rhs_const))
> +      if (copy.m_lower_bound.m_constant
> +	       && compare_constants (rhs_const, LT_EXPR,
> +		     copy.m_lower_bound.m_constant).is_true())
>  	return tristate (tristate::TS_TRUE);
> -      if (above_upper_bound (rhs_const))
> +      if (copy.m_upper_bound.m_constant
> +	  && compare_constants (rhs_const, GT_EXPR,
> +		copy.m_upper_bound.m_constant).is_true())
>  	return tristate (tristate::TS_TRUE);
>        break;
>  
>      case GE_EXPR:
> +
> +      if (copy.m_upper_bound.m_constant
> +	  && compare_constants (rhs_const, GT_EXPR,
> +		copy.m_upper_bound.m_constant).is_true())
> +	return tristate (tristate::TS_FALSE);
> +      else if (copy.m_lower_bound.m_constant
> +	       && compare_constants (rhs_const, LE_EXPR,
> +		     copy.m_lower_bound.m_constant).is_true())
> +	return tristate (tristate::TS_TRUE);
> +      break;
> +
>      case GT_EXPR:
> -      /* Qn: "X >=/> RHS_CONST".  */
> -      if (above_upper_bound (rhs_const))
> +
> +      if (copy.m_upper_bound.m_constant
> +	  && compare_constants (rhs_const, GE_EXPR,
> +		copy.m_upper_bound.m_constant).is_true())
>  	return tristate (tristate::TS_FALSE);
> -      if (below_lower_bound (rhs_const))
> +      else if (copy.m_lower_bound.m_constant
> +	       && compare_constants (rhs_const, LT_EXPR,
> +		     copy.m_lower_bound.m_constant).is_true())
>  	return tristate (tristate::TS_TRUE);
>        break;
>  
> @@ -229,9 +280,15 @@ range::eval_condition (enum tree_code op, tree rhs_const) const
>    return tristate (tristate::TS_UNKNOWN);
>  }
>  
> +
> +/* Since updating range::eval_condition to fix the off by one errors, these two
> +   subroutines are no longer called anywhere else so they are commented out to
> +   suppress any compiler warnings. If they will not be used anymore, it can be
> +   completely removed. */

FWIW I find patches easier to read if you delete the unused code rather
than merely commenting it out.

I think these did only get used by range::eval_condition, and the
approach you have above seems better, so I'm happy for them to go.

>  /* Return true if RHS_CONST is below the lower bound of this range.
*/
>  
> -bool
> +/*bool
>  range::below_lower_bound (tree rhs_const) const
>  {
>    if (!m_lower_bound.m_constant)
> @@ -240,11 +297,11 @@ range::below_lower_bound (tree rhs_const) const
>    return compare_constants (rhs_const,
>  			    m_lower_bound.m_closed ? LT_EXPR : LE_EXPR,
>  			    m_lower_bound.m_constant).is_true ();
> -}
> +}*/

[...]

> @@ -1499,6 +1556,167 @@ constraint_manager::eval_condition (const svalue *lhs,
>  	}
>      }
>  
> +  /* Try folding the condition with any sub svalues */
> +  {
> +    tristate result_for_folding = maybe_fold_condition (lhs, op, rhs);
> +    if (result_for_folding.is_known ())
> +      return result_for_folding;
> +  }
> +
> +  return tristate (tristate::TS_UNKNOWN);
> +}
> +
> +/* Method to put logic for folding conditions. Currently contains logic for
> +   folding a binop_svalue where one arg is a constant and the other side of
> +   the condition is a constant. Inspiration was to handle the conditional
> +   if (i-- > 0) __analyzer_eval (i >= 0);
> +   
> +   Other cases may be added, including perhaps logic to handle unaryop_svalue
> +   and widening_svalue. */

Does this imply some kind of canonicalization of the equivalence classes?

If we have e.g. (x + 1) < 30

do we expect to store this as x < 29, or as (x + 1) < 30, and to fold
conditions each time?


> +tristate
> +constraint_manager::maybe_fold_condition (const svalue *lhs,
> +				    enum tree_code op,
> +				    const svalue *rhs) const
> +{
> +
> +  if (const binop_svalue *binop_sval = lhs->dyn_cast_binop_svalue ())
> +    {
> +      const constant_svalue *cst_arg;
> +      const svalue *other_arg = NULL;
> +      const constant_svalue *cst_cond;
> +      const svalue *new_cond = NULL;
> +      tree_code bin_op;
> +      if ((cst_arg = binop_sval->get_arg0 ()->dyn_cast_constant_svalue ()))

FWIW I prefer putting the decl into the "if" stmt to minimize the
scope, though it's usually not necessary to do
dyn_cast_constant_svalue; it's usually simpler to have:

         if (tree cst_arg = binop_sval->get_arg0 ()->maybe_get_constant
())

which effectively does the dyn_cast and saves the "->get_constant ()"
calls below.


> +	{
> +	  if((cst_cond = rhs->dyn_cast_constant_svalue ()))

Similarly here; this would be simpler as:

          if (tree cst_cond = rhs->maybe_get_constant ())


> +	    {
> +	      other_arg = binop_sval->get_arg0 ();
> +	      bin_op = binop_sval->get_op();
> +	      /* (CST_ARG +/- OTHER_ARG) OP CST_COND ==>
> +	           +: OTHER_ARG OP (CST_COND - CST_ARG)
> +		   -: (CST_ARG - CST_COND) OP OTHER_ARG */
> +	      if(bin_op == PLUS_EXPR)
> +		{
> +		  new_cond
> +		      = m_mgr->get_or_create_constant_svalue (fold_build2 (
> +				    MINUS_EXPR, TREE_TYPE (
> +				      cst_cond->get_constant ()),
> +				    cst_cond->get_constant (),
> +				    cst_arg->get_constant ()));
> +		  return eval_condition (other_arg, op, new_cond);
> +		}
> +	      else if(bin_op == MINUS_EXPR)

Make this a switch statement to minimize churn (it sounds like we
expect to add more tree codes).

> +		{
> +		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
> +					  MINUS_EXPR, TREE_TYPE (
> +					    cst_cond->get_constant ()),
> +					  cst_arg->get_constant(),
> +					  cst_cond->get_constant()));
> +		  return eval_condition (new_cond, op, other_arg);
> +		}
> +	    }
> +	}
> +      else if((cst_arg = binop_sval->get_arg1 ()->dyn_cast_constant_svalue ()))
> +	{
> +	  if((cst_cond = rhs->dyn_cast_constant_svalue ()))
> +	    {
> +	      other_arg = binop_sval->get_arg0 ();
> +	      bin_op = binop_sval->get_op();
> +	      /* (OTHER_ARG +/- CST_ARG) OP CST_COND ==>
> +	         OTHER_ARG OP (CST_COND -/+ CST_ARG) */

I confess my eyes started glazing over at the various cases here; I
found it easier once I started thinking in terms of concrete examples
e.g.
"x + 1 < 100" ==> "x < (100 - 1)" ==> "x < 99"

I haven't yet fully thought through what we should do about overflow.

> +	      if(bin_op == PLUS_EXPR)
> +		{
> +		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
> +					  MINUS_EXPR, TREE_TYPE (
> +					    cst_cond->get_constant ()),
> +					  cst_cond->get_constant(),
> +					  cst_arg->get_constant()));
> +		}
> +	      else if(bin_op == MINUS_EXPR)
> +		{
> +		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
> +					  PLUS_EXPR, TREE_TYPE (
> +					    cst_cond->get_constant ()),
> +					  cst_cond->get_constant(),
> +					  cst_arg->get_constant()));
> +		}
> +	      if (other_arg && new_cond){

I think other_arg is guaranteed to be non-NULL, so I think these cases
would all become something resembling:

   case SOME_EXPR:
      new_cond
	= m_mgr->get_or_create_constant_svalue (fold_build2 (
                                                OPPOSITE_EXPR, TREE_TYPE (
                                                cst_cond->get_constant ()),
						cst_cond->get_constant(),
						cst_arg->get_constant()));
      return eval_condition (other_arg, op, new_cont);

Given that, is it possible to introduce a subroutine to reduce the repetition?


> +		return eval_condition (other_arg, op, new_cond);
> +	      }
> +
> +	    }
> +	} 
> +    }
> +  else if (const binop_svalue *binop_sval = rhs->dyn_cast_binop_svalue ())
> +    {
> +      const constant_svalue *cst_arg;
> +      const svalue *other_arg = NULL;
> +      const constant_svalue *cst_cond;
> +      const svalue *new_cond = NULL;
> +      tree_code bin_op;
> +      if ((cst_arg = binop_sval->get_arg0 ()->dyn_cast_constant_svalue ()))
> +	{
> +	  if((cst_cond = rhs->dyn_cast_constant_svalue ()))
> +	    {
> +	      other_arg = binop_sval->get_arg0 ();
> +	      bin_op = binop_sval->get_op();
> +	      /* CST_COND OP (CST_ARG +/- OTHER_ARG) ==>
> +	         +: (CST_COND - CST_ARG) OP OTHER_ARG
> +		 -: OTHER_ARG OP (CST_ARG - CST_COND) */
> +	      if(bin_op == PLUS_EXPR)
> +		{
> +		  new_cond
> +		      = m_mgr->get_or_create_constant_svalue (fold_build2 (
> +				    MINUS_EXPR, TREE_TYPE (
> +				      cst_cond->get_constant ()),
> +				    cst_cond->get_constant (),
> +				    cst_arg->get_constant ()));
> +		  return eval_condition (new_cond, op, other_arg);
> +		}
> +	      else if(bin_op == MINUS_EXPR)
> +		{
> +		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
> +					  MINUS_EXPR, TREE_TYPE (
> +					    cst_cond->get_constant ()),
> +					  cst_arg->get_constant(),
> +					  cst_cond->get_constant()));
> +		  return eval_condition (other_arg, op, new_cond);
> +		}
> +
> +
> +	    }
> +	}
> +      else if((cst_arg = binop_sval->get_arg1 ()->dyn_cast_constant_svalue ()))
> +	{
> +	  if((cst_cond = rhs->dyn_cast_constant_svalue ()))
> +	    {
> +	      other_arg = binop_sval->get_arg0 ();
> +	      bin_op = binop_sval->get_op();
> +	      /* CST_COND OP (OTHER_ARG +/- CST_ARG) ==>
> +	         (CST_COND -/+ CST_ARG) OP OTHER_ARG */
> +	      if(bin_op == PLUS_EXPR)
> +		{
> +		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
> +					  MINUS_EXPR, TREE_TYPE (
> +					    cst_cond->get_constant ()),
> +					  cst_cond->get_constant(),
> +					  cst_arg->get_constant()));
> +		}
> +	      else if(bin_op == MINUS_EXPR)
> +		{
> +		  new_cond = m_mgr->get_or_create_constant_svalue (fold_build2 (
> +					  PLUS_EXPR, TREE_TYPE (
> +					    cst_cond->get_constant ()),
> +					  cst_cond->get_constant(),
> +					  cst_arg->get_constant()));
> +		}
> +	      if (other_arg && new_cond)
> +		return eval_condition (new_cond, op, other_arg);
> +
> +	    }
> +	} 
> +    }
>    return tristate (tristate::TS_UNKNOWN);
>  }

There are a lot of cases here with fiddly almost-identical logic; I
think we need some more DejaGnu tests of the form you suggested in your
cover letter to try to exhaustively cover these cases.

(I started wondering if selftests could be another way to achieve test
coverage here, if it makes it any easier to cover the various cases -
but I suspect that DejaGnu is easier)


> diff --git a/gcc/analyzer/constraint-manager.h b/gcc/analyzer/constraint-manager.h
> index 3173610a723..9017a88ecac 100644
> --- a/gcc/analyzer/constraint-manager.h
> +++ b/gcc/analyzer/constraint-manager.h
> @@ -57,8 +57,10 @@ struct range
>  
>    tristate eval_condition (enum tree_code op,
>  			   tree rhs_const) const;
> -  bool below_lower_bound (tree rhs_const) const;
> -  bool above_upper_bound (tree rhs_const) const;
> +
> +  /* Commented out because no longer called from anywhere in program */
> +  //bool below_lower_bound (tree rhs_const) const;
> +  //bool above_upper_bound (tree rhs_const) const;

Delete rather than comment out.

>    bound m_lower_bound;
>    bound m_upper_bound;
> @@ -284,6 +286,9 @@ public:
>    auto_vec<constraint> m_constraints;
>  
>   private:
> +  tristate maybe_fold_condition (const svalue *lhs,
> +			   	  enum tree_code op,
> +			   	  const svalue *rhs) const;
>    void add_constraint_internal (equiv_class_id lhs_id,
>  				enum constraint_op c_op,
>  				equiv_class_id rhs_id);
> diff --git a/gcc/testsuite/gcc.dg/analyzer/operations.c b/gcc/testsuite/gcc.dg/analyzer/operations.c
> index 79e76bccc66..87b3703a3d9 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/operations.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/operations.c
> @@ -9,14 +9,14 @@ void test (int i, int j)
>  
>      i += 3;
>  
> -    __analyzer_eval (i > 45); /* { dg-warning "TRUE" "desired" { xfail *-*-* } } */
> -    /* { dg-warning "UNKNOWN" "status quo" { target *-*-* } .-1 } */
> +    __analyzer_eval (i > 45); /* { dg-warning "TRUE" } */
> + 
>      /* TODO(xfail): do we really know this?  what about overflow?  */
>  
>      i -= 1;
>  
> -    __analyzer_eval (i > 44); /* { dg-warning "TRUE" "desired" { xfail *-*-* } } */
> -    /* { dg-warning "UNKNOWN" "status quo" { target *-*-* } .-1 } */
> +    __analyzer_eval (i > 44); /* { dg-warning "TRUE" } */
> +
>      /* TODO(xfail): do we really know this?  what about overflow?  */
>  
>      i = 3 * i;
> diff --git a/gcc/testsuite/gcc.dg/analyzer/params.c b/gcc/testsuite/gcc.dg/analyzer/params.c
> index 12bba70d6e4..a1c443b39eb 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/params.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/params.c
> @@ -8,8 +8,8 @@ static int __analyzer_called_function(int j)
>  
>    k = j - 1;
>  
> -  __analyzer_eval (k > 3); /* { dg-warning "TRUE" "desired" { xfail *-*-* } } */
> -  /* { dg-warning "UNKNOWN" "status quo" { target *-*-* } .-1 } */
> +  __analyzer_eval (k > 3); /* { dg-warning "TRUE" } */
> +
>    /* TODO(xfail): we're not then updating based on the assignment.  */
>  
>    return k;
> @@ -25,8 +25,8 @@ void test(int i)
>  
>      i = __analyzer_called_function(i);
>  
> -    __analyzer_eval (i > 3); /* { dg-warning "TRUE" "desired" { xfail *-*-* } } */
> -    /* { dg-warning "UNKNOWN" "status quo" { target *-*-* } .-1 } */
> +    __analyzer_eval (i > 3); /* { dg-warning "TRUE" } */
> +
>      /* TODO(xfail): we're not updating from the returned value.  */
>    }

It's great to see these xfails fixed.

Thanks again for the patch; hope this is constructive
Dave



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PR analyzer/94362 Partial Fix
  2021-03-02  2:30 ` David Malcolm
@ 2021-03-02  7:09   ` brian.sobulefsky
  2021-03-02 19:58     ` David Malcolm
  0 siblings, 1 reply; 9+ messages in thread
From: brian.sobulefsky @ 2021-03-02  7:09 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

Hi,

It may not be worth altering at this point, but it seems like it would leave less
bugs open if all the constraints go in as "closed" ranges and all evals are
translated to closed intervals. So, if (idx > 0) and if (idx >= 1) are the same
constraint. I know this won't be an option for eventual float support, but
that is a different can of worms. For integers, you can fix it at those entry points
and then all other subroutines can ignore the issue of open or closed ranges.

I fully understand the eye glaze and did not want to have to write it that
way. I am thinking if there is a cleaner way to do it. Anyway, that is why I
put a comment in each case to derive the result. This issue of "sides of the
condition" and "inverted operator" as you call it in some places is a recurring
theme. It is especially irritating when we lose commutativity, as we do with MINUS.

Adding logic in my subroutine for MULT or DIV is not hard, handling overflow
is a bit more involved. At the very least, we would need to know what the max or min
of a particular variable is, which might be in the type tree. We would also need to
define how we want to handle the issue.

The problem is (and I have been thinking about this a lot in terms of constraint
merging), there are currently no "or constraints," which would be helpful in merging too.
So for overflow, when you have something like

if (idx > 0)
 {
  idx += num;
  __analyzer_eval(idx > num);
 }

you have gone from a single constraint (idx > 0), to an "or condition"
(idx > num || idx < MIN_INT + num). The only solution now, other than ignoring
overflow as a bug that is tolerated, is to just pass it off as unhandled (and
therefore UNKNOWN). Perhaps you may want to add overflow as one of your analyzer
warnings if it cannot be ruled out?

I did try to run a test with a simple || in the condition before just to see what
would happen, and as you probably know it is not handled at all. I did not watch
in gdb, but it is obvious from constraint-manager.cc that there is nothing to handle
it. I think I actually did an __analyzer_eval() of the same || condition verbatim
that was in the if() conditional and still got an UNKNOWN.

It is a pretty intrusive change to add logic for that, which is why I have not
done any work on it yet. Without the concept of "or" I don't see how we could
handle overflow, but maybe you don't really want to handle it anyway, but
rather just emit a warning if it might be considered poor practice to rely
on something that is technically machine dependent anyway.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PR analyzer/94362 Partial Fix
  2021-03-02  7:09   ` brian.sobulefsky
@ 2021-03-02 19:58     ` David Malcolm
  2021-03-02 23:14       ` brian.sobulefsky
  0 siblings, 1 reply; 9+ messages in thread
From: David Malcolm @ 2021-03-02 19:58 UTC (permalink / raw)
  To: brian.sobulefsky; +Cc: gcc-patches

On Tue, 2021-03-02 at 07:09 +0000, brian.sobulefsky wrote:
> Hi,
> 
> It may not be worth altering at this point, but it seems like it
> would leave less
> bugs open if all the constraints go in as "closed" ranges and all
> evals are
> translated to closed intervals. So, if (idx > 0) and if (idx >= 1)
> are the same
> constraint. I know this won't be an option for eventual float
> support, but
> that is a different can of worms. For integers, you can fix it at
> those entry points
> and then all other subroutines can ignore the issue of open or closed
> ranges.
> 
> I fully understand the eye glaze and did not want to have to write it
> that
> way. I am thinking if there is a cleaner way to do it. Anyway, that
> is why I
> put a comment in each case to derive the result. This issue of "sides
> of the
> condition" and "inverted operator" as you call it in some places is a
> recurring
> theme. It is especially irritating when we lose commutativity, as we
> do with MINUS.
> 
> Adding logic in my subroutine for MULT or DIV is not hard, handling
> overflow
> is a bit more involved. At the very least, we would need to know what
> the max or min
> of a particular variable is, which might be in the type tree. We
> would also need to
> define how we want to handle the issue.
> 
> The problem is (and I have been thinking about this a lot in terms of
> constraint
> merging), there are currently no "or constraints," which would be
> helpful in merging too.
> So for overflow, when you have something like
> 
> if (idx > 0)
>  {
>   idx += num;
>   __analyzer_eval(idx > num);
>  }
> 
> you have gone from a single constraint (idx > 0), to an "or
> condition"
> (idx > num || idx < MIN_INT + num). The only solution now, other than
> ignoring
> overflow as a bug that is tolerated, is to just pass it off as
> unhandled (and
> therefore UNKNOWN). Perhaps you may want to add overflow as one of
> your analyzer
> warnings if it cannot be ruled out?
> 
> I did try to run a test with a simple || in the condition before just
> to see what
> would happen, and as you probably know it is not handled at all. I
> did not watch
> in gdb, but it is obvious from constraint-manager.cc that there is
> nothing to handle
> it. I think I actually did an __analyzer_eval() of the same ||
> condition verbatim
> that was in the if() conditional and still got an UNKNOWN.
> 
> It is a pretty intrusive change to add logic for that, which is why I
> have not
> done any work on it yet. Without the concept of "or" I don't see how
> we could
> handle overflow, but maybe you don't really want to handle it anyway,
> but
> rather just emit a warning if it might be considered poor practice to
> rely
> on something that is technically machine dependent anyway.

I'm not sure how we should handle this.

One approach would be to generalize the constraint-handling so that we
store a set of "facts" about svalues, where the facts are themselves
svalues that are known to be true.

Hence we could build an svalue for the TRUTH_OR_EXPR
  (idx > num || idx < MIN_INT + num)
and store that as a fact within the constraint manager.

But that would be a big rewrite.

Somehow the constraint manager needs to be able to evaluate queries
w.r.t known facts, and probably canonicalize sets of facts, and handle
mergers.

IIRC, the clang analyzer works by exposing an "add fact" interface
(where the facts can be compound symbolic expressions), but
implementing things internally with a choice of either a set of ranges,
or a Z3-backed solver.


If we're considering overflow, I would like to -fanalyzer to eventually
support bounds checking, and e.g. detecting attacks due to buffer size
overflows (CWE-131 due to CWE-190) e.g. in this code that probably
should have used calloc:

struct foo * make_arr (size_t n)
{
  size_t sz = sizeof (struct foo) * n;
  struct foo *p = (struct foo *)malloc (sz);
  if (!p)
     return;
  memset (p, 0, sz);
  return p;
}

void test (size_t n)
{
   struct foo *f = make_arr (n);
   if (!f)
     return;
   for (i = 0; i < n; i++)
     {
       //... do stuff with f[i]
     }
}

it would be good to detect the case when sz overflows and thus the
array is smaller than expected.  I think this could work be recording
that the allocated size of *p (== *f) is
   (size_t)(sizeof (struct foo) * n)

In the loop, the access to f[i] could bifurcate the egraph into:

  outcome A: (sizeof (struct foo) * i) < allocated_size
    (carry on, and have this recorded so no further checking needed on
this path)

  outcome B: (sizeof (struct foo) * i) >= allocated_size
    (complain about buffer overflow and stop this path)

Outcome B thus occurs when:
  (sizeof (struct foo) * i) >= (size_t)(sizeof (struct foo) * n)
  && (i < n)
so say sizeof (struct foo) == 16
  (16 * i) <= (size_t) (16 * n)
  && (i < n)
and we could then (somehow) show that this can happen e.g. for
  n > (SIZE_MAX / 16).
(and I've probably messed up at least some of the logic in the above)

Obviously this would be huge scope creep.

I wonder if it's worth simply fixing the increment/decrement case for
now (though I think we're still affected by overflow, maybe punting on
that), and saving the more complicated cases for a rewrite that can
handle more general boolean symbolic expressions.

Thoughts?
Dave


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PR analyzer/94362 Partial Fix
  2021-03-02 19:58     ` David Malcolm
@ 2021-03-02 23:14       ` brian.sobulefsky
  2021-03-03  1:40         ` David Malcolm
  0 siblings, 1 reply; 9+ messages in thread
From: brian.sobulefsky @ 2021-03-02 23:14 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

I have been kicking these sorts of ideas around ever since I came to understand that
the second "UNKNOWN" in the for loop that originally started this was due to the state
merge as we loop. For now, and I don't mean this disrespectfully because it is very
hard to get right, the whole issue of merging has basically been punted, given some
of the simple examples we found that will merge as an unknown svalue. As you think
about this issue, "scope creep" becomes a concern quickly. It  quickly turns into
a halting problem of sorts, you have to decide how much of you want the analyzer to
be able to "understand" a program. For example, any human can follow this:

sum = 0;
for (idx = 1; idx <= 10; idx++) sum += idx;
__analyzer_eval (sum == 55);

but from an analyzer perspective it opens up all sorts of questions and
becomes a bit of a PhD thesis as to where you draw the line. The biggest concern
with the analyzer seems to be vulnerabilities, so I doubt it is useful to get the
analyzer to produce the correct answer for the above code, although it might be
interesting to do so from an academic perspective.

The example you provided gives a concrete reason that overflow should not be a
complete "punt" and I like it. In the interest of fighting scope creep and keeping
things manageable, I would question whether you want to actually track the overflow /
no overflow cases separately or just raise any possible overflow as an error immediately.
I am not disputing your idea, I would prefer to track the overflow and get
a correct result (in this case, an under allocation of memory). I guess I would want
to know how much work you think that will be. You still know the codebase a lot better
than I do.

My devil's advocate position would be if the analyzer raises exception on
any possible overflow, will that overwhelm the user with false positives? I
am not sure of the answer here, because a piece of me feels that overflow is not
something that production code should be relying on in any serious application,
and so should be non existent, but I am not sure if that is reflective of
reality.

The simplest way to handle your example would be the following:

struct foo * make_arr (size_t n)
{
  if (MAX_INT / sizeof (struct foo) >= n)
    return NULL;
  //continue with what you wrote
}

This should add a constraint that downstream of the initial guard, n is small
enough to prevent overflow (I would have to check, but the current analyzer
should be close to doing this if not already correct). Therefore, all we would need
would be the check at the definition of sz as to whether it overflows, and that
check should come back negative in my example, unknown in yours. Assuming we agree
that the purpose of the analyzer is to prevent vulnerabilities, and not to provide
an academic exercise in seeing how close we get to solving the halting problem,
we could just treat any possible overflow as an error.

As this was fundamentally your project, I don't want to tell you how to do it, and
the nerd in me wants to build an analyzer that can answer all the silly puzzle code
I can think to feed it, but from a utilitarian perspective and given everyone's
limited time, I thought I would offer this as a path you can try.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PR analyzer/94362 Partial Fix
  2021-03-02 23:14       ` brian.sobulefsky
@ 2021-03-03  1:40         ` David Malcolm
  2021-03-03  2:09           ` Jeff Law
  2021-03-03  2:26           ` brian.sobulefsky
  0 siblings, 2 replies; 9+ messages in thread
From: David Malcolm @ 2021-03-03  1:40 UTC (permalink / raw)
  To: brian.sobulefsky; +Cc: gcc-patches

On Tue, 2021-03-02 at 23:14 +0000, brian.sobulefsky wrote:
> I have been kicking these sorts of ideas around ever since I came to
> understand that
> the second "UNKNOWN" in the for loop that originally started this was
> due to the state
> merge as we loop. For now, and I don't mean this disrespectfully
> because it is very
> hard to get right, the whole issue of merging has basically been
> punted, given some
> of the simple examples we found that will merge as an unknown svalue.
> As you think
> about this issue, "scope creep" becomes a concern quickly. It 
> quickly turns into
> a halting problem of sorts, you have to decide how much of you want
> the analyzer to
> be able to "understand" a program. For example, any human can follow
> this:
> 
> sum = 0;
> for (idx = 1; idx <= 10; idx++) sum += idx;
> __analyzer_eval (sum == 55);
> 
> but from an analyzer perspective it opens up all sorts of questions
> and
> becomes a bit of a PhD thesis as to where you draw the line. 

Challenge accepted!  FWIW with suitable options, the analyzer can
actually "figure this out":

$ cat ../../src/t.c
extern void __analyzer_eval (int);

void test (void)
{
 int sum = 0;
 for (int idx = 1; idx <= 10; idx++)
   sum += idx;
 __analyzer_eval (sum == 55);
}

$ ./xgcc -B. -S -fanalyzer ../../src/t.c \
    -Wanalyzer-too-complex \
    -fno-analyzer-state-merge \
    --param analyzer-max-enodes-per-program-point=11
../../src/t.c: In function ‘test’:
../../src/t.c:8:2: warning: TRUE
    8 |  __analyzer_eval (sum == 55);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~

i.e. with -fno-analyzer-state-merge to disable state merging, 
and increasing the enode limit so that the analyzer effectively fully
unrolls the loop when exploring the exploded_graph.

But obviously this isn't particularly useful, except as a demo of the
internals of the analyzer.


> The biggest concern
> with the analyzer seems to be vulnerabilities, so I doubt it is
> useful to get the
> analyzer to produce the correct answer for the above code, although
> it might be
> interesting to do so from an academic perspective.

Indeed - security vulnerabilities are my highest priority (making it
easier to avoid them as code is written/patched, and to find them in
existing code).

> The example you provided gives a concrete reason that overflow should
> not be a
> complete "punt" and I like it. In the interest of fighting scope
> creep and keeping
> things manageable, I would question whether you want to actually
> track the overflow /
> no overflow cases separately or just raise any possible overflow as
> an error immediately.
> I am not disputing your idea, I would prefer to track the overflow
> and get
> a correct result (in this case, an under allocation of memory). I
> guess I would want
> to know how much work you think that will be. You still know the
> codebase a lot better
> than I do.

Brainstorming somewhat, another idea I have for handling overflow (as
opposed to the || idea you mentioned) might be to bifurcate state at
each point where overflow could occur, splitting the path into "didn't
overflow" and "did overflow" outcomes, adding conditions to each
successor state accordingly.

But maybe that would lead to a combinatorial explosion of nodes (unless
it can be tamed by merging?)

(Unfortunately, we can currently only split states at CFG splits, not
at arbitrary statements; see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99260 )

> My devil's advocate position would be if the analyzer raises
> exception on
> any possible overflow, will that overwhelm the user with false
> positives?

Presumably by "raise exception" you mean "issue a diagnostic and stop
analyzing this path", right?

I think the point is to detect where numerical overflow can lead to
e.g. a buffer overflow, rather than complain about numerical overflow
in its own right, like in the make_arr example I gave earlier.

>  I
> am not sure of the answer here, because a piece of me feels that
> overflow is not
> something that production code should be relying on in any serious
> application,
> and so should be non existent, but I am not sure if that is
> reflective of
> reality.

My belief is that production code is full of overflows, but only some
of them are security-sensitive.  Consider e.g. hashing algorithms that
sum some values and beningly assume overflow for wraparound as opposed
to the "calculate the size of the buffer to be allocated" example
(where the overflow is a classic security pitfall).

> The simplest way to handle your example would be the following:
> 
> struct foo * make_arr (size_t n)
> {
>   if (MAX_INT / sizeof (struct foo) >= n)
>     return NULL;
>   //continue with what you wrote
> }

Ideally we'd emit a fix-it hint suggesting adding such a clause (I'm
kidding, but only partly).

> This should add a constraint that downstream of the initial guard, n
> is small
> enough to prevent overflow (I would have to check, but the current
> analyzer
> should be close to doing this if not already correct). Therefore, all
> we would need
> would be the check at the definition of sz as to whether it
> overflows, and that
> check should come back negative in my example, unknown in yours.
> Assuming we agree
> that the purpose of the analyzer is to prevent vulnerabilities, and
> not to provide
> an academic exercise in seeing how close we get to solving the
> halting problem,
> we could just treat any possible overflow as an error.

Brainstorming again: perhaps a state-machine that flags svalues that
could have overflowed, and complain about "possible-overflowed" svalues
that reach a sink like "malloc"?

> As this was fundamentally your project, I don't want to tell you how
> to do it, and
> the nerd in me wants to build an analyzer that can answer all the
> silly puzzle code
> I can think to feed it, but from a utilitarian perspective and given
> everyone's
> limited time, I thought I would offer this as a path you can try.

Thanks.  As noted above, I want the analyzer to primarily be a
practical tool to help stop security vulnerabilities (though it's fun
to play with puzzle code, too, but pragmatically I want to build
something that helps C/C++ developers).

Dave



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PR analyzer/94362 Partial Fix
  2021-03-03  1:40         ` David Malcolm
@ 2021-03-03  2:09           ` Jeff Law
  2021-03-03  2:29             ` brian.sobulefsky
  2021-03-03  2:26           ` brian.sobulefsky
  1 sibling, 1 reply; 9+ messages in thread
From: Jeff Law @ 2021-03-03  2:09 UTC (permalink / raw)
  To: David Malcolm, brian.sobulefsky; +Cc: gcc-patches



On 3/2/21 6:40 PM, David Malcolm via Gcc-patches wrote:
>
>> My devil's advocate position would be if the analyzer raises
>> exception on
>> any possible overflow, will that overwhelm the user with false
>> positives?
> Presumably by "raise exception" you mean "issue a diagnostic and stop
> analyzing this path", right?
>
> I think the point is to detect where numerical overflow can lead to
> e.g. a buffer overflow, rather than complain about numerical overflow
> in its own right, like in the make_arr example I gave earlier.
WRT overflow, IMHO, the most valuable case to detect overflows is when
they feed an allocation via malloc/alloca.  If an attacker can arrange
to overflow the size computation, then they can cause an
under-allocation which in turn opens the ability to over-write the stack
or heap data structures, which in turn are great attack vectors.

And in case you think that's contrived, it isn't :-)

http://phrack.org/issues/67/9.html

>>  I
>> am not sure of the answer here, because a piece of me feels that
>> overflow is not
>> something that production code should be relying on in any serious
>> application,
>> and so should be non existent, but I am not sure if that is
>> reflective of
>> reality.
> My belief is that production code is full of overflows, but only some
> of them are security-sensitive.  Consider e.g. hashing algorithms that
> sum some values and beningly assume overflow for wraparound as opposed
> to the "calculate the size of the buffer to be allocated" example
> (where the overflow is a classic security pitfall).
Agreed.


Jeff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PR analyzer/94362 Partial Fix
  2021-03-03  1:40         ` David Malcolm
  2021-03-03  2:09           ` Jeff Law
@ 2021-03-03  2:26           ` brian.sobulefsky
  1 sibling, 0 replies; 9+ messages in thread
From: brian.sobulefsky @ 2021-03-03  2:26 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

Wow! I wasn't expecting that to work. Obviously we know that  there is currently
no handler for binop_svalue in the constraints so I would have to watch it run with
state merging disabled to see how it is managing the unroll. The fact that merging
breaks it is indicative of what we are saying though, that the constraint and
merge model is currently insufficient.

Hash algorithms may provide a counterexample for legitimate use of overflow. Anyway,
I would prefer tracking the split too, but it is a big change. One way is state
split, like you say, but that is a pretty invasive change to the way the graph works.
The other way is to handle "or constraints" as I have said. This is an invasive
change to the constraint model, but arguably the concept of "or" cannot be ignored
forever.

Thinking about it, I guess currently the concept of "and" is handled by the
constraints (all constraints in a model exist as a big "and") and the concept of
"or" is handled by the graph. This could be acceptable but we cannot split the
graph arbitrarily, so there is currently no way to handle even a basic

if (i == 1 || i == 10)

what should be a very simple conditional. Handling a hash algorithm, like
you say, is good to keep in mind, because we don't want an explosion of
possibilities. If done right, the analyzer should understand for hashing that
anything + anything => anything, and we see no explosion of state.

By "raise an exception" I did mean issue an analyzer warning, yes. Perhaps the
simple answer is to just track the svalue as a possible overflow in the state
machine and report the warning for certain uses, like the alloc family, as you
said. Regardless, proper overflow handling renders my naive binop handler unusable,
because all it does is fold the condition and recur. There is basically no logic
to it, and once you reenter eval_condition it is not possible to know how you got
there.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: PR analyzer/94362 Partial Fix
  2021-03-03  2:09           ` Jeff Law
@ 2021-03-03  2:29             ` brian.sobulefsky
  0 siblings, 0 replies; 9+ messages in thread
From: brian.sobulefsky @ 2021-03-03  2:29 UTC (permalink / raw)
  To: Jeff Law; +Cc: David Malcolm, gcc-patches

Agreed too. Generic "error on overflow" is not an answer, and ignoring overflow
is not an answer either because flagging faulty memory allocations is an
important feature.

Brian


Sent with ProtonMail Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, March 2, 2021 6:09 PM, Jeff Law <jeffreyalaw@gmail.com> wrote:

> On 3/2/21 6:40 PM, David Malcolm via Gcc-patches wrote:
>
> > > My devil's advocate position would be if the analyzer raises
> > > exception on
> > > any possible overflow, will that overwhelm the user with false
> > > positives?
> > > Presumably by "raise exception" you mean "issue a diagnostic and stop
> > > analyzing this path", right?
> >
> > I think the point is to detect where numerical overflow can lead to
> > e.g. a buffer overflow, rather than complain about numerical overflow
> > in its own right, like in the make_arr example I gave earlier.
>
> WRT overflow, IMHO, the most valuable case to detect overflows is when
> they feed an allocation via malloc/alloca.  If an attacker can arrange
> to overflow the size computation, then they can cause an
> under-allocation which in turn opens the ability to over-write the stack
> or heap data structures, which in turn are great attack vectors.
>
> And in case you think that's contrived, it isn't :-)
>
> http://phrack.org/issues/67/9.html
>
> > > I
> > > am not sure of the answer here, because a piece of me feels that
> > > overflow is not
> > > something that production code should be relying on in any serious
> > > application,
> > > and so should be non existent, but I am not sure if that is
> > > reflective of
> > > reality.
> > > My belief is that production code is full of overflows, but only some
> > > of them are security-sensitive. Consider e.g. hashing algorithms that
> > > sum some values and beningly assume overflow for wraparound as opposed
> > > to the "calculate the size of the buffer to be allocated" example
> > > (where the overflow is a classic security pitfall).
>
> Agreed.
>
> Jeff



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-03-03  2:29 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-27 10:04 PR analyzer/94362 Partial Fix brian.sobulefsky
2021-03-02  2:30 ` David Malcolm
2021-03-02  7:09   ` brian.sobulefsky
2021-03-02 19:58     ` David Malcolm
2021-03-02 23:14       ` brian.sobulefsky
2021-03-03  1:40         ` David Malcolm
2021-03-03  2:09           ` Jeff Law
2021-03-03  2:29             ` brian.sobulefsky
2021-03-03  2:26           ` brian.sobulefsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).