ivopts improvement

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* ivopts improvement
@ 2011-02-28 10:46 Tom de Vries
  2011-02-28 11:44 ` Paolo Bonzini
  2011-02-28 12:47 ` Zdenek Dvorak
  0 siblings, 2 replies; 31+ messages in thread
From: Tom de Vries @ 2011-02-28 10:46 UTC (permalink / raw)
  To: rakdver; +Cc: gcc-patches, Bernd Schmidt, Maxim Kuvyrkov

[-- Attachment #1: Type: text/plain, Size: 1213 bytes --]

Hi Zdenek,

could you please review the following patch set?

The goal is to remove the 'i' iterator from the following example, by
replacing 'i < n' with 'p < base + n'.

void
f (char *base, unsigned long int i, unsigned long int n)
{
  char *p = base + i;
  do
    {
      *p = '\0';
      p++;
      i++;
   }
  while (i < n);
}

The difficulty is that this replacement is only valid under the
guarantee that base + n does not overflow. I was not able to deduce from
the C standard that merely the fact that it's a pointer type guarantees
this, so I added analysis to prove no overflow based on the memory
access '*p' and the relation between the 2 iterators.

patch set description:
- iterator.6.1-ml.patch: maximum bounds fix.
- iterator.6.2-ml.patch: infrastructure change, no change in
  functionality. calculate the new comparison operator at the same time
  as the new bound, and store it next to the new bound in the cost pair.
- iterator.6.3-ml.patch: infrastructure change. carved out new function.
- iterator.6.4-ml.patch: patch to rewrite an exit test with a '<'
  using a different iterator.

reg-tested on x86-64, and a 4.5 version of the patch set reg-tested on
MIPS, ARM and PPC.

Thanks,
- Tom

[-- Attachment #2: iterator.6.1-ml.log --]
[-- Type: text/x-log, Size: 286 bytes --]

23-02-2011  Tom de Vries  <tom@codesourcery.com>

	gcc/
	* tree-ssa-loop-ivopts.c (may_eliminate_iv): Fix boundary checks.
	* testsuite/gcc.dg/tree-ssa/ivopts-max.c: New test.
	* testsuite/gcc.dg/tree-ssa/ivopts-max-2.c: New test.
	* testsuite/gcc.dg/tree-ssa/ivopts-max-3.c: New test.

[-- Attachment #3: iterator.6.1-ml.patch --]
[-- Type: text/x-patch, Size: 1411 bytes --]

diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
--- gcc/tree-ssa-loop-ivopts.c	(revision 162585)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -4079,8 +4079,12 @@
   /* If the number of iterations is constant, compare against it directly.  */
   if (TREE_CODE (nit) == INTEGER_CST)
     {
-      /* See cand_value_at.  */
-      if (stmt_after_increment (loop, cand, use->stmt))
+      /* An iv with TYPE_PRECISION 32 has 2^32 distinct values, and an iv_period
+         of 2^32 - 1.  Such an iv can handle at most a nit of 2^32 - 1.  If the
+         exit test is after the increment, the bound calculation will overflow.
+         If overflow wraps, that's not a problem.  */
+      if (!TYPE_OVERFLOW_WRAPS (TREE_TYPE (cand->iv->base))
+          && stmt_after_increment (loop, cand, use->stmt))
         {
           if (!tree_int_cst_lt (nit, period))
             return false;
@@ -4100,7 +4104,9 @@
       double_int period_value, max_niter;
 
       max_niter = desc->max;
-      if (stmt_after_increment (loop, cand, use->stmt))
+      /* See comment at TREE_CODE (nit) == INTEGER_CST.  */
+      if (!TYPE_OVERFLOW_WRAPS (TREE_TYPE (cand->iv->base))
+          && stmt_after_increment (loop, cand, use->stmt))
         max_niter = double_int_add (max_niter, double_int_one);
       period_value = tree_to_double_int (period);
       if (double_int_ucmp (max_niter, period_value) > 0)

[-- Attachment #4: iterator.6.1-ml.test.patch --]
[-- Type: text/x-patch, Size: 3099 bytes --]

Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-max.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopts-max.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-max.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile { target { lp64 } } } */
+/* { dg-options "-O2 -fdump-tree-ivopts -fno-tree-vectorize -fno-tree-loop-ivcanon" } */
+
+/* # (exit test == false)  : 0xffffffff */
+/* iv_period (j)           : 0xffffffff */
+/* may_eliminate (j, i < 0x200000000) should return true */
+
+void
+do_while_maxuint_plus_1 (unsigned char *p)
+{
+  unsigned long int i = 0;
+  unsigned int j = 0;
+  do
+    {
+      p[j] = '\0';
+      j++;
+      i += 2;
+    }
+  while (i < 0x200000000);
+}
+
+/* { dg-final { scan-tree-dump-times "PHI" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI.*, j_" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "j_.* != 0" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-not "i_" "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-max-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopts-max-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-max-2.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile { target { lp64 } } } */
+/* { dg-options "-O2 -fdump-tree-ivopts -fno-tree-vectorize -fno-tree-loop-ivcanon" } */
+
+/* # (exit test == false)  : 0x100000000*/
+/* iv_period (j)           : 0x0ffffffff */
+/* may_eliminate (j, i < 0x200000001) should return false */
+
+void
+do_while_maxuint_plus_1 (unsigned char *p)
+{
+  unsigned long int i = 0;
+  unsigned int j = 0;
+  do
+    {
+      p[j] = '\0';
+      j++;
+      i += 2;
+    }
+  while (i < 0x200000001);
+}
+
+/* { dg-final { scan-tree-dump-times "PHI" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI.*, i_" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI.*, j_" 0 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
+
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-max-3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopts-max-3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-max-3.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { lp64 } } } */
+/* { dg-options "-O2 -fdump-tree-ivopts -fno-tree-vectorize -fno-tree-loop-ivcanon -fno-tree-vrp -fno-tree-dominator-opts -fno-tree-ch" } */
+
+/* # (exit test == false)  : 0x100000000 */
+/* iv_period (j)           : 0x0ffffffff */
+/* may_eliminate (j, i < 0x1ffffffff) should return false */
+
+void
+while_do_maxuint (unsigned char *p, int c)
+{
+  unsigned long int i = 0;
+  unsigned int j = 0;
+  while (i < 0x1ffffffff)
+    {
+      p[j] = '\0';
+      j++;
+      i += 2;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "PHI" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI.*, i_" 0 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI.*, j_" 0 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */

[-- Attachment #5: iterator.6.2-ml.log --]
[-- Type: text/x-log, Size: 615 bytes --]

23-02-2011  Tom de Vries  <tom@codesourcery.com>

	gcc/
	* tree-ssa-loop-ivopts.c (struct cost_pair): Add comp field.
	(set_use_iv_cost): Add comp parameter.
	(set_use_iv_cost): Use comp parameter to init comp field.
	(determine_use_iv_cost_generic, determine_use_iv_cost_address)
	(determine_use_iv_cost_condition): Add comp argument to set_use_iv_cost
	call.
	(may_eliminate_iv): Add comp parameter and set.
	(determine_use_iv_cost_condition): Add comp argument to may_eliminate_iv
	call.
	(determine_use_iv_cost_condition): Reset comp if too expensive.
	(rewrite_use_compare): Use compare from comp field of cp.

[-- Attachment #6: iterator.6.2-ml.patch --]
[-- Type: text/x-patch, Size: 5286 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 170268)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -170,6 +170,7 @@ struct cost_pair
   tree value;		/* For final value elimination, the expression for
 			   the final value of the iv.  For iv elimination,
 			   the new bound to compare with.  */
+  enum tree_code comp;	/* For iv elimination, the comparison.  */
   int inv_expr_id;      /* Loop invariant expression id.  */
 };
 
@@ -2654,13 +2655,13 @@ infinite_cost_p (comp_cost cost)
 
 /* Sets cost of (USE, CANDIDATE) pair to COST and record that it depends
    on invariants DEPENDS_ON and that the value used in expressing it
-   is VALUE.  */
+   is VALUE, and in case of iv elimination the comparison operator is COMP.  */
 
 static void
 set_use_iv_cost (struct ivopts_data *data,
 		 struct iv_use *use, struct iv_cand *cand,
 		 comp_cost cost, bitmap depends_on, tree value,
-                 int inv_expr_id)
+		 enum tree_code comp, int inv_expr_id)
 {
   unsigned i, s;
 
@@ -2676,6 +2677,7 @@ set_use_iv_cost (struct ivopts_data *dat
       use->cost_map[cand->id].cost = cost;
       use->cost_map[cand->id].depends_on = depends_on;
       use->cost_map[cand->id].value = value;
+      use->cost_map[cand->id].comp = comp;
       use->cost_map[cand->id].inv_expr_id = inv_expr_id;
       return;
     }
@@ -2696,6 +2698,7 @@ found:
   use->cost_map[i].cost = cost;
   use->cost_map[i].depends_on = depends_on;
   use->cost_map[i].value = value;
+  use->cost_map[i].comp = comp;
   use->cost_map[i].inv_expr_id = inv_expr_id;
 }
 
@@ -4208,14 +4211,15 @@ determine_use_iv_cost_generic (struct iv
   if (cand->pos == IP_ORIGINAL
       && cand->incremented_at == use->stmt)
     {
-      set_use_iv_cost (data, use, cand, zero_cost, NULL, NULL_TREE, -1);
+      set_use_iv_cost (data, use, cand, zero_cost, NULL, NULL_TREE,
+                       ERROR_MARK, -1);
       return true;
     }
 
   cost = get_computation_cost (data, use, cand, false, &depends_on,
                                NULL, &inv_expr_id);
 
-  set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE,
+  set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE, ERROR_MARK,
                    inv_expr_id);
 
   return !infinite_cost_p (cost);
@@ -4243,7 +4247,7 @@ determine_use_iv_cost_address (struct iv
       else if (cand->pos == IP_AFTER_USE || cand->pos == IP_BEFORE_USE)
 	cost = infinite_cost;
     }
-  set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE,
+  set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE, ERROR_MARK,
                    inv_expr_id);
 
   return !infinite_cost_p (cost);
@@ -4320,11 +4324,13 @@ iv_elimination_compare (struct ivopts_da
 }
 
 /* Check whether it is possible to express the condition in USE by comparison
-   of candidate CAND.  If so, store the value compared with to BOUND.  */
+   of candidate CAND.  If so, store the value compared with to BOUND, and the
+   comparison operator to COMP.  */
 
 static bool
 may_eliminate_iv (struct ivopts_data *data,
-		  struct iv_use *use, struct iv_cand *cand, tree *bound)
+		  struct iv_use *use, struct iv_cand *cand, tree *bound,
+		  enum tree_code *comp)
 {
   basic_block ex_bb;
   edge exit;
@@ -4405,6 +4411,8 @@ may_eliminate_iv (struct ivopts_data *da
   cand_value_at (loop, cand, use->stmt, nit, &bnd);
 
   *bound = aff_combination_to_tree (&bnd);
+  *comp = iv_elimination_compare (data, use);
+
   /* It is unlikely that computing the number of iterations using division
      would be more profitable than keeping the original induction variable.  */
   if (expression_expensive_p (*bound))
@@ -4426,16 +4434,18 @@ determine_use_iv_cost_condition (struct
   bool ok;
   int inv_expr_id = -1;
   tree *control_var, *bound_cst;
+  enum tree_code comp;
 
   /* Only consider real candidates.  */
   if (!cand->iv)
     {
-      set_use_iv_cost (data, use, cand, infinite_cost, NULL, NULL_TREE, -1);
+      set_use_iv_cost (data, use, cand, infinite_cost, NULL, NULL_TREE,
+		       ERROR_MARK, -1);
       return false;
     }
 
   /* Try iv elimination.  */
-  if (may_eliminate_iv (data, use, cand, &bound))
+  if (may_eliminate_iv (data, use, cand, &bound, &comp))
     {
       elim_cost = force_var_cost (data, bound, &depends_on_elim);
       /* The bound is a loop invariant, so it will be only computed
@@ -4482,9 +4492,10 @@ determine_use_iv_cost_condition (struct
       depends_on = depends_on_express;
       depends_on_express = NULL;
       bound = NULL_TREE;
+      comp = ERROR_MARK;
     }
 
-  set_use_iv_cost (data, use, cand, cost, depends_on, bound, inv_expr_id);
+  set_use_iv_cost (data, use, cand, cost, depends_on, bound, comp, inv_expr_id);
 
   if (depends_on_elim)
     BITMAP_FREE (depends_on_elim);
@@ -6119,7 +6130,7 @@ rewrite_use_compare (struct ivopts_data
           fprintf (dump_file, "Replacing exit test: ");
           print_gimple_stmt (dump_file, use->stmt, 0, TDF_SLIM);
         }
-      compare = iv_elimination_compare (data, use);
+      compare = cp->comp;
       bound = unshare_expr (fold_convert (var_type, bound));
       op = force_gimple_operand (bound, &stmts, true, NULL_TREE);
       if (stmts)

[-- Attachment #7: iterator.6.3-ml.log --]
[-- Type: text/x-log, Size: 203 bytes --]

23-02-2011  Tom de Vries  <tom@codesourcery.com>

	gcc/
	* tree-ssa-loop-ivopts.c (fold_build_plus): New function factored out of
	add_autoinc_candidates.
	(add_autoinc_candidates): Use fold_build_plus.

[-- Attachment #8: iterator.6.3-ml.patch --]
[-- Type: text/x-patch, Size: 1723 bytes --]

diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
--- gcc/tree-ssa-loop-ivopts.c	(working copy)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -340,6 +340,36 @@
 
 static VEC(tree,heap) *decl_rtl_to_reset;
 
+/* Returns (TREE_TYPE (A))(A CODE B), where CODE is either PLUS_EXPR or
+   MINUS_EXPR.  Handles the case that A is a pointer robustly.  */
+
+static inline tree
+fold_build_plus (enum tree_code code, tree a, tree b)
+{
+  tree a_type = TREE_TYPE (a);
+  tree b_type = TREE_TYPE (b);
+
+  if (POINTER_TYPE_P (a_type))
+    {
+      switch (code)
+        {
+        case MINUS_EXPR:
+          b = fold_build1 (NEGATE_EXPR, b_type, b);
+
+          /* Fall-through.  */
+        case PLUS_EXPR:
+          code = POINTER_PLUS_EXPR;
+          break;
+        default:
+          gcc_unreachable ();
+        }
+    }
+  else
+    b = fold_convert (a_type, b);
+
+  return fold_build2 (code, a_type, a, b);
+}
+
 /* Number of uses recorded in DATA.  */
 
 static inline unsigned
@@ -2255,18 +2293,7 @@
   if ((HAVE_PRE_INCREMENT && GET_MODE_SIZE (mem_mode) == cstepi)
       || (HAVE_PRE_DECREMENT && GET_MODE_SIZE (mem_mode) == -cstepi))
     {
-      enum tree_code code = MINUS_EXPR;
-      tree new_base;
-      tree new_step = step;
-
-      if (POINTER_TYPE_P (TREE_TYPE (base)))
-	{
-	  new_step = fold_build1 (NEGATE_EXPR, TREE_TYPE (step), step);
-	  code = POINTER_PLUS_EXPR;
-	}
-      else
-	new_step = fold_convert (TREE_TYPE (base), new_step);
-      new_base = fold_build2 (code, TREE_TYPE (base), base, new_step);
+      tree new_base = fold_build_plus (MINUS_EXPR, base, step);
       add_candidate_1 (data, new_base, step, important, IP_BEFORE_USE, use,
 		       use->stmt);
     }

[-- Attachment #9: iterator.6.4-ml.log --]
[-- Type: text/x-log, Size: 297 bytes --]

23-02-2011  Tom de Vries  <tom@codesourcery.com>

	gcc/
	* tree-ssa-loop-ivopts.c (niter_for_exit): Return COND_EXPR if
	!integer_zerop (desc->may_be_zero).
	(cand_valid_pointer_at_use_p, get_lt_bound, iv_elimination_compare_lt):
	New function.
	(may_eliminate_iv): Use iv_elimination_compare_lt.

[-- Attachment #10: iterator.6.4-ml.patch --]
[-- Type: text/x-patch, Size: 6739 bytes --]

diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
--- gcc/tree-ssa-loop-ivopts.c	(working copy)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -728,17 +758,25 @@
 
   if (!slot)
     {
-      /* Try to determine number of iterations.  We must know it
-	 unconditionally (i.e., without possibility of # of iterations
-	 being zero).  Also, we cannot safely work with ssa names that
-	 appear in phi nodes on abnormal edges, so that we do not create
-	 overlapping life ranges for them (PR 27283).  */
+      /* Try to determine number of iterations.  We cannot safely work with ssa
+         names that appear in phi nodes on abnormal edges, so that we do not
+         create overlapping life ranges for them (PR 27283).  */
       desc = XNEW (struct tree_niter_desc);
       if (number_of_iterations_exit (data->current_loop,
 				     exit, desc, true)
-	  && integer_zerop (desc->may_be_zero)
      	  && !contains_abnormal_ssa_name_p (desc->niter))
-	niter = desc->niter;
+	{
+	  if (!integer_zerop (desc->may_be_zero))
+            /* Construct COND_EXPR that describes the number of iterations.
+               Either the COND_EXPR is not too expensive, and we can use it as
+               loop bound, or we can deduce a LT_EXPR bound from it.  */
+	    niter
+	      = build3 (COND_EXPR, TREE_TYPE (desc->niter), desc->may_be_zero,
+			build_int_cst_type (TREE_TYPE (desc->niter), 0),
+			desc->niter);
+	  else
+	    niter = desc->niter;
+	}
       else
 	niter = NULL_TREE;
 
@@ -4041,6 +4068,163 @@
   return (exit->flags & EDGE_TRUE_VALUE ? EQ_EXPR : NE_EXPR);
 }
 
+/* Tries to determine if the iterator CAND is a valid pointer at USE.  */
+
+static bool
+cand_valid_pointer_at_use_p (struct ivopts_data *data, struct iv_cand *cand,
+			     struct iv_use *use)
+{
+  struct iv_use *cand_iv_use = iv_use (data, cand->iv->use_id);
+  basic_block mem_bb = gimple_bb (cand_iv_use->stmt);
+  basic_block use_bb = gimple_bb (use->stmt);
+  tree memref;
+
+  if (cand->iv->base_object == NULL_TREE || cand_iv_use->op_p == NULL)
+    return false;
+
+  memref = *cand_iv_use->op_p;
+
+  if (memref == NULL_TREE)
+    return false;
+
+  switch (TREE_CODE (memref))
+    {
+    case MEM_REF:
+      if (!tree_int_cst_equal (TREE_OPERAND (memref, 1), integer_zero_node))
+	return false;
+      /* Fallthru.  */
+
+    case INDIRECT_REF:
+      if (TREE_OPERAND (memref, 0) != cand->var_before)
+	return false;
+      break;
+
+    default:
+      return false;
+    }
+
+  if (!dominated_by_p (CDI_DOMINATORS, use_bb, mem_bb))
+    return false;
+
+  return true;
+}
+
+/* Tries to detect
+     NIT == (use_iv_max < USE->iv->base)
+            ? 0
+            : (use_iv_max - USE->iv->base)
+   where
+     use_iv_real_base == (USE->iv->base - USE->iv->step)
+     && CAND->iv->base == base_ptr + use_iv_real_base
+   and returns the exclusive upper bound for CAND->var_after:
+     base_ptr + use_iv_max.  */
+
+static tree
+get_lt_bound (struct iv_use *use, struct iv_cand *cand, tree nit)
+{
+  tree comp, base_ptr, n, n0, n1;
+  tree use_iv_real_base, cand_iv_base, use_iv_max;
+  gimple def_stmt;
+  int npos, mpos;
+  enum tree_code compare;
+  tree cand_type = TREE_TYPE (cand->iv->base);
+
+  if (TREE_CODE (nit) != COND_EXPR)
+    return NULL_TREE;
+
+  /* use_iv_real_base == use->iv->base - use->iv->step.  */
+  use_iv_real_base = fold_build_plus (MINUS_EXPR, use->iv->base, use->iv->step);
+
+  /* cand_iv_base.  */
+  cand_iv_base = cand->iv->base;
+  STRIP_NOPS (cand_iv_base);
+
+  /* cand->iv->base == base_ptr + use_iv_real_base.  */
+  if (TREE_CODE (cand_iv_base) != SSA_NAME)
+    return NULL_TREE;
+  def_stmt = SSA_NAME_DEF_STMT (cand_iv_base);
+  if (!is_gimple_assign (def_stmt)
+      || gimple_assign_rhs_code (def_stmt) != POINTER_PLUS_EXPR)
+    return NULL_TREE;
+  if (gimple_assign_rhs2 (def_stmt) != use_iv_real_base)
+    return NULL_TREE;
+  base_ptr = gimple_assign_rhs1 (def_stmt);
+
+  /* 0.  */
+  if (tree_int_cst_equal (TREE_OPERAND (nit, 1), integer_zero_node))
+    npos = 2;
+  else if (tree_int_cst_equal (TREE_OPERAND (nit, 2), integer_zero_node))
+    npos = 1;
+  else
+    return NULL_TREE;
+
+  /* n == use_iv_max - use->iv->base.  */
+  n = TREE_OPERAND (nit, npos);
+  if (TREE_CODE (n) != PLUS_EXPR)
+    return NULL_TREE;
+  n0 = TREE_OPERAND (n, 0);
+  n1 = TREE_OPERAND (n, 1);
+  if (tree_int_cst_equal (fold_build_plus (PLUS_EXPR, use->iv->base, n0),
+			  integer_zero_node))
+    use_iv_max = n1;
+  else if (tree_int_cst_equal (fold_build_plus (PLUS_EXPR, use->iv->base, n1),
+			       integer_zero_node))
+    use_iv_max = n0;
+  else
+    return NULL_TREE;
+
+  /* comp == use_iv_max < use->iv->base.  */
+  comp = TREE_OPERAND (nit, 0);
+  compare = TREE_CODE (comp);
+  if ((npos == 2 && compare == LT_EXPR)
+      || (npos == 1 && compare == GE_EXPR))
+    mpos = 0;
+  else if ((npos == 2 && compare == GT_EXPR)
+	   || (npos == 1 && compare == LE_EXPR))
+    mpos = 1;
+  else
+    return NULL_TREE;
+  if (TREE_OPERAND (comp, mpos) != use_iv_max
+      || !tree_int_cst_equal (fold_build_plus (MINUS_EXPR, use->iv->base,
+                                               TREE_OPERAND (comp, 1 - mpos)),
+			      integer_zero_node))
+    return NULL_TREE;
+
+  /* Calculate bound.  */
+  return fold_build_plus (PLUS_EXPR, convert (cand_type, base_ptr), use_iv_max);
+}
+
+/* Tries to replace loop exit test USE, which allows NIT iterations, by one
+   formulated in terms of a LT_EXPR comparison with CAND.  Stores the resulting
+   comparison in COMP_P and bound in BOUND_P.  */
+
+static bool
+iv_elimination_compare_lt (struct ivopts_data *data,
+			   struct iv_use *use, struct iv_cand *cand,
+			   tree *bound_p, tree nit, enum tree_code *comp_p)
+{
+  tree bound;
+
+  if (!cand_valid_pointer_at_use_p (data, cand, use))
+    return false;
+
+  if (*comp_p != NE_EXPR)
+    return false;
+
+  bound = get_lt_bound (use, cand, nit);
+
+  if (bound == NULL_TREE)
+    return false;
+
+  if (expression_expensive_p (bound))
+    return false;
+
+  *comp_p = LT_EXPR;
+  *bound_p = bound;
+
+  return true;
+}
+
 /* Check whether it is possible to express the condition in USE by comparison
    of candidate CAND.  If so, store the value compared with to BOUND.  */
 
@@ -4132,6 +4311,10 @@
   *bound = aff_combination_to_tree (&bnd);
   *comp = iv_elimination_compare (data, use);
 
+  /* Try to implement nit using a '<' instead.  */
+  if (TREE_CODE (nit) == COND_EXPR)
+    return iv_elimination_compare_lt (data, use, cand, bound, nit, comp);
+
   /* It is unlikely that computing the number of iterations using division
      would be more profitable than keeping the original induction variable.  */
   if (expression_expensive_p (*bound))

[-- Attachment #11: iterator.6.4-ml.test.patch --]
[-- Type: text/x-patch, Size: 779 bytes --]

Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts -fno-tree-vectorize -fno-tree-loop-ivcanon" } */
+
+void
+f (char *p, unsigned long int i, unsigned long int n)
+{
+  p += i;
+  do
+    {
+      *p = '\0';
+      p += 1;
+      i++;
+    }
+  while (i < n);
+}
+
+/* { dg-final { scan-tree-dump-times "PHI" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <p_" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "p_\[0-9\]* <" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-02-28 10:46 ivopts improvement Tom de Vries
@ 2011-02-28 11:44 ` Paolo Bonzini
  2011-02-28 14:59   ` Tom de Vries
  2011-02-28 12:47 ` Zdenek Dvorak
  1 sibling, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2011-02-28 11:44 UTC (permalink / raw)
  To: Tom de Vries; +Cc: rakdver, gcc-patches, Bernd Schmidt, Maxim Kuvyrkov

On 02/28/2011 10:41 AM, Tom de Vries wrote:
> The difficulty is that this replacement is only valid under the
> guarantee that base + n does not overflow.

I think this should be done unconditionally under 
-funsafe-loop-optimizations.  Look in the code for examples of how to 
add warnings, it should be something like:

   bool last_iter_may_overflow;
   if (TYPE_OVERFLOW_WRAPS (...))
     last_iter_may_overflow = false;
   else
     {
       last_iter_may_overflow = !flag_unsafe_loop_optimizations;
       if (warn_unsafe_loop_optimizations)
         {
           if (flag_unsafe_loop_optimizations)
             warning ("assuming ... does not overflow");
           else
             warning ("not optimizing ... because ... could overflow");
         }
     }

   if (last_iter_may_overflow
       && stmt_after_increment (loop, cand, use->stmt))

Looks like good work, thanks!

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-02-28 10:46 ivopts improvement Tom de Vries
  2011-02-28 11:44 ` Paolo Bonzini
@ 2011-02-28 12:47 ` Zdenek Dvorak
  2011-02-28 13:09   ` Richard Guenther
  2011-02-28 16:45   ` Tom de Vries
  1 sibling, 2 replies; 31+ messages in thread
From: Zdenek Dvorak @ 2011-02-28 12:47 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches, Bernd Schmidt, Maxim Kuvyrkov

Hi,

> 23-02-2011  Tom de Vries  <tom@codesourcery.com>
> 
> 	gcc/
> 	* tree-ssa-loop-ivopts.c (may_eliminate_iv): Fix boundary checks.
> 	* testsuite/gcc.dg/tree-ssa/ivopts-max.c: New test.
> 	* testsuite/gcc.dg/tree-ssa/ivopts-max-2.c: New test.
> 	* testsuite/gcc.dg/tree-ssa/ivopts-max-3.c: New test.

> diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
> --- gcc/tree-ssa-loop-ivopts.c	(revision 162585)
> +++ gcc/tree-ssa-loop-ivopts.c	(working copy)
> @@ -4079,8 +4079,12 @@
>    /* If the number of iterations is constant, compare against it directly.  */
>    if (TREE_CODE (nit) == INTEGER_CST)
>      {
> -      /* See cand_value_at.  */
> -      if (stmt_after_increment (loop, cand, use->stmt))
> +      /* An iv with TYPE_PRECISION 32 has 2^32 distinct values, and an iv_period
> +         of 2^32 - 1.  Such an iv can handle at most a nit of 2^32 - 1.  If the
> +         exit test is after the increment, the bound calculation will overflow.
> +         If overflow wraps, that's not a problem.  */
> +      if (!TYPE_OVERFLOW_WRAPS (TREE_TYPE (cand->iv->base))
> +          && stmt_after_increment (loop, cand, use->stmt))
>          {
>            if (!tree_int_cst_lt (nit, period))
>              return false;
> @@ -4100,7 +4104,9 @@
>        double_int period_value, max_niter;
>  
>        max_niter = desc->max;
> -      if (stmt_after_increment (loop, cand, use->stmt))
> +      /* See comment at TREE_CODE (nit) == INTEGER_CST.  */
> +      if (!TYPE_OVERFLOW_WRAPS (TREE_TYPE (cand->iv->base))
> +          && stmt_after_increment (loop, cand, use->stmt))
>          max_niter = double_int_add (max_niter, double_int_one);
>        period_value = tree_to_double_int (period);
>        if (double_int_ucmp (max_niter, period_value) > 0)

This looks strange.  If indeed there is an off-by-one error here, it should not depend
on whether TYPE_OVERFLOW_WRAPS (TREE_TYPE (cand->iv->base)) is true or not.  The only
candidates that satisfy !TYPE_OVERFLOW_WRAPS are those for that we are certain that their
computations cannot overflow (e.g., if they are the original induction variables of the loop
computed in a !TYPE_OVERFLOW_WRAPS type).

> 23-02-2011  Tom de Vries  <tom@codesourcery.com>
> 
> 	gcc/
> 	* tree-ssa-loop-ivopts.c (struct cost_pair): Add comp field.
> 	(set_use_iv_cost): Add comp parameter.
> 	(set_use_iv_cost): Use comp parameter to init comp field.
> 	(determine_use_iv_cost_generic, determine_use_iv_cost_address)
> 	(determine_use_iv_cost_condition): Add comp argument to set_use_iv_cost
> 	call.
> 	(may_eliminate_iv): Add comp parameter and set.
> 	(determine_use_iv_cost_condition): Add comp argument to may_eliminate_iv
> 	call.
> 	(determine_use_iv_cost_condition): Reset comp if too expensive.
> 	(rewrite_use_compare): Use compare from comp field of cp.

This is ok.

> 23-02-2011  Tom de Vries  <tom@codesourcery.com>
> 
> 	gcc/
> 	* tree-ssa-loop-ivopts.c (fold_build_plus): New function factored out of
> 	add_autoinc_candidates.
> 	(add_autoinc_candidates): Use fold_build_plus.

OK.

> 23-02-2011  Tom de Vries  <tom@codesourcery.com>
> 
> 	gcc/
> 	* tree-ssa-loop-ivopts.c (niter_for_exit): Return COND_EXPR if
> 	!integer_zerop (desc->may_be_zero).
> 	(cand_valid_pointer_at_use_p, get_lt_bound, iv_elimination_compare_lt):
> 	New function.
> 	(may_eliminate_iv): Use iv_elimination_compare_lt.

> +/* Tries to determine if the iterator CAND is a valid pointer at USE.  */

For consistency with the rest of the comments in ivopts, please use "candidate"
instead of "iterator" in similar comments.

> +static bool
> +cand_valid_pointer_at_use_p (struct ivopts_data *data, struct iv_cand *cand,
> +			     struct iv_use *use)
> +{
> +  struct iv_use *cand_iv_use = iv_use (data, cand->iv->use_id);

This is rather strange.  cand->iv->use_id is never being set (all candidates are
allocated through add_candidate_1, which creates a new iv for them from the given
base and step).  So this just chooses a random use (the first one appearing in the
loop), regrardless whether it has anything to do with cand or not.

Zdenek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-02-28 12:47 ` Zdenek Dvorak
@ 2011-02-28 13:09   ` Richard Guenther
  2011-02-28 16:45   ` Tom de Vries
  1 sibling, 0 replies; 31+ messages in thread
From: Richard Guenther @ 2011-02-28 13:09 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: Tom de Vries, gcc-patches, Bernd Schmidt, Maxim Kuvyrkov

On Mon, Feb 28, 2011 at 11:46 AM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> 23-02-2011  Tom de Vries  <tom@codesourcery.com>
>>
>>       gcc/
>>       * tree-ssa-loop-ivopts.c (may_eliminate_iv): Fix boundary checks.
>>       * testsuite/gcc.dg/tree-ssa/ivopts-max.c: New test.
>>       * testsuite/gcc.dg/tree-ssa/ivopts-max-2.c: New test.
>>       * testsuite/gcc.dg/tree-ssa/ivopts-max-3.c: New test.
>
>> diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
>> --- gcc/tree-ssa-loop-ivopts.c        (revision 162585)
>> +++ gcc/tree-ssa-loop-ivopts.c        (working copy)
>> @@ -4079,8 +4079,12 @@
>>    /* If the number of iterations is constant, compare against it directly.  */
>>    if (TREE_CODE (nit) == INTEGER_CST)
>>      {
>> -      /* See cand_value_at.  */
>> -      if (stmt_after_increment (loop, cand, use->stmt))
>> +      /* An iv with TYPE_PRECISION 32 has 2^32 distinct values, and an iv_period
>> +         of 2^32 - 1.  Such an iv can handle at most a nit of 2^32 - 1.  If the
>> +         exit test is after the increment, the bound calculation will overflow.
>> +         If overflow wraps, that's not a problem.  */
>> +      if (!TYPE_OVERFLOW_WRAPS (TREE_TYPE (cand->iv->base))
>> +          && stmt_after_increment (loop, cand, use->stmt))
>>          {
>>            if (!tree_int_cst_lt (nit, period))
>>              return false;
>> @@ -4100,7 +4104,9 @@
>>        double_int period_value, max_niter;
>>
>>        max_niter = desc->max;
>> -      if (stmt_after_increment (loop, cand, use->stmt))
>> +      /* See comment at TREE_CODE (nit) == INTEGER_CST.  */
>> +      if (!TYPE_OVERFLOW_WRAPS (TREE_TYPE (cand->iv->base))
>> +          && stmt_after_increment (loop, cand, use->stmt))
>>          max_niter = double_int_add (max_niter, double_int_one);
>>        period_value = tree_to_double_int (period);
>>        if (double_int_ucmp (max_niter, period_value) > 0)
>
> This looks strange.  If indeed there is an off-by-one error here, it should not depend
> on whether TYPE_OVERFLOW_WRAPS (TREE_TYPE (cand->iv->base)) is true or not.  The only
> candidates that satisfy !TYPE_OVERFLOW_WRAPS are those for that we are certain that their
> computations cannot overflow (e.g., if they are the original induction variables of the loop
> computed in a !TYPE_OVERFLOW_WRAPS type).
>
>> 23-02-2011  Tom de Vries  <tom@codesourcery.com>
>>
>>       gcc/
>>       * tree-ssa-loop-ivopts.c (struct cost_pair): Add comp field.
>>       (set_use_iv_cost): Add comp parameter.
>>       (set_use_iv_cost): Use comp parameter to init comp field.
>>       (determine_use_iv_cost_generic, determine_use_iv_cost_address)
>>       (determine_use_iv_cost_condition): Add comp argument to set_use_iv_cost
>>       call.
>>       (may_eliminate_iv): Add comp parameter and set.
>>       (determine_use_iv_cost_condition): Add comp argument to may_eliminate_iv
>>       call.
>>       (determine_use_iv_cost_condition): Reset comp if too expensive.
>>       (rewrite_use_compare): Use compare from comp field of cp.
>
> This is ok.
>
>> 23-02-2011  Tom de Vries  <tom@codesourcery.com>
>>
>>       gcc/
>>       * tree-ssa-loop-ivopts.c (fold_build_plus): New function factored out of
>>       add_autoinc_candidates.
>>       (add_autoinc_candidates): Use fold_build_plus.
>
> OK.
>
>> 23-02-2011  Tom de Vries  <tom@codesourcery.com>
>>
>>       gcc/
>>       * tree-ssa-loop-ivopts.c (niter_for_exit): Return COND_EXPR if
>>       !integer_zerop (desc->may_be_zero).
>>       (cand_valid_pointer_at_use_p, get_lt_bound, iv_elimination_compare_lt):
>>       New function.
>>       (may_eliminate_iv): Use iv_elimination_compare_lt.
>
>> +/* Tries to determine if the iterator CAND is a valid pointer at USE.  */
>
> For consistency with the rest of the comments in ivopts, please use "candidate"
> instead of "iterator" in similar comments.
>
>> +static bool
>> +cand_valid_pointer_at_use_p (struct ivopts_data *data, struct iv_cand *cand,
>> +                          struct iv_use *use)
>> +{
>> +  struct iv_use *cand_iv_use = iv_use (data, cand->iv->use_id);
>
> This is rather strange.  cand->iv->use_id is never being set (all candidates are
> allocated through add_candidate_1, which creates a new iv for them from the given
> base and step).  So this just chooses a random use (the first one appearing in the
> loop), regrardless whether it has anything to do with cand or not.

Btw, patches like this should get some benchmarking for a primary platform.

Richard.

> Zdenek
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-02-28 11:44 ` Paolo Bonzini
@ 2011-02-28 14:59   ` Tom de Vries
  2011-02-28 15:07     ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Tom de Vries @ 2011-02-28 14:59 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: rakdver, gcc-patches, Bernd Schmidt, Maxim Kuvyrkov

Hi Paolo,

On 02/28/2011 11:37 AM, Paolo Bonzini wrote:
> On 02/28/2011 10:41 AM, Tom de Vries wrote:
>> The difficulty is that this replacement is only valid under the
>> guarantee that base + n does not overflow.
> 
> I think this should be done unconditionally under 
> -funsafe-loop-optimizations.

void
f (char *base, unsigned long int i, unsigned long int n)
{
  char *p = base + i;
  do
    {
      *p = '\0';
      p++;
      i++;
   }
  while (i < n);
}

AFAIU, -funsafe-loop-optimizations allows me to assume that i does not
overflow.  Are you saying that -funsafe-loop-optimizations also implies
that p does not overflow?

- Tom

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-02-28 14:59   ` Tom de Vries
@ 2011-02-28 15:07     ` Paolo Bonzini
  2011-02-28 15:35       ` Richard Guenther
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2011-02-28 15:07 UTC (permalink / raw)
  To: Tom de Vries; +Cc: rakdver, gcc-patches, Bernd Schmidt, Maxim Kuvyrkov

On 02/28/2011 03:22 PM, Tom de Vries wrote:
>> >
>> >  I think this should be done unconditionally under
>> >  -funsafe-loop-optimizations.
> void
> f (char *base, unsigned long int i, unsigned long int n)
> {
>    char *p = base + i;
>    do
>      {
>        *p = '\0';
>        p++;
>        i++;
>     }
>    while (i<  n);
> }
>
> AFAIU, -funsafe-loop-optimizations allows me to assume that i does not
> overflow.  Are you saying that -funsafe-loop-optimizations also implies
> that p does not overflow?

I'm wondering if the C standard doesn't allow you to accept this always, 
at least if the increment to i (and p) is positive.

I'm thinking that "undefined behavior for NULL pointer access" implies 
that an object cannot be allocated across a NULL pointer.  And since 
pointer arithmetic beyond object boundaries is undefined, an overflow of 
p (which would cross a NULL pointer) is also undefined.

I'm not sure what your code's intentions would be in a more general case 
that incrementing by 1.  But, in particular, this is valid code:

void
f (char *base, unsigned long int i, unsigned long int n)
{
    char *p = base + i;
    do
      {
        *p = '\0';
        p--;
        i--;
     }
    while (i > n);
}

while this isn't (assuming sizeof(long)==sizeof(void *) of course):

void
f (char *base, unsigned long int i, unsigned long int n)
{
    char *p = base + i;
    do
      {
        *p = '\0';
        p += ~0UL;
        i += ~0UL;
     }
    while (i > n);
}

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-02-28 15:07     ` Paolo Bonzini
@ 2011-02-28 15:35       ` Richard Guenther
  2011-02-28 19:14         ` Zdenek Dvorak
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Guenther @ 2011-02-28 15:35 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Tom de Vries, rakdver, gcc-patches, Bernd Schmidt, Maxim Kuvyrkov

On Mon, Feb 28, 2011 at 3:43 PM, Paolo Bonzini <bonzini@gnu.org> wrote:
> On 02/28/2011 03:22 PM, Tom de Vries wrote:
>>>
>>> >
>>> >  I think this should be done unconditionally under
>>> >  -funsafe-loop-optimizations.
>>
>> void
>> f (char *base, unsigned long int i, unsigned long int n)
>> {
>>   char *p = base + i;
>>   do
>>     {
>>       *p = '\0';
>>       p++;
>>       i++;
>>    }
>>   while (i<  n);
>> }
>>
>> AFAIU, -funsafe-loop-optimizations allows me to assume that i does not
>> overflow.  Are you saying that -funsafe-loop-optimizations also implies
>> that p does not overflow?
>
> I'm wondering if the C standard doesn't allow you to accept this always, at
> least if the increment to i (and p) is positive.
>
> I'm thinking that "undefined behavior for NULL pointer access" implies that
> an object cannot be allocated across a NULL pointer.  And since pointer
> arithmetic beyond object boundaries is undefined, an overflow of p (which
> would cross a NULL pointer) is also undefined.

The important thing is that IVOPTs is not allowed to introduce IVs that
overflow either.  And it is known to create weird IVs starting from
-4 ...

Richard.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-02-28 12:47 ` Zdenek Dvorak
  2011-02-28 13:09   ` Richard Guenther
@ 2011-02-28 16:45   ` Tom de Vries
  1 sibling, 0 replies; 31+ messages in thread
From: Tom de Vries @ 2011-02-28 16:45 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: gcc-patches, Bernd Schmidt, Maxim Kuvyrkov

On 02/28/2011 11:46 AM, Zdenek Dvorak wrote:
> Hi,
> 
>> 23-02-2011  Tom de Vries  <tom@codesourcery.com>
>>
>> 	gcc/
>> 	* tree-ssa-loop-ivopts.c (may_eliminate_iv): Fix boundary checks.
>> 	* testsuite/gcc.dg/tree-ssa/ivopts-max.c: New test.
>> 	* testsuite/gcc.dg/tree-ssa/ivopts-max-2.c: New test.
>> 	* testsuite/gcc.dg/tree-ssa/ivopts-max-3.c: New test.
> 
>> diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
>> --- gcc/tree-ssa-loop-ivopts.c	(revision 162585)
>> +++ gcc/tree-ssa-loop-ivopts.c	(working copy)
>> @@ -4079,8 +4079,12 @@
>>    /* If the number of iterations is constant, compare against it directly.  */
>>    if (TREE_CODE (nit) == INTEGER_CST)
>>      {
>> -      /* See cand_value_at.  */
>> -      if (stmt_after_increment (loop, cand, use->stmt))
>> +      /* An iv with TYPE_PRECISION 32 has 2^32 distinct values, and an iv_period
>> +         of 2^32 - 1.  Such an iv can handle at most a nit of 2^32 - 1.  If the
>> +         exit test is after the increment, the bound calculation will overflow.
>> +         If overflow wraps, that's not a problem.  */
>> +      if (!TYPE_OVERFLOW_WRAPS (TREE_TYPE (cand->iv->base))
>> +          && stmt_after_increment (loop, cand, use->stmt))
>>          {
>>            if (!tree_int_cst_lt (nit, period))
>>              return false;
>> @@ -4100,7 +4104,9 @@
>>        double_int period_value, max_niter;
>>  
>>        max_niter = desc->max;
>> -      if (stmt_after_increment (loop, cand, use->stmt))
>> +      /* See comment at TREE_CODE (nit) == INTEGER_CST.  */
>> +      if (!TYPE_OVERFLOW_WRAPS (TREE_TYPE (cand->iv->base))
>> +          && stmt_after_increment (loop, cand, use->stmt))
>>          max_niter = double_int_add (max_niter, double_int_one);
>>        period_value = tree_to_double_int (period);
>>        if (double_int_ucmp (max_niter, period_value) > 0)
> 
> This looks strange.  If indeed there is an off-by-one error here, it should not depend
> on whether TYPE_OVERFLOW_WRAPS (TREE_TYPE (cand->iv->base)) is true or not.  The only
> candidates that satisfy !TYPE_OVERFLOW_WRAPS are those for that we are certain that their
> computations cannot overflow (e.g., if they are the original induction variables of the loop
> computed in a !TYPE_OVERFLOW_WRAPS type).
> 

Ok, in that case I'll remove the stmt_after_increment related code in
may_eliminate_iv.

>> +static bool
>> +cand_valid_pointer_at_use_p (struct ivopts_data *data, struct iv_cand *cand,
>> +			     struct iv_use *use)
>> +{
>> +  struct iv_use *cand_iv_use = iv_use (data, cand->iv->use_id);
> 
> This is rather strange.  cand->iv->use_id is never being set (all candidates are
> allocated through add_candidate_1, which creates a new iv for them from the given
> base and step).  So this just chooses a random use (the first one appearing in the
> loop), regrardless whether it has anything to do with cand or not.
> 

I see, that needs fixing indeed. Thanks for noticing that.

- Tom

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-02-28 15:35       ` Richard Guenther
@ 2011-02-28 19:14         ` Zdenek Dvorak
  2011-03-02 22:01           ` Tom de Vries
  0 siblings, 1 reply; 31+ messages in thread
From: Zdenek Dvorak @ 2011-02-28 19:14 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Paolo Bonzini, Tom de Vries, gcc-patches, Bernd Schmidt, Maxim Kuvyrkov

Hi,

> >>> > Â I think this should be done unconditionally under
> >>> > Â -funsafe-loop-optimizations.
> >>
> >> void
> >> f (char *base, unsigned long int i, unsigned long int n)
> >> {
> >> Â  char *p = base + i;
> >> Â  do
> >> Â  Â  {
> >> Â  Â  Â  *p = '\0';
> >> Â  Â  Â  p++;
> >> Â  Â  Â  i++;
> >> Â  Â }
> >> Â  while (i< Â n);
> >> }
> >>
> >> AFAIU, -funsafe-loop-optimizations allows me to assume that i does not
> >> overflow. Â Are you saying that -funsafe-loop-optimizations also implies
> >> that p does not overflow?
> >
> > I'm wondering if the C standard doesn't allow you to accept this always, at
> > least if the increment to i (and p) is positive.
> >
> > I'm thinking that "undefined behavior for NULL pointer access" implies that
> > an object cannot be allocated across a NULL pointer. Â And since pointer
> > arithmetic beyond object boundaries is undefined, an overflow of p (which
> > would cross a NULL pointer) is also undefined.
> 
> The important thing is that IVOPTs is not allowed to introduce IVs that
> overflow either.  And it is known to create weird IVs starting from
> -4 ...

as long as we do the arithmetics in unsigned integer type and only cast to the
pointer type the final result, C standard (plus the gcc-specific interpretation
of casts) allows us to do that.

Nevertheless, pointer arithmetics has defined behavior only within
a single memory object or at most one array element after it; which more
or less forbids overflows, when the arithmetics is performed in pointer
type,

Zdenek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-02-28 19:14         ` Zdenek Dvorak
@ 2011-03-02 22:01           ` Tom de Vries
  2011-03-03  8:45             ` Paolo Bonzini
  0 siblings, 1 reply; 31+ messages in thread
From: Tom de Vries @ 2011-03-02 22:01 UTC (permalink / raw)
  To: Zdenek Dvorak
  Cc: Richard Guenther, Paolo Bonzini, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

[-- Attachment #1: Type: text/plain, Size: 2473 bytes --]

Hi Zdenek,

On 02/28/2011 07:12 PM, Zdenek Dvorak wrote:
> Hi,
> 
>>>>>>  I think this should be done unconditionally under
>>>>>>  -funsafe-loop-optimizations.
>>>>
>>>> void
>>>> f (char *base, unsigned long int i, unsigned long int n)
>>>> {
>>>>   char *p = base + i;
>>>>   do
>>>>     {
>>>>       *p = '\0';
>>>>       p++;
>>>>       i++;
>>>>    }
>>>>   while (i<  n);
>>>> }
>>>>
>>>> AFAIU, -funsafe-loop-optimizations allows me to assume that i does not
>>>> overflow.  Are you saying that -funsafe-loop-optimizations also implies
>>>> that p does not overflow?
>>>
>>> I'm wondering if the C standard doesn't allow you to accept this always, at
>>> least if the increment to i (and p) is positive.
>>>
>>> I'm thinking that "undefined behavior for NULL pointer access" implies that
>>> an object cannot be allocated across a NULL pointer.  And since pointer
>>> arithmetic beyond object boundaries is undefined, an overflow of p (which
>>> would cross a NULL pointer) is also undefined.
>>
>> The important thing is that IVOPTs is not allowed to introduce IVs that
>> overflow either.  And it is known to create weird IVs starting from
>> -4 ...
> 
> as long as we do the arithmetics in unsigned integer type and only cast to the
> pointer type the final result, C standard (plus the gcc-specific interpretation
> of casts) allows us to do that.
> 
> Nevertheless, pointer arithmetics has defined behavior only within
> a single memory object or at most one array element after it; which more
> or less forbids overflows, when the arithmetics is performed in pointer
> type,
> 
> Zdenek

Another try with updated patches.

Changes compared to last submission:
iterator.6.1-ml.patch:
* removed the stmt_after_increment related code
  in may_eliminate_iv (after review comment).
iterator.6.4-ml.patch:
* added test for loop_only_exit_p before calling
  iv_elimination_compare_lt. make sure the last loop iteration is
  reached, which makes p + n a valid pointer.
* removed cand_valid_pointer_at_use_p and replaced call
  with test for (cand->pos == IP_ORIGINAL
  && POINTER_TYPE_P (TREE_TYPE (cand->var_before))
  && POINTER_TYPE_OVERFLOW_UNDEFINED).
  In other words, the iterator is a source level pointer iterator, which
  is guaranteed not to overflow provided
  POINTER_TYPE_OVERFLOW_UNDEFINED.
* return bound in pointer type rather than unsigned int type. Make sure
  the bound matches the pointer iterator.

reg-tested on x86-64.

Ok now?

Thanks,
- Tom

[-- Attachment #2: iterator.6.1-ml.patch --]
[-- Type: text/x-patch, Size: 1551 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 170268)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -4362,17 +4362,8 @@ may_eliminate_iv (struct ivopts_data *da
   /* If the number of iterations is constant, compare against it directly.  */
   if (TREE_CODE (nit) == INTEGER_CST)
     {
-      /* See cand_value_at.  */
-      if (stmt_after_increment (loop, cand, use->stmt))
-        {
-          if (!tree_int_cst_lt (nit, period))
-            return false;
-        }
-      else
-        {
-          if (tree_int_cst_lt (period, nit))
+      if (tree_int_cst_lt (period, nit))
             return false;
-        }
     }
 
   /* If not, and if this is the only possible exit of the loop, see whether
@@ -4383,8 +4374,6 @@ may_eliminate_iv (struct ivopts_data *da
       double_int period_value, max_niter;
 
       max_niter = desc->max;
-      if (stmt_after_increment (loop, cand, use->stmt))
-        max_niter = double_int_add (max_niter, double_int_one);
       period_value = tree_to_double_int (period);
       if (double_int_ucmp (max_niter, period_value) > 0)
         {
@@ -4393,7 +4382,6 @@ may_eliminate_iv (struct ivopts_data *da
             {
               if (!estimated_loop_iterations (loop, true, &max_niter))
                 return false;
-              /* The loop bound is already adjusted by adding 1.  */
               if (double_int_ucmp (max_niter, period_value) > 0)
                 return false;
             }

[-- Attachment #3: iterator.6.4-ml.patch --]
[-- Type: text/x-patch, Size: 5900 bytes --]

diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
--- gcc/tree-ssa-loop-ivopts.c	(working copy)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -814,17 +814,25 @@
 
   if (!slot)
     {
-      /* Try to determine number of iterations.  We must know it
-	 unconditionally (i.e., without possibility of # of iterations
-	 being zero).  Also, we cannot safely work with ssa names that
-	 appear in phi nodes on abnormal edges, so that we do not create
-	 overlapping life ranges for them (PR 27283).  */
+      /* Try to determine number of iterations.  We cannot safely work with ssa
+         names that appear in phi nodes on abnormal edges, so that we do not
+         create overlapping life ranges for them (PR 27283).  */
       desc = XNEW (struct tree_niter_desc);
       if (number_of_iterations_exit (data->current_loop,
 				     exit, desc, true)
-	  && integer_zerop (desc->may_be_zero)
      	  && !contains_abnormal_ssa_name_p (desc->niter))
-	niter = desc->niter;
+	{
+	  if (!integer_zerop (desc->may_be_zero))
+            /* Construct COND_EXPR that describes the number of iterations.
+               Either the COND_EXPR is not too expensive, and we can use it as
+               loop bound, or we can deduce a LT_EXPR bound from it.  */
+	    niter
+	      = build3 (COND_EXPR, TREE_TYPE (desc->niter), desc->may_be_zero,
+			build_int_cst_type (TREE_TYPE (desc->niter), 0),
+			desc->niter);
+	  else
+	    niter = desc->niter;
+	}
       else
 	niter = NULL_TREE;
 
@@ -4342,6 +4350,123 @@
   return (exit->flags & EDGE_TRUE_VALUE ? EQ_EXPR : NE_EXPR);
 }
 
+/* Tries to detect
+     NIT == (use_iv_max < USE->iv->base)
+            ? 0
+            : (use_iv_max - USE->iv->base)
+   where
+     use_iv_real_base == (USE->iv->base - USE->iv->step)
+     && CAND->iv->base == base_ptr + use_iv_real_base
+   and returns the exclusive upper bound for CAND->var_after:
+     base_ptr + use_iv_max.  */
+
+static tree
+get_lt_bound (struct iv_use *use, struct iv_cand *cand, tree nit)
+{
+  tree comp, base_ptr, n, n0, n1;
+  tree use_iv_real_base, cand_iv_base, use_iv_max;
+  gimple def_stmt;
+  int npos, mpos;
+  enum tree_code compare;
+  tree cand_type = TREE_TYPE (cand->var_before);
+
+  if (TREE_CODE (nit) != COND_EXPR)
+    return NULL_TREE;
+
+  /* use_iv_real_base == use->iv->base - use->iv->step.  */
+  use_iv_real_base = fold_build_plus (MINUS_EXPR, use->iv->base, use->iv->step);
+
+  /* cand_iv_base.  */
+  cand_iv_base = cand->iv->base;
+  STRIP_NOPS (cand_iv_base);
+
+  /* cand->iv->base == base_ptr + use_iv_real_base.  */
+  if (TREE_CODE (cand_iv_base) != SSA_NAME)
+    return NULL_TREE;
+  def_stmt = SSA_NAME_DEF_STMT (cand_iv_base);
+  if (!is_gimple_assign (def_stmt)
+      || gimple_assign_rhs_code (def_stmt) != POINTER_PLUS_EXPR)
+    return NULL_TREE;
+  if (gimple_assign_rhs2 (def_stmt) != use_iv_real_base)
+    return NULL_TREE;
+  base_ptr = gimple_assign_rhs1 (def_stmt);
+
+  /* 0.  */
+  if (tree_int_cst_equal (TREE_OPERAND (nit, 1), integer_zero_node))
+    npos = 2;
+  else if (tree_int_cst_equal (TREE_OPERAND (nit, 2), integer_zero_node))
+    npos = 1;
+  else
+    return NULL_TREE;
+
+  /* n == use_iv_max - use->iv->base.  */
+  n = TREE_OPERAND (nit, npos);
+  if (TREE_CODE (n) != PLUS_EXPR)
+    return NULL_TREE;
+  n0 = TREE_OPERAND (n, 0);
+  n1 = TREE_OPERAND (n, 1);
+  if (tree_int_cst_equal (fold_build_plus (PLUS_EXPR, use->iv->base, n0),
+			  integer_zero_node))
+    use_iv_max = n1;
+  else if (tree_int_cst_equal (fold_build_plus (PLUS_EXPR, use->iv->base, n1),
+			       integer_zero_node))
+    use_iv_max = n0;
+  else
+    return NULL_TREE;
+
+  /* comp == use_iv_max < use->iv->base.  */
+  comp = TREE_OPERAND (nit, 0);
+  compare = TREE_CODE (comp);
+  if ((npos == 2 && compare == LT_EXPR)
+      || (npos == 1 && compare == GE_EXPR))
+    mpos = 0;
+  else if ((npos == 2 && compare == GT_EXPR)
+	   || (npos == 1 && compare == LE_EXPR))
+    mpos = 1;
+  else
+    return NULL_TREE;
+  if (TREE_OPERAND (comp, mpos) != use_iv_max
+      || !tree_int_cst_equal (fold_build_plus (MINUS_EXPR, use->iv->base,
+                                               TREE_OPERAND (comp, 1 - mpos)),
+			      integer_zero_node))
+    return NULL_TREE;
+
+  /* Calculate bound.  */
+  return fold_build_plus (PLUS_EXPR, convert (cand_type, base_ptr), use_iv_max);
+}
+
+/* Tries to replace loop exit test USE, which allows NIT iterations, by one
+   formulated in terms of a LT_EXPR comparison with CAND.  Stores the resulting
+   comparison in COMP_P and bound in BOUND_P.  */
+
+static bool
+iv_elimination_compare_lt (struct iv_use *use, struct iv_cand *cand,
+			   tree *bound_p, tree nit, enum tree_code *comp_p)
+{
+  tree bound;
+
+  if (!(cand->pos == IP_ORIGINAL
+	&& POINTER_TYPE_P (TREE_TYPE (cand->var_before))
+	&& POINTER_TYPE_OVERFLOW_UNDEFINED))
+    return false;
+
+  if (*comp_p != NE_EXPR)
+    return false;
+
+  bound = get_lt_bound (use, cand, nit);
+
+  if (bound == NULL_TREE)
+    return false;
+
+  if (expression_expensive_p (bound))
+    return false;
+
+  *comp_p = LT_EXPR;
+  *bound_p = bound;
+
+  return true;
+}
+
 /* Check whether it is possible to express the condition in USE by comparison
    of candidate CAND.  If so, store the value compared with to BOUND, and the
    comparison operator to COMP.  */
@@ -4420,6 +4545,15 @@
   *bound = aff_combination_to_tree (&bnd);
   *comp = iv_elimination_compare (data, use);
 
+  /* Try to implement nit using a '<' instead.  */
+  if (TREE_CODE (nit) == COND_EXPR)
+    {
+      if (!loop_only_exit_p (loop, exit))
+        return false;
+
+      return iv_elimination_compare_lt (use, cand, bound, nit, comp);
+    }
+
   /* It is unlikely that computing the number of iterations using division
      would be more profitable than keeping the original induction variable.  */
   if (expression_expensive_p (*bound))

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-02 22:01           ` Tom de Vries
@ 2011-03-03  8:45             ` Paolo Bonzini
  2011-03-03 14:29               ` Tom de Vries
  0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2011-03-03  8:45 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Zdenek Dvorak, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

On 03/02/2011 11:01 PM, Tom de Vries wrote:
> +  if (TREE_CODE (nit) == COND_EXPR)
> +    {
> +      if (!loop_only_exit_p (loop, exit))
> +        return false;
> +
> +      return iv_elimination_compare_lt (use, cand, bound, nit, comp);
> +    }
> +

You probably need a comment on top of iv_elimination_compare_lt, 
otherwise I'm left wondering why this isn't

   if (TREE_CODE (nit) == COND_EXPR
       && loop_only_exit_p (loop, exit)
       && iv_elimination_compare_lt (use, cand, bound, nit, comp))
     return true;

Also, the check on nit is an optimization.  Perhaps you should add a 
gcc_checking_assert to i_elimination_compare_lt and/or remove this from 
get_lt_bound:

> +  if (TREE_CODE (nit) != COND_EXPR)
> +    return NULL_TREE;
> +

Or perhaps loop_only_exit_p could be optimized by computing it ahead of 
time, possibly at the same time as loop_body_includes_call.  This way it 
becomes very cheap and the code above can just call 
iv_elimination_compare_lt without any pre-screening.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-03  8:45             ` Paolo Bonzini
@ 2011-03-03 14:29               ` Tom de Vries
  2011-03-04  7:37                 ` Paolo Bonzini
  2011-03-04 22:37                 ` Zdenek Dvorak
  0 siblings, 2 replies; 31+ messages in thread
From: Tom de Vries @ 2011-03-03 14:29 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Zdenek Dvorak, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

[-- Attachment #1: Type: text/plain, Size: 1365 bytes --]

Hi Paolo,

On 03/03/2011 09:44 AM, Paolo Bonzini wrote:
> On 03/02/2011 11:01 PM, Tom de Vries wrote:
>> +  if (TREE_CODE (nit) == COND_EXPR)
>> +    {
>> +      if (!loop_only_exit_p (loop, exit))
>> +        return false;
>> +
>> +      return iv_elimination_compare_lt (use, cand, bound, nit, comp);
>> +    }
>> +
> 
> You probably need a comment on top of iv_elimination_compare_lt, 
> otherwise I'm left wondering why this isn't
> 
>    if (TREE_CODE (nit) == COND_EXPR
>        && loop_only_exit_p (loop, exit)
>        && iv_elimination_compare_lt (use, cand, bound, nit, comp))
>      return true;
> 

You're right, there's a comment missing. I added it now.

> Also, the check on nit is an optimization.  

It's not, hopefully now explained by the comment.

> Perhaps you should add a 
> gcc_checking_assert to i_elimination_compare_lt and/or remove this from 
> get_lt_bound:
> 
>> +  if (TREE_CODE (nit) != COND_EXPR)
>> +    return NULL_TREE;
>> +

It's duplicate test, so I turned it into a gcc_checking_assert.

> Or perhaps loop_only_exit_p could be optimized by computing it ahead of 
> time, possibly at the same time as loop_body_includes_call.  This way it 
> becomes very cheap and the code above can just call 
> iv_elimination_compare_lt without any pre-screening.

Gave it a try in a new patch.

reg-tested on x86_64. Better?

Thanks,
- Tom

[-- Attachment #2: iterator.6.4-ml.patch --]
[-- Type: text/x-patch, Size: 3036 bytes --]

diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
--- gcc/tree-ssa-loop-ivopts.c	(working copy)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -292,6 +292,10 @@
 
   /* Whether the loop body includes any function calls.  */
   bool body_includes_call;
+
+  /* Whether the loop body includes any function calls that possibly have side
+     effects.  */
+  bool body_includes_side_effect_call;
 };
 
 /* An assignment of iv candidates to uses.  */
@@ -456,6 +460,20 @@
   return exit;
 }
 
+/* Returns true if single_exit (DATA->current_loop) is the only possible exit.
+   Uses the same logic as loop_only_exit_p.  */
+
+static bool
+loop_single_exit_p (struct ivopts_data *data)
+{
+  edge exit = single_exit (data->current_loop);
+
+  if (!exit)
+    return false;
+
+  return !data->body_includes_side_effect_call;
+}
+
 /* Dumps information about the induction variable IV to FILE.  */
 
 extern void dump_iv (FILE *, struct iv *);
@@ -4403,7 +4421,7 @@
       if (double_int_ucmp (max_niter, period_value) > 0)
         {
           /* See if we can take advantage of infered loop bound information.  */
-          if (loop_only_exit_p (loop, exit))
+          if (loop_single_exit_p (data))
             {
               if (!estimated_loop_iterations (loop, true, &max_niter))
                 return false;
@@ -6343,23 +6361,34 @@
   htab_delete (data->inv_expr_tab);
 }
 
-/* Returns true if the loop body BODY includes any function calls.  */
+/* Find any functions calls in loop body BODY and stores a classification of
+   those in calls in DATA.  */
 
-static bool
-loop_body_includes_call (basic_block *body, unsigned num_nodes)
+static void
+find_calls_in_loop_body (struct ivopts_data *data, basic_block *body,
+                         unsigned num_nodes)
 {
   gimple_stmt_iterator gsi;
   unsigned i;
+  bool call = false;
+  bool se_call = false;
+  bool done = false;
 
-  for (i = 0; i < num_nodes; i++)
-    for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi); gsi_next (&gsi))
+  for (i = 0; i < num_nodes && !done; i++)
+    for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi) && !done;
+         gsi_next (&gsi))
       {
-	gimple stmt = gsi_stmt (gsi);
-	if (is_gimple_call (stmt)
-	    && !is_inexpensive_builtin (gimple_call_fndecl (stmt)))
-	  return true;
+        gimple stmt = gsi_stmt (gsi);
+        if (!is_gimple_call (stmt))
+          continue;
+
+        call = call || is_inexpensive_builtin (gimple_call_fndecl (stmt));
+        se_call = se_call || gimple_has_side_effects (stmt);
+        done = call && se_call;
       }
-  return false;
+
+  data->body_includes_call = call;
+  data->body_includes_side_effect_call = se_call;
 }
 
 /* Optimizes the LOOP.  Returns true if anything changed.  */
@@ -6393,7 +6422,7 @@
     }
 
   body = get_loop_body (loop);
-  data->body_includes_call = loop_body_includes_call (body, loop->num_nodes);
+  find_calls_in_loop_body (data, body, loop->num_nodes);
   renumber_gimple_stmt_uids_in_blocks (body, loop->num_nodes);
   free (body);
 

[-- Attachment #3: iterator.6.5-ml.patch --]
[-- Type: text/x-patch, Size: 6535 bytes --]

diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
--- gcc/tree-ssa-loop-ivopts.c	(working copy)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -832,17 +832,25 @@
 
   if (!slot)
     {
-      /* Try to determine number of iterations.  We must know it
-	 unconditionally (i.e., without possibility of # of iterations
-	 being zero).  Also, we cannot safely work with ssa names that
-	 appear in phi nodes on abnormal edges, so that we do not create
-	 overlapping life ranges for them (PR 27283).  */
+      /* Try to determine number of iterations.  We cannot safely work with ssa
+         names that appear in phi nodes on abnormal edges, so that we do not
+         create overlapping life ranges for them (PR 27283).  */
       desc = XNEW (struct tree_niter_desc);
       if (number_of_iterations_exit (data->current_loop,
 				     exit, desc, true)
-	  && integer_zerop (desc->may_be_zero)
      	  && !contains_abnormal_ssa_name_p (desc->niter))
-	niter = desc->niter;
+	{
+	  if (!integer_zerop (desc->may_be_zero))
+            /* Construct COND_EXPR that describes the number of iterations.
+               Either the COND_EXPR is not too expensive, and we can use it as
+               loop bound, or we can deduce a LT_EXPR bound from it.  */
+	    niter
+	      = build3 (COND_EXPR, TREE_TYPE (desc->niter), desc->may_be_zero,
+			build_int_cst_type (TREE_TYPE (desc->niter), 0),
+			desc->niter);
+	  else
+	    niter = desc->niter;
+	}
       else
 	niter = NULL_TREE;
 
@@ -4360,6 +4368,126 @@
   return (exit->flags & EDGE_TRUE_VALUE ? EQ_EXPR : NE_EXPR);
 }
 
+/* Tries to detect
+     NIT == (use_iv_max < USE->iv->base)
+            ? 0
+            : (use_iv_max - USE->iv->base)
+   where
+     use_iv_real_base == (USE->iv->base - USE->iv->step)
+     && CAND->iv->base == base_ptr + use_iv_real_base
+   and returns the exclusive upper bound for CAND->var_after:
+     base_ptr + use_iv_max.  */
+
+static tree
+get_lt_bound (struct iv_use *use, struct iv_cand *cand, tree nit)
+{
+  tree comp, base_ptr, n, n0, n1;
+  tree use_iv_real_base, cand_iv_base, use_iv_max;
+  gimple def_stmt;
+  int npos, mpos;
+  enum tree_code compare;
+  tree cand_type = TREE_TYPE (cand->var_before);
+
+  gcc_assert (TREE_CODE (nit) == COND_EXPR);
+
+  /* use_iv_real_base == use->iv->base - use->iv->step.  */
+  use_iv_real_base = fold_build_plus (MINUS_EXPR, use->iv->base, use->iv->step);
+
+  /* cand_iv_base.  */
+  cand_iv_base = cand->iv->base;
+  STRIP_NOPS (cand_iv_base);
+
+  /* cand->iv->base == base_ptr + use_iv_real_base.  */
+  if (TREE_CODE (cand_iv_base) != SSA_NAME)
+    return NULL_TREE;
+  def_stmt = SSA_NAME_DEF_STMT (cand_iv_base);
+  if (!is_gimple_assign (def_stmt)
+      || gimple_assign_rhs_code (def_stmt) != POINTER_PLUS_EXPR)
+    return NULL_TREE;
+  if (gimple_assign_rhs2 (def_stmt) != use_iv_real_base)
+    return NULL_TREE;
+  base_ptr = gimple_assign_rhs1 (def_stmt);
+
+  /* 0.  */
+  if (tree_int_cst_equal (TREE_OPERAND (nit, 1), integer_zero_node))
+    npos = 2;
+  else if (tree_int_cst_equal (TREE_OPERAND (nit, 2), integer_zero_node))
+    npos = 1;
+  else
+    return NULL_TREE;
+
+  /* n == use_iv_max - use->iv->base.  */
+  n = TREE_OPERAND (nit, npos);
+  if (TREE_CODE (n) != PLUS_EXPR)
+    return NULL_TREE;
+  n0 = TREE_OPERAND (n, 0);
+  n1 = TREE_OPERAND (n, 1);
+  if (tree_int_cst_equal (fold_build_plus (PLUS_EXPR, use->iv->base, n0),
+                          integer_zero_node))
+    use_iv_max = n1;
+  else if (tree_int_cst_equal (fold_build_plus (PLUS_EXPR, use->iv->base, n1),
+                               integer_zero_node))
+    use_iv_max = n0;
+  else
+    return NULL_TREE;
+
+  /* comp == use_iv_max < use->iv->base.  */
+  comp = TREE_OPERAND (nit, 0);
+  compare = TREE_CODE (comp);
+  if ((npos == 2 && compare == LT_EXPR)
+      || (npos == 1 && compare == GE_EXPR))
+    mpos = 0;
+  else if ((npos == 2 && compare == GT_EXPR)
+           || (npos == 1 && compare == LE_EXPR))
+    mpos = 1;
+  else
+    return NULL_TREE;
+  if (TREE_OPERAND (comp, mpos) != use_iv_max
+      || !tree_int_cst_equal (fold_build_plus (MINUS_EXPR, use->iv->base,
+                                               TREE_OPERAND (comp, 1 - mpos)),
+                              integer_zero_node))
+    return NULL_TREE;
+
+  /* Calculate bound.  */
+  return fold_build_plus (PLUS_EXPR, convert (cand_type, base_ptr), use_iv_max);
+}
+
+/* Tries to replace loop exit test USE, which allows NIT iterations, by one
+   formulated in terms of a LT_EXPR comparison with CAND.  Stores the resulting
+   comparison in COMP_P and bound in BOUND_P.  */
+
+static bool
+iv_elimination_compare_lt (struct ivopts_data *data, struct iv_use *use,
+                           struct iv_cand *cand, tree *bound_p, tree nit,
+                           enum tree_code *comp_p)
+{
+  tree bound;
+
+  if (!(cand->pos == IP_ORIGINAL
+        && POINTER_TYPE_P (TREE_TYPE (cand->var_before))
+        && POINTER_TYPE_OVERFLOW_UNDEFINED))
+    return false;
+
+  if (*comp_p != NE_EXPR)
+    return false;
+
+  if (!loop_single_exit_p (data))
+    return false;
+
+  bound = get_lt_bound (use, cand, nit);
+
+  if (bound == NULL_TREE)
+    return false;
+
+  if (expression_expensive_p (bound))
+    return false;
+
+  *comp_p = LT_EXPR;
+  *bound_p = bound;
+
+  return true;
+}
+
 /* Check whether it is possible to express the condition in USE by comparison
    of candidate CAND.  If so, store the value compared with to BOUND, and the
    comparison operator to COMP.  */
@@ -4438,6 +4566,21 @@
   *bound = aff_combination_to_tree (&bnd);
   *comp = iv_elimination_compare (data, use);
 
+  /* Try to implement nit using a '<' instead.  */
+  if (TREE_CODE (nit) == COND_EXPR)
+    {
+      if (iv_elimination_compare_lt (data, use, cand, bound, nit, comp))
+        return true;
+
+      /* We could try to see if the non-lt bound is not too expensive, but the
+         cost infrastructure needs tuning for that first.  Even though
+         expression_expensive_p always returns true for COND_EXPRs, it happens
+         that the bound is folded into a MAX_EXPR, which is approved by
+         expression_expensive_p, but attributed a too low cost by force_var_cost
+         in case the MAX_EXPR would expand into control flow.  */
+      return false;
+    }
+
   /* It is unlikely that computing the number of iterations using division
      would be more profitable than keeping the original induction variable.  */
   if (expression_expensive_p (*bound))

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-03 14:29               ` Tom de Vries
@ 2011-03-04  7:37                 ` Paolo Bonzini
  2011-03-04 13:58                   ` Tom de Vries
  2011-03-04 22:37                 ` Zdenek Dvorak
  1 sibling, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2011-03-04  7:37 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Zdenek Dvorak, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

On 03/03/2011 03:28 PM, Tom de Vries wrote:
> reg-tested on x86_64. Better?

Yes, very much so (talking about patch 6.5; the other one is an 
optimization but not essential based on the new comments).

Just one question: what happens if the COND_EXPR is indeed turned into a 
MAX_EXPR?  Is it a missed optimization?  Is it because of overflow that 
you aren't _always_ creating a MAX_EXPR directly?

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-04  7:37                 ` Paolo Bonzini
@ 2011-03-04 13:58                   ` Tom de Vries
  2011-03-04 14:07                     ` Paolo Bonzini
  2011-03-04 17:19                     ` Tom de Vries
  0 siblings, 2 replies; 31+ messages in thread
From: Tom de Vries @ 2011-03-04 13:58 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Zdenek Dvorak, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

On 03/04/2011 08:37 AM, Paolo Bonzini wrote:
> On 03/03/2011 03:28 PM, Tom de Vries wrote:
>> reg-tested on x86_64. Better?
> 
> Yes, very much so 

Great. Thanks for the review.

> (talking about patch 6.5; the other one is an 
> optimization but not essential based on the new comments).
>
> Just one question: what happens if the COND_EXPR is indeed turned into a 
> MAX_EXPR?  Is it a missed optimization?

It is. It will take some effort to get cost calculation for MAX_EXPR ok.

One additional problem (beside costs) that I observed also might need
fixing: a new bound (containing a MAX_EXPR) is generated for an inner
loop. The new bound is outer loop invariant. The MAX_EXPR however
expands into control flow, and is not hoisted out of the outer loop,
while the rest of the bound calculation is.

> Is it because of overflow that 
> you aren't _always_ creating a MAX_EXPR directly?

Indeed. The COND_EXPR created for my example is:
...
  (i + 1 <= n) ? (~i + n) : 0
...
where i and n are unsigned int.

simplified:
...
  (i + 1 <= n) ? (n - (i + 1)) : 0
...

The condition i + 1 <= n precisely states the cases when n - (i + 1)
does not underflow.

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-04 13:58                   ` Tom de Vries
@ 2011-03-04 14:07                     ` Paolo Bonzini
  2011-03-04 17:19                     ` Tom de Vries
  1 sibling, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2011-03-04 14:07 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Zdenek Dvorak, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

On 03/04/2011 02:57 PM, Tom de Vries wrote:
> The MAX_EXPR however
> expands into control flow, and is not hoisted out of the outer loop,
> while the rest of the bound calculation is.

That looks like a pass-ordering problem too (both on the tree level, for 
ivopts versus invariant motion, and on the RTL level where predicated 
instructions are created after invariant motion).

I just noticed that ARM doesn't have named {s,u}{min,max} patterns. 
Perhaps adding those helps hoisting the control-flow out of the loop in 
your case.

Paolo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-04 13:58                   ` Tom de Vries
  2011-03-04 14:07                     ` Paolo Bonzini
@ 2011-03-04 17:19                     ` Tom de Vries
  1 sibling, 0 replies; 31+ messages in thread
From: Tom de Vries @ 2011-03-04 17:19 UTC (permalink / raw)
  To: Zdenek Dvorak
  Cc: Paolo Bonzini, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

Hi Zdenek,

On 03/04/2011 02:57 PM, Tom de Vries wrote:
> On 03/04/2011 08:37 AM, Paolo Bonzini wrote:
>> On 03/03/2011 03:28 PM, Tom de Vries wrote:
>>> reg-tested on x86_64. Better?
>>
>> Yes, very much so 
> 
> Great. Thanks for the review.
> 
>> (talking about patch 6.5; the other one is an 
>> optimization but not essential based on the new comments).
>>

Is this patch set ok for stage 1 4.7, provided benchmarking on a primary
platform does not show any problems?

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-03 14:29               ` Tom de Vries
  2011-03-04  7:37                 ` Paolo Bonzini
@ 2011-03-04 22:37                 ` Zdenek Dvorak
  2011-03-13 17:35                   ` Tom de Vries
  1 sibling, 1 reply; 31+ messages in thread
From: Zdenek Dvorak @ 2011-03-04 22:37 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Paolo Bonzini, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

Hi,

>    /* Whether the loop body includes any function calls.  */
>    bool body_includes_call;
> +
> +  /* Whether the loop body includes any function calls that possibly have side
> +     effects.  */
> +  bool body_includes_side_effect_call;
>  };
>  
>  /* An assignment of iv candidates to uses.  */
> @@ -456,6 +460,20 @@
>    return exit;
>  }
>  
> +/* Returns true if single_exit (DATA->current_loop) is the only possible exit.
> +   Uses the same logic as loop_only_exit_p.  */

why are you duplicating the functionality, instead of simply caching the result
of loop_only_exit_p?

> +/* Tries to detect
> +     NIT == (use_iv_max < USE->iv->base)
> +            ? 0
> +            : (use_iv_max - USE->iv->base)
> +   where
> +     use_iv_real_base == (USE->iv->base - USE->iv->step)
> +     && CAND->iv->base == base_ptr + use_iv_real_base
> +   and returns the exclusive upper bound for CAND->var_after:
> +     base_ptr + use_iv_max.  */
> +
> +static tree
> +get_lt_bound (struct iv_use *use, struct iv_cand *cand, tree nit)
> +{
...
> +  /* use_iv_real_base == use->iv->base - use->iv->step.  */
> +  use_iv_real_base = fold_build_plus (MINUS_EXPR, use->iv->base, use->iv->step);
> +
> +  /* cand_iv_base.  */
> +
> +  /* cand->iv->base == base_ptr + use_iv_real_base.  */
...
> +  /* 0.  */
...

This function seriously needs better comments.  All that are currently present just
give relations between variables that can be as easily seen from the code (but
do not at all explain what the variables are supposed to mean), or make no sense
(what does the 0. comment mean?)

Otherwise the patch looks ok (but I would still like to see get_lt_bound with proper
comments, currently I don't really understand what happens there),

Zdenek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-04 22:37                 ` Zdenek Dvorak
@ 2011-03-13 17:35                   ` Tom de Vries
  2011-03-14  9:58                     ` Zdenek Dvorak
  0 siblings, 1 reply; 31+ messages in thread
From: Tom de Vries @ 2011-03-13 17:35 UTC (permalink / raw)
  To: Zdenek Dvorak
  Cc: Paolo Bonzini, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

[-- Attachment #1: Type: text/plain, Size: 3308 bytes --]

On 03/04/2011 11:37 PM, Zdenek Dvorak wrote:
> Hi,
> 
>>    /* Whether the loop body includes any function calls.  */
>>    bool body_includes_call;
>> +
>> +  /* Whether the loop body includes any function calls that possibly have side
>> +     effects.  */
>> +  bool body_includes_side_effect_call;
>>  };
>>  
>>  /* An assignment of iv candidates to uses.  */
>> @@ -456,6 +460,20 @@
>>    return exit;
>>  }
>>  
>> +/* Returns true if single_exit (DATA->current_loop) is the only possible exit.
>> +   Uses the same logic as loop_only_exit_p.  */
> 
> why are you duplicating the functionality, instead of simply caching the result
> of loop_only_exit_p?
> 

I was trying to avoid iterating over the loop body twice, once for
body_includes_call and once for body_includes_side_effect_call (or
loop_only_exit_p). But indeed, duplicating functionality is not ideal
either. I additionally tried a version which both does not duplicate
functionality, and is runtime efficient, but the implementation became
very convoluted, so I settled for your suggestion of caching the result
of loop_only_exit_p.

>> +/* Tries to detect
>> +     NIT == (use_iv_max < USE->iv->base)
>> +            ? 0
>> +            : (use_iv_max - USE->iv->base)
>> +   where
>> +     use_iv_real_base == (USE->iv->base - USE->iv->step)
>> +     && CAND->iv->base == base_ptr + use_iv_real_base
>> +   and returns the exclusive upper bound for CAND->var_after:
>> +     base_ptr + use_iv_max.  */
>> +
>> +static tree
>> +get_lt_bound (struct iv_use *use, struct iv_cand *cand, tree nit)
>> +{
> ...
>> +  /* use_iv_real_base == use->iv->base - use->iv->step.  */
>> +  use_iv_real_base = fold_build_plus (MINUS_EXPR, use->iv->base, use->iv->step);
>> +
>> +  /* cand_iv_base.  */
>> +
>> +  /* cand->iv->base == base_ptr + use_iv_real_base.  */
> ...
>> +  /* 0.  */
> ...
> 
> This function seriously needs better comments.  All that are currently present just
> give relations between variables that can be as easily seen from the code (but
> do not at all explain what the variables are supposed to mean), 

I see.

> or make no sense
> (what does the 0. comment mean?)

I was trying to repeat parts of the function header comment bit by bit,
but got too terse in the process.

> Otherwise the patch looks ok (but I would still like to see get_lt_bound with proper
> comments, currently I don't really understand what happens there),

changes compared to last submission:
iterator.6.3-ml.patch:
- split up fold_build_plus into fold_build_plus and robust_plus, in
  order to reuse robust_plus in fold_plus in iterator.6.6-ml.patch.
iterator.6.4-ml.patch:
- just cache result of loop_only_exit_p.
- make loop_only_exit_p robust against exit == NULL.
iterator.6.5-ml.patch:
- new patch. keep ssa_name field valid.
iterator.6.6-ml.patch:
- improved comments.
- factored out folding functionality into fold_plus and
  fold_walk_def_plus.
- detect use loop bound based on use->stmt rather than on the nit
  COND_EXPR.
- improved code to handle 'int' iterator.
- improved code to handle '<=' case.
- improved code to handle negative step.
- improved code to handle iv increments after loop exit.
iterator.6.6-ml.test.patch:
- duplicated test for int iterator.

reg-tested on x86_64.

I hope the comments are better now.

Thanks,
- Tom

[-- Attachment #2: iterator.6.3-ml.patch --]
[-- Type: text/x-patch, Size: 1951 bytes --]

diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
--- gcc/tree-ssa-loop-ivopts.c	(working copy)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -340,6 +340,44 @@
 
 static VEC(tree,heap) *decl_rtl_to_reset;
 
+/* Detects whether A is of POINTER_TYPE, and modifies CODE and B to make
+   A CODE B type-safe.  */
+
+static inline void
+robust_plus (enum tree_code *code, tree a, tree *b)
+{
+  tree a_type = TREE_TYPE (a);
+  tree b_type = TREE_TYPE (*b);
+
+  if (POINTER_TYPE_P (a_type))
+    {
+      switch (*code)
+        {
+        case MINUS_EXPR:
+          *b = fold_build1 (NEGATE_EXPR, b_type, *b);
+
+          /* Fall-through.  */
+        case PLUS_EXPR:
+          *code = POINTER_PLUS_EXPR;
+          break;
+        default:
+          gcc_unreachable ();
+        }
+    }
+  else
+    *b = fold_convert (a_type, *b);
+}
+
+/* Returns (TREE_TYPE (A))(A CODE B), where CODE is either PLUS_EXPR or
+   MINUS_EXPR.  Handles the case that A is a pointer robustly.  */
+
+static inline tree
+fold_build_plus (enum tree_code code, tree a, tree b)
+{
+  robust_plus (&code, a, &b);
+  return fold_build2 (code, TREE_TYPE (a), a, b);
+}
+
 /* Number of uses recorded in DATA.  */
 
 static inline unsigned
@@ -2255,18 +2293,7 @@
   if ((HAVE_PRE_INCREMENT && GET_MODE_SIZE (mem_mode) == cstepi)
       || (HAVE_PRE_DECREMENT && GET_MODE_SIZE (mem_mode) == -cstepi))
     {
-      enum tree_code code = MINUS_EXPR;
-      tree new_base;
-      tree new_step = step;
-
-      if (POINTER_TYPE_P (TREE_TYPE (base)))
-	{
-	  new_step = fold_build1 (NEGATE_EXPR, TREE_TYPE (step), step);
-	  code = POINTER_PLUS_EXPR;
-	}
-      else
-	new_step = fold_convert (TREE_TYPE (base), new_step);
-      new_base = fold_build2 (code, TREE_TYPE (base), base, new_step);
+      tree new_base = fold_build_plus (MINUS_EXPR, base, step);
       add_candidate_1 (data, new_base, step, important, IP_BEFORE_USE, use,
 		       use->stmt);
     }

[-- Attachment #3: iterator.6.4-ml.patch --]
[-- Type: text/x-patch, Size: 1475 bytes --]

diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
--- gcc/tree-ssa-loop-ivopts.c	(working copy)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -292,6 +292,9 @@
 
   /* Whether the loop body includes any function calls.  */
   bool body_includes_call;
+
+  /* Whether the loop body can only be exited via single exit.  */
+  bool loop_single_exit_p;
 };
 
 /* An assignment of iv candidates to uses.  */
@@ -4403,7 +4406,7 @@
       if (double_int_ucmp (max_niter, period_value) > 0)
         {
           /* See if we can take advantage of infered loop bound information.  */
-          if (loop_only_exit_p (loop, exit))
+          if (data->loop_single_exit_p)
             {
               if (!estimated_loop_iterations (loop, true, &max_niter))
                 return false;
@@ -6397,6 +6400,8 @@
   renumber_gimple_stmt_uids_in_blocks (body, loop->num_nodes);
   free (body);
 
+  data->loop_single_exit_p = loop_only_exit_p (loop, single_exit (loop));
+
   /* For each ssa name determines whether it behaves as an induction variable
      in some loop.  */
   if (!find_induction_variables (data))
only in patch2:
unchanged:
--- gcc/tree-ssa-loop-niter.c	(revision 170268)
+++ gcc/tree-ssa-loop-niter.c	(working copy)
@@ -1771,7 +1771,7 @@ loop_only_exit_p (const struct loop *loo
   unsigned i;
   gimple call;
 
-  if (exit != single_exit (loop))
+  if (exit == NULL || exit != single_exit (loop))
     return false;
 
   body = get_loop_body (loop);

[-- Attachment #4: iterator.6.5-ml.patch --]
[-- Type: text/x-patch, Size: 772 bytes --]

diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
--- gcc/tree-ssa-loop-ivopts.c	(working copy)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -1194,6 +1300,7 @@ record_use (struct ivopts_data *data, tr
 	    gimple stmt, enum use_type use_type)
 {
   struct iv_use *use = XCNEW (struct iv_use);
+  tree tmp;
 
   use->id = n_iv_uses (data);
   use->type = use_type;
@@ -1204,11 +1311,14 @@ record_use (struct ivopts_data *data, tr
 
   /* To avoid showing ssa name in the dumps, if it was not reset by the
      caller.  */
+  tmp = iv->ssa_name;
   iv->ssa_name = NULL_TREE;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     dump_use (dump_file, use);
 
+  iv->ssa_name = tmp;
+
   VEC_safe_push (iv_use_p, heap, data->iv_uses, use);
 
   return use;

[-- Attachment #5: iterator.6.6-ml.patch --]
[-- Type: text/x-patch, Size: 9340 bytes --]

diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
--- gcc/tree-ssa-loop-ivopts.c	(working copy)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -419,6 +419,67 @@
   return fold_build2 (code, TREE_TYPE (a), a, b);
 }
 
+/* Folds (TREE_TYPE (A))(A CODE B), where CODE is either PLUS_EXPR or
+   MINUS_EXPR.  Returns the folded expression if folding is successful.
+   Otherwise, return NULL_TREE.  Handles the case that A is a pointer
+   robustly.  */
+
+static inline tree
+fold_plus (enum tree_code code, tree a, tree b)
+{
+  tree a_type = TREE_TYPE (a);
+  tree res;
+
+  STRIP_NOPS (a);
+  robust_plus (&code, a, &b);
+
+  res = fold_binary (code, TREE_TYPE (a), a, b);
+  if (res == NULL_TREE)
+    return NULL_TREE;
+
+  return fold_convert (a_type, res);
+}
+
+/* Folds (TREE_TYPE (A))(A CODE B), where CODE is either PLUS_EXPR or
+   MINUS_EXPR, possibly using the defining stmt of A.  Returns the folded
+   expression if folding is successful.  Otherwise, return NULL_TREE.  */
+
+static inline tree
+fold_walk_def_plus (enum tree_code code, tree a, tree b)
+{
+  tree a_type = TREE_TYPE (a);
+  tree res, a0, a1;
+  gimple def_stmt;
+
+  res = fold_plus (code, a, b);
+  if (res != NULL_TREE)
+    return res;
+
+  STRIP_NOPS (a);
+
+  if (TREE_CODE (a) != SSA_NAME)
+    return NULL_TREE;
+
+  def_stmt = SSA_NAME_DEF_STMT (a);
+  if (!is_gimple_assign (def_stmt)
+      || (gimple_assign_rhs_code (def_stmt) != PLUS_EXPR
+	  && gimple_assign_rhs_code (def_stmt) != POINTER_PLUS_EXPR))
+    return NULL_TREE;
+  a0 = gimple_assign_rhs1 (def_stmt);
+  a1 = gimple_assign_rhs2 (def_stmt);
+
+  def_stmt = SSA_NAME_DEF_STMT (a1);
+  if (is_gimple_assign (def_stmt)
+      && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt)))
+    a1 = fold_convert (TREE_TYPE (a1), gimple_assign_rhs1 (def_stmt));
+
+  res = fold_plus (code, fold_build_plus (PLUS_EXPR, a0, a1), b);
+  if (res == NULL_TREE)
+    return NULL_TREE;
+
+  return fold_convert (a_type, res);
+}
+
 /* Number of uses recorded in DATA.  */
 
 static inline unsigned
@@ -825,17 +886,25 @@
 
   if (!slot)
     {
-      /* Try to determine number of iterations.  We must know it
-	 unconditionally (i.e., without possibility of # of iterations
-	 being zero).  Also, we cannot safely work with ssa names that
-	 appear in phi nodes on abnormal edges, so that we do not create
-	 overlapping life ranges for them (PR 27283).  */
+      /* Try to determine number of iterations.  We cannot safely work with ssa
+         names that appear in phi nodes on abnormal edges, so that we do not
+         create overlapping life ranges for them (PR 27283).  */
       desc = XNEW (struct tree_niter_desc);
       if (number_of_iterations_exit (data->current_loop,
 				     exit, desc, true)
-	  && integer_zerop (desc->may_be_zero)
      	  && !contains_abnormal_ssa_name_p (desc->niter))
-	niter = desc->niter;
+	{
+	  if (!integer_zerop (desc->may_be_zero))
+            /* Construct COND_EXPR that describes the number of iterations.
+               Either the COND_EXPR is not too expensive, and we can use it as
+               loop bound, or we can deduce a LT_EXPR bound from it.  */
+	    niter
+	      = build3 (COND_EXPR, TREE_TYPE (desc->niter), desc->may_be_zero,
+			build_int_cst_type (TREE_TYPE (desc->niter), 0),
+			desc->niter);
+	  else
+	    niter = desc->niter;
+	}
       else
 	niter = NULL_TREE;
 
@@ -4357,6 +4426,153 @@
   return (exit->flags & EDGE_TRUE_VALUE ? EQ_EXPR : NE_EXPR);
 }
 
+/* Get the loop bound and comparison operator of USE->iv, and store them in
+   BOUND_P and COMP_P.  Returns false if unsuccessful.  */
+
+static bool
+get_use_lt_bound (struct iv_use *use, tree *bound_p, enum tree_code *comp_p)
+{
+  gimple stmt = use->stmt;
+
+  if (gimple_code (stmt) != GIMPLE_COND
+      || gimple_cond_lhs (stmt) != use->iv->ssa_name)
+    return false;
+
+  *comp_p = gimple_cond_code (stmt);
+  *bound_p = gimple_cond_rhs (stmt);
+
+  return true;
+}
+
+/* Tries to replace loop exit test USE, by one formulated in terms of a LT_EXPR
+   comparison with CAND.  Stores the resulting comparison in COMP_P and bound in
+   BOUND_P.  */
+
+static bool
+iv_elimination_compare_lt (struct ivopts_data *data, struct iv_use *use,
+                           struct iv_cand *cand, tree *bound_p,
+			   enum tree_code *comp_p)
+{
+  enum tree_code use_comp, canon_comp;
+  tree base_ptr, use_iv_real_base, use_lt_bound, bound;
+  bool use_uses_inced_iv, use_after_cand_inc;
+  tree use_type = TREE_TYPE (use->iv->ssa_name);
+  tree cand_type, cand_iv_base = cand->iv->base;
+  STRIP_NOPS (cand_iv_base);
+  cand_type = TREE_TYPE (cand_iv_base);
+
+  /* We're trying to replace 'i < n' with 'p < base + n' in
+
+     void
+     f1 (char *base, unsigned long int s, unsigned long int n)
+     {
+       unsigned long int i = s;
+       char *p = base + s;
+       do
+         {
+	   *p = '\0';
+	   p++;
+	   i++;
+	 }
+       while (i < n);
+     }
+
+     Overflow of base + n can't happen because either:
+     - s < n, and i will step to n, and p will step to base + n, or
+     - s >= n, so base + n < base + s, and assuming pointer arithmetic
+       doesn't overflow, base + s doesn't overflow, so base + n won't.
+
+     This transformation is not valid if i and n are signed, because
+     base + n might underflow.
+  */
+
+  /* Use should be an unsigned integral.  */
+  if (!INTEGRAL_TYPE_P (use_type) || !TYPE_UNSIGNED (use_type))
+    return false;
+
+  /* Cand should be a pointer, and pointer overflow should be undefined.  */
+  if (!POINTER_TYPE_P (cand_type) || !POINTER_TYPE_OVERFLOW_UNDEFINED)
+    return false;
+
+  /* Make sure that the loop iterates till the loop bound is hit.  */
+  if (!data->loop_single_exit_p)
+    return false;
+
+  /* We only handle this case for the moment.  */
+  if (!tree_int_cst_equal (use->iv->step, cand->iv->step))
+    return false;
+
+  /* For now, we only handle the case that cand is a source level cand.  It
+     is possible to also allow other cands, provided we can prove there is
+     pointer arithmetic in the loop body reaching base + n.  */
+  if (cand->pos != IP_ORIGINAL)
+    return false;
+
+  /* Determine if the exit test is formulated in terms of the phi or the
+     increment of the use iv.  */
+  use_uses_inced_iv
+    = gimple_code (SSA_NAME_DEF_STMT (use->iv->ssa_name)) != GIMPLE_PHI;
+
+  /* Determine if the exit test is before or after the increment of the
+     cand.  */
+  use_after_cand_inc
+    = stmt_after_increment (data->current_loop, cand, use->stmt);
+
+  /* For now, we only handle these cases.  */
+  if (use_after_cand_inc != use_uses_inced_iv)
+    return false;
+
+  /* Get the base of the non-incremented loop var.  */
+  if (use_uses_inced_iv)
+    {
+      use_iv_real_base = fold_plus (MINUS_EXPR, use->iv->base, use->iv->step);
+      if (use_iv_real_base == NULL_TREE)
+	return false;
+    }
+  else
+    use_iv_real_base = use->iv->base;
+
+  /* Detect p = base + s.  */
+  base_ptr = fold_walk_def_plus (MINUS_EXPR, cand->iv->base,
+				 fold_convert (sizetype, use_iv_real_base));
+  if (base_ptr == NULL_TREE)
+    return false;
+  STRIP_NOPS (base_ptr);
+  if (TREE_CODE (base_ptr) != SSA_NAME)
+    return false;
+
+  /* Get the bound of the iv of the use.  */
+  if (!get_use_lt_bound (use, &use_lt_bound, &use_comp))
+    return false;
+
+  /* Determine canon_comp.  */
+  if (*comp_p == NE_EXPR)
+    canon_comp = use_comp;
+  else if (*comp_p == EQ_EXPR)
+    canon_comp = invert_tree_comparison (use_comp, false);
+  else
+    gcc_unreachable ();
+
+  /* Allow positive and negative step, and inclusive and exclusive bound.
+     To trigger inclusive bound, we need -funsafe-loop-optimizations.  */
+  if (canon_comp != LT_EXPR && canon_comp != GT_EXPR
+      && canon_comp != LE_EXPR && canon_comp != GE_EXPR)
+    return false;
+
+  /* Calculate bound.  */
+  bound = fold_build_plus (PLUS_EXPR, base_ptr,
+			   fold_convert (sizetype, use_lt_bound));
+  if (bound == NULL_TREE)
+    return false;
+
+  if (expression_expensive_p (bound))
+    return false;
+
+  *comp_p = use_comp;
+  *bound_p = bound;
+  return true;
+}
+
 /* Check whether it is possible to express the condition in USE by comparison
    of candidate CAND.  If so, store the value compared with to BOUND, and the
    comparison operator to COMP.  */
@@ -4435,6 +4651,21 @@
   *bound = aff_combination_to_tree (&bnd);
   *comp = iv_elimination_compare (data, use);
 
+  /* Try to implement nit using a '<' instead.  */
+  if (TREE_CODE (nit) == COND_EXPR)
+    {
+      if (iv_elimination_compare_lt (data, use, cand, bound, comp))
+        return true;
+
+      /* We could try to see if the non-lt bound is not too expensive, but the
+         cost infrastructure needs tuning for that first.  Even though
+         expression_expensive_p always returns true for COND_EXPRs, it happens
+         that the bound is folded into a MAX_EXPR, which is approved by
+         expression_expensive_p, but attributed a too low cost by force_var_cost
+         in case the MAX_EXPR would expand into control flow.  */
+      return false;
+    }
+
   /* It is unlikely that computing the number of iterations using division
      would be more profitable than keeping the original induction variable.  */
   if (expression_expensive_p (*bound))

[-- Attachment #6: iterator.6.6-ml.test.patch --]
[-- Type: text/x-patch, Size: 934 bytes --]

Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c	(revision 0)
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts -fno-tree-vectorize -fno-tree-loop-ivcanon" } */
+
+void
+f1 (char *p, unsigned long int i, unsigned long int n)
+{
+  p += i;
+  do
+    {
+      *p = '\0';
+      p += 1;
+      i++;
+    }
+  while (i < n);
+}
+
+void
+f2 (char *p, unsigned int i, unsigned int n)
+{
+  p += i;
+  do
+    {
+      *p = '\0';
+      p += 1;
+      i++;
+    }
+  while (i < n);
+}
+
+/* { dg-final { scan-tree-dump-times "PHI" 2 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <p_" 2 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "p_\[0-9\]* <" 2 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-13 17:35                   ` Tom de Vries
@ 2011-03-14  9:58                     ` Zdenek Dvorak
  2011-03-14 12:04                       ` Tom de Vries
  0 siblings, 1 reply; 31+ messages in thread
From: Zdenek Dvorak @ 2011-03-14  9:58 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Paolo Bonzini, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

Hi,

> diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
> --- gcc/tree-ssa-loop-ivopts.c	(working copy)
> +++ gcc/tree-ssa-loop-ivopts.c	(working copy)
> @@ -1194,6 +1300,7 @@ record_use (struct ivopts_data *data, tr
>  	    gimple stmt, enum use_type use_type)
>  {
>    struct iv_use *use = XCNEW (struct iv_use);
> +  tree tmp;
>  
>    use->id = n_iv_uses (data);
>    use->type = use_type;
> @@ -1204,11 +1311,14 @@ record_use (struct ivopts_data *data, tr
>  
>    /* To avoid showing ssa name in the dumps, if it was not reset by the
>       caller.  */
> +  tmp = iv->ssa_name;
>    iv->ssa_name = NULL_TREE;
>  
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      dump_use (dump_file, use);
>  
> +  iv->ssa_name = tmp;
> +
>    VEC_safe_push (iv_use_p, heap, data->iv_uses, use);
>  
>    return use;

this is not ok.  You can get the ssa name from extract_cond_operands.

> +  /* Determine if the exit test is formulated in terms of the phi or the
> +     increment of the use iv.  */
> +  use_uses_inced_iv
> +    = gimple_code (SSA_NAME_DEF_STMT (use->iv->ssa_name)) != GIMPLE_PHI;
> +
> +  /* Determine if the exit test is before or after the increment of the
> +     cand.  */
> +  use_after_cand_inc
> +    = stmt_after_increment (data->current_loop, cand, use->stmt);
> +
> +  /* For now, we only handle these cases.  */
> +  if (use_after_cand_inc != use_uses_inced_iv)
> +    return false;

what is this trying to achieve?  USE_USES_INCED_IV is pretty much meaningless --
the value of USE does not depend in any way on whether it comes directly from
a PHI node or from some other calculation.

> +  /* Calculate bound.  */
> +  bound = fold_build_plus (PLUS_EXPR, base_ptr,
> +			   fold_convert (sizetype, use_lt_bound));
> +  if (bound == NULL_TREE)
> +    return false;

How could the result be NULL_TREE?

Zdenek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-14  9:58                     ` Zdenek Dvorak
@ 2011-03-14 12:04                       ` Tom de Vries
  2011-03-14 12:14                         ` Zdenek Dvorak
  0 siblings, 1 reply; 31+ messages in thread
From: Tom de Vries @ 2011-03-14 12:04 UTC (permalink / raw)
  To: Zdenek Dvorak
  Cc: Paolo Bonzini, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

[-- Attachment #1: Type: text/plain, Size: 2774 bytes --]

On 03/14/2011 10:58 AM, Zdenek Dvorak wrote:
> Hi,
> 
>> diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
>> --- gcc/tree-ssa-loop-ivopts.c	(working copy)
>> +++ gcc/tree-ssa-loop-ivopts.c	(working copy)
>> @@ -1194,6 +1300,7 @@ record_use (struct ivopts_data *data, tr
>>  	    gimple stmt, enum use_type use_type)
>>  {
>>    struct iv_use *use = XCNEW (struct iv_use);
>> +  tree tmp;
>>  
>>    use->id = n_iv_uses (data);
>>    use->type = use_type;
>> @@ -1204,11 +1311,14 @@ record_use (struct ivopts_data *data, tr
>>  
>>    /* To avoid showing ssa name in the dumps, if it was not reset by the
>>       caller.  */
>> +  tmp = iv->ssa_name;
>>    iv->ssa_name = NULL_TREE;
>>  
>>    if (dump_file && (dump_flags & TDF_DETAILS))
>>      dump_use (dump_file, use);
>>  
>> +  iv->ssa_name = tmp;
>> +
>>    VEC_safe_push (iv_use_p, heap, data->iv_uses, use);
>>  
>>    return use;
> 
> this is not ok.  You can get the ssa name from extract_cond_operands.
> 

Ok, I did that now.

>> +  /* Determine if the exit test is formulated in terms of the phi or the
>> +     increment of the use iv.  */
>> +  use_uses_inced_iv
>> +    = gimple_code (SSA_NAME_DEF_STMT (use->iv->ssa_name)) != GIMPLE_PHI;
>> +
>> +  /* Determine if the exit test is before or after the increment of the
>> +     cand.  */
>> +  use_after_cand_inc
>> +    = stmt_after_increment (data->current_loop, cand, use->stmt);
>> +
>> +  /* For now, we only handle these cases.  */
>> +  if (use_after_cand_inc != use_uses_inced_iv)
>> +    return false;
> 
> what is this trying to achieve?  USE_USES_INCED_IV is pretty much meaningless --
> the value of USE does not depend in any way on whether it comes
> directly from
> a PHI node or from some other calculation.

it is trying to allow for

do
  {
    *p = '\0';
    i++; /* use_uses_inced_iv == true */
    p++; /* use_after_cand_inc == true */
    if (!(i < n))
      break;
  }
while (1);

and for

do
  {
    *p = '\0';
    if (!(i < n))
      break;
    i++; /* use_uses_inced_iv == false */
    p++; /* use_after_cand_inc == false */
  }
while (1);

but not for

do
  {
    *p = '\0';
    i++; /* use_uses_inced_iv == true */
    if (!(i < n))
      break;
    p++; /* use_after_cand_inc == false */
  }
while (1);

and not for

do
  {
    *p = '\0';
    p++; /* use_after_cand_inc == true */
    if (!(i < n))
      break;
    i++; /* use_uses_inced_iv == false */
  }
while (1);

In the latter 2 cases, I cannot simply replace i < n with p < base + n.

>> +  /* Calculate bound.  */
>> +  bound = fold_build_plus (PLUS_EXPR, base_ptr,
>> +			   fold_convert (sizetype, use_lt_bound));
>> +  if (bound == NULL_TREE)
>> +    return false;
> 
> How could the result be NULL_TREE?

It can't, indeed. I removed that now.

- Tom

[-- Attachment #2: iterator.6.6-ml.patch --]
[-- Type: text/x-patch, Size: 9463 bytes --]

diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
--- gcc/tree-ssa-loop-ivopts.c	(working copy)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -419,6 +419,67 @@
   return fold_build2 (code, TREE_TYPE (a), a, b);
 }
 
+/* Folds (TREE_TYPE (A))(A CODE B), where CODE is either PLUS_EXPR or
+   MINUS_EXPR.  Returns the folded expression if folding is successful.
+   Otherwise, return NULL_TREE.  Handles the case that A is a pointer
+   robustly.  */
+
+static inline tree
+fold_plus (enum tree_code code, tree a, tree b)
+{
+  tree a_type = TREE_TYPE (a);
+  tree res;
+
+  STRIP_NOPS (a);
+  robust_plus (&code, a, &b);
+
+  res = fold_binary (code, TREE_TYPE (a), a, b);
+  if (res == NULL_TREE)
+    return NULL_TREE;
+
+  return fold_convert (a_type, res);
+}
+
+/* Folds (TREE_TYPE (A))(A CODE B), where CODE is either PLUS_EXPR or
+   MINUS_EXPR, possibly using the defining stmt of A.  Returns the folded
+   expression if folding is successful.  Otherwise, return NULL_TREE.  */
+
+static inline tree
+fold_walk_def_plus (enum tree_code code, tree a, tree b)
+{
+  tree a_type = TREE_TYPE (a);
+  tree res, a0, a1;
+  gimple def_stmt;
+
+  res = fold_plus (code, a, b);
+  if (res != NULL_TREE)
+    return res;
+
+  STRIP_NOPS (a);
+
+  if (TREE_CODE (a) != SSA_NAME)
+    return NULL_TREE;
+
+  def_stmt = SSA_NAME_DEF_STMT (a);
+  if (!is_gimple_assign (def_stmt)
+      || (gimple_assign_rhs_code (def_stmt) != PLUS_EXPR
+	  && gimple_assign_rhs_code (def_stmt) != POINTER_PLUS_EXPR))
+    return NULL_TREE;
+  a0 = gimple_assign_rhs1 (def_stmt);
+  a1 = gimple_assign_rhs2 (def_stmt);
+
+  def_stmt = SSA_NAME_DEF_STMT (a1);
+  if (is_gimple_assign (def_stmt)
+      && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def_stmt)))
+    a1 = fold_convert (TREE_TYPE (a1), gimple_assign_rhs1 (def_stmt));
+
+  res = fold_plus (code, fold_build_plus (PLUS_EXPR, a0, a1), b);
+  if (res == NULL_TREE)
+    return NULL_TREE;
+
+  return fold_convert (a_type, res);
+}
+
 /* Number of uses recorded in DATA.  */
 
 static inline unsigned
@@ -825,17 +886,25 @@
 
   if (!slot)
     {
-      /* Try to determine number of iterations.  We must know it
-	 unconditionally (i.e., without possibility of # of iterations
-	 being zero).  Also, we cannot safely work with ssa names that
-	 appear in phi nodes on abnormal edges, so that we do not create
-	 overlapping life ranges for them (PR 27283).  */
+      /* Try to determine number of iterations.  We cannot safely work with ssa
+         names that appear in phi nodes on abnormal edges, so that we do not
+         create overlapping life ranges for them (PR 27283).  */
       desc = XNEW (struct tree_niter_desc);
       if (number_of_iterations_exit (data->current_loop,
 				     exit, desc, true)
-	  && integer_zerop (desc->may_be_zero)
      	  && !contains_abnormal_ssa_name_p (desc->niter))
-	niter = desc->niter;
+	{
+	  if (!integer_zerop (desc->may_be_zero))
+            /* Construct COND_EXPR that describes the number of iterations.
+               Either the COND_EXPR is not too expensive, and we can use it as
+               loop bound, or we can deduce a LT_EXPR bound from it.  */
+	    niter
+	      = build3 (COND_EXPR, TREE_TYPE (desc->niter), desc->may_be_zero,
+			build_int_cst_type (TREE_TYPE (desc->niter), 0),
+			desc->niter);
+	  else
+	    niter = desc->niter;
+	}
       else
 	niter = NULL_TREE;
 
@@ -4353,6 +4422,155 @@
   return (exit->flags & EDGE_TRUE_VALUE ? EQ_EXPR : NE_EXPR);
 }
 
+/* Get the loop bound and comparison operator of USE->iv, and store them in
+   BOUND_P and COMP_P.  Returns false if unsuccessful.  */
+
+static bool
+get_use_lt_bound (struct iv_use *use, tree use_iv, tree *bound_p,
+		  enum tree_code *comp_p)
+{
+  gimple stmt = use->stmt;
+
+  if (gimple_code (stmt) != GIMPLE_COND
+      || gimple_cond_lhs (stmt) != use_iv)
+    return false;
+
+  *comp_p = gimple_cond_code (stmt);
+  *bound_p = gimple_cond_rhs (stmt);
+
+  return true;
+}
+
+/* Tries to replace loop exit test USE, by one formulated in terms of a LT_EXPR
+   comparison with CAND.  Stores the resulting comparison in COMP_P and bound in
+   BOUND_P.  */
+
+static bool
+iv_elimination_compare_lt (struct ivopts_data *data, struct iv_use *use,
+                           struct iv_cand *cand, tree *bound_p,
+			   enum tree_code *comp_p)
+{
+  enum tree_code use_comp, canon_comp;
+  tree base_ptr, use_iv_real_base, use_lt_bound, bound, *use_iv;
+  bool ok, use_uses_inced_iv, use_after_cand_inc;
+  tree use_type, cand_type, cand_iv_base = cand->iv->base;
+
+  /* Initialize cand_type, use_iv and use_type.  */
+  STRIP_NOPS (cand_iv_base);
+  cand_type = TREE_TYPE (cand_iv_base);
+  ok = extract_cond_operands (data, use->stmt, &use_iv, NULL, NULL, NULL);
+  gcc_assert (ok);
+  use_type = TREE_TYPE (*use_iv);
+      
+  /* We're trying to replace 'i < n' with 'p < base + n' in
+
+     void
+     f1 (char *base, unsigned long int s, unsigned long int n)
+     {
+       unsigned long int i = s;
+       char *p = base + s;
+       do
+         {
+	   *p = '\0';
+	   p++;
+	   i++;
+	 }
+       while (i < n);
+     }
+
+     Overflow of base + n can't happen because either:
+     - s < n, and i will step to n, and p will step to base + n, or
+     - s >= n, so base + n < base + s, and assuming pointer arithmetic
+       doesn't overflow, base + s doesn't overflow, so base + n won't.
+
+     This transformation is not valid if i and n are signed, because
+     base + n might underflow.
+  */
+
+  /* Use should be an unsigned integral.  */
+  if (!INTEGRAL_TYPE_P (use_type) || !TYPE_UNSIGNED (use_type))
+    return false;
+
+  /* Cand should be a pointer, and pointer overflow should be undefined.  */
+  if (!POINTER_TYPE_P (cand_type) || !POINTER_TYPE_OVERFLOW_UNDEFINED)
+    return false;
+
+  /* Make sure that the loop iterates till the loop bound is hit.  */
+  if (!data->loop_single_exit_p)
+    return false;
+
+  /* We only handle this case for the moment.  */
+  if (!tree_int_cst_equal (use->iv->step, cand->iv->step))
+    return false;
+
+  /* For now, we only handle the case that cand is a source level cand.  It
+     is possible to also allow other cands, provided we can prove there is
+     pointer arithmetic in the loop body reaching base + n.  */
+  if (cand->pos != IP_ORIGINAL)
+    return false;
+
+  /* Determine if the exit test is formulated in terms of the phi or the
+     increment of the use iv.  */
+  use_uses_inced_iv
+    = gimple_code (SSA_NAME_DEF_STMT (*use_iv)) != GIMPLE_PHI;
+
+  /* Determine if the exit test is before or after the increment of the
+     cand.  */
+  use_after_cand_inc
+    = stmt_after_increment (data->current_loop, cand, use->stmt);
+
+  /* For now, we only handle these cases.  */
+  if (use_after_cand_inc != use_uses_inced_iv)
+    return false;
+
+  /* Get the base of the non-incremented loop var.  */
+  if (use_uses_inced_iv)
+    {
+      use_iv_real_base = fold_plus (MINUS_EXPR, use->iv->base, use->iv->step);
+      if (use_iv_real_base == NULL_TREE)
+	return false;
+    }
+  else
+    use_iv_real_base = use->iv->base;
+
+  /* Detect p = base + s.  */
+  base_ptr = fold_walk_def_plus (MINUS_EXPR, cand->iv->base,
+				 fold_convert (sizetype, use_iv_real_base));
+  if (base_ptr == NULL_TREE)
+    return false;
+  STRIP_NOPS (base_ptr);
+  if (TREE_CODE (base_ptr) != SSA_NAME)
+    return false;
+
+  /* Get the bound of the iv of the use.  */
+  if (!get_use_lt_bound (use, *use_iv, &use_lt_bound, &use_comp))
+    return false;
+
+  /* Determine canon_comp.  */
+  if (*comp_p == NE_EXPR)
+    canon_comp = use_comp;
+  else if (*comp_p == EQ_EXPR)
+    canon_comp = invert_tree_comparison (use_comp, false);
+  else
+    gcc_unreachable ();
+
+  /* Allow positive and negative step, and inclusive and exclusive bound.
+     To trigger inclusive bound, we need -funsafe-loop-optimizations.  */
+  if (canon_comp != LT_EXPR && canon_comp != GT_EXPR
+      && canon_comp != LE_EXPR && canon_comp != GE_EXPR)
+    return false;
+
+  /* Calculate bound.  */
+  bound = fold_build_plus (PLUS_EXPR, base_ptr,
+			   fold_convert (sizetype, use_lt_bound));
+  if (expression_expensive_p (bound))
+    return false;
+
+  *comp_p = use_comp;
+  *bound_p = bound;
+  return true;
+}
+
 /* Check whether it is possible to express the condition in USE by comparison
    of candidate CAND.  If so, store the value compared with to BOUND, and the
    comparison operator to COMP.  */
@@ -4431,6 +4649,21 @@
   *bound = aff_combination_to_tree (&bnd);
   *comp = iv_elimination_compare (data, use);
 
+  /* Try to implement nit using a '<' instead.  */
+  if (TREE_CODE (nit) == COND_EXPR)
+    {
+      if (iv_elimination_compare_lt (data, use, cand, bound, comp))
+        return true;
+
+      /* We could try to see if the non-lt bound is not too expensive, but the
+         cost infrastructure needs tuning for that first.  Even though
+         expression_expensive_p always returns true for COND_EXPRs, it happens
+         that the bound is folded into a MAX_EXPR, which is approved by
+         expression_expensive_p, but attributed a too low cost by force_var_cost
+         in case the MAX_EXPR would expand into control flow.  */
+      return false;
+    }
+
   /* It is unlikely that computing the number of iterations using division
      would be more profitable than keeping the original induction variable.  */
   if (expression_expensive_p (*bound))

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-14 12:04                       ` Tom de Vries
@ 2011-03-14 12:14                         ` Zdenek Dvorak
  2011-03-14 12:49                           ` Tom de Vries
  0 siblings, 1 reply; 31+ messages in thread
From: Zdenek Dvorak @ 2011-03-14 12:14 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Paolo Bonzini, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

Hi,

> >> +  /* Determine if the exit test is formulated in terms of the phi or the
> >> +     increment of the use iv.  */
> >> +  use_uses_inced_iv
> >> +    = gimple_code (SSA_NAME_DEF_STMT (use->iv->ssa_name)) != GIMPLE_PHI;
> >> +
> >> +  /* Determine if the exit test is before or after the increment of the
> >> +     cand.  */
> >> +  use_after_cand_inc
> >> +    = stmt_after_increment (data->current_loop, cand, use->stmt);
> >> +
> >> +  /* For now, we only handle these cases.  */
> >> +  if (use_after_cand_inc != use_uses_inced_iv)
> >> +    return false;
> > 
> > what is this trying to achieve?  USE_USES_INCED_IV is pretty much meaningless --
> > the value of USE does not depend in any way on whether it comes
> > directly from
> > a PHI node or from some other calculation.
> 
> it is trying to allow for
> 
> do
>   {
>     *p = '\0';
>     i++; /* use_uses_inced_iv == true */
>     p++; /* use_after_cand_inc == true */
>     if (!(i < n))
>       break;
>   }
> while (1);
> 
> and for
> 
> do
>   {
>     *p = '\0';
>     if (!(i < n))
>       break;
>     i++; /* use_uses_inced_iv == false */
>     p++; /* use_after_cand_inc == false */
>   }
> while (1);
> 
> but not for
> 
> do
>   {
>     *p = '\0';
>     i++; /* use_uses_inced_iv == true */
>     if (!(i < n))
>       break;
>     p++; /* use_after_cand_inc == false */
>   }
> while (1);
> 
> and not for
> 
> do
>   {
>     *p = '\0';
>     p++; /* use_after_cand_inc == true */
>     if (!(i < n))
>       break;
>     i++; /* use_uses_inced_iv == false */
>   }
> while (1);
> 
> In the latter 2 cases, I cannot simply replace i < n with p < base + n.

Why not (other than that the value to compare with varies in these cases, but
it always is "the value of p in the last iteration of the loop")?

One more thing: what is fold_walk_def_plus for?

Zdenek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-14 12:14                         ` Zdenek Dvorak
@ 2011-03-14 12:49                           ` Tom de Vries
  2011-03-14 12:55                             ` Zdenek Dvorak
  0 siblings, 1 reply; 31+ messages in thread
From: Tom de Vries @ 2011-03-14 12:49 UTC (permalink / raw)
  To: Zdenek Dvorak
  Cc: Paolo Bonzini, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

On 03/14/2011 01:14 PM, Zdenek Dvorak wrote:
> Hi,
> 
>>>> +  /* Determine if the exit test is formulated in terms of the phi or the
>>>> +     increment of the use iv.  */
>>>> +  use_uses_inced_iv
>>>> +    = gimple_code (SSA_NAME_DEF_STMT (use->iv->ssa_name)) != GIMPLE_PHI;
>>>> +
>>>> +  /* Determine if the exit test is before or after the increment of the
>>>> +     cand.  */
>>>> +  use_after_cand_inc
>>>> +    = stmt_after_increment (data->current_loop, cand, use->stmt);
>>>> +
>>>> +  /* For now, we only handle these cases.  */
>>>> +  if (use_after_cand_inc != use_uses_inced_iv)
>>>> +    return false;
>>>
>>> what is this trying to achieve?  USE_USES_INCED_IV is pretty much meaningless --
>>> the value of USE does not depend in any way on whether it comes
>>> directly from
>>> a PHI node or from some other calculation.
>>
>> it is trying to allow for
>>
>> do
>>   {
>>     *p = '\0';
>>     i++; /* use_uses_inced_iv == true */
>>     p++; /* use_after_cand_inc == true */
>>     if (!(i < n))
>>       break;
>>   }
>> while (1);
>>
>> and for
>>
>> do
>>   {
>>     *p = '\0';
>>     if (!(i < n))
>>       break;
>>     i++; /* use_uses_inced_iv == false */
>>     p++; /* use_after_cand_inc == false */
>>   }
>> while (1);
>>
>> but not for
>>
>> do
>>   {
>>     *p = '\0';
>>     i++; /* use_uses_inced_iv == true */
>>     if (!(i < n))
>>       break;
>>     p++; /* use_after_cand_inc == false */
>>   }
>> while (1);
>>
>> and not for
>>
>> do
>>   {
>>     *p = '\0';
>>     p++; /* use_after_cand_inc == true */
>>     if (!(i < n))
>>       break;
>>     i++; /* use_uses_inced_iv == false */
>>   }
>> while (1);
>>
>> In the latter 2 cases, I cannot simply replace i < n with p < base + n.
> 
> Why not (other than that the value to compare with varies in these cases, but
> it always is "the value of p in the last iteration of the loop")?

In the 2 latter cases, the value to compare with will be either base + n
+ 1 or base + n - 1. The code does not handle these cases yet.

> One more thing: what is fold_walk_def_plus for?

It tries to fold a plus, and if not successful, finds ssa vars in
operands of the plus, substitutes the defining statements of ssa vars
for those ssa vars and retries folding.

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-14 12:49                           ` Tom de Vries
@ 2011-03-14 12:55                             ` Zdenek Dvorak
  2011-03-14 13:51                               ` Tom de Vries
  0 siblings, 1 reply; 31+ messages in thread
From: Zdenek Dvorak @ 2011-03-14 12:55 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Paolo Bonzini, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

Hi,

> >> it is trying to allow for
> >>
> >> do
> >>   {
> >>     *p = '\0';
> >>     i++; /* use_uses_inced_iv == true */
> >>     p++; /* use_after_cand_inc == true */
> >>     if (!(i < n))
> >>       break;
> >>   }
> >> while (1);
> >>
> >> and for
> >>
> >> do
> >>   {
> >>     *p = '\0';
> >>     if (!(i < n))
> >>       break;
> >>     i++; /* use_uses_inced_iv == false */
> >>     p++; /* use_after_cand_inc == false */
> >>   }
> >> while (1);
> >>
> >> but not for
> >>
> >> do
> >>   {
> >>     *p = '\0';
> >>     i++; /* use_uses_inced_iv == true */
> >>     if (!(i < n))
> >>       break;
> >>     p++; /* use_after_cand_inc == false */
> >>   }
> >> while (1);
> >>
> >> and not for
> >>
> >> do
> >>   {
> >>     *p = '\0';
> >>     p++; /* use_after_cand_inc == true */
> >>     if (!(i < n))
> >>       break;
> >>     i++; /* use_uses_inced_iv == false */
> >>   }
> >> while (1);
> >>
> >> In the latter 2 cases, I cannot simply replace i < n with p < base + n.
> > 
> > Why not (other than that the value to compare with varies in these cases, but
> > it always is "the value of p in the last iteration of the loop")?
> 
> In the 2 latter cases, the value to compare with will be either base + n
> + 1 or base + n - 1. The code does not handle these cases yet.

well, if not, then it certainly handles one of the first two cases incorrectly
(since the use_uses_inced_iv test is meaningless).

> > One more thing: what is fold_walk_def_plus for?
> 
> It tries to fold a plus, and if not successful, finds ssa vars in
> operands of the plus, substitutes the defining statements of ssa vars
> for those ssa vars and retries folding.

Sorry for not being more clear; I meant, why is it used?

Zdenek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-14 12:55                             ` Zdenek Dvorak
@ 2011-03-14 13:51                               ` Tom de Vries
  2011-03-14 14:03                                 ` Zdenek Dvorak
  0 siblings, 1 reply; 31+ messages in thread
From: Tom de Vries @ 2011-03-14 13:51 UTC (permalink / raw)
  To: Zdenek Dvorak
  Cc: Paolo Bonzini, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

On 03/14/2011 01:55 PM, Zdenek Dvorak wrote:
> Hi,
> 
>>>> it is trying to allow for
>>>>
>>>> do
>>>>   {
>>>>     *p = '\0';
>>>>     i++; /* use_uses_inced_iv == true */
>>>>     p++; /* use_after_cand_inc == true */
>>>>     if (!(i < n))
>>>>       break;
>>>>   }
>>>> while (1);
>>>>
>>>> and for
>>>>
>>>> do
>>>>   {
>>>>     *p = '\0';
>>>>     if (!(i < n))
>>>>       break;
>>>>     i++; /* use_uses_inced_iv == false */
>>>>     p++; /* use_after_cand_inc == false */
>>>>   }
>>>> while (1);
>>>>
>>>> but not for
>>>>
>>>> do
>>>>   {
>>>>     *p = '\0';
>>>>     i++; /* use_uses_inced_iv == true */
>>>>     if (!(i < n))
>>>>       break;
>>>>     p++; /* use_after_cand_inc == false */
>>>>   }
>>>> while (1);
>>>>
>>>> and not for
>>>>
>>>> do
>>>>   {
>>>>     *p = '\0';
>>>>     p++; /* use_after_cand_inc == true */
>>>>     if (!(i < n))
>>>>       break;
>>>>     i++; /* use_uses_inced_iv == false */
>>>>   }
>>>> while (1);
>>>>
>>>> In the latter 2 cases, I cannot simply replace i < n with p < base + n.
>>>
>>> Why not (other than that the value to compare with varies in these cases, but
>>> it always is "the value of p in the last iteration of the loop")?
>>
>> In the 2 latter cases, the value to compare with will be either base + n
>> + 1 or base + n - 1. The code does not handle these cases yet.
> 
> well, if not, then it certainly handles one of the first two cases incorrectly

The ivopts code for the first example is:

<bb 2>:
  p_5 = p_3(D) + i_4(D);
  D.2698_12 = p_3(D) + n_8(D);

<bb 3>:
  # p_1 = PHI <p_5(2), p_7(4)>
  *p_1 = 0;
  p_7 = p_1 + 1;
  if (p_7 >= D.2698_12)
    goto <bb 5>;
  else
    goto <bb 4>;

<bb 4>:
  goto <bb 3>;

<bb 5>:
  return;

and for the second example is:

<bb 2>:
  p_5 = p_3(D) + i_4(D);
  D.2704_13 = p_3(D) + n_6(D);

<bb 3>:
  # p_1 = PHI <p_5(2), p_8(4)>
  *p_1 = 0;
  if (p_1 >= D.2704_13)
    goto <bb 5>;
  else
    goto <bb 4>;

<bb 4>:
  p_8 = p_1 + 1;
  goto <bb 3>;

<bb 5>:
  return;

both seem correct to me.

> (since the use_uses_inced_iv test is meaningless).

To me it seems use_uses_inced_iv has meaning:
- it models something: it states whether the comparison is using
  the iv increment result or the iv phi result.
- it is used to detect cases were we would generate incorrect code.

>>> One more thing: what is fold_walk_def_plus for?
>>
>> It tries to fold a plus, and if not successful, finds ssa vars in
>> operands of the plus, substitutes the defining statements of ssa vars
>> for those ssa vars and retries folding.
> 
> Sorry for not being more clear; I meant, why is it used?
> 

In the following code

<bb 2>:
  p_5 = p_3(D) + i_4(D);

<bb 3>:
  # p_1 = PHI <p_5(2), p_7(4)>
  # i_2 = PHI <i_4(D)(2), i_6(4)>
  *p_1 = 0;
  i_6 = i_2 + 1;
  p_7 = p_1 + 1;
  if (i_6 >= n_8(D))
    goto <bb 5>;
  else
    goto <bb 4>;

<bb 4>:
  goto <bb 3>;

<bb 5>:
  return;


I'm trying to prove that p_5 == x + i_4 where x is an ssa_name. I can't
do that without going through the defining statement of p_5.

> Zdenek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-14 13:51                               ` Tom de Vries
@ 2011-03-14 14:03                                 ` Zdenek Dvorak
  2011-03-15 14:19                                   ` Tom de Vries
  0 siblings, 1 reply; 31+ messages in thread
From: Zdenek Dvorak @ 2011-03-14 14:03 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Paolo Bonzini, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

Hi,

> > (since the use_uses_inced_iv test is meaningless).
> 
> To me it seems use_uses_inced_iv has meaning:
> - it models something: it states whether the comparison is using
>   the iv increment result or the iv phi result.

but that has nothing to do with the value of the iv.  For instance,
in the following:

a = phi (0, a')
b = phi (1, b')
c = phi (1, c')

a' = a + 1;
tmp = b;

compare (a'/tmp/c, something)

b' = tmp + 1;
c' = c + 1;

a', tmp and c are completely equivalent, yet your code for some reason claims
to handle c and the other two differently.

Zdenek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-14 14:03                                 ` Zdenek Dvorak
@ 2011-03-15 14:19                                   ` Tom de Vries
  2011-03-15 15:10                                     ` Zdenek Dvorak
  0 siblings, 1 reply; 31+ messages in thread
From: Tom de Vries @ 2011-03-15 14:19 UTC (permalink / raw)
  To: Zdenek Dvorak
  Cc: Paolo Bonzini, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

[-- Attachment #1: Type: text/plain, Size: 2503 bytes --]

Hi Zdenek,

I rewrote the patch to remove the use of use_uses_inced_iv.

On 03/14/2011 03:03 PM, Zdenek Dvorak wrote:
> Hi,
> 
>>> (since the use_uses_inced_iv test is meaningless).
>>
>> To me it seems use_uses_inced_iv has meaning:
>> - it models something: it states whether the comparison is using
>>   the iv increment result or the iv phi result.
> 
> but that has nothing to do with the value of the iv.  For instance,
> in the following:
> 
> a = phi (0, a')
> b = phi (1, b')
> c = phi (1, c')
> 
> a' = a + 1;
> tmp = b;
> 
> compare (a'/tmp/c, something)
> 
> b' = tmp + 1;
> c' = c + 1;
> 
> a', tmp and c are completely equivalent, yet your code for some reason claims
> to handle c and the other two differently.

That a' and c have the same value and the code handles them differently
does not necessarily mean that the code is incorrect.  The way I would
formulate it is that the code has implementation limitations, which are
expressed in an awkward way.

> tmp = b;

You're right, the calculation of use_uses_inced_iv assumed that it looks
at either the iv phi or the iv increment, so that was not safe.


I'll try to explain what my intention with the code is, by looking at a
pre-ivopt level example.

<bb 2>:
  p_5 = p_3(D) + i_4(D);

<bb 3>:
  # p_1 = PHI <p_5(2), p_6(4)>
  # i_2 = PHI <i_4(D)(2), i_7(4)>
  *p_1 = 0;
  p_6 = p_1 + 1;
  i_7 = i_2 + 1;
  if (i_7 < n_8(D))
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;

<bb 5>:
  return;

In this example, I try to analyze whether it is safe to replace
  i_7 < n_8
by
  p_6 < p_3 + n_8.

What I tried to do previously was to first prove a relation p_1 == i_2 +
ssa_name between i_2 and p_1, and then figure out (in the awkward code)
if I was allowed to assume p_6 == i_7 + ssa_name.

Now, I try to prove that relation between i_7 and p_6 instead, which
makes the code in iv_elimination_compare_lt simpler.

This has as casualty though that the 'int' case (the case that use->iv
is not compatible with sizetype) does not work anymore. That would need
more work.

The code still has the same limitation, meaning it will transform the
first 2 examples, but not the last 2 examples of the series of 4
mentioned earlier.

Changes compared to previous submission:
- changed name of fold_walk_def_plus into fold_diff_to_ssa_name
- added extra intelligence in fold_diff_to_ssa_name to deal with
  (a + x) - (b - x).
- don't calculate real_iv_use_base, calculate instead base_cand_at_use
  to simplify code

Thanks,
-Tom

[-- Attachment #2: iterator.6.6-ml.patch --]
[-- Type: text/x-patch, Size: 9415 bytes --]

diff -u gcc/tree-ssa-loop-ivopts.c gcc/tree-ssa-loop-ivopts.c
--- gcc/tree-ssa-loop-ivopts.c	(working copy)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -419,6 +419,89 @@
   return fold_build2 (code, TREE_TYPE (a), a, b);
 }
 
+/* Folds (TREE_TYPE (A))(A CODE B), where CODE is either PLUS_EXPR or
+   MINUS_EXPR.  Returns the folded expression if folding is successful.
+   Otherwise, return NULL_TREE.  Handles the case that A is a pointer
+   robustly.  */
+
+static inline tree
+fold_plus (enum tree_code code, tree a, tree b)
+{
+  tree a_type = TREE_TYPE (a);
+  tree res;
+
+  STRIP_NOPS (a);
+  robust_plus (&code, a, &b);
+
+  res = fold_binary (code, TREE_TYPE (a), a, b);
+  if (res == NULL_TREE)
+    return NULL_TREE;
+
+  return fold_convert (a_type, res);
+}
+
+/* Folds (TREE_TYPE (A))(A - B), possibly using the defining stmt of A.  Returns
+   the folded expression if folding is successful and resulted in an ssa_name.
+   Otherwise, return NULL_TREE.  */
+
+static inline tree
+fold_diff_to_ssa_name (tree a, tree b)
+{
+  tree a_type = TREE_TYPE (a);
+  tree res, a0, a1;
+  gimple def_stmt;
+
+  res = fold_plus (MINUS_EXPR, a, b);
+  if (res != NULL_TREE)
+    {
+      STRIP_NOPS (res);
+      if (TREE_CODE (res) == SSA_NAME)
+	return fold_convert (a_type, res);
+    }
+
+  STRIP_NOPS (a);
+  STRIP_NOPS (b);
+
+  if (TREE_CODE (a) == PLUS_EXPR && TREE_CODE (b) == PLUS_EXPR
+      && TREE_OPERAND (a, 1) == TREE_OPERAND (b, 1))
+    {
+      a = TREE_OPERAND (a, 0);
+      b = TREE_OPERAND (b, 0);
+
+      STRIP_NOPS (a);
+      STRIP_NOPS (b);
+
+      res = fold_plus (MINUS_EXPR, a, b);
+      if (res != NULL_TREE)
+	{
+	  STRIP_NOPS (res);
+	  if (TREE_CODE (res) == SSA_NAME)
+	    return fold_convert (a_type, res);
+	}
+    }
+
+  if (TREE_CODE (a) != SSA_NAME)
+    return NULL_TREE;
+
+  def_stmt = SSA_NAME_DEF_STMT (a);
+  if (!is_gimple_assign (def_stmt)
+      || (gimple_assign_rhs_code (def_stmt) != PLUS_EXPR
+	  && gimple_assign_rhs_code (def_stmt) != POINTER_PLUS_EXPR))
+    return NULL_TREE;
+  a0 = gimple_assign_rhs1 (def_stmt);
+  a1 = gimple_assign_rhs2 (def_stmt);
+
+  res = fold_plus (MINUS_EXPR, fold_build_plus (PLUS_EXPR, a0, a1), b);
+  if (res != NULL_TREE)
+    {
+      STRIP_NOPS (res);
+      if (TREE_CODE (res) == SSA_NAME)
+	return fold_convert (a_type, res);
+    }
+
+  return NULL_TREE;
+}
+
 /* Number of uses recorded in DATA.  */
 
 static inline unsigned
@@ -825,17 +908,25 @@
 
   if (!slot)
     {
-      /* Try to determine number of iterations.  We must know it
-	 unconditionally (i.e., without possibility of # of iterations
-	 being zero).  Also, we cannot safely work with ssa names that
-	 appear in phi nodes on abnormal edges, so that we do not create
-	 overlapping life ranges for them (PR 27283).  */
+      /* Try to determine number of iterations.  We cannot safely work with ssa
+         names that appear in phi nodes on abnormal edges, so that we do not
+         create overlapping life ranges for them (PR 27283).  */
       desc = XNEW (struct tree_niter_desc);
       if (number_of_iterations_exit (data->current_loop,
 				     exit, desc, true)
-	  && integer_zerop (desc->may_be_zero)
      	  && !contains_abnormal_ssa_name_p (desc->niter))
-	niter = desc->niter;
+	{
+	  if (!integer_zerop (desc->may_be_zero))
+            /* Construct COND_EXPR that describes the number of iterations.
+               Either the COND_EXPR is not too expensive, and we can use it as
+               loop bound, or we can deduce a LT_EXPR bound from it.  */
+	    niter
+	      = build3 (COND_EXPR, TREE_TYPE (desc->niter), desc->may_be_zero,
+			build_int_cst_type (TREE_TYPE (desc->niter), 0),
+			desc->niter);
+	  else
+	    niter = desc->niter;
+	}
       else
 	niter = NULL_TREE;
 
@@ -4353,6 +4444,145 @@
   return (exit->flags & EDGE_TRUE_VALUE ? EQ_EXPR : NE_EXPR);
 }
 
+/* Get the loop bound and comparison operator of USE->iv, and store them in
+   BOUND_P and COMP_P.  Returns false if unsuccessful.  */
+
+static bool
+get_use_lt_bound (struct iv_use *use, tree use_iv, tree *bound_p,
+		  enum tree_code *comp_p)
+{
+  gimple stmt = use->stmt;
+
+  if (gimple_code (stmt) != GIMPLE_COND
+      || gimple_cond_lhs (stmt) != use_iv)
+    return false;
+
+  *comp_p = gimple_cond_code (stmt);
+  *bound_p = gimple_cond_rhs (stmt);
+
+  return true;
+}
+
+/* Tries to replace loop exit test USE, by one formulated in terms of a LT_EXPR
+   comparison with CAND.  Stores the resulting comparison in COMP_P and bound in
+   BOUND_P.  */
+
+static bool
+iv_elimination_compare_lt (struct ivopts_data *data, struct iv_use *use,
+                           struct iv_cand *cand, tree *bound_p,
+			   enum tree_code *comp_p)
+{
+  enum tree_code use_comp, canon_comp;
+  tree base_ptr, use_lt_bound, bound, *use_iv, base_cand_at_use;
+  bool ok;
+  tree use_type, cand_type, cand_iv_base = cand->iv->base;
+
+  /* Initialize cand_type, use_iv and use_type.  */
+  STRIP_NOPS (cand_iv_base);
+  cand_type = TREE_TYPE (cand_iv_base);
+  ok = extract_cond_operands (data, use->stmt, &use_iv, NULL, NULL, NULL);
+  gcc_assert (ok);
+  use_type = TREE_TYPE (*use_iv);
+
+  /* We're trying to replace 'i < n' with 'p < base + n' in
+
+     void
+     f1 (char *base, unsigned long int s, unsigned long int n)
+     {
+       unsigned long int i = s;
+       char *p = base + s;
+       do
+         {
+	   *p = '\0';
+	   p++;
+	   i++;
+	 }
+       while (i < n);
+     }
+
+     Overflow of base + n can't happen because either:
+     - s < n, and i will step to n, and p will step to base + n, or
+     - s >= n, so base + n < base + s, and assuming pointer arithmetic
+       doesn't overflow, base + s doesn't overflow, so base + n won't.
+
+     This transformation is not valid if i and n are signed, because
+     base + n might underflow.
+  */
+
+  /* Use should be an unsigned integral.  */
+  if (!INTEGRAL_TYPE_P (use_type) || !TYPE_UNSIGNED (use_type))
+    return false;
+
+  /* Cand should be a pointer, and pointer overflow should be undefined.  */
+  if (!POINTER_TYPE_P (cand_type) || !POINTER_TYPE_OVERFLOW_UNDEFINED)
+    return false;
+
+  /* Make sure that the loop iterates till the loop bound is hit.  */
+  if (!data->loop_single_exit_p)
+    return false;
+
+  /* We only handle this case for the moment.  */
+  if (!tree_int_cst_equal (use->iv->step, cand->iv->step))
+    return false;
+
+  if (cand->pos == IP_ORIGINAL)
+    {
+      /* If cand is a src level ptr iv, we know that
+	 cand->iv->base + nit * cand->iv->step won't overflow.  */
+
+      /* Now get the base of var_at_stmt (cand, use->stmt).  */
+      if (stmt_after_increment (data->current_loop, cand, use->stmt))
+	/* Get the base of var_after.  */
+	base_cand_at_use = fold_build_plus (PLUS_EXPR, cand->iv->base,
+					     cand->iv->step);
+      else
+	/* Get the base of var_before.  */
+	base_cand_at_use = cand->iv->base;
+    }
+  else
+    {
+      /* It is possible to also allow other pointer cands, provided we can prove
+	 cand->iv->base + nit * cand->iv->step doesn't overflow.  Return false
+	 for now.  */
+      return false;
+    }
+
+  /* Detect p = base + s.  */
+  base_ptr = fold_diff_to_ssa_name (base_cand_at_use,
+				    fold_convert (sizetype, use->iv->base));
+  if (base_ptr == NULL_TREE)
+    return false;
+  STRIP_NOPS (base_ptr);
+
+  /* Get the bound of the iv of the use.  */
+  if (!get_use_lt_bound (use, *use_iv, &use_lt_bound, &use_comp))
+    return false;
+
+  /* Determine canon_comp.  */
+  if (*comp_p == NE_EXPR)
+    canon_comp = use_comp;
+  else if (*comp_p == EQ_EXPR)
+    canon_comp = invert_tree_comparison (use_comp, false);
+  else
+    gcc_unreachable ();
+
+  /* Allow positive and negative step, and inclusive and exclusive bound.
+     To trigger inclusive bound, we need -funsafe-loop-optimizations.  */
+  if (canon_comp != LT_EXPR && canon_comp != GT_EXPR
+      && canon_comp != LE_EXPR && canon_comp != GE_EXPR)
+    return false;
+
+  /* Calculate bound.  */
+  bound = fold_build_plus (PLUS_EXPR, base_ptr,
+			   fold_convert (sizetype, use_lt_bound));
+  if (expression_expensive_p (bound))
+    return false;
+
+  *comp_p = use_comp;
+  *bound_p = bound;
+  return true;
+}
+
 /* Check whether it is possible to express the condition in USE by comparison
    of candidate CAND.  If so, store the value compared with to BOUND, and the
    comparison operator to COMP.  */
@@ -4431,6 +4661,21 @@
   *bound = aff_combination_to_tree (&bnd);
   *comp = iv_elimination_compare (data, use);
 
+  /* Try to implement nit using a '<' instead.  */
+  if (TREE_CODE (nit) == COND_EXPR)
+    {
+      if (iv_elimination_compare_lt (data, use, cand, bound, comp))
+        return true;
+
+      /* We could try to see if the non-lt bound is not too expensive, but the
+         cost infrastructure needs tuning for that first.  Even though
+         expression_expensive_p always returns true for COND_EXPRs, it happens
+         that the bound is folded into a MAX_EXPR, which is approved by
+         expression_expensive_p, but attributed a too low cost by force_var_cost
+         in case the MAX_EXPR would expand into control flow.  */
+      return false;
+    }
+
   /* It is unlikely that computing the number of iterations using division
      would be more profitable than keeping the original induction variable.  */
   if (expression_expensive_p (*bound))

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-15 14:19                                   ` Tom de Vries
@ 2011-03-15 15:10                                     ` Zdenek Dvorak
  2011-03-15 15:48                                       ` Tom de Vries
  0 siblings, 1 reply; 31+ messages in thread
From: Zdenek Dvorak @ 2011-03-15 15:10 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Paolo Bonzini, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

Hi,

> I'll try to explain what my intention with the code is, by looking at a
> pre-ivopt level example.
> 
> <bb 2>:
>   p_5 = p_3(D) + i_4(D);
> 
> <bb 3>:
>   # p_1 = PHI <p_5(2), p_6(4)>
>   # i_2 = PHI <i_4(D)(2), i_7(4)>
>   *p_1 = 0;
>   p_6 = p_1 + 1;
>   i_7 = i_2 + 1;
>   if (i_7 < n_8(D))
>     goto <bb 4>;
>   else
>     goto <bb 5>;
> 
> <bb 4>:
>   goto <bb 3>;
> 
> <bb 5>:
>   return;
> 
> In this example, I try to analyze whether it is safe to replace
>   i_7 < n_8
> by
>   p_6 < p_3 + n_8.

hmmm... is it actually safe?  What if n_8 is negative, and p_3 + n_8
overflows?

Zdenek

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-15 15:10                                     ` Zdenek Dvorak
@ 2011-03-15 15:48                                       ` Tom de Vries
  2011-08-25 12:20                                         ` Tom de Vries
  0 siblings, 1 reply; 31+ messages in thread
From: Tom de Vries @ 2011-03-15 15:48 UTC (permalink / raw)
  To: Zdenek Dvorak
  Cc: Paolo Bonzini, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

On 03/15/2011 04:10 PM, Zdenek Dvorak wrote:
> Hi,
> 
>> I'll try to explain what my intention with the code is, by looking at a
>> pre-ivopt level example.
>>
>> <bb 2>:
>>   p_5 = p_3(D) + i_4(D);
>>
>> <bb 3>:
>>   # p_1 = PHI <p_5(2), p_6(4)>
>>   # i_2 = PHI <i_4(D)(2), i_7(4)>
>>   *p_1 = 0;
>>   p_6 = p_1 + 1;
>>   i_7 = i_2 + 1;
>>   if (i_7 < n_8(D))
>>     goto <bb 4>;
>>   else
>>     goto <bb 5>;
>>
>> <bb 4>:
>>   goto <bb 3>;
>>
>> <bb 5>:
>>   return;
>>
>> In this example, I try to analyze whether it is safe to replace
>>   i_7 < n_8
>> by
>>   p_6 < p_3 + n_8.
> 
> hmmm... is it actually safe?  What if n_8 is negative, and p_3 + n_8
> overflows?

I note that situation here in the comments in iv_elimination_compare_lt:
...
    This transformation is not valid if i and n are signed, because
     base + n might underflow.
...

and test for i == unsigned here in iv_elimination_compare_lt:
...
  /* Use should be an unsigned integral.  */
  if (!INTEGRAL_TYPE_P (use_type) || !TYPE_UNSIGNED (use_type))
    return false;
...

If i is unsigned but n is signed the code looks like this and the
transformation is done using the unsigned pretmp.3_13 rather than signed
n_8:
...
<bb 2>:
  p_5 = p_3(D) + i_4(D);
  pretmp.3_13 = (long unsigned int) n_8(D);

<bb 3>:
  # p_1 = PHI <p_5(2), p_7(4)>
  # i_2 = PHI <i_4(D)(2), i_6(4)>
  *p_1 = 0;
  i_6 = i_2 + 1;
  p_7 = p_1 + 1;
  if (i_6 < pretmp.3_13)
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;

<bb 5>:
  return;
...


If i is unsigned and n is signed, and we compare (long signed int)i < n
then the use->iv is of type long int, and the TYPE_UNSIGNED (use_type)
test prevents the transformation.
...
  long int i.0;

<bb 2>:
  p_5 = p_3(D) + i_4(D);

<bb 3>:
  # p_1 = PHI <p_5(2), p_7(4)>
  # i_2 = PHI <i_4(D)(2), i_6(4)>
  *p_1 = 0;
  i_6 = i_2 + 1;
  p_7 = p_1 + 1;
  i.0_8 = (long int) i_6;
  if (i.0_8 < n_9(D))
    goto <bb 4>;
  else
    goto <bb 5>;

<bb 4>:
  goto <bb 3>;

<bb 5>:
  return;
...

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-03-15 15:48                                       ` Tom de Vries
@ 2011-08-25 12:20                                         ` Tom de Vries
  2011-08-25 12:38                                           ` Tom de Vries
  2011-08-25 20:31                                           ` Zdenek Dvorak
  0 siblings, 2 replies; 31+ messages in thread
From: Tom de Vries @ 2011-08-25 12:20 UTC (permalink / raw)
  To: Zdenek Dvorak
  Cc: Paolo Bonzini, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

[-- Attachment #1: Type: text/plain, Size: 1698 bytes --]

Hi Zdenek,

here's the updated version of the patch.

The goal is to remove the 'i' iterator from the following example, by
replacing 'i < n' with 'p < base + n'.

void
f (char *base, unsigned long int i, unsigned long int n)
{
  char *p = base + i;
  do
    {
      *p = '\0';
      p++;
      i++;
   }
  while (i < n);
}

bootstrapped and reg-tested on x864_64, and build and reg-tested on MIPS.

I will sent a test-case in a separate email.

OK for trunk?

Thanks,
- Tom

2011-08-25  Zdenek Dvorak  <ook@ucw.cz>
	    Tom de Vries  <tom@codesourcery.com>

	* tree-ssa-loop-ivopts.c (struct cost_pair): Add comp field.
	(struct ivopts_data): Add loop_single_exit_p field.
	(niter_for_exit): Change parameter desc_p into return value.  Return
	desc if	desc->may_be_zero.  Free desc if unused.
	(niter_for_single_dom_exit): Change return type.
	(find_induction_variables): Handle changed return type of
	niter_for_single_dom_exit.  Dump may_be_zero.
	(add_candidate_1): Keep original base and step type for IP_ORIGINAL.
	(set_use_iv_cost): Add and handle comp parameter.
	(determine_use_iv_cost_generic, determine_use_iv_cost_address): Add
	comp argument to set_use_iv_cost.
	(strip_wrap_conserving_type_conversions, expr_equal_p)
	(difference_cannot_overflow_p, iv_elimination_compare_lt): New function.
	(may_eliminate_iv): Add comp parameter.  Handle new return type of
	niter_for_exit.  Use loop_single_exit_p.  Use iv_elimination_compare_lt.
	(determine_use_iv_cost_condition): Add comp argument to set_use_iv_cost
	and may_eliminate_iv.
	(rewrite_use_compare): Move call to iv_elimination_compare to ...
	(may_eliminate_iv): Here.
	(tree_ssa_iv_optimize_loop): Initialize loop_single_exit_p.

[-- Attachment #2: ivopts.3.patch --]
[-- Type: text/x-patch, Size: 19847 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 176554)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -176,6 +176,7 @@ struct cost_pair
   tree value;		/* For final value elimination, the expression for
 			   the final value of the iv.  For iv elimination,
 			   the new bound to compare with.  */
+  enum tree_code comp;	/* For iv elimination, the comparison.  */
   int inv_expr_id;      /* Loop invariant expression id.  */
 };
 
@@ -297,6 +298,9 @@ struct ivopts_data
 
   /* Whether the loop body includes any function calls.  */
   bool body_includes_call;
+
+  /* Whether the loop body can only be exited via single exit.  */
+  bool loop_single_exit_p;
 };
 
 /* An assignment of iv candidates to uses.  */
@@ -770,15 +774,13 @@ contains_abnormal_ssa_name_p (tree expr)
   return false;
 }
 
-/*  Returns tree describing number of iterations determined from
+/*  Returns the structure describing number of iterations determined from
     EXIT of DATA->current_loop, or NULL if something goes wrong.  */
 
-static tree
-niter_for_exit (struct ivopts_data *data, edge exit,
-                struct tree_niter_desc **desc_p)
+static struct tree_niter_desc *
+niter_for_exit (struct ivopts_data *data, edge exit)
 {
-  struct tree_niter_desc* desc = NULL;
-  tree niter;
+  struct tree_niter_desc *desc;
   void **slot;
 
   if (!data->niters)
@@ -791,37 +793,31 @@ niter_for_exit (struct ivopts_data *data
 
   if (!slot)
     {
-      /* Try to determine number of iterations.  We must know it
-	 unconditionally (i.e., without possibility of # of iterations
-	 being zero).  Also, we cannot safely work with ssa names that
-	 appear in phi nodes on abnormal edges, so that we do not create
-	 overlapping life ranges for them (PR 27283).  */
+      /* Try to determine number of iterations.  We cannot safely work with ssa
+         names that appear in phi nodes on abnormal edges, so that we do not
+         create overlapping life ranges for them (PR 27283).  */
       desc = XNEW (struct tree_niter_desc);
-      if (number_of_iterations_exit (data->current_loop,
-				     exit, desc, true)
-	  && integer_zerop (desc->may_be_zero)
-     	  && !contains_abnormal_ssa_name_p (desc->niter))
-	niter = desc->niter;
-      else
-	niter = NULL_TREE;
-
-      desc->niter = niter;
+      if (!number_of_iterations_exit (data->current_loop,
+				      exit, desc, true)
+     	  || contains_abnormal_ssa_name_p (desc->niter))
+	{
+	  XDELETE (desc);
+	  desc = NULL;
+	}
       slot = pointer_map_insert (data->niters, exit);
       *slot = desc;
     }
   else
-    niter = ((struct tree_niter_desc *) *slot)->niter;
+    desc = (struct tree_niter_desc *) *slot;
 
-  if (desc_p)
-    *desc_p = (struct tree_niter_desc *) *slot;
-  return niter;
+  return desc;
 }
 
-/* Returns tree describing number of iterations determined from
+/* Returns the structure describing number of iterations determined from
    single dominating exit of DATA->current_loop, or NULL if something
    goes wrong.  */
 
-static tree
+static struct tree_niter_desc *
 niter_for_single_dom_exit (struct ivopts_data *data)
 {
   edge exit = single_dom_exit (data->current_loop);
@@ -829,7 +825,7 @@ niter_for_single_dom_exit (struct ivopts
   if (!exit)
     return NULL;
 
-  return niter_for_exit (data, exit, NULL);
+  return niter_for_exit (data, exit);
 }
 
 /* Hash table equality function for expressions.  */
@@ -1174,12 +1170,17 @@ find_induction_variables (struct ivopts_
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
-      tree niter = niter_for_single_dom_exit (data);
+      struct tree_niter_desc *niter = niter_for_single_dom_exit (data);
 
       if (niter)
 	{
 	  fprintf (dump_file, "  number of iterations ");
-	  print_generic_expr (dump_file, niter, TDF_SLIM);
+	  print_generic_expr (dump_file, niter->niter, TDF_SLIM);
+	  if (!integer_zerop (niter->may_be_zero))
+	    {
+	      fprintf (dump_file, "; zero if ");
+	      print_generic_expr (dump_file, niter->may_be_zero, TDF_SLIM);
+	    }
 	  fprintf (dump_file, "\n\n");
     	};
 
@@ -2216,7 +2217,10 @@ add_candidate_1 (struct ivopts_data *dat
   struct iv_cand *cand = NULL;
   tree type, orig_type;
 
-  if (base)
+  /* For non-original variables, make sure their values are computed in a type
+     that does not invoke undefined behavior on overflows (since in general,
+     we cannot prove that these induction variables are non-wrapping).  */
+  if (pos != IP_ORIGINAL)
     {
       orig_type = TREE_TYPE (base);
       type = generic_type_for (orig_type);
@@ -2662,13 +2666,13 @@ infinite_cost_p (comp_cost cost)
 
 /* Sets cost of (USE, CANDIDATE) pair to COST and record that it depends
    on invariants DEPENDS_ON and that the value used in expressing it
-   is VALUE.  */
+   is VALUE, and in case of iv elimination the comparison operator is COMP.  */
 
 static void
 set_use_iv_cost (struct ivopts_data *data,
 		 struct iv_use *use, struct iv_cand *cand,
 		 comp_cost cost, bitmap depends_on, tree value,
-                 int inv_expr_id)
+		 enum tree_code comp, int inv_expr_id)
 {
   unsigned i, s;
 
@@ -2684,6 +2688,7 @@ set_use_iv_cost (struct ivopts_data *dat
       use->cost_map[cand->id].cost = cost;
       use->cost_map[cand->id].depends_on = depends_on;
       use->cost_map[cand->id].value = value;
+      use->cost_map[cand->id].comp = comp;
       use->cost_map[cand->id].inv_expr_id = inv_expr_id;
       return;
     }
@@ -2704,6 +2709,7 @@ found:
   use->cost_map[i].cost = cost;
   use->cost_map[i].depends_on = depends_on;
   use->cost_map[i].value = value;
+  use->cost_map[i].comp = comp;
   use->cost_map[i].inv_expr_id = inv_expr_id;
 }
 
@@ -4276,14 +4282,15 @@ determine_use_iv_cost_generic (struct iv
   if (cand->pos == IP_ORIGINAL
       && cand->incremented_at == use->stmt)
     {
-      set_use_iv_cost (data, use, cand, zero_cost, NULL, NULL_TREE, -1);
+      set_use_iv_cost (data, use, cand, zero_cost, NULL, NULL_TREE,
+                       ERROR_MARK, -1);
       return true;
     }
 
   cost = get_computation_cost (data, use, cand, false, &depends_on,
                                NULL, &inv_expr_id);
 
-  set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE,
+  set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE, ERROR_MARK,
                    inv_expr_id);
 
   return !infinite_cost_p (cost);
@@ -4311,7 +4318,7 @@ determine_use_iv_cost_address (struct iv
       else if (cand->pos == IP_AFTER_USE || cand->pos == IP_BEFORE_USE)
 	cost = infinite_cost;
     }
-  set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE,
+  set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE, ERROR_MARK,
                    inv_expr_id);
 
   return !infinite_cost_p (cost);
@@ -4387,16 +4394,261 @@ iv_elimination_compare (struct ivopts_da
   return (exit->flags & EDGE_TRUE_VALUE ? EQ_EXPR : NE_EXPR);
 }
 
+static tree
+strip_wrap_conserving_type_conversions (tree exp)
+{
+  while (tree_ssa_useless_type_conversion (exp)
+	 && (nowrap_type_p (TREE_TYPE (exp))
+	     == nowrap_type_p (TREE_TYPE (TREE_OPERAND (exp, 0)))))
+    exp = TREE_OPERAND (exp, 0);
+  return exp;
+}
+
+/* Walk the SSA form and check whether E == WHAT.  Fairly simplistic, we
+   check for an exact match.  */
+
+static bool
+expr_equal_p (tree e, tree what)
+{
+  gimple stmt;
+  enum tree_code code;
+
+  e = strip_wrap_conserving_type_conversions (e);
+  what = strip_wrap_conserving_type_conversions (what);
+
+  code = TREE_CODE (what);
+  if (TREE_TYPE (e) != TREE_TYPE (what))
+    return false;
+
+  if (operand_equal_p (e, what, 0))
+    return true;
+
+  if (TREE_CODE (e) != SSA_NAME)
+    return false;
+
+  stmt = SSA_NAME_DEF_STMT (e);
+  if (gimple_code (stmt) != GIMPLE_ASSIGN
+      || gimple_assign_rhs_code (stmt) != code)
+    return false;
+
+  switch (get_gimple_rhs_class (code))
+    {
+    case GIMPLE_BINARY_RHS:
+      if (!expr_equal_p (gimple_assign_rhs2 (stmt), TREE_OPERAND (what, 1)))
+	return false;
+      /* Fallthru.  */
+
+    case GIMPLE_UNARY_RHS:
+    case GIMPLE_SINGLE_RHS:
+      return expr_equal_p (gimple_assign_rhs1 (stmt), TREE_OPERAND (what, 0));
+    default:
+      return false;
+    }
+}
+
+/* Returns true if we can prove that BASE - OFFSET does not overflow.  For now,
+   we only detect the situation that BASE = SOMETHING + OFFSET, where the
+   calculation is performed in non-wrapping type.
+
+   TODO: More generally, we could test for the situation that
+	 BASE = SOMETHING + OFFSET' and OFFSET is between OFFSET' and zero.
+	 This would require knowing the sign of OFFSET.
+
+	 Also, we only look for the first addition in the computation of BASE.
+	 More complex analysis would be better, but introducing it just for
+	 this optimization seems like an overkill.  */
+
+static bool
+difference_cannot_overflow_p (tree base, tree offset)
+{
+  enum tree_code code;
+  tree e1, e2;
+
+  if (!nowrap_type_p (TREE_TYPE (base)))
+    return false;
+
+  base = expand_simple_operations (base);
+
+  if (TREE_CODE (base) == SSA_NAME)
+    {
+      gimple stmt = SSA_NAME_DEF_STMT (base);
+
+      if (gimple_code (stmt) != GIMPLE_ASSIGN)
+	return false;
+
+      code = gimple_assign_rhs_code (stmt);
+      if (get_gimple_rhs_class (code) != GIMPLE_BINARY_RHS)
+	return false;
+
+      e1 = gimple_assign_rhs1 (stmt);
+      e2 = gimple_assign_rhs2 (stmt);
+    }
+  else
+    {
+      code = TREE_CODE (base);
+      if (get_gimple_rhs_class (code) != GIMPLE_BINARY_RHS)
+	return false;
+      e1 = TREE_OPERAND (base, 0);
+      e2 = TREE_OPERAND (base, 1);
+    }
+
+  /* TODO: deeper inspection may be necessary to prove the equality.  */
+  switch (code)
+    {
+    case PLUS_EXPR:
+      return expr_equal_p (e1, offset) || expr_equal_p (e2, offset);
+    case POINTER_PLUS_EXPR:
+      return expr_equal_p (e2, offset);
+
+    default:
+      return false;
+    }
+}
+
+/* Tries to replace loop exit by one formulated in terms of a LT_EXPR
+   comparison with CAND.  NITER describes the number of iterations of
+   the loops.  If successful, the comparison in COMP_P is altered accordingly.
+
+   We aim to handle the following situation:
+
+   sometype *base, *p;
+   int a, b, i;
+
+   i = a;
+   p = p_0 = base + a;
+
+   do
+     {
+       bla (*p);
+       p++;
+       i++;
+     }
+   while (i < b);
+
+   Here, the number of iterations of the loop is (a + 1 > b) ? 0 : b - a - 1.
+   We aim to optimize this to
+
+   p = p_0 = base + a;
+   do
+     {
+       bla (*p);
+       p++;
+     }
+   while (p < p_0 - a + b);
+
+   This preserves the correctness, since the pointer arithmetics does not
+   overflow.  More precisely:
+
+   1) if a + 1 <= b, then p_0 - a + b is the final value of p, hence there is no
+      overflow in computing it or the values of p.
+   2) if a + 1 > b, then we need to verify that the expression p_0 - a does not
+      overflow.  To prove this, we use the fact that p_0 = base + a.  */
+
+static bool
+iv_elimination_compare_lt (struct ivopts_data *data,
+                           struct iv_cand *cand, enum tree_code *comp_p,
+			   struct tree_niter_desc *niter)
+{
+  tree cand_type, a, b, mbz, nit_type = TREE_TYPE (niter->niter), offset;
+  struct affine_tree_combination nit, tmpa, tmpb;
+  enum tree_code comp;
+  HOST_WIDE_INT step;
+
+  /* We need to know that the candidate induction variable does not overflow.
+     While more complex analysis may be used to prove this, for now just
+     check that the variable appears in the original program and that it
+     is computed in a type that guarantees no overflows.  */
+  cand_type = TREE_TYPE (cand->iv->base);
+  if (cand->pos != IP_ORIGINAL || !nowrap_type_p (cand_type))
+    return false;
+
+  /* Make sure that the loop iterates till the loop bound is hit, as otherwise
+     the calculation of the BOUND could overflow, making the comparison
+     invalid.  */
+  if (!data->loop_single_exit_p)
+    return false;
+
+  /* We need to be able to decide whether candidate is increasing or decreasing
+     in order to choose the right comparison operator.  */
+  if (!cst_and_fits_in_hwi (cand->iv->step))
+    return false;
+  step = int_cst_value (cand->iv->step);
+
+  /* Check that the number of iterations matches the expected pattern:
+     a + 1 > b ? 0 : b - a - 1.  */
+  mbz = niter->may_be_zero;
+  if (TREE_CODE (mbz) == GT_EXPR)
+    {
+      /* Handle a + 1 > b.  */
+      tree op0 = TREE_OPERAND (mbz, 0);
+      if (TREE_CODE (op0) == PLUS_EXPR && integer_onep (TREE_OPERAND (op0, 1)))
+	{
+	  a = TREE_OPERAND (op0, 0);
+	  b = TREE_OPERAND (mbz, 1);
+	}
+      else
+	return false;
+    }
+  else if (TREE_CODE (mbz) == LT_EXPR)
+    {
+      tree op1 = TREE_OPERAND (mbz, 1);
+
+      /* Handle b < a + 1.  */
+      if (TREE_CODE (op1) == PLUS_EXPR && integer_onep (TREE_OPERAND (op1, 1)))
+        {
+          a = TREE_OPERAND (op1, 0);
+          b = TREE_OPERAND (mbz, 0);
+        }
+      else
+	return false;
+    }
+  else
+    return false;
+
+  /* Expected number of iterations is B - A - 1.  Check that it matches
+     the actual number, i.e., that B - A - NITER = 1.  */
+  tree_to_aff_combination (niter->niter, nit_type, &nit);
+  tree_to_aff_combination (fold_convert (nit_type, a), nit_type, &tmpa);
+  tree_to_aff_combination (fold_convert (nit_type, b), nit_type, &tmpb);
+  aff_combination_scale (&nit, double_int_minus_one);
+  aff_combination_scale (&tmpa, double_int_minus_one);
+  aff_combination_add (&tmpb, &tmpa);
+  aff_combination_add (&tmpb, &nit);
+  if (tmpb.n != 0 || !double_int_equal_p (tmpb.offset, double_int_one))
+    return false;
+
+  /* Finally, check that CAND->IV->BASE - CAND->IV->STEP * A does not
+     overflow.  */
+  offset = fold_build2 (MULT_EXPR, TREE_TYPE (cand->iv->step),
+			cand->iv->step,
+			fold_convert (TREE_TYPE (cand->iv->step), a));
+  if (!difference_cannot_overflow_p (cand->iv->base, offset))
+    return false;
+
+  /* Determine the new comparison operator.  */
+  comp = step < 0 ? GT_EXPR : LT_EXPR;
+  if (*comp_p == NE_EXPR)
+    *comp_p = comp;
+  else if (*comp_p == EQ_EXPR)
+    *comp_p = invert_tree_comparison (comp, false);
+  else
+    gcc_unreachable ();
+
+  return true;
+}
+
 /* Check whether it is possible to express the condition in USE by comparison
-   of candidate CAND.  If so, store the value compared with to BOUND.  */
+   of candidate CAND.  If so, store the value compared with to BOUND, and the
+   comparison operator to COMP.  */
 
 static bool
 may_eliminate_iv (struct ivopts_data *data,
-		  struct iv_use *use, struct iv_cand *cand, tree *bound)
+		  struct iv_use *use, struct iv_cand *cand, tree *bound,
+		  enum tree_code *comp)
 {
   basic_block ex_bb;
   edge exit;
-  tree nit, period;
+  tree period;
   struct loop *loop = data->current_loop;
   aff_tree bnd;
   struct tree_niter_desc *desc = NULL;
@@ -4418,8 +4670,8 @@ may_eliminate_iv (struct ivopts_data *da
   if (flow_bb_inside_loop_p (loop, exit->dest))
     return false;
 
-  nit = niter_for_exit (data, exit, &desc);
-  if (!nit)
+  desc = niter_for_exit (data, exit);
+  if (!desc)
     return false;
 
   /* Determine whether we can use the variable to test the exit condition.
@@ -4428,17 +4680,17 @@ may_eliminate_iv (struct ivopts_data *da
   period = iv_period (cand->iv);
 
   /* If the number of iterations is constant, compare against it directly.  */
-  if (TREE_CODE (nit) == INTEGER_CST)
+  if (TREE_CODE (desc->niter) == INTEGER_CST)
     {
       /* See cand_value_at.  */
       if (stmt_after_increment (loop, cand, use->stmt))
         {
-          if (!tree_int_cst_lt (nit, period))
+          if (!tree_int_cst_lt (desc->niter, period))
             return false;
         }
       else
         {
-          if (tree_int_cst_lt (period, nit))
+          if (tree_int_cst_lt (period, desc->niter))
             return false;
         }
     }
@@ -4457,7 +4709,7 @@ may_eliminate_iv (struct ivopts_data *da
       if (double_int_ucmp (max_niter, period_value) > 0)
         {
           /* See if we can take advantage of infered loop bound information.  */
-          if (loop_only_exit_p (loop, exit))
+          if (data->loop_single_exit_p)
             {
               if (!estimated_loop_iterations (loop, true, &max_niter))
                 return false;
@@ -4470,13 +4722,26 @@ may_eliminate_iv (struct ivopts_data *da
         }
     }
 
-  cand_value_at (loop, cand, use->stmt, nit, &bnd);
+  cand_value_at (loop, cand, use->stmt, desc->niter, &bnd);
 
   *bound = aff_combination_to_tree (&bnd);
+  *comp = iv_elimination_compare (data, use);
+
   /* It is unlikely that computing the number of iterations using division
      would be more profitable than keeping the original induction variable.  */
   if (expression_expensive_p (*bound))
     return false;
+
+  /* Sometimes, it is possible to handle the situation that the number of
+     iterations may be zero unless additional assumtions by using <
+     instead of != in the exit condition.
+
+     TODO: we could also calculate the value MAY_BE_ZERO ? 0 : NITER and
+	   base the exit condition on it.  However, that is often too
+	   expensive.  */
+  if (!integer_zerop (desc->may_be_zero))
+    return iv_elimination_compare_lt (data, cand, comp, desc);
+
   return true;
 }
 
@@ -4511,16 +4776,18 @@ determine_use_iv_cost_condition (struct
   bool ok;
   int elim_inv_expr_id = -1, express_inv_expr_id = -1, inv_expr_id;
   tree *control_var, *bound_cst;
+  enum tree_code comp;
 
   /* Only consider real candidates.  */
   if (!cand->iv)
     {
-      set_use_iv_cost (data, use, cand, infinite_cost, NULL, NULL_TREE, -1);
+      set_use_iv_cost (data, use, cand, infinite_cost, NULL, NULL_TREE,
+		       ERROR_MARK, -1);
       return false;
     }
 
   /* Try iv elimination.  */
-  if (may_eliminate_iv (data, use, cand, &bound))
+  if (may_eliminate_iv (data, use, cand, &bound, &comp))
     {
       elim_cost = force_var_cost (data, bound, &depends_on_elim);
       if (elim_cost.cost == 0)
@@ -4591,10 +4858,11 @@ determine_use_iv_cost_condition (struct
       depends_on = depends_on_express;
       depends_on_express = NULL;
       bound = NULL_TREE;
+      comp = ERROR_MARK;
       inv_expr_id = express_inv_expr_id;
     }
 
-  set_use_iv_cost (data, use, cand, cost, depends_on, bound, inv_expr_id);
+  set_use_iv_cost (data, use, cand, cost, depends_on, bound, comp, inv_expr_id);
 
   if (depends_on_elim)
     BITMAP_FREE (depends_on_elim);
@@ -6234,7 +6502,7 @@ rewrite_use_compare (struct ivopts_data
           fprintf (dump_file, "Replacing exit test: ");
           print_gimple_stmt (dump_file, use->stmt, 0, TDF_SLIM);
         }
-      compare = iv_elimination_compare (data, use);
+      compare = cp->comp;
       bound = unshare_expr (fold_convert (var_type, bound));
       op = force_gimple_operand (bound, &stmts, true, NULL_TREE);
       if (stmts)
@@ -6464,7 +6732,7 @@ tree_ssa_iv_optimize_loop (struct ivopts
 {
   bool changed = false;
   struct iv_ca *iv_ca;
-  edge exit;
+  edge exit = single_dom_exit (loop);
   basic_block *body;
 
   gcc_assert (!data->niters);
@@ -6475,7 +6743,6 @@ tree_ssa_iv_optimize_loop (struct ivopts
     {
       fprintf (dump_file, "Processing loop %d\n", loop->num);
 
-      exit = single_dom_exit (loop);
       if (exit)
 	{
 	  fprintf (dump_file, "  single exit %d -> %d, exit condition ",
@@ -6492,6 +6759,8 @@ tree_ssa_iv_optimize_loop (struct ivopts
   renumber_gimple_stmt_uids_in_blocks (body, loop->num_nodes);
   free (body);
 
+  data->loop_single_exit_p = exit != NULL && loop_only_exit_p (loop, exit);
+
   /* For each ssa name determines whether it behaves as an induction variable
      in some loop.  */
   if (!find_induction_variables (data))

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-08-25 12:20                                         ` Tom de Vries
@ 2011-08-25 12:38                                           ` Tom de Vries
  2011-08-25 20:31                                           ` Zdenek Dvorak
  1 sibling, 0 replies; 31+ messages in thread
From: Tom de Vries @ 2011-08-25 12:38 UTC (permalink / raw)
  To: Zdenek Dvorak
  Cc: Paolo Bonzini, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

[-- Attachment #1: Type: text/plain, Size: 646 bytes --]

On 08/25/2011 01:42 PM, Tom de Vries wrote:
> Hi Zdenek,
> 
> here's the updated version of the patch.
> 
> The goal is to remove the 'i' iterator from the following example, by
> replacing 'i < n' with 'p < base + n'.
> 
> void
> f (char *base, unsigned long int i, unsigned long int n)
> {
>   char *p = base + i;
>   do
>     {
>       *p = '\0';
>       p++;
>       i++;
>    }
>   while (i < n);
> }
> 
> bootstrapped and reg-tested on x864_64, and build and reg-tested on MIPS.
> 
> I will sent a test-case in a separate email.
> 

OK for trunk?

2011-08-25  Tom de Vries  <tom@codesourcery.com>

	* gcc.dg/tree-ssa/ivopts-lt.c: New test.

[-- Attachment #2: ivopts.3.test.patch --]
[-- Type: text/x-patch, Size: 703 bytes --]

Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c (revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts" } */
+
+void
+f1 (char *p, unsigned long int i, unsigned long int n)
+{
+  p += i;
+  do
+    {
+      *p = '\0';
+      p += 1;
+      i++;
+    }
+  while (i < n);
+}
+
+/* { dg-final { scan-tree-dump-times "PHI" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <p_" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "p_\[0-9\]* <" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: ivopts improvement
  2011-08-25 12:20                                         ` Tom de Vries
  2011-08-25 12:38                                           ` Tom de Vries
@ 2011-08-25 20:31                                           ` Zdenek Dvorak
  1 sibling, 0 replies; 31+ messages in thread
From: Zdenek Dvorak @ 2011-08-25 20:31 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Paolo Bonzini, Richard Guenther, gcc-patches, Bernd Schmidt,
	Maxim Kuvyrkov

Hi,

> here's the updated version of the patch.
> 
> The goal is to remove the 'i' iterator from the following example, by
> replacing 'i < n' with 'p < base + n'.
> 
> void
> f (char *base, unsigned long int i, unsigned long int n)
> {
>   char *p = base + i;
>   do
>     {
>       *p = '\0';
>       p++;
>       i++;
>    }
>   while (i < n);
> }
> 
> bootstrapped and reg-tested on x864_64, and build and reg-tested on MIPS.
> 
> I will sent a test-case in a separate email.
> 
> OK for trunk?

OK,

Zdenek

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2011-08-25 18:49 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-28 10:46 ivopts improvement Tom de Vries
2011-02-28 11:44 ` Paolo Bonzini
2011-02-28 14:59   ` Tom de Vries
2011-02-28 15:07     ` Paolo Bonzini
2011-02-28 15:35       ` Richard Guenther
2011-02-28 19:14         ` Zdenek Dvorak
2011-03-02 22:01           ` Tom de Vries
2011-03-03  8:45             ` Paolo Bonzini
2011-03-03 14:29               ` Tom de Vries
2011-03-04  7:37                 ` Paolo Bonzini
2011-03-04 13:58                   ` Tom de Vries
2011-03-04 14:07                     ` Paolo Bonzini
2011-03-04 17:19                     ` Tom de Vries
2011-03-04 22:37                 ` Zdenek Dvorak
2011-03-13 17:35                   ` Tom de Vries
2011-03-14  9:58                     ` Zdenek Dvorak
2011-03-14 12:04                       ` Tom de Vries
2011-03-14 12:14                         ` Zdenek Dvorak
2011-03-14 12:49                           ` Tom de Vries
2011-03-14 12:55                             ` Zdenek Dvorak
2011-03-14 13:51                               ` Tom de Vries
2011-03-14 14:03                                 ` Zdenek Dvorak
2011-03-15 14:19                                   ` Tom de Vries
2011-03-15 15:10                                     ` Zdenek Dvorak
2011-03-15 15:48                                       ` Tom de Vries
2011-08-25 12:20                                         ` Tom de Vries
2011-08-25 12:38                                           ` Tom de Vries
2011-08-25 20:31                                           ` Zdenek Dvorak
2011-02-28 12:47 ` Zdenek Dvorak
2011-02-28 13:09   ` Richard Guenther
2011-02-28 16:45   ` Tom de Vries

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).