public inbox for gcc-patches@gcc.gnu.org
* IVOPT improvement patch
@ 2010-05-11  6:35 Xinliang David Li
  2010-05-11  7:18 ` Zdenek Dvorak
                   ` (4 more replies)
  0 siblings, 5 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-05-11  6:35 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2236 bytes --]

Hi, IVOPT has been one of the main areas of complaint from gcc users:
it is often shut down, or users are forced to write key kernel loops
in inline assembly. The following list (resulting from the
investigation of many user complaints) summarizes some of the key
problems:

1) Too many induction variables are used, and advanced addressing
modes are not fully taken advantage of. On the latest Intel CPUs, the
increased loop size (due to iv updates) can have a very large negative
impact on performance, e.g. when the LSD and uop macro fusion get
blocked. The root cause of the problem is not the cost model used in
IVOPT, but the algorithm for finding the 'optimal' assignment from iv
candidates to uses.
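
To make 1) concrete, here is a minimal sketch (not from the patch;
function names are invented for illustration) of the two ways the same
loop can be lowered: with one pointer IV per array, or with a single
index IV and base+index addressing. The second form leaves only one iv
update in the loop body:

```c
/* Three-IV form: each pointer is incremented independently, so the
   loop body carries three iv updates.  */
void avg_three_ivs (int n, char *dst, const char *src1, const char *src2)
{
  while (n-- > 0)
    *dst++ = (*src1++ + *src2++ + 1) >> 1;
}

/* Single-IV form: one index IV 'x'; each address is formed as
   base + x, relying on the addressing mode instead of extra IVs.  */
void avg_one_iv (int n, char *dst, const char *src1, const char *src2)
{
  int x;
  for (x = 0; x < n; x++)
    dst[x] = (src1[x] + src2[x] + 1) >> 1;
}
```

Both compute the same result; the difference is purely in how many iv
updates survive into the loop body.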

2) Profile information is not used in cost estimation (e.g. in
computing the cost of loop variants).

3) For replaced (original) IVs that are only live out of the loop
(i.e. there are no uses inside the loop), the rewrite of the IV occurs
inside the loop, which usually results in code more expensive than the
original iv update statement -- and it is very difficult for later
phases to sink the computation out of the loop (see PR31792). The
right solution is to materialize/rewrite such ivs directly outside the
loop (which also avoids introducing overlapping live ranges).
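
A hypothetical source-level sketch of 3) (the function names are
invented, not from the patch): 'p' has no use inside the loop other
than its own update, so it is live out only. Rewriting it inside the
loop adds work on every iteration; materializing the final value once
after the loop is equivalent and cheaper:

```c
/* Inside-the-loop rewrite: the live-out IV 'p' is recomputed on
   every iteration even though nothing in the body reads it.  */
const char *scan_in_loop (const char *base, int n)
{
  const char *p = base;
  int i;
  for (i = 0; i < n; i++)
    p = base + i + 1;   /* per-iteration rewrite of the live-out IV */
  return p;
}

/* Outside-the-loop materialization: one add after the loop computes
   the same final value.  */
const char *scan_after_loop (const char *base, int n)
{
  int i;
  for (i = 0; i < n; i++)
    ;                   /* loop body no longer touches 'p' */
  return base + n;      /* final value materialized once, outside */
}
```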

4) The iv update statement sometimes blocks the forward
propagation/combination of the memory ref operation (which depends on
the pre-update IV value) with the loop branch compare. Simple-minded
propagation would lead to overlapping live ranges and additional
copy/move instructions being generated.
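
A C-level sketch of the pattern behind 4) (hypothetical functions, not
from the patch): when the load uses the IV value from before the
update, combining the load with the back-edge compare would keep both
the old and the new IV value live at once. Moving the update above the
load and compensating in the offset (p[-1] below) leaves only the
updated IV live:

```c
/* Load uses the IV value before the update: old and new values of
   'p' overlap if the load is combined with the exit compare.  */
long long sum_before_update (const long long *p, int n)
{
  long long s = 0;
  while (n > 0)
    {
      s += *p;     /* uses p before the update */
      p += 1;      /* iv update after the load */
      n--;
    }
  return s;
}

/* Update hoisted above the load; the same element is addressed off
   the new IV value with an adjusted offset.  */
long long sum_after_update (const long long *p, int n)
{
  long long s = 0;
  while (n > 0)
    {
      p += 1;      /* iv update before the load */
      s += p[-1];  /* same element, offset adjusted by the step */
      n--;
    }
  return s;
}
```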

5) In estimating the global cost (register pressure), the registers
resulting from LIM of invariant expressions are not considered.

6) In MEM_REF creation, loop variants and invariants may be assigned
to the same part -- which is essentially a re-association that blocks
LIM.

7) Intrinsic calls that are essentially memory operations are not
recognized as uses.
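
Problem 7) can be illustrated with a hypothetical loop (not from the
patch): the address operand of an intrinsic such as __builtin_prefetch
is effectively a memory use of the IV, but because it is hidden inside
a call it is not recorded as an address use:

```c
/* The prefetch's address operand (p + i + 16) is an IV use that is
   essentially a memory operation, yet it reaches ivopts as a plain
   call argument rather than a USE_ADDRESS.  */
long sum_with_prefetch (const long *p, int n)
{
  long s = 0;
  int i;
  for (i = 0; i < n; i++)
    {
      __builtin_prefetch (p + i + 16);  /* address use hidden in a call */
      s += p[i];
    }
  return s;
}
```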

The attached patch handles all the problems above except for 7).


Bootstrapped and regression tested on linux/x86_64.

The patch was not tuned for SPEC, but SPEC testing was done.
Observable improvements: gcc 4.85%, vpr 1.53%, bzip2 2.36%, and eon
2.43% (machine: Intel Xeon E5345 @ 2.33GHz, -m32 mode).

Ok for trunk?

Thanks,

David

[-- Attachment #2: ivopts_latest.p --]
[-- Type: text/x-pascal, Size: 57216 bytes --]

Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+#define TYPE char*
+
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           dst[x] = ( src1[x] + src2[x] + 1 ) >> 1;
+       }
+}
+
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#define TYPE char*
+
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#define TYPE char*
+
+void foo (int i_width, char* dst, char* src1, char* src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           *((TYPE)dst) = ( *((TYPE)src1) + *((TYPE)src2) + 1 ) >> 1;
+	   dst+=sizeof(TYPE);
+	   src1+=sizeof(TYPE);
+	   src2+=sizeof(TYPE);
+       }
+} 
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#ifndef TYPE
+#define TYPE char*
+#endif
+
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      TYPE dstn= dst + i_width;
+       for( ; dst < dstn; )
+       {
+           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_6.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_6.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_6.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+#include <stdlib.h>
+int foo(const char* p, const char* p2, size_t N)
+{
+  const char* p_limit = p + N;
+  while (p  <= p_limit - 16
+        && *(long long*)p  <*(long long*)p2 )
+  {
+     p += 16;
+     p2 += 16;
+  }
+  N = p_limit - p;
+  return memcmp(p, p2, N);
+}
+
+/* { dg-final { scan-tree-dump-times "Sinking" 4 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "Reordering" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_7.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_7.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_7.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+#include <stdlib.h>
+
+int foo(const char* p, const char* p2, size_t N)
+{
+ const char* p_limit = p + N;
+ int s = 0;
+ while (p  <= p_limit - 16
+        && *(long long*)p <*(long long*)p2)
+ {
+     p += 8;
+     p2 += 8;
+     s += (*p + *p2);
+  }
+  return s;
+}
+/* { dg-final { scan-tree-dump-times "Reordering" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_5_sink.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_5_sink.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_5_sink.c	(revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2  -m64 -fdump-tree-ivopts-details" } */
+int inner_longest_match(char *scan, char *match, char *strend)
+{
+  char *start_scan = scan;
+  do {
+  } while (*++scan == *++match && *++scan == *++match &&
+           *++scan == *++match && *++scan == *++match &&
+           *++scan == *++match && *++scan == *++match &&
+           *++scan == *++match && *++scan == *++match &&
+           scan < strend);
+
+  return scan - start_scan;
+}
+
+/* { dg-final { scan-tree-dump-times "Sinking" 7 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 159243)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -91,14 +91,29 @@ along with GCC; see the file COPYING3.  
 #include "langhooks.h"
 #include "tree-affine.h"
 #include "target.h"
+#include "tree-inline.h"
 
 /* The infinite cost.  */
 #define INFTY 10000000
 
-/* The expected number of loop iterations.  TODO -- use profiling instead of
-   this.  */
 #define AVG_LOOP_NITER(LOOP) 5
 
+/* Returns the expected number of loop iterations for LOOP.
+   The average trip count is computed from profile data if it
+   exists. */
+
+static inline unsigned
+avg_loop_niter (struct loop *loop)
+{
+  unsigned tc;
+  if (loop->header->count || loop->latch->count)
+    tc = expected_loop_iterations (loop);
+  else
+    tc = AVG_LOOP_NITER (loop);
+  if (tc == 0)
+    tc++;
+  return tc;
+}
 
 /* Representation of the induction variable.  */
 struct iv
@@ -156,6 +171,14 @@ struct cost_pair
 			   the new bound to compare with.  */
 };
 
+/* The use position for iv.  */
+enum iv_use_pos
+{
+  IU_UNKNOWN,
+  IU_OUTSIDE_LOOP_ONLY,
+  IU_INSIDE_LOOP
+};
+
 /* Use.  */
 struct iv_use
 {
@@ -173,6 +196,8 @@ struct iv_use
 
   struct iv_cand *selected;
 			/* The selected candidate.  */
+  enum iv_use_pos use_pos;
+                        /* The use position.  */
 };
 
 /* The position where the iv is computed.  */
@@ -218,6 +243,11 @@ typedef struct iv_cand *iv_cand_p;
 DEF_VEC_P(iv_cand_p);
 DEF_VEC_ALLOC_P(iv_cand_p,heap);
 
+typedef struct version_info *version_info_p;
+DEF_VEC_P(version_info_p);
+DEF_VEC_ALLOC_P(version_info_p,heap);
+
+
 struct ivopts_data
 {
   /* The currently optimized loop.  */
@@ -235,6 +265,10 @@ struct ivopts_data
   /* The array of information for the ssa names.  */
   struct version_info *version_info;
 
+
+  /* Pseudo version infos for generated loop invariants.  */
+  VEC(version_info_p,heap) *pseudo_version_info;
+
   /* The bitmap of indices in version_info whose value was changed.  */
   bitmap relevant;
 
@@ -250,6 +284,9 @@ struct ivopts_data
   /* The maximum invariant id.  */
   unsigned max_inv_id;
 
+  /* The minimal invariant id for pseudo invariants.  */
+  unsigned min_pseudo_inv_id;
+
   /* Whether to consider just related and important candidates when replacing a
      use.  */
   bool consider_all_candidates;
@@ -283,6 +320,9 @@ struct iv_ca
   /* Total number of registers needed.  */
   unsigned n_regs;
 
+  /* Total number of pseudo invariants.  */
+  unsigned n_pseudos;
+
   /* Total cost of expressing uses.  */
   comp_cost cand_use_cost;
 
@@ -335,6 +375,8 @@ struct iv_ca_delta
 
 static VEC(tree,heap) *decl_rtl_to_reset;
 
+static struct pointer_map_t *inverted_stmt_map;
+
 /* Number of uses recorded in DATA.  */
 
 static inline unsigned
@@ -513,6 +555,19 @@ dump_cand (FILE *file, struct iv_cand *c
       return;
     }
 
+  if (cand->var_before)
+    {
+      fprintf (file, "  var_before ");
+      print_generic_expr (file, cand->var_before, TDF_SLIM);
+      fprintf (file, "\n");
+    }
+  if (cand->var_after)
+    {
+      fprintf (file, "  var_after ");
+      print_generic_expr (file, cand->var_after, TDF_SLIM);
+      fprintf (file, "\n");
+    }
+
   switch (cand->pos)
     {
     case IP_NORMAL:
@@ -544,7 +599,11 @@ dump_cand (FILE *file, struct iv_cand *c
 static inline struct version_info *
 ver_info (struct ivopts_data *data, unsigned ver)
 {
-  return data->version_info + ver;
+  if (ver < data->min_pseudo_inv_id)
+    return data->version_info + ver;
+  else
+    return VEC_index (version_info_p, data->pseudo_version_info,
+                      ver - data->min_pseudo_inv_id);
 }
 
 /* Returns the info for ssa name NAME.  */
@@ -766,6 +825,8 @@ tree_ssa_iv_optimize_init (struct ivopts
 {
   data->version_info_size = 2 * num_ssa_names;
   data->version_info = XCNEWVEC (struct version_info, data->version_info_size);
+  data->min_pseudo_inv_id = num_ssa_names;
+  data->pseudo_version_info = NULL;
   data->relevant = BITMAP_ALLOC (NULL);
   data->important_candidates = BITMAP_ALLOC (NULL);
   data->max_inv_id = 0;
@@ -1102,6 +1163,7 @@ record_use (struct ivopts_data *data, tr
   use->stmt = stmt;
   use->op_p = use_p;
   use->related_cands = BITMAP_ALLOC (NULL);
+  use->use_pos = IU_UNKNOWN;
 
   /* To avoid showing ssa name in the dumps, if it was not reset by the
      caller.  */
@@ -1142,10 +1204,28 @@ record_invariant (struct ivopts_data *da
   bitmap_set_bit (data->relevant, SSA_NAME_VERSION (op));
 }
 
-/* Checks whether the use OP is interesting and if so, records it.  */
+/* Records a pseudo invariant and returns its VERSION_INFO.  */
+
+static struct version_info *
+record_pseudo_invariant (struct ivopts_data *data)
+{
+  struct version_info *info;
+
+  info = XCNEW (struct version_info);
+  info->name = NULL;
+  VEC_safe_push (version_info_p, heap, data->pseudo_version_info, info);
+  info->inv_id
+      = VEC_length (version_info_p, data->pseudo_version_info) - 1
+      + data->min_pseudo_inv_id;
+
+  return info;
+}
+
+/* Checks whether the use OP is interesting and if so, records it.
+   USE_POS indicates where the use comes from.  */
 
 static struct iv_use *
-find_interesting_uses_op (struct ivopts_data *data, tree op)
+find_interesting_uses_op (struct ivopts_data *data, tree op, enum iv_use_pos use_pos)
 {
   struct iv *iv;
   struct iv *civ;
@@ -1164,6 +1244,10 @@ find_interesting_uses_op (struct ivopts_
       use = iv_use (data, iv->use_id);
 
       gcc_assert (use->type == USE_NONLINEAR_EXPR);
+      gcc_assert (use->use_pos != IU_UNKNOWN);
+
+      if (use->use_pos == IU_OUTSIDE_LOOP_ONLY)
+        use->use_pos = use_pos;
       return use;
     }
 
@@ -1183,6 +1267,7 @@ find_interesting_uses_op (struct ivopts_
 
   use = record_use (data, NULL, civ, stmt, USE_NONLINEAR_EXPR);
   iv->use_id = use->id;
+  use->use_pos = use_pos;
 
   return use;
 }
@@ -1260,17 +1345,19 @@ find_interesting_uses_cond (struct ivopt
 {
   tree *var_p, *bound_p;
   struct iv *var_iv, *civ;
+  struct iv_use *use;
 
   if (!extract_cond_operands (data, stmt, &var_p, &bound_p, &var_iv, NULL))
     {
-      find_interesting_uses_op (data, *var_p);
-      find_interesting_uses_op (data, *bound_p);
+      find_interesting_uses_op (data, *var_p, IU_INSIDE_LOOP);
+      find_interesting_uses_op (data, *bound_p, IU_INSIDE_LOOP);
       return;
     }
 
   civ = XNEW (struct iv);
   *civ = *var_iv;
-  record_use (data, NULL, civ, stmt, USE_COMPARE);
+  use = record_use (data, NULL, civ, stmt, USE_COMPARE);
+  use->use_pos = IU_INSIDE_LOOP;
 }
 
 /* Returns true if expression EXPR is obviously invariant in LOOP,
@@ -1433,11 +1520,13 @@ idx_record_use (tree base, tree *idx,
 		void *vdata)
 {
   struct ivopts_data *data = (struct ivopts_data *) vdata;
-  find_interesting_uses_op (data, *idx);
+  find_interesting_uses_op (data, *idx, IU_INSIDE_LOOP);
   if (TREE_CODE (base) == ARRAY_REF || TREE_CODE (base) == ARRAY_RANGE_REF)
     {
-      find_interesting_uses_op (data, array_ref_element_size (base));
-      find_interesting_uses_op (data, array_ref_low_bound (base));
+      find_interesting_uses_op (data, array_ref_element_size (base),
+                                IU_INSIDE_LOOP);
+      find_interesting_uses_op (data, array_ref_low_bound (base),
+                                IU_INSIDE_LOOP);
     }
   return true;
 }
@@ -1597,12 +1686,13 @@ may_be_nonaddressable_p (tree expr)
 
 /* Finds addresses in *OP_P inside STMT.  */
 
-static void
+static bool
 find_interesting_uses_address (struct ivopts_data *data, gimple stmt, tree *op_p)
 {
   tree base = *op_p, step = build_int_cst (sizetype, 0);
   struct iv *civ;
   struct ifs_ivopts_data ifs_ivopts_data;
+  struct iv_use *use;
 
   /* Do not play with volatile memory references.  A bit too conservative,
      perhaps, but safe.  */
@@ -1696,11 +1786,13 @@ find_interesting_uses_address (struct iv
     }
 
   civ = alloc_iv (base, step);
-  record_use (data, op_p, civ, stmt, USE_ADDRESS);
-  return;
+  use = record_use (data, op_p, civ, stmt, USE_ADDRESS);
+  use->use_pos = IU_INSIDE_LOOP;
+  return true;
 
 fail:
   for_each_index (op_p, idx_record_use, data);
+  return false;
 }
 
 /* Finds and records invariants used in STMT.  */
@@ -1762,7 +1854,7 @@ find_interesting_uses_stmt (struct ivopt
 	  if (REFERENCE_CLASS_P (*rhs))
 	    find_interesting_uses_address (data, stmt, rhs);
 	  else
-	    find_interesting_uses_op (data, *rhs);
+	    find_interesting_uses_op (data, *rhs, IU_INSIDE_LOOP);
 
 	  if (REFERENCE_CLASS_P (*lhs))
 	    find_interesting_uses_address (data, stmt, lhs);
@@ -1803,7 +1895,7 @@ find_interesting_uses_stmt (struct ivopt
       if (!iv)
 	continue;
 
-      find_interesting_uses_op (data, op);
+      find_interesting_uses_op (data, op, IU_INSIDE_LOOP);
     }
 }
 
@@ -1822,7 +1914,12 @@ find_interesting_uses_outside (struct iv
       phi = gsi_stmt (psi);
       def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
       if (is_gimple_reg (def))
-	find_interesting_uses_op (data, def);
+        {
+          if (gimple_phi_num_args (phi) == 1)
+            find_interesting_uses_op (data, def, IU_OUTSIDE_LOOP_ONLY);
+	  else
+            find_interesting_uses_op (data, def, IU_INSIDE_LOOP);
+	}
     }
 }
 
@@ -2138,7 +2235,9 @@ add_candidate_1 (struct ivopts_data *dat
 	continue;
 
       if (operand_equal_p (base, cand->iv->base, 0)
-	  && operand_equal_p (step, cand->iv->step, 0))
+	  && operand_equal_p (step, cand->iv->step, 0)
+          && (TYPE_PRECISION (TREE_TYPE (base))
+              == TYPE_PRECISION (TREE_TYPE (cand->iv->base))))
 	break;
     }
 
@@ -3684,6 +3783,94 @@ difference_cost (struct ivopts_data *dat
   return force_var_cost (data, aff_combination_to_tree (&aff_e1), depends_on);
 }
 
+/* Returns true if AFF1 and AFF2 are identical.  */
+
+static bool
+compare_aff_trees (aff_tree *aff1, aff_tree *aff2)
+{
+  unsigned i;
+
+  if (aff1->n != aff2->n)
+    return false;
+
+  for (i = 0; i < aff1->n; i++)
+    {
+      if (double_int_cmp (aff1->elts[i].coef, aff2->elts[i].coef, 0) != 0)
+        return false;
+
+      if (!operand_equal_p (aff1->elts[i].val, aff2->elts[i].val, 0))
+        return false;
+    }
+  return true;
+}
+
+/* Returns true if expression UBASE - RATIO * CBASE requires a new compiler
+   generated temporary.  */
+
+static bool
+create_loop_invariant_temp (tree ubase, tree cbase, HOST_WIDE_INT ratio)
+{
+  aff_tree ubase_aff, cbase_aff;
+
+  STRIP_NOPS (ubase);
+  STRIP_NOPS (cbase);
+
+  if ((TREE_CODE (ubase) == INTEGER_CST)
+      && (TREE_CODE (cbase) == INTEGER_CST))
+    return false;
+
+  if (((TREE_CODE (ubase) == SSA_NAME)
+       || (TREE_CODE (ubase) == ADDR_EXPR))
+      && (TREE_CODE (cbase) == INTEGER_CST))
+    return false;
+
+  if (((TREE_CODE (cbase) == SSA_NAME)
+       || (TREE_CODE (cbase) == ADDR_EXPR))
+      && (TREE_CODE (ubase) == INTEGER_CST))
+    return false;
+
+  if (ratio == 1)
+    {
+      if(operand_equal_p (ubase, cbase, 0))
+        return false;
+      if (TREE_CODE (ubase) == ADDR_EXPR
+        && TREE_CODE (cbase) == ADDR_EXPR)
+        {
+          tree usym, csym;
+
+          usym = TREE_OPERAND (ubase, 0);
+          csym = TREE_OPERAND (cbase, 0);
+          if (TREE_CODE (usym) == ARRAY_REF)
+            {
+              tree ind = TREE_OPERAND (usym, 1);
+              if (TREE_CODE (ind) == INTEGER_CST
+                  && host_integerp (ind, 0)
+                  && TREE_INT_CST_LOW (ind) == 0)
+                usym = TREE_OPERAND (usym, 0);
+            }
+          if (TREE_CODE (csym) == ARRAY_REF)
+            {
+              tree ind = TREE_OPERAND (csym, 1);
+              if (TREE_CODE (ind) == INTEGER_CST
+                  && host_integerp (ind, 0)
+                  && TREE_INT_CST_LOW (ind) == 0)
+                csym = TREE_OPERAND (csym, 0);
+            }
+          if (usym == csym)
+            return false;
+        }
+      /* Now do a more complex comparison.  */
+      tree_to_aff_combination (ubase, TREE_TYPE (ubase), &ubase_aff);
+      tree_to_aff_combination (cbase, TREE_TYPE (cbase), &cbase_aff);
+      if (compare_aff_trees (&ubase_aff, &cbase_aff))
+        return false;
+    }
+
+  return true;
+}
+
+
+
 /* Determines the cost of the computation by that USE is expressed
    from induction variable CAND.  If ADDRESS_P is true, we just need
    to create an address from it, otherwise we want to get it into
@@ -3811,6 +3998,17 @@ get_computation_cost_at (struct ivopts_d
 					 &offset, depends_on));
     }
 
+  /* Loop invariant computation.  */
+  cost.cost /= avg_loop_niter (data->current_loop);
+
+  if (create_loop_invariant_temp (ubase, cbase, ratio))
+    {
+      struct version_info *pv = record_pseudo_invariant (data);
+       if (!*depends_on)
+         *depends_on = BITMAP_ALLOC (NULL);
+       bitmap_set_bit (*depends_on, pv->inv_id);
+    }
+
   /* If we are after the increment, the value of the candidate is higher by
      one iteration.  */
   stmt_is_after_inc = stmt_after_increment (data->current_loop, cand, at);
@@ -3841,7 +4039,7 @@ get_computation_cost_at (struct ivopts_d
       are added once to the variable, if present.  */
   if (var_present && (symbol_present || offset))
     cost.cost += add_cost (TYPE_MODE (ctype), speed)
-		 / AVG_LOOP_NITER (data->current_loop);
+		 / avg_loop_niter (data->current_loop);
 
   /* Having offset does not affect runtime cost in case it is added to
      symbol, but it increases complexity.  */
@@ -3911,6 +4109,10 @@ determine_use_iv_cost_generic (struct iv
     }
 
   cost = get_computation_cost (data, use, cand, false, &depends_on, NULL);
+
+  if (use->use_pos == IU_OUTSIDE_LOOP_ONLY && !infinite_cost_p (cost))
+    cost.cost /= avg_loop_niter (data->current_loop);
+
   set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE);
 
   return !infinite_cost_p (cost);
@@ -4056,20 +4258,16 @@ may_eliminate_iv (struct ivopts_data *da
   /* If not, and if this is the only possible exit of the loop, see whether
      we can get a conservative estimate on the number of iterations of the
      entire loop and compare against that instead.  */
-  else if (loop_only_exit_p (loop, exit))
+  else
     {
       double_int period_value, max_niter;
       if (!estimated_loop_iterations (loop, true, &max_niter))
 	return false;
       period_value = tree_to_double_int (period);
-      if (double_int_ucmp (max_niter, period_value) >= 0)
+      if (double_int_ucmp (max_niter, period_value) > 0)
 	return false;
     }
 
-  /* Otherwise, punt.  */
-  else
-    return false;
-
   cand_value_at (loop, cand, use->stmt, nit, &bnd);
 
   *bound = aff_combination_to_tree (&bnd);
@@ -4106,7 +4304,7 @@ determine_use_iv_cost_condition (struct 
       elim_cost = force_var_cost (data, bound, &depends_on_elim);
       /* The bound is a loop invariant, so it will be only computed
 	 once.  */
-      elim_cost.cost /= AVG_LOOP_NITER (data->current_loop);
+      elim_cost.cost /= avg_loop_niter (data->current_loop);
     }
   else
     elim_cost = infinite_cost;
@@ -4353,7 +4551,7 @@ determine_iv_cost (struct ivopts_data *d
   cost_base = force_var_cost (data, base, NULL);
   cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
 
-  cost = cost_step + cost_base.cost / AVG_LOOP_NITER (current_loop);
+  cost = cost_step + cost_base.cost / avg_loop_niter (data->current_loop);
 
   /* Prefer the original ivs unless we may gain something by replacing it.
      The reason is to make debugging simpler; so this is not relevant for
@@ -4514,6 +4712,12 @@ cheaper_cost_pair (struct cost_pair *a, 
   return false;
 }
 
+
+/* Pseudo invariants may get commoned, and there is no simple way
+   to estimate that. Simply weight it down.  */
+
+#define PSEUDO_COMMON_FACTOR 0.3
+
 /* Computes the cost field of IVS structure.  */
 
 static void
@@ -4521,7 +4725,10 @@ iv_ca_recount_cost (struct ivopts_data *
 {
   comp_cost cost = ivs->cand_use_cost;
   cost.cost += ivs->cand_cost;
-  cost.cost += ivopts_global_cost_for_size (data, ivs->n_regs);
+  cost.cost += ivopts_global_cost_for_size (data,
+                                            ivs->n_regs
+                                            + ivs->n_pseudos
+                                            * PSEUDO_COMMON_FACTOR);
 
   ivs->cost = cost;
 }
@@ -4529,10 +4736,12 @@ iv_ca_recount_cost (struct ivopts_data *
 /* Remove invariants in set INVS to set IVS.  */
 
 static void
-iv_ca_set_remove_invariants (struct iv_ca *ivs, bitmap invs)
+iv_ca_set_remove_invariants (struct ivopts_data *data,
+                             struct iv_ca *ivs, bitmap invs)
 {
   bitmap_iterator bi;
   unsigned iid;
+  unsigned pseudo_id_start = data->min_pseudo_inv_id;
 
   if (!invs)
     return;
@@ -4541,7 +4750,12 @@ iv_ca_set_remove_invariants (struct iv_c
     {
       ivs->n_invariant_uses[iid]--;
       if (ivs->n_invariant_uses[iid] == 0)
-	ivs->n_regs--;
+        {
+          if (iid < pseudo_id_start)
+            ivs->n_regs--;
+          else
+            ivs->n_pseudos--;
+        }
     }
 }
 
@@ -4572,22 +4786,24 @@ iv_ca_set_no_cp (struct ivopts_data *dat
       ivs->n_cands--;
       ivs->cand_cost -= cp->cand->cost;
 
-      iv_ca_set_remove_invariants (ivs, cp->cand->depends_on);
+      iv_ca_set_remove_invariants (data, ivs, cp->cand->depends_on);
     }
 
   ivs->cand_use_cost = sub_costs (ivs->cand_use_cost, cp->cost);
 
-  iv_ca_set_remove_invariants (ivs, cp->depends_on);
+  iv_ca_set_remove_invariants (data, ivs, cp->depends_on);
   iv_ca_recount_cost (data, ivs);
 }
 
 /* Add invariants in set INVS to set IVS.  */
 
 static void
-iv_ca_set_add_invariants (struct iv_ca *ivs, bitmap invs)
+iv_ca_set_add_invariants (struct ivopts_data *data,
+                          struct iv_ca *ivs, bitmap invs)
 {
   bitmap_iterator bi;
   unsigned iid;
+  unsigned pseudo_id_start = data->min_pseudo_inv_id;
 
   if (!invs)
     return;
@@ -4596,7 +4812,12 @@ iv_ca_set_add_invariants (struct iv_ca *
     {
       ivs->n_invariant_uses[iid]++;
       if (ivs->n_invariant_uses[iid] == 1)
-	ivs->n_regs++;
+        {
+          if (iid < pseudo_id_start)
+            ivs->n_regs++;
+          else
+            ivs->n_pseudos++;
+        }
     }
 }
 
@@ -4630,11 +4851,11 @@ iv_ca_set_cp (struct ivopts_data *data, 
 	  ivs->n_cands++;
 	  ivs->cand_cost += cp->cand->cost;
 
-	  iv_ca_set_add_invariants (ivs, cp->cand->depends_on);
+	  iv_ca_set_add_invariants (data, ivs, cp->cand->depends_on);
 	}
 
       ivs->cand_use_cost = add_costs (ivs->cand_use_cost, cp->cost);
-      iv_ca_set_add_invariants (ivs, cp->depends_on);
+      iv_ca_set_add_invariants (data, ivs, cp->depends_on);
       iv_ca_recount_cost (data, ivs);
     }
 }
@@ -4841,9 +5062,13 @@ iv_ca_new (struct ivopts_data *data)
   nw->cands = BITMAP_ALLOC (NULL);
   nw->n_cands = 0;
   nw->n_regs = 0;
+  nw->n_pseudos = 0;
   nw->cand_use_cost = zero_cost;
   nw->cand_cost = 0;
-  nw->n_invariant_uses = XCNEWVEC (unsigned, data->max_inv_id + 1);
+  nw->n_invariant_uses = XCNEWVEC (unsigned,
+                                   data->min_pseudo_inv_id
+                                   + VEC_length (version_info_p,
+                                                 data->pseudo_version_info));
   nw->cost = zero_cost;
 
   return nw;
@@ -4871,8 +5096,21 @@ iv_ca_dump (struct ivopts_data *data, FI
   unsigned i;
   comp_cost cost = iv_ca_cost (ivs);
 
-  fprintf (file, "  cost %d (complexity %d)\n", cost.cost, cost.complexity);
-  bitmap_print (file, ivs->cands, "  candidates ","\n");
+  fprintf (file, "  cost: %d (complexity %d)\n", cost.cost, cost.complexity);
+  fprintf (file, "  cand_cost: %d\n  cand_use_cost: %d (complexity %d)\n",
+           ivs->cand_cost, ivs->cand_use_cost.cost, ivs->cand_use_cost.complexity);
+  bitmap_print (file, ivs->cands, "  candidates: ","\n");
+
+   for (i = 0; i < ivs->upto; i++)
+    {
+      struct iv_use *use = iv_use (data, i);
+      struct cost_pair *cp = iv_ca_cand_for_use (ivs, use);
+      if (cp)
+        fprintf (file, "   use:%d --> iv_cand:%d, cost=(%d,%d)\n",
+                 use->id, cp->cand->id, cp->cost.cost, cp->cost.complexity);
+      else
+        fprintf (file, "   use:%d --> ??\n", use->id);
+    }
 
   for (i = 1; i <= data->max_inv_id; i++)
     if (ivs->n_invariant_uses[i])
@@ -4880,7 +5118,9 @@ iv_ca_dump (struct ivopts_data *data, FI
 	fprintf (file, "%s%d", pref, i);
 	pref = ", ";
       }
-  fprintf (file, "\n");
+  fprintf (file, "\n\n");
+  fprintf (file, "nregs: %d\nnpseudos: %d\n\n",
+           ivs->n_regs, ivs->n_pseudos);
 }
 
 /* Try changing candidate in IVS to CAND for each use.  Return cost of the
@@ -4890,7 +5130,7 @@ iv_ca_dump (struct ivopts_data *data, FI
 static comp_cost
 iv_ca_extend (struct ivopts_data *data, struct iv_ca *ivs,
 	      struct iv_cand *cand, struct iv_ca_delta **delta,
-	      unsigned *n_ivs)
+	      unsigned *n_ivs, bool min_ncand)
 {
   unsigned i;
   comp_cost cost;
@@ -4914,8 +5154,8 @@ iv_ca_extend (struct ivopts_data *data, 
       if (!iv_ca_has_deps (ivs, new_cp))
 	continue;
 
-      if (!cheaper_cost_pair (new_cp, old_cp))
-	continue;
+      if (!min_ncand && !cheaper_cost_pair (new_cp, old_cp))
+        continue;
 
       *delta = iv_ca_delta_add (use, old_cp, new_cp, *delta);
     }
@@ -5110,7 +5350,8 @@ try_add_cand_for (struct ivopts_data *da
 	continue;
 
       iv_ca_set_cp (data, ivs, use, cp);
-      act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL);
+      act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL,
+                               true);
       iv_ca_set_no_cp (data, ivs, use);
       act_delta = iv_ca_delta_add (use, NULL, cp, act_delta);
 
@@ -5143,7 +5384,7 @@ try_add_cand_for (struct ivopts_data *da
 
 	  act_delta = NULL;
 	  iv_ca_set_cp (data, ivs, use, cp);
-	  act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL);
+	  act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL, true);
 	  iv_ca_set_no_cp (data, ivs, use);
 	  act_delta = iv_ca_delta_add (use, iv_ca_cand_for_use (ivs, use),
 				       cp, act_delta);
@@ -5203,7 +5444,7 @@ try_improve_iv_set (struct ivopts_data *
       if (iv_ca_cand_used_p (ivs, cand))
 	continue;
 
-      acost = iv_ca_extend (data, ivs, cand, &act_delta, &n_ivs);
+      acost = iv_ca_extend (data, ivs, cand, &act_delta, &n_ivs, false);
       if (!act_delta)
 	continue;
 
@@ -5293,6 +5534,251 @@ find_optimal_iv_set (struct ivopts_data 
   return set;
 }
 
+/* Returns a statement that undoes the operation in INCREMENT
+   on value OLD_VAL.  */
+
+static gimple
+get_inverted_increment_1 (gimple increment, tree old_val)
+{
+  tree new_assign_def;
+  gimple inverted_increment;
+  enum tree_code incr_op;
+  tree step;
+
+  new_assign_def = make_ssa_name (SSA_NAME_VAR (old_val), NULL);
+  step = unshare_expr (gimple_assign_rhs2 (increment));
+  incr_op = gimple_assign_rhs_code (increment);
+  if (incr_op == PLUS_EXPR)
+    incr_op = MINUS_EXPR;
+  else
+    {
+      gcc_assert (incr_op == MINUS_EXPR);
+      incr_op = PLUS_EXPR;
+    }
+  inverted_increment
+      = gimple_build_assign_with_ops (incr_op, new_assign_def,
+                                      old_val, step);
+
+  return inverted_increment;
+}
+
+/* Returns a statement that undoes the operation in INCREMENT
+   on the result of phi NEW_PHI.  */
+
+static gimple
+get_inverted_increment (gimple reaching_increment, gimple new_phi)
+{
+  basic_block bb;
+  gimple_stmt_iterator gsi;
+  gimple inverted_increment;
+  tree phi_result;
+  void **slot;
+
+  gcc_assert (gimple_assign_lhs (reaching_increment)
+              == PHI_ARG_DEF (new_phi, 0));
+
+  if (!inverted_stmt_map)
+    inverted_stmt_map = pointer_map_create ();
+
+  slot = pointer_map_insert (inverted_stmt_map, new_phi);
+  if (*slot)
+    return (gimple) *slot;
+
+  phi_result = PHI_RESULT (new_phi);
+  bb = gimple_bb (new_phi);
+  gsi = gsi_after_labels (bb);
+
+  inverted_increment = get_inverted_increment_1 (reaching_increment,
+                                                 phi_result);
+  gsi_insert_before (&gsi, inverted_increment, GSI_NEW_STMT);
+  *slot = (void *) inverted_increment;
+  return inverted_increment;
+}
+
+/* Performs a peephole optimization to reorder the iv update statement with
+   a mem ref to enable instruction combining in later phases. The mem ref uses
+   the iv value before the update, so the reordering transformation requires
+   adjustment of the offset. CAND is the selected IV_CAND.
+
+   Example:
+
+   t = MEM_REF (base, iv1, 8, 16);  // base, index, stride, offset
+   iv2 = iv1 + 1;
+
+   if (t < val)      (1)
+     goto L;
+   goto Head;
+
+
+   Directly propagating t over to (1) would introduce an overlapping live
+   range and thus increase register pressure.  This peephole transforms it into:
+
+
+   iv2 = iv1 + 1;
+   t = MEM_REF (base, iv2, 8, 8);
+   if (t < val)
+     goto L;
+   goto Head;
+*/
+
+static void
+adjust_iv_update_pos (struct ivopts_data *data ATTRIBUTE_UNUSED, struct iv_cand *cand)
+{
+  tree var_after, step, stride, index, offset_adjust, offset, mem_ref_op;
+  gimple iv_update, stmt, cond, mem_ref, index_to_base, use_stmt;
+  basic_block bb;
+  gimple_stmt_iterator gsi, gsi_iv;
+  use_operand_p use_p;
+  enum tree_code incr_op;
+  imm_use_iterator iter;
+  bool found = false;
+
+  var_after = cand->var_after;
+  iv_update = SSA_NAME_DEF_STMT (var_after);
+
+  /* Do not handle complicated iv update case.  */
+  incr_op = gimple_assign_rhs_code (iv_update);
+  if (incr_op != PLUS_EXPR && incr_op != MINUS_EXPR)
+    return;
+
+  step = gimple_assign_rhs2 (iv_update);
+  if (!CONSTANT_CLASS_P (step))
+    return;
+
+  bb = gimple_bb (iv_update);
+  gsi = gsi_last_nondebug_bb (bb);
+  stmt = gsi_stmt (gsi);
+
+  /* Only handle conditional statement for now.  */
+  if (gimple_code (stmt) != GIMPLE_COND)
+    return;
+
+  cond = stmt;
+
+  gsi_prev_nondebug (&gsi);
+  stmt = gsi_stmt (gsi);
+  if (stmt != iv_update)
+    return;
+
+  gsi_prev_nondebug (&gsi);
+  if (gsi_end_p (gsi))
+    return;
+
+  stmt = gsi_stmt (gsi);
+  if (gimple_code (stmt) != GIMPLE_ASSIGN)
+    return;
+
+  if (gimple_assign_rhs_code (stmt) != TARGET_MEM_REF)
+    return;
+
+  mem_ref = stmt;
+  mem_ref_op = gimple_assign_rhs1 (mem_ref);
+
+  if (TREE_CODE (gimple_assign_lhs (mem_ref)) != SSA_NAME)
+    return;
+
+  if (!single_imm_use (gimple_assign_lhs (mem_ref), &use_p, &use_stmt))
+    return;
+
+  if (use_stmt != cond)
+    return;
+
+  /* Found code motion candidate -- the statement with mem_ref.  */
+
+  index = TMR_INDEX (mem_ref_op);
+  index_to_base = NULL;
+  if (index)
+    {
+      if (index != cand->var_before)
+        return;
+    }
+  else
+    {
+      /* Index used as base.  */
+      tree base = TMR_BASE (mem_ref_op);
+
+      if (TREE_CODE (base) != SSA_NAME)
+        return;
+
+      if (!has_single_use (base))
+        return;
+
+      index_to_base = SSA_NAME_DEF_STMT (base);
+      if (gimple_code (index_to_base) != GIMPLE_ASSIGN)
+        return;
+      if (gimple_assign_rhs_code (index_to_base) != NOP_EXPR)
+        return;
+      if (gimple_assign_rhs1 (index_to_base) != cand->var_before)
+        return;
+    }
+
+  stride = TMR_STEP (mem_ref_op);
+  offset = TMR_OFFSET (mem_ref_op);
+  if (stride && index)
+    offset_adjust = int_const_binop (MULT_EXPR, stride, step, 0);
+  else
+    offset_adjust = step;
+
+  if (offset_adjust == NULL)
+    return;
+
+  offset = int_const_binop ((incr_op == PLUS_EXPR
+                             ? MINUS_EXPR : PLUS_EXPR),
+                            (offset ? offset : size_zero_node),
+                            offset_adjust, 0);
+
+  if (offset == NULL)
+    return;
+
+  if (index_to_base)
+    gsi = gsi_for_stmt (index_to_base);
+  else
+    gsi = gsi_for_stmt (mem_ref);
+  gsi_iv = gsi_for_stmt (iv_update);
+  gsi_move_before (&gsi_iv, &gsi);
+
+  /* Now fix up the mem_ref.  */
+  FOR_EACH_IMM_USE_FAST (use_p, iter, cand->var_before)
+    {
+      if (USE_STMT (use_p) == mem_ref || USE_STMT (use_p) == index_to_base)
+        {
+          set_ssa_use_from_ptr (use_p, var_after);
+          if (index_to_base)
+            *gimple_assign_rhs1_ptr (index_to_base) = var_after;
+          else
+            TMR_INDEX (mem_ref_op) = var_after;
+
+          found = true;
+          break;
+        }
+    }
+  gcc_assert (found);
+  TMR_OFFSET (mem_ref_op) = offset;
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Reordering\n");
+      print_gimple_stmt (dump_file, iv_update, 0, 0);
+      print_gimple_stmt (dump_file, mem_ref, 0, 0);
+      fprintf (dump_file, "\n");
+    }
+}
+
+/* Performs reordering peep hole optimization for all selected ivs in SET.  */
+
+static void
+adjust_update_pos_for_ivs (struct ivopts_data *data, struct iv_ca *set)
+{
+  unsigned i;
+  struct iv_cand *cand;
+  bitmap_iterator bi;
+
+  EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
+    {
+      cand = iv_cand (data, i);
+      adjust_iv_update_pos (data, cand);
+    }
+}
+
 /* Creates a new induction variable corresponding to CAND.  */
 
 static void
@@ -5329,8 +5815,8 @@ create_new_iv (struct ivopts_data *data,
       name_info (data, cand->var_after)->preserve_biv = true;
 
       /* Rewrite the increment so that it uses var_before directly.  */
-      find_interesting_uses_op (data, cand->var_after)->selected = cand;
-
+      find_interesting_uses_op (data, cand->var_after,
+                                IU_INSIDE_LOOP)->selected = cand;
       return;
     }
 
@@ -5358,9 +5844,512 @@ create_new_ivs (struct ivopts_data *data
       cand = iv_cand (data, i);
       create_new_iv (data, cand);
     }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\nSelected IV set: \n");
+      EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
+        {
+          cand = iv_cand (data, i);
+          dump_cand (dump_file, cand);
+        }
+      fprintf (dump_file, "\n");
+    }
+}
+
+/* Callback function in the tree walk to fix up old live out
+   names to loop exit phi's result.  */
+
+static tree
+fixup_use (tree *op,
+           int *unused ATTRIBUTE_UNUSED,
+           void *data)
+{
+  struct pointer_map_t *nm_to_def_map
+      = (struct pointer_map_t *) data;
+
+  if (TREE_CODE (*op) == SSA_NAME && is_gimple_reg (*op))
+    {
+      void **slot;
+      slot = pointer_map_contains (nm_to_def_map, *op);
+      if (slot)
+        {
+          enum gimple_code gc;
+          gimple def = (gimple) (*slot);
+          gc = gimple_code (def);
+          if (gc == GIMPLE_PHI)
+            *op = PHI_RESULT (def);
+          else
+            *op = gimple_assign_lhs (def);
+        }
+    }
+
+  return 0;
+}
+
+/* Callback function in the tree walk to collect used ssa names
+   in the tree.  */
+
+static tree
+collect_ssa_names (tree *op,
+                   int *unused ATTRIBUTE_UNUSED,
+                   void *data)
+{
+  VEC(tree, heap) ** used_names = (VEC(tree, heap) **) data;
+  if (TREE_CODE (*op) == SSA_NAME && is_gimple_reg (*op))
+    VEC_safe_push (tree, heap, *used_names, *op);
+
+  return 0;
+}
+
+/* The function fixes up live out ssa names used in tree *VAL to
+   the matching loop exit phi's results. */
+
+static void
+fixup_iv_out_val (tree *val, struct pointer_map_t *nm_to_phi_map)
+{
+  walk_tree (val, fixup_use, nm_to_phi_map, NULL);
+}
+
+/* Returns the iv update statement if USE's cand variable is
+   the version before the update; otherwise returns NULL.  */
+
+static gimple
+cause_overlapping_lr (struct ivopts_data *data,
+                      tree nm_used, struct iv_use *use,
+                      basic_block use_bb)
+{
+  tree selected_iv_nm;
+  edge e;
+  gimple increment;
+  enum tree_code incr_op;
+
+  selected_iv_nm = var_at_stmt (data->current_loop,
+                                use->selected,
+                                use->stmt);
+
+  if (nm_used != selected_iv_nm)
+    return NULL;
+
+  if (selected_iv_nm == use->selected->var_after)
+    return NULL;
+
+  /* Check if def of var_after reaches use_bb.  */
+  gcc_assert (single_pred_p (use_bb));
+  e = single_pred_edge (use_bb);
+
+  increment = SSA_NAME_DEF_STMT (use->selected->var_after);
+
+  if (e->src != gimple_bb (increment))
+    return NULL;
+
+  /* Only handle simple increments.  */
+  if (gimple_code (increment) != GIMPLE_ASSIGN)
+    return NULL;
+
+  incr_op = gimple_assign_rhs_code (increment);
+  if (incr_op != PLUS_EXPR && incr_op != MINUS_EXPR)
+    return NULL;
+
+  if (!CONSTANT_CLASS_P (gimple_assign_rhs2 (increment)))
+    return NULL;
+
+  return increment;
 }
 
 
+/* Returns the loop closing phi for LIVE_OUT_IV in basic block TGT_BB.
+   IV_UPDATE_STMT is the update statement for LIVE_OUT_IV, and
+   *FOR_UPDATED_VAL is set to true if the argument of the phi is defined
+   by IV_UPDATE_STMT.  */
+
+static gimple
+find_closing_phi (basic_block tgt_bb, tree live_out_iv,
+                  gimple iv_update_stmt, bool *for_updated_val)
+{
+  gimple_stmt_iterator psi;
+  gimple phi = NULL;
+
+  *for_updated_val = false;
+
+  /* Now try to find the existing matching phi.  */
+  for (psi = gsi_start_phis (tgt_bb); !gsi_end_p (psi); gsi_next (&psi))
+    {
+      gimple p;
+      p = gsi_stmt (psi);
+
+      if (SSA_NAME_VAR (PHI_ARG_DEF (p, 0))
+          == SSA_NAME_VAR (live_out_iv))
+        {
+          phi = p;
+          break;
+        }
+    }
+
+  if (!phi)
+    return NULL;
+
+  if (PHI_ARG_DEF (phi, 0) == live_out_iv)
+    {
+      *for_updated_val = false;
+      /* Found exact match.  */
+      return phi;
+    }
+  else if (iv_update_stmt &&
+           PHI_ARG_DEF (phi, 0) == gimple_assign_lhs (iv_update_stmt))
+    {
+      *for_updated_val = true;
+      return phi;
+    }
+
+  return NULL;
+}
+
+
+/* The function ensures closed SSA form when moving the statement of USE
+   across the loop exit.  LIVE_OUT_NM is the original ssa name that is live out,
+   TGT_BB is the destination bb of the code motion, and NM_TO_DEF_MAP maps
+   the original name to the result of the closing phi.
+
+   Scenario 1:
+   ----------------
+   Loop:
+
+   Loop_exit:
+
+     closed_iv_val = PHI (live_out_iv)
+
+     Uses of (live_out_iv) get replaced with closed_iv_val
+
+
+
+   Scenario 2:
+   ----------------
+   Loop:
+
+     updated_iv_val = live_out_iv + 1
+   Loop_exit:
+
+     closed_iv_val = PHI (updated_iv_val)
+     updated_iv_val2 = closed_iv_val - 1
+
+     Uses of live_out_iv get replaced with updated_iv_val2
+*/
+
+static gimple
+ensure_closed_ssa_form_for (struct ivopts_data *data,
+                            tree live_out_nm, basic_block tgt_bb,
+                            struct iv_use *use,
+                            struct pointer_map_t *nm_to_def_map)
+{
+  gimple closing_phi = NULL;
+  bool closing_phi_for_updated_val = false;
+
+  gimple def_stmt, new_def_stmt = NULL;
+  basic_block def_bb;
+  gimple iv_update_stmt;
+  void **slot;
+
+  def_stmt = SSA_NAME_DEF_STMT (live_out_nm);
+  def_bb = gimple_bb (def_stmt);
+
+  if (!def_bb
+      || flow_bb_inside_loop_p (def_bb->loop_father, tgt_bb))
+    return NULL;
+
+  iv_update_stmt
+      = cause_overlapping_lr (data, live_out_nm, use, tgt_bb);
+
+  gcc_assert (!iv_update_stmt ||
+              gimple_code (iv_update_stmt) == GIMPLE_ASSIGN);
+
+  closing_phi = find_closing_phi (tgt_bb, live_out_nm,
+                                  iv_update_stmt, &closing_phi_for_updated_val);
+
+  /* No closing phi is found.  */
+  if (!closing_phi)
+    {
+      edge e;
+      edge_iterator ei;
+
+      closing_phi = create_phi_node (live_out_nm, tgt_bb);
+      create_new_def_for (gimple_phi_result (closing_phi), closing_phi,
+                          gimple_phi_result_ptr (closing_phi));
+      gcc_assert (single_pred_p (tgt_bb));
+      if (!iv_update_stmt)
+        {
+          FOR_EACH_EDGE (e, ei, tgt_bb->preds)
+              add_phi_arg (closing_phi, live_out_nm, e, UNKNOWN_LOCATION);
+          new_def_stmt = closing_phi;
+        }
+      else
+        {
+          FOR_EACH_EDGE (e, ei, tgt_bb->preds)
+              add_phi_arg (closing_phi, gimple_assign_lhs (iv_update_stmt),
+                           e, UNKNOWN_LOCATION);
+          /* Now make the value adjustment.  */
+          new_def_stmt = get_inverted_increment (iv_update_stmt, closing_phi);
+        }
+    }
+  else if (!closing_phi_for_updated_val)
+    /* Scenario 1 above.  */
+    new_def_stmt = closing_phi;
+  else
+    {
+      /* Scenario 2 above.  */
+      gcc_assert (iv_update_stmt);
+      new_def_stmt = get_inverted_increment (iv_update_stmt, closing_phi);
+    }
+
+  /* Now map it.  */
+  slot = pointer_map_insert (nm_to_def_map, live_out_nm);
+  *slot = (void *) new_def_stmt;
+
+  return (new_def_stmt != closing_phi ? new_def_stmt : NULL);
+}
+
+/* The function ensures closed ssa form for all names used in
+   REPLACED_IV_OUT_VAL. TGT_BB is the target bb where the new
+   computation is going to be, USE is the nonlinear use to be
+   rewritten (at loop exits), and *FIXED_UP_VAL holds the live out
+   value after name fixup. It returns the inverted iv update
+   statement if it is created.  */
+
+static gimple
+ensure_closed_ssa_form (struct ivopts_data *data,
+                        basic_block tgt_bb,
+                        struct iv_use *use,
+                        tree replaced_iv_out_val,
+                        tree *fixed_up_val)
+{
+  unsigned i;
+  tree nm;
+  VEC(tree, heap) *used_ssa_names = NULL;
+  struct pointer_map_t *nm_to_def_map = NULL;
+  gimple inverted_incr = NULL;
+
+  nm_to_def_map = pointer_map_create ();
+  *fixed_up_val = unshare_expr (replaced_iv_out_val);
+  walk_tree_without_duplicates (fixed_up_val,
+                                collect_ssa_names, &used_ssa_names);
+
+  for (i = 0;
+       VEC_iterate (tree, used_ssa_names, i, nm); i++)
+    {
+      gimple inv_incr;
+      if ((inv_incr
+           = ensure_closed_ssa_form_for (data, nm, tgt_bb,
+                                         use, nm_to_def_map)))
+        {
+          gcc_assert (!inverted_incr);
+          inverted_incr = inv_incr;
+        }
+    }
+
+  /* Now fix up the references in val.  */
+  fixup_iv_out_val (fixed_up_val, nm_to_def_map);
+  pointer_map_destroy (nm_to_def_map);
+  return inverted_incr;
+}
+
+/* The function returns true if it is possible to sink final value
+   computation for REPLACED_IV_OUT_NAME at loop exits.  */
+
+static bool
+can_compute_final_value_at_exits_p (struct ivopts_data *data,
+                                    tree replaced_iv_out_name)
+{
+  imm_use_iterator iter;
+  use_operand_p use_p;
+  gimple use_stmt;
+
+  /* Walk through all nonlinear uses in all loop exit blocks
+     to see if the sinking transformation is doable.  */
+
+  FOR_EACH_IMM_USE_FAST (use_p, iter, replaced_iv_out_name)
+    {
+      basic_block exit_bb;
+      edge e;
+      edge_iterator ei;
+      bool found_exit_edge = false;
+
+      use_stmt = USE_STMT (use_p);
+      exit_bb = gimple_bb (use_stmt);
+
+      /* The use_stmt is another iv update
+         statement that also defines a liveout value and
+         has been removed.  */
+      if (!exit_bb)
+        continue;
+
+      if (flow_bb_inside_loop_p (data->current_loop, exit_bb))
+        continue;
+
+      if (single_pred_p (exit_bb))
+        continue;
+
+      FOR_EACH_EDGE (e, ei, exit_bb->preds)
+        {
+          if (!flow_bb_inside_loop_p (data->current_loop,
+                                      e->src))
+            continue;
+          /* Cannot split the edge.  */
+          if (e->flags & EDGE_ABNORMAL)
+            return false;
+
+          /* Do not handle the case where the exit bb has
+             multiple incoming exit edges from the same loop.  */
+          if (found_exit_edge)
+            return false;
+
+          found_exit_edge = true;
+        }
+      if (!found_exit_edge)
+        return false;
+    }
+  return true;
+}
+
+/* The function splits the loop exit edge targeting EXIT_BB if EXIT_BB
+    has multiple predecessors, and returns the newly split bb.
+    REPLACED_IV_OUT_NAME is the original
+    ssa name that is live out, and the new use statement (new phi) will
+    be stored in *USE_STMT.  */
+
+static basic_block
+split_exit_edge (struct ivopts_data* data, basic_block exit_bb,
+                 tree replaced_iv_out_name, gimple *use_stmt)
+{
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, exit_bb->preds)
+    {
+      edge exit_edge;
+      gimple_stmt_iterator psi;
+      gimple new_use_phi = NULL;
+
+      if (!flow_bb_inside_loop_p (data->current_loop, e->src))
+        continue;
+
+      gcc_assert (!(e->flags & EDGE_ABNORMAL));
+      exit_bb = split_loop_exit_edge (e);
+      exit_edge = single_pred_edge (exit_bb);
+
+      /* Now update the use stmt.  */
+      for (psi = gsi_start_phis (exit_bb);
+           !gsi_end_p (psi); gsi_next (&psi))
+        {
+          tree phi_arg;
+          gimple new_phi = gsi_stmt (psi);
+
+          phi_arg
+              = PHI_ARG_DEF_FROM_EDGE (new_phi, exit_edge);
+          if (phi_arg == replaced_iv_out_name)
+            {
+              new_use_phi = new_phi;
+              break;
+            }
+        }
+      gcc_assert (new_use_phi);
+      *use_stmt = new_use_phi;
+
+      /* There is only one exit edge to split.  */
+      break;
+    }
+
+  return exit_bb;
+}
+
+/* For a non linear use USE that is used outside the loop DATA->current_loop
+   only, try to evaluate the live out value at the exits of the loop.
+   REPLACED_IV_OUT_NAME is the original ssa name that is live out, and
+   REPLACED_IV_OUT_VAL is the expression (in terms of the selected iv cand)
+   to evaluate the live out value. The function tries to sink the computation
+   of replaced_iv_out_val into loop exits, and returns true if successful.  */
+
+static bool
+compute_final_value_at_exits (struct ivopts_data *data,
+                              struct iv_use *use,
+                              tree replaced_iv_out_name,
+                              tree replaced_iv_out_val)
+{
+  imm_use_iterator iter;
+  gimple use_stmt;
+  struct iv* replaced_iv;
+
+  if (!can_compute_final_value_at_exits_p (data, replaced_iv_out_name))
+    return false;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, iter, replaced_iv_out_name)
+    {
+      basic_block exit_bb;
+      gimple new_assign;
+      gimple_stmt_iterator gsi, bsi;
+      tree phi_rslt, new_assign_rhs;
+      tree fixed_up_val;
+      gimple inverted_increment;
+
+      exit_bb = gimple_bb (use_stmt);
+
+      /* The use_stmt is another iv update
+         statement that also defines a liveout value and
+         has been removed.  */
+      if (!exit_bb)
+        continue;
+
+      if (flow_bb_inside_loop_p (data->current_loop, exit_bb))
+        continue;
+
+      if (!single_pred_p (exit_bb))
+        exit_bb = split_exit_edge (data, exit_bb,
+                                   replaced_iv_out_name, &use_stmt);
+
+      gcc_assert (single_pred_p (exit_bb));
+
+      inverted_increment
+          = ensure_closed_ssa_form (data, exit_bb, use,
+                                    replaced_iv_out_val,
+                                    &fixed_up_val);
+
+      gcc_assert (gimple_code (use_stmt) == GIMPLE_PHI);
+      gsi = gsi_for_stmt (use_stmt);
+      phi_rslt = PHI_RESULT (use_stmt);
+      bsi = (inverted_increment
+             ? gsi_for_stmt (inverted_increment)
+             : gsi_after_labels (exit_bb));
+
+      /* Now convert the original loop exit phi (for closed SSA form)
+         into an assignment statement.  */
+      remove_phi_node (&gsi, false);
+      new_assign_rhs = force_gimple_operand_gsi (&bsi, fixed_up_val,
+                                                 false, NULL_TREE,
+                                                 (inverted_increment == NULL),
+                                                 (inverted_increment == NULL
+                                                  ? GSI_SAME_STMT
+                                                  : GSI_CONTINUE_LINKING));
+      new_assign = gimple_build_assign (phi_rslt, new_assign_rhs);
+      if (inverted_increment)
+        gsi_insert_after (&bsi, new_assign, GSI_SAME_STMT);
+      else
+        gsi_insert_before (&bsi, new_assign, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        {
+          fprintf (dump_file, "Sinking computation into exit bb %d\n",
+                   exit_bb->index);
+          print_gimple_stmt (dump_file, new_assign, 0, 0);
+          fprintf (dump_file, "\n");
+	}
+    }
+
+  /* Now the original stmt that defines the liveout value can be removed.  */
+
+  replaced_iv = get_iv (data, replaced_iv_out_name);
+  gcc_assert (replaced_iv);
+  replaced_iv->have_use_for = false;
+
+  return true;
+}
+
 /* Rewrites USE (definition of iv used in a nonlinear expression)
    using candidate CAND.  */
 
@@ -5455,6 +6444,11 @@ rewrite_use_nonlinear_expr (struct ivopt
       gcc_unreachable ();
     }
 
+  if (use->use_pos == IU_OUTSIDE_LOOP_ONLY)
+    {
+      if (compute_final_value_at_exits (data, use, tgt, comp))
+        return;
+    }
   op = force_gimple_operand_gsi (&bsi, comp, false, SSA_NAME_VAR (tgt),
 				 true, GSI_SAME_STMT);
 
@@ -5535,7 +6529,7 @@ rewrite_use_address (struct ivopts_data 
   aff_tree aff;
   gimple_stmt_iterator bsi = gsi_for_stmt (use->stmt);
   tree base_hint = NULL_TREE;
-  tree ref;
+  tree ref, iv;
   bool ok;
 
   ok = get_computation_aff (data->current_loop, use, cand, use->stmt, &aff);
@@ -5556,7 +6550,8 @@ rewrite_use_address (struct ivopts_data 
   if (cand->iv->base_object)
     base_hint = var_at_stmt (data->current_loop, cand, use->stmt);
 
-  ref = create_mem_ref (&bsi, TREE_TYPE (*use->op_p), &aff, base_hint,
+  iv = var_at_stmt (data->current_loop, cand, use->stmt);
+  ref = create_mem_ref (&bsi, TREE_TYPE (*use->op_p), &aff, iv, base_hint,
 			data->speed);
   copy_ref_info (ref, *use->op_p);
   *use->op_p = ref;
@@ -5691,6 +6686,7 @@ free_loop_data (struct ivopts_data *data
   unsigned i, j;
   bitmap_iterator bi;
   tree obj;
+  struct version_info *vi;
 
   if (data->niters)
     {
@@ -5748,6 +6744,14 @@ free_loop_data (struct ivopts_data *data
 
   data->max_inv_id = 0;
 
+  for (i = 0; VEC_iterate (version_info_p,
+                           data->pseudo_version_info, i, vi); i++)
+    free (vi);
+
+  VEC_truncate (version_info_p, data->pseudo_version_info, 0);
+  data->min_pseudo_inv_id = num_ssa_names;
+
+
   for (i = 0; VEC_iterate (tree, decl_rtl_to_reset, i, obj); i++)
     SET_DECL_RTL (obj, NULL_RTX);
 
@@ -5768,6 +6772,11 @@ tree_ssa_iv_optimize_finalize (struct iv
   VEC_free (tree, heap, decl_rtl_to_reset);
   VEC_free (iv_use_p, heap, data->iv_uses);
   VEC_free (iv_cand_p, heap, data->iv_candidates);
+  if (inverted_stmt_map)
+    {
+      pointer_map_destroy (inverted_stmt_map);
+      inverted_stmt_map = NULL;
+    }
 }
 
 /* Optimizes the LOOP.  Returns true if anything changed.  */
@@ -5830,7 +6839,6 @@ tree_ssa_iv_optimize_loop (struct ivopts
 
   /* Create the new induction variables (item 4, part 1).  */
   create_new_ivs (data, iv_ca);
-  iv_ca_free (&iv_ca);
 
   /* Rewrite the uses (item 4, part 2).  */
   rewrite_uses (data);
@@ -5838,6 +6846,9 @@ tree_ssa_iv_optimize_loop (struct ivopts
   /* Remove the ivs that are unused after rewriting.  */
   remove_unused_ivs (data);
 
+  adjust_update_pos_for_ivs (data, iv_ca);
+
+  iv_ca_free (&iv_ca);
   /* We have changed the structure of induction variables; it might happen
      that definitions in the scev database refer to some of them that were
      eliminated.  */
Index: gcc/tree-ssa-address.c
===================================================================
--- gcc/tree-ssa-address.c	(revision 159243)
+++ gcc/tree-ssa-address.c	(working copy)
@@ -450,6 +450,31 @@ move_pointer_to_base (struct mem_address
   aff_combination_remove_elt (addr, i);
 }
 
+/* Moves the loop variant part V in linear address ADDR to be the index
+   of PARTS.  */
+
+static void
+move_variant_to_index (struct mem_address *parts, aff_tree *addr, tree v)
+{
+  unsigned i;
+  tree val = NULL_TREE;
+
+  gcc_assert (!parts->index);
+  for (i = 0; i < addr->n; i++)
+    {
+      val = addr->elts[i].val;
+      if (val == v)
+	break;
+    }
+
+  if (i == addr->n)
+    return;
+
+  parts->index = fold_convert (sizetype, val);
+  parts->step = double_int_to_tree (sizetype, addr->elts[i].coef);
+  aff_combination_remove_elt (addr, i);
+}
+
 /* Adds ELT to PARTS.  */
 
 static void
@@ -553,7 +578,8 @@ most_expensive_mult_to_index (tree type,
 
 /* Splits address ADDR for a memory access of type TYPE into PARTS.
    If BASE_HINT is non-NULL, it specifies an SSA name to be used
-   preferentially as base of the reference.
+   preferentially as base of the reference, and IV_CAND is the selected
+   iv candidate used in ADDR.
 
    TODO -- be more clever about the distribution of the elements of ADDR
    to PARTS.  Some architectures do not support anything but single
@@ -563,8 +589,9 @@ most_expensive_mult_to_index (tree type,
    addressing modes is useless.  */
 
 static void
-addr_to_parts (tree type, aff_tree *addr, tree base_hint,
-	       struct mem_address *parts, bool speed)
+addr_to_parts (tree type, aff_tree *addr, tree iv_cand,
+	       tree base_hint, struct mem_address *parts,
+               bool speed)
 {
   tree part;
   unsigned i;
@@ -582,9 +609,17 @@ addr_to_parts (tree type, aff_tree *addr
   /* Try to find a symbol.  */
   move_fixed_address_to_symbol (parts, addr);
 
+  /* No need to do address parts reassociation if the number of parts
+     is <= 2 -- in that case, no loop invariant code motion can be
+     exposed.  */
+
+  if (!base_hint && (addr->n > 2))
+    move_variant_to_index (parts, addr, iv_cand);
+
   /* First move the most expensive feasible multiplication
      to index.  */
-  most_expensive_mult_to_index (type, parts, addr, speed);
+  if (!parts->index)
+    most_expensive_mult_to_index (type, parts, addr, speed);
 
   /* Try to find a base of the reference.  Since at the moment
      there is no reliable way how to distinguish between pointer and its
@@ -624,17 +659,19 @@ gimplify_mem_ref_parts (gimple_stmt_iter
 
 /* Creates and returns a TARGET_MEM_REF for address ADDR.  If necessary
    computations are emitted in front of GSI.  TYPE is the mode
-   of created memory reference.  */
+   of created memory reference.  IV_CAND is the selected iv candidate
+   used in ADDR, and BASE_HINT, if non-NULL, specifies an SSA name to be
+   used preferentially as base of the reference.  */
 
 tree
 create_mem_ref (gimple_stmt_iterator *gsi, tree type, aff_tree *addr,
-		tree base_hint, bool speed)
+		tree iv_cand, tree base_hint, bool speed)
 {
   tree mem_ref, tmp;
   tree atype;
   struct mem_address parts;
 
-  addr_to_parts (type, addr, base_hint, &parts, speed);
+  addr_to_parts (type, addr, iv_cand, base_hint, &parts, speed);
   gimplify_mem_ref_parts (gsi, &parts);
   mem_ref = create_mem_ref_raw (type, &parts);
   if (mem_ref)
Index: gcc/tree-flow.h
===================================================================
--- gcc/tree-flow.h	(revision 159243)
+++ gcc/tree-flow.h	(working copy)
@@ -863,7 +863,7 @@ struct mem_address
 
 struct affine_tree_combination;
 tree create_mem_ref (gimple_stmt_iterator *, tree,
-		     struct affine_tree_combination *, tree, bool);
+		     struct affine_tree_combination *, tree, tree, bool);
 rtx addr_for_mem_ref (struct mem_address *, addr_space_t, bool);
 void get_address_description (tree, struct mem_address *);
 tree maybe_fold_tmr (tree);

[-- Attachment #3: ivopts.cg --]
[-- Type: application/octet-stream, Size: 2222 bytes --]

2010-05-10  Xinliang David Li  <davidxl@google.com>

	* tree-ssa-loop-ivopts.c (avg_loop_niter): New function.
	(dump_cand): Dump var_before/after.
	(ver_info): Handle pseudo invariants.
	(tree_ssa_iv_optimize_init): Support pseudo invariants.
	(record_use): Initialize use_pos.
	(record_pseudo_invariant): New function.
	(find_interesting_uses_op): Add new parameter.
	(find_interesting_uses_cond): Pass new parameter.
	(idx_record_use): Pass new parameter.
	(find_interesting_uses_address): Change return type.
	(find_interesting_uses_stmt): Pass new parameter.
	(find_interesting_uses_outside): Pass new parameter.
	(add_candidate_1): Consider base type precision.
	(compare_aff_trees): New function.
	(create_loop_invariant_temp): Ditto.
	(get_computation_cost_at): Handle loop invariants.
	(may_eliminate_iv): Fix conservativeness.
	(determine_use_iv_cost_condition): Use profile data.
	(determine_iv_cost): Ditto.
	(iv_ca_recount_cost): Add cost from pseudos.
	(iv_ca_remove_invariants): New parameter.
	(iv_ca_set_no_cp): Pass new parameter.
	(iv_ca_set_add_invariants): Handle pseudo invariants.
	(iv_ca_set_cp): Pass new parameter.
	(iv_ca_new): Handle pseudo invariants.
	(iv_ca_dump): Better dumping.
	(iv_ca_extend): New parameter.
	(try_add_cand_for): Pass new parameter.
	(try_improve_iv_set): Ditto.
	(get_inverted_increment_1): New function.
	(get_inverted_increment): Ditto.
	(adjust_iv_update_pos): Ditto.
	(adjust_update_pos_for_ivs): Ditto.
	(create_new_iv): Pass new parameter.
	(create_new_ivs): Add dumping.
	(fixup_use): New function.
	(collect_ssa_names): Ditto.
	(fixup_iv_out_val): Ditto.
	(cause_overlapping_lr): Ditto.
	(find_closing_phi): Ditto.
	(ensure_closed_ssa_form_for): Ditto.
	(ensure_closed_ssa_form): Ditto.
	(can_compute_final_value_at_exits_p): Ditto.
	(rewrite_use_nonlinear_expr): Handle new use_pos.
	(rewrite_use_address): Pass new parameter.
	(free_loop_data): Handle pseudos.
	(tree_ssa_iv_optimize_finalize): Free new data structure.
	(tree_ssa_iv_optimize_loop): Delay freeing data.
	* tree-ssa-address.c (move_variant_to_index): New function.
	(addr_to_parts): Better part assignment to expose LIM.
	(create_mem_ref): Pass new parameter.


[-- Attachment #4: ivopts_test.cg --]
[-- Type: application/octet-stream, Size: 338 bytes --]

2010-05-10  Xinliang David Li  <davidxl@google.com>

	* gcc.dg/tree-ssa/ivopt_1.c: New test.
	* gcc.dg/tree-ssa/ivopt_2.c: New test.
	* gcc.dg/tree-ssa/ivopt_3.c: New test.
	* gcc.dg/tree-ssa/ivopt_4.c: New test.
	* gcc.dg/tree-ssa/ivopt_6.c: New test.
	* gcc.dg/tree-ssa/ivopt_7.c: New test.
	* gcc.dg/tree-ssa/ivopt_5_sink.c: New test.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-11  6:35 IVOPT improvement patch Xinliang David Li
@ 2010-05-11  7:18 ` Zdenek Dvorak
  2010-05-11 17:29   ` Xinliang David Li
  2010-05-11  7:26 ` Steven Bosscher
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 100+ messages in thread
From: Zdenek Dvorak @ 2010-05-11  7:18 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

Hi,

> The attached patch handles all the problems above except for 7.

could you please split the patch to separate parts for each problem,
and also describe how the problems are addressed?  Thanks,

Zdenek

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-11  6:35 IVOPT improvement patch Xinliang David Li
  2010-05-11  7:18 ` Zdenek Dvorak
@ 2010-05-11  7:26 ` Steven Bosscher
  2010-05-11 17:23   ` Xinliang David Li
  2010-05-11  8:34 ` Richard Guenther
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 100+ messages in thread
From: Steven Bosscher @ 2010-05-11  7:26 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

On Tue, May 11, 2010 at 8:35 AM, Xinliang David Li <davidxl@google.com> wrote:
> The attached patch handles all the problems above except for 7.
>
>
> Bootstrapped and regression tested on linux/x86_64.
>
> The patch was not tuned for SPEC, but SPEC testing was done.
> Observable improvements : gcc 4.85%, vpr 1.53%, bzip2 2.36%, and eon
> 2.43% (Machine CPU: Intel Xeon E5345/2.33Ghz, m32mode).

Impressive!

Could you please split the patch into smaller bits, explaining for
each part of the patch what it does?  It is very difficult to review
large patches like this. You should also commit the patch per part
once approved -- it's also very difficult to hunt regressions down to
single changes when the single change is very large.

Some stylistic comments:

> +  /* Pseudo version infos for generated loop invariants.  */
> +  VEC(version_info_p,heap) *pseudo_version_info;

Could you explain what you mean with "pseudo invariants"? Most GCC
developers will think of pseudo-registers when they see the term
pseudo in GCC, but I think you use the word in a different context
here. Best would be to use another term (dummy?) or otherwise add an
explanation.


> +#define PSEUDO_COMMON_FACTOR 0.3

We try to avoid floating point math in GCC (portability issues). You
should use integer math instead (e.g. see what GCC does with
REG_BR_PROB_BASE, and CGRAPH_FREQ_BASE, and in a few other places).

>  /* Finds addresses in *OP_P inside STMT.  */
>
> -static void
> +static bool
> find_interesting_uses_address (struct ivopts_data *data, gimple stmt, tree *op_p)

Please document the return value.


> -find_interesting_uses_op (struct ivopts_data *data, tree op)
> +find_interesting_uses_op (struct ivopts_data *data, tree op, enum iv_use_pos use_pos)

Please keep the line lengths at less than 80 characters.

> +adjust_iv_update_pos (struct ivopts_data *data ATTRIBUTE_UNUSED, struct iv_cand *cand)

And here, too... And elsewhere, where I didn't spot it :-)

Ciao!
Steven

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-11  6:35 IVOPT improvement patch Xinliang David Li
  2010-05-11  7:18 ` Zdenek Dvorak
  2010-05-11  7:26 ` Steven Bosscher
@ 2010-05-11  8:34 ` Richard Guenther
  2010-05-11  9:48   ` Jan Hubicka
                     ` (2 more replies)
  2010-05-11 17:19 ` Toon Moene
  2010-05-13 13:00 ` Toon Moene
  4 siblings, 3 replies; 100+ messages in thread
From: Richard Guenther @ 2010-05-11  8:34 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

On Tue, May 11, 2010 at 8:35 AM, Xinliang David Li <davidxl@google.com> wrote:
> Hi, IVOPT has been one of the main area of complaints from gcc users
> and it is often shutdown or user is forced to use inline assembly to
> write key kernel loops. The following (resulting from the
> investigation of many user complaints) summarize some of the key
> problems:
>
> 1) Too many induction variables are used and advanced addressing mode
> is not fully taken advantage of. On latest Intel CPU, the increased
> loop size (due to iv updates) can have very large negative impact on
> performance, e.g, when LSD and uop macro fusion get blocked. The root
> cause of the problem is not at the cost model used in IVOPT, but in
> the algorithm in finding the 'optimal' assignment from iv candidates
> to uses.
>
> 2) Profile information is not used in cost estimation (e.g. computing
> cost of loop variants)
>
> 3) For replaced IV (original) that are only live out of the loop (i.e.
> there are no uses inside loop), the rewrite of the IV occurs inside
> the loop which usually results in code more expensive than the
> original iv update statement -- and it is very difficult for later
> phases to sink down the computation outside the loop (see PR31792).
> The right solution is to materialize/rewrite such ivs directly outside
> the loop (also to avoid introducing overlapping live ranges)
>
> 4) iv update statement sometimes block the forward
> propagation/combination of the memory ref operation (depending the
> before IV value)  with the loop branch compare. Simple minded
> propagation will lead to overlapping live range and addition copy/move
> instruction to be generated.
>
> 5) In estimating the global cost (register pressure), the registers
> resulting from LIM of invariant expressions are not considered
>
> 6) IN MEM_REF creation, loop variant and invariants may be assigned to
> the same part -- which is essentially a re-association blocking LIM
>
> 7) Intrinsic calls that are essentially memory operations are not
> recognized as uses.

8) Replacement pointer induction variables do not inherit alias-information
pessimizing MEM_REF memory operations.

> The attached patch handles all the problems above except for 7.
>
>
> Bootstrapped and regression tested on linux/x86_64.
>
> The patch was not tuned for SPEC, but SPEC testing was done.
> Observable improvements : gcc 4.85%, vpr 1.53%, bzip2 2.36%, and eon
> 2.43% (Machine CPU: Intel Xeon E5345/2.33Ghz, m32mode).

Can you split the patch into pieces and check SPEC numbers also
for 64bit operation?  I assume that maybe powerpc people want to
check the performance impact as well.

Thanks,
Richard.

> Ok for trunk?
>
> Thanks,
>
> David
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-11  8:34 ` Richard Guenther
@ 2010-05-11  9:48   ` Jan Hubicka
  2010-05-11 10:04     ` Steven Bosscher
  2010-05-11 14:24   ` Peter Bergner
  2010-05-11 17:28   ` Xinliang David Li
  2 siblings, 1 reply; 100+ messages in thread
From: Jan Hubicka @ 2010-05-11  9:48 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Xinliang David Li, GCC Patches

> On Tue, May 11, 2010 at 8:35 AM, Xinliang David Li <davidxl@google.com> wrote:
> > Hi, IVOPT has been one of the main area of complaints from gcc users
> > and it is often shutdown or user is forced to use inline assembly to
> > write key kernel loops. The following (resulting from the
> > investigation of many user complaints) summarize some of the key
> > problems:
> >
> > 1) Too many induction variables are used and advanced addressing mode
> > is not fully taken advantage of. On latest Intel CPU, the increased
> > loop size (due to iv updates) can have very large negative impact on
> > performance, e.g, when LSD and uop macro fusion get blocked. The root
> > cause of the problem is not at the cost model used in IVOPT, but in
> > the algorithm in finding the 'optimal' assignment from iv candidates
> > to uses.
> >
> > 2) Profile information is not used in cost estimation (e.g. computing
> > cost of loop variants)
> >
> > 3) For replaced IV (original) that are only live out of the loop (i.e.
> > there are no uses inside loop), the rewrite of the IV occurs inside
> > the loop which usually results in code more expensive than the
> > original iv update statement -- and it is very difficult for later
> > phases to sink down the computation outside the loop (see PR31792).
> > The right solution is to materialize/rewrite such ivs directly outside
> > the loop (also to avoid introducing overlapping live ranges)
> >
> > 4) iv update statement sometimes block the forward
> > propagation/combination of the memory ref operation (depending the
> > before IV value)  with the loop branch compare. Simple minded
> > propagation will lead to overlapping live range and addition copy/move
> > instruction to be generated.
> >
> > 5) In estimating the global cost (register pressure), the registers
> > resulting from LIM of invariant expressions are not considered
> >
> > 6) IN MEM_REF creation, loop variant and invariants may be assigned to
> > the same part -- which is essentially a re-association blocking LIM
> >
> > 7) Intrinsic calls that are essentially memory operations are not
> > recognized as uses.
> 
> 8) Replacement pointer induction variables do not inherit alias-information
> pessimizing MEM_REF memory operations.

9) IVopts seems to be compile time hog, especially compiling gamess from SPEC2k6.

Honza

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-11  9:48   ` Jan Hubicka
@ 2010-05-11 10:04     ` Steven Bosscher
  0 siblings, 0 replies; 100+ messages in thread
From: Steven Bosscher @ 2010-05-11 10:04 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Richard Guenther, Xinliang David Li, GCC Patches

2010/5/11 Jan Hubicka <hubicka@ucw.cz>:
> 9) IVopts seems to be compile time hog, especially compiling gamess from SPEC2k6.

Right. Point proven that "IVOPT has been one of the main area of
complaints from gcc users" and, apparently, developers too. David,
consider this encouragement for your patches ;-)

Ciao!
Steven

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-11  8:34 ` Richard Guenther
  2010-05-11  9:48   ` Jan Hubicka
@ 2010-05-11 14:24   ` Peter Bergner
  2010-05-11 17:28   ` Xinliang David Li
  2 siblings, 0 replies; 100+ messages in thread
From: Peter Bergner @ 2010-05-11 14:24 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Xinliang David Li, GCC Patches

On Tue, 2010-05-11 at 10:34 +0200, Richard Guenther wrote:
> On Tue, May 11, 2010 at 8:35 AM, Xinliang David Li <davidxl@google.com> wrote:
> > The patch was not tuned for SPEC, but SPEC testing was done.
> > Observable improvements : gcc 4.85%, vpr 1.53%, bzip2 2.36%, and eon
> > 2.43% (Machine CPU: Intel Xeon E5345/2.33Ghz, m32mode).
> 
> Can you split the patch into pieces and check SPEC numbers also
> for 64bit operation?  I assume that maybe powerpc people want to
> check the performance impact as well.

I'll have someone on my team SPEC test this on powerpc.
Both 32-bit and 64-bit.

Peter



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-11  6:35 IVOPT improvement patch Xinliang David Li
                   ` (2 preceding siblings ...)
  2010-05-11  8:34 ` Richard Guenther
@ 2010-05-11 17:19 ` Toon Moene
  2010-05-11 17:49   ` Xinliang David Li
  2010-05-13 13:00 ` Toon Moene
  4 siblings, 1 reply; 100+ messages in thread
From: Toon Moene @ 2010-05-11 17:19 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

On 05/11/2010 08:35 AM, Xinliang David Li wrote:

> Hi, IVOPT has been one of the main area of complaints from gcc users
> and it is often shutdown or user is forced to use inline assembly to
> write key kernel loops. The following (resulting from the
> investigation of many user complaints) summarize some of the key
> problems:

> 6) IN MEM_REF creation, loop variant and invariants may be assigned to
> the same part -- which is essentially a re-association blocking LIM

On the other hand, some recombination of induction variables is 
necessary to prevent excessive register pressure (and the resulting spills).

 From my slides at the May, 1999 Linux Expo:

"Let's turn our attention to the kinetic energy loop again:
\begin{verbatim}
       DO 810 I=ILONP2,ILNLT
          ZEK(I) = 0.25 *
      +        ( ( PUZ(I-1   ,K)*PUZ(I-1   ,K)
      +                 *HYU(I-1   )
      +          + PUZ(I     ,K)*PUZ(I     ,K)
      +                 *HYU(I     ))*RHYV (I)
      +        + ( PVZ(I-ILON,K)*PVZ(I-ILON,K)
      +                 *HXV(I-ILON)
      +          + PVZ(I     ,K)*PVZ(I     ,K)
      +                 *HXV(I     ))*RHXU (I) )
  810  CONTINUE
\end{verbatim}
If we strength reduce all induction variables and move
all loop invariant code out of the loop, we need
11 registers to hold the addresses needed to step through
the arrays.
\end{slide}
\begin{slide}{}
We can do better, by noting that
\begin{verbatim}
{ PUZ(I-1   ,K), PUZ(I     ,K) }
{ PVZ(I-ILON,K), PVZ(I     ,K) }
{ HYU(I-1   )  , HYU(I     )   }
{ HXV(I-ILON)  , HXV(I     )   }
\end{verbatim}
form 4 {\em equivalence classes} of induction variables
that differ only by a constant - which means they
can be written in the form of address-register-with-offset.

In doing so, we save 4 registers and need only 7 registers
for addressing.

Richard Henderson implemented this optimization, which
will be part of egcs-1.2 ... I mean gcc-2.95."

That was '99, so it was discussing code compiled by g77 and using RTL 
optimization passes only - but the idea is the same.

Kind regards,

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-11  7:26 ` Steven Bosscher
@ 2010-05-11 17:23   ` Xinliang David Li
  0 siblings, 0 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-05-11 17:23 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: GCC Patches

On Tue, May 11, 2010 at 12:25 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Tue, May 11, 2010 at 8:35 AM, Xinliang David Li <davidxl@google.com> wrote:
>> The attached patch handles all the problems above except for 7.
>>
>>
>> Bootstrapped and regression tested on linux/x86_64.
>>
>> The patch was not tuned for SPEC, but SPEC testing was done.
>> Observable improvements : gcc 4.85%, vpr 1.53%, bzip2 2.36%, and eon
>> 2.43% (Machine CPU: Intel Xeon E5345/2.33Ghz, m32mode).
>
> Impressive!
>
> Could you please split the patch into smaller bits, explaining for
> each part of the patch what it does?  It is very difficult to review
> large patches like this. You should also commit the patch per part
> once approved -- it's also very difficult to hunt regressions down to
> single changes when the single change is very large.
>

Will split.


> Some stylistic comments:
>
>> +  /* Pseudo version infos for generated loop invariants.  */
>> +  VEC(version_info_p,heap) *pseudo_version_info;
>
> Could you explain what you mean with "pseudo invariants"? Most GCC
> developers will think of pseudo-registers when they see the term
> pseudo in GCC, but I think you use the word in a different context
> here. Best would be to use another term (dummy?) or otherwise add an
> explanation.

It represents live ranges for common expressions that can be hoisted
out of the loop. These expressions are usually created after full
unrolling of inner loops. Will add explanation.

>
>
>> +#define PSEUDO_COMMON_FACTOR 0.3
>
> We try to avoid floating point math in GCC (portability issues). You
> should use integer math instead (e.g. see what GCC does with
> REG_BR_PROB_BASE, and CGRAPH_FREQ_BASE, and in a few other places).

Done.

>
>>  /* Finds addresses in *OP_P inside STMT.  */
>>
>> -static void
>> +static bool
>> find_interesting_uses_address (struct ivopts_data *data, gimple stmt, tree *op_p)
>
> Please document the return value.


Actually, this change is not needed (it is leftover from other changes
that have since been removed).

>
>
>> -find_interesting_uses_op (struct ivopts_data *data, tree op)
>> +find_interesting_uses_op (struct ivopts_data *data, tree op, enum iv_use_pos use_pos)
>
> Please keep the line lengths at less than 80 characters.

Fixed.

>
>> +adjust_iv_update_pos (struct ivopts_data *data ATTRIBUTE_UNUSED, struct iv_cand *cand)
>
> And here, too... And elsewhere, where I didn't spot it :-)

Done.

Thanks,

David
>
> Ciao!
> Steven
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-11  8:34 ` Richard Guenther
  2010-05-11  9:48   ` Jan Hubicka
  2010-05-11 14:24   ` Peter Bergner
@ 2010-05-11 17:28   ` Xinliang David Li
  2010-05-12  8:55     ` Richard Guenther
  2 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-05-11 17:28 UTC (permalink / raw)
  To: Richard Guenther; +Cc: GCC Patches

On Tue, May 11, 2010 at 1:34 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Tue, May 11, 2010 at 8:35 AM, Xinliang David Li <davidxl@google.com> wrote:
>> Hi, IVOPT has been one of the main area of complaints from gcc users
>> and it is often shutdown or user is forced to use inline assembly to
>> write key kernel loops. The following (resulting from the
>> investigation of many user complaints) summarize some of the key
>> problems:
>>
>> 1) Too many induction variables are used and advanced addressing mode
>> is not fully taken advantage of. On latest Intel CPU, the increased
>> loop size (due to iv updates) can have very large negative impact on
>> performance, e.g, when LSD and uop macro fusion get blocked. The root
>> cause of the problem is not at the cost model used in IVOPT, but in
>> the algorithm in finding the 'optimal' assignment from iv candidates
>> to uses.
>>
>> 2) Profile information is not used in cost estimation (e.g. computing
>> cost of loop variants)
>>
>> 3) For replaced IV (original) that are only live out of the loop (i.e.
>> there are no uses inside loop), the rewrite of the IV occurs inside
>> the loop which usually results in code more expensive than the
>> original iv update statement -- and it is very difficult for later
>> phases to sink down the computation outside the loop (see PR31792).
>> The right solution is to materialize/rewrite such ivs directly outside
>> the loop (also to avoid introducing overlapping live ranges)
>>
>> 4) iv update statement sometimes block the forward
>> propagation/combination of the memory ref operation (depending the
>> before IV value)  with the loop branch compare. Simple minded
>> propagation will lead to overlapping live range and addition copy/move
>> instruction to be generated.
>>
>> 5) In estimating the global cost (register pressure), the registers
>> resulting from LIM of invariant expressions are not considered
>>
>> 6) IN MEM_REF creation, loop variant and invariants may be assigned to
>> the same part -- which is essentially a re-association blocking LIM
>>
>> 7) Intrinsic calls that are essentially memory operations are not
>> recognized as uses.
>
> 8) Replacement pointer induction variables do not inherit alias-information
> pessimizing MEM_REF memory operations.


This is a good one. Is there an existing mechanism for the update?


>
>> The attached patch handles all the problems above except for 7.
>>
>>
>> Bootstrapped and regression tested on linux/x86_64.
>>
>> The patch was not tuned for SPEC, but SPEC testing was done.
>> Observable improvements : gcc 4.85%, vpr 1.53%, bzip2 2.36%, and eon
>> 2.43% (Machine CPU: Intel Xeon E5345/2.33Ghz, m32mode).
>
> Can you split the patch into pieces and check SPEC numbers also
> for 64bit operation?  I assume that maybe powerpc people want to
> check the performance impact as well.

On the same machine in m64 mode, eon improves 1.8%; the other ups and
downs are less than 1%.

David


>
> Thanks,
> Richard.
>
>> Ok for trunk?
>>
>> Thanks,
>>
>> David
>>
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-11  7:18 ` Zdenek Dvorak
@ 2010-05-11 17:29   ` Xinliang David Li
  2010-05-25  0:17     ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-05-11 17:29 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: GCC Patches

On Tue, May 11, 2010 at 12:18 AM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> The attached patch handles all the problems above except for 7.
>
> could you please split the patch to separate parts for each problem,
> and also describe how the problems are addressed?  Thanks,
>
> Zdenek

Ok will do.

David

>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-11 17:19 ` Toon Moene
@ 2010-05-11 17:49   ` Xinliang David Li
  2010-05-11 21:52     ` Toon Moene
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-05-11 17:49 UTC (permalink / raw)
  To: Toon Moene; +Cc: GCC Patches

On Tue, May 11, 2010 at 10:18 AM, Toon Moene <toon@moene.org> wrote:
> On 05/11/2010 08:35 AM, Xinliang David Li wrote:
>
>> Hi, IVOPT has been one of the main area of complaints from gcc users
>> and it is often shutdown or user is forced to use inline assembly to
>> write key kernel loops. The following (resulting from the
>> investigation of many user complaints) summarize some of the key
>> problems:
>
>> 6) IN MEM_REF creation, loop variant and invariants may be assigned to
>> the same part -- which is essentially a re-association blocking LIM
>
> On the other hand, some recombination of induction variables is necessary to
> prevent excessive register pressure (and the resulting spills).
>
> From my slides at the May, 1999 Linux Expo:
>
> "Let's turn our attention to the kinetic energy loop again:
> \begin{verbatim}
>      DO 810 I=ILONP2,ILNLT
>         ZEK(I) = 0.25 *
>     +        ( ( PUZ(I-1   ,K)*PUZ(I-1   ,K)
>     +                 *HYU(I-1   )
>     +          + PUZ(I     ,K)*PUZ(I     ,K)
>     +                 *HYU(I     ))*RHYV (I)
>     +        + ( PVZ(I-ILON,K)*PVZ(I-ILON,K)
>     +                 *HXV(I-ILON)
>     +          + PVZ(I     ,K)*PVZ(I     ,K)
>     +                 *HXV(I     ))*RHXU (I) )
>  810  CONTINUE
> \end{verbatim}
> If we strength reduce all induction variables and move
> all loop invariant code out of the loop, we need
> 11 registers to hold the addresses needed to step through
> the arrays.
> \end{slide}
> \begin{slide}{}
> We can do better, by noting that
> \begin{verbatim}
> { PUZ(I-1   ,K), PUZ(I     ,K) }
> { PVZ(I-ILON,K), PVZ(I     ,K) }
> { HYU(I-1   )  , HYU(I     )   }
> { HXV(I-ILON)  , HXV(I     )   }
> \end{verbatim}
> form 4 {\em equivalence classes} of induction variables
> that differ only by a constant - which means they
> can be written in the form of address-register-with-offset.
>

Finding the optimal partition and iv selection/assignment is the core
part of IVOPT. GCC's IVOPT uses a cost-based approach to achieve
that; however, some issues prevent it from reaching the optimal
solution above. One intention of this patch is to address them.

The problem described in 6 is different. Assume we have the
following linear address expression:

base + iv + inv*coeff,  where iv is the induction variable, inv is a
loop invariant, and coeff is a constant (2, 4, or 8). The synthesized
mem ref (without the fix) can be:

MEM_REF(base + iv, inv, coeff)

With the fix,

MEM_REF(base + inv*coeff, iv, 1)

Where base+inv*coeff can be later hoisted out of the loop.

Of course the added cost is the increased register pressure -- but
this is modelled by the so-called pseudo invariant.

Thanks,

David



> In doing so, we save 4 registers and need only 7 registers
> for addressing.
>
> Richard Henderson implemented this optimization, which
> will be part of egcs-1.2 ... I mean gcc-2.95."
>
> That was '99, so it was discussing code compiled by g77 and using RTL
> optimization passes only - but the idea is the same.
>
> Kind regards,
>
> --
> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
> At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
> Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-11 17:49   ` Xinliang David Li
@ 2010-05-11 21:52     ` Toon Moene
  2010-05-11 22:31       ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: Toon Moene @ 2010-05-11 21:52 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

On 05/11/2010 07:49 PM, Xinliang David Li wrote:

> On Tue, May 11, 2010 at 10:18 AM, Toon Moene<toon@moene.org>  wrote:

>> On 05/11/2010 08:35 AM, Xinliang David Li wrote:
>>
>>> Hi, IVOPT has been one of the main area of complaints from gcc users
>>> and it is often shutdown or user is forced to use inline assembly to
>>> write key kernel loops. The following (resulting from the
>>> investigation of many user complaints) summarize some of the key
>>> problems:
>>
>>> 6) IN MEM_REF creation, loop variant and invariants may be assigned to
>>> the same part -- which is essentially a re-association blocking LIM
>>
>> On the other hand, some recombination of induction variables is necessary to
>> prevent excessive register pressure (and the resulting spills).
>>
>>  From my slides at the May, 1999 Linux Expo:
>>
>> "Let's turn our attention to the kinetic energy loop again:
>> \begin{verbatim}
>>       DO 810 I=ILONP2,ILNLT
>>          ZEK(I) = 0.25 *
>>      +        ( ( PUZ(I-1   ,K)*PUZ(I-1   ,K)
>>      +                 *HYU(I-1   )
>>      +          + PUZ(I     ,K)*PUZ(I     ,K)
>>      +                 *HYU(I     ))*RHYV (I)
>>      +        + ( PVZ(I-ILON,K)*PVZ(I-ILON,K)
>>      +                 *HXV(I-ILON)
>>      +          + PVZ(I     ,K)*PVZ(I     ,K)
>>      +                 *HXV(I     ))*RHXU (I) )
>>   810  CONTINUE
>> \end{verbatim}
>> If we strength reduce all induction variables and move
>> all loop invariant code out of the loop, we need
>> 11 registers to hold the addresses needed to step through
>> the arrays.
>> \end{slide}
>> \begin{slide}{}
>> We can do better, by noting that
>> \begin{verbatim}
>> { PUZ(I-1   ,K), PUZ(I     ,K) }
>> { PVZ(I-ILON,K), PVZ(I     ,K) }
>> { HYU(I-1   )  , HYU(I     )   }
>> { HXV(I-ILON)  , HXV(I     )   }
>> \end{verbatim}
>> form 4 {\em equivalence classes} of induction variables
>> that differ only by a constant - which means they
>> can be written in the form of address-register-with-offset.
>>
>
> Finding the optimal partition and iv selection/assignment is the core
> part of the IVOPT. GCC IVOPT uses a cost based approach to achieve
> that, however there are some issues that prevent the above optimal
> solution above. One of the intention of this patch is to address it.

Thanks !

> The problem described in 6 is different. Assuming we have the
> following linear address expressions:
>
> base + iv + inv*coeff,  where iv is the induction variable, and inv is
> loop invariant and coeff is constant 2/4/8, the synthesized mem ref
> (without the fix) can be:
>
> MEM_REF(base + iv, inv, coeff)
>
> With the fix,
>
> MEM_REF(base + inv*coeff, iv, 1)
>
> Where base+inv*coeff can be later hoisted out of the loop.
>
> Of course the added cost is the increased register pressure -- but
> this is modelled by the so call pseudo invariant.

Does this also take care of induction variables of the form

inv * iv + inv + const

?

This was one we had to "deal with" in the second half of the '90s, as a 
result of Fortran arrays being (by default) 1-based instead of 0-based.

However, it might be that the gfortran front end presents this problem 
differently to the middle end than g77 presented it to the RTL passes.

Kind regards,

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-11 21:52     ` Toon Moene
@ 2010-05-11 22:31       ` Xinliang David Li
  2010-05-11 22:44         ` Toon Moene
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-05-11 22:31 UTC (permalink / raw)
  To: Toon Moene; +Cc: GCC Patches

On Tue, May 11, 2010 at 2:52 PM, Toon Moene <toon@moene.org> wrote:
> On 05/11/2010 07:49 PM, Xinliang David Li wrote:
>
>> On Tue, May 11, 2010 at 10:18 AM, Toon Moene<toon@moene.org>  wrote:
>
>>> On 05/11/2010 08:35 AM, Xinliang David Li wrote:
>>>
>>>> Hi, IVOPT has been one of the main area of complaints from gcc users
>>>> and it is often shutdown or user is forced to use inline assembly to
>>>> write key kernel loops. The following (resulting from the
>>>> investigation of many user complaints) summarize some of the key
>>>> problems:
>>>
>>>> 6) IN MEM_REF creation, loop variant and invariants may be assigned to
>>>> the same part -- which is essentially a re-association blocking LIM
>>>
>>> On the other hand, some recombination of induction variables is necessary
>>> to
>>> prevent excessive register pressure (and the resulting spills).
>>>
>>>  From my slides at the May, 1999 Linux Expo:
>>>
>>> "Let's turn our attention to the kinetic energy loop again:
>>> \begin{verbatim}
>>>      DO 810 I=ILONP2,ILNLT
>>>         ZEK(I) = 0.25 *
>>>     +        ( ( PUZ(I-1   ,K)*PUZ(I-1   ,K)
>>>     +                 *HYU(I-1   )
>>>     +          + PUZ(I     ,K)*PUZ(I     ,K)
>>>     +                 *HYU(I     ))*RHYV (I)
>>>     +        + ( PVZ(I-ILON,K)*PVZ(I-ILON,K)
>>>     +                 *HXV(I-ILON)
>>>     +          + PVZ(I     ,K)*PVZ(I     ,K)
>>>     +                 *HXV(I     ))*RHXU (I) )
>>>  810  CONTINUE
>>> \end{verbatim}
>>> If we strength reduce all induction variables and move
>>> all loop invariant code out of the loop, we need
>>> 11 registers to hold the addresses needed to step through
>>> the arrays.
>>> \end{slide}
>>> \begin{slide}{}
>>> We can do better, by noting that
>>> \begin{verbatim}
>>> { PUZ(I-1   ,K), PUZ(I     ,K) }
>>> { PVZ(I-ILON,K), PVZ(I     ,K) }
>>> { HYU(I-1   )  , HYU(I     )   }
>>> { HXV(I-ILON)  , HXV(I     )   }
>>> \end{verbatim}
>>> form 4 {\em equivalence classes} of induction variables
>>> that differ only by a constant - which means they
>>> can be written in the form of address-register-with-offset.
>>>
>>
>> Finding the optimal partition and iv selection/assignment is the core
>> part of the IVOPT. GCC IVOPT uses a cost based approach to achieve
>> that, however there are some issues that prevent the above optimal
>> solution above. One of the intention of this patch is to address it.
>
> Thanks !
>
>> The problem described in 6 is different. Assuming we have the
>> following linear address expressions:
>>
>> base + iv + inv*coeff,  where iv is the induction variable, and inv is
>> loop invariant and coeff is constant 2/4/8, the synthesized mem ref
>> (without the fix) can be:
>>
>> MEM_REF(base + iv, inv, coeff)
>>
>> With the fix,
>>
>> MEM_REF(base + inv*coeff, iv, 1)
>>
>> Where base+inv*coeff can be later hoisted out of the loop.
>>
>> Of course the added cost is the increased register pressure -- but
>> this is modelled by the so call pseudo invariant.
>
> Does this also take care of induction variables of the form of:
>
> inv * iv + inv + const
>
> ?
>
> This was one we had to "deal with" in the second half of the '90s, as a
> result of Fortran arrays being (by default) 1-based instead of 0-based.
>

The answer depends:

For this linear address expression:  inv1*iv + inv2 + const

if inv1 is not a compile-time constant (or is too big), the cost of
expressing it using this iv is 'infinite' in the current
implementation, which means this expression will either be strength
reduced (to its own iv candidate) or expressed using another iv with
stride == inv1.  If inv1 is a constant but not one of the multipliers
allowed in the target addressing mode, the cost of expressing it is
also high, but the final expression may end up in this form depending
on other costs.

And yes if the final expression is in this form, the reassociation
will make the memory op into MEM_REF (inv2 + const, iv, inv1).

Thanks,

David


> However, it might be that the gfortran front end presents this problem
> differently to the middle end than g77 presented it to the RTL passes.
>
> Kind regards,
>
> --
> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
> At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
> Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-11 22:31       ` Xinliang David Li
@ 2010-05-11 22:44         ` Toon Moene
  0 siblings, 0 replies; 100+ messages in thread
From: Toon Moene @ 2010-05-11 22:44 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

On 05/12/2010 12:31 AM, Xinliang David Li wrote:

> On Tue, May 11, 2010 at 2:52 PM, Toon Moene<toon@moene.org>  wrote:

>> Does this also take care of induction variables of the form of:
>>
>> inv * iv + inv + const
>>
>> ?
>>
>> This was one we had to "deal with" in the second half of the '90s, as a
>> result of Fortran arrays being (by default) 1-based instead of 0-based.

> The answer depends:
>
> For this linear address expression:  inv1*iv + inv2 + const
>
> if inv1 is not a compile-time constant (or is too large), the cost of
> expressing it using this iv is 'infinite' in the current
> implementation, which means this expression will either be strength
> reduced (into its own iv candidate) or expressed using another iv with
> stride == inv1.  If inv1 is a constant but not one of the multipliers
> allowed in the target's addressing modes, the cost of expressing it is
> also high, but the final expression may still end up in this form
> depending on the other costs.
>
> And yes, if the final expression is in this form, reassociation
> will turn the memory op into MEM_REF (inv2 + const, iv, inv1).

Thanks - that answers all my questions.

Kind regards,

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran


* Re: IVOPT improvement patch
  2010-05-11 17:28   ` Xinliang David Li
@ 2010-05-12  8:55     ` Richard Guenther
  0 siblings, 0 replies; 100+ messages in thread
From: Richard Guenther @ 2010-05-12  8:55 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

On Tue, May 11, 2010 at 7:27 PM, Xinliang David Li <davidxl@google.com> wrote:
> On Tue, May 11, 2010 at 1:34 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Tue, May 11, 2010 at 8:35 AM, Xinliang David Li <davidxl@google.com> wrote:
>>> Hi, IVOPT has been one of the main areas of complaints from gcc users,
>>> and it is often shut down, or the user is forced to use inline assembly
>>> to write key kernel loops. The following (resulting from the
>>> investigation of many user complaints) summarizes some of the key
>>> problems:
>>>
>>> 1) Too many induction variables are used, and advanced addressing modes
>>> are not fully taken advantage of. On the latest Intel CPUs, the
>>> increased loop size (due to iv updates) can have a very large negative
>>> impact on performance, e.g., when the LSD and uop macro fusion get
>>> blocked. The root cause of the problem is not the cost model used in
>>> IVOPT, but the algorithm for finding the 'optimal' assignment from iv
>>> candidates to uses.
>>>
>>> 2) Profile information is not used in cost estimation (e.g. computing
>>> the cost of loop variants).
>>>
>>> 3) For replaced (original) IVs that are only live out of the loop (i.e.
>>> there are no uses inside the loop), the rewrite of the IV occurs inside
>>> the loop, which usually results in code more expensive than the
>>> original iv update statement -- and it is very difficult for later
>>> phases to sink the computation out of the loop (see PR31792).
>>> The right solution is to materialize/rewrite such ivs directly outside
>>> the loop (also to avoid introducing overlapping live ranges).
>>>
>>> 4) The iv update statement sometimes blocks the forward
>>> propagation/combination of the memory ref operation (which depends on
>>> the pre-increment IV value) with the loop branch compare. Simple-minded
>>> propagation will lead to overlapping live ranges and additional
>>> copy/move instructions being generated.
>>>
>>> 5) In estimating the global cost (register pressure), the registers
>>> resulting from LIM of invariant expressions are not considered.
>>>
>>> 6) In MEM_REF creation, loop variants and invariants may be assigned to
>>> the same part -- essentially a re-association that blocks LIM.
>>>
>>> 7) Intrinsic calls that are essentially memory operations are not
>>> recognized as uses.
>>
>> 8) Replacement pointer induction variables do not inherit alias-information
>> pessimizing MEM_REF memory operations.
>
>
> This is a good one. Is there an existing mechanism for the update?

Yes, there is duplicate_ssa_name_ptr_info which is for example
used by the vectorizer for its induction variables.

>>
>>> The attached patch handles all the problems above except for 7.
>>>
>>>
>>> Bootstrapped and regression tested on linux/x86_64.
>>>
>>> The patch was not tuned for SPEC, but SPEC testing was done.
>>> Observable improvements : gcc 4.85%, vpr 1.53%, bzip2 2.36%, and eon
>>> 2.43% (Machine CPU: Intel Xeon E5345/2.33Ghz, m32mode).
>>
>> Can you split the patch into pieces and check SPEC numbers also
>> for 64bit operation?  I assume that maybe powerpc people want to
>> check the performance impact as well.
>
> On the same machine with -m64, eon improves 1.8%; the other ups and
> downs are less than 1%.

Thanks for checking.

Richard.


* Re: IVOPT improvement patch
  2010-05-11  6:35 IVOPT improvement patch Xinliang David Li
                   ` (3 preceding siblings ...)
  2010-05-11 17:19 ` Toon Moene
@ 2010-05-13 13:00 ` Toon Moene
  2010-05-13 13:30   ` Toon Moene
  4 siblings, 1 reply; 100+ messages in thread
From: Toon Moene @ 2010-05-13 13:00 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

On 05/11/2010 08:35 AM, Xinliang David Li wrote:
> Hi, IVOPT has been one of the main areas of complaints from gcc users,
> and it is often shut down, or the user is forced to use inline assembly
> to write key kernel loops. The following (resulting from the
> investigation of many user complaints) summarizes some of the key
> problems:

I tried your patch today (against revision 159362) and had no problem 
applying the diff.

However, during stage 3 I ran into:

/home/toon/compilers/obj-t/./prev-gcc/xgcc 
-B/home/toon/compilers/obj-t/./prev-gcc/ 
-B/usr/snp/x86_64-unknown-linux-gnu/bin/ 
-B/usr/snp/x86_64-unknown-linux-gnu/bin/ 
-B/usr/snp/x86_64-unknown-linux-gnu/lib/ -isystem 
/usr/snp/x86_64-unknown-linux-gnu/include -isystem 
/usr/snp/x86_64-unknown-linux-gnu/sys-include    -c  -g -O2 -DIN_GCC 
-W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes 
-Wmissing-prototypes -Wmissing-format-attribute -pedantic -Wno-long-long 
-Wno-variadic-macros -Wno-overlength-strings -Wold-style-definition 
-Wc++-compat   -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. 
-I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include 
-I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/bid 
-I../libdecnumber  -DCLOOG_PPL_BACKEND  -I/usr/include/libelf 
../../gcc/gcc/cfganal.c -o cfganal.o
../../gcc/gcc/cfganal.c: In function 'find_unreachable_blocks':
../../gcc/gcc/cfganal.c:281:1: internal compiler error: in 
compute_final_value_at_exits, at tree-ssa-loop-ivopts.c:6313

This is on x86_64-unknown-linux-gnu doing a run-of-the-mill 64-bit 
native build.

Hope this helps,

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran


* Re: IVOPT improvement patch
  2010-05-13 13:00 ` Toon Moene
@ 2010-05-13 13:30   ` Toon Moene
  2010-05-13 16:23     ` Xinliang David Li
  2010-05-14  4:26     ` Xinliang David Li
  0 siblings, 2 replies; 100+ messages in thread
From: Toon Moene @ 2010-05-13 13:30 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

On 05/13/2010 03:00 PM, Toon Moene wrote:

> I tried your patch today (against revision 159362) and had no problem
> applying the diff.
>
> However, during stage 3 I ran into:
>
> /home/toon/compilers/obj-t/./prev-gcc/xgcc
> -B/home/toon/compilers/obj-t/./prev-gcc/
> -B/usr/snp/x86_64-unknown-linux-gnu/bin/
> -B/usr/snp/x86_64-unknown-linux-gnu/bin/
> -B/usr/snp/x86_64-unknown-linux-gnu/lib/ -isystem
> /usr/snp/x86_64-unknown-linux-gnu/include -isystem
> /usr/snp/x86_64-unknown-linux-gnu/sys-include -c -g -O2 -DIN_GCC -W
> -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes
> -Wmissing-prototypes -Wmissing-format-attribute -pedantic -Wno-long-long
> -Wno-variadic-macros -Wno-overlength-strings -Wold-style-definition
> -Wc++-compat -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/.
> -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include
> -I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/bid
> -I../libdecnumber -DCLOOG_PPL_BACKEND -I/usr/include/libelf
> ../../gcc/gcc/cfganal.c -o cfganal.o
> ../../gcc/gcc/cfganal.c: In function 'find_unreachable_blocks':
> ../../gcc/gcc/cfganal.c:281:1: internal compiler error: in
> compute_final_value_at_exits, at tree-ssa-loop-ivopts.c:6313
>
> This is on x86_64-unknown-linux-gnu doing a run-of-the-mill 64-bit
> native build.

Well, that last sentence could have been more precise - this is what I 
did after applying your patch:

$ cd ~/compilers/gcc && svn up && echo "`date -u` (revision `svnversion 
.`)" >> LAST_UPDATED && cd ../obj-t && rm -rf * && ../gcc/configure 
--enable-checking=release  --prefix=/usr/snp --enable-gold 
--enable-plugins --disable-multilib --disable-nls --with-arch-64=native 
--with-tune-64=native --enable-languages=fortran,c++ 
--enable-stage1-languages=c++ --disable-werror && make -j8

Note, for instance, the --enable-checking=release

Hope this is useful,

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran


* Re: IVOPT improvement patch
  2010-05-13 13:30   ` Toon Moene
@ 2010-05-13 16:23     ` Xinliang David Li
  2010-05-14  4:26     ` Xinliang David Li
  1 sibling, 0 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-05-13 16:23 UTC (permalink / raw)
  To: Toon Moene; +Cc: GCC Patches

Thanks. I will take a look.

David

On Thu, May 13, 2010 at 6:30 AM, Toon Moene <toon@moene.org> wrote:
> On 05/13/2010 03:00 PM, Toon Moene wrote:
>
>> I tried your patch today (against revision 159362) and had no problem
>> applying the diff.
>>
>> However, during stage 3 I ran into:
>>
>> /home/toon/compilers/obj-t/./prev-gcc/xgcc
>> -B/home/toon/compilers/obj-t/./prev-gcc/
>> -B/usr/snp/x86_64-unknown-linux-gnu/bin/
>> -B/usr/snp/x86_64-unknown-linux-gnu/bin/
>> -B/usr/snp/x86_64-unknown-linux-gnu/lib/ -isystem
>> /usr/snp/x86_64-unknown-linux-gnu/include -isystem
>> /usr/snp/x86_64-unknown-linux-gnu/sys-include -c -g -O2 -DIN_GCC -W
>> -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes
>> -Wmissing-prototypes -Wmissing-format-attribute -pedantic -Wno-long-long
>> -Wno-variadic-macros -Wno-overlength-strings -Wold-style-definition
>> -Wc++-compat -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/.
>> -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include
>> -I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/bid
>> -I../libdecnumber -DCLOOG_PPL_BACKEND -I/usr/include/libelf
>> ../../gcc/gcc/cfganal.c -o cfganal.o
>> ../../gcc/gcc/cfganal.c: In function 'find_unreachable_blocks':
>> ../../gcc/gcc/cfganal.c:281:1: internal compiler error: in
>> compute_final_value_at_exits, at tree-ssa-loop-ivopts.c:6313
>>
>> This is on x86_64-unknown-linux-gnu doing a run-of-the-mill 64-bit
>> native build.
>
> Well, that last sentence could have been more precise - this is what I did
> after applying your patch:
>
> $ cd ~/compilers/gcc && svn up && echo "`date -u` (revision `svnversion .`)"
>>> LAST_UPDATED && cd ../obj-t && rm -rf * && ../gcc/configure
> --enable-checking=release  --prefix=/usr/snp --enable-gold --enable-plugins
> --disable-multilib --disable-nls --with-arch-64=native --with-tune-64=native
> --enable-languages=fortran,c++ --enable-stage1-languages=c++
> --disable-werror && make -j8
>
> Note, for instance, the --enable-checking=release
>
> Hope this is useful,
>
> --
> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
> At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
> Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran
>


* Re: IVOPT improvement patch
  2010-05-13 13:30   ` Toon Moene
  2010-05-13 16:23     ` Xinliang David Li
@ 2010-05-14  4:26     ` Xinliang David Li
  1 sibling, 0 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-05-14  4:26 UTC (permalink / raw)
  To: Toon Moene; +Cc: GCC Patches

I could not reproduce the failure (my client was synced at 159362);
I used the same checking options, and the 3-stage bootstrap went just
fine. I will see how it goes when I break it into smaller patches.

Thanks,

David

On Thu, May 13, 2010 at 6:30 AM, Toon Moene <toon@moene.org> wrote:
> On 05/13/2010 03:00 PM, Toon Moene wrote:
>
>> I tried your patch today (against revision 159362) and had no problem
>> applying the diff.
>>
>> However, during stage 3 I ran into:
>>
>> /home/toon/compilers/obj-t/./prev-gcc/xgcc
>> -B/home/toon/compilers/obj-t/./prev-gcc/
>> -B/usr/snp/x86_64-unknown-linux-gnu/bin/
>> -B/usr/snp/x86_64-unknown-linux-gnu/bin/
>> -B/usr/snp/x86_64-unknown-linux-gnu/lib/ -isystem
>> /usr/snp/x86_64-unknown-linux-gnu/include -isystem
>> /usr/snp/x86_64-unknown-linux-gnu/sys-include -c -g -O2 -DIN_GCC -W
>> -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes
>> -Wmissing-prototypes -Wmissing-format-attribute -pedantic -Wno-long-long
>> -Wno-variadic-macros -Wno-overlength-strings -Wold-style-definition
>> -Wc++-compat -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/.
>> -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include
>> -I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/bid
>> -I../libdecnumber -DCLOOG_PPL_BACKEND -I/usr/include/libelf
>> ../../gcc/gcc/cfganal.c -o cfganal.o
>> ../../gcc/gcc/cfganal.c: In function 'find_unreachable_blocks':
>> ../../gcc/gcc/cfganal.c:281:1: internal compiler error: in
>> compute_final_value_at_exits, at tree-ssa-loop-ivopts.c:6313
>>
>> This is on x86_64-unknown-linux-gnu doing a run-of-the-mill 64-bit
>> native build.
>
> Well, that last sentence could have been more precise - this is what I did
> after applying your patch:
>
> $ cd ~/compilers/gcc && svn up && echo "`date -u` (revision `svnversion .`)"
>>> LAST_UPDATED && cd ../obj-t && rm -rf * && ../gcc/configure
> --enable-checking=release  --prefix=/usr/snp --enable-gold --enable-plugins
> --disable-multilib --disable-nls --with-arch-64=native --with-tune-64=native
> --enable-languages=fortran,c++ --enable-stage1-languages=c++
> --disable-werror && make -j8
>
> Note, for instance, the --enable-checking=release
>
> Hope this is useful,
>
> --
> Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
> Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
> At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
> Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran
>


* Re: IVOPT improvement patch
  2010-05-11 17:29   ` Xinliang David Li
@ 2010-05-25  0:17     ` Xinliang David Li
  2010-05-25 10:46       ` Zdenek Dvorak
                         ` (4 more replies)
  0 siblings, 5 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-05-25  0:17 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2010 bytes --]

Here are the split patches:

patch-1:

This patch improves the algorithm that assigns iv candidates to uses:
in the initial solution computation, do not compare cost savings
'locally' for one use/iv_cand pair; instead, compare the overall
(all-uses) cost of replacing with the new candidate (where possible)
against the current best assignment cost. This guarantees that the
initial solution starts from a minimal set of ivs.

This patch also adds the use of profile data in cost computation,
better dumps, and some other minor bug fixes.

patch-2:

This patch addresses PR31792 -- sinking the computation of a replaced
IV out of the loop when the IV is live only outside the loop.

patch-3:

The new expression for the use, expressed in terms of ubase, cbase,
ratio, and iv_cand, may contain loop-invariant sub-expressions that
can be hoisted out of the loop later. The patch implements a mechanism
to evaluate the additional register pressure caused by such
expressions (those that cannot be constant folded).

The patch also makes sure that the variant part of the use expression
(a sum of products) gets assigned to the index part of the
target_mem_ref first, to expose loop invariant code motion.

patch-4:

A simple local optimization that reorders the iv update statement with
the preceding target_mem_ref so that instruction combining can happen
in later phases.


PS.

Toon, I reproduced the problem you reported -- it is due to
mishandling of debug stmts. The combined full patch is also attached
in case you want to experiment with it.

Thanks,

David

On Tue, May 11, 2010 at 10:29 AM, Xinliang David Li <davidxl@google.com> wrote:
> On Tue, May 11, 2010 at 12:18 AM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
>> Hi,
>>
>>> The attached patch handles all the problems above except for 7.
>>
>> could you please split the patch to separate parts for each problem,
>> and also describe how the problems are addressed?  Thanks,
>>
>> Zdenek
>
> Ok will do.
>
> David
>
>>
>

[-- Attachment #2: ivopts_latest_part1.p --]
[-- Type: text/x-pascal, Size: 11962 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 159362)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -91,14 +91,29 @@ along with GCC; see the file COPYING3.  
 #include "langhooks.h"
 #include "tree-affine.h"
 #include "target.h"
+#include "tree-inline.h"
 
 /* The infinite cost.  */
 #define INFTY 10000000
 
-/* The expected number of loop iterations.  TODO -- use profiling instead of
-   this.  */
 #define AVG_LOOP_NITER(LOOP) 5
 
+/* Returns the expected number of loop iterations for LOOP.
+   The average trip count is computed from profile data if it
+   exists. */
+
+static inline unsigned
+avg_loop_niter (struct loop *loop)
+{
+  unsigned tc;
+  if (loop->header->count || loop->latch->count)
+    tc = expected_loop_iterations (loop);
+  else
+    tc = AVG_LOOP_NITER (loop);
+  if (tc == 0)
+    tc++;
+  return tc;
+}
 
 /* Representation of the induction variable.  */
 struct iv
@@ -513,6 +528,19 @@ dump_cand (FILE *file, struct iv_cand *c
       return;
     }
 
+  if (cand->var_before)
+    {
+      fprintf (file, "  var_before ");
+      print_generic_expr (file, cand->var_before, TDF_SLIM);
+      fprintf (file, "\n");
+    }
+  if (cand->var_after)
+    {
+      fprintf (file, "  var_after ");
+      print_generic_expr (file, cand->var_after, TDF_SLIM);
+      fprintf (file, "\n");
+    }
+
   switch (cand->pos)
     {
     case IP_NORMAL:
@@ -1822,7 +1850,7 @@ find_interesting_uses_outside (struct iv
       phi = gsi_stmt (psi);
       def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
       if (is_gimple_reg (def))
-	find_interesting_uses_op (data, def);
+        find_interesting_uses_op (data, def);
     }
 }
 
@@ -2138,7 +2166,9 @@ add_candidate_1 (struct ivopts_data *dat
 	continue;
 
       if (operand_equal_p (base, cand->iv->base, 0)
-	  && operand_equal_p (step, cand->iv->step, 0))
+	  && operand_equal_p (step, cand->iv->step, 0)
+          && (TYPE_PRECISION (TREE_TYPE (base))
+              == TYPE_PRECISION (TREE_TYPE (cand->iv->base))))
 	break;
     }
 
@@ -3811,6 +3841,9 @@ get_computation_cost_at (struct ivopts_d
 					 &offset, depends_on));
     }
 
+  /* Loop invariant computation.  */
+  cost.cost /= avg_loop_niter (data->current_loop);
+
   /* If we are after the increment, the value of the candidate is higher by
      one iteration.  */
   stmt_is_after_inc = stmt_after_increment (data->current_loop, cand, at);
@@ -3841,7 +3874,7 @@ get_computation_cost_at (struct ivopts_d
       are added once to the variable, if present.  */
   if (var_present && (symbol_present || offset))
     cost.cost += add_cost (TYPE_MODE (ctype), speed)
-		 / AVG_LOOP_NITER (data->current_loop);
+		 / avg_loop_niter (data->current_loop);
 
   /* Having offset does not affect runtime cost in case it is added to
      symbol, but it increases complexity.  */
@@ -3911,6 +3944,7 @@ determine_use_iv_cost_generic (struct iv
     }
 
   cost = get_computation_cost (data, use, cand, false, &depends_on, NULL);
+
   set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE);
 
   return !infinite_cost_p (cost);
@@ -4056,20 +4090,16 @@ may_eliminate_iv (struct ivopts_data *da
   /* If not, and if this is the only possible exit of the loop, see whether
      we can get a conservative estimate on the number of iterations of the
      entire loop and compare against that instead.  */
-  else if (loop_only_exit_p (loop, exit))
+  else
     {
       double_int period_value, max_niter;
       if (!estimated_loop_iterations (loop, true, &max_niter))
 	return false;
       period_value = tree_to_double_int (period);
-      if (double_int_ucmp (max_niter, period_value) >= 0)
+      if (double_int_ucmp (max_niter, period_value) > 0)
 	return false;
     }
 
-  /* Otherwise, punt.  */
-  else
-    return false;
-
   cand_value_at (loop, cand, use->stmt, nit, &bnd);
 
   *bound = aff_combination_to_tree (&bnd);
@@ -4106,7 +4136,7 @@ determine_use_iv_cost_condition (struct 
       elim_cost = force_var_cost (data, bound, &depends_on_elim);
       /* The bound is a loop invariant, so it will be only computed
 	 once.  */
-      elim_cost.cost /= AVG_LOOP_NITER (data->current_loop);
+      elim_cost.cost /= avg_loop_niter (data->current_loop);
     }
   else
     elim_cost = infinite_cost;
@@ -4353,7 +4383,7 @@ determine_iv_cost (struct ivopts_data *d
   cost_base = force_var_cost (data, base, NULL);
   cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
 
-  cost = cost_step + cost_base.cost / AVG_LOOP_NITER (current_loop);
+  cost = cost_step + cost_base.cost / avg_loop_niter (data->current_loop);
 
   /* Prefer the original ivs unless we may gain something by replacing it.
      The reason is to make debugging simpler; so this is not relevant for
@@ -4541,7 +4571,7 @@ iv_ca_set_remove_invariants (struct iv_c
     {
       ivs->n_invariant_uses[iid]--;
       if (ivs->n_invariant_uses[iid] == 0)
-	ivs->n_regs--;
+        ivs->n_regs--;
     }
 }
 
@@ -4596,7 +4626,7 @@ iv_ca_set_add_invariants (struct iv_ca *
     {
       ivs->n_invariant_uses[iid]++;
       if (ivs->n_invariant_uses[iid] == 1)
-	ivs->n_regs++;
+        ivs->n_regs++;
     }
 }
 
@@ -4871,8 +4901,21 @@ iv_ca_dump (struct ivopts_data *data, FI
   unsigned i;
   comp_cost cost = iv_ca_cost (ivs);
 
-  fprintf (file, "  cost %d (complexity %d)\n", cost.cost, cost.complexity);
-  bitmap_print (file, ivs->cands, "  candidates ","\n");
+  fprintf (file, "  cost: %d (complexity %d)\n", cost.cost, cost.complexity);
+  fprintf (file, "  cand_cost: %d\n  cand_use_cost: %d (complexity %d)\n",
+           ivs->cand_cost, ivs->cand_use_cost.cost, ivs->cand_use_cost.complexity);
+  bitmap_print (file, ivs->cands, "  candidates: ","\n");
+
+   for (i = 0; i < ivs->upto; i++)
+    {
+      struct iv_use *use = iv_use (data, i);
+      struct cost_pair *cp = iv_ca_cand_for_use (ivs, use);
+      if (cp)
+        fprintf (file, "   use:%d --> iv_cand:%d, cost=(%d,%d)\n",
+                 use->id, cp->cand->id, cp->cost.cost, cp->cost.complexity);
+      else
+        fprintf (file, "   use:%d --> ??\n", use->id);
+    }
 
   for (i = 1; i <= data->max_inv_id; i++)
     if (ivs->n_invariant_uses[i])
@@ -4880,7 +4923,7 @@ iv_ca_dump (struct ivopts_data *data, FI
 	fprintf (file, "%s%d", pref, i);
 	pref = ", ";
       }
-  fprintf (file, "\n");
+  fprintf (file, "\n\n");
 }
 
 /* Try changing candidate in IVS to CAND for each use.  Return cost of the
@@ -4890,7 +4933,7 @@ iv_ca_dump (struct ivopts_data *data, FI
 static comp_cost
 iv_ca_extend (struct ivopts_data *data, struct iv_ca *ivs,
 	      struct iv_cand *cand, struct iv_ca_delta **delta,
-	      unsigned *n_ivs)
+	      unsigned *n_ivs, bool min_ncand)
 {
   unsigned i;
   comp_cost cost;
@@ -4914,8 +4957,8 @@ iv_ca_extend (struct ivopts_data *data, 
       if (!iv_ca_has_deps (ivs, new_cp))
 	continue;
 
-      if (!cheaper_cost_pair (new_cp, old_cp))
-	continue;
+      if (!min_ncand && !cheaper_cost_pair (new_cp, old_cp))
+        continue;
 
       *delta = iv_ca_delta_add (use, old_cp, new_cp, *delta);
     }
@@ -5110,7 +5153,8 @@ try_add_cand_for (struct ivopts_data *da
 	continue;
 
       iv_ca_set_cp (data, ivs, use, cp);
-      act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL);
+      act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL,
+                               true);
       iv_ca_set_no_cp (data, ivs, use);
       act_delta = iv_ca_delta_add (use, NULL, cp, act_delta);
 
@@ -5143,7 +5187,7 @@ try_add_cand_for (struct ivopts_data *da
 
 	  act_delta = NULL;
 	  iv_ca_set_cp (data, ivs, use, cp);
-	  act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL);
+	  act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL, true);
 	  iv_ca_set_no_cp (data, ivs, use);
 	  act_delta = iv_ca_delta_add (use, iv_ca_cand_for_use (ivs, use),
 				       cp, act_delta);
@@ -5203,7 +5247,7 @@ try_improve_iv_set (struct ivopts_data *
       if (iv_ca_cand_used_p (ivs, cand))
 	continue;
 
-      acost = iv_ca_extend (data, ivs, cand, &act_delta, &n_ivs);
+      acost = iv_ca_extend (data, ivs, cand, &act_delta, &n_ivs, false);
       if (!act_delta)
 	continue;
 
@@ -5330,7 +5374,6 @@ create_new_iv (struct ivopts_data *data,
 
       /* Rewrite the increment so that it uses var_before directly.  */
       find_interesting_uses_op (data, cand->var_after)->selected = cand;
-
       return;
     }
 
@@ -5358,8 +5401,18 @@ create_new_ivs (struct ivopts_data *data
       cand = iv_cand (data, i);
       create_new_iv (data, cand);
     }
-}
 
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\nSelected IV set: \n");
+      EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
+        {
+          cand = iv_cand (data, i);
+          dump_cand (dump_file, cand);
+        }
+      fprintf (dump_file, "\n");
+    }
+}
 
 /* Rewrites USE (definition of iv used in a nonlinear expression)
    using candidate CAND.  */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+#define TYPE char*
+
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           dst[x] = ( src1[x] + src2[x] + 1 ) >> 1;
+       }
+}
+
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#define TYPE char*
+
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#define TYPE char*
+
+void foo (int i_width, char* dst, char* src1, char* src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           *((TYPE)dst) = ( *((TYPE)src1) + *((TYPE)src2) + 1 ) >> 1;
+	   dst+=sizeof(TYPE);
+	   src1+=sizeof(TYPE);
+	   src2+=sizeof(TYPE);
+       }
+} 
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#ifndef TYPE
+#define TYPE char*
+#endif
+
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      TYPE dstn= dst + i_width;
+       for( ; dst < dstn; )
+       {
+           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */

[-- Attachment #3: ivopts_latest_part2.p --]
[-- Type: text/x-pascal, Size: 25324 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 159362)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -156,6 +156,14 @@ struct cost_pair
 			   the new bound to compare with.  */
 };
 
+/* The use position for iv.  */
+enum iv_use_pos
+{
+  IU_UNKNOWN,
+  IU_OUTSIDE_LOOP_ONLY,
+  IU_INSIDE_LOOP
+};
+
 /* Use.  */
 struct iv_use
 {
@@ -173,6 +181,8 @@ struct iv_use
 
   struct iv_cand *selected;
 			/* The selected candidate.  */
+  enum iv_use_pos use_pos;
+                        /* The use position.  */
 };
 
 /* The position where the iv is computed.  */
@@ -335,6 +345,8 @@ struct iv_ca_delta
 
 static VEC(tree,heap) *decl_rtl_to_reset;
 
+static struct pointer_map_t *inverted_stmt_map;
+
 /* Number of uses recorded in DATA.  */
 
 static inline unsigned
@@ -1102,6 +1114,7 @@ record_use (struct ivopts_data *data, tr
   use->stmt = stmt;
   use->op_p = use_p;
   use->related_cands = BITMAP_ALLOC (NULL);
+  use->use_pos = IU_UNKNOWN;
 
   /* To avoid showing ssa name in the dumps, if it was not reset by the
      caller.  */
@@ -1142,10 +1155,13 @@ record_invariant (struct ivopts_data *da
   bitmap_set_bit (data->relevant, SSA_NAME_VERSION (op));
 }
 
-/* Checks whether the use OP is interesting and if so, records it.  */
+
+/* Checks whether the use OP is interesting and if so, records it.
+   USE_POS indicates where the use comes from.  */
 
 static struct iv_use *
-find_interesting_uses_op (struct ivopts_data *data, tree op)
+find_interesting_uses_op (struct ivopts_data *data, tree op,
+                          enum iv_use_pos use_pos)
 {
   struct iv *iv;
   struct iv *civ;
@@ -1164,6 +1180,10 @@ find_interesting_uses_op (struct ivopts_
       use = iv_use (data, iv->use_id);
 
       gcc_assert (use->type == USE_NONLINEAR_EXPR);
+      gcc_assert (use->use_pos != IU_UNKNOWN);
+
+      if (use->use_pos == IU_OUTSIDE_LOOP_ONLY)
+        use->use_pos = use_pos;
       return use;
     }
 
@@ -1183,6 +1203,7 @@ find_interesting_uses_op (struct ivopts_
 
   use = record_use (data, NULL, civ, stmt, USE_NONLINEAR_EXPR);
   iv->use_id = use->id;
+  use->use_pos = use_pos;
 
   return use;
 }
@@ -1260,17 +1281,19 @@ find_interesting_uses_cond (struct ivopt
 {
   tree *var_p, *bound_p;
   struct iv *var_iv, *civ;
+  struct iv_use *use;
 
   if (!extract_cond_operands (data, stmt, &var_p, &bound_p, &var_iv, NULL))
     {
-      find_interesting_uses_op (data, *var_p);
-      find_interesting_uses_op (data, *bound_p);
+      find_interesting_uses_op (data, *var_p, IU_INSIDE_LOOP);
+      find_interesting_uses_op (data, *bound_p, IU_INSIDE_LOOP);
       return;
     }
 
   civ = XNEW (struct iv);
   *civ = *var_iv;
-  record_use (data, NULL, civ, stmt, USE_COMPARE);
+  use = record_use (data, NULL, civ, stmt, USE_COMPARE);
+  use->use_pos = IU_INSIDE_LOOP;
 }
 
 /* Returns true if expression EXPR is obviously invariant in LOOP,
@@ -1433,11 +1456,13 @@ idx_record_use (tree base, tree *idx,
 		void *vdata)
 {
   struct ivopts_data *data = (struct ivopts_data *) vdata;
-  find_interesting_uses_op (data, *idx);
+  find_interesting_uses_op (data, *idx, IU_INSIDE_LOOP);
   if (TREE_CODE (base) == ARRAY_REF || TREE_CODE (base) == ARRAY_RANGE_REF)
     {
-      find_interesting_uses_op (data, array_ref_element_size (base));
-      find_interesting_uses_op (data, array_ref_low_bound (base));
+      find_interesting_uses_op (data, array_ref_element_size (base),
+                                IU_INSIDE_LOOP);
+      find_interesting_uses_op (data, array_ref_low_bound (base),
+                                IU_INSIDE_LOOP);
     }
   return true;
 }
@@ -1603,6 +1628,7 @@ find_interesting_uses_address (struct iv
   tree base = *op_p, step = build_int_cst (sizetype, 0);
   struct iv *civ;
   struct ifs_ivopts_data ifs_ivopts_data;
+  struct iv_use *use;
 
   /* Do not play with volatile memory references.  A bit too conservative,
      perhaps, but safe.  */
@@ -1696,7 +1722,8 @@ find_interesting_uses_address (struct iv
     }
 
   civ = alloc_iv (base, step);
-  record_use (data, op_p, civ, stmt, USE_ADDRESS);
+  use = record_use (data, op_p, civ, stmt, USE_ADDRESS);
+  use->use_pos = IU_INSIDE_LOOP;
   return;
 
 fail:
@@ -1762,7 +1789,7 @@ find_interesting_uses_stmt (struct ivopt
 	  if (REFERENCE_CLASS_P (*rhs))
 	    find_interesting_uses_address (data, stmt, rhs);
 	  else
-	    find_interesting_uses_op (data, *rhs);
+	    find_interesting_uses_op (data, *rhs, IU_INSIDE_LOOP);
 
 	  if (REFERENCE_CLASS_P (*lhs))
 	    find_interesting_uses_address (data, stmt, lhs);
@@ -1803,7 +1830,7 @@ find_interesting_uses_stmt (struct ivopt
       if (!iv)
 	continue;
 
-      find_interesting_uses_op (data, op);
+      find_interesting_uses_op (data, op, IU_INSIDE_LOOP);
     }
 }
 
@@ -1822,7 +1849,12 @@ find_interesting_uses_outside (struct iv
       phi = gsi_stmt (psi);
       def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
       if (is_gimple_reg (def))
-	find_interesting_uses_op (data, def);
+        {
+          if (gimple_phi_num_args (phi) == 1)
+            find_interesting_uses_op (data, def, IU_OUTSIDE_LOOP_ONLY);
+          else
+            find_interesting_uses_op (data, def, IU_INSIDE_LOOP);
+        }
     }
 }
 
@@ -3911,6 +3943,10 @@ determine_use_iv_cost_generic (struct iv
     }
 
   cost = get_computation_cost (data, use, cand, false, &depends_on, NULL);
+
+  if (use->use_pos == IU_OUTSIDE_LOOP_ONLY && !infinite_cost_p (cost))
+    cost.cost /= AVG_LOOP_NITER (data->current_loop);
+
   set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE);
 
   return !infinite_cost_p (cost);
@@ -4541,7 +4577,7 @@ iv_ca_set_remove_invariants (struct iv_c
     {
       ivs->n_invariant_uses[iid]--;
       if (ivs->n_invariant_uses[iid] == 0)
-	ivs->n_regs--;
+        ivs->n_regs--;
     }
 }
 
@@ -4596,7 +4632,7 @@ iv_ca_set_add_invariants (struct iv_ca *
     {
       ivs->n_invariant_uses[iid]++;
       if (ivs->n_invariant_uses[iid] == 1)
-	ivs->n_regs++;
+        ivs->n_regs++;
     }
 }
 
@@ -4915,7 +4951,7 @@ iv_ca_extend (struct ivopts_data *data, 
 	continue;
 
       if (!cheaper_cost_pair (new_cp, old_cp))
-	continue;
+        continue;
 
       *delta = iv_ca_delta_add (use, old_cp, new_cp, *delta);
     }
@@ -5293,6 +5329,67 @@ find_optimal_iv_set (struct ivopts_data 
   return set;
 }
 
+/* Returns a statement that undoes the operation in INCREMENT
+   on value OLD_VAL.  */
+
+static gimple
+get_inverted_increment_1 (gimple increment, tree old_val)
+{
+  tree new_assign_def;
+  gimple inverted_increment;
+  enum tree_code incr_op;
+  tree step;
+
+  new_assign_def = make_ssa_name (SSA_NAME_VAR (old_val), NULL);
+  step = unshare_expr (gimple_assign_rhs2 (increment));
+  incr_op = gimple_assign_rhs_code (increment);
+  if (incr_op == PLUS_EXPR)
+    incr_op = MINUS_EXPR;
+  else
+    {
+      gcc_assert (incr_op == MINUS_EXPR);
+      incr_op = PLUS_EXPR;
+    }
+  inverted_increment
+      = gimple_build_assign_with_ops (incr_op, new_assign_def,
+                                      old_val, step);
+
+  return inverted_increment;
+}
+
+/* Returns a statement that undoes the operation in INCREMENT
+   on the result of phi NEW_PHI.  */
+
+static gimple
+get_inverted_increment (gimple reaching_increment, gimple new_phi)
+{
+  basic_block bb;
+  gimple_stmt_iterator gsi;
+  gimple inverted_increment;
+  tree phi_result;
+  void **slot;
+
+  gcc_assert (gimple_assign_lhs (reaching_increment)
+              == PHI_ARG_DEF (new_phi, 0));
+
+  if (!inverted_stmt_map)
+    inverted_stmt_map = pointer_map_create ();
+
+  slot = pointer_map_insert (inverted_stmt_map, new_phi);
+  if (*slot)
+    return (gimple) *slot;
+
+  phi_result = PHI_RESULT (new_phi);
+  bb = gimple_bb (new_phi);
+  gsi = gsi_after_labels (bb);
+
+  inverted_increment = get_inverted_increment_1 (reaching_increment,
+                                                 phi_result);
+  gsi_insert_before (&gsi, inverted_increment, GSI_NEW_STMT);
+  *slot = (void *) inverted_increment;
+  return inverted_increment;
+}
+
 /* Creates a new induction variable corresponding to CAND.  */
 
 static void
@@ -5329,8 +5426,8 @@ create_new_iv (struct ivopts_data *data,
       name_info (data, cand->var_after)->preserve_biv = true;
 
       /* Rewrite the increment so that it uses var_before directly.  */
-      find_interesting_uses_op (data, cand->var_after)->selected = cand;
-
+      find_interesting_uses_op (data, cand->var_after,
+                                IU_INSIDE_LOOP)->selected = cand;
       return;
     }
 
@@ -5360,6 +5457,501 @@ create_new_ivs (struct ivopts_data *data
     }
 }
 
+/* Callback function in the tree walk to fix up old live out
+   names to the loop exit phis' results.  */
+
+static tree
+fixup_use (tree *op,
+           int *unused ATTRIBUTE_UNUSED,
+           void *data)
+{
+  struct pointer_map_t *nm_to_def_map
+      = (struct pointer_map_t *) data;
+
+  if (TREE_CODE (*op) == SSA_NAME && is_gimple_reg (*op))
+    {
+      void **slot;
+      slot = pointer_map_contains (nm_to_def_map, *op);
+      if (slot)
+        {
+          enum gimple_code gc;
+          gimple def = (gimple) (*slot);
+          gc = gimple_code (def);
+          if (gc == GIMPLE_PHI)
+            *op = PHI_RESULT (def);
+          else
+            *op = gimple_assign_lhs (def);
+        }
+    }
+
+  return 0;
+}
+
+/* Callback function in the tree walk to collect used ssa names
+   in the tree.  */
+
+static tree
+collect_ssa_names (tree *op,
+                   int *unused ATTRIBUTE_UNUSED,
+                   void *data)
+{
+  VEC(tree, heap) ** used_names = (VEC(tree, heap) **) data;
+  if (TREE_CODE (*op) == SSA_NAME && is_gimple_reg (*op))
+    VEC_safe_push (tree, heap, *used_names, *op);
+
+  return 0;
+}
+
+/* The function fixes up live out ssa names used in tree *VAL,
+   replacing them with the matching loop exit phis' results.  */
+
+static void
+fixup_iv_out_val (tree *val, struct pointer_map_t *nm_to_phi_map)
+{
+  walk_tree (val, fixup_use, nm_to_phi_map, NULL);
+}
+
+/* Returns the iv update statement if USE's cand variable is the version
+   before the update (causing an overlapping live range); else NULL.  */
+
+static gimple
+cause_overlapping_lr (struct ivopts_data *data,
+                      tree nm_used, struct iv_use *use,
+                      basic_block use_bb)
+{
+  tree selected_iv_nm;
+  edge e;
+  gimple increment;
+  enum tree_code incr_op;
+
+  selected_iv_nm = var_at_stmt (data->current_loop,
+                                use->selected,
+                                use->stmt);
+
+  if (nm_used != selected_iv_nm)
+    return NULL;
+
+  if (selected_iv_nm == use->selected->var_after)
+    return NULL;
+
+  /* Check if def of var_after reaches use_bb.  */
+  gcc_assert (single_pred_p (use_bb));
+  e = single_pred_edge (use_bb);
+
+  increment = SSA_NAME_DEF_STMT (use->selected->var_after);
+
+  if (e->src != gimple_bb (increment))
+    return NULL;
+
+  /* Only handle simple increments.  */
+  if (gimple_code (increment) != GIMPLE_ASSIGN)
+    return NULL;
+
+  incr_op = gimple_assign_rhs_code (increment);
+  if (incr_op != PLUS_EXPR && incr_op != MINUS_EXPR)
+    return NULL;
+
+  if (!CONSTANT_CLASS_P (gimple_assign_rhs2 (increment)))
+    return NULL;
+
+  return increment;
+}
+
+
+/* Returns the loop closing phi for LIVE_OUT_IV in basic block TGT_BB.
+   IV_UPDATE_STMT is the update statement for LIVE_OUT_IV, and
+   *FOR_UPDATED_VAL is set to true if the argument of the phi is defined
+   by IV_UPDATE_STMT.  */
+
+static gimple
+find_closing_phi (basic_block tgt_bb, tree live_out_iv,
+                  gimple iv_update_stmt, bool *for_updated_val)
+{
+  gimple_stmt_iterator psi;
+  gimple phi = NULL;
+
+  *for_updated_val = false;
+
+  /* Now try to find the existing matching phi.  */
+  for (psi = gsi_start_phis (tgt_bb); !gsi_end_p (psi); gsi_next (&psi))
+    {
+      gimple p;
+      p = gsi_stmt (psi);
+
+      if (SSA_NAME_VAR (PHI_ARG_DEF (p, 0))
+          == SSA_NAME_VAR (live_out_iv))
+        {
+          phi = p;
+          break;
+        }
+    }
+
+  if (!phi)
+    return NULL;
+
+  if (PHI_ARG_DEF (phi, 0) == live_out_iv)
+    {
+      *for_updated_val = false;
+      /* Found exact match.  */
+      return phi;
+    }
+  else if (iv_update_stmt
+           && PHI_ARG_DEF (phi, 0) == gimple_assign_lhs (iv_update_stmt))
+    {
+      *for_updated_val = true;
+      return phi;
+    }
+
+  return NULL;
+}
+
+
+/* The function ensures closed SSA form when moving the statement of use
+   USE across the loop exit.  LIVE_OUT_NM is the original ssa name that is
+   live out, TGT_BB is the destination bb of the code motion, and
+   NM_TO_DEF_MAP maps the original name to the result of the closing phi.
+
+   Scenario 1:
+   ----------------
+   Loop:
+
+   Loop_exit:
+
+     closed_iv_val = PHI (live_out_iv)
+
+     Uses of (live_out_iv) get replaced with closed_iv_val
+
+
+
+   Scenario 2:
+   ----------------
+   Loop:
+
+     updated_iv_val = live_out_iv + 1
+   Loop_exit:
+
+     closed_iv_val = PHI (updated_iv_val)
+     updated_iv_val2 = closed_iv_val - 1
+
+     Uses of live_out_iv get replaced with updated_iv_val2
+*/
+
+static gimple
+ensure_closed_ssa_form_for (struct ivopts_data *data,
+                            tree live_out_nm, basic_block tgt_bb,
+                            struct iv_use *use,
+                            struct pointer_map_t *nm_to_def_map)
+{
+  gimple closing_phi = NULL;
+  bool closing_phi_for_updated_val = false;
+
+  gimple def_stmt, new_def_stmt = NULL;
+  basic_block def_bb;
+  gimple iv_update_stmt;
+  void **slot;
+
+  def_stmt = SSA_NAME_DEF_STMT (live_out_nm);
+  def_bb = gimple_bb (def_stmt);
+
+  if (!def_bb
+      || flow_bb_inside_loop_p (def_bb->loop_father, tgt_bb))
+    return NULL;
+
+  iv_update_stmt
+      = cause_overlapping_lr (data, live_out_nm, use, tgt_bb);
+
+  gcc_assert (!iv_update_stmt
+              || gimple_code (iv_update_stmt) == GIMPLE_ASSIGN);
+
+  closing_phi = find_closing_phi (tgt_bb, live_out_nm,
+                                  iv_update_stmt, &closing_phi_for_updated_val);
+
+  /* No closing phi is found.  */
+  if (!closing_phi)
+    {
+      edge e;
+      edge_iterator ei;
+
+      closing_phi = create_phi_node (live_out_nm, tgt_bb);
+      create_new_def_for (gimple_phi_result (closing_phi), closing_phi,
+                          gimple_phi_result_ptr (closing_phi));
+      gcc_assert (single_pred_p (tgt_bb));
+      if (!iv_update_stmt)
+        {
+          FOR_EACH_EDGE (e, ei, tgt_bb->preds)
+              add_phi_arg (closing_phi, live_out_nm, e, UNKNOWN_LOCATION);
+          new_def_stmt = closing_phi;
+        }
+      else
+        {
+          FOR_EACH_EDGE (e, ei, tgt_bb->preds)
+              add_phi_arg (closing_phi, gimple_assign_lhs (iv_update_stmt),
+                           e, UNKNOWN_LOCATION);
+          /* Now make the value adjustment.  */
+          new_def_stmt = get_inverted_increment (iv_update_stmt, closing_phi);
+        }
+    }
+  else if (!closing_phi_for_updated_val)
+    /* Scenario 1 above.  */
+    new_def_stmt = closing_phi;
+  else
+    {
+      /* Scenario 2 above.  */
+      gcc_assert (iv_update_stmt);
+      new_def_stmt = get_inverted_increment (iv_update_stmt, closing_phi);
+    }
+
+  /* Now map it.  */
+  slot = pointer_map_insert (nm_to_def_map, live_out_nm);
+  *slot = (void *) new_def_stmt;
+
+  return (new_def_stmt != closing_phi ? new_def_stmt : NULL);
+}
+
+/* The function ensures closed ssa form for all names used in
+   REPLACED_IV_OUT_VAL. TGT_BB is the target bb where the new
+   computation is going to be, USE is the nonlinear use to be
+   rewritten (at loop exits), and *FIXED_UP_VAL holds the live out
+   value after name fixup. It returns the inverted iv update
+   statement if it is created.  */
+
+static gimple
+ensure_closed_ssa_form (struct ivopts_data *data,
+                        basic_block tgt_bb,
+                        struct iv_use *use,
+                        tree replaced_iv_out_val,
+                        tree *fixed_up_val)
+{
+  unsigned i;
+  tree nm;
+  VEC(tree, heap) *used_ssa_names = NULL;
+  struct pointer_map_t *nm_to_def_map = NULL;
+  gimple inverted_incr = NULL;
+
+  nm_to_def_map = pointer_map_create ();
+  *fixed_up_val = unshare_expr (replaced_iv_out_val);
+  walk_tree_without_duplicates (fixed_up_val,
+                                collect_ssa_names, &used_ssa_names);
+
+  for (i = 0;
+       VEC_iterate (tree, used_ssa_names, i, nm); i++)
+    {
+      gimple inv_incr;
+      if ((inv_incr
+           = ensure_closed_ssa_form_for (data, nm, tgt_bb,
+                                         use, nm_to_def_map)))
+        {
+          gcc_assert (!inverted_incr);
+          inverted_incr = inv_incr;
+        }
+    }
+
+  /* Now fix up the references in val.  */
+  fixup_iv_out_val (fixed_up_val, nm_to_def_map);
+  pointer_map_destroy (nm_to_def_map);
+  return inverted_incr;
+}
+
+/* The function returns true if it is possible to sink final value
+   computation for REPLACED_IV_OUT_NAME at loop exits.  */
+
+static bool
+can_compute_final_value_at_exits_p (struct ivopts_data *data,
+                                    tree replaced_iv_out_name)
+{
+  imm_use_iterator iter;
+  use_operand_p use_p;
+  gimple use_stmt;
+
+  /* Walk through all nonlinear uses in all loop exit blocks
+     to see if the sinking transformation is doable.  */
+
+  FOR_EACH_IMM_USE_FAST (use_p, iter, replaced_iv_out_name)
+    {
+      basic_block exit_bb;
+      edge e;
+      edge_iterator ei;
+      bool found_exit_edge = false;
+
+      use_stmt = USE_STMT (use_p);
+      exit_bb = gimple_bb (use_stmt);
+
+      /* The use_stmt is another iv update
+         statement that also defines a liveout value and
+         has been removed.  */
+      if (!exit_bb)
+        continue;
+
+      if (flow_bb_inside_loop_p (data->current_loop, exit_bb))
+        continue;
+
+      if (single_pred_p (exit_bb))
+        continue;
+
+      FOR_EACH_EDGE (e, ei, exit_bb->preds)
+        {
+          if (!flow_bb_inside_loop_p (data->current_loop,
+                                      e->src))
+            continue;
+          /* Can not split the edge.  */
+          if (e->flags & EDGE_ABNORMAL)
+            return false;
+
+          /* Do not handle the case where the exit bb has
+             multiple incoming exit edges from the same loop.  */
+          if (found_exit_edge)
+            return false;
+
+          found_exit_edge = true;
+        }
+      if (!found_exit_edge)
+        return false;
+    }
+  return true;
+}
+
+/* The function splits the loop exit edge of DATA->current_loop that
+   targets EXIT_BB, and returns the newly created bb.
+   REPLACED_IV_OUT_NAME is the original ssa name that is live out, and
+   the new use statement (new phi) will be stored in *USE_STMT.  */
+
+static basic_block
+split_exit_edge (struct ivopts_data* data, basic_block exit_bb,
+                 tree replaced_iv_out_name, gimple *use_stmt)
+{
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, exit_bb->preds)
+    {
+      edge exit_edge;
+      gimple_stmt_iterator psi;
+      gimple new_use_phi = NULL;
+
+      if (!flow_bb_inside_loop_p (data->current_loop, e->src))
+        continue;
+
+      gcc_assert (!(e->flags & EDGE_ABNORMAL));
+      exit_bb = split_loop_exit_edge (e);
+      exit_edge = single_pred_edge (exit_bb);
+
+      /* Now update the use stmt.  */
+      for (psi = gsi_start_phis (exit_bb);
+           !gsi_end_p (psi); gsi_next (&psi))
+        {
+          tree phi_arg;
+          gimple new_phi = gsi_stmt (psi);
+
+          phi_arg
+              = PHI_ARG_DEF_FROM_EDGE (new_phi, exit_edge);
+          if (phi_arg == replaced_iv_out_name)
+            {
+              new_use_phi = new_phi;
+              break;
+            }
+        }
+      gcc_assert (new_use_phi);
+      *use_stmt = new_use_phi;
+
+      /* There is only one exit edge to split.  */
+      break;
+    }
+
+  return exit_bb;
+}
+
+/* For a nonlinear use USE that is used outside the loop DATA->current_loop
+   only, try to evaluate the live out value at the exits of the loop.
+   REPLACED_IV_OUT_NAME is the original ssa name that is live out, and
+   REPLACED_IV_OUT_VAL is the expression (in terms of the selected iv cand)
+   to evaluate the live out value. The function tries to sink the computation
+   of replaced_iv_out_val into loop exits, and returns true if successful.  */
+
+static bool
+compute_final_value_at_exits (struct ivopts_data *data,
+                              struct iv_use *use,
+                              tree replaced_iv_out_name,
+                              tree replaced_iv_out_val)
+{
+  imm_use_iterator iter;
+  gimple use_stmt;
+  struct iv* replaced_iv;
+
+  if (!can_compute_final_value_at_exits_p (data, replaced_iv_out_name))
+    return false;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, iter, replaced_iv_out_name)
+    {
+      basic_block exit_bb;
+      gimple new_assign;
+      gimple_stmt_iterator gsi, bsi;
+      tree phi_rslt, new_assign_rhs;
+      tree fixed_up_val;
+      gimple inverted_increment;
+
+      exit_bb = gimple_bb (use_stmt);
+
+      /* The use_stmt is another iv update
+         statement that also defines a liveout value and
+         has been removed.  */
+      if (!exit_bb)
+        continue;
+
+      if (is_gimple_debug (use_stmt))
+        continue;
+
+      if (flow_bb_inside_loop_p (data->current_loop, exit_bb))
+        continue;
+
+      if (!single_pred_p (exit_bb))
+        exit_bb = split_exit_edge (data, exit_bb,
+                                   replaced_iv_out_name, &use_stmt);
+
+      gcc_assert (single_pred_p (exit_bb));
+
+      inverted_increment
+          = ensure_closed_ssa_form (data, exit_bb, use,
+                                    replaced_iv_out_val,
+                                    &fixed_up_val);
+
+      gcc_assert (gimple_code (use_stmt) == GIMPLE_PHI);
+      gsi = gsi_for_stmt (use_stmt);
+      phi_rslt = PHI_RESULT (use_stmt);
+      bsi = (inverted_increment
+             ? gsi_for_stmt (inverted_increment)
+             : gsi_after_labels (exit_bb));
+
+      /* Now convert the original loop exit phi (for closed SSA form)
+         into an assignment statement.  */
+      remove_phi_node (&gsi, false);
+      new_assign_rhs = force_gimple_operand_gsi (&bsi, fixed_up_val,
+                                                 false, NULL_TREE,
+                                                 (inverted_increment == NULL),
+                                                 (inverted_increment == NULL
+                                                  ? GSI_SAME_STMT
+                                                  : GSI_CONTINUE_LINKING));
+      new_assign = gimple_build_assign (phi_rslt, new_assign_rhs);
+      if (inverted_increment)
+        gsi_insert_after (&bsi, new_assign, GSI_SAME_STMT);
+      else
+        gsi_insert_before (&bsi, new_assign, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        {
+          fprintf (dump_file, "Sinking computation into exit bb %d\n",
+                   exit_bb->index);
+          print_gimple_stmt (dump_file, new_assign, 0, 0);
+          fprintf (dump_file, "\n");
+        }
+    }
+
+  /* Now the original stmt that defines the liveout value can be removed.  */
+
+  replaced_iv = get_iv (data, replaced_iv_out_name);
+  gcc_assert (replaced_iv);
+  replaced_iv->have_use_for = false;
+
+  return true;
+}
 
 /* Rewrites USE (definition of iv used in a nonlinear expression)
    using candidate CAND.  */
@@ -5455,6 +6047,11 @@ rewrite_use_nonlinear_expr (struct ivopt
       gcc_unreachable ();
     }
 
+  if (use->use_pos == IU_OUTSIDE_LOOP_ONLY)
+    {
+      if (compute_final_value_at_exits (data, use, tgt, comp))
+        return;
+    }
   op = force_gimple_operand_gsi (&bsi, comp, false, SSA_NAME_VAR (tgt),
 				 true, GSI_SAME_STMT);
 
@@ -5768,6 +6365,11 @@ tree_ssa_iv_optimize_finalize (struct iv
   VEC_free (tree, heap, decl_rtl_to_reset);
   VEC_free (iv_use_p, heap, data->iv_uses);
   VEC_free (iv_cand_p, heap, data->iv_candidates);
+  if (inverted_stmt_map)
+    {
+      pointer_map_destroy (inverted_stmt_map);
+      inverted_stmt_map = NULL;
+    }
 }
 
 /* Optimizes the LOOP.  Returns true if anything changed.  */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_5_sink.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_5_sink.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_5_sink.c	(revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2  -m64 -fdump-tree-ivopts-details" } */
+int inner_longest_match(char *scan, char *match, char *strend)
+{
+  char *start_scan = scan;
+  do {
+  } while (*++scan == *++match && *++scan == *++match &&
+           *++scan == *++match && *++scan == *++match &&
+           *++scan == *++match && *++scan == *++match &&
+           *++scan == *++match && *++scan == *++match &&
+           scan < strend);
+
+  return scan - start_scan;
+}
+
+/* { dg-final { scan-tree-dump-times "Sinking" 7 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
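To make the intent of the sinking transformation concrete, here is a source-level sketch of the before/after shapes the patch aims for (hypothetical function names, not part of the patch; the real transformation operates on GIMPLE, this is only an analogy):

```c
#include <assert.h>

/* Before the transformation: the original IV 'n' is updated on every
   iteration even though its value is only used after the loop, so the
   loop carries an extra induction variable.  */
static long
count_up_to (const char *scan, const char *strend)
{
  long n = 0;
  while (scan < strend)
    {
      scan++;
      n++;              /* IV update kept alive only for the live-out use.  */
    }
  return n;
}

/* After the transformation: the loop runs on the selected pointer IV
   alone, and the final value of 'n' is materialized once at the loop
   exit instead of being maintained inside the loop.  */
static long
count_up_to_sunk (const char *scan, const char *strend)
{
  const char *start = scan;
  while (scan < strend)
    scan++;
  return scan - start;  /* live-out value computed outside the loop.  */
}
```

Both functions compute the same result; the second form is what the patch tries to produce when a replaced IV has only uses outside the loop.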

[-- Attachment #4: ivopts_latest_part3.p --]
[-- Type: text/x-pascal, Size: 15944 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 159362)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -218,6 +218,11 @@ typedef struct iv_cand *iv_cand_p;
 DEF_VEC_P(iv_cand_p);
 DEF_VEC_ALLOC_P(iv_cand_p,heap);
 
+typedef struct version_info *version_info_p;
+DEF_VEC_P(version_info_p);
+DEF_VEC_ALLOC_P(version_info_p,heap);
+
+
 struct ivopts_data
 {
   /* The currently optimized loop.  */
@@ -235,6 +240,9 @@ struct ivopts_data
   /* The array of information for the ssa names.  */
   struct version_info *version_info;
 
+  /* Pseudo version infos for generated loop invariants.  */
+  VEC(version_info_p,heap) *pseudo_version_info;
+
   /* The bitmap of indices in version_info whose value was changed.  */
   bitmap relevant;
 
@@ -250,6 +258,9 @@ struct ivopts_data
   /* The maximum invariant id.  */
   unsigned max_inv_id;
 
+  /* The minimal invariant id for pseudo invariants.  */
+  unsigned min_pseudo_inv_id;
+
   /* Whether to consider just related and important candidates when replacing a
      use.  */
   bool consider_all_candidates;
@@ -283,6 +294,9 @@ struct iv_ca
   /* Total number of registers needed.  */
   unsigned n_regs;
 
+  /* Total number of pseudo invariants.  */
+  unsigned n_pseudos;
+
   /* Total cost of expressing uses.  */
   comp_cost cand_use_cost;
 
@@ -544,7 +558,11 @@ dump_cand (FILE *file, struct iv_cand *c
 static inline struct version_info *
 ver_info (struct ivopts_data *data, unsigned ver)
 {
-  return data->version_info + ver;
+  if (ver < data->min_pseudo_inv_id)
+    return data->version_info + ver;
+  else
+    return VEC_index (version_info_p, data->pseudo_version_info,
+                      ver - data->min_pseudo_inv_id);
 }
 
 /* Returns the info for ssa name NAME.  */
@@ -766,6 +784,8 @@ tree_ssa_iv_optimize_init (struct ivopts
 {
   data->version_info_size = 2 * num_ssa_names;
   data->version_info = XCNEWVEC (struct version_info, data->version_info_size);
+  data->min_pseudo_inv_id = num_ssa_names;
+  data->pseudo_version_info = NULL;
   data->relevant = BITMAP_ALLOC (NULL);
   data->important_candidates = BITMAP_ALLOC (NULL);
   data->max_inv_id = 0;
@@ -1142,6 +1162,23 @@ record_invariant (struct ivopts_data *da
   bitmap_set_bit (data->relevant, SSA_NAME_VERSION (op));
 }
 
+/* Records a pseudo invariant and returns its VERSION_INFO.  */
+
+static struct version_info *
+record_pseudo_invariant (struct ivopts_data *data)
+{
+  struct version_info *info;
+
+  info = XCNEW (struct version_info);
+  info->name = NULL;
+  VEC_safe_push (version_info_p, heap, data->pseudo_version_info, info);
+  info->inv_id
+      = VEC_length (version_info_p, data->pseudo_version_info) - 1
+      + data->min_pseudo_inv_id;
+
+  return info;
+}
+
 /* Checks whether the use OP is interesting and if so, records it.  */
 
 static struct iv_use *
@@ -1822,7 +1859,7 @@ find_interesting_uses_outside (struct iv
       phi = gsi_stmt (psi);
       def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
       if (is_gimple_reg (def))
-	find_interesting_uses_op (data, def);
+        find_interesting_uses_op (data, def);
     }
 }
 
@@ -3684,6 +3721,94 @@ difference_cost (struct ivopts_data *dat
   return force_var_cost (data, aff_combination_to_tree (&aff_e1), depends_on);
 }
 
+/* Returns true if AFF1 and AFF2 are identical.  */
+
+static bool
+compare_aff_trees (aff_tree *aff1, aff_tree *aff2)
+{
+  unsigned i;
+
+  if (aff1->n != aff2->n)
+    return false;
+
+  for (i = 0; i < aff1->n; i++)
+    {
+      if (double_int_cmp (aff1->elts[i].coef, aff2->elts[i].coef, 0) != 0)
+        return false;
+
+      if (!operand_equal_p (aff1->elts[i].val, aff2->elts[i].val, 0))
+        return false;
+    }
+  return true;
+}
+
+/* Returns true if the expression UBASE - RATIO * CBASE requires a new
+   compiler-generated temporary.  */
+
+static bool
+create_loop_invariant_temp (tree ubase, tree cbase, HOST_WIDE_INT ratio)
+{
+  aff_tree ubase_aff, cbase_aff;
+
+  STRIP_NOPS (ubase);
+  STRIP_NOPS (cbase);
+
+  if ((TREE_CODE (ubase) == INTEGER_CST)
+      && (TREE_CODE (cbase) == INTEGER_CST))
+    return false;
+
+  if (((TREE_CODE (ubase) == SSA_NAME)
+       || (TREE_CODE (ubase) == ADDR_EXPR))
+      && (TREE_CODE (cbase) == INTEGER_CST))
+    return false;
+
+  if (((TREE_CODE (cbase) == SSA_NAME)
+       || (TREE_CODE (cbase) == ADDR_EXPR))
+      && (TREE_CODE (ubase) == INTEGER_CST))
+    return false;
+
+  if (ratio == 1)
+    {
+      if (operand_equal_p (ubase, cbase, 0))
+        return false;
+      if (TREE_CODE (ubase) == ADDR_EXPR
+        && TREE_CODE (cbase) == ADDR_EXPR)
+        {
+          tree usym, csym;
+
+          usym = TREE_OPERAND (ubase, 0);
+          csym = TREE_OPERAND (cbase, 0);
+          if (TREE_CODE (usym) == ARRAY_REF)
+            {
+              tree ind = TREE_OPERAND (usym, 1);
+              if (TREE_CODE (ind) == INTEGER_CST
+                  && host_integerp (ind, 0)
+                  && TREE_INT_CST_LOW (ind) == 0)
+                usym = TREE_OPERAND (usym, 0);
+            }
+          if (TREE_CODE (csym) == ARRAY_REF)
+            {
+              tree ind = TREE_OPERAND (csym, 1);
+              if (TREE_CODE (ind) == INTEGER_CST
+                  && host_integerp (ind, 0)
+                  && TREE_INT_CST_LOW (ind) == 0)
+                csym = TREE_OPERAND (csym, 0);
+            }
+          if (usym == csym)
+            return false;
+        }
+      /* Now do more complex comparison.  */
+      tree_to_aff_combination (ubase, TREE_TYPE (ubase), &ubase_aff);
+      tree_to_aff_combination (cbase, TREE_TYPE (cbase), &cbase_aff);
+      if (compare_aff_trees (&ubase_aff, &cbase_aff))
+        return false;
+    }
+
+  return true;
+}
+
+
+
 /* Determines the cost of the computation by that USE is expressed
    from induction variable CAND.  If ADDRESS_P is true, we just need
    to create an address from it, otherwise we want to get it into
@@ -3811,6 +3936,17 @@ get_computation_cost_at (struct ivopts_d
 					 &offset, depends_on));
     }
 
+  /* Loop invariant computation.  */
+  cost.cost /= AVG_LOOP_NITER (data->current_loop);
+
+  if (create_loop_invariant_temp (ubase, cbase, ratio))
+    {
+      struct version_info *pv = record_pseudo_invariant (data);
+      if (!*depends_on)
+        *depends_on = BITMAP_ALLOC (NULL);
+      bitmap_set_bit (*depends_on, pv->inv_id);
+    }
+
   /* If we are after the increment, the value of the candidate is higher by
      one iteration.  */
   stmt_is_after_inc = stmt_after_increment (data->current_loop, cand, at);
@@ -4514,6 +4650,12 @@ cheaper_cost_pair (struct cost_pair *a, 
   return false;
 }
 
+
+/* Pseudo invariants may get commoned, and there is no simple way
+   to estimate that.  Simply weight them down.  */
+
+#define PSEUDO_COMMON_PERC 30
+
 /* Computes the cost field of IVS structure.  */
 
 static void
@@ -4521,7 +4663,10 @@ iv_ca_recount_cost (struct ivopts_data *
 {
   comp_cost cost = ivs->cand_use_cost;
   cost.cost += ivs->cand_cost;
-  cost.cost += ivopts_global_cost_for_size (data, ivs->n_regs);
+  cost.cost += ivopts_global_cost_for_size (data,
+                                            ivs->n_regs
+                                            + (ivs->n_pseudos
+                                               * PSEUDO_COMMON_PERC)/100);
 
   ivs->cost = cost;
 }
@@ -4529,10 +4674,12 @@ iv_ca_recount_cost (struct ivopts_data *
 /* Remove invariants in set INVS to set IVS.  */
 
 static void
-iv_ca_set_remove_invariants (struct iv_ca *ivs, bitmap invs)
+iv_ca_set_remove_invariants (struct ivopts_data *data,
+                             struct iv_ca *ivs, bitmap invs)
 {
   bitmap_iterator bi;
   unsigned iid;
+  unsigned pseudo_id_start = data->min_pseudo_inv_id;
 
   if (!invs)
     return;
@@ -4541,7 +4688,12 @@ iv_ca_set_remove_invariants (struct iv_c
     {
       ivs->n_invariant_uses[iid]--;
       if (ivs->n_invariant_uses[iid] == 0)
-	ivs->n_regs--;
+        {
+          if (iid < pseudo_id_start)
+            ivs->n_regs--;
+          else
+            ivs->n_pseudos--;
+        }
     }
 }
 
@@ -4572,22 +4724,24 @@ iv_ca_set_no_cp (struct ivopts_data *dat
       ivs->n_cands--;
       ivs->cand_cost -= cp->cand->cost;
 
-      iv_ca_set_remove_invariants (ivs, cp->cand->depends_on);
+      iv_ca_set_remove_invariants (data, ivs, cp->cand->depends_on);
     }
 
   ivs->cand_use_cost = sub_costs (ivs->cand_use_cost, cp->cost);
 
-  iv_ca_set_remove_invariants (ivs, cp->depends_on);
+  iv_ca_set_remove_invariants (data, ivs, cp->depends_on);
   iv_ca_recount_cost (data, ivs);
 }
 
 /* Add invariants in set INVS to set IVS.  */
 
 static void
-iv_ca_set_add_invariants (struct iv_ca *ivs, bitmap invs)
+iv_ca_set_add_invariants (struct ivopts_data *data,
+                          struct iv_ca *ivs, bitmap invs)
 {
   bitmap_iterator bi;
   unsigned iid;
+  unsigned pseudo_id_start = data->min_pseudo_inv_id;
 
   if (!invs)
     return;
@@ -4596,7 +4750,12 @@ iv_ca_set_add_invariants (struct iv_ca *
     {
       ivs->n_invariant_uses[iid]++;
       if (ivs->n_invariant_uses[iid] == 1)
-	ivs->n_regs++;
+        {
+          if (iid < pseudo_id_start)
+            ivs->n_regs++;
+          else
+            ivs->n_pseudos++;
+        }
     }
 }
 
@@ -4630,11 +4789,11 @@ iv_ca_set_cp (struct ivopts_data *data, 
 	  ivs->n_cands++;
 	  ivs->cand_cost += cp->cand->cost;
 
-	  iv_ca_set_add_invariants (ivs, cp->cand->depends_on);
+	  iv_ca_set_add_invariants (data, ivs, cp->cand->depends_on);
 	}
 
       ivs->cand_use_cost = add_costs (ivs->cand_use_cost, cp->cost);
-      iv_ca_set_add_invariants (ivs, cp->depends_on);
+      iv_ca_set_add_invariants (data, ivs, cp->depends_on);
       iv_ca_recount_cost (data, ivs);
     }
 }
@@ -4841,9 +5000,13 @@ iv_ca_new (struct ivopts_data *data)
   nw->cands = BITMAP_ALLOC (NULL);
   nw->n_cands = 0;
   nw->n_regs = 0;
+  nw->n_pseudos = 0;
   nw->cand_use_cost = zero_cost;
   nw->cand_cost = 0;
-  nw->n_invariant_uses = XCNEWVEC (unsigned, data->max_inv_id + 1);
+  nw->n_invariant_uses = XCNEWVEC (unsigned,
+                                   data->min_pseudo_inv_id
+                                   + VEC_length (version_info_p,
+                                                 data->pseudo_version_info));
   nw->cost = zero_cost;
 
   return nw;
@@ -4915,7 +5078,7 @@ iv_ca_extend (struct ivopts_data *data, 
 	continue;
 
       if (!cheaper_cost_pair (new_cp, old_cp))
-	continue;
+        continue;
 
       *delta = iv_ca_delta_add (use, old_cp, new_cp, *delta);
     }
@@ -5535,7 +5698,7 @@ rewrite_use_address (struct ivopts_data 
   aff_tree aff;
   gimple_stmt_iterator bsi = gsi_for_stmt (use->stmt);
   tree base_hint = NULL_TREE;
-  tree ref;
+  tree ref, iv;
   bool ok;
 
   ok = get_computation_aff (data->current_loop, use, cand, use->stmt, &aff);
@@ -5556,7 +5719,8 @@ rewrite_use_address (struct ivopts_data 
   if (cand->iv->base_object)
     base_hint = var_at_stmt (data->current_loop, cand, use->stmt);
 
-  ref = create_mem_ref (&bsi, TREE_TYPE (*use->op_p), &aff, base_hint,
+  iv = var_at_stmt (data->current_loop, cand, use->stmt);
+  ref = create_mem_ref (&bsi, TREE_TYPE (*use->op_p), &aff, iv, base_hint,
 			data->speed);
   copy_ref_info (ref, *use->op_p);
   *use->op_p = ref;
@@ -5691,6 +5855,7 @@ free_loop_data (struct ivopts_data *data
   unsigned i, j;
   bitmap_iterator bi;
   tree obj;
+  struct version_info *vi;
 
   if (data->niters)
     {
@@ -5748,6 +5913,14 @@ free_loop_data (struct ivopts_data *data
 
   data->max_inv_id = 0;
 
+  for (i = 0; VEC_iterate (version_info_p,
+                           data->pseudo_version_info, i, vi); i++)
+    free (vi);
+
+  VEC_truncate (version_info_p, data->pseudo_version_info, 0);
+  data->min_pseudo_inv_id = num_ssa_names;
+
+
   for (i = 0; VEC_iterate (tree, decl_rtl_to_reset, i, obj); i++)
     SET_DECL_RTL (obj, NULL_RTX);
 
Index: gcc/tree-flow.h
===================================================================
--- gcc/tree-flow.h	(revision 159362)
+++ gcc/tree-flow.h	(working copy)
@@ -863,7 +863,7 @@ struct mem_address
 
 struct affine_tree_combination;
 tree create_mem_ref (gimple_stmt_iterator *, tree,
-		     struct affine_tree_combination *, tree, bool);
+		     struct affine_tree_combination *, tree, tree, bool);
 rtx addr_for_mem_ref (struct mem_address *, addr_space_t, bool);
 void get_address_description (tree, struct mem_address *);
 tree maybe_fold_tmr (tree);
Index: gcc/tree-ssa-address.c
===================================================================
--- gcc/tree-ssa-address.c	(revision 159362)
+++ gcc/tree-ssa-address.c	(working copy)
@@ -450,6 +450,31 @@ move_pointer_to_base (struct mem_address
   aff_combination_remove_elt (addr, i);
 }
 
+/* Moves the loop variant part V in linear address ADDR to be the index
+   of PARTS.  */
+
+static void
+move_variant_to_index (struct mem_address *parts, aff_tree *addr, tree v)
+{
+  unsigned i;
+  tree val = NULL_TREE;
+
+  gcc_assert (!parts->index);
+  for (i = 0; i < addr->n; i++)
+    {
+      val = addr->elts[i].val;
+      if (val == v)
+	break;
+    }
+
+  if (i == addr->n)
+    return;
+
+  parts->index = fold_convert (sizetype, val);
+  parts->step = double_int_to_tree (sizetype, addr->elts[i].coef);
+  aff_combination_remove_elt (addr, i);
+}
+
 /* Adds ELT to PARTS.  */
 
 static void
@@ -553,7 +578,8 @@ most_expensive_mult_to_index (tree type,
 
 /* Splits address ADDR for a memory access of type TYPE into PARTS.
    If BASE_HINT is non-NULL, it specifies an SSA name to be used
-   preferentially as base of the reference.
+   preferentially as the base of the reference, and IV_CAND is the
+   selected iv candidate used in ADDR.
 
    TODO -- be more clever about the distribution of the elements of ADDR
    to PARTS.  Some architectures do not support anything but single
@@ -563,8 +589,9 @@ most_expensive_mult_to_index (tree type,
    addressing modes is useless.  */
 
 static void
-addr_to_parts (tree type, aff_tree *addr, tree base_hint,
-	       struct mem_address *parts, bool speed)
+addr_to_parts (tree type, aff_tree *addr, tree iv_cand,
+	       tree base_hint, struct mem_address *parts,
+               bool speed)
 {
   tree part;
   unsigned i;
@@ -582,9 +609,17 @@ addr_to_parts (tree type, aff_tree *addr
   /* Try to find a symbol.  */
   move_fixed_address_to_symbol (parts, addr);
 
+  /* No need to do address parts reassociation if the number of parts
+     is <= 2 -- in that case, no loop invariant code motion can be
+     exposed.  */
+
+  if (!base_hint && (addr->n > 2))
+    move_variant_to_index (parts, addr, iv_cand);
+
   /* First move the most expensive feasible multiplication
      to index.  */
-  most_expensive_mult_to_index (type, parts, addr, speed);
+  if (!parts->index)
+    most_expensive_mult_to_index (type, parts, addr, speed);
 
   /* Try to find a base of the reference.  Since at the moment
      there is no reliable way how to distinguish between pointer and its
@@ -624,17 +659,19 @@ gimplify_mem_ref_parts (gimple_stmt_iter
 
 /* Creates and returns a TARGET_MEM_REF for address ADDR.  If necessary
    computations are emitted in front of GSI.  TYPE is the mode
-   of created memory reference.  */
+   of the created memory reference.  IV_CAND is the selected iv candidate
+   used in ADDR, and BASE_HINT is an SSA name to be used preferentially
+   as the base of the reference.  */
 
 tree
 create_mem_ref (gimple_stmt_iterator *gsi, tree type, aff_tree *addr,
-		tree base_hint, bool speed)
+		tree iv_cand, tree base_hint, bool speed)
 {
   tree mem_ref, tmp;
   tree atype;
   struct mem_address parts;
 
-  addr_to_parts (type, addr, base_hint, &parts, speed);
+  addr_to_parts (type, addr, iv_cand, base_hint, &parts, speed);
   gimplify_mem_ref_parts (gsi, &parts);
   mem_ref = create_mem_ref_raw (type, &parts);
   if (mem_ref)

[-- Attachment #5: ivopts_latest_part4.p --]
[-- Type: text/x-pascal, Size: 5877 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 159362)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -5293,6 +5293,192 @@ find_optimal_iv_set (struct ivopts_data 
   return set;
 }
 
+
+/* Performs a peephole optimization to reorder the iv update statement with
+   a mem ref to enable instruction combining in later phases.  The mem ref
+   uses the iv value before the update, so the reordering transformation
+   requires an adjustment of the offset.  CAND is the selected iv candidate.
+
+   Example:
+
+   t = MEM_REF (base, iv1, 8, 16);  // base, index, stride, offset
+   iv2 = iv1 + 1;
+
+   if (t < val)      (1)
+     goto L;
+   goto Head;
+
+
+   Directly propagating t over to (1) will introduce an overlapping live
+   range and thus increase register pressure.  This peephole transforms
+   the code into:
+
+
+   iv2 = iv1 + 1;
+   t = MEM_REF (base, iv2, 8, 8);
+   if (t < val)
+     goto L;
+   goto Head;
+*/
+
+static void
+adjust_iv_update_pos (struct ivopts_data *data ATTRIBUTE_UNUSED,
+                      struct iv_cand *cand)
+{
+  tree var_after, step, stride, index, offset_adjust, offset, mem_ref_op;
+  gimple iv_update, stmt, cond, mem_ref, index_to_base, use_stmt;
+  basic_block bb;
+  gimple_stmt_iterator gsi, gsi_iv;
+  use_operand_p use_p;
+  enum tree_code incr_op;
+  imm_use_iterator iter;
+  bool found = false;
+
+  var_after = cand->var_after;
+  iv_update = SSA_NAME_DEF_STMT (var_after);
+
+  /* Do not handle complicated iv update cases.  */
+  incr_op = gimple_assign_rhs_code (iv_update);
+  if (incr_op != PLUS_EXPR && incr_op != MINUS_EXPR)
+    return;
+
+  step = gimple_assign_rhs2 (iv_update);
+  if (!CONSTANT_CLASS_P (step))
+    return;
+
+  bb = gimple_bb (iv_update);
+  gsi = gsi_last_nondebug_bb (bb);
+  stmt = gsi_stmt (gsi);
+
+  /* Only handle a conditional statement for now.  */
+  if (gimple_code (stmt) != GIMPLE_COND)
+    return;
+
+  cond = stmt;
+
+  gsi_prev_nondebug (&gsi);
+  stmt = gsi_stmt (gsi);
+  if (stmt != iv_update)
+    return;
+
+  gsi_prev_nondebug (&gsi);
+  if (gsi_end_p (gsi))
+    return;
+
+  stmt = gsi_stmt (gsi);
+  if (gimple_code (stmt) != GIMPLE_ASSIGN)
+    return;
+
+  if (gimple_assign_rhs_code (stmt) != TARGET_MEM_REF)
+    return;
+
+  mem_ref = stmt;
+  mem_ref_op = gimple_assign_rhs1 (mem_ref);
+
+  if (TREE_CODE (gimple_assign_lhs (mem_ref)) != SSA_NAME)
+    return;
+
+  if (!single_imm_use (gimple_assign_lhs (mem_ref), &use_p, &use_stmt))
+    return;
+
+  if (use_stmt != cond)
+    return;
+
+  /* Found code motion candidate -- the statement with mem_ref.  */
+
+  index = TMR_INDEX (mem_ref_op);
+  index_to_base = NULL;
+  if (index)
+    {
+      if (index != cand->var_before)
+        return;
+    }
+  else
+    {
+      /* Index used as base.  */
+      tree base = TMR_BASE (mem_ref_op);
+
+      if (TREE_CODE (base) != SSA_NAME)
+        return;
+
+      if (!has_single_use (base))
+        return;
+
+      index_to_base = SSA_NAME_DEF_STMT (base);
+      if (gimple_code (index_to_base) != GIMPLE_ASSIGN)
+        return;
+      if (gimple_assign_rhs_code (index_to_base) != NOP_EXPR)
+        return;
+      if (gimple_assign_rhs1 (index_to_base) != cand->var_before)
+        return;
+    }
+
+  stride = TMR_STEP (mem_ref_op);
+  offset = TMR_OFFSET (mem_ref_op);
+  if (stride && index)
+    offset_adjust = int_const_binop (MULT_EXPR, stride, step, 0);
+  else
+    offset_adjust = step;
+
+  if (offset_adjust == NULL)
+    return;
+
+  offset = int_const_binop ((incr_op == PLUS_EXPR
+                             ? MINUS_EXPR : PLUS_EXPR),
+                            (offset ? offset : size_zero_node),
+                            offset_adjust, 0);
+
+  if (offset == NULL)
+    return;
+
+  if (index_to_base)
+    gsi = gsi_for_stmt (index_to_base);
+  else
+    gsi = gsi_for_stmt (mem_ref);
+  gsi_iv = gsi_for_stmt (iv_update);
+  gsi_move_before (&gsi_iv, &gsi);
+
+  /* Now fix up the mem_ref.  */
+  FOR_EACH_IMM_USE_FAST (use_p, iter, cand->var_before)
+    {
+      if (USE_STMT (use_p) == mem_ref || USE_STMT (use_p) == index_to_base)
+        {
+          set_ssa_use_from_ptr (use_p, var_after);
+          if (index_to_base)
+            *gimple_assign_rhs1_ptr (index_to_base) = var_after;
+          else
+            TMR_INDEX (mem_ref_op) = var_after;
+
+          found = true;
+          break;
+        }
+    }
+  gcc_assert (found);
+  TMR_OFFSET (mem_ref_op) = offset;
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Reordering \n");
+      print_gimple_stmt (dump_file, iv_update, 0, 0);
+      print_gimple_stmt (dump_file, mem_ref, 0, 0);
+      fprintf (dump_file, "\n");
+    }
+}
+
+/* Performs the reordering peephole optimization for all selected ivs in SET.  */
+
+static void
+adjust_update_pos_for_ivs (struct ivopts_data *data, struct iv_ca *set)
+{
+  unsigned i;
+  struct iv_cand *cand;
+  bitmap_iterator bi;
+
+  EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
+    {
+      cand = iv_cand (data, i);
+      adjust_iv_update_pos (data, cand);
+    }
+}
+
 /* Creates a new induction variable corresponding to CAND.  */
 
 static void
@@ -5830,7 +6016,6 @@ tree_ssa_iv_optimize_loop (struct ivopts
 
   /* Create the new induction variables (item 4, part 1).  */
   create_new_ivs (data, iv_ca);
-  iv_ca_free (&iv_ca);
 
   /* Rewrite the uses (item 4, part 2).  */
   rewrite_uses (data);
@@ -5838,6 +6023,10 @@ tree_ssa_iv_optimize_loop (struct ivopts
   /* Remove the ivs that are unused after rewriting.  */
   remove_unused_ivs (data);
 
+  adjust_update_pos_for_ivs (data, iv_ca);
+
+  iv_ca_free (&iv_ca);
+
   /* We have changed the structure of induction variables; it might happen
      that definitions in the scev database refer to some of them that were
      eliminated.  */

[-- Attachment #6: ivopts_latest3.p --]
[-- Type: text/x-pascal, Size: 57147 bytes --]

Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+#define TYPE char*
+
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           dst[x] = ( src1[x] + src2[x] + 1 ) >> 1;
+       }
+}
+
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#define TYPE char*
+
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#define TYPE char*
+
+void foo (int i_width, char* dst, char* src1, char* src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           *((TYPE)dst) = ( *((TYPE)src1) + *((TYPE)src2) + 1 ) >> 1;
+	   dst+=sizeof(TYPE);
+	   src1+=sizeof(TYPE);
+	   src2+=sizeof(TYPE);
+       }
+} 
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#ifndef TYPE
+#define TYPE char*
+#endif
+
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      TYPE dstn= dst + i_width;
+       for( ; dst < dstn; )
+       {
+           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_6.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_6.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_6.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+#include <stdlib.h>
+int foo(const char* p, const char* p2, size_t N)
+{
+  const char* p_limit = p + N;
+  while (p  <= p_limit - 16
+        && *(long long*)p  <*(long long*)p2 )
+  {
+     p += 16;
+     p2 += 16;
+  }
+  N = p_limit - p;
+  return memcmp(p, p2, N);
+}
+
+/* { dg-final { scan-tree-dump-times "Sinking" 4 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "Reordering" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_7.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_7.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_7.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+#include <stdlib.h>
+
+int foo(const char* p, const char* p2, size_t N)
+{
+ const char* p_limit = p + N;
+ int s = 0;
+ while (p  <= p_limit - 16
+        && *(long long*)p <*(long long*)p2)
+ {
+     p += 8;
+     p2 += 8;
+     s += (*p + *p2);
+  }
+  return s;
+}
+/* { dg-final { scan-tree-dump-times "Reordering" 1 "ivopts"} } */
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_5_sink.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_5_sink.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_5_sink.c	(revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2  -m64 -fdump-tree-ivopts-details" } */
+int inner_longest_match(char *scan, char *match, char *strend)
+{
+  char *start_scan = scan;
+  do {
+  } while (*++scan == *++match && *++scan == *++match &&
+           *++scan == *++match && *++scan == *++match &&
+           *++scan == *++match && *++scan == *++match &&
+           *++scan == *++match && *++scan == *++match &&
+           scan < strend);
+
+  return scan - start_scan;
+}
+
+/* { dg-final { scan-tree-dump-times "Sinking" 7 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 159362)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -91,14 +91,29 @@ along with GCC; see the file COPYING3.  
 #include "langhooks.h"
 #include "tree-affine.h"
 #include "target.h"
+#include "tree-inline.h"
 
 /* The infinite cost.  */
 #define INFTY 10000000
 
-/* The expected number of loop iterations.  TODO -- use profiling instead of
-   this.  */
+/* The default expected number of loop iterations, used when no profile
+   data is available.  */
 #define AVG_LOOP_NITER(LOOP) 5
 
+/* Returns the expected number of loop iterations for LOOP.
+   The average trip count is computed from profile data if it
+   exists.  */
+
+static inline unsigned
+avg_loop_niter (struct loop *loop)
+{
+  unsigned tc;
+  if (loop->header->count || loop->latch->count)
+    tc = expected_loop_iterations (loop);
+  else
+    tc = AVG_LOOP_NITER (loop);
+  if (tc == 0)
+    tc++;
+  return tc;
+}
 
 /* Representation of the induction variable.  */
 struct iv
@@ -156,6 +171,14 @@ struct cost_pair
 			   the new bound to compare with.  */
 };
 
+/* The use position for iv.  */
+enum iv_use_pos
+{
+  IU_UNKNOWN,
+  IU_OUTSIDE_LOOP_ONLY,
+  IU_INSIDE_LOOP
+};
+
 /* Use.  */
 struct iv_use
 {
@@ -173,6 +196,8 @@ struct iv_use
 
   struct iv_cand *selected;
 			/* The selected candidate.  */
+  enum iv_use_pos use_pos;
+                        /* The use position.  */
 };
 
 /* The position where the iv is computed.  */
@@ -218,6 +243,11 @@ typedef struct iv_cand *iv_cand_p;
 DEF_VEC_P(iv_cand_p);
 DEF_VEC_ALLOC_P(iv_cand_p,heap);
 
+typedef struct version_info *version_info_p;
+DEF_VEC_P(version_info_p);
+DEF_VEC_ALLOC_P(version_info_p,heap);
+
+
 struct ivopts_data
 {
   /* The currently optimized loop.  */
@@ -235,6 +265,10 @@ struct ivopts_data
   /* The array of information for the ssa names.  */
   struct version_info *version_info;
 
+
+  /* Pseudo version infos for generated loop invariants.  */
+  VEC(version_info_p,heap) *pseudo_version_info;
+
   /* The bitmap of indices in version_info whose value was changed.  */
   bitmap relevant;
 
@@ -250,6 +284,9 @@ struct ivopts_data
   /* The maximum invariant id.  */
   unsigned max_inv_id;
 
+  /* The minimal invariant id for pseudo invariants.  */
+  unsigned min_pseudo_inv_id;
+
   /* Whether to consider just related and important candidates when replacing a
      use.  */
   bool consider_all_candidates;
@@ -283,6 +320,9 @@ struct iv_ca
   /* Total number of registers needed.  */
   unsigned n_regs;
 
+  /* Total number of pseudo invariants.  */
+  unsigned n_pseudos;
+
   /* Total cost of expressing uses.  */
   comp_cost cand_use_cost;
 
@@ -335,6 +375,8 @@ struct iv_ca_delta
 
 static VEC(tree,heap) *decl_rtl_to_reset;
 
+static struct pointer_map_t *inverted_stmt_map;
+
 /* Number of uses recorded in DATA.  */
 
 static inline unsigned
@@ -513,6 +555,19 @@ dump_cand (FILE *file, struct iv_cand *c
       return;
     }
 
+  if (cand->var_before)
+    {
+      fprintf (file, "  var_before ");
+      print_generic_expr (file, cand->var_before, TDF_SLIM);
+      fprintf (file, "\n");
+    }
+  if (cand->var_after)
+    {
+      fprintf (file, "  var_after ");
+      print_generic_expr (file, cand->var_after, TDF_SLIM);
+      fprintf (file, "\n");
+    }
+
   switch (cand->pos)
     {
     case IP_NORMAL:
@@ -544,7 +599,11 @@ dump_cand (FILE *file, struct iv_cand *c
 static inline struct version_info *
 ver_info (struct ivopts_data *data, unsigned ver)
 {
-  return data->version_info + ver;
+  if (ver < data->min_pseudo_inv_id)
+    return data->version_info + ver;
+  else
+    return VEC_index (version_info_p, data->pseudo_version_info,
+                      ver - data->min_pseudo_inv_id);
 }
 
 /* Returns the info for ssa name NAME.  */
@@ -766,6 +825,8 @@ tree_ssa_iv_optimize_init (struct ivopts
 {
   data->version_info_size = 2 * num_ssa_names;
   data->version_info = XCNEWVEC (struct version_info, data->version_info_size);
+  data->min_pseudo_inv_id = num_ssa_names;
+  data->pseudo_version_info = NULL;
   data->relevant = BITMAP_ALLOC (NULL);
   data->important_candidates = BITMAP_ALLOC (NULL);
   data->max_inv_id = 0;
@@ -1102,6 +1163,7 @@ record_use (struct ivopts_data *data, tr
   use->stmt = stmt;
   use->op_p = use_p;
   use->related_cands = BITMAP_ALLOC (NULL);
+  use->use_pos = IU_UNKNOWN;
 
   /* To avoid showing ssa name in the dumps, if it was not reset by the
      caller.  */
@@ -1142,10 +1204,29 @@ record_invariant (struct ivopts_data *da
   bitmap_set_bit (data->relevant, SSA_NAME_VERSION (op));
 }
 
-/* Checks whether the use OP is interesting and if so, records it.  */
+/* Records a pseudo invariant and returns its VERSION_INFO.  */
+
+static struct version_info *
+record_pseudo_invariant (struct ivopts_data *data)
+{
+  struct version_info *info;
+
+  info = XCNEW (struct version_info);
+  info->name = NULL;
+  VEC_safe_push (version_info_p, heap, data->pseudo_version_info, info);
+  info->inv_id
+      = VEC_length (version_info_p, data->pseudo_version_info) - 1
+      + data->min_pseudo_inv_id;
+
+  return info;
+}
+
+/* Checks whether the use OP is interesting and if so, records it.
+   USE_POS indicates where the use comes from.  */
 
 static struct iv_use *
-find_interesting_uses_op (struct ivopts_data *data, tree op)
+find_interesting_uses_op (struct ivopts_data *data, tree op,
+                          enum iv_use_pos use_pos)
 {
   struct iv *iv;
   struct iv *civ;
@@ -1164,6 +1245,10 @@ find_interesting_uses_op (struct ivopts_
       use = iv_use (data, iv->use_id);
 
       gcc_assert (use->type == USE_NONLINEAR_EXPR);
+      gcc_assert (use->use_pos != IU_UNKNOWN);
+
+      if (use->use_pos == IU_OUTSIDE_LOOP_ONLY)
+        use->use_pos = use_pos;
       return use;
     }
 
@@ -1183,6 +1268,7 @@ find_interesting_uses_op (struct ivopts_
 
   use = record_use (data, NULL, civ, stmt, USE_NONLINEAR_EXPR);
   iv->use_id = use->id;
+  use->use_pos = use_pos;
 
   return use;
 }
@@ -1260,17 +1346,19 @@ find_interesting_uses_cond (struct ivopt
 {
   tree *var_p, *bound_p;
   struct iv *var_iv, *civ;
+  struct iv_use *use;
 
   if (!extract_cond_operands (data, stmt, &var_p, &bound_p, &var_iv, NULL))
     {
-      find_interesting_uses_op (data, *var_p);
-      find_interesting_uses_op (data, *bound_p);
+      find_interesting_uses_op (data, *var_p, IU_INSIDE_LOOP);
+      find_interesting_uses_op (data, *bound_p, IU_INSIDE_LOOP);
       return;
     }
 
   civ = XNEW (struct iv);
   *civ = *var_iv;
-  record_use (data, NULL, civ, stmt, USE_COMPARE);
+  use = record_use (data, NULL, civ, stmt, USE_COMPARE);
+  use->use_pos = IU_INSIDE_LOOP;
 }
 
 /* Returns true if expression EXPR is obviously invariant in LOOP,
@@ -1433,11 +1521,13 @@ idx_record_use (tree base, tree *idx,
 		void *vdata)
 {
   struct ivopts_data *data = (struct ivopts_data *) vdata;
-  find_interesting_uses_op (data, *idx);
+  find_interesting_uses_op (data, *idx, IU_INSIDE_LOOP);
   if (TREE_CODE (base) == ARRAY_REF || TREE_CODE (base) == ARRAY_RANGE_REF)
     {
-      find_interesting_uses_op (data, array_ref_element_size (base));
-      find_interesting_uses_op (data, array_ref_low_bound (base));
+      find_interesting_uses_op (data, array_ref_element_size (base),
+                                IU_INSIDE_LOOP);
+      find_interesting_uses_op (data, array_ref_low_bound (base),
+                                IU_INSIDE_LOOP);
     }
   return true;
 }
@@ -1603,6 +1693,7 @@ find_interesting_uses_address (struct iv
   tree base = *op_p, step = build_int_cst (sizetype, 0);
   struct iv *civ;
   struct ifs_ivopts_data ifs_ivopts_data;
+  struct iv_use *use;
 
   /* Do not play with volatile memory references.  A bit too conservative,
      perhaps, but safe.  */
@@ -1696,11 +1787,13 @@ find_interesting_uses_address (struct iv
     }
 
   civ = alloc_iv (base, step);
-  record_use (data, op_p, civ, stmt, USE_ADDRESS);
+  use = record_use (data, op_p, civ, stmt, USE_ADDRESS);
+  use->use_pos = IU_INSIDE_LOOP;
   return;
 
 fail:
   for_each_index (op_p, idx_record_use, data);
+  return;
 }
 
 /* Finds and records invariants used in STMT.  */
@@ -1762,7 +1855,7 @@ find_interesting_uses_stmt (struct ivopt
 	  if (REFERENCE_CLASS_P (*rhs))
 	    find_interesting_uses_address (data, stmt, rhs);
 	  else
-	    find_interesting_uses_op (data, *rhs);
+	    find_interesting_uses_op (data, *rhs, IU_INSIDE_LOOP);
 
 	  if (REFERENCE_CLASS_P (*lhs))
 	    find_interesting_uses_address (data, stmt, lhs);
@@ -1803,7 +1896,7 @@ find_interesting_uses_stmt (struct ivopt
       if (!iv)
 	continue;
 
-      find_interesting_uses_op (data, op);
+      find_interesting_uses_op (data, op, IU_INSIDE_LOOP);
     }
 }
 
@@ -1822,7 +1915,12 @@ find_interesting_uses_outside (struct iv
       phi = gsi_stmt (psi);
       def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
       if (is_gimple_reg (def))
-	find_interesting_uses_op (data, def);
+        {
+          if (gimple_phi_num_args (phi) == 1)
+            find_interesting_uses_op (data, def, IU_OUTSIDE_LOOP_ONLY);
+	  else
+            find_interesting_uses_op (data, def, IU_INSIDE_LOOP);
+	}
     }
 }
 
@@ -2138,7 +2236,9 @@ add_candidate_1 (struct ivopts_data *dat
 	continue;
 
       if (operand_equal_p (base, cand->iv->base, 0)
-	  && operand_equal_p (step, cand->iv->step, 0))
+	  && operand_equal_p (step, cand->iv->step, 0)
+          && (TYPE_PRECISION (TREE_TYPE (base))
+              == TYPE_PRECISION (TREE_TYPE (cand->iv->base))))
 	break;
     }
 
@@ -3684,6 +3784,94 @@ difference_cost (struct ivopts_data *dat
   return force_var_cost (data, aff_combination_to_tree (&aff_e1), depends_on);
 }
 
+/* Returns true if AFF1 and AFF2 are identical.  */
+
+static bool
+compare_aff_trees (aff_tree *aff1, aff_tree *aff2)
+{
+  unsigned i;
+
+  if (aff1->n != aff2->n)
+    return false;
+
+  for (i = 0; i < aff1->n; i++)
+    {
+      if (double_int_cmp (aff1->elts[i].coef, aff2->elts[i].coef, 0) != 0)
+        return false;
+
+      if (!operand_equal_p (aff1->elts[i].val, aff2->elts[i].val, 0))
+        return false;
+    }
+  return true;
+}
+
+/* Returns true if the expression UBASE - RATIO * CBASE requires a new
+   compiler-generated temporary.  */
+
+static bool
+create_loop_invariant_temp (tree ubase, tree cbase, HOST_WIDE_INT ratio)
+{
+  aff_tree ubase_aff, cbase_aff;
+
+  STRIP_NOPS (ubase);
+  STRIP_NOPS (cbase);
+
+  if ((TREE_CODE (ubase) == INTEGER_CST)
+      && (TREE_CODE (cbase) == INTEGER_CST))
+    return false;
+
+  if (((TREE_CODE (ubase) == SSA_NAME)
+       || (TREE_CODE (ubase) == ADDR_EXPR))
+      && (TREE_CODE (cbase) == INTEGER_CST))
+    return false;
+
+  if (((TREE_CODE (cbase) == SSA_NAME)
+       || (TREE_CODE (cbase) == ADDR_EXPR))
+      && (TREE_CODE (ubase) == INTEGER_CST))
+    return false;
+
+  if (ratio == 1)
+    {
+      if (operand_equal_p (ubase, cbase, 0))
+        return false;
+      if (TREE_CODE (ubase) == ADDR_EXPR
+          && TREE_CODE (cbase) == ADDR_EXPR)
+        {
+          tree usym, csym;
+
+          usym = TREE_OPERAND (ubase, 0);
+          csym = TREE_OPERAND (cbase, 0);
+          if (TREE_CODE (usym) == ARRAY_REF)
+            {
+              tree ind = TREE_OPERAND (usym, 1);
+              if (TREE_CODE (ind) == INTEGER_CST
+                  && host_integerp (ind, 0)
+                  && TREE_INT_CST_LOW (ind) == 0)
+                usym = TREE_OPERAND (usym, 0);
+            }
+          if (TREE_CODE (csym) == ARRAY_REF)
+            {
+              tree ind = TREE_OPERAND (csym, 1);
+              if (TREE_CODE (ind) == INTEGER_CST
+                  && host_integerp (ind, 0)
+                  && TREE_INT_CST_LOW (ind) == 0)
+                csym = TREE_OPERAND (csym, 0);
+            }
+          if (usym == csym)
+            return false;
+        }
+      /* Now do a more complex comparison.  */
+      tree_to_aff_combination (ubase, TREE_TYPE (ubase), &ubase_aff);
+      tree_to_aff_combination (cbase, TREE_TYPE (cbase), &cbase_aff);
+      if (compare_aff_trees (&ubase_aff, &cbase_aff))
+        return false;
+    }
+
+  return true;
+}
+
+
+
 /* Determines the cost of the computation by that USE is expressed
    from induction variable CAND.  If ADDRESS_P is true, we just need
    to create an address from it, otherwise we want to get it into
@@ -3811,6 +3999,17 @@ get_computation_cost_at (struct ivopts_d
 					 &offset, depends_on));
     }
 
+  /* Loop invariant computation.  */
+  cost.cost /= avg_loop_niter (data->current_loop);
+
+  if (create_loop_invariant_temp (ubase, cbase, ratio))
+    {
+      struct version_info *pv = record_pseudo_invariant (data);
+      if (!*depends_on)
+        *depends_on = BITMAP_ALLOC (NULL);
+      bitmap_set_bit (*depends_on, pv->inv_id);
+    }
+
   /* If we are after the increment, the value of the candidate is higher by
      one iteration.  */
   stmt_is_after_inc = stmt_after_increment (data->current_loop, cand, at);
@@ -3841,7 +4040,7 @@ get_computation_cost_at (struct ivopts_d
       are added once to the variable, if present.  */
   if (var_present && (symbol_present || offset))
     cost.cost += add_cost (TYPE_MODE (ctype), speed)
-		 / AVG_LOOP_NITER (data->current_loop);
+		 / avg_loop_niter (data->current_loop);
 
   /* Having offset does not affect runtime cost in case it is added to
      symbol, but it increases complexity.  */
@@ -3911,6 +4110,10 @@ determine_use_iv_cost_generic (struct iv
     }
 
   cost = get_computation_cost (data, use, cand, false, &depends_on, NULL);
+
+  if (use->use_pos == IU_OUTSIDE_LOOP_ONLY && !infinite_cost_p (cost))
+    cost.cost /= avg_loop_niter (data->current_loop);
+
   set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE);
 
   return !infinite_cost_p (cost);
@@ -4056,20 +4259,16 @@ may_eliminate_iv (struct ivopts_data *da
   /* If not, and if this is the only possible exit of the loop, see whether
      we can get a conservative estimate on the number of iterations of the
      entire loop and compare against that instead.  */
-  else if (loop_only_exit_p (loop, exit))
+  else
     {
       double_int period_value, max_niter;
       if (!estimated_loop_iterations (loop, true, &max_niter))
 	return false;
       period_value = tree_to_double_int (period);
-      if (double_int_ucmp (max_niter, period_value) >= 0)
+      if (double_int_ucmp (max_niter, period_value) > 0)
 	return false;
     }
 
-  /* Otherwise, punt.  */
-  else
-    return false;
-
   cand_value_at (loop, cand, use->stmt, nit, &bnd);
 
   *bound = aff_combination_to_tree (&bnd);
@@ -4106,7 +4305,7 @@ determine_use_iv_cost_condition (struct 
       elim_cost = force_var_cost (data, bound, &depends_on_elim);
       /* The bound is a loop invariant, so it will be only computed
 	 once.  */
-      elim_cost.cost /= AVG_LOOP_NITER (data->current_loop);
+      elim_cost.cost /= avg_loop_niter (data->current_loop);
     }
   else
     elim_cost = infinite_cost;
@@ -4353,7 +4552,7 @@ determine_iv_cost (struct ivopts_data *d
   cost_base = force_var_cost (data, base, NULL);
   cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
 
-  cost = cost_step + cost_base.cost / AVG_LOOP_NITER (current_loop);
+  cost = cost_step + cost_base.cost / avg_loop_niter (data->current_loop);
 
   /* Prefer the original ivs unless we may gain something by replacing it.
      The reason is to make debugging simpler; so this is not relevant for
@@ -4514,6 +4713,12 @@ cheaper_cost_pair (struct cost_pair *a, 
   return false;
 }
 
+
+/* Pseudo invariants may get commoned, and there is no simple way
+   to estimate that.  Simply weight it down.  */
+
+#define PSEUDO_COMMON_PERC 30
+
 /* Computes the cost field of IVS structure.  */
 
 static void
@@ -4521,7 +4726,10 @@ iv_ca_recount_cost (struct ivopts_data *
 {
   comp_cost cost = ivs->cand_use_cost;
   cost.cost += ivs->cand_cost;
-  cost.cost += ivopts_global_cost_for_size (data, ivs->n_regs);
+  cost.cost += ivopts_global_cost_for_size (data,
+                                            ivs->n_regs
+                                            + (ivs->n_pseudos
+                                               * PSEUDO_COMMON_PERC)/100);
 
   ivs->cost = cost;
 }
@@ -4529,10 +4737,12 @@ iv_ca_recount_cost (struct ivopts_data *
 /* Remove invariants in set INVS to set IVS.  */
 
 static void
-iv_ca_set_remove_invariants (struct iv_ca *ivs, bitmap invs)
+iv_ca_set_remove_invariants (struct ivopts_data *data,
+                             struct iv_ca *ivs, bitmap invs)
 {
   bitmap_iterator bi;
   unsigned iid;
+  unsigned pseudo_id_start = data->min_pseudo_inv_id;
 
   if (!invs)
     return;
@@ -4541,7 +4751,12 @@ iv_ca_set_remove_invariants (struct iv_c
     {
       ivs->n_invariant_uses[iid]--;
       if (ivs->n_invariant_uses[iid] == 0)
-	ivs->n_regs--;
+        {
+          if (iid < pseudo_id_start)
+            ivs->n_regs--;
+          else
+            ivs->n_pseudos--;
+        }
     }
 }
 
@@ -4572,22 +4787,24 @@ iv_ca_set_no_cp (struct ivopts_data *dat
       ivs->n_cands--;
       ivs->cand_cost -= cp->cand->cost;
 
-      iv_ca_set_remove_invariants (ivs, cp->cand->depends_on);
+      iv_ca_set_remove_invariants (data, ivs, cp->cand->depends_on);
     }
 
   ivs->cand_use_cost = sub_costs (ivs->cand_use_cost, cp->cost);
 
-  iv_ca_set_remove_invariants (ivs, cp->depends_on);
+  iv_ca_set_remove_invariants (data, ivs, cp->depends_on);
   iv_ca_recount_cost (data, ivs);
 }
 
 /* Add invariants in set INVS to set IVS.  */
 
 static void
-iv_ca_set_add_invariants (struct iv_ca *ivs, bitmap invs)
+iv_ca_set_add_invariants (struct ivopts_data *data,
+                          struct iv_ca *ivs, bitmap invs)
 {
   bitmap_iterator bi;
   unsigned iid;
+  unsigned pseudo_id_start = data->min_pseudo_inv_id;
 
   if (!invs)
     return;
@@ -4596,7 +4813,12 @@ iv_ca_set_add_invariants (struct iv_ca *
     {
       ivs->n_invariant_uses[iid]++;
       if (ivs->n_invariant_uses[iid] == 1)
-	ivs->n_regs++;
+        {
+          if (iid < pseudo_id_start)
+            ivs->n_regs++;
+          else
+            ivs->n_pseudos++;
+        }
     }
 }
 
@@ -4630,11 +4852,11 @@ iv_ca_set_cp (struct ivopts_data *data, 
 	  ivs->n_cands++;
 	  ivs->cand_cost += cp->cand->cost;
 
-	  iv_ca_set_add_invariants (ivs, cp->cand->depends_on);
+	  iv_ca_set_add_invariants (data, ivs, cp->cand->depends_on);
 	}
 
       ivs->cand_use_cost = add_costs (ivs->cand_use_cost, cp->cost);
-      iv_ca_set_add_invariants (ivs, cp->depends_on);
+      iv_ca_set_add_invariants (data, ivs, cp->depends_on);
       iv_ca_recount_cost (data, ivs);
     }
 }
@@ -4841,9 +5063,13 @@ iv_ca_new (struct ivopts_data *data)
   nw->cands = BITMAP_ALLOC (NULL);
   nw->n_cands = 0;
   nw->n_regs = 0;
+  nw->n_pseudos = 0;
   nw->cand_use_cost = zero_cost;
   nw->cand_cost = 0;
-  nw->n_invariant_uses = XCNEWVEC (unsigned, data->max_inv_id + 1);
+  nw->n_invariant_uses = XCNEWVEC (unsigned,
+                                   data->min_pseudo_inv_id
+                                   + VEC_length (version_info_p,
+                                                 data->pseudo_version_info));
   nw->cost = zero_cost;
 
   return nw;
@@ -4871,8 +5097,21 @@ iv_ca_dump (struct ivopts_data *data, FI
   unsigned i;
   comp_cost cost = iv_ca_cost (ivs);
 
-  fprintf (file, "  cost %d (complexity %d)\n", cost.cost, cost.complexity);
-  bitmap_print (file, ivs->cands, "  candidates ","\n");
+  fprintf (file, "  cost: %d (complexity %d)\n", cost.cost, cost.complexity);
+  fprintf (file, "  cand_cost: %d\n  cand_use_cost: %d (complexity %d)\n",
+           ivs->cand_cost, ivs->cand_use_cost.cost, ivs->cand_use_cost.complexity);
+  bitmap_print (file, ivs->cands, "  candidates: ","\n");
+
+  for (i = 0; i < ivs->upto; i++)
+    {
+      struct iv_use *use = iv_use (data, i);
+      struct cost_pair *cp = iv_ca_cand_for_use (ivs, use);
+      if (cp)
+        fprintf (file, "   use:%d --> iv_cand:%d, cost=(%d,%d)\n",
+                 use->id, cp->cand->id, cp->cost.cost, cp->cost.complexity);
+      else
+        fprintf (file, "   use:%d --> ??\n", use->id);
+    }
 
   for (i = 1; i <= data->max_inv_id; i++)
     if (ivs->n_invariant_uses[i])
@@ -4880,7 +5119,9 @@ iv_ca_dump (struct ivopts_data *data, FI
 	fprintf (file, "%s%d", pref, i);
 	pref = ", ";
       }
-  fprintf (file, "\n");
+  fprintf (file, "\n\n");
+  fprintf (file, "nregs: %d\nnpseudos: %d\n\n",
+           ivs->n_regs, ivs->n_pseudos);
 }
 
 /* Try changing candidate in IVS to CAND for each use.  Return cost of the
@@ -4890,7 +5131,7 @@ iv_ca_dump (struct ivopts_data *data, FI
 static comp_cost
 iv_ca_extend (struct ivopts_data *data, struct iv_ca *ivs,
 	      struct iv_cand *cand, struct iv_ca_delta **delta,
-	      unsigned *n_ivs)
+	      unsigned *n_ivs, bool min_ncand)
 {
   unsigned i;
   comp_cost cost;
@@ -4914,8 +5155,8 @@ iv_ca_extend (struct ivopts_data *data, 
       if (!iv_ca_has_deps (ivs, new_cp))
 	continue;
 
-      if (!cheaper_cost_pair (new_cp, old_cp))
-	continue;
+      if (!min_ncand && !cheaper_cost_pair (new_cp, old_cp))
+        continue;
 
       *delta = iv_ca_delta_add (use, old_cp, new_cp, *delta);
     }
@@ -5110,7 +5351,8 @@ try_add_cand_for (struct ivopts_data *da
 	continue;
 
       iv_ca_set_cp (data, ivs, use, cp);
-      act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL);
+      act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL,
+                               true);
       iv_ca_set_no_cp (data, ivs, use);
       act_delta = iv_ca_delta_add (use, NULL, cp, act_delta);
 
@@ -5143,7 +5385,7 @@ try_add_cand_for (struct ivopts_data *da
 
 	  act_delta = NULL;
 	  iv_ca_set_cp (data, ivs, use, cp);
-	  act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL);
+	  act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL, true);
 	  iv_ca_set_no_cp (data, ivs, use);
 	  act_delta = iv_ca_delta_add (use, iv_ca_cand_for_use (ivs, use),
 				       cp, act_delta);
@@ -5203,7 +5445,7 @@ try_improve_iv_set (struct ivopts_data *
       if (iv_ca_cand_used_p (ivs, cand))
 	continue;
 
-      acost = iv_ca_extend (data, ivs, cand, &act_delta, &n_ivs);
+      acost = iv_ca_extend (data, ivs, cand, &act_delta, &n_ivs, false);
       if (!act_delta)
 	continue;
 
@@ -5293,6 +5535,252 @@ find_optimal_iv_set (struct ivopts_data 
   return set;
 }
 
+/* Returns a statement that undoes the operation in INCREMENT
+   on value OLD_VAL.  */
+
+static gimple
+get_inverted_increment_1 (gimple increment, tree old_val)
+{
+  tree new_assign_def;
+  gimple inverted_increment;
+  enum tree_code incr_op;
+  tree step;
+
+  new_assign_def = make_ssa_name (SSA_NAME_VAR (old_val), NULL);
+  step = unshare_expr (gimple_assign_rhs2 (increment));
+  incr_op = gimple_assign_rhs_code (increment);
+  if (incr_op == PLUS_EXPR)
+    incr_op = MINUS_EXPR;
+  else
+    {
+      gcc_assert (incr_op == MINUS_EXPR);
+      incr_op = PLUS_EXPR;
+    }
+  inverted_increment
+      = gimple_build_assign_with_ops (incr_op, new_assign_def,
+                                      old_val, step);
+
+  return inverted_increment;
+}
+
+/* Returns a statement that undoes the operation in INCREMENT
+   on the result of phi NEW_PHI.  */
+
+static gimple
+get_inverted_increment (gimple reaching_increment, gimple new_phi)
+{
+  basic_block bb;
+  gimple_stmt_iterator gsi;
+  gimple inverted_increment;
+  tree phi_result;
+  void **slot;
+
+  gcc_assert (gimple_assign_lhs (reaching_increment)
+              == PHI_ARG_DEF (new_phi, 0));
+
+  if (!inverted_stmt_map)
+    inverted_stmt_map = pointer_map_create ();
+
+  slot = pointer_map_insert (inverted_stmt_map, new_phi);
+  if (*slot)
+    return (gimple) *slot;
+
+  phi_result = PHI_RESULT (new_phi);
+  bb = gimple_bb (new_phi);
+  gsi = gsi_after_labels (bb);
+
+  inverted_increment = get_inverted_increment_1 (reaching_increment,
+                                                 phi_result);
+  gsi_insert_before (&gsi, inverted_increment, GSI_NEW_STMT);
+  *slot = (void *) inverted_increment;
+  return inverted_increment;
+}
+
+/* Performs a peephole optimization to reorder the iv update statement with
+   a mem ref to enable instruction combining in later phases. The mem ref uses
+   the iv value before the update, so the reordering transformation requires
+   adjustment of the offset. CAND is the selected IV_CAND.
+
+   Example:
+
+   t = MEM_REF (base, iv1, 8, 16);  // base, index, stride, offset
+   iv2 = iv1 + 1;
+
+   if (t < val)      (1)
+     goto L;
+   goto Head;
+
+
+   Directly propagating t over to (1) will introduce an overlapping live
+   range, thus increasing register pressure. This peephole transforms it into:
+
+
+   iv2 = iv1 + 1;
+   t = MEM_REF (base, iv2, 8, 8);
+   if (t < val)
+     goto L;
+   goto Head;
+*/
+
+static void
+adjust_iv_update_pos (struct ivopts_data *data ATTRIBUTE_UNUSED,
+                      struct iv_cand *cand)
+{
+  tree var_after, step, stride, index, offset_adjust, offset, mem_ref_op;
+  gimple iv_update, stmt, cond, mem_ref, index_to_base, use_stmt;
+  basic_block bb;
+  gimple_stmt_iterator gsi, gsi_iv;
+  use_operand_p use_p;
+  enum tree_code incr_op;
+  imm_use_iterator iter;
+  bool found = false;
+
+  var_after = cand->var_after;
+  iv_update = SSA_NAME_DEF_STMT (var_after);
+
+  /* Do not handle complicated iv update cases.  */
+  incr_op = gimple_assign_rhs_code (iv_update);
+  if (incr_op != PLUS_EXPR && incr_op != MINUS_EXPR)
+    return;
+
+  step = gimple_assign_rhs2 (iv_update);
+  if (!CONSTANT_CLASS_P (step))
+    return;
+
+  bb = gimple_bb (iv_update);
+  gsi = gsi_last_nondebug_bb (bb);
+  stmt = gsi_stmt (gsi);
+
+  /* Only handle conditional statements for now.  */
+  if (gimple_code (stmt) != GIMPLE_COND)
+    return;
+
+  cond = stmt;
+
+  gsi_prev_nondebug (&gsi);
+  stmt = gsi_stmt (gsi);
+  if (stmt != iv_update)
+    return;
+
+  gsi_prev_nondebug (&gsi);
+  if (gsi_end_p (gsi))
+    return;
+
+  stmt = gsi_stmt (gsi);
+  if (gimple_code (stmt) != GIMPLE_ASSIGN)
+    return;
+
+  if (gimple_assign_rhs_code (stmt) != TARGET_MEM_REF)
+    return;
+
+  mem_ref = stmt;
+  mem_ref_op = gimple_assign_rhs1 (mem_ref);
+
+  if (TREE_CODE (gimple_assign_lhs (mem_ref)) != SSA_NAME)
+    return;
+
+  if (!single_imm_use (gimple_assign_lhs (mem_ref), &use_p, &use_stmt))
+    return;
+
+  if (use_stmt != cond)
+    return;
+
+  /* Found code motion candidate -- the statement with mem_ref.  */
+
+  index = TMR_INDEX (mem_ref_op);
+  index_to_base = NULL;
+  if (index)
+    {
+      if (index != cand->var_before)
+        return;
+    }
+  else
+    {
+      /* Index used as base.  */
+      tree base = TMR_BASE (mem_ref_op);
+
+      if (TREE_CODE (base) != SSA_NAME)
+        return;
+
+      if (!has_single_use (base))
+        return;
+
+      index_to_base = SSA_NAME_DEF_STMT (base);
+      if (gimple_code (index_to_base) != GIMPLE_ASSIGN)
+        return;
+      if (gimple_assign_rhs_code (index_to_base) != NOP_EXPR)
+        return;
+      if (gimple_assign_rhs1 (index_to_base) != cand->var_before)
+        return;
+    }
+
+  stride = TMR_STEP (mem_ref_op);
+  offset = TMR_OFFSET (mem_ref_op);
+  if (stride && index)
+    offset_adjust = int_const_binop (MULT_EXPR, stride, step, 0);
+  else
+    offset_adjust = step;
+
+  if (offset_adjust == NULL)
+    return;
+
+  offset = int_const_binop ((incr_op == PLUS_EXPR
+                             ? MINUS_EXPR : PLUS_EXPR),
+                            (offset ? offset : size_zero_node),
+                            offset_adjust, 0);
+
+  if (offset == NULL)
+    return;
+
+  if (index_to_base)
+    gsi = gsi_for_stmt (index_to_base);
+  else
+    gsi = gsi_for_stmt (mem_ref);
+  gsi_iv = gsi_for_stmt (iv_update);
+  gsi_move_before (&gsi_iv, &gsi);
+
+  /* Now fix up the mem_ref.  */
+  FOR_EACH_IMM_USE_FAST (use_p, iter, cand->var_before)
+    {
+      if (USE_STMT (use_p) == mem_ref || USE_STMT (use_p) == index_to_base)
+        {
+          set_ssa_use_from_ptr (use_p, var_after);
+          if (index_to_base)
+            *gimple_assign_rhs1_ptr (index_to_base) = var_after;
+          else
+            TMR_INDEX (mem_ref_op) = var_after;
+
+          found = true;
+          break;
+        }
+    }
+  gcc_assert (found);
+  TMR_OFFSET (mem_ref_op) = offset;
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Reordering \n");
+      print_gimple_stmt (dump_file, iv_update, 0, 0);
+      print_gimple_stmt (dump_file, mem_ref, 0, 0);
+      fprintf (dump_file, "\n");
+    }
+}
+
+/* Performs the reordering peephole optimization for all selected ivs in SET.  */
+
+static void
+adjust_update_pos_for_ivs (struct ivopts_data *data, struct iv_ca *set)
+{
+  unsigned i;
+  struct iv_cand *cand;
+  bitmap_iterator bi;
+
+  EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
+    {
+      cand = iv_cand (data, i);
+      adjust_iv_update_pos (data, cand);
+    }
+}
+
 /* Creates a new induction variable corresponding to CAND.  */
 
 static void
@@ -5329,8 +5817,8 @@ create_new_iv (struct ivopts_data *data,
       name_info (data, cand->var_after)->preserve_biv = true;
 
       /* Rewrite the increment so that it uses var_before directly.  */
-      find_interesting_uses_op (data, cand->var_after)->selected = cand;
-
+      find_interesting_uses_op (data, cand->var_after,
+                                IU_INSIDE_LOOP)->selected = cand;
       return;
     }
 
@@ -5358,8 +5846,514 @@ create_new_ivs (struct ivopts_data *data
       cand = iv_cand (data, i);
       create_new_iv (data, cand);
     }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\nSelected IV set: \n");
+      EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
+        {
+          cand = iv_cand (data, i);
+          dump_cand (dump_file, cand);
+        }
+      fprintf (dump_file, "\n");
+    }
+}
+
+/* Callback function in the tree walk to fix up old live out
+   names to loop exit phi's result.  */
+
+static tree
+fixup_use (tree *op,
+           int *unused ATTRIBUTE_UNUSED,
+           void *data)
+{
+  struct pointer_map_t *nm_to_def_map
+      = (struct pointer_map_t *) data;
+
+  if (TREE_CODE (*op) == SSA_NAME && is_gimple_reg (*op))
+    {
+      void **slot;
+      slot = pointer_map_contains (nm_to_def_map, *op);
+      if (slot)
+        {
+          enum gimple_code gc;
+          gimple def = (gimple) (*slot);
+          gc = gimple_code (def);
+          if (gc == GIMPLE_PHI)
+            *op = PHI_RESULT (def);
+          else
+            *op = gimple_assign_lhs (def);
+        }
+    }
+
+  return 0;
 }
 
+/* Callback function in the tree walk to collect used ssa names
+   in the tree.  */
+
+static tree
+collect_ssa_names (tree *op,
+                   int *unused ATTRIBUTE_UNUSED,
+                   void *data)
+{
+  VEC(tree, heap) ** used_names = (VEC(tree, heap) **) data;
+  if (TREE_CODE (*op) == SSA_NAME && is_gimple_reg (*op))
+    VEC_safe_push (tree, heap, *used_names, *op);
+
+  return 0;
+}
+
+/* The function fixes up live out ssa names used in tree *VAL to
+   the matching loop exit phi's results. */
+
+static void
+fixup_iv_out_val (tree *val, struct pointer_map_t *nm_to_phi_map)
+{
+  walk_tree (val, fixup_use, nm_to_phi_map, NULL);
+}
+
+/* Returns the iv update statement if USE's cand variable is
+   the version before the update; otherwise returns NULL.  */
+
+static gimple
+cause_overlapping_lr (struct ivopts_data *data,
+                      tree nm_used, struct iv_use *use,
+                      basic_block use_bb)
+{
+  tree selected_iv_nm;
+  edge e;
+  gimple increment;
+  enum tree_code incr_op;
+
+  selected_iv_nm = var_at_stmt (data->current_loop,
+                                use->selected,
+                                use->stmt);
+
+  if (nm_used != selected_iv_nm)
+    return NULL;
+
+  if (selected_iv_nm == use->selected->var_after)
+    return NULL;
+
+  /* Check if def of var_after reaches use_bb.  */
+  gcc_assert (single_pred_p (use_bb));
+  e = single_pred_edge (use_bb);
+
+  increment = SSA_NAME_DEF_STMT (use->selected->var_after);
+
+  if (e->src != gimple_bb (increment))
+    return NULL;
+
+  /* Only handle simple increments.  */
+  if (gimple_code (increment) != GIMPLE_ASSIGN)
+    return NULL;
+
+  incr_op = gimple_assign_rhs_code (increment);
+  if (incr_op != PLUS_EXPR && incr_op != MINUS_EXPR)
+    return NULL;
+
+  if (!CONSTANT_CLASS_P (gimple_assign_rhs2 (increment)))
+    return NULL;
+
+  return increment;
+}
+
+
+/* Returns the loop closing phi for LIVE_OUT_IV in basic block TGT_BB.
+   IV_UPDATE_STMT is the update statement for LIVE_OUT_IV, and
+   *FOR_UPDATED_VAL is set to true if the argument of the phi is defined
+   by IV_UPDATE_STMT.  */
+
+static gimple
+find_closing_phi (basic_block tgt_bb, tree live_out_iv,
+                  gimple iv_update_stmt, bool *for_updated_val)
+{
+  gimple_stmt_iterator psi;
+  gimple phi = NULL;
+
+  *for_updated_val = false;
+
+  /* Now try to find the existing matching phi.  */
+  for (psi = gsi_start_phis (tgt_bb); !gsi_end_p (psi); gsi_next (&psi))
+    {
+      gimple p;
+      p = gsi_stmt (psi);
+
+      if (SSA_NAME_VAR (PHI_ARG_DEF (p, 0))
+          == SSA_NAME_VAR (live_out_iv))
+        {
+          phi = p;
+          break;
+        }
+    }
+
+  if (!phi)
+    return NULL;
+
+  if (PHI_ARG_DEF (phi, 0) == live_out_iv)
+    {
+      *for_updated_val = false;
+      /* Found exact match.  */
+      return phi;
+    }
+  else if (iv_update_stmt
+           && PHI_ARG_DEF (phi, 0) == gimple_assign_lhs (iv_update_stmt))
+    {
+      *for_updated_val = true;
+      return phi;
+    }
+
+  return NULL;
+}
+
+
+/* The function ensures closed SSA form for moving the use statement of USE
+   across the loop exit. LIVE_OUT_NM is the original ssa name that is live out,
+   TGT_BB is the destination bb of the code motion, and NM_TO_DEF_MAP maps
+   the original name to the result of the closing phi.
+
+   Scenario 1:
+   ----------------
+   Loop:
+
+   Loop_exit:
+
+     closed_iv_val = PHI (live_out_iv)
+
+     Uses of (live_out_iv) get replaced with closed_iv_val
+
+
+
+   Scenario 2:
+   ----------------
+   Loop:
+
+     updated_iv_val = live_out_iv + 1
+   Loop_exit:
+
+     closed_iv_val = PHI (updated_iv_val)
+     updated_iv_val2 = closed_iv_val - 1
+
+     Uses of live_out_iv get replaced with updated_iv_val2
+*/
+
+static gimple
+ensure_closed_ssa_form_for (struct ivopts_data *data,
+                            tree live_out_nm, basic_block tgt_bb,
+                            struct iv_use *use,
+                            struct pointer_map_t *nm_to_def_map)
+{
+  gimple closing_phi = NULL;
+  bool closing_phi_for_updated_val = false;
+
+  gimple def_stmt, new_def_stmt = NULL;
+  basic_block def_bb;
+  gimple iv_update_stmt;
+  void **slot;
+
+  def_stmt = SSA_NAME_DEF_STMT (live_out_nm);
+  def_bb = gimple_bb (def_stmt);
+
+  if (!def_bb
+      || flow_bb_inside_loop_p (def_bb->loop_father, tgt_bb))
+    return NULL;
+
+  iv_update_stmt
+      = cause_overlapping_lr (data, live_out_nm, use, tgt_bb);
+
+  gcc_assert (!iv_update_stmt
+              || gimple_code (iv_update_stmt) == GIMPLE_ASSIGN);
+
+  closing_phi = find_closing_phi (tgt_bb, live_out_nm,
+                                  iv_update_stmt, &closing_phi_for_updated_val);
+
+  /* No closing phi is found.  */
+  if (!closing_phi)
+    {
+      edge e;
+      edge_iterator ei;
+
+      closing_phi = create_phi_node (live_out_nm, tgt_bb);
+      create_new_def_for (gimple_phi_result (closing_phi), closing_phi,
+                          gimple_phi_result_ptr (closing_phi));
+      gcc_assert (single_pred_p (tgt_bb));
+      if (!iv_update_stmt)
+        {
+          FOR_EACH_EDGE (e, ei, tgt_bb->preds)
+              add_phi_arg (closing_phi, live_out_nm, e, UNKNOWN_LOCATION);
+          new_def_stmt = closing_phi;
+        }
+      else
+        {
+          FOR_EACH_EDGE (e, ei, tgt_bb->preds)
+              add_phi_arg (closing_phi, gimple_assign_lhs (iv_update_stmt),
+                           e, UNKNOWN_LOCATION);
+          /* Now make the value adjustment.  */
+          new_def_stmt = get_inverted_increment (iv_update_stmt, closing_phi);
+        }
+    }
+  else if (!closing_phi_for_updated_val)
+    /* Scenario 1 above.  */
+    new_def_stmt = closing_phi;
+  else
+    {
+      /* Scenario 2 above.  */
+      gcc_assert (iv_update_stmt);
+      new_def_stmt = get_inverted_increment (iv_update_stmt, closing_phi);
+    }
+
+  /* Now map it.  */
+  slot = pointer_map_insert (nm_to_def_map, live_out_nm);
+  *slot = (void *) new_def_stmt;
+
+  return (new_def_stmt != closing_phi ? new_def_stmt : NULL);
+}
+
+/* The function ensures closed SSA form for all names used in
+   REPLACED_IV_OUT_VAL. TGT_BB is the target bb where the new
+   computation is going to be, USE is the nonlinear use to be
+   rewritten (at loop exits), and *FIXED_UP_VAL holds the live out
+   value after name fixup. It returns the inverted iv update
+   statement if it is created.  */
+
+static gimple
+ensure_closed_ssa_form (struct ivopts_data *data,
+                        basic_block tgt_bb,
+                        struct iv_use *use,
+                        tree replaced_iv_out_val,
+                        tree *fixed_up_val)
+{
+  unsigned i;
+  tree nm;
+  VEC(tree, heap) *used_ssa_names = NULL;
+  struct pointer_map_t *nm_to_def_map = NULL;
+  gimple inverted_incr = NULL;
+
+  nm_to_def_map = pointer_map_create ();
+  *fixed_up_val = unshare_expr (replaced_iv_out_val);
+  walk_tree_without_duplicates (fixed_up_val,
+                                collect_ssa_names, &used_ssa_names);
+
+  for (i = 0;
+       VEC_iterate (tree, used_ssa_names, i, nm); i++)
+    {
+      gimple inv_incr;
+      if ((inv_incr
+           = ensure_closed_ssa_form_for (data, nm, tgt_bb,
+                                         use, nm_to_def_map)))
+        {
+          gcc_assert (!inverted_incr);
+          inverted_incr = inv_incr;
+        }
+    }
+
+  /* Now fix up the references in val.  */
+  fixup_iv_out_val (fixed_up_val, nm_to_def_map);
+  pointer_map_destroy (nm_to_def_map);
+  return inverted_incr;
+}
+
+/* The function returns true if it is possible to sink final value
+   computation for REPLACED_IV_OUT_NAME at loop exits.  */
+
+static bool
+can_compute_final_value_at_exits_p (struct ivopts_data *data,
+                                    tree replaced_iv_out_name)
+{
+  imm_use_iterator iter;
+  use_operand_p use_p;
+  gimple use_stmt;
+
+  /* Walk through all nonlinear uses in all loop exit blocks
+     to see if the sinking transformation is doable.  */
+
+  FOR_EACH_IMM_USE_FAST (use_p, iter, replaced_iv_out_name)
+    {
+      basic_block exit_bb;
+      edge e;
+      edge_iterator ei;
+      bool found_exit_edge = false;
+
+      use_stmt = USE_STMT (use_p);
+      exit_bb = gimple_bb (use_stmt);
+
+      /* The use_stmt is another iv update
+         statement that also defines a liveout value and
+         has been removed.  */
+      if (!exit_bb)
+        continue;
+
+      if (flow_bb_inside_loop_p (data->current_loop, exit_bb))
+        continue;
+
+      if (single_pred_p (exit_bb))
+        continue;
+
+      FOR_EACH_EDGE (e, ei, exit_bb->preds)
+        {
+          if (!flow_bb_inside_loop_p (data->current_loop,
+                                      e->src))
+            continue;
+          /* Can not split the edge.  */
+          if (e->flags & EDGE_ABNORMAL)
+            return false;
+
+          /* Do not handle the case where the exit bb has
+             multiple incoming exit edges from the same loop.  */
+          if (found_exit_edge)
+            return false;
+
+          found_exit_edge = true;
+        }
+      if (!found_exit_edge)
+        return false;
+    }
+  return true;
+}
+
+/* The function splits the loop exit edge targeting EXIT_BB if EXIT_BB has
+   multiple predecessors, and returns the newly split bb.  REPLACED_IV_OUT_NAME
+   is the original ssa name that is live out, and the new use statement
+   (new phi) will be stored in *USE_STMT.  */
+
+static basic_block
+split_exit_edge (struct ivopts_data* data, basic_block exit_bb,
+                 tree replaced_iv_out_name, gimple *use_stmt)
+{
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, exit_bb->preds)
+    {
+      edge exit_edge;
+      gimple_stmt_iterator psi;
+      gimple new_use_phi = NULL;
+
+      if (!flow_bb_inside_loop_p (data->current_loop, e->src))
+        continue;
+
+      gcc_assert (!(e->flags & EDGE_ABNORMAL));
+      exit_bb = split_loop_exit_edge (e);
+      exit_edge = single_pred_edge (exit_bb);
+
+      /* Now update the use stmt.  */
+      for (psi = gsi_start_phis (exit_bb);
+           !gsi_end_p (psi); gsi_next (&psi))
+        {
+          tree phi_arg;
+          gimple new_phi = gsi_stmt (psi);
+
+          phi_arg
+              = PHI_ARG_DEF_FROM_EDGE (new_phi, exit_edge);
+          if (phi_arg == replaced_iv_out_name)
+            {
+              new_use_phi = new_phi;
+              break;
+            }
+        }
+      gcc_assert (new_use_phi);
+      *use_stmt = new_use_phi;
+
+      /* There is only one exit edge to split.  */
+      break;
+    }
+
+  return exit_bb;
+}
+
+/* For a nonlinear use USE that is used outside the loop DATA->current_loop
+   only, try to evaluate the live out value at the exits of the loop.
+   REPLACED_IV_OUT_NAME is the original ssa name that is live out, and
+   REPLACED_IV_OUT_VAL is the expression (in terms of the selected iv cand)
+   to evaluate the live out value. The function tries to sink the computation
+   of replaced_iv_out_val into loop exits, and returns true if successful.  */
+
+static bool
+compute_final_value_at_exits (struct ivopts_data *data,
+                              struct iv_use *use,
+                              tree replaced_iv_out_name,
+                              tree replaced_iv_out_val)
+{
+  imm_use_iterator iter;
+  gimple use_stmt;
+  struct iv* replaced_iv;
+
+  if (!can_compute_final_value_at_exits_p (data, replaced_iv_out_name))
+    return false;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, iter, replaced_iv_out_name)
+    {
+      basic_block exit_bb;
+      gimple new_assign;
+      gimple_stmt_iterator gsi, bsi;
+      tree phi_rslt, new_assign_rhs;
+      tree fixed_up_val;
+      gimple inverted_increment;
+
+      exit_bb = gimple_bb (use_stmt);
+
+      /* The use_stmt is another iv update
+         statement that also defines a liveout value and
+         has been removed.  */
+      if (!exit_bb)
+        continue;
+
+      if (is_gimple_debug (use_stmt))
+        continue;
+
+      if (flow_bb_inside_loop_p (data->current_loop, exit_bb))
+        continue;
+
+      if (!single_pred_p (exit_bb))
+        exit_bb = split_exit_edge (data, exit_bb,
+                                   replaced_iv_out_name, &use_stmt);
+
+      gcc_assert (single_pred_p (exit_bb));
+
+      inverted_increment
+          = ensure_closed_ssa_form (data, exit_bb, use,
+                                    replaced_iv_out_val,
+                                    &fixed_up_val);
+
+      gcc_assert (gimple_code (use_stmt) == GIMPLE_PHI);
+      gsi = gsi_for_stmt (use_stmt);
+      phi_rslt = PHI_RESULT (use_stmt);
+      bsi = (inverted_increment
+             ? gsi_for_stmt (inverted_increment)
+             : gsi_after_labels (exit_bb));
+
+      /* Now convert the original loop exit phi (for closed SSA form)
+         into an assignment statement.  */
+      remove_phi_node (&gsi, false);
+      new_assign_rhs = force_gimple_operand_gsi (&bsi, fixed_up_val,
+                                                 false, NULL_TREE,
+                                                 (inverted_increment == NULL),
+                                                 (inverted_increment == NULL
+                                                  ? GSI_SAME_STMT
+                                                  : GSI_CONTINUE_LINKING));
+      new_assign = gimple_build_assign (phi_rslt, new_assign_rhs);
+      if (inverted_increment)
+        gsi_insert_after (&bsi, new_assign, GSI_SAME_STMT);
+      else
+        gsi_insert_before (&bsi, new_assign, GSI_SAME_STMT);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        {
+          fprintf (dump_file, "Sinking computation into exit bb %d\n",
+                   exit_bb->index);
+          print_gimple_stmt (dump_file, new_assign, 0, 0);
+          fprintf (dump_file, "\n");
+	}
+    }
+
+  /* Now the original stmt that defines the liveout value can be removed.  */
+
+  replaced_iv = get_iv (data, replaced_iv_out_name);
+  gcc_assert (replaced_iv);
+  replaced_iv->have_use_for = false;
+
+  return true;
+}
 
 /* Rewrites USE (definition of iv used in a nonlinear expression)
    using candidate CAND.  */
@@ -5455,6 +6449,11 @@ rewrite_use_nonlinear_expr (struct ivopt
       gcc_unreachable ();
     }
 
+  if (use->use_pos == IU_OUTSIDE_LOOP_ONLY)
+    {
+      if (compute_final_value_at_exits (data, use, tgt, comp))
+        return;
+    }
   op = force_gimple_operand_gsi (&bsi, comp, false, SSA_NAME_VAR (tgt),
 				 true, GSI_SAME_STMT);
 
@@ -5535,7 +6534,7 @@ rewrite_use_address (struct ivopts_data 
   aff_tree aff;
   gimple_stmt_iterator bsi = gsi_for_stmt (use->stmt);
   tree base_hint = NULL_TREE;
-  tree ref;
+  tree ref, iv;
   bool ok;
 
   ok = get_computation_aff (data->current_loop, use, cand, use->stmt, &aff);
@@ -5556,7 +6555,8 @@ rewrite_use_address (struct ivopts_data 
   if (cand->iv->base_object)
     base_hint = var_at_stmt (data->current_loop, cand, use->stmt);
 
-  ref = create_mem_ref (&bsi, TREE_TYPE (*use->op_p), &aff, base_hint,
+  iv = var_at_stmt (data->current_loop, cand, use->stmt);
+  ref = create_mem_ref (&bsi, TREE_TYPE (*use->op_p), &aff, iv, base_hint,
 			data->speed);
   copy_ref_info (ref, *use->op_p);
   *use->op_p = ref;
@@ -5691,6 +6691,7 @@ free_loop_data (struct ivopts_data *data
   unsigned i, j;
   bitmap_iterator bi;
   tree obj;
+  struct version_info *vi;
 
   if (data->niters)
     {
@@ -5748,6 +6749,14 @@ free_loop_data (struct ivopts_data *data
 
   data->max_inv_id = 0;
 
+  for (i = 0; VEC_iterate (version_info_p,
+                           data->pseudo_version_info, i, vi); i++)
+    free (vi);
+
+  VEC_truncate (version_info_p, data->pseudo_version_info, 0);
+  data->min_pseudo_inv_id = num_ssa_names;
+
+
   for (i = 0; VEC_iterate (tree, decl_rtl_to_reset, i, obj); i++)
     SET_DECL_RTL (obj, NULL_RTX);
 
@@ -5768,6 +6777,11 @@ tree_ssa_iv_optimize_finalize (struct iv
   VEC_free (tree, heap, decl_rtl_to_reset);
   VEC_free (iv_use_p, heap, data->iv_uses);
   VEC_free (iv_cand_p, heap, data->iv_candidates);
+  if (inverted_stmt_map)
+    {
+      pointer_map_destroy (inverted_stmt_map);
+      inverted_stmt_map = NULL;
+    }
 }
 
 /* Optimizes the LOOP.  Returns true if anything changed.  */
@@ -5830,7 +6844,6 @@ tree_ssa_iv_optimize_loop (struct ivopts
 
   /* Create the new induction variables (item 4, part 1).  */
   create_new_ivs (data, iv_ca);
-  iv_ca_free (&iv_ca);
 
   /* Rewrite the uses (item 4, part 2).  */
   rewrite_uses (data);
@@ -5838,6 +6851,9 @@ tree_ssa_iv_optimize_loop (struct ivopts
   /* Remove the ivs that are unused after rewriting.  */
   remove_unused_ivs (data);
 
+  adjust_update_pos_for_ivs (data, iv_ca);
+
+  iv_ca_free (&iv_ca);
   /* We have changed the structure of induction variables; it might happen
      that definitions in the scev database refer to some of them that were
      eliminated.  */
Index: gcc/tree-ssa-address.c
===================================================================
--- gcc/tree-ssa-address.c	(revision 159362)
+++ gcc/tree-ssa-address.c	(working copy)
@@ -450,6 +450,31 @@ move_pointer_to_base (struct mem_address
   aff_combination_remove_elt (addr, i);
 }
 
+/* Moves the loop variant part V in linear address ADDR to be the index
+   of PARTS.  */
+
+static void
+move_variant_to_index (struct mem_address *parts, aff_tree *addr, tree v)
+{
+  unsigned i;
+  tree val = NULL_TREE;
+
+  gcc_assert (!parts->index);
+  for (i = 0; i < addr->n; i++)
+    {
+      val = addr->elts[i].val;
+      if (val == v)
+	break;
+    }
+
+  if (i == addr->n)
+    return;
+
+  parts->index = fold_convert (sizetype, val);
+  parts->step = double_int_to_tree (sizetype, addr->elts[i].coef);
+  aff_combination_remove_elt (addr, i);
+}
+
 /* Adds ELT to PARTS.  */
 
 static void
@@ -553,7 +578,8 @@ most_expensive_mult_to_index (tree type,
 
 /* Splits address ADDR for a memory access of type TYPE into PARTS.
    If BASE_HINT is non-NULL, it specifies an SSA name to be used
-   preferentially as base of the reference.
+   preferentially as base of the reference, and IV_CAND is the selected
+   iv candidate used in ADDR.
 
    TODO -- be more clever about the distribution of the elements of ADDR
    to PARTS.  Some architectures do not support anything but single
@@ -563,8 +589,9 @@ most_expensive_mult_to_index (tree type,
    addressing modes is useless.  */
 
 static void
-addr_to_parts (tree type, aff_tree *addr, tree base_hint,
-	       struct mem_address *parts, bool speed)
+addr_to_parts (tree type, aff_tree *addr, tree iv_cand,
+	       tree base_hint, struct mem_address *parts,
+               bool speed)
 {
   tree part;
   unsigned i;
@@ -582,9 +609,17 @@ addr_to_parts (tree type, aff_tree *addr
   /* Try to find a symbol.  */
   move_fixed_address_to_symbol (parts, addr);
 
+  /* No need to do address parts reassociation if the number of parts
+     is <= 2 -- in that case, no loop invariant code motion can be
+     exposed.  */
+
+  if (!base_hint && (addr->n > 2))
+    move_variant_to_index (parts, addr, iv_cand);
+
   /* First move the most expensive feasible multiplication
      to index.  */
-  most_expensive_mult_to_index (type, parts, addr, speed);
+  if (!parts->index)
+    most_expensive_mult_to_index (type, parts, addr, speed);
 
   /* Try to find a base of the reference.  Since at the moment
      there is no reliable way how to distinguish between pointer and its
@@ -624,17 +659,19 @@ gimplify_mem_ref_parts (gimple_stmt_iter
 
 /* Creates and returns a TARGET_MEM_REF for address ADDR.  If necessary
    computations are emitted in front of GSI.  TYPE is the mode
-   of created memory reference.  */
+   of the created memory reference.  IV_CAND is the selected iv candidate
+   used in ADDR, and BASE_HINT is an SSA name to be used preferentially as
+   the base of the reference.  */
 
 tree
 create_mem_ref (gimple_stmt_iterator *gsi, tree type, aff_tree *addr,
-		tree base_hint, bool speed)
+		tree iv_cand, tree base_hint, bool speed)
 {
   tree mem_ref, tmp;
   tree atype;
   struct mem_address parts;
 
-  addr_to_parts (type, addr, base_hint, &parts, speed);
+  addr_to_parts (type, addr, iv_cand, base_hint, &parts, speed);
   gimplify_mem_ref_parts (gsi, &parts);
   mem_ref = create_mem_ref_raw (type, &parts);
   if (mem_ref)
Index: gcc/tree-flow.h
===================================================================
--- gcc/tree-flow.h	(revision 159362)
+++ gcc/tree-flow.h	(working copy)
@@ -863,7 +863,7 @@ struct mem_address
 
 struct affine_tree_combination;
 tree create_mem_ref (gimple_stmt_iterator *, tree,
-		     struct affine_tree_combination *, tree, bool);
+		     struct affine_tree_combination *, tree, tree, bool);
 rtx addr_for_mem_ref (struct mem_address *, addr_space_t, bool);
 void get_address_description (tree, struct mem_address *);
 tree maybe_fold_tmr (tree);

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-25  0:17     ` Xinliang David Li
@ 2010-05-25 10:46       ` Zdenek Dvorak
  2010-05-25 17:39         ` Xinliang David Li
  2010-05-25 18:10       ` Toon Moene
                         ` (3 subsequent siblings)
  4 siblings, 1 reply; 100+ messages in thread
From: Zdenek Dvorak @ 2010-05-25 10:46 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

Hi,

> patch-1:
> 
> This patch improves the algorithm assigning iv candidates to uses: in
> the initial solution computation, do not compare cost savings
> 'locally' for one use-iv_cand pair, but compare the overall (all-uses)
> cost of replacing with the new candidate (where possible) against the
> current best assignment cost. This guarantees that the initial
> solution starts from a minimal set of ivs.

so, this seems to be the important part of patch-1 (where min_ncand is
true only during the initial solution computation):

>  {
>    unsigned i;
>    comp_cost cost;
> @@ -4914,8 +4957,8 @@ iv_ca_extend (struct ivopts_data *data, 
>        if (!iv_ca_has_deps (ivs, new_cp))
>  	continue;
>  
> -      if (!cheaper_cost_pair (new_cp, old_cp))
> -	continue;
> +      if (!min_ncand && !cheaper_cost_pair (new_cp, old_cp))
> +        continue;
>  
>        *delta = iv_ca_delta_add (use, old_cp, new_cp, *delta);
>      }

This is a rather confusing way to get the effect of starting with a small set
of ivs.  It would probably be better to get rid of try_add_cand_for and rewrite
get_initial_solution with this goal in mind.

> This patch also added fixes to consider profile data in cost
> computation, better dumps, and some other minor bug fixes.

it would be better not to include such unrelated changes, as they make
reviewing the patch rather difficult.

> +/* Returns the expected number of loop iterations for LOOP.
> +   The average trip count is computed from profile data if it
> +   exists. */
> +
> +static inline unsigned
> +avg_loop_niter (struct loop *loop)
> +{
> +  unsigned tc;
> +  if (loop->header->count || loop->latch->count)
> +    tc = expected_loop_iterations (loop);
> +  else
> +    tc = AVG_LOOP_NITER (loop);
> +  if (tc == 0)
> +    tc++;
> +  return tc;
> +}

Using estimated_loop_iterations_int (loop, false) instead of adding another
similar function would be better.

> @@ -3811,6 +3841,9 @@ get_computation_cost_at (struct ivopts_d
>  					 &offset, depends_on));
>      }
>  
> +  /* Loop invariant computation.  */
> +  cost.cost /= avg_loop_niter (data->current_loop);
> +

This is wrong, at least some parts of the computation here are not loop invariant.

> @@ -4056,20 +4090,16 @@ may_eliminate_iv (struct ivopts_data *da
>    /* If not, and if this is the only possible exit of the loop, see whether
>       we can get a conservative estimate on the number of iterations of the
>       entire loop and compare against that instead.  */
> -  else if (loop_only_exit_p (loop, exit))
> +  else

This change is wrong, the test is necessary.  See
http://gcc.gnu.org/ml/gcc-patches/2008-07/msg00146.html
and the following discussion.

>      {
>        double_int period_value, max_niter;
>        if (!estimated_loop_iterations (loop, true, &max_niter))
>  	return false;
>        period_value = tree_to_double_int (period);
> -      if (double_int_ucmp (max_niter, period_value) >= 0)
> +      if (double_int_ucmp (max_niter, period_value) > 0)
>  	return false;
>      }

This also seems wrong (or at least inconsistent with what is done for
the constant number of iterations).

>  /* Try changing candidate in IVS to CAND for each use.  Return cost of the
> @@ -4890,7 +4933,7 @@ iv_ca_dump (struct ivopts_data *data, FI
>  static comp_cost
>  iv_ca_extend (struct ivopts_data *data, struct iv_ca *ivs,
>  	      struct iv_cand *cand, struct iv_ca_delta **delta,
> -	      unsigned *n_ivs)
> +	      unsigned *n_ivs, bool min_ncand)

Document min_ncand argument in the function comment.

> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> +/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
> +#define TYPE char*
> +
> +void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
> +{
> +      int x;
> +       for( x = 0; x < i_width; x++ )
> +       {
> +           dst[x] = ( src1[x] + src2[x] + 1 ) >> 1;
> +       }
> +}
> +
> +
> +/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
> +/* { dg-final { cleanup-tree-dump "ivopts" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> +/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
> +
> +#define TYPE char*
> +
> +void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
> +{
> +      int x;
> +       for( x = 0; x < i_width; x++ )
> +       {
> +           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
> +       }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
> +/* { dg-final { cleanup-tree-dump "ivopts" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> +/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
> +
> +#define TYPE char*
> +
> +void foo (int i_width, char* dst, char* src1, char* src2)
> +{
> +      int x;
> +       for( x = 0; x < i_width; x++ )
> +       {
> +           *((TYPE)dst) = ( *((TYPE)src1) + *((TYPE)src2) + 1 ) >> 1;
> +	   dst+=sizeof(TYPE);
> +	   src1+=sizeof(TYPE);
> +	   src2+=sizeof(TYPE);
> +       }
> +} 
> +
> +/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
> +/* { dg-final { cleanup-tree-dump "ivopts" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
> @@ -0,0 +1,18 @@
> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
> +/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
> +
> +#ifndef TYPE
> +#define TYPE char*
> +#endif
> +
> +void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
> +{
> +      TYPE dstn= dst + i_width;
> +       for( ; dst < dstn; )
> +       {
> +           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
> +       }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
> +/* { dg-final { cleanup-tree-dump "ivopts" } } */

ivopt_{n}.c -> ivopts-{n+4}.c
Please add an explanation of the expected outcome (and why) to these
testcases.

Zdenek


* Re: IVOPT improvement patch
  2010-05-25 10:46       ` Zdenek Dvorak
@ 2010-05-25 17:39         ` Xinliang David Li
  2010-05-25 18:25           ` Zdenek Dvorak
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-05-25 17:39 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: GCC Patches

On Tue, May 25, 2010 at 2:32 AM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> patch-1:
>>
>> This patch improves the algorithm assigning iv candidates to uses: in
>> the initial solution computation, do not compare cost savings
>> 'locally' for one use-iv_cand pair, but compare the overall (all-uses)
>> cost of replacing with the new candidate (where possible) against the
>> current best assignment cost. This guarantees that the initial
>> solution starts from a minimal set of ivs.
>
> so, this seems to be the important part of patch-1 (where min_ncand is
> true only during the initial solution computation):
>
>>  {
>>    unsigned i;
>>    comp_cost cost;
>> @@ -4914,8 +4957,8 @@ iv_ca_extend (struct ivopts_data *data,
>>        if (!iv_ca_has_deps (ivs, new_cp))
>>       continue;
>>
>> -      if (!cheaper_cost_pair (new_cp, old_cp))
>> -     continue;
>> +      if (!min_ncand && !cheaper_cost_pair (new_cp, old_cp))
>> +        continue;
>>
>>        *delta = iv_ca_delta_add (use, old_cp, new_cp, *delta);
>>      }
>
> This is a rather confusing way to get the effect of starting with a small set
> of ivs.  It would probably be better to get rid of try_add_cand_for and rewrite
> get_initial_solution with this goal in mind.

Yes, the whole optimal-assignment part of the code probably needs a
rewrite -- probably in the next round, with more careful thought
(justified by some motivating examples).

>
>> This patch also added fixes to consider profile data in cost
>> computation, better dumps, and some other minor bug fixes.
>
> it would be better not to include such unrelated changes, as they make
> reviewing the patch rather difficult.
>

Yes, in general this should be done. I did not split it this time, as
the changes are really small and keeping them together reduces overall
testing time.

>> +/* Returns the expected number of loop iterations for LOOP.
>> +   The average trip count is computed from profile data if it
>> +   exists. */
>> +
>> +static inline unsigned
>> +avg_loop_niter (struct loop *loop)
>> +{
>> +  unsigned tc;
>> +  if (loop->header->count || loop->latch->count)
>> +    tc = expected_loop_iterations (loop);
>> +  else
>> +    tc = AVG_LOOP_NITER (loop);
>> +  if (tc == 0)
>> +    tc++;
>> +  return tc;
>> +}
>
> Using estimated_loop_iterations_int (loop, false) instead of adding another
> similar function would be better.

Ok.


>
>> @@ -3811,6 +3841,9 @@ get_computation_cost_at (struct ivopts_d
>>                                        &offset, depends_on));
>>      }
>>
>> +  /* Loop invariant computation.  */
>> +  cost.cost /= avg_loop_niter (data->current_loop);
>> +
>
> This is wrong, at least some parts of the computation here are not loop invariant.

Which part is not loop invariant?


>
>> @@ -4056,20 +4090,16 @@ may_eliminate_iv (struct ivopts_data *da
>>    /* If not, and if this is the only possible exit of the loop, see whether
>>       we can get a conservative estimate on the number of iterations of the
>>       entire loop and compare against that instead.  */
>> -  else if (loop_only_exit_p (loop, exit))
>> +  else
>
> This change is wrong, the test is necessary.  See
> http://gcc.gnu.org/ml/gcc-patches/2008-07/msg00146.html
> and the following discussion.
>

The original fix to the problem is too conservative -- if there is
only one exit whose test needs to be replaced, it should be OK to do
it, right?



>>      {
>>        double_int period_value, max_niter;
>>        if (!estimated_loop_iterations (loop, true, &max_niter))
>>       return false;
>>        period_value = tree_to_double_int (period);
>> -      if (double_int_ucmp (max_niter, period_value) >= 0)
>> +      if (double_int_ucmp (max_niter, period_value) > 0)
>>       return false;
>>      }
>
> This also seems wrong (or at least inconsistent with what is done for
> the constant number of iterations).

This looks correct to me. Without this, many exit tests will fail to
be replaced.

>
>>  /* Try changing candidate in IVS to CAND for each use.  Return cost of the
>> @@ -4890,7 +4933,7 @@ iv_ca_dump (struct ivopts_data *data, FI
>>  static comp_cost
>>  iv_ca_extend (struct ivopts_data *data, struct iv_ca *ivs,
>>             struct iv_cand *cand, struct iv_ca_delta **delta,
>> -           unsigned *n_ivs)
>> +           unsigned *n_ivs, bool min_ncand)
>
> Document min_ncand argument in the function comment.

Ok. Will do.

>
>> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c   (revision 0)
>> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c   (revision 0)
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
>> +/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
>> +#define TYPE char*
>> +
>> +void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
>> +{
>> +      int x;
>> +       for( x = 0; x < i_width; x++ )
>> +       {
>> +           dst[x] = ( src1[x] + src2[x] + 1 ) >> 1;
>> +       }
>> +}
>> +
>> +
>> +/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
>> +/* { dg-final { cleanup-tree-dump "ivopts" } } */
>> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c   (revision 0)
>> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c   (revision 0)
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
>> +/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
>> +
>> +#define TYPE char*
>> +
>> +void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
>> +{
>> +      int x;
>> +       for( x = 0; x < i_width; x++ )
>> +       {
>> +           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
>> +       }
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
>> +/* { dg-final { cleanup-tree-dump "ivopts" } } */
>> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c   (revision 0)
>> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c   (revision 0)
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
>> +/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
>> +
>> +#define TYPE char*
>> +
>> +void foo (int i_width, char* dst, char* src1, char* src2)
>> +{
>> +      int x;
>> +       for( x = 0; x < i_width; x++ )
>> +       {
>> +           *((TYPE)dst) = ( *((TYPE)src1) + *((TYPE)src2) + 1 ) >> 1;
>> +        dst+=sizeof(TYPE);
>> +        src1+=sizeof(TYPE);
>> +        src2+=sizeof(TYPE);
>> +       }
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
>> +/* { dg-final { cleanup-tree-dump "ivopts" } } */
>> Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c   (revision 0)
>> +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c   (revision 0)
>> @@ -0,0 +1,18 @@
>> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
>> +/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
>> +
>> +#ifndef TYPE
>> +#define TYPE char*
>> +#endif
>> +
>> +void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
>> +{
>> +      TYPE dstn= dst + i_width;
>> +       for( ; dst < dstn; )
>> +       {
>> +           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
>> +       }
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
>> +/* { dg-final { cleanup-tree-dump "ivopts" } } */
>
> ivopt_{n}.c -> ivopts-{n+4}.c
> Please add an explanation of the expected outcome (and why) to these
> testcases.

Ok, will do.

Thanks,

David

>
> Zdenek
>


* Re: IVOPT improvement patch
  2010-05-25  0:17     ` Xinliang David Li
  2010-05-25 10:46       ` Zdenek Dvorak
@ 2010-05-25 18:10       ` Toon Moene
  2010-05-27  9:28       ` Zdenek Dvorak
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 100+ messages in thread
From: Toon Moene @ 2010-05-25 18:10 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Zdenek Dvorak, GCC Patches

On 05/25/2010 01:56 AM, Xinliang David Li wrote:

> Toon, I reproduced the problem you reported -- it is due to
> mishandling of debug stmt. The combined full patch is also attached if
> you want to do some experiment.

Thanks for your work on this, but seeing the comments by Zdenek, I think 
it's better for me to wait a while until parts are approved and committed.

Cheers !

-- 
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.5/changes.html#Fortran


* Re: IVOPT improvement patch
  2010-05-25 17:39         ` Xinliang David Li
@ 2010-05-25 18:25           ` Zdenek Dvorak
  2010-05-25 23:30             ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: Zdenek Dvorak @ 2010-05-25 18:25 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

Hi,

> >> @@ -3811,6 +3841,9 @@ get_computation_cost_at (struct ivopts_d
> >>                                        &offset, depends_on));
> >>      }
> >>
> >> +  /* Loop invariant computation.  */
> >> +  cost.cost /= avg_loop_niter (data->current_loop);
> >> +
> >
> > This is wrong, at least some parts of the computation here are not loop invariant.
> 
> Which part is not loop invariant?

it depends on the actual form of the use.  But in the most general case, the
computation whose cost is determined here is ubase + ratio * (var - cbase), and
no part of this is loop invariant (except for the force_var_costs of ubase and cbase).

> >> @@ -4056,20 +4090,16 @@ may_eliminate_iv (struct ivopts_data *da
> >>    /* If not, and if this is the only possible exit of the loop, see whether
> >>       we can get a conservative estimate on the number of iterations of the
> >>       entire loop and compare against that instead.  */
> >> -  else if (loop_only_exit_p (loop, exit))
> >> +  else
> >
> > This change is wrong, the test is necessary.  See
> > http://gcc.gnu.org/ml/gcc-patches/2008-07/msg00146.html
> > and the following discussion.
> >
> 
> The original fix to the problem is too conservative -- if there is
> only one exit whose test needs to be replaced, it should be OK to do
> it, right?

Yes.  But I do not see your point -- your patch removes the loop_only_exit_p
test, which is necessary.

> >>      {
> >>        double_int period_value, max_niter;
> >>        if (!estimated_loop_iterations (loop, true, &max_niter))
> >>       return false;
> >>        period_value = tree_to_double_int (period);
> >> -      if (double_int_ucmp (max_niter, period_value) >= 0)
> >> +      if (double_int_ucmp (max_niter, period_value) > 0)
> >>       return false;
> >>      }
> >
> > This also seems wrong (or at least inconsistent with what is done for
> > the constant number of iterations).
> 
> This looks correct to me. 

I think you are right; but then the preceding test for tree_int_cst_lt
should be changed as well (so that both conditions are the same).
It would also be nice to add testcases for the boundary values to the
testsuite, to make sure we are not making an off-by-one error.

Zdenek


* Re: IVOPT improvement patch
  2010-05-25 18:25           ` Zdenek Dvorak
@ 2010-05-25 23:30             ` Xinliang David Li
  2010-05-26  2:35               ` Zdenek Dvorak
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-05-25 23:30 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: GCC Patches

On Tue, May 25, 2010 at 11:12 AM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> >> @@ -3811,6 +3841,9 @@ get_computation_cost_at (struct ivopts_d
>> >>                                        &offset, depends_on));
>> >>      }
>> >>
>> >> +  /* Loop invariant computation.  */
>> >> +  cost.cost /= avg_loop_niter (data->current_loop);
>> >> +
>> >
>> > This is wrong, at least some parts of the computation here are not loop invariant.
>>
>> Which part is not loop invariant?
>
> it depends on the actual form of the use.  But in the most general case, the
> computation whose cost is determined here is ubase + ratio * (var - cbase), and
> no part of this is loop invariant (except for the force_var_costs of ubase and cbase).

You mean the last 'else' branch?


  else
    {
      cost = force_var_cost (data, cbase, depends_on);
      cost.cost += add_cost (TYPE_MODE (ctype), data->speed);
      cost = add_costs (cost,
			difference_cost (data,
					 ubase, build_int_cst (utype, 0),
					 &symbol_present, &var_present,
					 &offset, depends_on));
    }

but I don't see how this cost estimate matches the expression 'ubase +
ratio * (var - cbase)'.  Also, looking at the function
'get_computation_aff', it seems the aff expression is always normalized
into a sum-of-products form.  Also, at the end of this cost function,
before the 'fallback:' label, there is this:

if (aratio != 1)
  cost.cost += multiply_by_cost (aratio, TYPE_MODE (ctype), speed);

What is this for?  It looks like it is for the last term, 'ratio * var'?


>
>> >> @@ -4056,20 +4090,16 @@ may_eliminate_iv (struct ivopts_data *da
>> >>    /* If not, and if this is the only possible exit of the loop, see whether
>> >>       we can get a conservative estimate on the number of iterations of the
>> >>       entire loop and compare against that instead.  */
>> >> -  else if (loop_only_exit_p (loop, exit))
>> >> +  else
>> >
>> > This change is wrong, the test is necessary.  See
>> > http://gcc.gnu.org/ml/gcc-patches/2008-07/msg00146.html
>> > and the following discussion.
>> >
>>
>> The original fix to the problem is too conservative -- if there is
>> only one exit whose test needs to be replaced, it should be OK to do
>> it, right?
>
> Yes.  But I do not see your point -- your patch removes the loop_only_exit_p
> test, which is necessary.


Right, I want to discuss this further. Looking at the original PR
(msg00146.html) -- the fundamental question is why the 'cand_value_at'
call for the test to be replaced returns a bound value of &c + 12
with nitr == 0x80000001. It seems that the wrapping is caused by wrong
compiler folding -- probably the wrong type is passed in (TREE_TYPE
(iv->cand), which is the pointer to c[..]).  The original fix seems
sufficient -- if 'nitr' is not a compile-time constant, cand_value_at
would never wrap -- so why is the test for multiple exits needed?



>
>> >>      {
>> >>        double_int period_value, max_niter;
>> >>        if (!estimated_loop_iterations (loop, true, &max_niter))
>> >>       return false;
>> >>        period_value = tree_to_double_int (period);
>> >> -      if (double_int_ucmp (max_niter, period_value) >= 0)
>> >> +      if (double_int_ucmp (max_niter, period_value) > 0)
>> >>       return false;
>> >>      }
>> >
>> > This also seems wrong (or at least inconsistent with that is done for
>> > the constant number of iterations).
>>
>> This looks correct to me.
>
> I think you are right; but, then the preceding test for tree_int_cst_lt
> should be changed as well (so that both conditions are the same).
> It would also be nice to add testcases for the boundary values to the
> testsuite, to make sure we are not making an off-by-one error.

I remember one of the existing test cases in the patch needs this
change to work -- I will double-check.

Thanks,

David

>
> Zdenek
>


* Re: IVOPT improvement patch
  2010-05-25 23:30             ` Xinliang David Li
@ 2010-05-26  2:35               ` Zdenek Dvorak
  2010-05-26  3:17                 ` Xinliang David Li
  2010-05-27  1:31                 ` Xinliang David Li
  0 siblings, 2 replies; 100+ messages in thread
From: Zdenek Dvorak @ 2010-05-26  2:35 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

Hi,

> >> Which part is not loop invariant?
> >
> > it depends on the actual form of the use.  But in the most general case, the
> > computation whose cost is determined here is ubase + ratio * (var - cbase), and
> > no part of this is loop invariant (except for the force_var_costs of ubase and cbase).
> 
> You mean the last 'else' branch?
> 
> 
>   else
>     {
>       cost = force_var_cost (data, cbase, depends_on);
>       cost.cost += add_cost (TYPE_MODE (ctype), data->speed);
>       cost = add_costs (cost,
> 			difference_cost (data,
> 					 ubase, build_int_cst (utype, 0),
> 					 &symbol_present, &var_present,
> 					 &offset, depends_on));
>     }
> 
> but I don't see how this cost estimate matches expression 'ubase +
> ratio * (var - cbase)'. 

this code covers forcing cbase and ubase to registers (which is loop invariant)
and subtracting (add_cost) cbase from var (which is not).

>  Also at the end of this cost function before
> 'fallback:' label, there is this
> 
> if (aratio != 1)
>   cost.cost += multiply_by_cost (aratio, TYPE_MODE (ctype), speed);
> 
> What is this for? Looks like it is for the last term 'ratio * var'  ?

Yes.

> >> The original fix to the problem is too conservative -- if there is
> >> only one exit has the test to be replaced, it should be ok to do it,
> >> right?
> >
> > Yes.  But I do not see your point -- your patch removes the loop_only_exit_p
> > test, which is necessary.
> 
> 
> Right, I want to discuss this further. Looking at the original PR
> (msg00146.html) -- the fundamental question is why the 'cand_value_at'
> call for the test to be replaced returns a bound value of &c + 12
> with nitr == 0x80000001. It seems that the wrapping is caused by wrong
> compiler folding

No.  We want to express the exit condition for 2147483647 iterations using
the variable with evolution [&c + some constant, +, 4], whose period is smaller
(1073741823 iterations).  This is obviously not possible.  That is, after 2147483647
iterations (and two overflows), this induction variable will have value &c + 12, as
the folding correctly determines.  However, this value is also achieved
after only three iterations, which causes the loop to exit prematurely.  The
fact that NITR is a compile-time constant has nothing to do with
the problem, which would reproduce even if NITR was a variable with the same value.

The cause of the problem is that the bound on the number of iterations (that we
use to ensure that the computation does not overflow in this way) comes from a
different exit.  So, we need to be sure that the bound corresponds to the
currently considered exit.  Which surely is the case if either NITR (which is
an expression giving the number of iterations before the current exit is taken)
is a compile-time constant, or if there are no other exits from the loop.

Zdenek

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-26  2:35               ` Zdenek Dvorak
@ 2010-05-26  3:17                 ` Xinliang David Li
  2010-05-27  1:31                 ` Xinliang David Li
  1 sibling, 0 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-05-26  3:17 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: GCC Patches

On Tue, May 25, 2010 at 4:59 PM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> >> Which part is not loop invariant?
>> >
>> > it depends on the actual form of the use.  But in the most general case, the
>> > computation whose cost is determined here is ubase + ratio * (var - cbase), and
>> > no part of this is loop invariant (except for the force_var_costs of ubase and cbase).
>>
>> You mean the last 'else' branch?
>>
>>
>>   else
>>     {
>>       cost = force_var_cost (data, cbase, depends_on);
>>       cost.cost += add_cost (TYPE_MODE (ctype), data->speed);
>>       cost = add_costs (cost,
>>                       difference_cost (data,
>>                                        ubase, build_int_cst (utype, 0),
>>                                        &symbol_present, &var_present,
>>                                        &offset, depends_on));
>>     }
>>
>> but I don't see how this cost estimate matches expression 'ubase +
>> ratio * (var - cbase)'.
>
> this code covers forcing cbase and ubase to registers (which is loop invariant)
> and subtracting (add_cost) cbase from var (which is not).

Ok, will hoist the adjustment into branches.

>
>>  Also at the end of this cost function before
>> 'fallback:' label, there is this
>>
>> if (aratio != 1)
>>   cost.cost += multiply_by_cost (aratio, TYPE_MODE (ctype), speed);
>>
>> What is this for? Looks like it is for the last term 'ratio * var'  ?
>
> Yes.
>
>> >> The original fix to the problem is too conservative -- if there is
>> >> only one exit whose test is to be replaced, it should be ok to do it,
>> >> right?
>> >
>> > Yes.  But I do not see your point -- your patch removes the loop_only_exit_p
>> > test, which is necessary.
>>
>>
>> Right, I want to discuss more about this. Looking at the original PR
>> (msg00146.html) -- the fundamental problem is why the 'cand_value_at'
>> call for the test to be replaced returns a bound of value &c + 12
>> with nitr == 0x80000001? It seems that the wrapping is caused by wrong
>> compiler folding.
>
> No.  We want to express the exit condition for 2147483647 iterations using
> the variable with evolution [&c + some constant, +, 4], whose period is smaller
> (1073741823 iterations).  This is obviously not possible.  That is, after 2147483647
> iterations (and two overflows), this induction variable will have value &c + 12, as
> the folding correctly determines.  However, this value is also achieved
> after only three iterations, which causes the loop to exit prematurely.  The
> fact that NITR is a compile-time constant has nothing to do with
> the problem, which would reproduce even if NITR was a variable with the same value.
>
> The cause of the problem is that the bound on the number of iterations (that we
> use to ensure that the computation does not overflow in this way) comes from a
> different exit.  So, we need to be sure that the bound corresponds to the
> currently considered exit.  Which surely is the case if either NITR (which is
> an expression giving the number of iterations before the current exit is taken)
> is a compile-time constant, or if there are no other exits from the loop.
>

Yes yes yes -- that is precise.

Thanks,

David



> Zdenek
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-26  2:35               ` Zdenek Dvorak
  2010-05-26  3:17                 ` Xinliang David Li
@ 2010-05-27  1:31                 ` Xinliang David Li
  2010-05-27  9:12                   ` Zdenek Dvorak
  1 sibling, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-05-27  1:31 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3571 bytes --]

The new patch incorporates your comments. Since multiple-exit loops
are very common, not being able to do test replacement for them is a
significant limitation, so I made additional changes to allow it in a
safe way. Four more test cases are added.

Bootstrapped and retested (the newly added dump output causes the
existing ivopts-3.c test to fail; that test case will be fixed later).

Thanks,

David

On Tue, May 25, 2010 at 4:59 PM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> >> Which part is not loop invariant?
>> >
>> > it depends on the actual form of the use.  But in the most general case, the
>> > computation whose cost is determined here is ubase + ratio * (var - cbase), and
>> > no part of this is loop invariant (except for the force_var_costs of ubase and cbase).
>>
>> You mean the last 'else' branch?
>>
>>
>>   else
>>     {
>>       cost = force_var_cost (data, cbase, depends_on);
>>       cost.cost += add_cost (TYPE_MODE (ctype), data->speed);
>>       cost = add_costs (cost,
>>                       difference_cost (data,
>>                                        ubase, build_int_cst (utype, 0),
>>                                        &symbol_present, &var_present,
>>                                        &offset, depends_on));
>>     }
>>
>> but I don't see how this cost estimate matches expression 'ubase +
>> ratio * (var - cbase)'.
>
> this code covers forcing cbase and ubase to registers (which is loop invariant)
> and subtracting (add_cost) cbase from var (which is not).
>
>>  Also at the end of this cost function before
>> 'fallback:' label, there is this
>>
>> if (aratio != 1)
>>   cost.cost += multiply_by_cost (aratio, TYPE_MODE (ctype), speed);
>>
>> What is this for? Looks like it is for the last term 'ratio * var'  ?
>
> Yes.
>
>> >> The original fix to the problem is too conservative -- if there is
>> >> only one exit whose test is to be replaced, it should be ok to do it,
>> >> right?
>> >
>> > Yes.  But I do not see your point -- your patch removes the loop_only_exit_p
>> > test, which is necessary.
>>
>>
>> Right, I want to discuss more about this. Looking at the original PR
>> (msg00146.html) -- the fundamental problem is why the 'cand_value_at'
>> call for the test to be replaced returns a bound of value &c + 12
>> with nitr == 0x80000001? It seems that the wrapping is caused by wrong
>> compiler folding.
>
> No.  We want to express the exit condition for 2147483647 iterations using
> the variable with evolution [&c + some constant, +, 4], whose period is smaller
> (1073741823 iterations).  This is obviously not possible.  That is, after 2147483647
> iterations (and two overflows), this induction variable will have value &c + 12, as
> the folding correctly determines.  However, this value is also achieved
> after only three iterations, which causes the loop to exit prematurely.  The
> fact that NITR is a compile-time constant has nothing to do with
> the problem, which would reproduce even if NITR was a variable with the same value.
>
> The cause of the problem is that the bound on the number of iterations (that we
> use to ensure that the computation does not overflow in this way) comes from a
> different exit.  So, we need to be sure that the bound corresponds to the
> currently considered exit.  Which surely is the case if either NITR (which is
> an expression giving the number of iterations before the current exit is taken)
> is a compile-time constant, or if there are no other exits from the loop.
>
> Zdenek
>

[-- Attachment #2: ivopts_latest_part1_r2.p --]
[-- Type: text/x-pascal, Size: 20945 bytes --]

Index: gcc/tree-ssa-loop-niter.c
===================================================================
--- gcc/tree-ssa-loop-niter.c	(revision 159362)
+++ gcc/tree-ssa-loop-niter.c	(working copy)
@@ -2498,6 +2498,7 @@ record_estimate (struct loop *loop, tree
 {
   double_int delta;
   edge exit;
+  struct nb_iter_bound *elt = NULL;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -2522,7 +2523,7 @@ record_estimate (struct loop *loop, tree
      list.  */
   if (upper)
     {
-      struct nb_iter_bound *elt = GGC_NEW (struct nb_iter_bound);
+      elt = GGC_CNEW (struct nb_iter_bound);
 
       elt->bound = i_bound;
       elt->stmt = at_stmt;
@@ -2550,6 +2551,11 @@ record_estimate (struct loop *loop, tree
   if (double_int_ucmp (i_bound, delta) < 0)
     return;
 
+  if (is_exit && upper)
+    {
+      elt->nb_iterations_upper_bound = i_bound;
+      elt->has_upper_bound = true;
+    }
   record_niter_bound (loop, i_bound, realistic, upper);
 }
 
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+#define TYPE char*
+
+/* Testing that only one induction variable is selected after IVOPT on
+   the given target instead of 3.  */
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           dst[x] = ( src1[x] + src2[x] + 1 ) >> 1;
+       }
+}
+
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#define TYPE char*
+
+/* Test that on the given target only one iv candidate is used instead of 3.  */
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#define TYPE char*
+
+/* Make sure only 1 iv candidate is selected after IVOPT.  */
+void foo (int i_width, char* dst, char* src1, char* src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           *((TYPE)dst) = ( *((TYPE)src1) + *((TYPE)src2) + 1 ) >> 1;
+	   dst+=sizeof(TYPE);
+	   src1+=sizeof(TYPE);
+	   src2+=sizeof(TYPE);
+       }
+} 
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#ifndef TYPE
+#define TYPE char*
+#endif
+
+/* Make sure only 1 iv candidate is selected.  */
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      TYPE dstn= dst + i_width;
+       for( ; dst < dstn; )
+       {
+           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+/* Test if (p2 > p_limit2) can be replaced, so iv p2 can be
+ * eliminated.  */
+long foo(long* p, long* p2, int N1, int N2)
+{
+  int i = 0;
+  long* p_limit = p + N1;
+  long* p_limit2 = p2 + N2;
+  long s = 0;
+  while (p  <= p_limit)
+    {
+      p++;
+      p2++;
+      if (p2 > p_limit2)
+        break;
+      s += (*p);
+    }
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c	(revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+/* Exit tests i < N1 and p2 > p_limit2 can be replaced, so
+ * the two ivs i and p2 can be eliminated.  */
+long foo(long* p, long* p2, int N1, int N2)
+{
+  int i = 0;
+  long* p_limit2 = p2 + N2;
+  long s = 0;
+  while (i < N1)
+    {
+       p++;
+       p2++;
+       i++;
+       if (p2 > p_limit2)
+         break;
+       s += (*p);
+    }
+
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+/* iv p2 can be eliminated.  */
+long foo(long* p, long* p2, int N1, int N2)
+{
+  unsigned long  i = 0;
+  long* p_limit2 = p2 + N2;
+  long s = 0;
+  while (i < N1)
+    {
+      p2++;
+      i++;
+      if (p2 > p_limit2)
+        break;
+      s += p[i];
+    }
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_4.c	(revision 0)
@@ -0,0 +1,25 @@
+
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+/* iv i's step is 16, so its period is smaller than the max number of
+ * iterations; i.e., replacing if (p2 > p_limit2) with a test on i may
+ * result in overflow.  */
+long foo(long* p, long* p2, int N1, int N2)
+{
+  unsigned long  i = 0;
+  long* p_limit2 = p2 + N2;
+  long s = 0;
+  while (i < N1)
+    {
+      p2++;
+      i+=16;
+      if (p2 > p_limit2)
+        break;
+     s += p[i];
+  }
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 159362)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -91,14 +91,26 @@ along with GCC; see the file COPYING3.  
 #include "langhooks.h"
 #include "tree-affine.h"
 #include "target.h"
+#include "tree-inline.h"
 
 /* The infinite cost.  */
 #define INFTY 10000000
 
-/* The expected number of loop iterations.  TODO -- use profiling instead of
-   this.  */
 #define AVG_LOOP_NITER(LOOP) 5
 
+/* Returns the expected number of loop iterations for LOOP.
+   The average trip count is computed from profile data if it
+   exists. */
+
+static inline HOST_WIDE_INT
+avg_loop_niter (struct loop *loop)
+{
+  HOST_WIDE_INT niter = estimated_loop_iterations_int (loop, false);
+  if (niter == -1)
+    return AVG_LOOP_NITER (loop);
+
+  return niter;
+}
 
 /* Representation of the induction variable.  */
 struct iv
@@ -513,6 +525,19 @@ dump_cand (FILE *file, struct iv_cand *c
       return;
     }
 
+  if (cand->var_before)
+    {
+      fprintf (file, "  var_before ");
+      print_generic_expr (file, cand->var_before, TDF_SLIM);
+      fprintf (file, "\n");
+    }
+  if (cand->var_after)
+    {
+      fprintf (file, "  var_after ");
+      print_generic_expr (file, cand->var_after, TDF_SLIM);
+      fprintf (file, "\n");
+    }
+
   switch (cand->pos)
     {
     case IP_NORMAL:
@@ -1822,7 +1847,7 @@ find_interesting_uses_outside (struct iv
       phi = gsi_stmt (psi);
       def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
       if (is_gimple_reg (def))
-	find_interesting_uses_op (data, def);
+        find_interesting_uses_op (data, def);
     }
 }
 
@@ -2138,7 +2163,9 @@ add_candidate_1 (struct ivopts_data *dat
 	continue;
 
       if (operand_equal_p (base, cand->iv->base, 0)
-	  && operand_equal_p (step, cand->iv->step, 0))
+	  && operand_equal_p (step, cand->iv->step, 0)
+          && (TYPE_PRECISION (TREE_TYPE (base))
+              == TYPE_PRECISION (TREE_TYPE (cand->iv->base))))
 	break;
     }
 
@@ -3779,6 +3806,7 @@ get_computation_cost_at (struct ivopts_d
 			      ubase, build_int_cst (utype, 0),
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
+      cost.cost /= avg_loop_niter (data->current_loop);
     }
   else if (ratio == 1)
     {
@@ -3786,6 +3814,7 @@ get_computation_cost_at (struct ivopts_d
 			      ubase, cbase,
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
+      cost.cost /= avg_loop_niter (data->current_loop);
     }
   else if (address_p
 	   && !POINTER_TYPE_P (ctype)
@@ -3799,16 +3828,18 @@ get_computation_cost_at (struct ivopts_d
 			      ubase, cbase,
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
+      cost.cost /= avg_loop_niter (data->current_loop);
     }
   else
     {
       cost = force_var_cost (data, cbase, depends_on);
-      cost.cost += add_cost (TYPE_MODE (ctype), data->speed);
       cost = add_costs (cost,
 			difference_cost (data,
 					 ubase, build_int_cst (utype, 0),
 					 &symbol_present, &var_present,
 					 &offset, depends_on));
+      cost.cost /= avg_loop_niter (data->current_loop);
+      cost.cost += add_cost (TYPE_MODE (ctype), data->speed);
     }
 
   /* If we are after the increment, the value of the candidate is higher by
@@ -3841,7 +3872,7 @@ get_computation_cost_at (struct ivopts_d
       are added once to the variable, if present.  */
   if (var_present && (symbol_present || offset))
     cost.cost += add_cost (TYPE_MODE (ctype), speed)
-		 / AVG_LOOP_NITER (data->current_loop);
+		 / avg_loop_niter (data->current_loop);
 
   /* Having offset does not affect runtime cost in case it is added to
      symbol, but it increases complexity.  */
@@ -3911,6 +3942,7 @@ determine_use_iv_cost_generic (struct iv
     }
 
   cost = get_computation_cost (data, use, cand, false, &depends_on, NULL);
+
   set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE);
 
   return !infinite_cost_p (cost);
@@ -4049,27 +4081,24 @@ may_eliminate_iv (struct ivopts_data *da
   /* If the number of iterations is constant, compare against it directly.  */
   if (TREE_CODE (nit) == INTEGER_CST)
     {
-      if (!tree_int_cst_lt (nit, period))
+      if (!tree_int_cst_lt (nit, period)
+          && !tree_int_cst_equal (nit, period))
 	return false;
     }
 
   /* If not, and if this is the only possible exit of the loop, see whether
      we can get a conservative estimate on the number of iterations of the
      entire loop and compare against that instead.  */
-  else if (loop_only_exit_p (loop, exit))
+  else
     {
       double_int period_value, max_niter;
-      if (!estimated_loop_iterations (loop, true, &max_niter))
+      if (!estimated_loop_iterations_exit (loop, &max_niter, use->stmt))
 	return false;
       period_value = tree_to_double_int (period);
-      if (double_int_ucmp (max_niter, period_value) >= 0)
+      if (double_int_ucmp (max_niter, period_value) > 0)
 	return false;
     }
 
-  /* Otherwise, punt.  */
-  else
-    return false;
-
   cand_value_at (loop, cand, use->stmt, nit, &bnd);
 
   *bound = aff_combination_to_tree (&bnd);
@@ -4106,7 +4135,7 @@ determine_use_iv_cost_condition (struct 
       elim_cost = force_var_cost (data, bound, &depends_on_elim);
       /* The bound is a loop invariant, so it will be only computed
 	 once.  */
-      elim_cost.cost /= AVG_LOOP_NITER (data->current_loop);
+      elim_cost.cost /= avg_loop_niter (data->current_loop);
     }
   else
     elim_cost = infinite_cost;
@@ -4353,7 +4382,7 @@ determine_iv_cost (struct ivopts_data *d
   cost_base = force_var_cost (data, base, NULL);
   cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
 
-  cost = cost_step + cost_base.cost / AVG_LOOP_NITER (current_loop);
+  cost = cost_step + cost_base.cost / avg_loop_niter (data->current_loop);
 
   /* Prefer the original ivs unless we may gain something by replacing it.
      The reason is to make debugging simpler; so this is not relevant for
@@ -4541,7 +4570,7 @@ iv_ca_set_remove_invariants (struct iv_c
     {
       ivs->n_invariant_uses[iid]--;
       if (ivs->n_invariant_uses[iid] == 0)
-	ivs->n_regs--;
+        ivs->n_regs--;
     }
 }
 
@@ -4596,7 +4625,7 @@ iv_ca_set_add_invariants (struct iv_ca *
     {
       ivs->n_invariant_uses[iid]++;
       if (ivs->n_invariant_uses[iid] == 1)
-	ivs->n_regs++;
+        ivs->n_regs++;
     }
 }
 
@@ -4871,8 +4900,21 @@ iv_ca_dump (struct ivopts_data *data, FI
   unsigned i;
   comp_cost cost = iv_ca_cost (ivs);
 
-  fprintf (file, "  cost %d (complexity %d)\n", cost.cost, cost.complexity);
-  bitmap_print (file, ivs->cands, "  candidates ","\n");
+  fprintf (file, "  cost: %d (complexity %d)\n", cost.cost, cost.complexity);
+  fprintf (file, "  cand_cost: %d\n  cand_use_cost: %d (complexity %d)\n",
+           ivs->cand_cost, ivs->cand_use_cost.cost, ivs->cand_use_cost.complexity);
+  bitmap_print (file, ivs->cands, "  candidates: ","\n");
+
+   for (i = 0; i < ivs->upto; i++)
+    {
+      struct iv_use *use = iv_use (data, i);
+      struct cost_pair *cp = iv_ca_cand_for_use (ivs, use);
+      if (cp)
+        fprintf (file, "   use:%d --> iv_cand:%d, cost=(%d,%d)\n",
+                 use->id, cp->cand->id, cp->cost.cost, cp->cost.complexity);
+      else
+        fprintf (file, "   use:%d --> ??\n", use->id);
+    }
 
   for (i = 1; i <= data->max_inv_id; i++)
     if (ivs->n_invariant_uses[i])
@@ -4880,17 +4922,18 @@ iv_ca_dump (struct ivopts_data *data, FI
 	fprintf (file, "%s%d", pref, i);
 	pref = ", ";
       }
-  fprintf (file, "\n");
+  fprintf (file, "\n\n");
 }
 
 /* Try changing candidate in IVS to CAND for each use.  Return cost of the
    new set, and store differences in DELTA.  Number of induction variables
-   in the new set is stored to N_IVS.  */
+   in the new set is stored to N_IVS.  MIN_NCAND is a flag; when true,
+   the function tries to find a solution with a minimal number of iv candidates.  */
 
 static comp_cost
 iv_ca_extend (struct ivopts_data *data, struct iv_ca *ivs,
 	      struct iv_cand *cand, struct iv_ca_delta **delta,
-	      unsigned *n_ivs)
+	      unsigned *n_ivs, bool min_ncand)
 {
   unsigned i;
   comp_cost cost;
@@ -4914,8 +4957,8 @@ iv_ca_extend (struct ivopts_data *data, 
       if (!iv_ca_has_deps (ivs, new_cp))
 	continue;
 
-      if (!cheaper_cost_pair (new_cp, old_cp))
-	continue;
+      if (!min_ncand && !cheaper_cost_pair (new_cp, old_cp))
+        continue;
 
       *delta = iv_ca_delta_add (use, old_cp, new_cp, *delta);
     }
@@ -5110,7 +5153,8 @@ try_add_cand_for (struct ivopts_data *da
 	continue;
 
       iv_ca_set_cp (data, ivs, use, cp);
-      act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL);
+      act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL,
+                               true);
       iv_ca_set_no_cp (data, ivs, use);
       act_delta = iv_ca_delta_add (use, NULL, cp, act_delta);
 
@@ -5143,7 +5187,7 @@ try_add_cand_for (struct ivopts_data *da
 
 	  act_delta = NULL;
 	  iv_ca_set_cp (data, ivs, use, cp);
-	  act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL);
+	  act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL, true);
 	  iv_ca_set_no_cp (data, ivs, use);
 	  act_delta = iv_ca_delta_add (use, iv_ca_cand_for_use (ivs, use),
 				       cp, act_delta);
@@ -5203,7 +5247,7 @@ try_improve_iv_set (struct ivopts_data *
       if (iv_ca_cand_used_p (ivs, cand))
 	continue;
 
-      acost = iv_ca_extend (data, ivs, cand, &act_delta, &n_ivs);
+      acost = iv_ca_extend (data, ivs, cand, &act_delta, &n_ivs, false);
       if (!act_delta)
 	continue;
 
@@ -5330,7 +5374,6 @@ create_new_iv (struct ivopts_data *data,
 
       /* Rewrite the increment so that it uses var_before directly.  */
       find_interesting_uses_op (data, cand->var_after)->selected = cand;
-
       return;
     }
 
@@ -5358,8 +5401,18 @@ create_new_ivs (struct ivopts_data *data
       cand = iv_cand (data, i);
       create_new_iv (data, cand);
     }
-}
 
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\nSelected IV set: \n");
+      EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
+        {
+          cand = iv_cand (data, i);
+          dump_cand (dump_file, cand);
+        }
+      fprintf (dump_file, "\n");
+    }
+}
 
 /* Rewrites USE (definition of iv used in a nonlinear expression)
    using candidate CAND.  */
@@ -5582,6 +5635,11 @@ rewrite_use_compare (struct ivopts_data 
       tree var_type = TREE_TYPE (var);
       gimple_seq stmts;
 
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        {
+          fprintf (dump_file, "Replacing exit test: ");
+          print_gimple_stmt (dump_file, use->stmt, 0, TDF_SLIM);
+        }
       compare = iv_elimination_compare (data, use);
       bound = unshare_expr (fold_convert (var_type, bound));
       op = force_gimple_operand (bound, &stmts, true, NULL_TREE);
Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	(revision 159362)
+++ gcc/tree-data-ref.c	(working copy)
@@ -1692,6 +1692,33 @@ estimated_loop_iterations (struct loop *
   return true;
 }
 
+/* Sets NIT to the upper bound of the number of executions of the statements in
+   LOOP according to EXIT_STMT. If we have no reliable estimate, the function
+   returns false, otherwise returns true.  */
+
+bool
+estimated_loop_iterations_exit (struct loop *loop, double_int *nit,
+                                gimple exit_stmt)
+{
+  struct nb_iter_bound *bound = NULL;
+  bool found = false;
+
+  estimate_numbers_of_iterations_loop (loop);
+  for (bound = loop->bounds; bound; bound = bound->next)
+    {
+      if (bound->stmt == exit_stmt)
+        {
+          found = true;
+          break;
+        }
+    }
+  if (!found || !bound->has_upper_bound)
+    return false;
+
+  *nit = bound->nb_iterations_upper_bound;
+  return true;
+}
+
 /* Similar to estimated_loop_iterations, but returns the estimate only
    if it fits to HOST_WIDE_INT.  If this is not the case, or the estimate
    on the number of iterations of LOOP could not be derived, returns -1.  */
Index: gcc/cfgloop.h
===================================================================
--- gcc/cfgloop.h	(revision 159362)
+++ gcc/cfgloop.h	(working copy)
@@ -63,6 +63,12 @@ struct GTY ((chain_next ("%h.next"))) nb
      are executed at most BOUND times.  */
   bool is_exit;
 
+  /* True if nb_iterations_upper_bound is available.  */
+  bool has_upper_bound;
+  /* This field is for exit stmts only.  It is the max number of iterations
+     the loop can execute according to the bound of this exit condition.  */
+  double_int nb_iterations_upper_bound;
+
   /* The next bound in the list.  */
   struct nb_iter_bound *next;
 };
@@ -277,6 +283,7 @@ extern rtx doloop_condition_get (rtx);
 void estimate_numbers_of_iterations_loop (struct loop *);
 HOST_WIDE_INT estimated_loop_iterations_int (struct loop *, bool);
 bool estimated_loop_iterations (struct loop *, bool, double_int *);
+bool estimated_loop_iterations_exit (struct loop *, double_int *, gimple);
 
 /* Loop manipulation.  */
 extern bool can_duplicate_loop_p (const struct loop *loop);

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-27  1:31                 ` Xinliang David Li
@ 2010-05-27  9:12                   ` Zdenek Dvorak
  2010-05-27 17:33                     ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: Zdenek Dvorak @ 2010-05-27  9:12 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

Hi,

> Index: gcc/tree-ssa-loop-niter.c
> ===================================================================
> --- gcc/tree-ssa-loop-niter.c	(revision 159362)
> +++ gcc/tree-ssa-loop-niter.c	(working copy)
> @@ -2498,6 +2498,7 @@ record_estimate (struct loop *loop, tree
>  {
>    double_int delta;
>    edge exit;
> +  struct nb_iter_bound *elt = NULL;
>  
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      {
> @@ -2522,7 +2523,7 @@ record_estimate (struct loop *loop, tree
>       list.  */
>    if (upper)
>      {
> -      struct nb_iter_bound *elt = GGC_NEW (struct nb_iter_bound);
> +      elt = GGC_CNEW (struct nb_iter_bound);
>  
>        elt->bound = i_bound;
>        elt->stmt = at_stmt;
> @@ -2550,6 +2551,11 @@ record_estimate (struct loop *loop, tree
>    if (double_int_ucmp (i_bound, delta) < 0)
>      return;
>  
> +  if (is_exit && upper)
> +    {
> +      elt->nb_iterations_upper_bound = i_bound;
> +      elt->has_upper_bound = true;
> +    }
>    record_niter_bound (loop, i_bound, realistic, upper);
>  }

I don't think nb_iterations_upper_bound is necessary.  You already have the
bound in elt->bound (the bound that you record is increased by one, since the
statements before the exit can be executed one more time than the loop latch,
but the code in may_eliminate_iv expects the number that is stored in
elt->bound).  Furthermore, has_upper_bound also seems unnecessary, since
elt is only recorded in the list if it is an upper bound.

Zdenek

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-25  0:17     ` Xinliang David Li
  2010-05-25 10:46       ` Zdenek Dvorak
  2010-05-25 18:10       ` Toon Moene
@ 2010-05-27  9:28       ` Zdenek Dvorak
  2010-05-27 17:51         ` Xinliang David Li
  2010-05-28  9:57       ` Zdenek Dvorak
  2010-06-05  9:01       ` Zdenek Dvorak
  4 siblings, 1 reply; 100+ messages in thread
From: Zdenek Dvorak @ 2010-05-27  9:28 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

Hi,

> patch-2:
> 
> This patch addresses PR31792 -- sinking the computation of a replaced IV out
> of the loop when it is live outside the loop only.

ivopts seems like a wrong place for this optimization; it is already quite
complicated as it is, and should not do things that are only marginally
related.  Won't scheduling pass_sink_code after ivopts do the same thing?

Zdenek

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-27  9:12                   ` Zdenek Dvorak
@ 2010-05-27 17:33                     ` Xinliang David Li
  2010-05-28  9:14                       ` Zdenek Dvorak
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-05-27 17:33 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2058 bytes --]

On Thu, May 27, 2010 at 12:56 AM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> Index: gcc/tree-ssa-loop-niter.c
>> ===================================================================
>> --- gcc/tree-ssa-loop-niter.c (revision 159362)
>> +++ gcc/tree-ssa-loop-niter.c (working copy)
>> @@ -2498,6 +2498,7 @@ record_estimate (struct loop *loop, tree
>>  {
>>    double_int delta;
>>    edge exit;
>> +  struct nb_iter_bound *elt = NULL;
>>
>>    if (dump_file && (dump_flags & TDF_DETAILS))
>>      {
>> @@ -2522,7 +2523,7 @@ record_estimate (struct loop *loop, tree
>>       list.  */
>>    if (upper)
>>      {
>> -      struct nb_iter_bound *elt = GGC_NEW (struct nb_iter_bound);
>> +      elt = GGC_CNEW (struct nb_iter_bound);
>>
>>        elt->bound = i_bound;
>>        elt->stmt = at_stmt;
>> @@ -2550,6 +2551,11 @@ record_estimate (struct loop *loop, tree
>>    if (double_int_ucmp (i_bound, delta) < 0)
>>      return;
>>
>> +  if (is_exit && upper)
>> +    {
>> +      elt->nb_iterations_upper_bound = i_bound;
>> +      elt->has_upper_bound = true;
>> +    }
>>    record_niter_bound (loop, i_bound, realistic, upper);
>>  }
>
> I don't think nb_iterations_upper_bound is necessary.  You already have the
> bound in elt->bound (the bound that you record is increased by one, since the
> statements before the exit can be executed one more time than the loop latch,
> but the code in may_eliminate_iv expects the number that is stored in
> elt->bound).

may_eliminate_iv actually checks loop->nb_iterations_upper_bound, which
is the merge of the 'elt->bound + 1' values -- that is why I introduced
the new member -- but I think this is more conservative than needed, so
using elt->bound should be good.

> Furthermore, has_upper_bound also seems unnecessary, since
> elt is only recorded in the list if it is an upper bound.

This was needed because after adding the delta, the upper bound may
overflow and become 'unavailable'.


Ok for this version?

Thanks,

David

>
> Zdenek
>

[-- Attachment #2: ivopts_latest_part1_r3.p --]
[-- Type: text/x-pascal, Size: 19442 bytes --]

Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+#define TYPE char*
+
+/* Testing that only one induction variable is selected after IVOPT on
+   the given target instead of 3.  */
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           dst[x] = ( src1[x] + src2[x] + 1 ) >> 1;
+       }
+}
+
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#define TYPE char*
+
+/* Testing that on the given target only one iv candidate is selected
+   instead of 3.  */
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#define TYPE char*
+
+/* Make sure only 1 iv candidate is selected after IVOPT.  */
+void foo (int i_width, char* dst, char* src1, char* src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           *((TYPE)dst) = ( *((TYPE)src1) + *((TYPE)src2) + 1 ) >> 1;
+	   dst+=sizeof(TYPE);
+	   src1+=sizeof(TYPE);
+	   src2+=sizeof(TYPE);
+       }
+} 
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#ifndef TYPE
+#define TYPE char*
+#endif
+
+/* Make sure only 1 iv candidate is selected.  */
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      TYPE dstn= dst + i_width;
+       for( ; dst < dstn; )
+       {
+           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+/* Test if (p2 > p_limit2) can be replaced, so iv p2 can be
+ * eliminated.  */
+long foo(long* p, long* p2, int N1, int N2)
+{
+  int i = 0;
+  long* p_limit = p + N1;
+  long* p_limit2 = p2 + N2;
+  long s = 0;
+  while (p  <= p_limit)
+    {
+      p++;
+      p2++;
+      if (p2 > p_limit2)
+        break;
+      s += (*p);
+    }
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c	(revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+/* Exit tests i < N1 and p2 > p_limit2 can be replaced, so
+ * the two ivs i and p2 can be eliminated.  */
+long foo(long* p, long* p2, int N1, int N2)
+{
+  int i = 0;
+  long* p_limit2 = p2 + N2;
+  long s = 0;
+  while (i < N1)
+    {
+       p++;
+       p2++;
+       i++;
+       if (p2 > p_limit2)
+         break;
+       s += (*p);
+    }
+
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+/* iv p2 can be eliminated.  */
+long foo(long* p, long* p2, int N1, int N2)
+{
+  unsigned long  i = 0;
+  long* p_limit2 = p2 + N2;
+  long s = 0;
+  while (i < N1)
+    {
+      p2++;
+      i++;
+      if (p2 > p_limit2)
+        break;
+      s += p[i];
+    }
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_4.c	(revision 0)
@@ -0,0 +1,25 @@
+
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+/* iv i's step is 16, so its period is smaller than the max iterations,
+ * i.e. replacing if (p2 > p_limit2) with a test of i may result in
+ * overflow.  */
+long foo(long* p, long* p2, int N1, int N2)
+{
+  unsigned long  i = 0;
+  long* p_limit2 = p2 + N2;
+  long s = 0;
+  while (i < N1)
+    {
+      p2++;
+      i+=16;
+      if (p2 > p_limit2)
+        break;
+     s += p[i];
+  }
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 159362)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -91,14 +91,26 @@ along with GCC; see the file COPYING3.  
 #include "langhooks.h"
 #include "tree-affine.h"
 #include "target.h"
+#include "tree-inline.h"
 
 /* The infinite cost.  */
 #define INFTY 10000000
 
-/* The expected number of loop iterations.  TODO -- use profiling instead of
-   this.  */
 #define AVG_LOOP_NITER(LOOP) 5
 
+/* Returns the expected number of loop iterations for LOOP.
+   The average trip count is computed from profile data if it
+   exists. */
+
+static inline HOST_WIDE_INT
+avg_loop_niter (struct loop *loop)
+{
+  HOST_WIDE_INT niter = estimated_loop_iterations_int (loop, false);
+  if (niter == -1)
+    return AVG_LOOP_NITER (loop);
+
+  return niter;
+}
 
 /* Representation of the induction variable.  */
 struct iv
@@ -513,6 +525,19 @@ dump_cand (FILE *file, struct iv_cand *c
       return;
     }
 
+  if (cand->var_before)
+    {
+      fprintf (file, "  var_before ");
+      print_generic_expr (file, cand->var_before, TDF_SLIM);
+      fprintf (file, "\n");
+    }
+  if (cand->var_after)
+    {
+      fprintf (file, "  var_after ");
+      print_generic_expr (file, cand->var_after, TDF_SLIM);
+      fprintf (file, "\n");
+    }
+
   switch (cand->pos)
     {
     case IP_NORMAL:
@@ -1822,7 +1847,7 @@ find_interesting_uses_outside (struct iv
       phi = gsi_stmt (psi);
       def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
       if (is_gimple_reg (def))
-	find_interesting_uses_op (data, def);
+        find_interesting_uses_op (data, def);
     }
 }
 
@@ -2138,7 +2163,9 @@ add_candidate_1 (struct ivopts_data *dat
 	continue;
 
       if (operand_equal_p (base, cand->iv->base, 0)
-	  && operand_equal_p (step, cand->iv->step, 0))
+	  && operand_equal_p (step, cand->iv->step, 0)
+          && (TYPE_PRECISION (TREE_TYPE (base))
+              == TYPE_PRECISION (TREE_TYPE (cand->iv->base))))
 	break;
     }
 
@@ -3779,6 +3806,7 @@ get_computation_cost_at (struct ivopts_d
 			      ubase, build_int_cst (utype, 0),
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
+      cost.cost /= avg_loop_niter (data->current_loop);
     }
   else if (ratio == 1)
     {
@@ -3786,6 +3814,7 @@ get_computation_cost_at (struct ivopts_d
 			      ubase, cbase,
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
+      cost.cost /= avg_loop_niter (data->current_loop);
     }
   else if (address_p
 	   && !POINTER_TYPE_P (ctype)
@@ -3799,16 +3828,18 @@ get_computation_cost_at (struct ivopts_d
 			      ubase, cbase,
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
+      cost.cost /= avg_loop_niter (data->current_loop);
     }
   else
     {
       cost = force_var_cost (data, cbase, depends_on);
-      cost.cost += add_cost (TYPE_MODE (ctype), data->speed);
       cost = add_costs (cost,
 			difference_cost (data,
 					 ubase, build_int_cst (utype, 0),
 					 &symbol_present, &var_present,
 					 &offset, depends_on));
+      cost.cost /= avg_loop_niter (data->current_loop);
+      cost.cost += add_cost (TYPE_MODE (ctype), data->speed);
     }
 
   /* If we are after the increment, the value of the candidate is higher by
@@ -3841,7 +3872,7 @@ get_computation_cost_at (struct ivopts_d
       are added once to the variable, if present.  */
   if (var_present && (symbol_present || offset))
     cost.cost += add_cost (TYPE_MODE (ctype), speed)
-		 / AVG_LOOP_NITER (data->current_loop);
+		 / avg_loop_niter (data->current_loop);
 
   /* Having offset does not affect runtime cost in case it is added to
      symbol, but it increases complexity.  */
@@ -3911,6 +3942,7 @@ determine_use_iv_cost_generic (struct iv
     }
 
   cost = get_computation_cost (data, use, cand, false, &depends_on, NULL);
+
   set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE);
 
   return !infinite_cost_p (cost);
@@ -4049,27 +4081,24 @@ may_eliminate_iv (struct ivopts_data *da
   /* If the number of iterations is constant, compare against it directly.  */
   if (TREE_CODE (nit) == INTEGER_CST)
     {
-      if (!tree_int_cst_lt (nit, period))
+      if (!tree_int_cst_lt (nit, period)
+          && !tree_int_cst_equal (nit, period))
 	return false;
     }
 
   /* If not, and if this is the only possible exit of the loop, see whether
      we can get a conservative estimate on the number of iterations of the
      entire loop and compare against that instead.  */
-  else if (loop_only_exit_p (loop, exit))
+  else
     {
       double_int period_value, max_niter;
-      if (!estimated_loop_iterations (loop, true, &max_niter))
+      if (!estimated_loop_iterations_exit (loop, &max_niter, use->stmt))
 	return false;
       period_value = tree_to_double_int (period);
-      if (double_int_ucmp (max_niter, period_value) >= 0)
+      if (double_int_ucmp (max_niter, period_value) > 0)
 	return false;
     }
 
-  /* Otherwise, punt.  */
-  else
-    return false;
-
   cand_value_at (loop, cand, use->stmt, nit, &bnd);
 
   *bound = aff_combination_to_tree (&bnd);
@@ -4106,7 +4135,7 @@ determine_use_iv_cost_condition (struct 
       elim_cost = force_var_cost (data, bound, &depends_on_elim);
       /* The bound is a loop invariant, so it will be only computed
 	 once.  */
-      elim_cost.cost /= AVG_LOOP_NITER (data->current_loop);
+      elim_cost.cost /= avg_loop_niter (data->current_loop);
     }
   else
     elim_cost = infinite_cost;
@@ -4353,7 +4382,7 @@ determine_iv_cost (struct ivopts_data *d
   cost_base = force_var_cost (data, base, NULL);
   cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
 
-  cost = cost_step + cost_base.cost / AVG_LOOP_NITER (current_loop);
+  cost = cost_step + cost_base.cost / avg_loop_niter (data->current_loop);
 
   /* Prefer the original ivs unless we may gain something by replacing it.
      The reason is to make debugging simpler; so this is not relevant for
@@ -4541,7 +4570,7 @@ iv_ca_set_remove_invariants (struct iv_c
     {
       ivs->n_invariant_uses[iid]--;
       if (ivs->n_invariant_uses[iid] == 0)
-	ivs->n_regs--;
+        ivs->n_regs--;
     }
 }
 
@@ -4596,7 +4625,7 @@ iv_ca_set_add_invariants (struct iv_ca *
     {
       ivs->n_invariant_uses[iid]++;
       if (ivs->n_invariant_uses[iid] == 1)
-	ivs->n_regs++;
+        ivs->n_regs++;
     }
 }
 
@@ -4871,8 +4900,21 @@ iv_ca_dump (struct ivopts_data *data, FI
   unsigned i;
   comp_cost cost = iv_ca_cost (ivs);
 
-  fprintf (file, "  cost %d (complexity %d)\n", cost.cost, cost.complexity);
-  bitmap_print (file, ivs->cands, "  candidates ","\n");
+  fprintf (file, "  cost: %d (complexity %d)\n", cost.cost, cost.complexity);
+  fprintf (file, "  cand_cost: %d\n  cand_use_cost: %d (complexity %d)\n",
+           ivs->cand_cost, ivs->cand_use_cost.cost, ivs->cand_use_cost.complexity);
+  bitmap_print (file, ivs->cands, "  candidates: ","\n");
+
+   for (i = 0; i < ivs->upto; i++)
+    {
+      struct iv_use *use = iv_use (data, i);
+      struct cost_pair *cp = iv_ca_cand_for_use (ivs, use);
+      if (cp)
+        fprintf (file, "   use:%d --> iv_cand:%d, cost=(%d,%d)\n",
+                 use->id, cp->cand->id, cp->cost.cost, cp->cost.complexity);
+      else
+        fprintf (file, "   use:%d --> ??\n", use->id);
+    }
 
   for (i = 1; i <= data->max_inv_id; i++)
     if (ivs->n_invariant_uses[i])
@@ -4880,17 +4922,18 @@ iv_ca_dump (struct ivopts_data *data, FI
 	fprintf (file, "%s%d", pref, i);
 	pref = ", ";
       }
-  fprintf (file, "\n");
+  fprintf (file, "\n\n");
 }
 
 /* Try changing candidate in IVS to CAND for each use.  Return cost of the
    new set, and store differences in DELTA.  Number of induction variables
-   in the new set is stored to N_IVS.  */
+   in the new set is stored to N_IVS. MIN_NCAND is a flag. When it is true
+   the function will try to find a solution with minimal iv candidates.  */
 
 static comp_cost
 iv_ca_extend (struct ivopts_data *data, struct iv_ca *ivs,
 	      struct iv_cand *cand, struct iv_ca_delta **delta,
-	      unsigned *n_ivs)
+	      unsigned *n_ivs, bool min_ncand)
 {
   unsigned i;
   comp_cost cost;
@@ -4914,8 +4957,8 @@ iv_ca_extend (struct ivopts_data *data, 
       if (!iv_ca_has_deps (ivs, new_cp))
 	continue;
 
-      if (!cheaper_cost_pair (new_cp, old_cp))
-	continue;
+      if (!min_ncand && !cheaper_cost_pair (new_cp, old_cp))
+        continue;
 
       *delta = iv_ca_delta_add (use, old_cp, new_cp, *delta);
     }
@@ -5110,7 +5153,8 @@ try_add_cand_for (struct ivopts_data *da
 	continue;
 
       iv_ca_set_cp (data, ivs, use, cp);
-      act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL);
+      act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL,
+                               true);
       iv_ca_set_no_cp (data, ivs, use);
       act_delta = iv_ca_delta_add (use, NULL, cp, act_delta);
 
@@ -5143,7 +5187,7 @@ try_add_cand_for (struct ivopts_data *da
 
 	  act_delta = NULL;
 	  iv_ca_set_cp (data, ivs, use, cp);
-	  act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL);
+	  act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL, true);
 	  iv_ca_set_no_cp (data, ivs, use);
 	  act_delta = iv_ca_delta_add (use, iv_ca_cand_for_use (ivs, use),
 				       cp, act_delta);
@@ -5203,7 +5247,7 @@ try_improve_iv_set (struct ivopts_data *
       if (iv_ca_cand_used_p (ivs, cand))
 	continue;
 
-      acost = iv_ca_extend (data, ivs, cand, &act_delta, &n_ivs);
+      acost = iv_ca_extend (data, ivs, cand, &act_delta, &n_ivs, false);
       if (!act_delta)
 	continue;
 
@@ -5330,7 +5374,6 @@ create_new_iv (struct ivopts_data *data,
 
       /* Rewrite the increment so that it uses var_before directly.  */
       find_interesting_uses_op (data, cand->var_after)->selected = cand;
-
       return;
     }
 
@@ -5358,8 +5401,18 @@ create_new_ivs (struct ivopts_data *data
       cand = iv_cand (data, i);
       create_new_iv (data, cand);
     }
-}
 
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\nSelected IV set: \n");
+      EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
+        {
+          cand = iv_cand (data, i);
+          dump_cand (dump_file, cand);
+        }
+      fprintf (dump_file, "\n");
+    }
+}
 
 /* Rewrites USE (definition of iv used in a nonlinear expression)
    using candidate CAND.  */
@@ -5582,6 +5635,11 @@ rewrite_use_compare (struct ivopts_data 
       tree var_type = TREE_TYPE (var);
       gimple_seq stmts;
 
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        {
+          fprintf (dump_file, "Replacing exit test: ");
+          print_gimple_stmt (dump_file, use->stmt, 0, TDF_SLIM);
+        }
       compare = iv_elimination_compare (data, use);
       bound = unshare_expr (fold_convert (var_type, bound));
       op = force_gimple_operand (bound, &stmts, true, NULL_TREE);
Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	(revision 159362)
+++ gcc/tree-data-ref.c	(working copy)
@@ -1692,6 +1692,33 @@ estimated_loop_iterations (struct loop *
   return true;
 }
 
+/* Sets NIT to the upper bound of the number of executions of the statements in
+   LOOP according to EXIT_STMT. If we have no reliable estimate, the function
+   returns false, otherwise returns true.  */
+
+bool
+estimated_loop_iterations_exit (struct loop *loop, double_int *nit,
+                                gimple exit_stmt)
+{
+  struct nb_iter_bound *bound = NULL;
+  bool found = false;
+
+  estimate_numbers_of_iterations_loop (loop);
+  for (bound = loop->bounds; bound; bound = bound->next)
+    {
+      if (bound->stmt == exit_stmt)
+        {
+          found = true;
+          break;
+        }
+    }
+  if (!found)
+    return false;
+
+  *nit = bound->bound;
+  return true;
+}
+
 /* Similar to estimated_loop_iterations, but returns the estimate only
    if it fits to HOST_WIDE_INT.  If this is not the case, or the estimate
    on the number of iterations of LOOP could not be derived, returns -1.  */
Index: gcc/cfgloop.h
===================================================================
--- gcc/cfgloop.h	(revision 159362)
+++ gcc/cfgloop.h	(working copy)
@@ -277,6 +277,7 @@ extern rtx doloop_condition_get (rtx);
 void estimate_numbers_of_iterations_loop (struct loop *);
 HOST_WIDE_INT estimated_loop_iterations_int (struct loop *, bool);
 bool estimated_loop_iterations (struct loop *, bool, double_int *);
+bool estimated_loop_iterations_exit (struct loop *, double_int *, gimple);
 
 /* Loop manipulation.  */
 extern bool can_duplicate_loop_p (const struct loop *loop);


* Re: IVOPT improvement patch
  2010-05-27  9:28       ` Zdenek Dvorak
@ 2010-05-27 17:51         ` Xinliang David Li
  2010-05-27 22:48           ` Zdenek Dvorak
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-05-27 17:51 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: GCC Patches

On Thu, May 27, 2010 at 2:14 AM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> patch-2:
>>
>> This patch addresses PR31792 -- sinking the computation of a replaced IV out
>> of the loop when it is live outside the loop only.
>
> ivopts seems like a wrong place for this optimization; it is already quite
> complicated as it is, and should not do things that are only marginally
> related.  Won't scheduling pass_sink_code after ivopts do the same thing?

There are reasons pass_sink_code cannot do this -- the RHS of the
computation that can be sunk is loop-variant -- though only the
value from the last iteration matters.

What is more important is that this has an impact on the cost computation
and iv selection -- without this, the use (nonlinear) cost of the
live-out-only ivs can be high, and IVOPT may end up keeping the
original induction variable in the loop.

The changes in the patch are mostly independent of the rest of the code.

Thanks,

David



>
> Zdenek
>


* Re: IVOPT improvement patch
  2010-05-27 17:51         ` Xinliang David Li
@ 2010-05-27 22:48           ` Zdenek Dvorak
  2010-05-27 23:41             ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: Zdenek Dvorak @ 2010-05-27 22:48 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

Hi,

> >> patch-2:
> >>
> >> This patch addresses PR31792 -- sinking the computation of a replaced IV out
> >> of the loop when it is live outside the loop only.
> >
> > ivopts seems like a wrong place for this optimization; it is already quite
> > complicated as it is, and should not do things that are only marginally
> > related.  Won't scheduling pass_sink_code after ivopts do the same thing?
> 
> There are reasons pass_sink_code cannot do this -- the RHS of the
> computation that can be sunk is loop-variant

that should not be a problem, it suffices that the computed value is only
used on the exit edge.

> -- though only the
> value from last iteration matters.
> 
> What is more important is that this has impact on the cost computation
> and iv selection -- without this, the use (nonlinear) cost of  the
> live out only ivs can be high and the IVOPT may end up keeping the
> original induction variable in the loop.

Well, you can just adjust the cost for the expected sinking, and let
pass_sink_code do it; that is the simple part of the patch.  Btw.

+
+  if (use->use_pos == IU_OUTSIDE_LOOP_ONLY && !infinite_cost_p (cost))
+    cost.cost /= AVG_LOOP_NITER (data->current_loop);

This causes the loop invariant parts of the cost to be divided by AVG_LOOP_NITER
twice.

Zdenek


* Re: IVOPT improvement patch
  2010-05-27 22:48           ` Zdenek Dvorak
@ 2010-05-27 23:41             ` Xinliang David Li
  0 siblings, 0 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-05-27 23:41 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: GCC Patches

On Thu, May 27, 2010 at 3:17 PM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> >> patch-2:
>> >>
>> >> This patch addresses PR31792 -- sinking the computation of a replaced IV out
>> >> of the loop when it is live outside the loop only.
>> >
>> > ivopts seems like a wrong place for this optimization; it is already quite
>> > complicated as it is, and should not do things that are only marginally
>> > related.  Won't scheduling pass_sink_code after ivopts do the same thing?
>>
>> There are reasons pass_sink_code cannot do this -- the RHS of the
>> computation that can be sunk is loop-variant
>
> that should not be a problem, it suffices that the computed value is only
> used on the exit edge.

>
>> -- though only the
>> value from last iteration matters.


What I meant is that the tree-sink pass might not be able to handle it --
and I verified that it indeed does not handle it. The good news is that
the sccp pass can do just this.


>>
>> What is more important is that this has impact on the cost computation
>> and iv selection -- without this, the use (nonlinear) cost of  the
>> live out only ivs can be high and the IVOPT may end up keeping the
>> original induction variable in the loop.
>
> Well, you can just adjust the cost for the expected sinking, and let
> pass_sink_code do it; that is the simple part of the patch.  Btw.

Ok, this sounds reasonable -- the only risk is that if the later pass
does not do the job as expected, we will generate lousy code.


>
> +
> +  if (use->use_pos == IU_OUTSIDE_LOOP_ONLY && !infinite_cost_p (cost))
> +    cost.cost /= AVG_LOOP_NITER (data->current_loop);
>
> This causes the loop invariant parts of the cost to be divided by AVG_LOOP_NITER
> twice.


Good catch. The adjustment needs to be done only to part of the cost.

Thanks,

David
>
> Zdenek
>


* Re: IVOPT improvement patch
  2010-05-27 17:33                     ` Xinliang David Li
@ 2010-05-28  9:14                       ` Zdenek Dvorak
  2010-05-28 23:51                         ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: Zdenek Dvorak @ 2010-05-28  9:14 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

Hi,

> > I don't think nb_iterations_upper_bound is necessary.  You already have the
> > bound in elt->bound (the bound that you record is increased by one, since the
> > statements before the exit can be executed one more time than the loop latch,
> > but the code in may_eliminate_iv expects the number that is stored in
> > elt->bound).
> 
> may_eliminate_iv actually checks loop->nb_iterations_upper_bound which
> is the merge of 'elt->bound + 1' -- that is why I introduced the new
> member -- but I think this is more conservative than needed -- so
> using elt->bound should be good.

let's not guess about this -- we should have a testcase for the boundary values
(e.g., one iteration before overflow, overflow, one iteration after overflow).

Also, note that for loops with one exit, your change actually makes the test weaker.
For instance, before your change, we could deduce that

int a[100];
for (i = 0; i < n; i++)
  a[i] = i;

iterates at most 100 times.

Zdenek


* Re: IVOPT improvement patch
  2010-05-25  0:17     ` Xinliang David Li
                         ` (2 preceding siblings ...)
  2010-05-27  9:28       ` Zdenek Dvorak
@ 2010-05-28  9:57       ` Zdenek Dvorak
  2010-06-01 23:13         ` Xinliang David Li
  2010-06-05  9:01       ` Zdenek Dvorak
  4 siblings, 1 reply; 100+ messages in thread
From: Zdenek Dvorak @ 2010-05-28  9:57 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

Hi,

> patch-4:
> 
> A simple local optimization that reorders iv update statement with
> preceding target_mem_ref so that instruction combining can happen in
> later phases.

> +/* Performs a peephole optimization to reorder the iv update statement with
> +   a mem ref to enable instruction combining in later phases. The mem ref uses
> +   the iv value before the update, so the reordering transformation requires
> +   adjustment of the offset. CAND is the selected IV_CAND.
> +
> +   Example:
> +
> +   t = MEM_REF (base, iv1, 8, 16);  // base, index, stride, offset
> +   iv2 = iv1 + 1;
> +
> +   if (t < val)      (1)
> +     goto L;
> +   goto Head;
> +
> +
> +   directly propagating t over to (1) will introduce an overlapping live range,
> +   thus increasing register pressure. This peephole transforms it into:
> +
> +
> +   iv2 = iv1 + 1;
> +   t = MEM_REF (base, iv2, 8, 8);
> +   if (t < val)
> +     goto L;
> +   goto Head;
> +*/

looks reasonable.  Just two notes:
1) you should check whether the new value of the offset is allowed for the
   current architecture.
2) rather than rewriting the resulting code, it might be easier to change the
   position of the candidate (to IP_BEFORE_USE for the use in the memory
   reference) before create_new_ivs is run.

Zdenek


* Re: IVOPT improvement patch
  2010-05-28  9:14                       ` Zdenek Dvorak
@ 2010-05-28 23:51                         ` Xinliang David Li
  2010-05-29 16:57                           ` Zdenek Dvorak
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-05-28 23:51 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1907 bytes --]

On Fri, May 28, 2010 at 1:50 AM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> > I don't think nb_iterations_upper_bound is necessary.  You already have the
>> > bound in elt->bound (the bound that you record is increased by one, since the
>> > statements before the exit can be executed one more time than the loop latch,
>> > but the code in may_eliminate_iv expects the number that is stored in
>> > elt->bound).
>>
>> may_eliminate_iv actually checks loop->nb_iterations_upper_bound which
>> is the merge of 'elt->bound + 1' -- that is why I introduced the new
>> member -- but I think this is more conservative than needed -- so
>> using elt->bound should be good.
>
> let's not guess about this -- we should have a testcase for the boundary values
> (e.g., one iteration before overflow, overflow, one iteration after overflow).

elt->bound is the number of iterations at which the exit condition
evaluates to true, which means that if the exit test is a bottom test,
elt->bound is one less than the number of iterations.  In fact, the
overflow check is not correct in any of the revisions (including the
original).  What matters is that 'cand_value_at' won't overflow using
cand -- this means that max_niter should match 'niter''s semantics,
and the overflow check should match the computation in cand_value_at.
The latest revision attached cleans this up (hopefully). (BTW, it is
not easy to create off-by-one test cases.)


>
> Also, note that for loops with one exit, your change actually makes the test weaker.
> For instance, before your change, we could deduce that
>
> int a[100];
> for (i = 0; i < n; i++)
>  a[i] = i;
>
> iterates at most 100 times.

Fixed and added two test cases.

(Note -- one more bug in the original code was found and fixed -- the
period computation is wrong when the step is not a power of 2.)

Thanks,

David


>
> Zdenek
>

[-- Attachment #2: ivopts_latest_part1_r4.p --]
[-- Type: text/x-pascal, Size: 24482 bytes --]

Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+#define TYPE char*
+
+/* Testing that only one induction variable is selected after IVOPT on
+   the given target instead of 3.  */
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           dst[x] = ( src1[x] + src2[x] + 1 ) >> 1;
+       }
+}
+
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#define TYPE char*
+
+/* Testing on the given target, only one iv candidate instead of 3.  */
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#define TYPE char*
+
+/* Make sure only 1 iv candidate is selected after IVOPT.  */
+void foo (int i_width, char* dst, char* src1, char* src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           *((TYPE)dst) = ( *((TYPE)src1) + *((TYPE)src2) + 1 ) >> 1;
+	   dst+=sizeof(TYPE);
+	   src1+=sizeof(TYPE);
+	   src2+=sizeof(TYPE);
+       }
+} 
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_1.c	(revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+#ifndef TYPE
+#define TYPE char*
+#endif
+
+int a[400];
+
+/* Testing inferred loop iteration from array -- exit test can be replaced.  */
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      TYPE dstn= dst + i_width;
+      TYPE dst0 = dst;
+      unsigned long long i = 0;
+       for( ; dst <= dstn; )
+       {
+           dst0[i] = ( src1[i] + src2[i] + 1 +a[i]) >> 1;
+           dst++;
+	   i += 7;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#ifndef TYPE
+#define TYPE char*
+#endif
+
+/* Make sure only 1 iv candidate is selected.  */
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      TYPE dstn= dst + i_width;
+       for( ; dst < dstn; )
+       {
+           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+#ifndef TYPE
+#define TYPE char*
+#endif
+
+extern int a[];
+
+/* Can not infer loop iteration from array -- exit test can not be replaced.  */
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      TYPE dstn= dst + i_width;
+      TYPE dst0 = dst;
+      unsigned long long i = 0;
+       for( ; dst <= dstn; )
+       {
+           dst0[i] = ( src1[i] + src2[i] + 1 +a[i]) >> 1;
+           dst++;
+	   i += 7;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+/* The test 'if (p2 > p_limit2)' can be replaced, so iv p2 can be
+ * eliminated.  */
+long foo(long* p, long* p2, int N1, int N2)
+{
+  int i = 0;
+  long* p_limit = p + N1;
+  long* p_limit2 = p2 + N2;
+  long s = 0;
+  while (p  <= p_limit)
+    {
+      p++;
+      p2++;
+      if (p2 > p_limit2)
+        break;
+      s += (*p);
+    }
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c	(revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+/* Exit tests 'i < N1' and 'p2 > p_limit2' can be replaced, so
+ * the two ivs i and p2 can be eliminated.  */
+long foo(long* p, long* p2, int N1, int N2)
+{
+  int i = 0;
+  long* p_limit2 = p2 + N2;
+  long s = 0;
+  while (i < N1)
+    {
+       p++;
+       p2++;
+       i++;
+       if (p2 > p_limit2)
+         break;
+       s += (*p);
+    }
+
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+/* iv p2 can be eliminated.  */
+long foo(long* p, long* p2, int N1, int N2)
+{
+  unsigned long  i = 0;
+  long* p_limit2 = p2 + N2;
+  long s = 0;
+  while (i < N1)
+    {
+      p2++;
+      i++;
+      if (p2 > p_limit2)
+        break;
+      s += p[i];
+    }
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_4.c	(revision 0)
@@ -0,0 +1,25 @@
+
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+/* iv i's step is 16, so its period is smaller than the loop's maximum
+ * number of iterations, i.e. replacing if (p2 > p_limit2) with a test
+ * of i may result in overflow.  */
+long foo(long* p, long* p2, int N1, int N2)
+{
+  unsigned long  i = 0;
+  long* p_limit2 = p2 + N2;
+  long s = 0;
+  while (i < N1)
+    {
+      p2++;
+      i += 16;
+      if (p2 > p_limit2)
+        break;
+     s += p[i];
+  }
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 159362)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -91,14 +91,26 @@ along with GCC; see the file COPYING3.  
 #include "langhooks.h"
 #include "tree-affine.h"
 #include "target.h"
+#include "tree-inline.h"
 
 /* The infinite cost.  */
 #define INFTY 10000000
 
-/* The expected number of loop iterations.  TODO -- use profiling instead of
-   this.  */
 #define AVG_LOOP_NITER(LOOP) 5
 
+/* Returns the expected number of loop iterations for LOOP.
+   The average trip count is computed from profile data if it
+   exists. */
+
+static inline HOST_WIDE_INT
+avg_loop_niter (struct loop *loop)
+{
+  HOST_WIDE_INT niter = estimated_loop_iterations_int (loop, false);
+  if (niter == -1)
+    return AVG_LOOP_NITER (loop);
+
+  return niter;
+}
 
 /* Representation of the induction variable.  */
 struct iv
@@ -513,6 +525,19 @@ dump_cand (FILE *file, struct iv_cand *c
       return;
     }
 
+  if (cand->var_before)
+    {
+      fprintf (file, "  var_before ");
+      print_generic_expr (file, cand->var_before, TDF_SLIM);
+      fprintf (file, "\n");
+    }
+  if (cand->var_after)
+    {
+      fprintf (file, "  var_after ");
+      print_generic_expr (file, cand->var_after, TDF_SLIM);
+      fprintf (file, "\n");
+    }
+
   switch (cand->pos)
     {
     case IP_NORMAL:
@@ -706,9 +731,10 @@ contains_abnormal_ssa_name_p (tree expr)
     EXIT of DATA->current_loop, or NULL if something goes wrong.  */
 
 static tree
-niter_for_exit (struct ivopts_data *data, edge exit)
+niter_for_exit (struct ivopts_data *data, edge exit,
+                struct tree_niter_desc **desc_p)
 {
-  struct tree_niter_desc desc;
+  struct tree_niter_desc* desc = NULL;
   tree niter;
   void **slot;
 
@@ -727,19 +753,24 @@ niter_for_exit (struct ivopts_data *data
 	 being zero).  Also, we cannot safely work with ssa names that
 	 appear in phi nodes on abnormal edges, so that we do not create
 	 overlapping life ranges for them (PR 27283).  */
+      desc = XNEW (struct tree_niter_desc);
       if (number_of_iterations_exit (data->current_loop,
-				     exit, &desc, true)
-	  && integer_zerop (desc.may_be_zero)
-     	  && !contains_abnormal_ssa_name_p (desc.niter))
-	niter = desc.niter;
+				     exit, desc, true)
+	  && integer_zerop (desc->may_be_zero)
+     	  && !contains_abnormal_ssa_name_p (desc->niter))
+	niter = desc->niter;
       else
 	niter = NULL_TREE;
 
-      *pointer_map_insert (data->niters, exit) = niter;
+      desc->niter = niter;
+      slot = pointer_map_insert (data->niters, exit);
+      *slot = desc;
     }
   else
-    niter = (tree) *slot;
+    niter = ((struct tree_niter_desc *) *slot)->niter;
 
+  if (desc_p)
+    *desc_p = (struct tree_niter_desc *) *slot;
   return niter;
 }
 
@@ -755,7 +786,7 @@ niter_for_single_dom_exit (struct ivopts
   if (!exit)
     return NULL;
 
-  return niter_for_exit (data, exit);
+  return niter_for_exit (data, exit, NULL);
 }
 
 /* Initializes data structures used by the iv optimization pass, stored
@@ -1822,7 +1853,7 @@ find_interesting_uses_outside (struct iv
       phi = gsi_stmt (psi);
       def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
       if (is_gimple_reg (def))
-	find_interesting_uses_op (data, def);
+        find_interesting_uses_op (data, def);
     }
 }
 
@@ -2138,7 +2169,9 @@ add_candidate_1 (struct ivopts_data *dat
 	continue;
 
       if (operand_equal_p (base, cand->iv->base, 0)
-	  && operand_equal_p (step, cand->iv->step, 0))
+	  && operand_equal_p (step, cand->iv->step, 0)
+          && (TYPE_PRECISION (TREE_TYPE (base))
+              == TYPE_PRECISION (TREE_TYPE (cand->iv->base))))
 	break;
     }
 
@@ -3779,6 +3812,7 @@ get_computation_cost_at (struct ivopts_d
 			      ubase, build_int_cst (utype, 0),
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
+      cost.cost /= avg_loop_niter (data->current_loop);
     }
   else if (ratio == 1)
     {
@@ -3786,6 +3820,7 @@ get_computation_cost_at (struct ivopts_d
 			      ubase, cbase,
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
+      cost.cost /= avg_loop_niter (data->current_loop);
     }
   else if (address_p
 	   && !POINTER_TYPE_P (ctype)
@@ -3799,16 +3834,18 @@ get_computation_cost_at (struct ivopts_d
 			      ubase, cbase,
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
+      cost.cost /= avg_loop_niter (data->current_loop);
     }
   else
     {
       cost = force_var_cost (data, cbase, depends_on);
-      cost.cost += add_cost (TYPE_MODE (ctype), data->speed);
       cost = add_costs (cost,
 			difference_cost (data,
 					 ubase, build_int_cst (utype, 0),
 					 &symbol_present, &var_present,
 					 &offset, depends_on));
+      cost.cost /= avg_loop_niter (data->current_loop);
+      cost.cost += add_cost (TYPE_MODE (ctype), data->speed);
     }
 
   /* If we are after the increment, the value of the candidate is higher by
@@ -3841,7 +3878,7 @@ get_computation_cost_at (struct ivopts_d
       are added once to the variable, if present.  */
   if (var_present && (symbol_present || offset))
     cost.cost += add_cost (TYPE_MODE (ctype), speed)
-		 / AVG_LOOP_NITER (data->current_loop);
+		 / avg_loop_niter (data->current_loop);
 
   /* Having offset does not affect runtime cost in case it is added to
      symbol, but it increases complexity.  */
@@ -3911,6 +3948,7 @@ determine_use_iv_cost_generic (struct iv
     }
 
   cost = get_computation_cost (data, use, cand, false, &depends_on, NULL);
+
   set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE);
 
   return !infinite_cost_p (cost);
@@ -3977,15 +4015,27 @@ iv_period (struct iv *iv)
 
   gcc_assert (step && TREE_CODE (step) == INTEGER_CST);
 
+  type = unsigned_type_for (TREE_TYPE (step));
   /* Period of the iv is gcd (step, type range).  Since type range is power
      of two, it suffices to determine the maximum power of two that divides
      step.  */
-  pow2div = num_ending_zeros (step);
-  type = unsigned_type_for (TREE_TYPE (step));
+  if (integer_pow2p (step))
+    {
+      pow2div = num_ending_zeros (step);
 
-  period = build_low_bits_mask (type,
-				(TYPE_PRECISION (type)
-				 - tree_low_cst (pow2div, 1)));
+      period = build_low_bits_mask (type,
+                                    (TYPE_PRECISION (type)
+                                     - tree_low_cst (pow2div, 1)));
+    }
+  else
+    {
+      double_int type_val_range, step_val, period_val;
+
+      type_val_range = tree_to_double_int (TYPE_MAX_VALUE (type));
+      step_val = tree_to_double_int (step);
+      period_val = double_int_udiv (type_val_range, step_val, FLOOR_DIV_EXPR);
+      period = double_int_to_tree (type, period_val);
+    }
 
   return period;
 }
@@ -4019,6 +4069,7 @@ may_eliminate_iv (struct ivopts_data *da
   tree nit, period;
   struct loop *loop = data->current_loop;
   aff_tree bnd;
+  struct tree_niter_desc *desc = NULL;
 
   if (TREE_CODE (cand->iv->step) != INTEGER_CST)
     return false;
@@ -4037,7 +4088,7 @@ may_eliminate_iv (struct ivopts_data *da
   if (flow_bb_inside_loop_p (loop, exit->dest))
     return false;
 
-  nit = niter_for_exit (data, exit);
+  nit = niter_for_exit (data, exit, &desc);
   if (!nit)
     return false;
 
@@ -4049,27 +4100,46 @@ may_eliminate_iv (struct ivopts_data *da
   /* If the number of iterations is constant, compare against it directly.  */
   if (TREE_CODE (nit) == INTEGER_CST)
     {
-      if (!tree_int_cst_lt (nit, period))
-	return false;
+      /* See cand_value_at.  */
+      if (stmt_after_increment (loop, cand, use->stmt))
+        {
+          if (!tree_int_cst_lt (nit, period))
+            return false;
+        }
+      else
+        {
+          if (tree_int_cst_lt (period, nit))
+            return false;
+        }
     }
 
   /* If not, and if this is the only possible exit of the loop, see whether
      we can get a conservative estimate on the number of iterations of the
      entire loop and compare against that instead.  */
-  else if (loop_only_exit_p (loop, exit))
+  else
     {
       double_int period_value, max_niter;
-      if (!estimated_loop_iterations (loop, true, &max_niter))
-	return false;
+
+      max_niter = desc->max;
+      if (stmt_after_increment (loop, cand, use->stmt))
+        max_niter = double_int_add (max_niter, double_int_one);
       period_value = tree_to_double_int (period);
-      if (double_int_ucmp (max_niter, period_value) >= 0)
-	return false;
+      if (double_int_ucmp (max_niter, period_value) > 0)
+        {
+          /* See if we can take advantage of inferred loop bound information.  */
+          if (loop_only_exit_p (loop, exit))
+            {
+              if (!estimated_loop_iterations (loop, true, &max_niter))
+                return false;
+              /* The loop bound is already adjusted by adding 1.  */
+              if (double_int_ucmp (max_niter, period_value) > 0)
+                return false;
+            }
+          else
+            return false;
+        }
     }
 
-  /* Otherwise, punt.  */
-  else
-    return false;
-
   cand_value_at (loop, cand, use->stmt, nit, &bnd);
 
   *bound = aff_combination_to_tree (&bnd);
@@ -4106,7 +4176,7 @@ determine_use_iv_cost_condition (struct 
       elim_cost = force_var_cost (data, bound, &depends_on_elim);
       /* The bound is a loop invariant, so it will be only computed
 	 once.  */
-      elim_cost.cost /= AVG_LOOP_NITER (data->current_loop);
+      elim_cost.cost /= avg_loop_niter (data->current_loop);
     }
   else
     elim_cost = infinite_cost;
@@ -4353,7 +4423,7 @@ determine_iv_cost (struct ivopts_data *d
   cost_base = force_var_cost (data, base, NULL);
   cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
 
-  cost = cost_step + cost_base.cost / AVG_LOOP_NITER (current_loop);
+  cost = cost_step + cost_base.cost / avg_loop_niter (data->current_loop);
 
   /* Prefer the original ivs unless we may gain something by replacing it.
      The reason is to make debugging simpler; so this is not relevant for
@@ -4541,7 +4611,7 @@ iv_ca_set_remove_invariants (struct iv_c
     {
       ivs->n_invariant_uses[iid]--;
       if (ivs->n_invariant_uses[iid] == 0)
-	ivs->n_regs--;
+        ivs->n_regs--;
     }
 }
 
@@ -4596,7 +4666,7 @@ iv_ca_set_add_invariants (struct iv_ca *
     {
       ivs->n_invariant_uses[iid]++;
       if (ivs->n_invariant_uses[iid] == 1)
-	ivs->n_regs++;
+        ivs->n_regs++;
     }
 }
 
@@ -4871,8 +4941,21 @@ iv_ca_dump (struct ivopts_data *data, FI
   unsigned i;
   comp_cost cost = iv_ca_cost (ivs);
 
-  fprintf (file, "  cost %d (complexity %d)\n", cost.cost, cost.complexity);
-  bitmap_print (file, ivs->cands, "  candidates ","\n");
+  fprintf (file, "  cost: %d (complexity %d)\n", cost.cost, cost.complexity);
+  fprintf (file, "  cand_cost: %d\n  cand_use_cost: %d (complexity %d)\n",
+           ivs->cand_cost, ivs->cand_use_cost.cost, ivs->cand_use_cost.complexity);
+  bitmap_print (file, ivs->cands, "  candidates: ","\n");
+
+   for (i = 0; i < ivs->upto; i++)
+    {
+      struct iv_use *use = iv_use (data, i);
+      struct cost_pair *cp = iv_ca_cand_for_use (ivs, use);
+      if (cp)
+        fprintf (file, "   use:%d --> iv_cand:%d, cost=(%d,%d)\n",
+                 use->id, cp->cand->id, cp->cost.cost, cp->cost.complexity);
+      else
+        fprintf (file, "   use:%d --> ??\n", use->id);
+    }
 
   for (i = 1; i <= data->max_inv_id; i++)
     if (ivs->n_invariant_uses[i])
@@ -4880,17 +4963,18 @@ iv_ca_dump (struct ivopts_data *data, FI
 	fprintf (file, "%s%d", pref, i);
 	pref = ", ";
       }
-  fprintf (file, "\n");
+  fprintf (file, "\n\n");
 }
 
 /* Try changing candidate in IVS to CAND for each use.  Return cost of the
    new set, and store differences in DELTA.  Number of induction variables
-   in the new set is stored to N_IVS.  */
+   in the new set is stored to N_IVS.  MIN_NCAND is a flag; when it is true,
+   the function tries to find a solution with a minimal number of iv candidates.  */
 
 static comp_cost
 iv_ca_extend (struct ivopts_data *data, struct iv_ca *ivs,
 	      struct iv_cand *cand, struct iv_ca_delta **delta,
-	      unsigned *n_ivs)
+	      unsigned *n_ivs, bool min_ncand)
 {
   unsigned i;
   comp_cost cost;
@@ -4914,8 +4998,8 @@ iv_ca_extend (struct ivopts_data *data, 
       if (!iv_ca_has_deps (ivs, new_cp))
 	continue;
 
-      if (!cheaper_cost_pair (new_cp, old_cp))
-	continue;
+      if (!min_ncand && !cheaper_cost_pair (new_cp, old_cp))
+        continue;
 
       *delta = iv_ca_delta_add (use, old_cp, new_cp, *delta);
     }
@@ -5110,7 +5194,8 @@ try_add_cand_for (struct ivopts_data *da
 	continue;
 
       iv_ca_set_cp (data, ivs, use, cp);
-      act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL);
+      act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL,
+                               true);
       iv_ca_set_no_cp (data, ivs, use);
       act_delta = iv_ca_delta_add (use, NULL, cp, act_delta);
 
@@ -5143,7 +5228,7 @@ try_add_cand_for (struct ivopts_data *da
 
 	  act_delta = NULL;
 	  iv_ca_set_cp (data, ivs, use, cp);
-	  act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL);
+	  act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL, true);
 	  iv_ca_set_no_cp (data, ivs, use);
 	  act_delta = iv_ca_delta_add (use, iv_ca_cand_for_use (ivs, use),
 				       cp, act_delta);
@@ -5203,7 +5288,7 @@ try_improve_iv_set (struct ivopts_data *
       if (iv_ca_cand_used_p (ivs, cand))
 	continue;
 
-      acost = iv_ca_extend (data, ivs, cand, &act_delta, &n_ivs);
+      acost = iv_ca_extend (data, ivs, cand, &act_delta, &n_ivs, false);
       if (!act_delta)
 	continue;
 
@@ -5330,7 +5415,6 @@ create_new_iv (struct ivopts_data *data,
 
       /* Rewrite the increment so that it uses var_before directly.  */
       find_interesting_uses_op (data, cand->var_after)->selected = cand;
-
       return;
     }
 
@@ -5358,8 +5442,18 @@ create_new_ivs (struct ivopts_data *data
       cand = iv_cand (data, i);
       create_new_iv (data, cand);
     }
-}
 
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\nSelected IV set: \n");
+      EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
+        {
+          cand = iv_cand (data, i);
+          dump_cand (dump_file, cand);
+        }
+      fprintf (dump_file, "\n");
+    }
+}
 
 /* Rewrites USE (definition of iv used in a nonlinear expression)
    using candidate CAND.  */
@@ -5582,6 +5676,11 @@ rewrite_use_compare (struct ivopts_data 
       tree var_type = TREE_TYPE (var);
       gimple_seq stmts;
 
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        {
+          fprintf (dump_file, "Replacing exit test: ");
+          print_gimple_stmt (dump_file, use->stmt, 0, TDF_SLIM);
+        }
       compare = iv_elimination_compare (data, use);
       bound = unshare_expr (fold_convert (var_type, bound));
       op = force_gimple_operand (bound, &stmts, true, NULL_TREE);
@@ -5683,6 +5782,20 @@ remove_unused_ivs (struct ivopts_data *d
   BITMAP_FREE (toremove);
 }
 
+/* Frees memory occupied by struct tree_niter_desc in *VALUE. Callback
+   for pointer_map_traverse.  */
+
+static
+bool
+free_tree_niter_desc (const void *key ATTRIBUTE_UNUSED, void **value,
+                      void *data ATTRIBUTE_UNUSED)
+{
+  struct tree_niter_desc *const niter = (struct tree_niter_desc *) *value;
+
+  free (niter);
+  return true;
+}
+
 /* Frees data allocated by the optimization of a single loop.  */
 
 static void
@@ -5694,6 +5807,7 @@ free_loop_data (struct ivopts_data *data
 
   if (data->niters)
     {
+      pointer_map_traverse (data->niters, free_tree_niter_desc, NULL);
       pointer_map_destroy (data->niters);
       data->niters = NULL;
     }


* Re: IVOPT improvement patch
  2010-05-28 23:51                         ` Xinliang David Li
@ 2010-05-29 16:57                           ` Zdenek Dvorak
  2010-05-29 19:51                             ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: Zdenek Dvorak @ 2010-05-29 16:57 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

Hi,

> > Also, note that for loops with one exit, your change actually makes the test weaker.
> > For instance, before your change, we could deduce that
> >
> > int a[100];
> > for (i = 0; i < n; i++)
> >  a[i] = i;
> >
> > iterates at most 100 times.
> 
> Fixed and added two test cases.
> 
> (Note -- one more bug in the original code was found and fixed -- the
> period computation is wrong when step is not power of 2).

that is wrong, the original computation is correct.  If step is (e.g.) odd,
then it takes (range of type) iterations before the variable achieves the same
value (that it overflows in the meantime several times does not matter, since
we are careful to use the type in that overflow has defined semantics, and
we test for equality in the replacement condition),

Zdenek


* Re: IVOPT improvement patch
  2010-05-29 16:57                           ` Zdenek Dvorak
@ 2010-05-29 19:51                             ` Xinliang David Li
  2010-05-29 20:18                               ` Zdenek Dvorak
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-05-29 19:51 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: GCC Patches

On Sat, May 29, 2010 at 8:22 AM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> > Also, note that for loops with one exit, your change actually makes the test weaker.
>> > For instance, before your change, we could deduce that
>> >
>> > int a[100];
>> > for (i = 0; i < n; i++)
>> >  a[i] = i;
>> >
>> > iterates at most 100 times.
>>
>> Fixed and added two test cases.
>>
>> (Note -- one more bug in the original code was found and fixed -- the
>> period computation is wrong when step is not power of 2).
>
> that is wrong, the original computation is correct.  If step is (e.g.) odd,
> then it takes (range of type) iterations before the variable achieves the same
> value (that it overflows in the meantime several times does not matter, since
> we are careful to use the type in that overflow has defined semantics, and
> we test for equality in the replacement condition),

The overflow semantics are indeed different -- this is also true for any
iv cand with a non-zero base.  The period is really LCM (type_range,
step) / step - 1 -- the computation in the original code matches this --
but the comment seems wrong.

Thanks,

David

>
> Zdenek
>


* Re: IVOPT improvement patch
  2010-05-29 19:51                             ` Xinliang David Li
@ 2010-05-29 20:18                               ` Zdenek Dvorak
  2010-05-30  0:22                                 ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: Zdenek Dvorak @ 2010-05-29 20:18 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

Hi,

> >> > Also, note that for loops with one exit, your change actually makes the test weaker.
> >> > For instance, before your change, we could deduce that
> >> >
> >> > int a[100];
> >> > for (i = 0; i < n; i++)
> >> >  a[i] = i;
> >> >
> >> > iterates at most 100 times.
> >>
> >> Fixed and added two test cases.
> >>
> >> (Note -- one more bug in the original code was found and fixed -- the
> >> period computation is wrong when step is not power of 2).
> >
> > that is wrong, the original computation is correct.  If step is (e.g.) odd,
> > then it takes (range of type) iterations before the variable achieves the same
> > value (that it overflows in the meantime several times does not matter, since
> > we are careful to use the type in that overflow has defined semantics, and
> > we test for equality in the replacement condition),
> 
> The overflow semantics is indeed different -- it is also true for any
> iv cand with non zero base. The period is really LCM (type_range,
> step)/step - 1 --- the computation in original code matches this --
> but the comment seems wrong.

yes, the comment needs to be fixed,

Zdenek


* Re: IVOPT improvement patch
  2010-05-29 20:18                               ` Zdenek Dvorak
@ 2010-05-30  0:22                                 ` Xinliang David Li
       [not found]                                   ` <20100604105451.GB5105@kam.mff.cuni.cz>
  2010-12-30 17:23                                   ` H.J. Lu
  0 siblings, 2 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-05-30  0:22 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1323 bytes --]

patch-1 ok for this revision?

David

On Sat, May 29, 2010 at 12:14 PM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> >> > Also, note that for loops with one exit, your change actually makes the test weaker.
>> >> > For instance, before your change, we could deduce that
>> >> >
>> >> > int a[100];
>> >> > for (i = 0; i < n; i++)
>> >> >  a[i] = i;
>> >> >
>> >> > iterates at most 100 times.
>> >>
>> >> Fixed and added two test cases.
>> >>
>> >> (Note -- one more bug in the original code was found and fixed -- the
>> >> period computation is wrong when step is not power of 2).
>> >
>> > that is wrong, the original computation is correct.  If step is (e.g.) odd,
>> > then it takes (range of type) iterations before the variable achieves the same
>> > value (that it overflows in the meantime several times does not matter, since
>> > we are careful to use the type in that overflow has defined semantics, and
>> > we test for equality in the replacement condition),
>>
>> The overflow semantics is indeed different -- it is also true for any
>> iv cand with non zero base. The period is really LCM (type_range,
>> step)/step - 1 --- the computation in original code matches this --
>> but the comment seems wrong.
>
> yes, the comment needs to be fixed,
>
> Zdenek
>

[-- Attachment #2: ivopts_latest_part1_r5.p --]
[-- Type: application/octet-stream, Size: 24161 bytes --]

Index: testsuite/gcc.dg/tree-ssa/ivopt_1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ivopt_1.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+#define TYPE char*
+
+/* Test that only one induction variable (instead of 3) is selected
+   after IVOPT on the given target.  */
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           dst[x] = ( src1[x] + src2[x] + 1 ) >> 1;
+       }
+}
+
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: testsuite/gcc.dg/tree-ssa/ivopt_2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ivopt_2.c	(revision 0)
@@ -0,0 +1,17 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#define TYPE char*
+
+/* Test that on the given target only one iv candidate is selected instead of 3.  */
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: testsuite/gcc.dg/tree-ssa/ivopt_3.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ivopt_3.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#define TYPE char*
+
+/* Make sure only 1 iv candidate is selected after IVOPT.  */
+void foo (int i_width, char* dst, char* src1, char* src2)
+{
+      int x;
+       for( x = 0; x < i_width; x++ )
+       {
+           *((TYPE)dst) = ( *((TYPE)src1) + *((TYPE)src2) + 1 ) >> 1;
+	   dst+=sizeof(TYPE);
+	   src1+=sizeof(TYPE);
+	   src2+=sizeof(TYPE);
+       }
+} 
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: testsuite/gcc.dg/tree-ssa/ivopt_infer_1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ivopt_infer_1.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ivopt_infer_1.c	(revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+#ifndef TYPE
+#define TYPE char*
+#endif
+
+int a[400];
+
+/* Testing inferred loop iteration from array -- exit test can be replaced.  */
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      TYPE dstn= dst + i_width;
+      TYPE dst0 = dst;
+      unsigned long long i = 0;
+       for( ; dst <= dstn; )
+       {
+           dst0[i] = ( src1[i] + src2[i] + 1 +a[i]) >> 1;
+           dst++;
+	   i += 16;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: testsuite/gcc.dg/tree-ssa/ivopt_4.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ivopt_4.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts" } */
+
+#ifndef TYPE
+#define TYPE char*
+#endif
+
+/* Make sure only 1 iv candidate is selected.  */
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      TYPE dstn= dst + i_width;
+       for( ; dst < dstn; )
+       {
+           *dst++ = ( *src1++ + *src2++ + 1 ) >> 1;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <ivtmp" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+#ifndef TYPE
+#define TYPE char*
+#endif
+
+extern int a[];
+
+/* Cannot infer the loop iteration count from the array -- the exit test cannot be replaced.  */
+void foo (int i_width, TYPE dst, TYPE src1, TYPE src2)
+{
+      TYPE dstn= dst + i_width;
+      TYPE dst0 = dst;
+      unsigned long long i = 0;
+       for( ; dst <= dstn; )
+       {
+           dst0[i] = ( src1[i] + src2[i] + 1 +a[i]) >> 1;
+           dst++;
+	   i += 16;
+       }
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+/* The test 'if (p2 > p_limit2)' can be replaced, so iv p2 can be
+ * eliminated.  */
+long foo(long* p, long* p2, int N1, int N2)
+{
+  int i = 0;
+  long* p_limit = p + N1;
+  long* p_limit2 = p2 + N2;
+  long s = 0;
+  while (p  <= p_limit)
+    {
+      p++;
+      p2++;
+      if (p2 > p_limit2)
+        break;
+      s += (*p);
+    }
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c	(revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+/* Exit tests 'i < N1' and 'p2 > p_limit2' can be replaced, so the
+ * two ivs i and p2 can be eliminated.  */
+long foo(long* p, long* p2, int N1, int N2)
+{
+  int i = 0;
+  long* p_limit2 = p2 + N2;
+  long s = 0;
+  while (i < N1)
+    {
+       p++;
+       p2++;
+       i++;
+       if (p2 > p_limit2)
+         break;
+       s += (*p);
+    }
+
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+/* iv p2 can be eliminated.  */
+long foo(long* p, long* p2, int N1, int N2)
+{
+  unsigned long  i = 0;
+  long* p_limit2 = p2 + N2;
+  long s = 0;
+  while (i < N1)
+    {
+      p2++;
+      i++;
+      if (p2 > p_limit2)
+        break;
+      s += p[i];
+    }
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: testsuite/gcc.dg/tree-ssa/ivopt_mult_4.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ivopt_mult_4.c	(revision 0)
+++ testsuite/gcc.dg/tree-ssa/ivopt_mult_4.c	(revision 0)
@@ -0,0 +1,25 @@
+
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */
+
+/* iv i's step is 16, so its period is smaller than the maximum
+ * iteration count, i.e., replacing if (p2 > p_limit2) with a test
+ * on i may result in overflow.  */
+long foo(long* p, long* p2, int N1, int N2)
+{
+  unsigned long  i = 0;
+  long* p_limit2 = p2 + N2;
+  long s = 0;
+  while (i < N1)
+    {
+      p2++;
+      i += 16;
+      if (p2 > p_limit2)
+        break;
+     s += p[i];
+  }
+  return s;
+}
+
+/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts"} } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */
Index: tree-ssa-loop-ivopts.c
===================================================================
--- tree-ssa-loop-ivopts.c	(revision 159362)
+++ tree-ssa-loop-ivopts.c	(working copy)
@@ -91,14 +91,26 @@ along with GCC; see the file COPYING3.  
 #include "langhooks.h"
 #include "tree-affine.h"
 #include "target.h"
+#include "tree-inline.h"
 
 /* The infinite cost.  */
 #define INFTY 10000000
 
-/* The expected number of loop iterations.  TODO -- use profiling instead of
-   this.  */
 #define AVG_LOOP_NITER(LOOP) 5
 
+/* Returns the expected number of loop iterations for LOOP.
+   The average trip count is computed from profile data if it
+   exists. */
+
+static inline HOST_WIDE_INT
+avg_loop_niter (struct loop *loop)
+{
+  HOST_WIDE_INT niter = estimated_loop_iterations_int (loop, false);
+  if (niter == -1)
+    return AVG_LOOP_NITER (loop);
+
+  return niter;
+}
 
 /* Representation of the induction variable.  */
 struct iv
@@ -513,6 +525,19 @@ dump_cand (FILE *file, struct iv_cand *c
       return;
     }
 
+  if (cand->var_before)
+    {
+      fprintf (file, "  var_before ");
+      print_generic_expr (file, cand->var_before, TDF_SLIM);
+      fprintf (file, "\n");
+    }
+  if (cand->var_after)
+    {
+      fprintf (file, "  var_after ");
+      print_generic_expr (file, cand->var_after, TDF_SLIM);
+      fprintf (file, "\n");
+    }
+
   switch (cand->pos)
     {
     case IP_NORMAL:
@@ -706,9 +731,10 @@ contains_abnormal_ssa_name_p (tree expr)
     EXIT of DATA->current_loop, or NULL if something goes wrong.  */
 
 static tree
-niter_for_exit (struct ivopts_data *data, edge exit)
+niter_for_exit (struct ivopts_data *data, edge exit,
+                struct tree_niter_desc **desc_p)
 {
-  struct tree_niter_desc desc;
+  struct tree_niter_desc* desc = NULL;
   tree niter;
   void **slot;
 
@@ -727,19 +753,24 @@ niter_for_exit (struct ivopts_data *data
 	 being zero).  Also, we cannot safely work with ssa names that
 	 appear in phi nodes on abnormal edges, so that we do not create
 	 overlapping life ranges for them (PR 27283).  */
+      desc = XNEW (struct tree_niter_desc);
       if (number_of_iterations_exit (data->current_loop,
-				     exit, &desc, true)
-	  && integer_zerop (desc.may_be_zero)
-     	  && !contains_abnormal_ssa_name_p (desc.niter))
-	niter = desc.niter;
+				     exit, desc, true)
+	  && integer_zerop (desc->may_be_zero)
+     	  && !contains_abnormal_ssa_name_p (desc->niter))
+	niter = desc->niter;
       else
 	niter = NULL_TREE;
 
-      *pointer_map_insert (data->niters, exit) = niter;
+      desc->niter = niter;
+      slot = pointer_map_insert (data->niters, exit);
+      *slot = desc;
     }
   else
-    niter = (tree) *slot;
+    niter = ((struct tree_niter_desc *) *slot)->niter;
 
+  if (desc_p)
+    *desc_p = (struct tree_niter_desc *) *slot;
   return niter;
 }
 
@@ -755,7 +786,7 @@ niter_for_single_dom_exit (struct ivopts
   if (!exit)
     return NULL;
 
-  return niter_for_exit (data, exit);
+  return niter_for_exit (data, exit, NULL);
 }
 
 /* Initializes data structures used by the iv optimization pass, stored
@@ -1822,7 +1853,7 @@ find_interesting_uses_outside (struct iv
       phi = gsi_stmt (psi);
       def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
       if (is_gimple_reg (def))
-	find_interesting_uses_op (data, def);
+        find_interesting_uses_op (data, def);
     }
 }
 
@@ -2138,7 +2169,9 @@ add_candidate_1 (struct ivopts_data *dat
 	continue;
 
       if (operand_equal_p (base, cand->iv->base, 0)
-	  && operand_equal_p (step, cand->iv->step, 0))
+	  && operand_equal_p (step, cand->iv->step, 0)
+          && (TYPE_PRECISION (TREE_TYPE (base))
+              == TYPE_PRECISION (TREE_TYPE (cand->iv->base))))
 	break;
     }
 
@@ -3779,6 +3812,7 @@ get_computation_cost_at (struct ivopts_d
 			      ubase, build_int_cst (utype, 0),
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
+      cost.cost /= avg_loop_niter (data->current_loop);
     }
   else if (ratio == 1)
     {
@@ -3786,6 +3820,7 @@ get_computation_cost_at (struct ivopts_d
 			      ubase, cbase,
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
+      cost.cost /= avg_loop_niter (data->current_loop);
     }
   else if (address_p
 	   && !POINTER_TYPE_P (ctype)
@@ -3799,16 +3834,18 @@ get_computation_cost_at (struct ivopts_d
 			      ubase, cbase,
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
+      cost.cost /= avg_loop_niter (data->current_loop);
     }
   else
     {
       cost = force_var_cost (data, cbase, depends_on);
-      cost.cost += add_cost (TYPE_MODE (ctype), data->speed);
       cost = add_costs (cost,
 			difference_cost (data,
 					 ubase, build_int_cst (utype, 0),
 					 &symbol_present, &var_present,
 					 &offset, depends_on));
+      cost.cost /= avg_loop_niter (data->current_loop);
+      cost.cost += add_cost (TYPE_MODE (ctype), data->speed);
     }
 
   /* If we are after the increment, the value of the candidate is higher by
@@ -3841,7 +3878,7 @@ get_computation_cost_at (struct ivopts_d
       are added once to the variable, if present.  */
   if (var_present && (symbol_present || offset))
     cost.cost += add_cost (TYPE_MODE (ctype), speed)
-		 / AVG_LOOP_NITER (data->current_loop);
+		 / avg_loop_niter (data->current_loop);
 
   /* Having offset does not affect runtime cost in case it is added to
      symbol, but it increases complexity.  */
@@ -3911,6 +3948,7 @@ determine_use_iv_cost_generic (struct iv
     }
 
   cost = get_computation_cost (data, use, cand, false, &depends_on, NULL);
+
   set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE);
 
   return !infinite_cost_p (cost);
@@ -3977,15 +4015,20 @@ iv_period (struct iv *iv)
 
   gcc_assert (step && TREE_CODE (step) == INTEGER_CST);
 
-  /* Period of the iv is gcd (step, type range).  Since type range is power
-     of two, it suffices to determine the maximum power of two that divides
-     step.  */
-  pow2div = num_ending_zeros (step);
   type = unsigned_type_for (TREE_TYPE (step));
+  /* Period of the iv is lcm (step, type_range)/step - 1,
+     i.e., N*type_range/step - 1.  Since the type range is a power
+     of two, N == step >> num_of_ending_zeros_binary (step),
+     so the final result is
+
+       (type_range >> num_of_ending_zeros_binary (step)) - 1
+
+  */
+  pow2div = num_ending_zeros (step);
 
   period = build_low_bits_mask (type,
-				(TYPE_PRECISION (type)
-				 - tree_low_cst (pow2div, 1)));
+                                (TYPE_PRECISION (type)
+                                 - tree_low_cst (pow2div, 1)));
 
   return period;
 }
@@ -4019,6 +4062,7 @@ may_eliminate_iv (struct ivopts_data *da
   tree nit, period;
   struct loop *loop = data->current_loop;
   aff_tree bnd;
+  struct tree_niter_desc *desc = NULL;
 
   if (TREE_CODE (cand->iv->step) != INTEGER_CST)
     return false;
@@ -4037,7 +4081,7 @@ may_eliminate_iv (struct ivopts_data *da
   if (flow_bb_inside_loop_p (loop, exit->dest))
     return false;
 
-  nit = niter_for_exit (data, exit);
+  nit = niter_for_exit (data, exit, &desc);
   if (!nit)
     return false;
 
@@ -4049,27 +4093,46 @@ may_eliminate_iv (struct ivopts_data *da
   /* If the number of iterations is constant, compare against it directly.  */
   if (TREE_CODE (nit) == INTEGER_CST)
     {
-      if (!tree_int_cst_lt (nit, period))
-	return false;
+      /* See cand_value_at.  */
+      if (stmt_after_increment (loop, cand, use->stmt))
+        {
+          if (!tree_int_cst_lt (nit, period))
+            return false;
+        }
+      else
+        {
+          if (tree_int_cst_lt (period, nit))
+            return false;
+        }
     }
 
   /* If not, and if this is the only possible exit of the loop, see whether
      we can get a conservative estimate on the number of iterations of the
      entire loop and compare against that instead.  */
-  else if (loop_only_exit_p (loop, exit))
+  else
     {
       double_int period_value, max_niter;
-      if (!estimated_loop_iterations (loop, true, &max_niter))
-	return false;
+
+      max_niter = desc->max;
+      if (stmt_after_increment (loop, cand, use->stmt))
+        max_niter = double_int_add (max_niter, double_int_one);
       period_value = tree_to_double_int (period);
-      if (double_int_ucmp (max_niter, period_value) >= 0)
-	return false;
+      if (double_int_ucmp (max_niter, period_value) > 0)
+        {
+          /* See if we can take advantage of inferred loop bound information.  */
+          if (loop_only_exit_p (loop, exit))
+            {
+              if (!estimated_loop_iterations (loop, true, &max_niter))
+                return false;
+              /* The loop bound is already adjusted by adding 1.  */
+              if (double_int_ucmp (max_niter, period_value) > 0)
+                return false;
+            }
+          else
+            return false;
+        }
     }
 
-  /* Otherwise, punt.  */
-  else
-    return false;
-
   cand_value_at (loop, cand, use->stmt, nit, &bnd);
 
   *bound = aff_combination_to_tree (&bnd);
@@ -4106,7 +4169,7 @@ determine_use_iv_cost_condition (struct 
       elim_cost = force_var_cost (data, bound, &depends_on_elim);
       /* The bound is a loop invariant, so it will be only computed
 	 once.  */
-      elim_cost.cost /= AVG_LOOP_NITER (data->current_loop);
+      elim_cost.cost /= avg_loop_niter (data->current_loop);
     }
   else
     elim_cost = infinite_cost;
@@ -4353,7 +4416,7 @@ determine_iv_cost (struct ivopts_data *d
   cost_base = force_var_cost (data, base, NULL);
   cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
 
-  cost = cost_step + cost_base.cost / AVG_LOOP_NITER (current_loop);
+  cost = cost_step + cost_base.cost / avg_loop_niter (data->current_loop);
 
   /* Prefer the original ivs unless we may gain something by replacing it.
      The reason is to make debugging simpler; so this is not relevant for
@@ -4541,7 +4604,7 @@ iv_ca_set_remove_invariants (struct iv_c
     {
       ivs->n_invariant_uses[iid]--;
       if (ivs->n_invariant_uses[iid] == 0)
-	ivs->n_regs--;
+        ivs->n_regs--;
     }
 }
 
@@ -4596,7 +4659,7 @@ iv_ca_set_add_invariants (struct iv_ca *
     {
       ivs->n_invariant_uses[iid]++;
       if (ivs->n_invariant_uses[iid] == 1)
-	ivs->n_regs++;
+        ivs->n_regs++;
     }
 }
 
@@ -4871,8 +4934,21 @@ iv_ca_dump (struct ivopts_data *data, FI
   unsigned i;
   comp_cost cost = iv_ca_cost (ivs);
 
-  fprintf (file, "  cost %d (complexity %d)\n", cost.cost, cost.complexity);
-  bitmap_print (file, ivs->cands, "  candidates ","\n");
+  fprintf (file, "  cost: %d (complexity %d)\n", cost.cost, cost.complexity);
+  fprintf (file, "  cand_cost: %d\n  cand_use_cost: %d (complexity %d)\n",
+           ivs->cand_cost, ivs->cand_use_cost.cost, ivs->cand_use_cost.complexity);
+  bitmap_print (file, ivs->cands, "  candidates: ","\n");
+
+   for (i = 0; i < ivs->upto; i++)
+    {
+      struct iv_use *use = iv_use (data, i);
+      struct cost_pair *cp = iv_ca_cand_for_use (ivs, use);
+      if (cp)
+        fprintf (file, "   use:%d --> iv_cand:%d, cost=(%d,%d)\n",
+                 use->id, cp->cand->id, cp->cost.cost, cp->cost.complexity);
+      else
+        fprintf (file, "   use:%d --> ??\n", use->id);
+    }
 
   for (i = 1; i <= data->max_inv_id; i++)
     if (ivs->n_invariant_uses[i])
@@ -4880,17 +4956,18 @@ iv_ca_dump (struct ivopts_data *data, FI
 	fprintf (file, "%s%d", pref, i);
 	pref = ", ";
       }
-  fprintf (file, "\n");
+  fprintf (file, "\n\n");
 }
 
 /* Try changing candidate in IVS to CAND for each use.  Return cost of the
    new set, and store differences in DELTA.  Number of induction variables
-   in the new set is stored to N_IVS.  */
+   in the new set is stored to N_IVS.  MIN_NCAND is a flag; when it is true,
+   the function will try to find a solution with a minimal number of iv candidates.  */
 
 static comp_cost
 iv_ca_extend (struct ivopts_data *data, struct iv_ca *ivs,
 	      struct iv_cand *cand, struct iv_ca_delta **delta,
-	      unsigned *n_ivs)
+	      unsigned *n_ivs, bool min_ncand)
 {
   unsigned i;
   comp_cost cost;
@@ -4914,8 +4991,8 @@ iv_ca_extend (struct ivopts_data *data, 
       if (!iv_ca_has_deps (ivs, new_cp))
 	continue;
 
-      if (!cheaper_cost_pair (new_cp, old_cp))
-	continue;
+      if (!min_ncand && !cheaper_cost_pair (new_cp, old_cp))
+        continue;
 
       *delta = iv_ca_delta_add (use, old_cp, new_cp, *delta);
     }
@@ -5110,7 +5187,8 @@ try_add_cand_for (struct ivopts_data *da
 	continue;
 
       iv_ca_set_cp (data, ivs, use, cp);
-      act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL);
+      act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL,
+                               true);
       iv_ca_set_no_cp (data, ivs, use);
       act_delta = iv_ca_delta_add (use, NULL, cp, act_delta);
 
@@ -5143,7 +5221,7 @@ try_add_cand_for (struct ivopts_data *da
 
 	  act_delta = NULL;
 	  iv_ca_set_cp (data, ivs, use, cp);
-	  act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL);
+	  act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL, true);
 	  iv_ca_set_no_cp (data, ivs, use);
 	  act_delta = iv_ca_delta_add (use, iv_ca_cand_for_use (ivs, use),
 				       cp, act_delta);
@@ -5203,7 +5281,7 @@ try_improve_iv_set (struct ivopts_data *
       if (iv_ca_cand_used_p (ivs, cand))
 	continue;
 
-      acost = iv_ca_extend (data, ivs, cand, &act_delta, &n_ivs);
+      acost = iv_ca_extend (data, ivs, cand, &act_delta, &n_ivs, false);
       if (!act_delta)
 	continue;
 
@@ -5330,7 +5408,6 @@ create_new_iv (struct ivopts_data *data,
 
       /* Rewrite the increment so that it uses var_before directly.  */
       find_interesting_uses_op (data, cand->var_after)->selected = cand;
-
       return;
     }
 
@@ -5358,8 +5435,18 @@ create_new_ivs (struct ivopts_data *data
       cand = iv_cand (data, i);
       create_new_iv (data, cand);
     }
-}
 
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\nSelected IV set: \n");
+      EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
+        {
+          cand = iv_cand (data, i);
+          dump_cand (dump_file, cand);
+        }
+      fprintf (dump_file, "\n");
+    }
+}
 
 /* Rewrites USE (definition of iv used in a nonlinear expression)
    using candidate CAND.  */
@@ -5582,6 +5669,11 @@ rewrite_use_compare (struct ivopts_data 
       tree var_type = TREE_TYPE (var);
       gimple_seq stmts;
 
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        {
+          fprintf (dump_file, "Replacing exit test: ");
+          print_gimple_stmt (dump_file, use->stmt, 0, TDF_SLIM);
+        }
       compare = iv_elimination_compare (data, use);
       bound = unshare_expr (fold_convert (var_type, bound));
       op = force_gimple_operand (bound, &stmts, true, NULL_TREE);
@@ -5683,6 +5775,20 @@ remove_unused_ivs (struct ivopts_data *d
   BITMAP_FREE (toremove);
 }
 
+/* Frees memory occupied by struct tree_niter_desc in *VALUE. Callback
+   for pointer_map_traverse.  */
+
+static
+bool
+free_tree_niter_desc (const void *key ATTRIBUTE_UNUSED, void **value,
+                      void *data ATTRIBUTE_UNUSED)
+{
+  struct tree_niter_desc *const niter = (struct tree_niter_desc *) *value;
+
+  free (niter);
+  return true;
+}
+
 /* Frees data allocated by the optimization of a single loop.  */
 
 static void
@@ -5694,6 +5800,7 @@ free_loop_data (struct ivopts_data *data
 
   if (data->niters)
     {
+      pointer_map_traverse (data->niters, free_tree_niter_desc, NULL);
       pointer_map_destroy (data->niters);
       data->niters = NULL;
     }

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-28  9:57       ` Zdenek Dvorak
@ 2010-06-01 23:13         ` Xinliang David Li
  2010-06-02 20:57           ` Zdenek Dvorak
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-06-01 23:13 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: GCC Patches

On Fri, May 28, 2010 at 2:14 AM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> patch-4:
>>
>> A simple local optimization that reorders iv update statement with
>> preceding target_mem_ref so that instruction combining can happen in
>> later phases.
>
>> +/* Performs a peephole optimization to reorder the iv update statement with
>> +   a mem ref to enable instruction combining in later phases. The mem ref uses
>> +   the iv value before the update, so the reordering transformation requires
>> +   adjustment of the offset. CAND is the selected IV_CAND.
>> +
>> +   Example:
>> +
>> +   t = MEM_REF (base, iv1, 8, 16);  // base, index, stride, offset
>> +   iv2 = iv1 + 1;
>> +
>> +   if (t < val)      (1)
>> +     goto L;
>> +   goto Head;
>> +
>> +
>> +   Directly propagating t over to (1) will introduce an overlapping live
>> +   range, thus increasing register pressure. This peephole transforms it into:
>> +
>> +
>> +   iv2 = iv1 + 1;
>> +   t = MEM_REF (base, iv2, 8, 8);
>> +   if (t < val)
>> +     goto L;
>> +   goto Head;
>> +*/
>
> looks reasonable.  Just two notes:
> 1) you should check whether the new value of the offset is allowed for the
>   current architecture.

What query should be used? It is checked neither in the maybe_fold_tmr
call nor in tmr creation in ivopts.

> 2) rather than rewriting the resulting code, it might be easier to change the
>   position of the candidate (to IP_BEFORE_USE for the use in the memory
>   reference) before create_new_ivs is run.
>

There is an ordering issue -- the target tmr operation for this
optimization is created after the address use rewrite. It might be doable
as you suggested, but may be more intrusive.

Thanks,

David

> Zdenek
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-06-01 23:13         ` Xinliang David Li
@ 2010-06-02 20:57           ` Zdenek Dvorak
  2010-06-03  5:39             ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: Zdenek Dvorak @ 2010-06-02 20:57 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

Hi,

> > 1) you should check whether the new value of the offset is allowed for the
> >   current architecture.
> 
> What query should be used? It is not checked in maybe_fold_tmr call
> nor tmr creation in ivopts.

valid_mem_ref_p

Zdenek

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-06-02 20:57           ` Zdenek Dvorak
@ 2010-06-03  5:39             ` Xinliang David Li
  0 siblings, 0 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-06-03  5:39 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 966 bytes --]

See the new revision -- it moves the iv position adjustment to just
before the address use rewrite. This eliminates the need to 1) do an
extra valid mem_ref check and 2) do the mem_ref fix-up -- i.e., it
implements your second suggestion.

Retested (together with patch-1).

Regarding patch-3 (sinking) -- neither store sinking nor scev_cprop is a
good fit for it. The store sinking pass does not allow sinking out of a
loop (into a different loop nest), and any naive enhancement may result
in increased register pressure; scev_cprop does not handle multiple-exit
loops. I will hold this patch for now.

Any feedback on patch-4?

Thanks,

David

On Wed, Jun 2, 2010 at 1:56 PM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> > 1) you should check whether the new value of the offset is allowed for the
>> >   current architecture.
>>
>> What query should be used? It is not checked in maybe_fold_tmr call
>> nor tmr creation in ivopts.
>
> valid_mem_ref_p
>
> Zdenek
>

[-- Attachment #2: ivopts_latest_part4.p.r2 --]
[-- Type: application/octet-stream, Size: 2574 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 160058)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -5531,6 +5531,86 @@ copy_ref_info (tree new_ref, tree old_re
     }
 }
 
+/* Performs a peephole optimization to reorder the iv update statement with
+   a mem ref to enable instruction combining in later phases. The mem ref uses
+   the iv value before the update, so the reordering transformation requires
+   adjustment of the offset. CAND is the selected IV_CAND.
+
+   Example:
+
+   t = MEM_REF (base, iv1, 8, 16);  // base, index, stride, offset
+   iv2 = iv1 + 1;
+
+   if (t < val)      (1)
+     goto L;
+   goto Head;
+
+
+   Directly propagating t over to (1) will introduce an overlapping live
+   range, thus increasing register pressure. This peephole transforms it into:
+
+
+   iv2 = iv1 + 1;
+   t = MEM_REF (base, iv2, 8, 8);
+   if (t < val)
+     goto L;
+   goto Head;
+*/
+
+static void
+adjust_iv_update_pos (struct iv_cand *cand, struct iv_use *use)
+{
+  tree var_after;
+  gimple iv_update, stmt;
+  basic_block bb;
+  gimple_stmt_iterator gsi, gsi_iv;
+
+  if (cand->pos != IP_NORMAL)
+    return;
+
+  var_after = cand->var_after;
+  iv_update = SSA_NAME_DEF_STMT (var_after);
+
+  bb = gimple_bb (iv_update);
+  gsi = gsi_last_nondebug_bb (bb);
+  stmt = gsi_stmt (gsi);
+
+  /* Only handle conditional statements for now.  */
+  if (gimple_code (stmt) != GIMPLE_COND)
+    return;
+
+  gsi_prev_nondebug (&gsi);
+  stmt = gsi_stmt (gsi);
+  if (stmt != iv_update)
+    return;
+
+  gsi_prev_nondebug (&gsi);
+  if (gsi_end_p (gsi))
+    return;
+
+  stmt = gsi_stmt (gsi);
+  if (gimple_code (stmt) != GIMPLE_ASSIGN)
+    return;
+
+  if (stmt != use->stmt)
+    return;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Reordering \n");
+      print_gimple_stmt (dump_file, iv_update, 0, 0);
+      print_gimple_stmt (dump_file, use->stmt, 0, 0);
+      fprintf (dump_file, "\n");
+    }
+
+  gsi = gsi_for_stmt (use->stmt);
+  gsi_iv = gsi_for_stmt (iv_update);
+  gsi_move_before (&gsi_iv, &gsi);
+
+  cand->pos = IP_BEFORE_USE;
+  cand->incremented_at = use->stmt;
+}
+
 /* Rewrites USE (address that is an iv) using candidate CAND.  */
 
 static void
@@ -5543,6 +5623,7 @@ rewrite_use_address (struct ivopts_data 
   tree ref;
   bool ok;
 
+  adjust_iv_update_pos (cand, use);
   ok = get_computation_aff (data->current_loop, use, cand, use->stmt, &aff);
   gcc_assert (ok);
   unshare_aff_combination (&aff);

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-05-25  0:17     ` Xinliang David Li
                         ` (3 preceding siblings ...)
  2010-05-28  9:57       ` Zdenek Dvorak
@ 2010-06-05  9:01       ` Zdenek Dvorak
  2010-06-05 22:37         ` Xinliang David Li
  4 siblings, 1 reply; 100+ messages in thread
From: Zdenek Dvorak @ 2010-06-05  9:01 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches

Hi,

> patch-3:
> 
> The new expression for the use expressed in terms of ubase, cbase,
> ratio, and iv_cand may contain loop invariant sub-expressions that may
> be hoisted out of the loop later. The patch implements a mechanism to
> evaluate the additional register pressure caused by such expressions
> (that cannot be constant folded).

what are the benchmark results of this part of the patch, separately from the
rest of the changes?

> The patch also makes sure that the variant part of the use expression
> (a sum of products) gets assigned to the index part of the target_mem_ref
> first, to expose loop invariant code motion.

This looks ok to me.

> +/* Returns true if AFF1 and AFF2 are identical.  */
> +
> +static bool
> +compare_aff_trees (aff_tree *aff1, aff_tree *aff2)
> +{
> +  unsigned i;
> +
> +  if (aff1->n != aff2->n)
> +    return false;
> +
> +  for (i = 0; i < aff1->n; i++)
> +    {
> +      if (double_int_cmp (aff1->elts[i].coef, aff2->elts[i].coef, 0) != 0)
> +        return false;
> +
> +      if (!operand_equal_p (aff1->elts[i].val, aff2->elts[i].val, 0))
> +        return false;
> +    }
> +  return true;
> +}

No particular order is guaranteed for the elements of the affine combination.
So this function will only work if the order is the same in AFF1 and AFF2 by chance.

> +/* Returns true if expression UBASE - RATIO * CBASE requires a new compiler
> +   generated temporary.  */

The comment should explain in more detail what is tested here; e.g., it is not
clear from the current description why false is returned for SSA_NAME - INTEGER_CST.

> +static bool
> +create_loop_invariant_temp (tree ubase, tree cbase, HOST_WIDE_INT ratio)
> +{

...

> +      if (TREE_CODE (ubase) == ADDR_EXPR
> +        && TREE_CODE (cbase) == ADDR_EXPR)
> +        {
> +          tree usym, csym;
> +
> +          usym = TREE_OPERAND (ubase, 0);
> +          csym = TREE_OPERAND (cbase, 0);
> +          if (TREE_CODE (usym) == ARRAY_REF)
> +            {
> +              tree ind = TREE_OPERAND (usym, 1);
> +              if (TREE_CODE (ind) == INTEGER_CST
> +                  && host_integerp (ind, 0)
> +                  && TREE_INT_CST_LOW (ind) == 0)
> +                usym = TREE_OPERAND (usym, 0);
> +            }
> +          if (TREE_CODE (csym) == ARRAY_REF)
> +            {
> +              tree ind = TREE_OPERAND (csym, 1);
> +              if (TREE_CODE (ind) == INTEGER_CST
> +                  && host_integerp (ind, 0)
> +                  && TREE_INT_CST_LOW (ind) == 0)
> +                csym = TREE_OPERAND (csym, 0);
> +            }
> +          if (usym == csym)
> +            return false;

Trees should not be compared by ==
Anyway, you had some compile time problems here? Or what is the purpose of the above piece of code?

> +/* Moves the loop variant part V in linear address ADDR to be the index
> +   of PARTS.  */
> +
> +static void
> +move_variant_to_index (struct mem_address *parts, aff_tree *addr, tree v)
> +{
> +  unsigned i;
> +  tree val = NULL_TREE;
> +
> +  gcc_assert (!parts->index);
> +  for (i = 0; i < addr->n; i++)
> +    {
> +      val = addr->elts[i].val;
> +      if (val == v)

operand_equal_p

> +	break;
> +    }
> +
> +  if (i == addr->n)
> +    return;
> +
> +  parts->index = fold_convert (sizetype, val);
> +  parts->step = double_int_to_tree (sizetype, addr->elts[i].coef);
> +  aff_combination_remove_elt (addr, i);
> +}
> +
>  /* Adds ELT to PARTS.  */
>  
>  static void
> @@ -553,7 +578,8 @@ most_expensive_mult_to_index (tree type,
>  
>  /* Splits address ADDR for a memory access of type TYPE into PARTS.
>     If BASE_HINT is non-NULL, it specifies an SSA name to be used
> -   preferentially as base of the reference.
> +   preferentially as base of the reference, and IV_CAND is the selected
> +   iv candidate used in ADDR.

This comment should explain what is IV_CAND used for; the current comment is
meaningless if the function is considered separately.

>  /* Creates and returns a TARGET_MEM_REF for address ADDR.  If necessary
>     computations are emitted in front of GSI.  TYPE is the mode
> -   of created memory reference.  */
> +   of created memory reference. IV_CAND is the selected iv candidate in ADDR,
> +   and IS_CAND_BASE is a flag indidcats if IV_CAND comes from a base address
> +   object.  */
>  
>  tree
>  create_mem_ref (gimple_stmt_iterator *gsi, tree type, aff_tree *addr,
> -		tree base_hint, bool speed)
> +		tree iv_cand, tree base_hint, bool speed)

The mention of IS_CAND_BASE should be removed from the comment.

Zdenek


* Re: IVOPT improvement patch
  2010-06-05  9:01       ` Zdenek Dvorak
@ 2010-06-05 22:37         ` Xinliang David Li
  0 siblings, 0 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-06-05 22:37 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: GCC Patches

On Sat, Jun 5, 2010 at 2:01 AM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> patch-3:
>>
>> The new expression for the use expressed in terms of ubase, cbase,
>> ratio, and iv_cand may contain loop invariant sub-expressions that may
>> be hoisted out of  the loop later. The patch implements a mechanism to
>> evaluate the additional register pressure caused by such expressions
>> (that can not be constant folded).
>
> what are the benchmark results of this part of the patch, separately from the
> rest of the changes?

This is really important for programs with hot loop nests whose inner
loops are completely unrolled due to a constant trip count -- lots of
loop invariants are exposed. In one case (an image similarity
comparison), the patch yields around a 30% difference in performance.


David

>
>> The patch also makes sure that variant part of the use expression ( a
>> sum of product) gets assigned to the index part of  the target_mem_ref
>> first to expose loop invariant code motion.
>
> This looks ok to me.
>
>> +/* Returns true if AFF1 and AFF2 are identical.  */
>> +
>> +static bool
>> +compare_aff_trees (aff_tree *aff1, aff_tree *aff2)
>> +{
>> +  unsigned i;
>> +
>> +  if (aff1->n != aff2->n)
>> +    return false;
>> +
>> +  for (i = 0; i < aff1->n; i++)
>> +    {
>> +      if (double_int_cmp (aff1->elts[i].coef, aff2->elts[i].coef, 0) != 0)
>> +        return false;
>> +
>> +      if (!operand_equal_p (aff1->elts[i].val, aff2->elts[i].val, 0))
>> +        return false;
>> +    }
>> +  return true;
>> +}
>
> No particular order is guaranteed for the elements of the affine combination.
> So this function will only work if the order is the same in AFF1 and AFF2 by chance.
>
>> +/* Returns true if expression UBASE - RATIO * CBASE requires a new compiler
>> +   generated temporary.  */
>
> The comment should explain in more detail what is tested here; e.g., it is not
> clear from the current description why false is returned for SSA_NAME - INTEGER_CST.
>
>> +static bool
>> +create_loop_invariant_temp (tree ubase, tree cbase, HOST_WIDE_INT ratio)
>> +{
>
> ...
>
>> +      if (TREE_CODE (ubase) == ADDR_EXPR
>> +        && TREE_CODE (cbase) == ADDR_EXPR)
>> +        {
>> +          tree usym, csym;
>> +
>> +          usym = TREE_OPERAND (ubase, 0);
>> +          csym = TREE_OPERAND (cbase, 0);
>> +          if (TREE_CODE (usym) == ARRAY_REF)
>> +            {
>> +              tree ind = TREE_OPERAND (usym, 1);
>> +              if (TREE_CODE (ind) == INTEGER_CST
>> +                  && host_integerp (ind, 0)
>> +                  && TREE_INT_CST_LOW (ind) == 0)
>> +                usym = TREE_OPERAND (usym, 0);
>> +            }
>> +          if (TREE_CODE (csym) == ARRAY_REF)
>> +            {
>> +              tree ind = TREE_OPERAND (csym, 1);
>> +              if (TREE_CODE (ind) == INTEGER_CST
>> +                  && host_integerp (ind, 0)
>> +                  && TREE_INT_CST_LOW (ind) == 0)
>> +                csym = TREE_OPERAND (csym, 0);
>> +            }
>> +          if (usym == csym)
>> +            return false;
>
> Trees should not be compared by ==
> Anyway, you had some compile time problems here? Or what is the purpose of the above piece of code?
>
>> +/* Moves the loop variant part V in linear address ADDR to be the index
>> +   of PARTS.  */
>> +
>> +static void
>> +move_variant_to_index (struct mem_address *parts, aff_tree *addr, tree v)
>> +{
>> +  unsigned i;
>> +  tree val = NULL_TREE;
>> +
>> +  gcc_assert (!parts->index);
>> +  for (i = 0; i < addr->n; i++)
>> +    {
>> +      val = addr->elts[i].val;
>> +      if (val == v)
>
> operand_equal_p
>
>> +     break;
>> +    }
>> +
>> +  if (i == addr->n)
>> +    return;
>> +
>> +  parts->index = fold_convert (sizetype, val);
>> +  parts->step = double_int_to_tree (sizetype, addr->elts[i].coef);
>> +  aff_combination_remove_elt (addr, i);
>> +}
>> +
>>  /* Adds ELT to PARTS.  */
>>
>>  static void
>> @@ -553,7 +578,8 @@ most_expensive_mult_to_index (tree type,
>>
>>  /* Splits address ADDR for a memory access of type TYPE into PARTS.
>>     If BASE_HINT is non-NULL, it specifies an SSA name to be used
>> -   preferentially as base of the reference.
>> +   preferentially as base of the reference, and IV_CAND is the selected
>> +   iv candidate used in ADDR.
>
> This comment should explain what is IV_CAND used for; the current comment is
> meaningless if the function is considered separately.
>
>>  /* Creates and returns a TARGET_MEM_REF for address ADDR.  If necessary
>>     computations are emitted in front of GSI.  TYPE is the mode
>> -   of created memory reference.  */
>> +   of created memory reference. IV_CAND is the selected iv candidate in ADDR,
>> +   and IS_CAND_BASE is a flag indidcats if IV_CAND comes from a base address
>> +   object.  */
>>
>>  tree
>>  create_mem_ref (gimple_stmt_iterator *gsi, tree type, aff_tree *addr,
>> -             tree base_hint, bool speed)
>> +             tree iv_cand, tree base_hint, bool speed)
>
> The mention of IS_CAND_BASE should be removed from the comment.
>
> Zdenek
>


* Re: IVOPT improvement patch
       [not found]                                   ` <20100604105451.GB5105@kam.mff.cuni.cz>
@ 2010-07-21  7:27                                     ` Xinliang David Li
  2010-07-26 16:33                                       ` Sebastian Pop
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-07-21  7:27 UTC (permalink / raw)
  To: GCC Patches; +Cc: Zdenek Dvorak

[-- Attachment #1: Type: text/plain, Size: 3241 bytes --]

Sorry for the delay on this patch.

I rewrote patch-3 (handling of pseudo invariants). The new
implementation uses the cost_pair to store the invariant id, and it
also tracks common invariants (those that can be CSEed) so that the
register pressure increase is not overcounted. There is also further
tuning of the heuristics that decide whether an invariant expression
may be created (mainly driven by SPEC performance and bug fixes, such
as fixing regressions in hmmer, sixtrack and tonto).

The patch has gone through extensive performance testing with SPEC06
and SPEC2k. I have fixed many performance regressions, but some
remain, possibly due to microarchitecture-related issues (see below).

The perf measurement was done on my Intel core-2 box with option -O2
-ffast-math -mfpmath=sse

1. SPEC06

m32
---------

bwaves:  +14.7%
calculix: +12.8%
wrf        :  +5.7%
GemsFDTD: +3.8%
cactusADM:  +3.6%
leslie3d     :    +3.0%
povray      :    +1.2%
zeusmp:       +1.8%
xalancbmk:  +1%
mcf:            +5.3%

a) I also verified the large improvements in bwaves and calculix on an
Opteron box -- they are reproducible.
b) There is more headroom that I did not pursue further -- for instance,
while fixing performance regressions, I noticed the speedup of
cactusADM can reach +14%, wrf up to +9%, and dealII up to +8%.

m64
-------
calculix:    +8.1%
bwaves :   +2.1%
povray :     +1.1%
wrf      :      +1.4%
gromacs:    +1.0%
xalancbmk:   +1.2%
h264ref:      +1.4%

SPEC06 degradations:

gamess:   -6% (32bit and 64bit)
bzip2:    -3% (32bit only)

Investigation of the gamess degradation shows that the performance
difference comes from IVOPT's treatment of the innermost loop (in a
3-deep loop nest) in function twotff_.    With the IVOPT patch, the
inner loop has only 3 ivs and is tighter than the loop without the
patch, which generates 6 ivs.    Profile data shows that the number of
instructions retired is reduced substantially with the IVOPT patch
while the unhalted CPU cycles increased on core-2.  However, when
running the program on an Opteron box, the patched version is actually
~5% faster.


2. SPEC2k

m32
------

perlbmk:    +7.8%
bzip2:        +1.4%
mgrid:        +2.7%
mesa:        +2.2%
facerec:      +2.5%
apsi:           +2.7%
gap:            +2.0%

m64
------
gzip:       +2.0%
perlbmk:    +2.1%
wupwise:    +8.0%
mgrid:      +2.6%
applu:      +2.5%

Degradations:

applu: -2.5% (m32)
mesa: -2.3% (m64)

OK to check in the patch with the above performance impact? (I may
find time to look at the regressions after the check-in.)

Thanks,

David


On Fri, Jun 4, 2010 at 3:54 AM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> patch-1 ok for this revision?
>
> yes, modulo the standard formalities (missing changelog, information about
> testing).  Also, for the final submission, please split off the trivial
> changes (formatting, comments, new debug dumps, ...) to a separate patch.  Furthermore,
> the avg. # of iterations part and the iv. elimination changes should be
> separate patches (this will make it easier to find the source of the problems,
> should any arise later),
>
> Zdenek
>

[-- Attachment #2: ivopts_latest7.p --]
[-- Type: application/octet-stream, Size: 40695 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 162195)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -89,6 +89,7 @@ along with GCC; see the file COPYING3.  
 #include "langhooks.h"
 #include "tree-affine.h"
 #include "target.h"
+#include "tree-inline.h"
 #include "tree-ssa-propagate.h"
 
 /* FIXME: Expressions are expanded to RTL in this pass to determine the
@@ -99,10 +100,21 @@ along with GCC; see the file COPYING3.  
 /* The infinite cost.  */
 #define INFTY 10000000
 
-/* The expected number of loop iterations.  TODO -- use profiling instead of
-   this.  */
 #define AVG_LOOP_NITER(LOOP) 5
 
+/* Returns the expected number of loop iterations for LOOP.
+   The average trip count is computed from profile data if it
+   exists. */
+
+static inline HOST_WIDE_INT
+avg_loop_niter (struct loop *loop)
+{
+  HOST_WIDE_INT niter = estimated_loop_iterations_int (loop, false);
+  if (niter == -1)
+    return AVG_LOOP_NITER (loop);
+
+  return niter;
+}
 
 /* Representation of the induction variable.  */
 struct iv
@@ -158,6 +170,7 @@ struct cost_pair
   tree value;		/* For final value elimination, the expression for
 			   the final value of the iv.  For iv elimination,
 			   the new bound to compare with.  */
+  int inv_expr_id;      /* Loop invariant expression id.  */
 };
 
 /* Use.  */
@@ -212,6 +225,14 @@ struct iv_cand
 			   biv.  */
 };
 
+/* Loop invariant expression hashtable entry.  */
+struct iv_inv_expr_ent
+{
+  tree expr;
+  int id;
+  hashval_t hash;
+};
+
 /* The data used by the induction variable optimizations.  */
 
 typedef struct iv_use *iv_use_p;
@@ -222,6 +243,11 @@ typedef struct iv_cand *iv_cand_p;
 DEF_VEC_P(iv_cand_p);
 DEF_VEC_ALLOC_P(iv_cand_p,heap);
 
+typedef struct version_info *version_info_p;
+DEF_VEC_P(version_info_p);
+DEF_VEC_ALLOC_P(version_info_p,heap);
+
+
 struct ivopts_data
 {
   /* The currently optimized loop.  */
@@ -239,6 +265,13 @@ struct ivopts_data
   /* The array of information for the ssa names.  */
   struct version_info *version_info;
 
+  /* The hashtable of loop invariant expressions created
+     by ivopt.  */
+  htab_t inv_expr_tab;
+
+  /* Loop invariant expression id.  */
+  int inv_expr_id;
+
   /* The bitmap of indices in version_info whose value was changed.  */
   bitmap relevant;
 
@@ -520,6 +553,19 @@ dump_cand (FILE *file, struct iv_cand *c
       return;
     }
 
+  if (cand->var_before)
+    {
+      fprintf (file, "  var_before ");
+      print_generic_expr (file, cand->var_before, TDF_SLIM);
+      fprintf (file, "\n");
+    }
+  if (cand->var_after)
+    {
+      fprintf (file, "  var_after ");
+      print_generic_expr (file, cand->var_after, TDF_SLIM);
+      fprintf (file, "\n");
+    }
+
   switch (cand->pos)
     {
     case IP_NORMAL:
@@ -718,9 +764,10 @@ contains_abnormal_ssa_name_p (tree expr)
     EXIT of DATA->current_loop, or NULL if something goes wrong.  */
 
 static tree
-niter_for_exit (struct ivopts_data *data, edge exit)
+niter_for_exit (struct ivopts_data *data, edge exit,
+                struct tree_niter_desc **desc_p)
 {
-  struct tree_niter_desc desc;
+  struct tree_niter_desc* desc = NULL;
   tree niter;
   void **slot;
 
@@ -739,19 +786,24 @@ niter_for_exit (struct ivopts_data *data
 	 being zero).  Also, we cannot safely work with ssa names that
 	 appear in phi nodes on abnormal edges, so that we do not create
 	 overlapping life ranges for them (PR 27283).  */
+      desc = XNEW (struct tree_niter_desc);
       if (number_of_iterations_exit (data->current_loop,
-				     exit, &desc, true)
-	  && integer_zerop (desc.may_be_zero)
-     	  && !contains_abnormal_ssa_name_p (desc.niter))
-	niter = desc.niter;
+				     exit, desc, true)
+	  && integer_zerop (desc->may_be_zero)
+     	  && !contains_abnormal_ssa_name_p (desc->niter))
+	niter = desc->niter;
       else
 	niter = NULL_TREE;
 
-      *pointer_map_insert (data->niters, exit) = niter;
+      desc->niter = niter;
+      slot = pointer_map_insert (data->niters, exit);
+      *slot = desc;
     }
   else
-    niter = (tree) *slot;
+    niter = ((struct tree_niter_desc *) *slot)->niter;
 
+  if (desc_p)
+    *desc_p = (struct tree_niter_desc *) *slot;
   return niter;
 }
 
@@ -767,7 +819,30 @@ niter_for_single_dom_exit (struct ivopts
   if (!exit)
     return NULL;
 
-  return niter_for_exit (data, exit);
+  return niter_for_exit (data, exit, NULL);
+}
+
+/* Hash table equality function for expressions.  */
+
+static int
+htab_inv_expr_eq (const void *ent1, const void *ent2)
+{
+  const struct iv_inv_expr_ent *expr1 =
+      (const struct iv_inv_expr_ent *)ent1;
+  const struct iv_inv_expr_ent *expr2 =
+      (const struct iv_inv_expr_ent *)ent2;
+
+  return operand_equal_p (expr1->expr, expr2->expr, 0);
+}
+
+/* Hash function for loop invariant expressions.  */
+
+static hashval_t
+htab_inv_expr_hash (const void *ent)
+{
+  const struct iv_inv_expr_ent *expr =
+      (const struct iv_inv_expr_ent *)ent;
+  return expr->hash;
 }
 
 /* Initializes data structures used by the iv optimization pass, stored
@@ -784,6 +859,9 @@ tree_ssa_iv_optimize_init (struct ivopts
   data->niters = NULL;
   data->iv_uses = VEC_alloc (iv_use_p, heap, 20);
   data->iv_candidates = VEC_alloc (iv_cand_p, heap, 20);
+  data->inv_expr_tab = htab_create (10, htab_inv_expr_hash,
+                                    htab_inv_expr_eq, free);
+  data->inv_expr_id = 0;
   decl_rtl_to_reset = VEC_alloc (tree, heap, 20);
 }
 
@@ -1834,7 +1912,7 @@ find_interesting_uses_outside (struct iv
       phi = gsi_stmt (psi);
       def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
       if (is_gimple_reg (def))
-	find_interesting_uses_op (data, def);
+        find_interesting_uses_op (data, def);
     }
 }
 
@@ -2151,7 +2229,9 @@ add_candidate_1 (struct ivopts_data *dat
 	continue;
 
       if (operand_equal_p (base, cand->iv->base, 0)
-	  && operand_equal_p (step, cand->iv->step, 0))
+	  && operand_equal_p (step, cand->iv->step, 0)
+          && (TYPE_PRECISION (TREE_TYPE (base))
+              == TYPE_PRECISION (TREE_TYPE (cand->iv->base))))
 	break;
     }
 
@@ -2565,7 +2645,8 @@ infinite_cost_p (comp_cost cost)
 static void
 set_use_iv_cost (struct ivopts_data *data,
 		 struct iv_use *use, struct iv_cand *cand,
-		 comp_cost cost, bitmap depends_on, tree value)
+		 comp_cost cost, bitmap depends_on, tree value,
+                 int inv_expr_id)
 {
   unsigned i, s;
 
@@ -2581,6 +2662,7 @@ set_use_iv_cost (struct ivopts_data *dat
       use->cost_map[cand->id].cost = cost;
       use->cost_map[cand->id].depends_on = depends_on;
       use->cost_map[cand->id].value = value;
+      use->cost_map[cand->id].inv_expr_id = inv_expr_id;
       return;
     }
 
@@ -2600,6 +2682,7 @@ found:
   use->cost_map[i].cost = cost;
   use->cost_map[i].depends_on = depends_on;
   use->cost_map[i].value = value;
+  use->cost_map[i].inv_expr_id = inv_expr_id;
 }
 
 /* Gets cost of (USE, CANDIDATE) pair.  */
@@ -2950,7 +3033,7 @@ adjust_setup_cost (struct ivopts_data *d
   if (cost == INFTY)
     return cost;
   else if (optimize_loop_for_speed_p (data->current_loop))
-    return cost / AVG_LOOP_NITER (data->current_loop);
+    return cost / avg_loop_niter (data->current_loop);
   else
     return cost;
 }
@@ -3165,7 +3248,7 @@ get_address_cost (bool symbol_present, b
       HOST_WIDE_INT i;
       HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
       HOST_WIDE_INT rat, off;
-      int old_cse_not_expected;
+      int old_cse_not_expected, width;
       unsigned sym_p, var_p, off_p, rat_p, add_c;
       rtx seq, addr, base;
       rtx reg0, reg1;
@@ -3174,8 +3257,10 @@ get_address_cost (bool symbol_present, b
 
       reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
 
+      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 2)
+          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 2;
       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = start; i <= 1ll << width; i <<= 1)
 	{
 	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
 	  if (!memory_address_addr_space_p (mem_mode, addr, as))
@@ -3184,7 +3269,7 @@ get_address_cost (bool symbol_present, b
       data->max_offset = i == start ? 0 : i >> 1;
       off = data->max_offset;
 
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = start; i <= 1ll << width; i <<= 1)
 	{
 	  XEXP (addr, 1) = gen_int_mode (-i, address_mode);
 	  if (!memory_address_addr_space_p (mem_mode, addr, as))
@@ -3195,12 +3280,12 @@ get_address_cost (bool symbol_present, b
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "get_address_cost:\n");
-	  fprintf (dump_file, "  min offset %s %d\n",
+	  fprintf (dump_file, "  min offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->min_offset);
-	  fprintf (dump_file, "  max offset %s %d\n",
+		   data->min_offset);
+	  fprintf (dump_file, "  max offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->max_offset);
+		   data->max_offset);
 	}
 
       rat = 1;
@@ -3711,6 +3796,144 @@ difference_cost (struct ivopts_data *dat
   return force_var_cost (data, aff_combination_to_tree (&aff_e1), depends_on);
 }
 
+/* Returns true if AFF1 and AFF2 are identical.  */
+
+static bool
+compare_aff_trees (aff_tree *aff1, aff_tree *aff2)
+{
+  unsigned i;
+
+  if (aff1->n != aff2->n)
+    return false;
+
+  for (i = 0; i < aff1->n; i++)
+    {
+      if (double_int_cmp (aff1->elts[i].coef, aff2->elts[i].coef, 0) != 0)
+        return false;
+
+      if (!operand_equal_p (aff1->elts[i].val, aff2->elts[i].val, 0))
+        return false;
+    }
+  return true;
+}
+
+/* Returns the pseudo expr id if expression UBASE - RATIO * CBASE
+   requires a new compiler generated temporary.  Returns -1 otherwise.
+   ADDRESS_P is a flag indicating if the expression is for address
+   computation.  */
+
+static int
+get_loop_invariant_expr_id (struct ivopts_data *data, tree ubase,
+                            tree cbase, HOST_WIDE_INT ratio,
+                            bool address_p)
+{
+  aff_tree ubase_aff, cbase_aff;
+  tree expr, ub, cb;
+  struct iv_inv_expr_ent ent;
+  struct iv_inv_expr_ent **slot;
+
+  STRIP_NOPS (ubase);
+  STRIP_NOPS (cbase);
+  ub = ubase;
+  cb = cbase;
+
+  if ((TREE_CODE (ubase) == INTEGER_CST)
+      && (TREE_CODE (cbase) == INTEGER_CST))
+    return -1;
+
+  /* Strips the constant part. */
+  if (TREE_CODE (ubase) == PLUS_EXPR
+      || TREE_CODE (ubase) == MINUS_EXPR
+      || TREE_CODE (ubase) == POINTER_PLUS_EXPR)
+    {
+      if (TREE_CODE (TREE_OPERAND (ubase, 1)) == INTEGER_CST)
+        ubase = TREE_OPERAND (ubase, 0);
+    }
+
+  /* Strips the constant part. */
+  if (TREE_CODE (cbase) == PLUS_EXPR
+      || TREE_CODE (cbase) == MINUS_EXPR
+      || TREE_CODE (cbase) == POINTER_PLUS_EXPR)
+    {
+      if (TREE_CODE (TREE_OPERAND (cbase, 1)) == INTEGER_CST)
+        cbase = TREE_OPERAND (cbase, 0);
+    }
+
+  if (address_p)
+    {
+      if (((TREE_CODE (ubase) == SSA_NAME)
+           || (TREE_CODE (ubase) == ADDR_EXPR
+               && is_gimple_min_invariant (ubase)))
+          && (TREE_CODE (cbase) == INTEGER_CST))
+        return -1;
+
+      if (((TREE_CODE (cbase) == SSA_NAME)
+           || (TREE_CODE (cbase) == ADDR_EXPR
+               && is_gimple_min_invariant (cbase)))
+          && (TREE_CODE (ubase) == INTEGER_CST))
+        return -1;
+    }
+
+  if (ratio == 1)
+    {
+      if (operand_equal_p (ubase, cbase, 0))
+        return -1;
+
+      if (TREE_CODE (ubase) == ADDR_EXPR
+          && TREE_CODE (cbase) == ADDR_EXPR)
+        {
+          tree usym, csym;
+
+          usym = TREE_OPERAND (ubase, 0);
+          csym = TREE_OPERAND (cbase, 0);
+          if (TREE_CODE (usym) == ARRAY_REF)
+            {
+              tree ind = TREE_OPERAND (usym, 1);
+              if (TREE_CODE (ind) == INTEGER_CST
+                  && host_integerp (ind, 0)
+                  && TREE_INT_CST_LOW (ind) == 0)
+                usym = TREE_OPERAND (usym, 0);
+            }
+          if (TREE_CODE (csym) == ARRAY_REF)
+            {
+              tree ind = TREE_OPERAND (csym, 1);
+              if (TREE_CODE (ind) == INTEGER_CST
+                  && host_integerp (ind, 0)
+                  && TREE_INT_CST_LOW (ind) == 0)
+                csym = TREE_OPERAND (csym, 0);
+            }
+          if (operand_equal_p (usym, csym, 0))
+            return -1;
+        }
+      /* Now do a more complex comparison.  */
+      tree_to_aff_combination (ubase, TREE_TYPE (ubase), &ubase_aff);
+      tree_to_aff_combination (cbase, TREE_TYPE (cbase), &cbase_aff);
+      if (compare_aff_trees (&ubase_aff, &cbase_aff))
+        return -1;
+    }
+
+  tree_to_aff_combination (ub, TREE_TYPE (ub), &ubase_aff);
+  tree_to_aff_combination (cb, TREE_TYPE (cb), &cbase_aff);
+
+  aff_combination_scale (&cbase_aff, shwi_to_double_int (-1 * ratio));
+  aff_combination_add (&ubase_aff, &cbase_aff);
+  expr = aff_combination_to_tree (&ubase_aff);
+  ent.expr = expr;
+  ent.hash = iterative_hash_expr (expr, 0);
+  slot = (struct iv_inv_expr_ent **) htab_find_slot (data->inv_expr_tab,
+                                                     &ent, INSERT);
+  if (*slot)
+    return (*slot)->id;
+
+  *slot = XNEW (struct iv_inv_expr_ent);
+  (*slot)->expr = expr;
+  (*slot)->hash = ent.hash;
+  (*slot)->id = data->inv_expr_id++;
+  return (*slot)->id;
+}
+
+
+
 /* Determines the cost of the computation by that USE is expressed
    from induction variable CAND.  If ADDRESS_P is true, we just need
    to create an address from it, otherwise we want to get it into
@@ -3723,7 +3946,8 @@ static comp_cost
 get_computation_cost_at (struct ivopts_data *data,
 			 struct iv_use *use, struct iv_cand *cand,
 			 bool address_p, bitmap *depends_on, gimple at,
-			 bool *can_autoinc)
+			 bool *can_autoinc,
+                         int *inv_expr_id)
 {
   tree ubase = use->iv->base, ustep = use->iv->step;
   tree cbase, cstep;
@@ -3806,6 +4030,7 @@ get_computation_cost_at (struct ivopts_d
 			      ubase, build_int_cst (utype, 0),
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
+      cost.cost /= avg_loop_niter (data->current_loop);
     }
   else if (ratio == 1)
     {
@@ -3813,6 +4038,7 @@ get_computation_cost_at (struct ivopts_d
 			      ubase, cbase,
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
+      cost.cost /= avg_loop_niter (data->current_loop);
     }
   else if (address_p
 	   && !POINTER_TYPE_P (ctype)
@@ -3826,16 +4052,27 @@ get_computation_cost_at (struct ivopts_d
 			      ubase, cbase,
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
+      cost.cost /= avg_loop_niter (data->current_loop);
     }
   else
     {
       cost = force_var_cost (data, cbase, depends_on);
-      cost.cost += add_cost (TYPE_MODE (ctype), data->speed);
       cost = add_costs (cost,
 			difference_cost (data,
 					 ubase, build_int_cst (utype, 0),
 					 &symbol_present, &var_present,
 					 &offset, depends_on));
+      cost.cost /= avg_loop_niter (data->current_loop);
+      cost.cost += add_cost (TYPE_MODE (ctype), data->speed);
+    }
+
+  if (inv_expr_id)
+    {
+      *inv_expr_id =
+          get_loop_invariant_expr_id (data, ubase, cbase, ratio, address_p);
+      /* Clear depends on.  */
+      if (*inv_expr_id != -1 && depends_on && *depends_on)
+        bitmap_clear (*depends_on);
     }
 
   /* If we are after the increment, the value of the candidate is higher by
@@ -3910,11 +4147,12 @@ fallback:
 static comp_cost
 get_computation_cost (struct ivopts_data *data,
 		      struct iv_use *use, struct iv_cand *cand,
-		      bool address_p, bitmap *depends_on, bool *can_autoinc)
+		      bool address_p, bitmap *depends_on,
+                      bool *can_autoinc, int *inv_expr_id)
 {
   return get_computation_cost_at (data,
 				  use, cand, address_p, depends_on, use->stmt,
-				  can_autoinc);
+				  can_autoinc, inv_expr_id);
 }
 
 /* Determines cost of basing replacement of USE on CAND in a generic
@@ -3926,6 +4164,7 @@ determine_use_iv_cost_generic (struct iv
 {
   bitmap depends_on;
   comp_cost cost;
+  int inv_expr_id = -1;
 
   /* The simple case first -- if we need to express value of the preserved
      original biv, the cost is 0.  This also prevents us from counting the
@@ -3934,12 +4173,15 @@ determine_use_iv_cost_generic (struct iv
   if (cand->pos == IP_ORIGINAL
       && cand->incremented_at == use->stmt)
     {
-      set_use_iv_cost (data, use, cand, zero_cost, NULL, NULL_TREE);
+      set_use_iv_cost (data, use, cand, zero_cost, NULL, NULL_TREE, -1);
       return true;
     }
 
-  cost = get_computation_cost (data, use, cand, false, &depends_on, NULL);
-  set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE);
+  cost = get_computation_cost (data, use, cand, false, &depends_on,
+                               NULL, &inv_expr_id);
+
+  set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE,
+                   inv_expr_id);
 
   return !infinite_cost_p (cost);
 }
@@ -3952,8 +4194,9 @@ determine_use_iv_cost_address (struct iv
 {
   bitmap depends_on;
   bool can_autoinc;
+  int inv_expr_id = -1;
   comp_cost cost = get_computation_cost (data, use, cand, true, &depends_on,
-					 &can_autoinc);
+					 &can_autoinc, &inv_expr_id);
 
   if (cand->ainc_use == use)
     {
@@ -3965,7 +4208,8 @@ determine_use_iv_cost_address (struct iv
       else if (cand->pos == IP_AFTER_USE || cand->pos == IP_BEFORE_USE)
 	cost = infinite_cost;
     }
-  set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE);
+  set_use_iv_cost (data, use, cand, cost, depends_on, NULL_TREE,
+                   inv_expr_id);
 
   return !infinite_cost_p (cost);
 }
@@ -4005,15 +4249,20 @@ iv_period (struct iv *iv)
 
   gcc_assert (step && TREE_CODE (step) == INTEGER_CST);
 
-  /* Period of the iv is gcd (step, type range).  Since type range is power
-     of two, it suffices to determine the maximum power of two that divides
-     step.  */
-  pow2div = num_ending_zeros (step);
   type = unsigned_type_for (TREE_TYPE (step));
+  /* Period of the iv is lcm (step, type_range)/step - 1,
+     i.e., N*type_range/step - 1.  Since type_range is a power
+     of two, N == step >> num_of_ending_zeros_binary (step),
+     so the final result is
+
+       (type_range >> num_of_ending_zeros_binary (step)) - 1.  */
+  pow2div = num_ending_zeros (step);
 
   period = build_low_bits_mask (type,
-				(TYPE_PRECISION (type)
-				 - tree_low_cst (pow2div, 1)));
+                                (TYPE_PRECISION (type)
+                                 - tree_low_cst (pow2div, 1)));
 
   return period;
 }
@@ -4047,6 +4296,7 @@ may_eliminate_iv (struct ivopts_data *da
   tree nit, period;
   struct loop *loop = data->current_loop;
   aff_tree bnd;
+  struct tree_niter_desc *desc = NULL;
 
   if (TREE_CODE (cand->iv->step) != INTEGER_CST)
     return false;
@@ -4065,7 +4315,7 @@ may_eliminate_iv (struct ivopts_data *da
   if (flow_bb_inside_loop_p (loop, exit->dest))
     return false;
 
-  nit = niter_for_exit (data, exit);
+  nit = niter_for_exit (data, exit, &desc);
   if (!nit)
     return false;
 
@@ -4077,27 +4327,46 @@ may_eliminate_iv (struct ivopts_data *da
   /* If the number of iterations is constant, compare against it directly.  */
   if (TREE_CODE (nit) == INTEGER_CST)
     {
-      if (!tree_int_cst_lt (nit, period))
-	return false;
+      /* See cand_value_at.  */
+      if (stmt_after_increment (loop, cand, use->stmt))
+        {
+          if (!tree_int_cst_lt (nit, period))
+            return false;
+        }
+      else
+        {
+          if (tree_int_cst_lt (period, nit))
+            return false;
+        }
     }
 
   /* If not, and if this is the only possible exit of the loop, see whether
      we can get a conservative estimate on the number of iterations of the
      entire loop and compare against that instead.  */
-  else if (loop_only_exit_p (loop, exit))
+  else
     {
       double_int period_value, max_niter;
-      if (!estimated_loop_iterations (loop, true, &max_niter))
-	return false;
+
+      max_niter = desc->max;
+      if (stmt_after_increment (loop, cand, use->stmt))
+        max_niter = double_int_add (max_niter, double_int_one);
       period_value = tree_to_double_int (period);
-      if (double_int_ucmp (max_niter, period_value) >= 0)
-	return false;
+      if (double_int_ucmp (max_niter, period_value) > 0)
+        {
+          /* See if we can take advantage of inferred loop bound information.  */
+          if (loop_only_exit_p (loop, exit))
+            {
+              if (!estimated_loop_iterations (loop, true, &max_niter))
+                return false;
+              /* The loop bound is already adjusted by adding 1.  */
+              if (double_int_ucmp (max_niter, period_value) > 0)
+                return false;
+            }
+          else
+            return false;
+        }
     }
 
-  /* Otherwise, punt.  */
-  else
-    return false;
-
   cand_value_at (loop, cand, use->stmt, nit, &bnd);
 
   *bound = aff_combination_to_tree (&bnd);
@@ -4119,12 +4388,13 @@ determine_use_iv_cost_condition (struct 
   bitmap depends_on_elim = NULL, depends_on_express = NULL, depends_on;
   comp_cost elim_cost, express_cost, cost;
   bool ok;
+  int inv_expr_id = -1;
   tree *control_var, *bound_cst;
 
   /* Only consider real candidates.  */
   if (!cand->iv)
     {
-      set_use_iv_cost (data, use, cand, infinite_cost, NULL, NULL_TREE);
+      set_use_iv_cost (data, use, cand, infinite_cost, NULL, NULL_TREE, -1);
       return false;
     }
 
@@ -4158,7 +4428,8 @@ determine_use_iv_cost_condition (struct 
     elim_cost.cost -= 1;
 
   express_cost = get_computation_cost (data, use, cand, false,
-				       &depends_on_express, NULL);
+				       &depends_on_express, NULL,
+                                       &inv_expr_id);
   fd_ivopts_data = data;
   walk_tree (&cmp_iv->base, find_depends, &depends_on_express, NULL);
 
@@ -4177,7 +4448,7 @@ determine_use_iv_cost_condition (struct 
       bound = NULL_TREE;
     }
 
-  set_use_iv_cost (data, use, cand, cost, depends_on, bound);
+  set_use_iv_cost (data, use, cand, cost, depends_on, bound, inv_expr_id);
 
   if (depends_on_elim)
     BITMAP_FREE (depends_on_elim);
@@ -4225,7 +4496,7 @@ autoinc_possible_for_pair (struct ivopts
     return false;
 
   cost = get_computation_cost (data, use, cand, true, &depends_on,
-			       &can_autoinc);
+			       &can_autoinc, NULL);
 
   BITMAP_FREE (depends_on);
 
@@ -4349,6 +4620,8 @@ determine_use_iv_costs (struct ivopts_da
 	      if (use->cost_map[j].depends_on)
 		bitmap_print (dump_file,
 			      use->cost_map[j].depends_on, "","");
+              if (use->cost_map[j].inv_expr_id != -1)
+                fprintf (dump_file, " inv_expr:%d", use->cost_map[j].inv_expr_id);
 	      fprintf (dump_file, "\n");
 	    }
 
@@ -4524,14 +4797,54 @@ cheaper_cost_pair (struct cost_pair *a, 
   return false;
 }
 
+
+/* Returns the candidate by which USE is expressed in IVS.  */
+
+static struct cost_pair *
+iv_ca_cand_for_use (struct iv_ca *ivs, struct iv_use *use)
+{
+  return ivs->cand_for_use[use->id];
+}
+
+
+/* Returns the number of temps needed for new loop invariant
+   expressions.  */
+
+static int
+iv_ca_get_num_inv_exprs (struct ivopts_data *data, struct iv_ca *ivs)
+{
+  unsigned i, n = 0;
+  unsigned *used_inv_expr = XCNEWVEC (unsigned, data->inv_expr_id + 1);
+
+  for (i = 0; i < ivs->upto; i++)
+    {
+      struct iv_use *use = iv_use (data, i);
+      struct cost_pair *cp = iv_ca_cand_for_use (ivs, use);
+      if (cp && cp->inv_expr_id != -1)
+        {
+          used_inv_expr[cp->inv_expr_id]++;
+          if (used_inv_expr[cp->inv_expr_id] == 1)
+            n++;
+        }
+    }
+
+  free (used_inv_expr);
+  return n;
+}
+
 /* Computes the cost field of IVS structure.  */
 
 static void
 iv_ca_recount_cost (struct ivopts_data *data, struct iv_ca *ivs)
 {
+  unsigned n_inv_exprs = 0;
   comp_cost cost = ivs->cand_use_cost;
+
   cost.cost += ivs->cand_cost;
-  cost.cost += ivopts_global_cost_for_size (data, ivs->n_regs);
+
+  n_inv_exprs = iv_ca_get_num_inv_exprs (data, ivs);
+  cost.cost += ivopts_global_cost_for_size (data,
+                                            ivs->n_regs + n_inv_exprs);
 
   ivs->cost = cost;
 }
@@ -4551,7 +4864,7 @@ iv_ca_set_remove_invariants (struct iv_c
     {
       ivs->n_invariant_uses[iid]--;
       if (ivs->n_invariant_uses[iid] == 0)
-	ivs->n_regs--;
+        ivs->n_regs--;
     }
 }
 
@@ -4606,7 +4919,7 @@ iv_ca_set_add_invariants (struct iv_ca *
     {
       ivs->n_invariant_uses[iid]++;
       if (ivs->n_invariant_uses[iid] == 1)
-	ivs->n_regs++;
+        ivs->n_regs++;
     }
 }
 
@@ -4650,14 +4963,16 @@ iv_ca_set_cp (struct ivopts_data *data, 
 }
 
 /* Extend set IVS by expressing USE by some of the candidates in it
-   if possible.  */
+   if possible.  All important candidates will be considered
+   if IMPORTANT_CANDIDATES is true.  */
 
 static void
 iv_ca_add_use (struct ivopts_data *data, struct iv_ca *ivs,
-	       struct iv_use *use)
+	       struct iv_use *use, bool important_candidates)
 {
   struct cost_pair *best_cp = NULL, *cp;
   bitmap_iterator bi;
+  bitmap cands;
   unsigned i;
 
   gcc_assert (ivs->upto >= use->id);
@@ -4668,9 +4983,12 @@ iv_ca_add_use (struct ivopts_data *data,
       ivs->bad_uses++;
     }
 
-  EXECUTE_IF_SET_IN_BITMAP (ivs->cands, 0, i, bi)
+  cands = (important_candidates ? data->important_candidates : ivs->cands);
+  EXECUTE_IF_SET_IN_BITMAP (cands, 0, i, bi)
     {
-      cp = get_use_iv_cost (data, use, iv_cand (data, i));
+      struct iv_cand *cand = iv_cand (data, i);
+
+      cp = get_use_iv_cost (data, use, cand);
 
       if (cheaper_cost_pair (cp, best_cp))
 	best_cp = cp;
@@ -4750,14 +5068,6 @@ iv_ca_delta_join (struct iv_ca_delta *l1
   return l1;
 }
 
-/* Returns candidate by that USE is expressed in IVS.  */
-
-static struct cost_pair *
-iv_ca_cand_for_use (struct iv_ca *ivs, struct iv_use *use)
-{
-  return ivs->cand_for_use[use->id];
-}
-
 /* Reverse the list of changes DELTA, forming the inverse to it.  */
 
 static struct iv_ca_delta *
@@ -4881,8 +5191,21 @@ iv_ca_dump (struct ivopts_data *data, FI
   unsigned i;
   comp_cost cost = iv_ca_cost (ivs);
 
-  fprintf (file, "  cost %d (complexity %d)\n", cost.cost, cost.complexity);
-  bitmap_print (file, ivs->cands, "  candidates ","\n");
+  fprintf (file, "  cost: %d (complexity %d)\n", cost.cost, cost.complexity);
+  fprintf (file, "  cand_cost: %d\n  cand_use_cost: %d (complexity %d)\n",
+           ivs->cand_cost, ivs->cand_use_cost.cost, ivs->cand_use_cost.complexity);
+  bitmap_print (file, ivs->cands, "  candidates: ","\n");
+
+  for (i = 0; i < ivs->upto; i++)
+    {
+      struct iv_use *use = iv_use (data, i);
+      struct cost_pair *cp = iv_ca_cand_for_use (ivs, use);
+      if (cp)
+        fprintf (file, "   use:%d --> iv_cand:%d, cost=(%d,%d)\n",
+                 use->id, cp->cand->id, cp->cost.cost, cp->cost.complexity);
+      else
+        fprintf (file, "   use:%d --> ??\n", use->id);
+    }
 
   for (i = 1; i <= data->max_inv_id; i++)
     if (ivs->n_invariant_uses[i])
@@ -4890,17 +5213,18 @@ iv_ca_dump (struct ivopts_data *data, FI
 	fprintf (file, "%s%d", pref, i);
 	pref = ", ";
       }
-  fprintf (file, "\n");
+  fprintf (file, "\n\n");
 }
 
 /* Try changing candidate in IVS to CAND for each use.  Return cost of the
    new set, and store differences in DELTA.  Number of induction variables
-   in the new set is stored to N_IVS.  */
+   in the new set is stored to N_IVS.  When MIN_NCAND is true, the
+   function tries to find a solution with minimal iv candidates.  */
 
 static comp_cost
 iv_ca_extend (struct ivopts_data *data, struct iv_ca *ivs,
 	      struct iv_cand *cand, struct iv_ca_delta **delta,
-	      unsigned *n_ivs)
+	      unsigned *n_ivs, bool min_ncand)
 {
   unsigned i;
   comp_cost cost;
@@ -4921,11 +5245,11 @@ iv_ca_extend (struct ivopts_data *data, 
       if (!new_cp)
 	continue;
 
-      if (!iv_ca_has_deps (ivs, new_cp))
+      if (!min_ncand && !iv_ca_has_deps (ivs, new_cp))
 	continue;
 
-      if (!cheaper_cost_pair (new_cp, old_cp))
-	continue;
+      if (!min_ncand && !cheaper_cost_pair (new_cp, old_cp))
+        continue;
 
       *delta = iv_ca_delta_add (use, old_cp, new_cp, *delta);
     }
@@ -4976,8 +5300,9 @@ iv_ca_narrow (struct ivopts_data *data, 
 	      cp = get_use_iv_cost (data, use, cnd);
 	      if (!cp)
 		continue;
+
 	      if (!iv_ca_has_deps (ivs, cp))
-		continue;
+                continue;
 
 	      if (!cheaper_cost_pair (cp, new_cp))
 		continue;
@@ -5089,10 +5414,18 @@ try_add_cand_for (struct ivopts_data *da
   struct iv_ca_delta *best_delta = NULL, *act_delta;
   struct cost_pair *cp;
 
-  iv_ca_add_use (data, ivs, use);
+  iv_ca_add_use (data, ivs, use, false);
   best_cost = iv_ca_cost (ivs);
 
   cp = iv_ca_cand_for_use (ivs, use);
+  if (!cp)
+    {
+      ivs->upto--;
+      ivs->bad_uses--;
+      iv_ca_add_use (data, ivs, use, true);
+      best_cost = iv_ca_cost (ivs);
+      cp = iv_ca_cand_for_use (ivs, use);
+    }
   if (cp)
     {
       best_delta = iv_ca_delta_add (use, NULL, cp, NULL);
@@ -5119,14 +5452,15 @@ try_add_cand_for (struct ivopts_data *da
 	continue;
 
       if (iv_ca_cand_used_p (ivs, cand))
-	continue;
+        continue;
 
       cp = get_use_iv_cost (data, use, cand);
       if (!cp)
 	continue;
 
       iv_ca_set_cp (data, ivs, use, cp);
-      act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL);
+      act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL,
+                               true);
       iv_ca_set_no_cp (data, ivs, use);
       act_delta = iv_ca_delta_add (use, NULL, cp, act_delta);
 
@@ -5164,7 +5498,7 @@ try_add_cand_for (struct ivopts_data *da
 
 	  act_delta = NULL;
 	  iv_ca_set_cp (data, ivs, use, cp);
-	  act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL);
+	  act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL, true);
 	  iv_ca_set_no_cp (data, ivs, use);
 	  act_delta = iv_ca_delta_add (use, iv_ca_cand_for_use (ivs, use),
 				       cp, act_delta);
@@ -5224,7 +5558,7 @@ try_improve_iv_set (struct ivopts_data *
       if (iv_ca_cand_used_p (ivs, cand))
 	continue;
 
-      acost = iv_ca_extend (data, ivs, cand, &act_delta, &n_ivs);
+      acost = iv_ca_extend (data, ivs, cand, &act_delta, &n_ivs, false);
       if (!act_delta)
 	continue;
 
@@ -5384,7 +5718,6 @@ create_new_iv (struct ivopts_data *data,
 
       /* Rewrite the increment so that it uses var_before directly.  */
       find_interesting_uses_op (data, cand->var_after)->selected = cand;
-
       return;
     }
 
@@ -5412,8 +5745,18 @@ create_new_ivs (struct ivopts_data *data
       cand = iv_cand (data, i);
       create_new_iv (data, cand);
     }
-}
 
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\nSelected IV set: \n");
+      EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
+        {
+          cand = iv_cand (data, i);
+          dump_cand (dump_file, cand);
+        }
+      fprintf (dump_file, "\n");
+    }
+}
 
 /* Rewrites USE (definition of iv used in a nonlinear expression)
    using candidate CAND.  */
@@ -5619,6 +5962,89 @@ copy_ref_info (tree new_ref, tree old_re
     }
 }
 
+/* Performs a peephole optimization to reorder the iv update statement with
+   a mem ref to enable instruction combining in later phases. The mem ref uses
+   the iv value before the update, so the reordering transformation requires
+   adjustment of the offset. CAND is the selected IV_CAND.
+
+   Example:
+
+   t = MEM_REF (base, iv1, 8, 16);  // base, index, stride, offset
+   iv2 = iv1 + 1;
+
+   if (t < val)      (1)
+     goto L;
+   goto Head;
+
+
+   Directly propagating t over to (1) would introduce an overlapping live
+   range, increasing register pressure.  This peephole transforms it into:
+
+
+   iv2 = iv1 + 1;
+   t = MEM_REF (base, iv2, 8, 8);
+   if (t < val)
+     goto L;
+   goto Head;
+*/
+
+static void
+adjust_iv_update_pos (struct iv_cand *cand, struct iv_use *use)
+{
+  tree var_after;
+  gimple iv_update, stmt;
+  basic_block bb;
+  gimple_stmt_iterator gsi, gsi_iv;
+
+  if (cand->pos != IP_NORMAL)
+    return;
+
+  var_after = cand->var_after;
+  iv_update = SSA_NAME_DEF_STMT (var_after);
+
+  bb = gimple_bb (iv_update);
+  gsi = gsi_last_nondebug_bb (bb);
+  stmt = gsi_stmt (gsi);
+
+  /* Only handle conditional statement for now.  */
+  if (gimple_code (stmt) != GIMPLE_COND)
+    return;
+
+  gsi_prev_nondebug (&gsi);
+  stmt = gsi_stmt (gsi);
+  if (stmt != iv_update)
+    return;
+
+  gsi_prev_nondebug (&gsi);
+  if (gsi_end_p (gsi))
+    return;
+
+  stmt = gsi_stmt (gsi);
+  if (gimple_code (stmt) != GIMPLE_ASSIGN)
+    return;
+
+  if (stmt != use->stmt)
+    return;
+
+  if (TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
+    return;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Reordering \n");
+      print_gimple_stmt (dump_file, iv_update, 0, 0);
+      print_gimple_stmt (dump_file, use->stmt, 0, 0);
+      fprintf (dump_file, "\n");
+    }
+
+  gsi = gsi_for_stmt (use->stmt);
+  gsi_iv = gsi_for_stmt (iv_update);
+  gsi_move_before (&gsi_iv, &gsi);
+
+  cand->pos = IP_BEFORE_USE;
+  cand->incremented_at = use->stmt;
+}
+
 /* Rewrites USE (address that is an iv) using candidate CAND.  */
 
 static void
@@ -5628,9 +6054,10 @@ rewrite_use_address (struct ivopts_data 
   aff_tree aff;
   gimple_stmt_iterator bsi = gsi_for_stmt (use->stmt);
   tree base_hint = NULL_TREE;
-  tree ref;
+  tree ref, iv;
   bool ok;
 
+  adjust_iv_update_pos (cand, use);
   ok = get_computation_aff (data->current_loop, use, cand, use->stmt, &aff);
   gcc_assert (ok);
   unshare_aff_combination (&aff);
@@ -5649,9 +6076,10 @@ rewrite_use_address (struct ivopts_data 
   if (cand->iv->base_object)
     base_hint = var_at_stmt (data->current_loop, cand, use->stmt);
 
-  ref = create_mem_ref (&bsi, TREE_TYPE (*use->op_p),
+  iv = var_at_stmt (data->current_loop, cand, use->stmt);
+  ref = create_mem_ref (&bsi, TREE_TYPE (*use->op_p), &aff,
 			reference_alias_ptr_type (*use->op_p),
-			&aff, base_hint, data->speed);
+                        iv, base_hint, data->speed);
   copy_ref_info (ref, *use->op_p);
   *use->op_p = ref;
 }
@@ -5676,6 +6104,11 @@ rewrite_use_compare (struct ivopts_data 
       tree var_type = TREE_TYPE (var);
       gimple_seq stmts;
 
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        {
+          fprintf (dump_file, "Replacing exit test: ");
+          print_gimple_stmt (dump_file, use->stmt, 0, TDF_SLIM);
+        }
       compare = iv_elimination_compare (data, use);
       bound = unshare_expr (fold_convert (var_type, bound));
       op = force_gimple_operand (bound, &stmts, true, NULL_TREE);
@@ -5777,6 +6210,19 @@ remove_unused_ivs (struct ivopts_data *d
   BITMAP_FREE (toremove);
 }
 
+/* Frees memory occupied by struct tree_niter_desc in *VALUE. Callback
+   for pointer_map_traverse.  */
+
+static bool
+free_tree_niter_desc (const void *key ATTRIBUTE_UNUSED, void **value,
+                      void *data ATTRIBUTE_UNUSED)
+{
+  struct tree_niter_desc *const niter = (struct tree_niter_desc *) *value;
+
+  free (niter);
+  return true;
+}
+
 /* Frees data allocated by the optimization of a single loop.  */
 
 static void
@@ -5788,6 +6234,7 @@ free_loop_data (struct ivopts_data *data
 
   if (data->niters)
     {
+      pointer_map_traverse (data->niters, free_tree_niter_desc, NULL);
       pointer_map_destroy (data->niters);
       data->niters = NULL;
     }
@@ -5846,6 +6293,9 @@ free_loop_data (struct ivopts_data *data
     SET_DECL_RTL (obj, NULL_RTX);
 
   VEC_truncate (tree, decl_rtl_to_reset, 0);
+
+  htab_empty (data->inv_expr_tab);
+  data->inv_expr_id = 0;
 }
 
 /* Finalizes data structures used by the iv optimization pass.  LOOPS is the
@@ -5862,6 +6312,7 @@ tree_ssa_iv_optimize_finalize (struct iv
   VEC_free (tree, heap, decl_rtl_to_reset);
   VEC_free (iv_use_p, heap, data->iv_uses);
   VEC_free (iv_cand_p, heap, data->iv_candidates);
+  htab_delete (data->inv_expr_tab);
 }
 
 /* Returns true if the loop body BODY includes any function calls.  */
Index: gcc/tree-ssa-address.c
===================================================================
--- gcc/tree-ssa-address.c	(revision 162195)
+++ gcc/tree-ssa-address.c	(working copy)
@@ -470,6 +470,31 @@ move_pointer_to_base (struct mem_address
   aff_combination_remove_elt (addr, i);
 }
 
+/* Moves the loop variant part V in linear address ADDR to be the index
+   of PARTS.  */
+
+static void
+move_variant_to_index (struct mem_address *parts, aff_tree *addr, tree v)
+{
+  unsigned i;
+  tree val = NULL_TREE;
+
+  gcc_assert (!parts->index);
+  for (i = 0; i < addr->n; i++)
+    {
+      val = addr->elts[i].val;
+      if (operand_equal_p (val, v, 0))
+	break;
+    }
+
+  if (i == addr->n)
+    return;
+
+  parts->index = fold_convert (sizetype, val);
+  parts->step = double_int_to_tree (sizetype, addr->elts[i].coef);
+  aff_combination_remove_elt (addr, i);
+}
+
 /* Adds ELT to PARTS.  */
 
 static void
@@ -573,7 +598,8 @@ most_expensive_mult_to_index (tree type,
 
 /* Splits address ADDR for a memory access of type TYPE into PARTS.
    If BASE_HINT is non-NULL, it specifies an SSA name to be used
-   preferentially as base of the reference.
+   preferentially as base of the reference, and IV_CAND is the selected
+   iv candidate used in ADDR.
 
    TODO -- be more clever about the distribution of the elements of ADDR
    to PARTS.  Some architectures do not support anything but single
@@ -583,8 +609,9 @@ most_expensive_mult_to_index (tree type,
    addressing modes is useless.  */
 
 static void
-addr_to_parts (tree type, aff_tree *addr, tree base_hint,
-	       struct mem_address *parts, bool speed)
+addr_to_parts (tree type, aff_tree *addr, tree iv_cand,
+	       tree base_hint, struct mem_address *parts,
+               bool speed)
 {
   tree part;
   unsigned i;
@@ -602,9 +629,17 @@ addr_to_parts (tree type, aff_tree *addr
   /* Try to find a symbol.  */
   move_fixed_address_to_symbol (parts, addr);
 
+  /* No need to reassociate the address parts if their number is
+     <= 2 -- in that case, no loop invariant code motion can be
+     exposed.  */
+
+  if (!base_hint && (addr->n > 2))
+    move_variant_to_index (parts, addr, iv_cand);
+
   /* First move the most expensive feasible multiplication
      to index.  */
-  most_expensive_mult_to_index (type, parts, addr, speed);
+  if (!parts->index)
+    most_expensive_mult_to_index (type, parts, addr, speed);
 
   /* Try to find a base of the reference.  Since at the moment
      there is no reliable way how to distinguish between pointer and its
@@ -644,17 +679,19 @@ gimplify_mem_ref_parts (gimple_stmt_iter
 
 /* Creates and returns a TARGET_MEM_REF for address ADDR.  If necessary
    computations are emitted in front of GSI.  TYPE is the mode
-   of created memory reference.  */
+   of the created memory reference.  IV_CAND is the selected iv
+   candidate used in ADDR, and BASE_HINT is non-NULL if IV_CAND
+   comes from a base address object.  */
 
 tree
-create_mem_ref (gimple_stmt_iterator *gsi, tree type, tree alias_ptr_type,
-		aff_tree *addr, tree base_hint, bool speed)
+create_mem_ref (gimple_stmt_iterator *gsi, tree type, aff_tree *addr,
+		tree alias_ptr_type, tree iv_cand, tree base_hint, bool speed)
 {
   tree mem_ref, tmp;
   tree atype;
   struct mem_address parts;
 
-  addr_to_parts (type, addr, base_hint, &parts, speed);
+  addr_to_parts (type, addr, iv_cand, base_hint, &parts, speed);
   gimplify_mem_ref_parts (gsi, &parts);
   mem_ref = create_mem_ref_raw (type, alias_ptr_type, &parts);
   if (mem_ref)
Index: gcc/tree-flow.h
===================================================================
--- gcc/tree-flow.h	(revision 162195)
+++ gcc/tree-flow.h	(working copy)
@@ -833,8 +833,8 @@ struct mem_address
 };
 
 struct affine_tree_combination;
-tree create_mem_ref (gimple_stmt_iterator *, tree, tree,
-		     struct affine_tree_combination *, tree, bool);
+tree create_mem_ref (gimple_stmt_iterator *, tree,
+		     struct affine_tree_combination *, tree, tree, tree, bool);
 rtx addr_for_mem_ref (struct mem_address *, addr_space_t, bool);
 void get_address_description (tree, struct mem_address *);
 tree maybe_fold_tmr (tree);

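As a side note on the iv_period change above: the rewritten comment reduces the period computation to plain bit arithmetic -- for an unsigned type of PRECISION bits and integer step STEP, the period is a low-bits mask of PRECISION - num_ending_zeros (STEP) bits. A minimal standalone sketch of that arithmetic (the helper name and fixed-width types are illustrative, not GCC internals):

```c
#include <stdint.h>

/* Illustrative model of iv_period: the period of an iv with step STEP
   in an unsigned type of PRECISION bits is
   (type_range >> num_ending_zeros (step)) - 1, i.e. a mask of
   PRECISION - ctz (STEP) low bits.  */
static uint64_t
iv_period_value (unsigned precision, uint64_t step)
{
  unsigned pow2div = 0;		/* plays the role of num_ending_zeros (step) */

  while ((step & 1) == 0)
    {
      step >>= 1;
      pow2div++;
    }

  /* Corresponds to build_low_bits_mask (type, precision - pow2div).  */
  unsigned bits = precision - pow2div;
  return bits >= 64 ? UINT64_MAX : (((uint64_t) 1 << bits) - 1);
}
```

For example, with an 8-bit unsigned type and step 4 the iv takes 64 distinct values (0, 4, ..., 252) before wrapping, so the period is (256 >> 2) - 1 == 63.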
^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-21  7:27                                     ` Xinliang David Li
@ 2010-07-26 16:33                                       ` Sebastian Pop
  2010-07-26 16:43                                         ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: Sebastian Pop @ 2010-07-26 16:33 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches, Zdenek Dvorak

On Wed, Jul 21, 2010 at 02:27, Xinliang David Li <davidxl@google.com> wrote:
> The perf measurement was done on my Intel core-2 box with option -O2
> -ffast-math -mfpmath=sse
>
> 1. SPEC06
>
> m32
> ---------
>
> bwaves:  +14.7%
> calculix: +12.8%
> wrf        :  +5.7%
> GemsFDTD: +3.8%
> cactusADM:  +3.6%
> leslie3d     :    +3.0%
> povray      :    +1.2%
> zeusmp:       +1.8%
> xalancbmk:  +1%
> mcf:            +5.3%
>
> a) I also verified the large improvements from bwaves and calculix on
> an opteron box -- they are reproducible
> b) There is more room that I did not pursue further -- for instance,
> in the process of perf regression fixing, I noticed the speedup of
> cactusADM can be up to +14%, wrf up to 9%, and dealII up to +8%.
>
> m64
> -------
> calculix:    +8.1%
> bwaves :   +2.1%
> povray :     +1.1%
> wrf      :      +1.4%
> gromacs:    +1.0%
> xalanbmk:   +1.2%
> h264ref:      +1.4%
>
> SPEC06 degradations:
>
> gamess:   -6% (32bit and 64bit)
> bzip2:    -3% (32bit only)
>
> Investigation of the gamess degradation shows that the performance
> difference comes from the difference of IVOPT on the innermost loop
> (in a 3-deep loop nest) in function twotff_.  With the IVOPT patch,
> the inner loop has only 3 ivs and is tighter compared with the loop
> without the patch, in which 6 ivs are generated.  Profile data shows
> that the number of instructions retired got reduced a lot with the
> IVOPT patch while the unhalted CPU cycles increased on core-2.
> However, when running the program on an opteron box, the patched
> version is actually ~5% faster.
>

Here are the CPU2k6 results on AMD Phenom(tm) 9950 Quad-Core.

Old: Gcc 4.6.0 revision 162423
New: Gcc 4.6.0 revision 162423 + this patch.
Flags: -O3 -funroll-loops -fpeel-loops -ffast-math -march=native

The number is the run time percentage: (old - new) / old * 100
(positive is better)

400.perlbench	-2.14%
401.bzip2	0.33%
403.gcc	0.86%
429.mcf	6.06%
445.gobmk	3.20%
456.hmmer	1.14%
458.sjeng	0.70%
462.libquantum	-0.13%
464.h264ref	2.73%
471.omnetpp	0.69%
473.astar	0.12%
483.xalancbmk	-1.28%
410.bwaves	5.71%
416.gamess	3.10%
433.milc	0.29%
434.zeusmp	1.86%
435.gromacs	-0.18%
436.cactusADM	1.83%
437.leslie3d	-0.61%
444.namd	0.14%
447.dealII	0.81%
450.soplex	0.61%
453.povray	-3.37%
454.calculix	5.79%
459.GemsFDTD	0.74%
465.tonto	1.01%
470.lbm	0.35%
481.wrf	0.78%
482.sphinx3	0.00%

Overall it looks like a good improvement as well on AMD processors.

> OK to check in the patch with the above performance impact?  (I may
> find time to look at the regressions after the check-in.)
>

Note that your patch still contains formatting errors.  Please use this
script to check the patch and correct the warnings:
http://gcc.gnu.org/viewcvs/trunk/contrib/check_GNU_style.sh

Thanks,
Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-26 16:33                                       ` Sebastian Pop
@ 2010-07-26 16:43                                         ` Xinliang David Li
  2010-07-27 20:04                                           ` Pat Haugen
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-07-26 16:43 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: GCC Patches, Zdenek Dvorak

Thanks, Sebastian, for testing it out. I also asked Pat to help test
the patch again on powerpc. I will first split off the unrelated
patches and submit them (e.g., multiple exit loop handling, etc.).

David

On Mon, Jul 26, 2010 at 9:31 AM, Sebastian Pop <sebpop@gmail.com> wrote:
> On Wed, Jul 21, 2010 at 02:27, Xinliang David Li <davidxl@google.com> wrote:
>> The perf measurement was done on my Intel core-2 box with option -O2
>> -ffast-math -mfpmath=sse
>>
>> 1. SPEC06
>>
>> m32
>> ---------
>>
>> bwaves:  +14.7%
>> calculix: +12.8%
>> wrf        :  +5.7%
>> GemsFDTD: +3.8%
>> cactusADM:  +3.6%
>> leslie3d     :    +3.0%
>> povray      :    +1.2%
>> zeusmp:       +1.8%
>> xalancbmk:  +1%
>> mcf:            +5.3%
>>
>> a) I also verified the large improvements from bwaves and calculix on
>> an opteron box -- they are reproducible
>> b) There is more room that I did not pursue further -- for instance,
>> in the process of perf regression fixing, I noticed the speedup of
>> cactusADM can be up to +14%, wrf up to 9%, and dealII up to +8%.
>>
>> m64
>> -------
>> calculix:    +8.1%
>> bwaves :   +2.1%
>> povray :     +1.1%
>> wrf      :      +1.4%
>> gromacs:    +1.0%
>> xalanbmk:   +1.2%
>> h264ref:      +1.4%
>>
>> SPEC06 degradations:
>>
>> gamess:   -6% (32bit and 64bit)
>> bzip2:    -3% (32bit only)
>>
>> Investigation of the gamess degradation shows that the performance
>> difference comes from the difference of IVOPT on the innermost loop
>> (in a 3-deep loop nest) in function twotff_.  With the IVOPT patch,
>> the inner loop has only 3 ivs and is tighter compared with the loop
>> without the patch, in which 6 ivs are generated.  Profile data shows
>> that the number of instructions retired got reduced a lot with the
>> IVOPT patch while the unhalted CPU cycles increased on core-2.
>> However, when running the program on an opteron box, the patched
>> version is actually ~5% faster.
>>
>
> Here are the CPU2k6 results on AMD Phenom(tm) 9950 Quad-Core.
>
> Old: Gcc 4.6.0 revision 162423
> New: Gcc 4.6.0 revision 162423 + this patch.
> Flags: -O3 -funroll-loops -fpeel-loops -ffast-math -march=native
>
> The number is the run time percentage: (old - new) / old * 100
> (positive is better)
>
> 400.perlbench   -2.14%
> 401.bzip2       0.33%
> 403.gcc         0.86%
> 429.mcf         6.06%
> 445.gobmk       3.20%
> 456.hmmer       1.14%
> 458.sjeng       0.70%
> 462.libquantum  -0.13%
> 464.h264ref     2.73%
> 471.omnetpp     0.69%
> 473.astar       0.12%
> 483.xalancbmk   -1.28%
> 410.bwaves      5.71%
> 416.gamess      3.10%
> 433.milc        0.29%
> 434.zeusmp      1.86%
> 435.gromacs     -0.18%
> 436.cactusADM   1.83%
> 437.leslie3d    -0.61%
> 444.namd        0.14%
> 447.dealII      0.81%
> 450.soplex      0.61%
> 453.povray      -3.37%
> 454.calculix    5.79%
> 459.GemsFDTD    0.74%
> 465.tonto       1.01%
> 470.lbm         0.35%
> 481.wrf         0.78%
> 482.sphinx3     0.00%
>
> Overall it looks like a good improvement as well on AMD processors.
>
>> OK to check in the patch with the above performance impact?  (I may
>> find time to look at the regressions after the check-in.)
>>
>
> Note that your patch still contains formatting errors.  Please use this
> script to check the patch and correct the warnings:
> http://gcc.gnu.org/viewcvs/trunk/contrib/check_GNU_style.sh
>
> Thanks,
> Sebastian Pop
> --
> AMD / Open Source Compiler Engineering / GNU Tools
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-26 16:43                                         ` Xinliang David Li
@ 2010-07-27 20:04                                           ` Pat Haugen
  2010-07-27 20:25                                             ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: Pat Haugen @ 2010-07-27 20:04 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches, Zdenek Dvorak

>
> Thanks, Sebastian, for testing it out. I also asked Pat to help test
> the patch again on powerpc. I will first split off the unrelated
> patches and submit them (e.g., multiple exit loop handling, etc.).
>

There were 2 good improvements on PowerPC, the rest were pretty much a wash
(< +/-2%):

410.bwaves	10.0%
434.zeusmp	6.6%

One thing I did notice, however, when comparing these results to the
run I did back in May on an earlier version of the patch, is that both
improvements dropped: bwaves was 27% on that run and zeusmp was 8.4%. I
don't have the old builds around, but could recreate them if you're not
aware of anything to explain the drop.

-Pat

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-27 20:04                                           ` Pat Haugen
@ 2010-07-27 20:25                                             ` Xinliang David Li
  2010-07-29  3:50                                               ` H.J. Lu
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-07-27 20:25 UTC (permalink / raw)
  To: Pat Haugen; +Cc: GCC Patches, Zdenek Dvorak

On Tue, Jul 27, 2010 at 12:57 PM, Pat Haugen <pthaugen@us.ibm.com> wrote:
>>
>> Thanks, Sebastian, for testing it out. I also asked Pat to help test
>> the patch again on powerpc. I will first split off the unrelated
>> patches and submit them (e.g., multiple exit loop handling, etc.).
>>
>
> There were 2 good improvements on PowerPC, the rest were pretty much a wash
> (< +/-2%):
>
> 410.bwaves      10.0%
> 434.zeusmp      6.6%
>
> One thing I did notice, however, when comparing these results to the
> run I did back in May on an earlier version of the patch, is that both
> improvements dropped: bwaves was 27% on that run and zeusmp was 8.4%. I
> don't have the old builds around, but could recreate them if you're not
> aware of anything to explain the drop.
>

Thanks. I will check in this version first and do some triaging on the
performance drop (with your help).  One thing to be aware of is that
r161844 was checked in during this period of time, which might be
related, but I am not sure until further investigation -- the two-stage
initial iv set computation introduced by that patch may not be needed
(if this patch is in).

Thanks,

David

> -Pat
>
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-27 20:25                                             ` Xinliang David Li
@ 2010-07-29  3:50                                               ` H.J. Lu
  2010-07-29  5:57                                                 ` H.J. Lu
  2010-07-29  7:26                                                 ` Xinliang David Li
  0 siblings, 2 replies; 100+ messages in thread
From: H.J. Lu @ 2010-07-29  3:50 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Pat Haugen, GCC Patches, Zdenek Dvorak

On Tue, Jul 27, 2010 at 1:20 PM, Xinliang David Li <davidxl@google.com> wrote:
> On Tue, Jul 27, 2010 at 12:57 PM, Pat Haugen <pthaugen@us.ibm.com> wrote:
>>>
>>> Thanks Sebatian for testing it out. I also asked Pat to help testing
>>> the patch again on powerpc. I will first split off the unrelated
>>> patches and submit them first (e.g, multiple exit loop handling etc).
>>>
>>
>> There were 2 good improvements on PowerPC, the rest were pretty much a wash
>> (< +/-2%):
>>
>> 410.bwaves      10.0%
>> 434.zeusmp      6.6%
>>
>> One thing I did notice however is that comparing these results to the run I
>> did back in May on an earlier version of the patch is that both
>> improvements dropped. bwaves was 27% on that run and zeusmp was 8.4%. I
>> don't have the old builds around, but could recreate if you're not aware of
>> anything to explain the drop.
>>
>
> Thanks. I will check in this version first and do some triaging on the
> performance drop (with your help).  One thing to be aware is that
> r161844 was checked in during this period of time which might be
> related, but not sure until further investigation -- the two stage
> initial iv set computation introduced by the patch may not be needed
> (if this patch is in).
>

Your checkin caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45119

-- 
H.J.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-29  3:50                                               ` H.J. Lu
@ 2010-07-29  5:57                                                 ` H.J. Lu
  2010-07-29  7:44                                                   ` Xinliang David Li
                                                                     ` (2 more replies)
  2010-07-29  7:26                                                 ` Xinliang David Li
  1 sibling, 3 replies; 100+ messages in thread
From: H.J. Lu @ 2010-07-29  5:57 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Pat Haugen, GCC Patches, Zdenek Dvorak

On Wed, Jul 28, 2010 at 6:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Jul 27, 2010 at 1:20 PM, Xinliang David Li <davidxl@google.com> wrote:
>> On Tue, Jul 27, 2010 at 12:57 PM, Pat Haugen <pthaugen@us.ibm.com> wrote:
>>>>
>>>> Thanks Sebatian for testing it out. I also asked Pat to help testing
>>>> the patch again on powerpc. I will first split off the unrelated
>>>> patches and submit them first (e.g, multiple exit loop handling etc).
>>>>
>>>
>>> There were 2 good improvements on PowerPC, the rest were pretty much a wash
>>> (< +/-2%):
>>>
>>> 410.bwaves      10.0%
>>> 434.zeusmp      6.6%
>>>
>>> One thing I did notice however is that comparing these results to the run I
>>> did back in May on an earlier version of the patch is that both
>>> improvements dropped. bwaves was 27% on that run and zeusmp was 8.4%. I
>>> don't have the old builds around, but could recreate if you're not aware of
>>> anything to explain the drop.
>>>
>>
>> Thanks. I will check in this version first and do some triaging on the
>> performance drop (with your help).  One thing to be aware is that
>> r161844 was checked in during this period of time which might be
>> related, but not sure until further investigation -- the two stage
>> initial iv set computation introduced by the patch may not be needed
>> (if this patch is in).
>>
>
> Your checkin caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45119
>

This also caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45121


-- 
H.J.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-29  3:50                                               ` H.J. Lu
  2010-07-29  5:57                                                 ` H.J. Lu
@ 2010-07-29  7:26                                                 ` Xinliang David Li
  1 sibling, 0 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-07-29  7:26 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Pat Haugen, GCC Patches, Zdenek Dvorak

What is the build configuration?

Thanks,

David

On Wed, Jul 28, 2010 at 6:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Jul 27, 2010 at 1:20 PM, Xinliang David Li <davidxl@google.com> wrote:
>> On Tue, Jul 27, 2010 at 12:57 PM, Pat Haugen <pthaugen@us.ibm.com> wrote:
>>>>
>>>> Thanks Sebatian for testing it out. I also asked Pat to help testing
>>>> the patch again on powerpc. I will first split off the unrelated
>>>> patches and submit them first (e.g, multiple exit loop handling etc).
>>>>
>>>
>>> There were 2 good improvements on PowerPC, the rest were pretty much a wash
>>> (< +/-2%):
>>>
>>> 410.bwaves      10.0%
>>> 434.zeusmp      6.6%
>>>
>>> One thing I did notice however is that comparing these results to the run I
>>> did back in May on an earlier version of the patch is that both
>>> improvements dropped. bwaves was 27% on that run and zeusmp was 8.4%. I
>>> don't have the old builds around, but could recreate if you're not aware of
>>> anything to explain the drop.
>>>
>>
>> Thanks. I will check in this version first and do some triaging on the
>> performance drop (with your help).  One thing to be aware is that
>> r161844 was checked in during this period of time which might be
>> related, but not sure until further investigation -- the two stage
>> initial iv set computation introduced by the patch may not be needed
>> (if this patch is in).
>>
>
> Your checkin caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45119
>
> --
> H.J.
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-29  5:57                                                 ` H.J. Lu
@ 2010-07-29  7:44                                                   ` Xinliang David Li
  2010-07-29  8:28                                                     ` Zdenek Dvorak
  2010-07-29 15:27                                                     ` H.J. Lu
  2010-07-29 14:17                                                   ` H.J. Lu
  2010-07-30 15:06                                                   ` H.J. Lu
  2 siblings, 2 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-07-29  7:44 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Pat Haugen, GCC Patches, Zdenek Dvorak

[-- Attachment #1: Type: text/plain, Size: 2052 bytes --]

The attached patch should fix the problem -- it reverts a small part of
the last patch, the part that was needed to fix a sixtrack performance
regression caused by wrong iv-use costs when the address offset range is
computed too conservatively. I will revert the change first and
investigate a better fix (suggestions are welcome).

Ok for checkin (after testing is done)?

Thanks,

David

On Wed, Jul 28, 2010 at 8:50 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Jul 28, 2010 at 6:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Jul 27, 2010 at 1:20 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> On Tue, Jul 27, 2010 at 12:57 PM, Pat Haugen <pthaugen@us.ibm.com> wrote:
>>>>>
>>>>> Thanks Sebatian for testing it out. I also asked Pat to help testing
>>>>> the patch again on powerpc. I will first split off the unrelated
>>>>> patches and submit them first (e.g, multiple exit loop handling etc).
>>>>>
>>>>
>>>> There were 2 good improvements on PowerPC, the rest were pretty much a wash
>>>> (< +/-2%):
>>>>
>>>> 410.bwaves      10.0%
>>>> 434.zeusmp      6.6%
>>>>
>>>> One thing I did notice however is that comparing these results to the run I
>>>> did back in May on an earlier version of the patch is that both
>>>> improvements dropped. bwaves was 27% on that run and zeusmp was 8.4%. I
>>>> don't have the old builds around, but could recreate if you're not aware of
>>>> anything to explain the drop.
>>>>
>>>
>>> Thanks. I will check in this version first and do some triaging on the
>>> performance drop (with your help).  One thing to be aware is that
>>> r161844 was checked in during this period of time which might be
>>> related, but not sure until further investigation -- the two stage
>>> initial iv set computation introduced by the patch may not be needed
>>> (if this patch is in).
>>>
>>
>> Your checkin caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45119
>>
>
> This also caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45121
>
>
> --
> H.J.
>

[-- Attachment #2: address_cost_bug.p --]
[-- Type: application/octet-stream, Size: 2095 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 162653)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -3243,7 +3243,7 @@ get_address_cost (bool symbol_present, b
       HOST_WIDE_INT i;
       HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
       HOST_WIDE_INT rat, off;
-      int old_cse_not_expected, width;
+      int old_cse_not_expected;
       unsigned sym_p, var_p, off_p, rat_p, add_c;
       rtx seq, addr, base;
       rtx reg0, reg1;
@@ -3252,10 +3252,8 @@ get_address_cost (bool symbol_present, b
 
       reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
 
-      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 2)
-          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 2;
       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
-      for (i = start; i <= 1ll << width; i <<= 1)
+      for (i = start; i <= 1 << 20; i <<= 1)
 	{
 	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
 	  if (!memory_address_addr_space_p (mem_mode, addr, as))
@@ -3264,7 +3262,7 @@ get_address_cost (bool symbol_present, b
       data->max_offset = i == start ? 0 : i >> 1;
       off = data->max_offset;
 
-      for (i = start; i <= 1ll << width; i <<= 1)
+      for (i = start; i <= 1 << 20; i <<= 1)
 	{
 	  XEXP (addr, 1) = gen_int_mode (-i, address_mode);
 	  if (!memory_address_addr_space_p (mem_mode, addr, as))
@@ -3275,12 +3273,12 @@ get_address_cost (bool symbol_present, b
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "get_address_cost:\n");
-	  fprintf (dump_file, "  min offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
+	  fprintf (dump_file, "  min offset %s %d\n",
 		   GET_MODE_NAME (mem_mode),
-		   data->min_offset);
-	  fprintf (dump_file, "  max offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
+		   (int) data->min_offset);
+	  fprintf (dump_file, "  max offset %s %d\n",
 		   GET_MODE_NAME (mem_mode),
-		   data->max_offset);
+		   (int) data->max_offset);
 	}
 
       rat = 1;

[-- Attachment #3: cl --]
[-- Type: application/octet-stream, Size: 125 bytes --]

2010-07-28  Xinliang David Li  <davidxl@google.com>

	* tree-ssa-loop-ivopts.c (get_address_cost): Revert change
	in 162652.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-29  7:44                                                   ` Xinliang David Li
@ 2010-07-29  8:28                                                     ` Zdenek Dvorak
  2010-07-29 14:37                                                       ` H.J. Lu
  2010-07-29 15:27                                                     ` H.J. Lu
  1 sibling, 1 reply; 100+ messages in thread
From: Zdenek Dvorak @ 2010-07-29  8:28 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: H.J. Lu, Pat Haugen, GCC Patches

Hi,

> The attached patch should fix the problem -- it reverts a small part
> of the last patch that is needed for fixing sixtrack performance
> regression caused by wrong iv-use costs because address offset range
> is conservatively computed. I will revert the change first and
> investigate better fix (Suggestions are welcome).
> 
> Ok for checkin (after testing is done)?

OK,

Zdenek

> Thanks,
> 
> David
> 
> On Wed, Jul 28, 2010 at 8:50 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> > On Wed, Jul 28, 2010 at 6:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> >> On Tue, Jul 27, 2010 at 1:20 PM, Xinliang David Li <davidxl@google.com> wrote:
> >>> On Tue, Jul 27, 2010 at 12:57 PM, Pat Haugen <pthaugen@us.ibm.com> wrote:
> >>>>>
> >>>>> Thanks Sebatian for testing it out. I also asked Pat to help testing
> >>>>> the patch again on powerpc. I will first split off the unrelated
> >>>>> patches and submit them first (e.g, multiple exit loop handling etc).
> >>>>>
> >>>>
> >>>> There were 2 good improvements on PowerPC, the rest were pretty much a wash
> >>>> (< +/-2%):
> >>>>
> >>>> 410.bwaves      10.0%
> >>>> 434.zeusmp      6.6%
> >>>>
> >>>> One thing I did notice however is that comparing these results to the run I
> >>>> did back in May on an earlier version of the patch is that both
> >>>> improvements dropped. bwaves was 27% on that run and zeusmp was 8.4%. I
> >>>> don't have the old builds around, but could recreate if you're not aware of
> >>>> anything to explain the drop.
> >>>>
> >>>
> >>> Thanks. I will check in this version first and do some triaging on the
> >>> performance drop (with your help).  One thing to be aware is that
> >>> r161844 was checked in during this period of time which might be
> >>> related, but not sure until further investigation -- the two stage
> >>> initial iv set computation introduced by the patch may not be needed
> >>> (if this patch is in).
> >>>
> >>
> >> Your checkin caused:
> >>
> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45119
> >>
> >
> > This also caused:
> >
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45121
> >
> >
> > --
> > H.J.
> >



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-29  5:57                                                 ` H.J. Lu
  2010-07-29  7:44                                                   ` Xinliang David Li
@ 2010-07-29 14:17                                                   ` H.J. Lu
  2010-07-29 17:00                                                     ` Xinliang David Li
                                                                       ` (2 more replies)
  2010-07-30 15:06                                                   ` H.J. Lu
  2 siblings, 3 replies; 100+ messages in thread
From: H.J. Lu @ 2010-07-29 14:17 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Pat Haugen, GCC Patches, Zdenek Dvorak

On Wed, Jul 28, 2010 at 8:50 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Jul 28, 2010 at 6:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Jul 27, 2010 at 1:20 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> On Tue, Jul 27, 2010 at 12:57 PM, Pat Haugen <pthaugen@us.ibm.com> wrote:
>>>>>
>>>>> Thanks Sebatian for testing it out. I also asked Pat to help testing
>>>>> the patch again on powerpc. I will first split off the unrelated
>>>>> patches and submit them first (e.g, multiple exit loop handling etc).
>>>>>
>>>>
>>>> There were 2 good improvements on PowerPC, the rest were pretty much a wash
>>>> (< +/-2%):
>>>>
>>>> 410.bwaves      10.0%
>>>> 434.zeusmp      6.6%
>>>>
>>>> One thing I did notice however is that comparing these results to the run I
>>>> did back in May on an earlier version of the patch is that both
>>>> improvements dropped. bwaves was 27% on that run and zeusmp was 8.4%. I
>>>> don't have the old builds around, but could recreate if you're not aware of
>>>> anything to explain the drop.
>>>>
>>>
>>> Thanks. I will check in this version first and do some triaging on the
>>> performance drop (with your help).  One thing to be aware is that
>>> r161844 was checked in during this period of time which might be
>>> related, but not sure until further investigation -- the two stage
>>> initial iv set computation introduced by the patch may not be needed
>>> (if this patch is in).
>>>
>>
>> Your checkin caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45119
>>
>
> This also caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45121
>

This may also cause:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45131


-- 
H.J.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-29  8:28                                                     ` Zdenek Dvorak
@ 2010-07-29 14:37                                                       ` H.J. Lu
  0 siblings, 0 replies; 100+ messages in thread
From: H.J. Lu @ 2010-07-29 14:37 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: Xinliang David Li, Pat Haugen, GCC Patches

2010/7/29 Zdenek Dvorak <rakdver@kam.mff.cuni.cz>:
> Hi,
>
>> The attached patch should fix the problem -- it reverts a small part
>> of the last patch that is needed for fixing sixtrack performance
>> regression caused by wrong iv-use costs because address offset range
>> is conservatively computed. I will revert the change first and
>> investigate better fix (Suggestions are welcome).
>>
>> Ok for checkin (after testing is done)?
>
> OK,
>
> Zdenek

I verified the fix and checked it in.

Thanks.


-- 
H.J.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-29  7:44                                                   ` Xinliang David Li
  2010-07-29  8:28                                                     ` Zdenek Dvorak
@ 2010-07-29 15:27                                                     ` H.J. Lu
  2010-07-29 16:09                                                       ` H.J. Lu
  2010-07-29 16:11                                                       ` H.J. Lu
  1 sibling, 2 replies; 100+ messages in thread
From: H.J. Lu @ 2010-07-29 15:27 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Pat Haugen, GCC Patches, Zdenek Dvorak

On Wed, Jul 28, 2010 at 9:32 PM, Xinliang David Li <davidxl@google.com> wrote:
> The attached patch should fix the problem -- it reverts a small part
> of the last patch that is needed for fixing sixtrack performance
> regression caused by wrong iv-use costs because address offset range
> is conservatively computed. I will revert the change first and
> investigate better fix (Suggestions are welcome).
>

Since "gcc -m32" on Linux/x86-64 goes into an infinite loop,
it sounds like a HOST_WIDE_INT issue.


-- 
H.J.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-29 15:27                                                     ` H.J. Lu
@ 2010-07-29 16:09                                                       ` H.J. Lu
  2010-07-29 16:17                                                         ` Richard Guenther
  2010-07-29 16:11                                                       ` H.J. Lu
  1 sibling, 1 reply; 100+ messages in thread
From: H.J. Lu @ 2010-07-29 16:09 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Pat Haugen, GCC Patches, Zdenek Dvorak

On Thu, Jul 29, 2010 at 8:22 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Jul 28, 2010 at 9:32 PM, Xinliang David Li <davidxl@google.com> wrote:
>> The attached patch should fix the problem -- it reverts a small part
>> of the last patch that is needed for fixing sixtrack performance
>> regression caused by wrong iv-use costs because address offset range
>> is conservatively computed. I will revert the change first and
>> investigate better fix (Suggestions are welcome).
>>
>
> Since "gcc -m32" works on Linux/x86-64 and goes into an infinite loop,
> it sounds like a HOST_WIDE_INT issue.
>

This patch fixed the infinite loop.


-- 
H.J.
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 519f66e..44f2eb2 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -3207,7 +3207,7 @@ multiplier_allowed_in_address_p (HOST_WIDE_INT
ratio, enum machine_mode mode,

 typedef struct
 {
-  HOST_WIDE_INT min_offset, max_offset;
+  HOST_WIDEST_INT min_offset, max_offset;
   unsigned costs[2][2][2][2];
 } *address_cost_data;

@@ -3240,9 +3240,9 @@ get_address_cost (bool symbol_present, bool var_present,
   data = VEC_index (address_cost_data, address_cost_data_list, data_index);
   if (!data)
     {
-      HOST_WIDE_INT i;
-      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
-      HOST_WIDE_INT rat, off;
+      HOST_WIDEST_INT i;
+      HOST_WIDEST_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
+      HOST_WIDEST_INT rat, off;
       int old_cse_not_expected, width;
       unsigned sym_p, var_p, off_p, rat_p, add_c;
       rtx seq, addr, base;

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-29 15:27                                                     ` H.J. Lu
  2010-07-29 16:09                                                       ` H.J. Lu
@ 2010-07-29 16:11                                                       ` H.J. Lu
  1 sibling, 0 replies; 100+ messages in thread
From: H.J. Lu @ 2010-07-29 16:11 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Pat Haugen, GCC Patches, Zdenek Dvorak

[-- Attachment #1: Type: text/plain, Size: 849 bytes --]

On Thu, Jul 29, 2010 at 8:22 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Jul 28, 2010 at 9:32 PM, Xinliang David Li <davidxl@google.com> wrote:
>> The attached patch should fix the problem -- it reverts a small part
>> of the last patch that is needed for fixing sixtrack performance
>> regression caused by wrong iv-use costs because address offset range
>> is conservatively computed. I will revert the change first and
>> investigate better fix (Suggestions are welcome).
>>
>
> Since "gcc -m32" works on Linux/x86-64 and goes into an infinite loop,
> it sounds like a HOST_WIDE_INT issue.
>

Here is the patch.  OK for trunk?

Thanks.

-- 
H.J.
----
2010-07-29  H.J. Lu  <hongjiu.lu@intel.com>

	PR bootstrap/45119
	* tree-ssa-loop-ivopts.c (address_cost_data): Replace
	HOST_WIDE_INT with HOST_WIDEST_INT.
	(get_address_cost): Likewise.

[-- Attachment #2: gcc-pr45119-1.patch --]
[-- Type: text/plain, Size: 2818 bytes --]

2010-07-29  H.J. Lu  <hongjiu.lu@intel.com>

	PR bootstrap/45119
	* tree-ssa-loop-ivopts.c (address_cost_data): Replace
	HOST_WIDE_INT with HOST_WIDEST_INT.
	(get_address_cost): Likewise.

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1d65b4a..92e19d1 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -3207,7 +3207,7 @@ multiplier_allowed_in_address_p (HOST_WIDE_INT ratio, enum machine_mode mode,
 
 typedef struct
 {
-  HOST_WIDE_INT min_offset, max_offset;
+  HOST_WIDEST_INT min_offset, max_offset;
   unsigned costs[2][2][2][2];
 } *address_cost_data;
 
@@ -3240,10 +3240,10 @@ get_address_cost (bool symbol_present, bool var_present,
   data = VEC_index (address_cost_data, address_cost_data_list, data_index);
   if (!data)
     {
-      HOST_WIDE_INT i;
-      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
-      HOST_WIDE_INT rat, off;
-      int old_cse_not_expected;
+      HOST_WIDEST_INT i;
+      HOST_WIDEST_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
+      HOST_WIDEST_INT rat, off;
+      int old_cse_not_expected, width;
       unsigned sym_p, var_p, off_p, rat_p, add_c;
       rtx seq, addr, base;
       rtx reg0, reg1;
@@ -3252,8 +3252,10 @@ get_address_cost (bool symbol_present, bool var_present,
 
       reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
 
+      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 2)
+          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 2;
       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = start; i <= 1ll << width; i <<= 1)
 	{
 	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
 	  if (!memory_address_addr_space_p (mem_mode, addr, as))
@@ -3262,7 +3264,7 @@ get_address_cost (bool symbol_present, bool var_present,
       data->max_offset = i == start ? 0 : i >> 1;
       off = data->max_offset;
 
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = start; i <= 1ll << width; i <<= 1)
 	{
 	  XEXP (addr, 1) = gen_int_mode (-i, address_mode);
 	  if (!memory_address_addr_space_p (mem_mode, addr, as))
@@ -3273,12 +3275,14 @@ get_address_cost (bool symbol_present, bool var_present,
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "get_address_cost:\n");
-	  fprintf (dump_file, "  min offset %s %d\n",
+	  fprintf (dump_file, "  min offset %s "
+		   HOST_WIDEST_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->min_offset);
-	  fprintf (dump_file, "  max offset %s %d\n",
+		   data->min_offset);
+	  fprintf (dump_file, "  max offset %s "
+		   HOST_WIDEST_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->max_offset);
+		   data->max_offset);
 	}
 
       rat = 1;

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-29 16:09                                                       ` H.J. Lu
@ 2010-07-29 16:17                                                         ` Richard Guenther
  2010-07-29 16:55                                                           ` H.J. Lu
  0 siblings, 1 reply; 100+ messages in thread
From: Richard Guenther @ 2010-07-29 16:17 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Xinliang David Li, Pat Haugen, GCC Patches, Zdenek Dvorak

On Thu, Jul 29, 2010 at 6:00 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Jul 29, 2010 at 8:22 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Jul 28, 2010 at 9:32 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> The attached patch should fix the problem -- it reverts a small part
>>> of the last patch that is needed for fixing sixtrack performance
>>> regression caused by wrong iv-use costs because address offset range
>>> is conservatively computed. I will revert the change first and
>>> investigate better fix (Suggestions are welcome).
>>>
>>
>> Since "gcc -m32" works on Linux/x86-64 and goes into an infinite loop,
>> it sounds like a HOST_WIDE_INT issue.
>>
>
> This patch fixed the infinite loop.

That doesn't make sense.  Please use double_ints instead.

Richard.

>
> --
> H.J.
> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
> index 519f66e..44f2eb2 100644
> --- a/gcc/tree-ssa-loop-ivopts.c
> +++ b/gcc/tree-ssa-loop-ivopts.c
> @@ -3207,7 +3207,7 @@ multiplier_allowed_in_address_p (HOST_WIDE_INT
> ratio, enum machine_mode mode,
>
>  typedef struct
>  {
> -  HOST_WIDE_INT min_offset, max_offset;
> +  HOST_WIDEST_INT min_offset, max_offset;
>   unsigned costs[2][2][2][2];
>  } *address_cost_data;
>
> @@ -3240,9 +3240,9 @@ get_address_cost (bool symbol_present, bool var_present,
>   data = VEC_index (address_cost_data, address_cost_data_list, data_index);
>   if (!data)
>     {
> -      HOST_WIDE_INT i;
> -      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
> -      HOST_WIDE_INT rat, off;
> +      HOST_WIDEST_INT i;
> +      HOST_WIDEST_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
> +      HOST_WIDEST_INT rat, off;
>       int old_cse_not_expected, width;
>       unsigned sym_p, var_p, off_p, rat_p, add_c;
>       rtx seq, addr, base;
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-29 16:17                                                         ` Richard Guenther
@ 2010-07-29 16:55                                                           ` H.J. Lu
  2010-07-30  1:04                                                             ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: H.J. Lu @ 2010-07-29 16:55 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Xinliang David Li, Pat Haugen, GCC Patches, Zdenek Dvorak

On Thu, Jul 29, 2010 at 9:16 AM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Thu, Jul 29, 2010 at 6:00 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Thu, Jul 29, 2010 at 8:22 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Wed, Jul 28, 2010 at 9:32 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> The attached patch should fix the problem -- it reverts a small part
>>>> of the last patch that is needed for fixing sixtrack performance
>>>> regression caused by wrong iv-use costs because address offset range
>>>> is conservatively computed. I will revert the change first and
>>>> investigate better fix (Suggestions are welcome).
>>>>
>>>
>>> Since "gcc -m32" works on Linux/x86-64 and goes into an infinite loop,
>>> it sounds like a HOST_WIDE_INT issue.
>>>
>>
>> This patch fixed the infinite loop.
>
> That doesn't make sense.  Please use double_ints instead.
>
> Richard.

I am not familiar with this code. I will leave it to David.


H.J.
---
>>
>> --
>> H.J.
>> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
>> index 519f66e..44f2eb2 100644
>> --- a/gcc/tree-ssa-loop-ivopts.c
>> +++ b/gcc/tree-ssa-loop-ivopts.c
>> @@ -3207,7 +3207,7 @@ multiplier_allowed_in_address_p (HOST_WIDE_INT
>> ratio, enum machine_mode mode,
>>
>>  typedef struct
>>  {
>> -  HOST_WIDE_INT min_offset, max_offset;
>> +  HOST_WIDEST_INT min_offset, max_offset;
>>   unsigned costs[2][2][2][2];
>>  } *address_cost_data;
>>
>> @@ -3240,9 +3240,9 @@ get_address_cost (bool symbol_present, bool var_present,
>>   data = VEC_index (address_cost_data, address_cost_data_list, data_index);
>>   if (!data)
>>     {
>> -      HOST_WIDE_INT i;
>> -      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
>> -      HOST_WIDE_INT rat, off;
>> +      HOST_WIDEST_INT i;
>> +      HOST_WIDEST_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
>> +      HOST_WIDEST_INT rat, off;
>>       int old_cse_not_expected, width;
>>       unsigned sym_p, var_p, off_p, rat_p, add_c;
>>       rtx seq, addr, base;
>>
>



-- 
H.J.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-29 14:17                                                   ` H.J. Lu
@ 2010-07-29 17:00                                                     ` Xinliang David Li
  2010-07-29 17:10                                                       ` H.J. Lu
  2010-10-28 19:28                                                     ` H.J. Lu
  2011-04-27 13:23                                                     ` H.J. Lu
  2 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-07-29 17:00 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Pat Haugen, GCC Patches, Zdenek Dvorak

Just to clarify -- this patch is not the cause of this regression, right?

David

On Thu, Jul 29, 2010 at 7:14 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Jul 28, 2010 at 8:50 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Jul 28, 2010 at 6:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Tue, Jul 27, 2010 at 1:20 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> On Tue, Jul 27, 2010 at 12:57 PM, Pat Haugen <pthaugen@us.ibm.com> wrote:
>>>>>>
>>>>>> Thanks Sebatian for testing it out. I also asked Pat to help testing
>>>>>> the patch again on powerpc. I will first split off the unrelated
>>>>>> patches and submit them first (e.g, multiple exit loop handling etc).
>>>>>>
>>>>>
>>>>> There were 2 good improvements on PowerPC, the rest were pretty much a wash
>>>>> (< +/-2%):
>>>>>
>>>>> 410.bwaves      10.0%
>>>>> 434.zeusmp      6.6%
>>>>>
>>>>> One thing I did notice however is that comparing these results to the run I
>>>>> did back in May on an earlier version of the patch is that both
>>>>> improvements dropped. bwaves was 27% on that run and zeusmp was 8.4%. I
>>>>> don't have the old builds around, but could recreate if you're not aware of
>>>>> anything to explain the drop.
>>>>>
>>>>
>>>> Thanks. I will check in this version first and do some triaging on the
>>>> performance drop (with your help).  One thing to be aware is that
>>>> r161844 was checked in during this period of time which might be
>>>> related, but not sure until further investigation -- the two stage
>>>> initial iv set computation introduced by the patch may not be needed
>>>> (if this patch is in).
>>>>
>>>
>>> Your checkin caused:
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45119
>>>
>>
>> This also caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45121
>>
>
> This may also cause:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45131
>
>
> --
> H.J.
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-29 17:00                                                     ` Xinliang David Li
@ 2010-07-29 17:10                                                       ` H.J. Lu
  0 siblings, 0 replies; 100+ messages in thread
From: H.J. Lu @ 2010-07-29 17:10 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Pat Haugen, GCC Patches, Zdenek Dvorak

On Thu, Jul 29, 2010 at 9:54 AM, Xinliang David Li <davidxl@google.com> wrote:
> Just to clarify -- this patch is not the cause for this regression, right?

That is correct.


H.J.
> David
>
> On Thu, Jul 29, 2010 at 7:14 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Jul 28, 2010 at 8:50 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Wed, Jul 28, 2010 at 6:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Tue, Jul 27, 2010 at 1:20 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> On Tue, Jul 27, 2010 at 12:57 PM, Pat Haugen <pthaugen@us.ibm.com> wrote:
>>>>>>>
>>>>>>> Thanks Sebastian for testing it out. I also asked Pat to help testing
>>>>>>> the patch again on powerpc. I will first split off the unrelated
>>>>>>> patches and submit them first (e.g, multiple exit loop handling etc).
>>>>>>>
>>>>>>
>>>>>> There were 2 good improvements on PowerPC, the rest were pretty much a wash
>>>>>> (< +/-2%):
>>>>>>
>>>>>> 410.bwaves      10.0%
>>>>>> 434.zeusmp      6.6%
>>>>>>
>>>>>> One thing I did notice however is that comparing these results to the run I
>>>>>> did back in May on an earlier version of the patch is that both
>>>>>> improvements dropped. bwaves was 27% on that run and zeusmp was 8.4%. I
>>>>>> don't have the old builds around, but could recreate if you're not aware of
>>>>>> anything to explain the drop.
>>>>>>
>>>>>
>>>>> Thanks. I will check in this version first and do some triaging on the
>>>>> performance drop (with your help).  One thing to be aware is that
>>>>> r161844 was checked in during this period of time which might be
>>>>> related, but not sure until further investigation -- the two stage
>>>>> initial iv set computation introduced by the patch may not be needed
>>>>> (if this patch is in).
>>>>>
>>>>
>>>> Your checkin caused:
>>>>
>>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45119
>>>>
>>>
>>> This also caused:
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45121
>>>
>>
>> This may also cause:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45131
>>
>>
>> --
>> H.J.
>>
>



-- 
H.J.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-29 16:55                                                           ` H.J. Lu
@ 2010-07-30  1:04                                                             ` Xinliang David Li
  2010-07-30  2:06                                                               ` H.J. Lu
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-07-30  1:04 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Guenther, Pat Haugen, GCC Patches, Zdenek Dvorak

[-- Attachment #1: Type: text/plain, Size: 2484 bytes --]

Please take a look at the following patch -- it is less conservative
than before and also does not lead to an infinite loop (due to integer
overflow).

Testing is ongoing. Ok for trunk after that is done?

Thanks,

David

On Thu, Jul 29, 2010 at 9:51 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Jul 29, 2010 at 9:16 AM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> On Thu, Jul 29, 2010 at 6:00 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Thu, Jul 29, 2010 at 8:22 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Wed, Jul 28, 2010 at 9:32 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> The attached patch should fix the problem -- it reverts a small part
>>>>> of the last patch that is needed for fixing sixtrack performance
>>>>> regression caused by wrong iv-use costs because address offset range
>>>>> is conservatively computed. I will revert the change first and
>>>>> investigate better fix (Suggestions are welcome).
>>>>>
>>>>
>>>> Since "gcc -m32" works on Linux/x86-64 and goes into an infinite loop,
>>>> it sounds like a HOST_WIDE_INT issue.
>>>>
>>>
>>> This patch fixed the infinite loop.
>>
>> That doesn't make sense.  Please use double_ints instead.
>>
>> Richard.
>
> I am not familiar with this code. I will leave it to David.
>
>
> H.J.
> ---
>>>
>>> --
>>> H.J.
>>> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
>>> index 519f66e..44f2eb2 100644
>>> --- a/gcc/tree-ssa-loop-ivopts.c
>>> +++ b/gcc/tree-ssa-loop-ivopts.c
>>> @@ -3207,7 +3207,7 @@ multiplier_allowed_in_address_p (HOST_WIDE_INT
>>> ratio, enum machine_mode mode,
>>>
>>>  typedef struct
>>>  {
>>> -  HOST_WIDE_INT min_offset, max_offset;
>>> +  HOST_WIDEST_INT min_offset, max_offset;
>>>   unsigned costs[2][2][2][2];
>>>  } *address_cost_data;
>>>
>>> @@ -3240,9 +3240,9 @@ get_address_cost (bool symbol_present, bool var_present,
>>>   data = VEC_index (address_cost_data, address_cost_data_list, data_index);
>>>   if (!data)
>>>     {
>>> -      HOST_WIDE_INT i;
>>> -      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
>>> -      HOST_WIDE_INT rat, off;
>>> +      HOST_WIDEST_INT i;
>>> +      HOST_WIDEST_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
>>> +      HOST_WIDEST_INT rat, off;
>>>       int old_cse_not_expected, width;
>>>       unsigned sym_p, var_p, off_p, rat_p, add_c;
>>>       rtx seq, addr, base;
>>>
>>
>
>
>
> --
> H.J.
>

[-- Attachment #2: address_offset.p --]
[-- Type: text/x-pascal, Size: 2369 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 162696)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -3241,9 +3241,8 @@ get_address_cost (bool symbol_present, b
   if (!data)
     {
       HOST_WIDE_INT i;
-      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
       HOST_WIDE_INT rat, off;
-      int old_cse_not_expected;
+      int old_cse_not_expected, width;
       unsigned sym_p, var_p, off_p, rat_p, add_c;
       rtx seq, addr, base;
       rtx reg0, reg1;
@@ -3252,33 +3251,37 @@ get_address_cost (bool symbol_present, b
 
       reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
 
+      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 1)
+          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 1;
       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = 1; i < width; i++)
 	{
-	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
+          HOST_WIDE_INT offset = (1ll << i);
+	  XEXP (addr, 1) = gen_int_mode (offset, address_mode);
 	  if (!memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->max_offset = i == start ? 0 : i >> 1;
+      data->max_offset = i == 1 ? 0 : (1ll << (i - 1));
       off = data->max_offset;
 
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = 1; i < width; i++)
 	{
-	  XEXP (addr, 1) = gen_int_mode (-i, address_mode);
+          HOST_WIDE_INT offset = -(1ll << i);
+	  XEXP (addr, 1) = gen_int_mode (offset, address_mode);
 	  if (!memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->min_offset = i == start ? 0 : -(i >> 1);
+      data->min_offset = i == 1 ? 0 : -(1ll << (i-1));
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "get_address_cost:\n");
-	  fprintf (dump_file, "  min offset %s %d\n",
+	  fprintf (dump_file, "  min offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->min_offset);
-	  fprintf (dump_file, "  max offset %s %d\n",
+		   data->min_offset);
+	  fprintf (dump_file, "  max offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->max_offset);
+		   data->max_offset);
 	}
 
       rat = 1;

[-- Attachment #3: address_offset.cl --]
[-- Type: application/octet-stream, Size: 153 bytes --]

2010-07-29  Xinliang David Li  <davidxl@google.com>

	* tree-ssa-loop-ivopts.c (get_address_cost): Better
	computation of max/min offsets for addresses.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-30  1:04                                                             ` Xinliang David Li
@ 2010-07-30  2:06                                                               ` H.J. Lu
  2010-07-30  5:41                                                                 ` Xinliang David Li
  2010-07-30 15:56                                                                 ` H.J. Lu
  0 siblings, 2 replies; 100+ messages in thread
From: H.J. Lu @ 2010-07-30  2:06 UTC (permalink / raw)
  To: Xinliang David Li
  Cc: Richard Guenther, Pat Haugen, GCC Patches, Zdenek Dvorak

It looks strange:

+      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 1)
+          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 1;
       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = 1; i < width; i++)
 	{
-	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
+          HOST_WIDE_INT offset = (1ll << i);
+	  XEXP (addr, 1) = gen_int_mode (offset, address_mode);
 	  if (!memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}

HOST_WIDE_INT may be long or long long. "1ll" isn't always correct.
I think width can be >= 31. Depending on HOST_WIDE_INT,

HOST_WIDE_INT offset = -(1ll << i);

may have different values. The whole function looks odd to me.


H.J.
----
On Thu, Jul 29, 2010 at 5:55 PM, Xinliang David Li <davidxl@google.com> wrote:
> Please take a look at the following patch -- it is less conservative
> than before and also does not lead to infinite loop (due to integer
> overflow).
>
> Testing is ongoing. Ok for trunk after that is done?
>
> Thanks,
>
> David
>
> On Thu, Jul 29, 2010 at 9:51 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Thu, Jul 29, 2010 at 9:16 AM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> On Thu, Jul 29, 2010 at 6:00 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Thu, Jul 29, 2010 at 8:22 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Wed, Jul 28, 2010 at 9:32 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>> The attached patch should fix the problem -- it reverts a small part
>>>>>> of the last patch that is needed for fixing sixtrack performance
>>>>>> regression caused by wrong iv-use costs because address offset range
>>>>>> is conservatively computed. I will revert the change first and
>>>>>> investigate better fix (Suggestions are welcome).
>>>>>>
>>>>>
>>>>> Since "gcc -m32" works on Linux/x86-64 and goes into an infinite loop,
>>>>> it sounds like a HOST_WIDE_INT issue.
>>>>>
>>>>
>>>> This patch fixed the infinite loop.
>>>
>>> That doesn't make sense.  Please use double_ints instead.
>>>
>>> Richard.
>>
>> I am not familiar with this code. I will leave it to David.
>>
>>
>> H.J.
>> ---
>>>>
>>>> --
>>>> H.J.
>>>> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
>>>> index 519f66e..44f2eb2 100644
>>>> --- a/gcc/tree-ssa-loop-ivopts.c
>>>> +++ b/gcc/tree-ssa-loop-ivopts.c
>>>> @@ -3207,7 +3207,7 @@ multiplier_allowed_in_address_p (HOST_WIDE_INT
>>>> ratio, enum machine_mode mode,
>>>>
>>>>  typedef struct
>>>>  {
>>>> -  HOST_WIDE_INT min_offset, max_offset;
>>>> +  HOST_WIDEST_INT min_offset, max_offset;
>>>>   unsigned costs[2][2][2][2];
>>>>  } *address_cost_data;
>>>>
>>>> @@ -3240,9 +3240,9 @@ get_address_cost (bool symbol_present, bool var_present,
>>>>   data = VEC_index (address_cost_data, address_cost_data_list, data_index);
>>>>   if (!data)
>>>>     {
>>>> -      HOST_WIDE_INT i;
>>>> -      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
>>>> -      HOST_WIDE_INT rat, off;
>>>> +      HOST_WIDEST_INT i;
>>>> +      HOST_WIDEST_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
>>>> +      HOST_WIDEST_INT rat, off;
>>>>       int old_cse_not_expected, width;
>>>>       unsigned sym_p, var_p, off_p, rat_p, add_c;
>>>>       rtx seq, addr, base;
>>>>
>>>
>>
>>
>>
>> --
>> H.J.
>>
>



-- 
H.J.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-30  2:06                                                               ` H.J. Lu
@ 2010-07-30  5:41                                                                 ` Xinliang David Li
  2010-07-30  7:19                                                                   ` Jakub Jelinek
  2010-07-30 15:56                                                                 ` H.J. Lu
  1 sibling, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-07-30  5:41 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Guenther, Pat Haugen, GCC Patches, Zdenek Dvorak

The width is set to a value so that 1ll << i is guaranteed not to
overflow the HOST_WIDE_INT type. The suffix is needed so that the
intermediate value does not get truncated when HOST_WIDE_INT is wider
than 32 bits. Is there a portable way to write the integer literal
with the HOST_WIDE_INT type?

Thanks,

David

On Thu, Jul 29, 2010 at 6:04 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> It looks strange:
>
> +      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 1)
> +          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 1;
>       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
> -      for (i = start; i <= 1 << 20; i <<= 1)
> +      for (i = 1; i < width; i++)
>        {
> -         XEXP (addr, 1) = gen_int_mode (i, address_mode);
> +          HOST_WIDE_INT offset = (1ll << i);
> +         XEXP (addr, 1) = gen_int_mode (offset, address_mode);
>          if (!memory_address_addr_space_p (mem_mode, addr, as))
>            break;
>        }
>
> HOST_WIDE_INT may be long or long long. "1ll" isn't always correct.
> I think width can be >= 31. Depending on HOST_WIDE_INT,
>
> HOST_WIDE_INT offset = -(1ll << i);
>
> may have different values. The whole function looks odd to me.
>
>
> H.J.
> ----
> On Thu, Jul 29, 2010 at 5:55 PM, Xinliang David Li <davidxl@google.com> wrote:
>> Please take a look at the following patch -- it is less conservative
>> than before and also does not lead to infinite loop (due to integer
>> overflow).
>>
>> Testing is ongoing. Ok for trunk after that is done?
>>
>> Thanks,
>>
>> David
>>
>> On Thu, Jul 29, 2010 at 9:51 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Thu, Jul 29, 2010 at 9:16 AM, Richard Guenther
>>> <richard.guenther@gmail.com> wrote:
>>>> On Thu, Jul 29, 2010 at 6:00 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Thu, Jul 29, 2010 at 8:22 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>> On Wed, Jul 28, 2010 at 9:32 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>>> The attached patch should fix the problem -- it reverts a small part
>>>>>>> of the last patch that is needed for fixing sixtrack performance
>>>>>>> regression caused by wrong iv-use costs because address offset range
>>>>>>> is conservatively computed. I will revert the change first and
>>>>>>> investigate better fix (Suggestions are welcome).
>>>>>>>
>>>>>>
>>>>>> Since "gcc -m32" works on Linux/x86-64 and goes into an infinite loop,
>>>>>> it sounds like a HOST_WIDE_INT issue.
>>>>>>
>>>>>
>>>>> This patch fixed the infinite loop.
>>>>
>>>> That doesn't make sense.  Please use double_ints instead.
>>>>
>>>> Richard.
>>>
>>> I am not familiar with this code. I will leave it to David.
>>>
>>>
>>> H.J.
>>> ---
>>>>>
>>>>> --
>>>>> H.J.
>>>>> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
>>>>> index 519f66e..44f2eb2 100644
>>>>> --- a/gcc/tree-ssa-loop-ivopts.c
>>>>> +++ b/gcc/tree-ssa-loop-ivopts.c
>>>>> @@ -3207,7 +3207,7 @@ multiplier_allowed_in_address_p (HOST_WIDE_INT
>>>>> ratio, enum machine_mode mode,
>>>>>
>>>>>  typedef struct
>>>>>  {
>>>>> -  HOST_WIDE_INT min_offset, max_offset;
>>>>> +  HOST_WIDEST_INT min_offset, max_offset;
>>>>>   unsigned costs[2][2][2][2];
>>>>>  } *address_cost_data;
>>>>>
>>>>> @@ -3240,9 +3240,9 @@ get_address_cost (bool symbol_present, bool var_present,
>>>>>   data = VEC_index (address_cost_data, address_cost_data_list, data_index);
>>>>>   if (!data)
>>>>>     {
>>>>> -      HOST_WIDE_INT i;
>>>>> -      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
>>>>> -      HOST_WIDE_INT rat, off;
>>>>> +      HOST_WIDEST_INT i;
>>>>> +      HOST_WIDEST_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
>>>>> +      HOST_WIDEST_INT rat, off;
>>>>>       int old_cse_not_expected, width;
>>>>>       unsigned sym_p, var_p, off_p, rat_p, add_c;
>>>>>       rtx seq, addr, base;
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> H.J.
>>>
>>
>
>
>
> --
> H.J.
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-30  5:41                                                                 ` Xinliang David Li
@ 2010-07-30  7:19                                                                   ` Jakub Jelinek
  2010-07-30 16:45                                                                     ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: Jakub Jelinek @ 2010-07-30  7:19 UTC (permalink / raw)
  To: Xinliang David Li
  Cc: H.J. Lu, Richard Guenther, Pat Haugen, GCC Patches, Zdenek Dvorak

On Thu, Jul 29, 2010 at 09:18:07PM -0700, Xinliang David Li wrote:
> The width is set to a value so that 1ll<<i is guaranteed to not
> overflow HOST_WIDE_INT type. THe suffix is needed so that the
> intermediate value does not get truncated when HOST_WIDE_INT is wider
> than 32bit. Is there a portable way to represent the integer literal
> with HOST_WIDE_TYPE?

Sure:

(HOST_WIDE_INT) 1 << i

	Jakub

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-29  5:57                                                 ` H.J. Lu
  2010-07-29  7:44                                                   ` Xinliang David Li
  2010-07-29 14:17                                                   ` H.J. Lu
@ 2010-07-30 15:06                                                   ` H.J. Lu
  2010-07-30 16:50                                                     ` Xinliang David Li
  2 siblings, 1 reply; 100+ messages in thread
From: H.J. Lu @ 2010-07-30 15:06 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Pat Haugen, GCC Patches, Zdenek Dvorak

On Wed, Jul 28, 2010 at 8:50 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Jul 28, 2010 at 6:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Jul 27, 2010 at 1:20 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> On Tue, Jul 27, 2010 at 12:57 PM, Pat Haugen <pthaugen@us.ibm.com> wrote:
>>>>>
>>>>> Thanks Sebastian for testing it out. I also asked Pat to help testing
>>>>> the patch again on powerpc. I will first split off the unrelated
>>>>> patches and submit them first (e.g, multiple exit loop handling etc).
>>>>>
>>>>
>>>> There were 2 good improvements on PowerPC, the rest were pretty much a wash
>>>> (< +/-2%):
>>>>
>>>> 410.bwaves      10.0%
>>>> 434.zeusmp      6.6%
>>>>
>>>> One thing I did notice however is that comparing these results to the run I
>>>> did back in May on an earlier version of the patch is that both
>>>> improvements dropped. bwaves was 27% on that run and zeusmp was 8.4%. I
>>>> don't have the old builds around, but could recreate if you're not aware of
>>>> anything to explain the drop.
>>>>
>>>
>>> Thanks. I will check in this version first and do some triaging on the
>>> performance drop (with your help).  One thing to be aware is that
>>> r161844 was checked in during this period of time which might be
>>> related, but not sure until further investigation -- the two stage
>>> initial iv set computation introduced by the patch may not be needed
>>> (if this patch is in).
>>>
>>
>> Your checkin caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45119
>>
>
> This also caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45121
>

This bug isn't fixed. David, did you verify your change by
running "make check"?

Thanks.


-- 
H.J.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-30  2:06                                                               ` H.J. Lu
  2010-07-30  5:41                                                                 ` Xinliang David Li
@ 2010-07-30 15:56                                                                 ` H.J. Lu
  2010-07-30 16:58                                                                   ` Xinliang David Li
  2010-07-30 17:01                                                                   ` Xinliang David Li
  1 sibling, 2 replies; 100+ messages in thread
From: H.J. Lu @ 2010-07-30 15:56 UTC (permalink / raw)
  To: Xinliang David Li
  Cc: Richard Guenther, Pat Haugen, GCC Patches, Zdenek Dvorak

[-- Attachment #1: Type: text/plain, Size: 1197 bytes --]

On Thu, Jul 29, 2010 at 6:04 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> It looks strange:
>
> +      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 1)
> +          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 1;
>       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
> -      for (i = start; i <= 1 << 20; i <<= 1)
> +      for (i = 1; i < width; i++)
>        {
> -         XEXP (addr, 1) = gen_int_mode (i, address_mode);
> +          HOST_WIDE_INT offset = (1ll << i);
> +         XEXP (addr, 1) = gen_int_mode (offset, address_mode);
>          if (!memory_address_addr_space_p (mem_mode, addr, as))
>            break;
>        }
>
> HOST_WIDE_INT may be long or long long. "1ll" isn't always correct.
> I think width can be >= 31. Depending on HOST_WIDE_INT,
>
> HOST_WIDE_INT offset = -(1ll << i);
>
> may have different values. The whole function looks odd to me.
>
>

Here is a different approach to check address overflow.


-- 
H.J.
--
2010-07-29  H.J. Lu  <hongjiu.lu@intel.com>

	PR bootstrap/45119
	* tree-ssa-loop-ivopts.c (get_address_cost): Re-apply revision
	162652.  Check address overflow.

[-- Attachment #2: gcc-pr45119-2.patch --]
[-- Type: text/plain, Size: 2329 bytes --]

2010-07-29  H.J. Lu  <hongjiu.lu@intel.com>

	PR bootstrap/45119
	* tree-ssa-loop-ivopts.c (get_address_cost): Re-apply revision
	162652.  Check address overflow.

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1d65b4a..55aa10c 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -3243,7 +3243,7 @@ get_address_cost (bool symbol_present, bool var_present,
       HOST_WIDE_INT i;
       HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
       HOST_WIDE_INT rat, off;
-      int old_cse_not_expected;
+      int old_cse_not_expected, width;
       unsigned sym_p, var_p, off_p, rat_p, add_c;
       rtx seq, addr, base;
       rtx reg0, reg1;
@@ -3252,8 +3252,10 @@ get_address_cost (bool symbol_present, bool var_present,
 
       reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
 
+      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 2)
+          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 2;
       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = start; i && i <= (HOST_WIDE_INT) 1 << width; i <<= 1)
 	{
 	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
 	  if (!memory_address_addr_space_p (mem_mode, addr, as))
@@ -3262,7 +3264,7 @@ get_address_cost (bool symbol_present, bool var_present,
       data->max_offset = i == start ? 0 : i >> 1;
       off = data->max_offset;
 
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = start; i && i <= (HOST_WIDE_INT) 1 << width; i <<= 1)
 	{
 	  XEXP (addr, 1) = gen_int_mode (-i, address_mode);
 	  if (!memory_address_addr_space_p (mem_mode, addr, as))
@@ -3273,12 +3275,12 @@ get_address_cost (bool symbol_present, bool var_present,
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "get_address_cost:\n");
-	  fprintf (dump_file, "  min offset %s %d\n",
+	  fprintf (dump_file, "  min offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->min_offset);
-	  fprintf (dump_file, "  max offset %s %d\n",
+		   data->min_offset);
+	  fprintf (dump_file, "  max offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->max_offset);
+		   data->max_offset);
 	}
 
       rat = 1;

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-30  7:19                                                                   ` Jakub Jelinek
@ 2010-07-30 16:45                                                                     ` Xinliang David Li
  0 siblings, 0 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-07-30 16:45 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: H.J. Lu, Richard Guenther, Pat Haugen, GCC Patches, Zdenek Dvorak

On Fri, Jul 30, 2010 at 12:10 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Jul 29, 2010 at 09:18:07PM -0700, Xinliang David Li wrote:
>> The width is set to a value so that 1ll<<i is guaranteed to not
>> overflow HOST_WIDE_INT type. THe suffix is needed so that the
>> intermediate value does not get truncated when HOST_WIDE_INT is wider
>> than 32bit. Is there a portable way to represent the integer literal
>> with HOST_WIDE_TYPE?
>
> Sure:
>
> (HOST_WIDE_INT) 1 << i
>

Yes, of course ;)

David
>        Jakub
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-30 15:06                                                   ` H.J. Lu
@ 2010-07-30 16:50                                                     ` Xinliang David Li
  2010-07-30 18:28                                                       ` Bernd Schmidt
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-07-30 16:50 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Pat Haugen, GCC Patches, Zdenek Dvorak

Ok -- the output of the test depends on ivopt and is therefore target
dependent. I have fixed it to be target independent now. Please try
again.

Thanks,

David

On Fri, Jul 30, 2010 at 7:49 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Jul 28, 2010 at 8:50 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Jul 28, 2010 at 6:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Tue, Jul 27, 2010 at 1:20 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> On Tue, Jul 27, 2010 at 12:57 PM, Pat Haugen <pthaugen@us.ibm.com> wrote:
>>>>>>
>>>>>> Thanks Sebastian for testing it out. I also asked Pat to help testing
>>>>>> the patch again on powerpc. I will first split off the unrelated
>>>>>> patches and submit them first (e.g, multiple exit loop handling etc).
>>>>>>
>>>>>
>>>>> There were 2 good improvements on PowerPC, the rest were pretty much a wash
>>>>> (< +/-2%):
>>>>>
>>>>> 410.bwaves      10.0%
>>>>> 434.zeusmp      6.6%
>>>>>
>>>>> One thing I did notice however is that comparing these results to the run I
>>>>> did back in May on an earlier version of the patch is that both
>>>>> improvements dropped. bwaves was 27% on that run and zeusmp was 8.4%. I
>>>>> don't have the old builds around, but could recreate if you're not aware of
>>>>> anything to explain the drop.
>>>>>
>>>>
>>>> Thanks. I will check in this version first and do some triaging on the
>>>> performance drop (with your help).  One thing to be aware is that
>>>> r161844 was checked in during this period of time which might be
>>>> related, but not sure until further investigation -- the two stage
>>>> initial iv set computation introduced by the patch may not be needed
>>>> (if this patch is in).
>>>>
>>>
>>> Your checkin caused:
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45119
>>>
>>
>> This also caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45121
>>
>
> This bug isn't fixed. David, did you verify your change by
> running "make check"?
>
> Thanks.
>
>
> --
> H.J.
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-30 15:56                                                                 ` H.J. Lu
@ 2010-07-30 16:58                                                                   ` Xinliang David Li
  2010-07-30 17:07                                                                     ` Xinliang David Li
  2010-07-30 17:01                                                                   ` Xinliang David Li
  1 sibling, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-07-30 16:58 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Guenther, Pat Haugen, GCC Patches, Zdenek Dvorak

This looks fine to me -- Zdenek or other reviewers --- is this one ok?

Thanks,

David

On Fri, Jul 30, 2010 at 8:45 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Jul 29, 2010 at 6:04 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> It looks strange:
>>
>> +      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 1)
>> +          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 1;
>>       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
>> -      for (i = start; i <= 1 << 20; i <<= 1)
>> +      for (i = 1; i < width; i++)
>>        {
>> -         XEXP (addr, 1) = gen_int_mode (i, address_mode);
>> +          HOST_WIDE_INT offset = (1ll << i);
>> +         XEXP (addr, 1) = gen_int_mode (offset, address_mode);
>>          if (!memory_address_addr_space_p (mem_mode, addr, as))
>>            break;
>>        }
>>
>> HOST_WIDE_INT may be long or long long. "1ll" isn't always correct.
>> I think width can be >= 31. Depending on HOST_WIDE_INT,
>>
>> HOST_WIDE_INT offset = -(1ll << i);
>>
>> may have different values. The whole function looks odd to me.
>>
>>
>
> Here is a different approach to check address overflow.
>
>
> --
> H.J.
> --
> 2010-07-29  H.J. Lu  <hongjiu.lu@intel.com>
>
>        PR bootstrap/45119
>        * tree-ssa-loop-ivopts.c (get_address_cost): Re-apply revision
>        162652.  Check address overflow.
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-30 15:56                                                                 ` H.J. Lu
  2010-07-30 16:58                                                                   ` Xinliang David Li
@ 2010-07-30 17:01                                                                   ` Xinliang David Li
  1 sibling, 0 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-07-30 17:01 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Guenther, Pat Haugen, GCC Patches, Zdenek Dvorak

This looks fine to me -- Zdenek or other reviewers --- is this one ok?

Thanks,

David

On Fri, Jul 30, 2010 at 8:45 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Thu, Jul 29, 2010 at 6:04 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> It looks strange:
>>
>> +      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 1)
>> +          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 1;
>>       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
>> -      for (i = start; i <= 1 << 20; i <<= 1)
>> +      for (i = 1; i < width; i++)
>>        {
>> -         XEXP (addr, 1) = gen_int_mode (i, address_mode);
>> +          HOST_WIDE_INT offset = (1ll << i);
>> +         XEXP (addr, 1) = gen_int_mode (offset, address_mode);
>>          if (!memory_address_addr_space_p (mem_mode, addr, as))
>>            break;
>>        }
>>
>> HOST_WIDE_INT may be long or long long. "1ll" isn't always correct.
>> I think width can be >= 31. Depending on HOST_WIDE_INT,
>>
>> HOST_WIDE_INT offset = -(1ll << i);
>>
>> may have different values. The whole function looks odd to me.
>>
>>
>
> Here is a different approach to check address overflow.
>
>
> --
> H.J.
> --
> 2010-07-29  H.J. Lu  <hongjiu.lu@intel.com>
>
>        PR bootstrap/45119
>        * tree-ssa-loop-ivopts.c (get_address_cost): Re-apply revision
>        162652.  Check address overflow.
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-30 16:58                                                                   ` Xinliang David Li
@ 2010-07-30 17:07                                                                     ` Xinliang David Li
  2010-07-30 17:43                                                                       ` H.J. Lu
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-07-30 17:07 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Guenther, Pat Haugen, GCC Patches, Zdenek Dvorak

There is a problem in this patch -- when i wraps to zero and terminates
the loop, the max offset computed will be zero, which is wrong.

My previous patch won't have this problem.

David

On Fri, Jul 30, 2010 at 9:49 AM, Xinliang David Li <davidxl@google.com> wrote:
> This looks fine to me -- Zdenek or other reviewers --- is this one ok?
>
> Thanks,
>
> David
>
> On Fri, Jul 30, 2010 at 8:45 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Thu, Jul 29, 2010 at 6:04 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> It looks strange:
>>>
>>> +      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 1)
>>> +          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 1;
>>>       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
>>> -      for (i = start; i <= 1 << 20; i <<= 1)
>>> +      for (i = 1; i < width; i++)
>>>        {
>>> -         XEXP (addr, 1) = gen_int_mode (i, address_mode);
>>> +          HOST_WIDE_INT offset = (1ll << i);
>>> +         XEXP (addr, 1) = gen_int_mode (offset, address_mode);
>>>          if (!memory_address_addr_space_p (mem_mode, addr, as))
>>>            break;
>>>        }
>>>
>>> HOST_WIDE_INT may be long or long long. "1ll" isn't always correct.
>>> I think width can be >= 31. Depending on HOST_WIDE_INT,
>>>
>>> HOST_WIDE_INT offset = -(1ll << i);
>>>
>>> may have different values. The whole function looks odd to me.
>>>
>>>
>>
>> Here is a different approach to check address overflow.
>>
>>
>> --
>> H.J.
>> --
>> 2010-07-29  H.J. Lu  <hongjiu.lu@intel.com>
>>
>>        PR bootstrap/45119
>>        * tree-ssa-loop-ivopts.c (get_address_cost): Re-apply revision
>>        162652.  Check address overflow.
>>
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-30 17:07                                                                     ` Xinliang David Li
@ 2010-07-30 17:43                                                                       ` H.J. Lu
  2010-07-30 18:10                                                                         ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: H.J. Lu @ 2010-07-30 17:43 UTC (permalink / raw)
  To: Xinliang David Li
  Cc: Richard Guenther, Pat Haugen, GCC Patches, Zdenek Dvorak

[-- Attachment #1: Type: text/plain, Size: 1986 bytes --]

On Fri, Jul 30, 2010 at 9:58 AM, Xinliang David Li <davidxl@google.com> wrote:
> There is a problem in this patch -- when i wraps to zero and terminate
> the loop, the maxoffset computed will be zero which is wrong.
>
> My previous patch won't have this problem.

Your patch changed the start offset.  Here is the updated patch.


H.J.
>
> David
>
> On Fri, Jul 30, 2010 at 9:49 AM, Xinliang David Li <davidxl@google.com> wrote:
>> This looks fine to me -- Zdenek or other reviewers --- is this one ok?
>>
>> Thanks,
>>
>> David
>>
>> On Fri, Jul 30, 2010 at 8:45 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Thu, Jul 29, 2010 at 6:04 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> It looks strange:
>>>>
>>>> +      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 1)
>>>> +          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 1;
>>>>       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
>>>> -      for (i = start; i <= 1 << 20; i <<= 1)
>>>> +      for (i = 1; i < width; i++)
>>>>        {
>>>> -         XEXP (addr, 1) = gen_int_mode (i, address_mode);
>>>> +          HOST_WIDE_INT offset = (1ll << i);
>>>> +         XEXP (addr, 1) = gen_int_mode (offset, address_mode);
>>>>          if (!memory_address_addr_space_p (mem_mode, addr, as))
>>>>            break;
>>>>        }
>>>>
>>>> HOST_WIDE_INT may be long or long long. "1ll" isn't always correct.
>>>> I think width can be >= 31. Depending on HOST_WIDE_INT,
>>>>
>>>> HOST_WIDE_INT offset = -(1ll << i);
>>>>
>>>> may have different values. The whole function looks odd to me.
>>>>
>>>>
>>>
>>> Here is a different approach to check address overflow.
>>>
>>>
>>> --
>>> H.J.
>>> --
>>> 2010-07-29  H.J. Lu  <hongjiu.lu@intel.com>
>>>
>>>        PR bootstrap/45119
>>>        * tree-ssa-loop-ivopts.c (get_address_cost): Re-apply revision
>>>        162652.  Check address overflow.
>>>
>>
>



-- 
H.J.

[-- Attachment #2: gcc-pr45119-3.patch --]
[-- Type: text/x-csrc, Size: 2483 bytes --]

2010-07-29  H.J. Lu  <hongjiu.lu@intel.com>

	PR bootstrap/45119
	* tree-ssa-loop-ivopts.c (get_address_cost): Re-apply revision
	162652.  Check address overflow.

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1d65b4a..e9016c3 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -3242,8 +3242,8 @@ get_address_cost (bool symbol_present, bool var_present,
     {
       HOST_WIDE_INT i;
       HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
-      HOST_WIDE_INT rat, off;
-      int old_cse_not_expected;
+      HOST_WIDE_INT rat, off, max_off, last_off;
+      int old_cse_not_expected, width;
       unsigned sym_p, var_p, off_p, rat_p, add_c;
       rtx seq, addr, base;
       rtx reg0, reg1;
@@ -3252,33 +3252,40 @@ get_address_cost (bool symbol_present, bool var_present,
 
       reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
 
+      width = (GET_MODE_BITSIZE (address_mode) < HOST_BITS_PER_WIDE_INT - 1)
+          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 1;
+      max_off = (HOST_WIDE_INT) 1 << width;
       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
-      for (i = start; i <= 1 << 20; i <<= 1)
+      last_off = start;
+      for (i = start; i && i <= max_off; i <<= 1)
 	{
 	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
 	  if (!memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
+	  last_off = i;
 	}
-      data->max_offset = i == start ? 0 : i >> 1;
+      data->max_offset = last_off;
       off = data->max_offset;
 
-      for (i = start; i <= 1 << 20; i <<= 1)
+      last_off = -start;
+      for (i = start; i && i <= max_off; i <<= 1)
 	{
 	  XEXP (addr, 1) = gen_int_mode (-i, address_mode);
 	  if (!memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
+	  last_off = -i;
 	}
-      data->min_offset = i == start ? 0 : -(i >> 1);
+      data->min_offset = last_off;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "get_address_cost:\n");
-	  fprintf (dump_file, "  min offset %s %d\n",
+	  fprintf (dump_file, "  min offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->min_offset);
-	  fprintf (dump_file, "  max offset %s %d\n",
+		   data->min_offset);
+	  fprintf (dump_file, "  max offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->max_offset);
+		   data->max_offset);
 	}
 
       rat = 1;

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-30 17:43                                                                       ` H.J. Lu
@ 2010-07-30 18:10                                                                         ` Xinliang David Li
  2010-07-30 18:57                                                                           ` H.J. Lu
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-07-30 18:10 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Guenther, Pat Haugen, GCC Patches, Zdenek Dvorak

Why is the start offset not 1 to begin with? Even assuming it is
correct, there are a couple of problems in this patch:

1) When the precision of HOST_WIDE_INT is the same as the bitsize of
address_mode, max_off = (HOST_WIDE_INT) 1 << width will produce a
negative number.
2) last_off should be initialized to 0 to match the original behavior.
3) The `i &&' guard makes sure the loop terminates, but the offset
computation will be wrong -- i <<= 1 first overflows to a negative
number, then gets truncated to zero; that means when this happens,
last_off will be negative when the loop terminates.

David

On Fri, Jul 30, 2010 at 10:27 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Jul 30, 2010 at 9:58 AM, Xinliang David Li <davidxl@google.com> wrote:
>> There is a problem in this patch -- when i wraps to zero and terminate
>> the loop, the maxoffset computed will be zero which is wrong.
>>
>> My previous patch won't have this problem.
>
> Your patch changed the start offset.  Here is the updated patch.
>
>
> H.J.
>>
>> David
>>
>> On Fri, Jul 30, 2010 at 9:49 AM, Xinliang David Li <davidxl@google.com> wrote:
>>> This looks fine to me -- Zdenek or other reviewers --- is this one ok?
>>>
>>> Thanks,
>>>
>>> David
>>>
>>> On Fri, Jul 30, 2010 at 8:45 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Thu, Jul 29, 2010 at 6:04 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> It looks strange:
>>>>>
>>>>> +      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 1)
>>>>> +          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 1;
>>>>>       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
>>>>> -      for (i = start; i <= 1 << 20; i <<= 1)
>>>>> +      for (i = 1; i < width; i++)
>>>>>        {
>>>>> -         XEXP (addr, 1) = gen_int_mode (i, address_mode);
>>>>> +          HOST_WIDE_INT offset = (1ll << i);
>>>>> +         XEXP (addr, 1) = gen_int_mode (offset, address_mode);
>>>>>          if (!memory_address_addr_space_p (mem_mode, addr, as))
>>>>>            break;
>>>>>        }
>>>>>
>>>>> HOST_WIDE_INT may be long or long long. "1ll" isn't always correct.
>>>>> I think width can be >= 31. Depending on HOST_WIDE_INT,
>>>>>
>>>>> HOST_WIDE_INT offset = -(1ll << i);
>>>>>
>>>>> may have different values. The whole function looks odd to me.
>>>>>
>>>>>
>>>>
>>>> Here is a different approach to check address overflow.
>>>>
>>>>
>>>> --
>>>> H.J.
>>>> --
>>>> 2010-07-29  H.J. Lu  <hongjiu.lu@intel.com>
>>>>
>>>>        PR bootstrap/45119
>>>>        * tree-ssa-loop-ivopts.c (get_address_cost): Re-apply revision
>>>>        162652.  Check address overflow.
>>>>
>>>
>>
>
>
>
> --
> H.J.
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-30 16:50                                                     ` Xinliang David Li
@ 2010-07-30 18:28                                                       ` Bernd Schmidt
  2010-07-30 18:34                                                         ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: Bernd Schmidt @ 2010-07-30 18:28 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: H.J. Lu, Pat Haugen, GCC Patches, Zdenek Dvorak

On 07/30/2010 06:45 PM, Xinliang David Li wrote:
> Ok -- the output of the test depends on ivopt and therefore is target
> dependent. I fixed it to make it target independent now. Please try
> again.

Please post all patches you commit.


Bernd

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-30 18:28                                                       ` Bernd Schmidt
@ 2010-07-30 18:34                                                         ` Xinliang David Li
  0 siblings, 0 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-07-30 18:34 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: H.J. Lu, Pat Haugen, GCC Patches, Zdenek Dvorak

See below.

David


Index: c-c++-common/uninit-17.c
===================================================================
--- c-c++-common/uninit-17.c    (revision 162719)
+++ c-c++-common/uninit-17.c    (revision 162720)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Wuninitialized" } */
+/* { dg-options "-O2 -Wuninitialized -fno-ivopts" } */

 inline int foo(int x)
 {


On Fri, Jul 30, 2010 at 11:06 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 07/30/2010 06:45 PM, Xinliang David Li wrote:
>> Ok -- the output of the test depends on ivopt and therefore is target
>> dependent. I fixed it to make it target independent now. Please try
>> again.
>
> Please post all patches you commit.
>
>
> Bernd
>
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-30 18:10                                                                         ` Xinliang David Li
@ 2010-07-30 18:57                                                                           ` H.J. Lu
  2010-07-30 21:04                                                                             ` H.J. Lu
  0 siblings, 1 reply; 100+ messages in thread
From: H.J. Lu @ 2010-07-30 18:57 UTC (permalink / raw)
  To: Xinliang David Li
  Cc: Richard Guenther, Pat Haugen, GCC Patches, Zdenek Dvorak

[-- Attachment #1: Type: text/plain, Size: 3116 bytes --]

On Fri, Jul 30, 2010 at 10:54 AM, Xinliang David Li <davidxl@google.com> wrote:
> Why is start offset not 1 to begin with? Let's assume it is correct,
> there are a couple of problems in this patch:
>
> 1) when the precision of the HOST_WIDE_INT is the same as the bitsize
> of the address_mode, max_offset = (HOST_WIDE_INT) 1 << width will
> produce a negative number
> 2) last_off should be initialized to 0 to match the original behavior
> 3) The i&& guard will make sure the loop terminates, but the offset
> compuation will be wrong -- i<<1 will first overflows to a negative
> number, then gets truncated to zero,  that means when this happens,
> the last_off will be negative when the loop terminates.
>
> David

I don't know exactly what get_address_cost is supposed to do. Here is
a new patch which avoids overflow and speeds up finding max/min offsets.


H.J.
---
>
> On Fri, Jul 30, 2010 at 10:27 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, Jul 30, 2010 at 9:58 AM, Xinliang David Li <davidxl@google.com> wrote:
>>> There is a problem in this patch -- when i wraps to zero and terminate
>>> the loop, the maxoffset computed will be zero which is wrong.
>>>
>>> My previous patch won't have this problem.
>>
>> Your patch changed the start offset.  Here is the updated patch.
>>
>>
>> H.J.
>>>
>>> David
>>>
>>> On Fri, Jul 30, 2010 at 9:49 AM, Xinliang David Li <davidxl@google.com> wrote:
>>>> This looks fine to me -- Zdenek or other reviewers --- is this one ok?
>>>>
>>>> Thanks,
>>>>
>>>> David
>>>>
>>>> On Fri, Jul 30, 2010 at 8:45 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Thu, Jul 29, 2010 at 6:04 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>> It looks strange:
>>>>>>
>>>>>> +      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 1)
>>>>>> +          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 1;
>>>>>>       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
>>>>>> -      for (i = start; i <= 1 << 20; i <<= 1)
>>>>>> +      for (i = 1; i < width; i++)
>>>>>>        {
>>>>>> -         XEXP (addr, 1) = gen_int_mode (i, address_mode);
>>>>>> +          HOST_WIDE_INT offset = (1ll << i);
>>>>>> +         XEXP (addr, 1) = gen_int_mode (offset, address_mode);
>>>>>>          if (!memory_address_addr_space_p (mem_mode, addr, as))
>>>>>>            break;
>>>>>>        }
>>>>>>
>>>>>> HOST_WIDE_INT may be long or long long. "1ll" isn't always correct.
>>>>>> I think width can be >= 31. Depending on HOST_WIDE_INT,
>>>>>>
>>>>>> HOST_WIDE_INT offset = -(1ll << i);
>>>>>>
>>>>>> may have different values. The whole function looks odd to me.
>>>>>>
>>>>>>
>>>>>
>>>>> Here is a different approach to check address overflow.
>>>>>
>>>>>
>>>>> --
>>>>> H.J.
>>>>> --
>>>>> 2010-07-29  H.J. Lu  <hongjiu.lu@intel.com>
>>>>>
>>>>>        PR bootstrap/45119
>>>>>        * tree-ssa-loop-ivopts.c (get_address_cost): Re-apply revision
>>>>>        162652.  Check address overflow.
>>>>>
>>>>
>>>
>>
>>
>>
>> --
>> H.J.
>>
>



-- 
H.J.

[-- Attachment #2: gcc-pr45119-4.patch --]
[-- Type: text/plain, Size: 2569 bytes --]

2010-07-29  H.J. Lu  <hongjiu.lu@intel.com>

	PR bootstrap/45119
	* tree-ssa-loop-ivopts.c (get_address_cost): Re-apply revision
	162652.  Avoid address overflow.

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1d65b4a..b5ab63e 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -3241,9 +3241,8 @@ get_address_cost (bool symbol_present, bool var_present,
   if (!data)
     {
       HOST_WIDE_INT i;
-      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
       HOST_WIDE_INT rat, off;
-      int old_cse_not_expected;
+      int old_cse_not_expected, width;
       unsigned sym_p, var_p, off_p, rat_p, add_c;
       rtx seq, addr, base;
       rtx reg0, reg1;
@@ -3252,33 +3251,37 @@ get_address_cost (bool symbol_present, bool var_present,
 
       reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
 
+      width = (GET_MODE_BITSIZE (address_mode) < HOST_BITS_PER_WIDE_INT - 1)
+          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 1;
       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
-      for (i = start; i <= 1 << 20; i <<= 1)
+
+      for (i = width; i; i--)
 	{
-	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
-	  if (!memory_address_addr_space_p (mem_mode, addr, as))
+	  off = -((HOST_WIDE_INT) 1 << i);
+	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
+	  if (memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->max_offset = i == start ? 0 : i >> 1;
-      off = data->max_offset;
+      data->min_offset = off;
 
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = width; i; i--)
 	{
-	  XEXP (addr, 1) = gen_int_mode (-i, address_mode);
-	  if (!memory_address_addr_space_p (mem_mode, addr, as))
+	  off = ((HOST_WIDE_INT) 1 << i) - 1;
+	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
+	  if (memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->min_offset = i == start ? 0 : -(i >> 1);
+      data->max_offset = off;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "get_address_cost:\n");
-	  fprintf (dump_file, "  min offset %s %d\n",
+	  fprintf (dump_file, "  min offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->min_offset);
-	  fprintf (dump_file, "  max offset %s %d\n",
+		   data->min_offset);
+	  fprintf (dump_file, "  max offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->max_offset);
+		   data->max_offset);
 	}
 
       rat = 1;

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-30 18:57                                                                           ` H.J. Lu
@ 2010-07-30 21:04                                                                             ` H.J. Lu
  2010-07-30 21:13                                                                               ` Xinliang David Li
  2010-08-02 21:23                                                                               ` Xinliang David Li
  0 siblings, 2 replies; 100+ messages in thread
From: H.J. Lu @ 2010-07-30 21:04 UTC (permalink / raw)
  To: Xinliang David Li
  Cc: Richard Guenther, Pat Haugen, GCC Patches, Zdenek Dvorak

[-- Attachment #1: Type: text/plain, Size: 4230 bytes --]

On Fri, Jul 30, 2010 at 11:46 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Jul 30, 2010 at 10:54 AM, Xinliang David Li <davidxl@google.com> wrote:
>> Why is start offset not 1 to begin with? Let's assume it is correct,
>> there are a couple of problems in this patch:
>>
>> 1) when the precision of the HOST_WIDE_INT is the same as the bitsize
>> of the address_mode, max_offset = (HOST_WIDE_INT) 1 << width will
>> produce a negative number
>> 2) last_off should be initialized to 0 to match the original behavior
>> 3) The i&& guard will make sure the loop terminates, but the offset
>> compuation will be wrong -- i<<1 will first overflows to a negative
>> number, then gets truncated to zero,  that means when this happens,
>> the last_off will be negative when the loop terminates.
>>
>> David
>
> I don't know exactly what get_address_cost is supposed to do. Here is
> a new patch which avoids overflow and speeds up finding max/min offsets.
>


The code is wrong for -m32 on a 64-bit host. We should start with
the maximum and minimum offsets, like:

      width = GET_MODE_BITSIZE (address_mode) - 1;
      if (width > (HOST_BITS_PER_WIDE_INT - 1))
        width = HOST_BITS_PER_WIDE_INT - 1;
      addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);

      for (i = width; i; i--)
        {
          off = -((HOST_WIDE_INT) 1 << i);
          XEXP (addr, 1) = gen_int_mode (off, address_mode);
          if (memory_address_addr_space_p (mem_mode, addr, as))
            break;
        }
      data->min_offset = off;

      for (i = width; i; i--)
        {
          off = ((HOST_WIDE_INT) 1 << i) - 1;
          XEXP (addr, 1) = gen_int_mode (off, address_mode);
          if (memory_address_addr_space_p (mem_mode, addr, as))
            break;
        }
      data->max_offset = off;

Here is the updated patch.


H.J.
---
> H.J.
> ---
>>
>> On Fri, Jul 30, 2010 at 10:27 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Fri, Jul 30, 2010 at 9:58 AM, Xinliang David Li <davidxl@google.com> wrote:
>>>> There is a problem in this patch -- when i wraps to zero and terminate
>>>> the loop, the maxoffset computed will be zero which is wrong.
>>>>
>>>> My previous patch won't have this problem.
>>>
>>> Your patch changed the start offset.  Here is the updated patch.
>>>
>>>
>>> H.J.
>>>>
>>>> David
>>>>
>>>> On Fri, Jul 30, 2010 at 9:49 AM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> This looks fine to me -- Zdenek or other reviewers --- is this one ok?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> David
>>>>>
>>>>> On Fri, Jul 30, 2010 at 8:45 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>> On Thu, Jul 29, 2010 at 6:04 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>> It looks strange:
>>>>>>>
>>>>>>> +      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 1)
>>>>>>> +          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 1;
>>>>>>>       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
>>>>>>> -      for (i = start; i <= 1 << 20; i <<= 1)
>>>>>>> +      for (i = 1; i < width; i++)
>>>>>>>        {
>>>>>>> -         XEXP (addr, 1) = gen_int_mode (i, address_mode);
>>>>>>> +          HOST_WIDE_INT offset = (1ll << i);
>>>>>>> +         XEXP (addr, 1) = gen_int_mode (offset, address_mode);
>>>>>>>          if (!memory_address_addr_space_p (mem_mode, addr, as))
>>>>>>>            break;
>>>>>>>        }
>>>>>>>
>>>>>>> HOST_WIDE_INT may be long or long long. "1ll" isn't always correct.
>>>>>>> I think width can be >= 31. Depending on HOST_WIDE_INT,
>>>>>>>
>>>>>>> HOST_WIDE_INT offset = -(1ll << i);
>>>>>>>
>>>>>>> may have different values. The whole function looks odd to me.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Here is a different approach to check address overflow.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> H.J.
>>>>>> --
>>>>>> 2010-07-29  H.J. Lu  <hongjiu.lu@intel.com>
>>>>>>
>>>>>>        PR bootstrap/45119
>>>>>>        * tree-ssa-loop-ivopts.c (get_address_cost): Re-apply revision
>>>>>>        162652.  Check address overflow.
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> H.J.
>>>
>>
>
>
>
> --
> H.J.
>



-- 
H.J.

[-- Attachment #2: gcc-pr45119-5.patch --]
[-- Type: text/plain, Size: 2555 bytes --]

2010-07-29  H.J. Lu  <hongjiu.lu@intel.com>

	PR bootstrap/45119
	* tree-ssa-loop-ivopts.c (get_address_cost): Re-apply revision
	162652.  Avoid address overflow.

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1d65b4a..b47acc7 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -3241,9 +3241,8 @@ get_address_cost (bool symbol_present, bool var_present,
   if (!data)
     {
       HOST_WIDE_INT i;
-      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
       HOST_WIDE_INT rat, off;
-      int old_cse_not_expected;
+      int old_cse_not_expected, width;
       unsigned sym_p, var_p, off_p, rat_p, add_c;
       rtx seq, addr, base;
       rtx reg0, reg1;
@@ -3252,33 +3251,38 @@ get_address_cost (bool symbol_present, bool var_present,
 
       reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
 
+      width = GET_MODE_BITSIZE (address_mode) - 1;
+      if (width > (HOST_BITS_PER_WIDE_INT - 1))
+	width = HOST_BITS_PER_WIDE_INT - 1;
       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
-      for (i = start; i <= 1 << 20; i <<= 1)
+
+      for (i = width; i; i--)
 	{
-	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
-	  if (!memory_address_addr_space_p (mem_mode, addr, as))
+	  off = -((HOST_WIDE_INT) 1 << i);
+	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
+	  if (memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->max_offset = i == start ? 0 : i >> 1;
-      off = data->max_offset;
+      data->min_offset = off;
 
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = width; i; i--)
 	{
-	  XEXP (addr, 1) = gen_int_mode (-i, address_mode);
-	  if (!memory_address_addr_space_p (mem_mode, addr, as))
+	  off = ((HOST_WIDE_INT) 1 << i) - 1;
+	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
+	  if (memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->min_offset = i == start ? 0 : -(i >> 1);
+      data->max_offset = off;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "get_address_cost:\n");
-	  fprintf (dump_file, "  min offset %s %d\n",
+	  fprintf (dump_file, "  min offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->min_offset);
-	  fprintf (dump_file, "  max offset %s %d\n",
+		   data->min_offset);
+	  fprintf (dump_file, "  max offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->max_offset);
+		   data->max_offset);
 	}
 
       rat = 1;

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-30 21:04                                                                             ` H.J. Lu
@ 2010-07-30 21:13                                                                               ` Xinliang David Li
  2010-08-02 21:23                                                                               ` Xinliang David Li
  1 sibling, 0 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-07-30 21:13 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Richard Guenther, Pat Haugen, GCC Patches, Zdenek Dvorak

This final version looks good to me -- and it is more precise.

Thanks,

David

On Fri, Jul 30, 2010 at 1:43 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Jul 30, 2010 at 11:46 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, Jul 30, 2010 at 10:54 AM, Xinliang David Li <davidxl@google.com> wrote:
>>> Why is start offset not 1 to begin with? Let's assume it is correct,
>>> there are a couple of problems in this patch:
>>>
>>> 1) when the precision of the HOST_WIDE_INT is the same as the bitsize
>>> of the address_mode, max_offset = (HOST_WIDE_INT) 1 << width will
>>> produce a negative number
>>> 2) last_off should be initialized to 0 to match the original behavior
>>> 3) The i&& guard will make sure the loop terminates, but the offset
>>> compuation will be wrong -- i<<1 will first overflows to a negative
>>> number, then gets truncated to zero,  that means when this happens,
>>> the last_off will be negative when the loop terminates.
>>>
>>> David
>>
>> I don't know exactly what get_address_cost is supposed to do. Here is
>> a new patch which avoids overflow and speeds up finding max/min offsets.
>>
>
>
> The code is wrong for -m32 on 64bit host. We should start with
> the maximum and minimum offsets like:
>
>      width = GET_MODE_BITSIZE (address_mode) - 1;
>      if (width > (HOST_BITS_PER_WIDE_INT - 1))
>        width = HOST_BITS_PER_WIDE_INT - 1;
>      addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
>
>      for (i = width; i; i--)
>        {
>          off = -((HOST_WIDE_INT) 1 << i);
>          XEXP (addr, 1) = gen_int_mode (off, address_mode);
>          if (memory_address_addr_space_p (mem_mode, addr, as))
>            break;
>        }
>      data->min_offset = off;
>
>      for (i = width; i; i--)
>        {
>          off = ((HOST_WIDE_INT) 1 << i) - 1;
>          XEXP (addr, 1) = gen_int_mode (off, address_mode);
>          if (memory_address_addr_space_p (mem_mode, addr, as))
>            break;
>        }
>      data->max_offset = off;
>
> Here is the updated patch.
>
>
> H.J.
> ---
>> H.J.
>> ---
>>>
>>> On Fri, Jul 30, 2010 at 10:27 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Fri, Jul 30, 2010 at 9:58 AM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> There is a problem in this patch -- when i wraps to zero and terminate
>>>>> the loop, the maxoffset computed will be zero which is wrong.
>>>>>
>>>>> My previous patch won't have this problem.
>>>>
>>>> Your patch changed the start offset.  Here is the updated patch.
>>>>
>>>>
>>>> H.J.
>>>>>
>>>>> David
>>>>>
>>>>> On Fri, Jul 30, 2010 at 9:49 AM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>> This looks fine to me -- Zdenek or other reviewers --- is this one ok?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On Fri, Jul 30, 2010 at 8:45 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>> On Thu, Jul 29, 2010 at 6:04 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>>> It looks strange:
>>>>>>>>
>>>>>>>> +      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 1)
>>>>>>>> +          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 1;
>>>>>>>>       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
>>>>>>>> -      for (i = start; i <= 1 << 20; i <<= 1)
>>>>>>>> +      for (i = 1; i < width; i++)
>>>>>>>>        {
>>>>>>>> -         XEXP (addr, 1) = gen_int_mode (i, address_mode);
>>>>>>>> +          HOST_WIDE_INT offset = (1ll << i);
>>>>>>>> +         XEXP (addr, 1) = gen_int_mode (offset, address_mode);
>>>>>>>>          if (!memory_address_addr_space_p (mem_mode, addr, as))
>>>>>>>>            break;
>>>>>>>>        }
>>>>>>>>
>>>>>>>> HOST_WIDE_INT may be long or long long. "1ll" isn't always correct.
>>>>>>>> I think width can be >= 31. Depending on HOST_WIDE_INT,
>>>>>>>>
>>>>>>>> HOST_WIDE_INT offset = -(1ll << i);
>>>>>>>>
>>>>>>>> may have different values. The whole function looks odd to me.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Here is a different approach to check address overflow.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> H.J.
>>>>>>> --
>>>>>>> 2010-07-29  H.J. Lu  <hongjiu.lu@intel.com>
>>>>>>>
>>>>>>>        PR bootstrap/45119
>>>>>>>        * tree-ssa-loop-ivopts.c (get_address_cost): Re-apply revision
>>>>>>>        162652.  Check address overflow.
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> H.J.
>>>>
>>>
>>
>>
>>
>> --
>> H.J.
>>
>
>
>
> --
> H.J.
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: IVOPT improvement patch
  2010-07-30 21:04                                                                             ` H.J. Lu
  2010-07-30 21:13                                                                               ` Xinliang David Li
@ 2010-08-02 21:23                                                                               ` Xinliang David Li
  2010-08-09  8:44                                                                                 ` Zdenek Dvorak
  1 sibling, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-08-02 21:23 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Guenther, Pat Haugen, Zdenek Dvorak, H.J. Lu

[-- Attachment #1: Type: text/plain, Size: 4664 bytes --]

Compiler bootstrapped and tested with Lu's patch (with one minor
change to initialize off variable) (x86-64/linux) -- also checked dump
file that offsets are properly computed.

Ok for trunk?

Thanks,

David

On Fri, Jul 30, 2010 at 1:43 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Jul 30, 2010 at 11:46 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Fri, Jul 30, 2010 at 10:54 AM, Xinliang David Li <davidxl@google.com> wrote:
>>> Why is the start offset not 1 to begin with? Assuming it is correct,
>>> there are a couple of problems in this patch:
>>>
>>> 1) when the precision of the HOST_WIDE_INT is the same as the bitsize
>>> of the address_mode, max_offset = (HOST_WIDE_INT) 1 << width will
>>> produce a negative number
>>> 2) last_off should be initialized to 0 to match the original behavior
>>> 3) The i&& guard will make sure the loop terminates, but the offset
>>> computation will be wrong -- i<<1 will first overflow to a negative
>>> number, then get truncated to zero; when this happens,
>>> the last_off will be negative when the loop terminates.
>>>
>>> David
>>
>> I don't know exactly what get_address_cost is supposed to do. Here is
>> a new patch which avoids overflow and speeds up finding max/min offsets.
>>
>
>
> The code is wrong for -m32 on a 64-bit host. We should start with
> the maximum and minimum offsets like:
>
>      width = GET_MODE_BITSIZE (address_mode) - 1;
>      if (width > (HOST_BITS_PER_WIDE_INT - 1))
>        width = HOST_BITS_PER_WIDE_INT - 1;
>      addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
>
>      for (i = width; i; i--)
>        {
>          off = -((HOST_WIDE_INT) 1 << i);
>          XEXP (addr, 1) = gen_int_mode (off, address_mode);
>          if (memory_address_addr_space_p (mem_mode, addr, as))
>            break;
>        }
>      data->min_offset = off;
>
>      for (i = width; i; i--)
>        {
>          off = ((HOST_WIDE_INT) 1 << i) - 1;
>          XEXP (addr, 1) = gen_int_mode (off, address_mode);
>          if (memory_address_addr_space_p (mem_mode, addr, as))
>            break;
>        }
>      data->max_offset = off;
>
> Here is the updated patch.
>
>
> H.J.
> ---
>> H.J.
>> ---
>>>
>>> On Fri, Jul 30, 2010 at 10:27 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Fri, Jul 30, 2010 at 9:58 AM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> There is a problem in this patch -- when i wraps to zero and terminates
>>>>> the loop, the max offset computed will be zero, which is wrong.
>>>>>
>>>>> My previous patch won't have this problem.
>>>>
>>>> Your patch changed the start offset.  Here is the updated patch.
>>>>
>>>>
>>>> H.J.
>>>>>
>>>>> David
>>>>>
>>>>> On Fri, Jul 30, 2010 at 9:49 AM, Xinliang David Li <davidxl@google.com> wrote:
>>>>>> This looks fine to me -- Zdenek or other reviewers --- is this one ok?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> David
>>>>>>
>>>>>> On Fri, Jul 30, 2010 at 8:45 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>> On Thu, Jul 29, 2010 at 6:04 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>>> It looks strange:
>>>>>>>>
>>>>>>>> +      width = (GET_MODE_BITSIZE (address_mode) <  HOST_BITS_PER_WIDE_INT - 1)
>>>>>>>> +          ? GET_MODE_BITSIZE (address_mode) : HOST_BITS_PER_WIDE_INT - 1;
>>>>>>>>       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
>>>>>>>> -      for (i = start; i <= 1 << 20; i <<= 1)
>>>>>>>> +      for (i = 1; i < width; i++)
>>>>>>>>        {
>>>>>>>> -         XEXP (addr, 1) = gen_int_mode (i, address_mode);
>>>>>>>> +          HOST_WIDE_INT offset = (1ll << i);
>>>>>>>> +         XEXP (addr, 1) = gen_int_mode (offset, address_mode);
>>>>>>>>          if (!memory_address_addr_space_p (mem_mode, addr, as))
>>>>>>>>            break;
>>>>>>>>        }
>>>>>>>>
>>>>>>>> HOST_WIDE_INT may be long or long long. "1ll" isn't always correct.
>>>>>>>> I think width can be >= 31. Depending on HOST_WIDE_INT,
>>>>>>>>
>>>>>>>> HOST_WIDE_INT offset = -(1ll << i);
>>>>>>>>
>>>>>>>> may have different values. The whole function looks odd to me.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Here is a different approach to check address overflow.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> H.J.
>>>>>>> --
>>>>>>> 2010-07-29  H.J. Lu  <hongjiu.lu@intel.com>
>>>>>>>
>>>>>>>        PR bootstrap/45119
>>>>>>>        * tree-ssa-loop-ivopts.c (get_address_cost): Re-apply revision
>>>>>>>        162652.  Check address overflow.
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> H.J.
>>>>
>>>
>>
>>
>>
>> --
>> H.J.
>>
>
>
>
> --
> H.J.
>

[-- Attachment #2: address_offset2.p --]
[-- Type: text/x-pascal, Size: 2426 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 162821)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -3241,9 +3241,8 @@ get_address_cost (bool symbol_present, b
   if (!data)
     {
       HOST_WIDE_INT i;
-      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
-      HOST_WIDE_INT rat, off;
-      int old_cse_not_expected;
+      HOST_WIDE_INT rat, off = 0;
+      int old_cse_not_expected, width;
       unsigned sym_p, var_p, off_p, rat_p, add_c;
       rtx seq, addr, base;
       rtx reg0, reg1;
@@ -3252,33 +3251,38 @@ get_address_cost (bool symbol_present, b
 
       reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
 
+      width = GET_MODE_BITSIZE (address_mode) - 1;
+      if (width > (HOST_BITS_PER_WIDE_INT - 1))
+	width = HOST_BITS_PER_WIDE_INT - 1;
       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
-      for (i = start; i <= 1 << 20; i <<= 1)
+
+      for (i = width; i; i--)
 	{
-	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
-	  if (!memory_address_addr_space_p (mem_mode, addr, as))
+	  off = -((HOST_WIDE_INT) 1 << i);
+	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
+	  if (memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->max_offset = i == start ? 0 : i >> 1;
-      off = data->max_offset;
+      data->min_offset = off;
 
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = width; i; i--)
 	{
-	  XEXP (addr, 1) = gen_int_mode (-i, address_mode);
-	  if (!memory_address_addr_space_p (mem_mode, addr, as))
+	  off = ((HOST_WIDE_INT) 1 << i) - 1;
+	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
+	  if (memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->min_offset = i == start ? 0 : -(i >> 1);
+      data->max_offset = off;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "get_address_cost:\n");
-	  fprintf (dump_file, "  min offset %s %d\n",
+	  fprintf (dump_file, "  min offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->min_offset);
-	  fprintf (dump_file, "  max offset %s %d\n",
+		   data->min_offset);
+	  fprintf (dump_file, "  max offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->max_offset);
+		   data->max_offset);
 	}
 
       rat = 1;


* Re: IVOPT improvement patch
  2010-08-02 21:23                                                                               ` Xinliang David Li
@ 2010-08-09  8:44                                                                                 ` Zdenek Dvorak
  2010-08-09 23:07                                                                                   ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: Zdenek Dvorak @ 2010-08-09  8:44 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches, Richard Guenther, Pat Haugen, H.J. Lu

Hi,

> Compiler bootstrapped and tested with Lu's patch (with one minor
> change to initialize off variable) (x86-64/linux) -- also checked dump
> file that offsets are properly computed.

in case no offsets are allowed (or, more hypothetically, if only offsets of
+1 or -1 are allowed), the code below will set min_offset to -2 and max_offset
to +2, thus incorrectly extending the range of allowed offsets.

Zdenek

>        reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
>  
> +      width = GET_MODE_BITSIZE (address_mode) - 1;
> +      if (width > (HOST_BITS_PER_WIDE_INT - 1))
> +	width = HOST_BITS_PER_WIDE_INT - 1;
>        addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
> -      for (i = start; i <= 1 << 20; i <<= 1)
> +
> +      for (i = width; i; i--)
>  	{
> -	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
> -	  if (!memory_address_addr_space_p (mem_mode, addr, as))
> +	  off = -((HOST_WIDE_INT) 1 << i);
> +	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
> +	  if (memory_address_addr_space_p (mem_mode, addr, as))
>  	    break;
>  	}
> -      data->max_offset = i == start ? 0 : i >> 1;
> -      off = data->max_offset;
> +      data->min_offset = off;
>  
> -      for (i = start; i <= 1 << 20; i <<= 1)
> +      for (i = width; i; i--)
>  	{
> -	  XEXP (addr, 1) = gen_int_mode (-i, address_mode);
> -	  if (!memory_address_addr_space_p (mem_mode, addr, as))
> +	  off = ((HOST_WIDE_INT) 1 << i) - 1;
> +	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
> +	  if (memory_address_addr_space_p (mem_mode, addr, as))
>  	    break;
>  	}
> -      data->min_offset = i == start ? 0 : -(i >> 1);
> +      data->max_offset = off;
>  
>        if (dump_file && (dump_flags & TDF_DETAILS))
>  	{
>  	  fprintf (dump_file, "get_address_cost:\n");
> -	  fprintf (dump_file, "  min offset %s %d\n",
> +	  fprintf (dump_file, "  min offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
>  		   GET_MODE_NAME (mem_mode),
> -		   (int) data->min_offset);
> -	  fprintf (dump_file, "  max offset %s %d\n",
> +		   data->min_offset);
> +	  fprintf (dump_file, "  max offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
>  		   GET_MODE_NAME (mem_mode),
> -		   (int) data->max_offset);
> +		   data->max_offset);
>  	}
>  
>        rat = 1;


* Re: IVOPT improvement patch
  2010-08-09  8:44                                                                                 ` Zdenek Dvorak
@ 2010-08-09 23:07                                                                                   ` Xinliang David Li
  2010-08-10  2:37                                                                                     ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-08-09 23:07 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: GCC Patches, Richard Guenther, Pat Haugen, H.J. Lu

[-- Attachment #1: Type: text/plain, Size: 2754 bytes --]

You are right. The attached is the revised version.  Ok this time
(after testing is done)?

Thanks,

David

On Mon, Aug 9, 2010 at 12:55 AM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
> Hi,
>
>> Compiler bootstrapped and tested with Lu's patch (with one minor
>> change to initialize off variable) (x86-64/linux) -- also checked dump
>> file that offsets are properly computed.
>
> in case no offsets are allowed (or, more hypothetically, if only offsets of
> +1 or -1 are allowed), the code below will set min_offset to -2 and max_offset
> to +2, thus incorrectly extending the range of allowed offsets.
>
> Zdenek
>
>>        reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
>>
>> +      width = GET_MODE_BITSIZE (address_mode) - 1;
>> +      if (width > (HOST_BITS_PER_WIDE_INT - 1))
>> +     width = HOST_BITS_PER_WIDE_INT - 1;
>>        addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
>> -      for (i = start; i <= 1 << 20; i <<= 1)
>> +
>> +      for (i = width; i; i--)
>>       {
>> -       XEXP (addr, 1) = gen_int_mode (i, address_mode);
>> -       if (!memory_address_addr_space_p (mem_mode, addr, as))
>> +       off = -((HOST_WIDE_INT) 1 << i);
>> +       XEXP (addr, 1) = gen_int_mode (off, address_mode);
>> +       if (memory_address_addr_space_p (mem_mode, addr, as))
>>           break;
>>       }
>> -      data->max_offset = i == start ? 0 : i >> 1;
>> -      off = data->max_offset;
>> +      data->min_offset = off;
>>
>> -      for (i = start; i <= 1 << 20; i <<= 1)
>> +      for (i = width; i; i--)
>>       {
>> -       XEXP (addr, 1) = gen_int_mode (-i, address_mode);
>> -       if (!memory_address_addr_space_p (mem_mode, addr, as))
>> +       off = ((HOST_WIDE_INT) 1 << i) - 1;
>> +       XEXP (addr, 1) = gen_int_mode (off, address_mode);
>> +       if (memory_address_addr_space_p (mem_mode, addr, as))
>>           break;
>>       }
>> -      data->min_offset = i == start ? 0 : -(i >> 1);
>> +      data->max_offset = off;
>>
>>        if (dump_file && (dump_flags & TDF_DETAILS))
>>       {
>>         fprintf (dump_file, "get_address_cost:\n");
>> -       fprintf (dump_file, "  min offset %s %d\n",
>> +       fprintf (dump_file, "  min offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
>>                  GET_MODE_NAME (mem_mode),
>> -                (int) data->min_offset);
>> -       fprintf (dump_file, "  max offset %s %d\n",
>> +                data->min_offset);
>> +       fprintf (dump_file, "  max offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
>>                  GET_MODE_NAME (mem_mode),
>> -                (int) data->max_offset);
>> +                data->max_offset);
>>       }
>>
>>        rat = 1;
>
>

[-- Attachment #2: address_offset3.p --]
[-- Type: text/x-pascal, Size: 2431 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 162822)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -3241,9 +3241,8 @@ get_address_cost (bool symbol_present, b
   if (!data)
     {
       HOST_WIDE_INT i;
-      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
       HOST_WIDE_INT rat, off;
-      int old_cse_not_expected;
+      int old_cse_not_expected, width;
       unsigned sym_p, var_p, off_p, rat_p, add_c;
       rtx seq, addr, base;
       rtx reg0, reg1;
@@ -3252,33 +3251,38 @@ get_address_cost (bool symbol_present, b
 
       reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
 
+      width = GET_MODE_BITSIZE (address_mode) - 1;
+      if (width > (HOST_BITS_PER_WIDE_INT - 1))
+	width = HOST_BITS_PER_WIDE_INT - 1;
       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
-      for (i = start; i <= 1 << 20; i <<= 1)
+
+      for (i = width; i >= 0; i--)
 	{
-	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
-	  if (!memory_address_addr_space_p (mem_mode, addr, as))
+	  off = -((HOST_WIDE_INT) 1 << i);
+	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
+	  if (memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->max_offset = i == start ? 0 : i >> 1;
-      off = data->max_offset;
+      data->min_offset = (i == -1? 0 : off);
 
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = width; i >= 0; i--)
 	{
-	  XEXP (addr, 1) = gen_int_mode (-i, address_mode);
-	  if (!memory_address_addr_space_p (mem_mode, addr, as))
+	  off = ((HOST_WIDE_INT) 1 << i) - 1;
+	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
+	  if (memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->min_offset = i == start ? 0 : -(i >> 1);
+      data->max_offset = (i == -1? 0 : off);
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "get_address_cost:\n");
-	  fprintf (dump_file, "  min offset %s %d\n",
+	  fprintf (dump_file, "  min offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->min_offset);
-	  fprintf (dump_file, "  max offset %s %d\n",
+		   data->min_offset);
+	  fprintf (dump_file, "  max offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->max_offset);
+		   data->max_offset);
 	}
 
       rat = 1;


* Re: IVOPT improvement patch
  2010-08-09 23:07                                                                                   ` Xinliang David Li
@ 2010-08-10  2:37                                                                                     ` Xinliang David Li
  2010-08-10 13:13                                                                                       ` Zdenek Dvorak
  2010-08-10 13:35                                                                                       ` H.J. Lu
  0 siblings, 2 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-08-10  2:37 UTC (permalink / raw)
  To: Zdenek Dvorak; +Cc: GCC Patches, Richard Guenther, Pat Haugen, H.J. Lu

[-- Attachment #1: Type: text/plain, Size: 2973 bytes --]

Wrong patch in the last email. Here is the one.

David

On Mon, Aug 9, 2010 at 3:54 PM, Xinliang David Li <davidxl@google.com> wrote:
> You are right. The attached is the revised version.  Ok this time
> (after testing is done)?
>
> Thanks,
>
> David
>
> On Mon, Aug 9, 2010 at 12:55 AM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
>> Hi,
>>
>>> Compiler bootstrapped and tested with Lu's patch (with one minor
>>> change to initialize off variable) (x86-64/linux) -- also checked dump
>>> file that offsets are properly computed.
>>
>> in case no offsets are allowed (or, more hypothetically, if only offsets of
>> +1 or -1 are allowed), the code below will set min_offset to -2 and max_offset
>> to +2, thus incorrectly extending the range of allowed offsets.
>>
>> Zdenek
>>
>>>        reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
>>>
>>> +      width = GET_MODE_BITSIZE (address_mode) - 1;
>>> +      if (width > (HOST_BITS_PER_WIDE_INT - 1))
>>> +     width = HOST_BITS_PER_WIDE_INT - 1;
>>>        addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
>>> -      for (i = start; i <= 1 << 20; i <<= 1)
>>> +
>>> +      for (i = width; i; i--)
>>>       {
>>> -       XEXP (addr, 1) = gen_int_mode (i, address_mode);
>>> -       if (!memory_address_addr_space_p (mem_mode, addr, as))
>>> +       off = -((HOST_WIDE_INT) 1 << i);
>>> +       XEXP (addr, 1) = gen_int_mode (off, address_mode);
>>> +       if (memory_address_addr_space_p (mem_mode, addr, as))
>>>           break;
>>>       }
>>> -      data->max_offset = i == start ? 0 : i >> 1;
>>> -      off = data->max_offset;
>>> +      data->min_offset = off;
>>>
>>> -      for (i = start; i <= 1 << 20; i <<= 1)
>>> +      for (i = width; i; i--)
>>>       {
>>> -       XEXP (addr, 1) = gen_int_mode (-i, address_mode);
>>> -       if (!memory_address_addr_space_p (mem_mode, addr, as))
>>> +       off = ((HOST_WIDE_INT) 1 << i) - 1;
>>> +       XEXP (addr, 1) = gen_int_mode (off, address_mode);
>>> +       if (memory_address_addr_space_p (mem_mode, addr, as))
>>>           break;
>>>       }
>>> -      data->min_offset = i == start ? 0 : -(i >> 1);
>>> +      data->max_offset = off;
>>>
>>>        if (dump_file && (dump_flags & TDF_DETAILS))
>>>       {
>>>         fprintf (dump_file, "get_address_cost:\n");
>>> -       fprintf (dump_file, "  min offset %s %d\n",
>>> +       fprintf (dump_file, "  min offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
>>>                  GET_MODE_NAME (mem_mode),
>>> -                (int) data->min_offset);
>>> -       fprintf (dump_file, "  max offset %s %d\n",
>>> +                data->min_offset);
>>> +       fprintf (dump_file, "  max offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
>>>                  GET_MODE_NAME (mem_mode),
>>> -                (int) data->max_offset);
>>> +                data->max_offset);
>>>       }
>>>
>>>        rat = 1;
>>
>>
>

[-- Attachment #2: address_offset3.p --]
[-- Type: text/x-pascal, Size: 2500 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 162822)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -3241,9 +3241,8 @@ get_address_cost (bool symbol_present, b
   if (!data)
     {
       HOST_WIDE_INT i;
-      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
-      HOST_WIDE_INT rat, off;
-      int old_cse_not_expected;
+      HOST_WIDE_INT rat, off = 0;
+      int old_cse_not_expected, width;
       unsigned sym_p, var_p, off_p, rat_p, add_c;
       rtx seq, addr, base;
       rtx reg0, reg1;
@@ -3252,33 +3251,39 @@ get_address_cost (bool symbol_present, b
 
       reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
 
+      width = GET_MODE_BITSIZE (address_mode) - 1;
+      if (width > (HOST_BITS_PER_WIDE_INT - 1))
+	width = HOST_BITS_PER_WIDE_INT - 1;
       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
-      for (i = start; i <= 1 << 20; i <<= 1)
+
+      for (i = width; i >= 0; i--)
 	{
-	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
-	  if (!memory_address_addr_space_p (mem_mode, addr, as))
+	  off = -((HOST_WIDE_INT) 1 << i);
+	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
+	  if (memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->max_offset = i == start ? 0 : i >> 1;
-      off = data->max_offset;
+      data->min_offset = (i == -1? 0 : off);
 
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = width; i >= 0; i--)
 	{
-	  XEXP (addr, 1) = gen_int_mode (-i, address_mode);
-	  if (!memory_address_addr_space_p (mem_mode, addr, as))
+	  off = ((HOST_WIDE_INT) 1 << i) - 1;
+	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
+	  if (memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->min_offset = i == start ? 0 : -(i >> 1);
+      data->max_offset = (i == -1? 0 : off);
+      offset = data->max_offset;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "get_address_cost:\n");
-	  fprintf (dump_file, "  min offset %s %d\n",
+	  fprintf (dump_file, "  min offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->min_offset);
-	  fprintf (dump_file, "  max offset %s %d\n",
+		   data->min_offset);
+	  fprintf (dump_file, "  max offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->max_offset);
+		   data->max_offset);
 	}
 
       rat = 1;


* Re: IVOPT improvement patch
  2010-08-10  2:37                                                                                     ` Xinliang David Li
@ 2010-08-10 13:13                                                                                       ` Zdenek Dvorak
  2010-08-10 13:35                                                                                       ` H.J. Lu
  1 sibling, 0 replies; 100+ messages in thread
From: Zdenek Dvorak @ 2010-08-10 13:13 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: GCC Patches, Richard Guenther, Pat Haugen, H.J. Lu

Hi,

> Wrong patch in the last email. Here is the one.
> 
> David
> 
> On Mon, Aug 9, 2010 at 3:54 PM, Xinliang David Li <davidxl@google.com> wrote:
> > You are right. The attached is the revised version.  Ok this time
> > (after testing is done)?

OK,

Zdenek


* Re: IVOPT improvement patch
  2010-08-10  2:37                                                                                     ` Xinliang David Li
  2010-08-10 13:13                                                                                       ` Zdenek Dvorak
@ 2010-08-10 13:35                                                                                       ` H.J. Lu
  2010-08-10 14:18                                                                                         ` H.J. Lu
  1 sibling, 1 reply; 100+ messages in thread
From: H.J. Lu @ 2010-08-10 13:35 UTC (permalink / raw)
  To: Xinliang David Li
  Cc: Zdenek Dvorak, GCC Patches, Richard Guenther, Pat Haugen

On Mon, Aug 9, 2010 at 4:47 PM, Xinliang David Li <davidxl@google.com> wrote:
> Wrong patch in the last email. Here is the one.
>

You changed the code from setting "off" to setting "offset":

-      data->min_offset = i == start ? 0 : -(i >> 1);
+      data->max_offset = (i == -1? 0 : off);
+      offset = data->max_offset;

"off" is used later:

3345               if (off_p)
3346                 base = gen_rtx_fmt_e (CONST, address_mode,
3347                                       gen_rtx_fmt_ee
3348                                         (PLUS, address_mode, base,
3349                                          gen_int_mode (off,
address_mode)))     ;
3350             }
3351           else if (off_p)
3352             base = gen_int_mode (off, address_mode);
3353           else

You can just add

off = 0;

before the loop. Then you can use

data->min_offset = off;
data->max_offset = off;

after the loop. It is faster.


-- 
H.J.


* Re: IVOPT improvement patch
  2010-08-10 13:35                                                                                       ` H.J. Lu
@ 2010-08-10 14:18                                                                                         ` H.J. Lu
  2010-08-10 16:31                                                                                           ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: H.J. Lu @ 2010-08-10 14:18 UTC (permalink / raw)
  To: Xinliang David Li
  Cc: Zdenek Dvorak, GCC Patches, Richard Guenther, Pat Haugen

On Tue, Aug 10, 2010 at 6:16 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Aug 9, 2010 at 4:47 PM, Xinliang David Li <davidxl@google.com> wrote:
>> Wrong patch in the last email. Here is the one.
>>
>
> You changed the code from setting "off" to setting "offset":
>
> -      data->min_offset = i == start ? 0 : -(i >> 1);
> +      data->max_offset = (i == -1? 0 : off);
> +      offset = data->max_offset;
>
> "off" is used later:
>
> 3345               if (off_p)
> 3346                 base = gen_rtx_fmt_e (CONST, address_mode,
> 3347                                       gen_rtx_fmt_ee
> 3348                                         (PLUS, address_mode, base,
> 3349                                          gen_int_mode (off,
> address_mode)))     ;
> 3350             }
> 3351           else if (off_p)
> 3352             base = gen_int_mode (off, address_mode);
> 3353           else
>
> You can just add
>
> off = 0;
>
> before the loop. Then you can use
>
> data->min_offset = off;
> data->max_offset = off;
>
> after the loop. It is faster.
>

Never mind this comment. But "off" is different from before.



-- 
H.J.


* Re: IVOPT improvement patch
  2010-08-10 14:18                                                                                         ` H.J. Lu
@ 2010-08-10 16:31                                                                                           ` Xinliang David Li
  2010-08-10 16:38                                                                                             ` H.J. Lu
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-08-10 16:31 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Zdenek Dvorak, GCC Patches, Richard Guenther, Pat Haugen

Yes -- fixed the typo. Will retest and then commit.

Thanks,

David

On Tue, Aug 10, 2010 at 7:12 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Aug 10, 2010 at 6:16 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Mon, Aug 9, 2010 at 4:47 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> Wrong patch in the last email. Here is the one.
>>>
>>
>> You changed the code from setting "off" to setting "offset":
>>
>> -      data->min_offset = i == start ? 0 : -(i >> 1);
>> +      data->max_offset = (i == -1? 0 : off);
>> +      offset = data->max_offset;
>>
>> "off" is used later:
>>
>> 3345               if (off_p)
>> 3346                 base = gen_rtx_fmt_e (CONST, address_mode,
>> 3347                                       gen_rtx_fmt_ee
>> 3348                                         (PLUS, address_mode, base,
>> 3349                                          gen_int_mode (off,
>> address_mode)))     ;
>> 3350             }
>> 3351           else if (off_p)
>> 3352             base = gen_int_mode (off, address_mode);
>> 3353           else
>>
>> You can just add
>>
>> off = 0;
>>
>> before the loop. Then you can use
>>
>> data->min_offset = off;
>> data->max_offset = off;
>>
>> after the loop. It is faster.
>>
>
> Never mind this comment. But "off" is different from before.
>
>
>
> --
> H.J.
>


* Re: IVOPT improvement patch
  2010-08-10 16:31                                                                                           ` Xinliang David Li
@ 2010-08-10 16:38                                                                                             ` H.J. Lu
  2010-08-10 17:13                                                                                               ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: H.J. Lu @ 2010-08-10 16:38 UTC (permalink / raw)
  To: Xinliang David Li
  Cc: Zdenek Dvorak, GCC Patches, Richard Guenther, Pat Haugen

On Tue, Aug 10, 2010 at 9:27 AM, Xinliang David Li <davidxl@google.com> wrote:
> Yes -- fixed the typo. Will retest and then commit.
>

Can you post your patch?

Thanks.


> Thanks,
>
> David
>
> On Tue, Aug 10, 2010 at 7:12 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Aug 10, 2010 at 6:16 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Mon, Aug 9, 2010 at 4:47 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> Wrong patch in the last email. Here is the one.
>>>>
>>>
>>> You changed the code from setting "off" to setting "offset":
>>>
>>> -      data->min_offset = i == start ? 0 : -(i >> 1);
>>> +      data->max_offset = (i == -1? 0 : off);
>>> +      offset = data->max_offset;
>>>
>>> "off" is used later:
>>>
>>> 3345               if (off_p)
>>> 3346                 base = gen_rtx_fmt_e (CONST, address_mode,
>>> 3347                                       gen_rtx_fmt_ee
>>> 3348                                         (PLUS, address_mode, base,
>>> 3349                                          gen_int_mode (off,
>>> address_mode)))     ;
>>> 3350             }
>>> 3351           else if (off_p)
>>> 3352             base = gen_int_mode (off, address_mode);
>>> 3353           else
>>>
>>> You can just add
>>>
>>> off = 0;
>>>
>>> before the loop. Then you can use
>>>
>>> data->min_offset = off;
>>> data->max_offset = off;
>>>
>>> after the loop. It is faster.
>>>
>>
>> Never mind this comment. But "off" is different from before.
>>
>>
>>
>> --
>> H.J.
>>
>



-- 
H.J.


* Re: IVOPT improvement patch
  2010-08-10 16:38                                                                                             ` H.J. Lu
@ 2010-08-10 17:13                                                                                               ` Xinliang David Li
  2010-08-10 17:26                                                                                                 ` H.J. Lu
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-08-10 17:13 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Zdenek Dvorak, GCC Patches, Richard Guenther, Pat Haugen

[-- Attachment #1: Type: text/plain, Size: 1733 bytes --]

see attached.

David

On Tue, Aug 10, 2010 at 9:36 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Aug 10, 2010 at 9:27 AM, Xinliang David Li <davidxl@google.com> wrote:
>> Yes -- fixed the typo. Will retest and then commit.
>>
>
> Can you post your patch?
>
> Thanks.
>
>
>> Thanks,
>>
>> David
>>
>> On Tue, Aug 10, 2010 at 7:12 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Tue, Aug 10, 2010 at 6:16 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Mon, Aug 9, 2010 at 4:47 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>>> Wrong patch in the last email. Here is the one.
>>>>>
>>>>
>>>> You changed the code from setting "off" to setting "offset":
>>>>
>>>> -      data->min_offset = i == start ? 0 : -(i >> 1);
>>>> +      data->max_offset = (i == -1? 0 : off);
>>>> +      offset = data->max_offset;
>>>>
>>>> "off" is used later:
>>>>
>>>> 3345               if (off_p)
>>>> 3346                 base = gen_rtx_fmt_e (CONST, address_mode,
>>>> 3347                                       gen_rtx_fmt_ee
>>>> 3348                                         (PLUS, address_mode, base,
>>>> 3349                                          gen_int_mode (off,
>>>> address_mode)))     ;
>>>> 3350             }
>>>> 3351           else if (off_p)
>>>> 3352             base = gen_int_mode (off, address_mode);
>>>> 3353           else
>>>>
>>>> You can just add
>>>>
>>>> off = 0;
>>>>
>>>> before the loop. Then you can use
>>>>
>>>> data->min_offset = off;
>>>> data->max_offset = off;
>>>>
>>>> after the loop. It is faster.
>>>>
>>>
>>> Never mind this comment. But "off" is different from before.
>>>
>>>
>>>
>>> --
>>> H.J.
>>>
>>
>
>
>
> --
> H.J.
>

[-- Attachment #2: address_offset4.p --]
[-- Type: text/x-pascal, Size: 2497 bytes --]

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 162822)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -3241,9 +3241,8 @@ get_address_cost (bool symbol_present, b
   if (!data)
     {
       HOST_WIDE_INT i;
-      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
-      HOST_WIDE_INT rat, off;
-      int old_cse_not_expected;
+      HOST_WIDE_INT rat, off = 0;
+      int old_cse_not_expected, width;
       unsigned sym_p, var_p, off_p, rat_p, add_c;
       rtx seq, addr, base;
       rtx reg0, reg1;
@@ -3252,33 +3251,39 @@ get_address_cost (bool symbol_present, b
 
       reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
 
+      width = GET_MODE_BITSIZE (address_mode) - 1;
+      if (width > (HOST_BITS_PER_WIDE_INT - 1))
+	width = HOST_BITS_PER_WIDE_INT - 1;
       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
-      for (i = start; i <= 1 << 20; i <<= 1)
+
+      for (i = width; i >= 0; i--)
 	{
-	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
-	  if (!memory_address_addr_space_p (mem_mode, addr, as))
+	  off = -((HOST_WIDE_INT) 1 << i);
+	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
+	  if (memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->max_offset = i == start ? 0 : i >> 1;
-      off = data->max_offset;
+      data->min_offset = (i == -1? 0 : off);
 
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = width; i >= 0; i--)
 	{
-	  XEXP (addr, 1) = gen_int_mode (-i, address_mode);
-	  if (!memory_address_addr_space_p (mem_mode, addr, as))
+	  off = ((HOST_WIDE_INT) 1 << i) - 1;
+	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
+	  if (memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->min_offset = i == start ? 0 : -(i >> 1);
+      data->max_offset = (i == -1? 0 : off);
+      off = data->max_offset;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "get_address_cost:\n");
-	  fprintf (dump_file, "  min offset %s %d\n",
+	  fprintf (dump_file, "  min offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->min_offset);
-	  fprintf (dump_file, "  max offset %s %d\n",
+		   data->min_offset);
+	  fprintf (dump_file, "  max offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->max_offset);
+		   data->max_offset);
 	}
 
       rat = 1;


* Re: IVOPT improvement patch
  2010-08-10 17:13                                                                                               ` Xinliang David Li
@ 2010-08-10 17:26                                                                                                 ` H.J. Lu
  2010-08-10 17:42                                                                                                   ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: H.J. Lu @ 2010-08-10 17:26 UTC (permalink / raw)
  To: Xinliang David Li
  Cc: Zdenek Dvorak, GCC Patches, Richard Guenther, Pat Haugen

On Tue, Aug 10, 2010 at 10:09 AM, Xinliang David Li <davidxl@google.com> wrote:
> see attached.
>
> David
>

You have

+      data->max_offset = (i == -1? 0 : off);
+      off = data->max_offset;

if (i == -1)
  off = 0;
data->max_offset = off;

may avoid one memory access.

-- 
H.J.


* Re: IVOPT improvement patch
  2010-08-10 17:26                                                                                                 ` H.J. Lu
@ 2010-08-10 17:42                                                                                                   ` Xinliang David Li
  2010-08-11  0:45                                                                                                     ` Xinliang David Li
  0 siblings, 1 reply; 100+ messages in thread
From: Xinliang David Li @ 2010-08-10 17:42 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Zdenek Dvorak, GCC Patches, Richard Guenther, Pat Haugen

[-- Attachment #1: Type: text/plain, Size: 459 bytes --]

Ok, if you insist on perfection :)

David

On Tue, Aug 10, 2010 at 10:13 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Tue, Aug 10, 2010 at 10:09 AM, Xinliang David Li <davidxl@google.com> wrote:
>> see attached.
>>
>> David
>>
>
> You have
>
> +      data->max_offset = (i == -1? 0 : off);
> +      off = data->max_offset;
>
> if (i == -1)
>  off = 0;
> data->max_offset = off;
>
> may avoid one memory access.
>
> --
> H.J.
>

[-- Attachment #2: address_offset4.p --]
[-- Type: text/x-pascal, Size: 2477 bytes --]

Index: tree-ssa-loop-ivopts.c
===================================================================
--- tree-ssa-loop-ivopts.c	(revision 162822)
+++ tree-ssa-loop-ivopts.c	(working copy)
@@ -3241,9 +3241,8 @@ get_address_cost (bool symbol_present, b
   if (!data)
     {
       HOST_WIDE_INT i;
-      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
-      HOST_WIDE_INT rat, off;
-      int old_cse_not_expected;
+      HOST_WIDE_INT rat, off = 0;
+      int old_cse_not_expected, width;
       unsigned sym_p, var_p, off_p, rat_p, add_c;
       rtx seq, addr, base;
       rtx reg0, reg1;
@@ -3252,33 +3251,40 @@ get_address_cost (bool symbol_present, b
 
       reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
 
+      width = GET_MODE_BITSIZE (address_mode) - 1;
+      if (width > (HOST_BITS_PER_WIDE_INT - 1))
+	width = HOST_BITS_PER_WIDE_INT - 1;
       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
-      for (i = start; i <= 1 << 20; i <<= 1)
+
+      for (i = width; i >= 0; i--)
 	{
-	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
-	  if (!memory_address_addr_space_p (mem_mode, addr, as))
+	  off = -((HOST_WIDE_INT) 1 << i);
+	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
+	  if (memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->max_offset = i == start ? 0 : i >> 1;
-      off = data->max_offset;
+      data->min_offset = (i == -1? 0 : off);
 
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = width; i >= 0; i--)
 	{
-	  XEXP (addr, 1) = gen_int_mode (-i, address_mode);
-	  if (!memory_address_addr_space_p (mem_mode, addr, as))
+	  off = ((HOST_WIDE_INT) 1 << i) - 1;
+	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
+	  if (memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->min_offset = i == start ? 0 : -(i >> 1);
+      if (i == -1)
+        off = 0;
+      data->max_offset = off;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "get_address_cost:\n");
-	  fprintf (dump_file, "  min offset %s %d\n",
+	  fprintf (dump_file, "  min offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->min_offset);
-	  fprintf (dump_file, "  max offset %s %d\n",
+		   data->min_offset);
+	  fprintf (dump_file, "  max offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->max_offset);
+		   data->max_offset);
 	}
 
       rat = 1;


* Re: IVOPT improvement patch
  2010-08-10 17:42                                                                                                   ` Xinliang David Li
@ 2010-08-11  0:45                                                                                                     ` Xinliang David Li
  0 siblings, 0 replies; 100+ messages in thread
From: Xinliang David Li @ 2010-08-11  0:45 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Zdenek Dvorak, GCC Patches, Richard Guenther, Pat Haugen

[-- Attachment #1: Type: text/plain, Size: 660 bytes --]

The committed patch also includes a fix to a test case (to make it more robust).

David

On Tue, Aug 10, 2010 at 10:26 AM, Xinliang David Li <davidxl@google.com> wrote:
> Ok, if you insist on perfection :)
>
> David
>
> On Tue, Aug 10, 2010 at 10:13 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Tue, Aug 10, 2010 at 10:09 AM, Xinliang David Li <davidxl@google.com> wrote:
>>> see attached.
>>>
>>> David
>>>
>>
>> You have
>>
>> +      data->max_offset = (i == -1? 0 : off);
>> +      off = data->max_offset;
>>
>> if (i == -1)
>>  off = 0;
>> data->max_offset = off;
>>
>> may avoid one memory access.
>>
>> --
>> H.J.
>>
>

[-- Attachment #2: address_offset5.p --]
[-- Type: text/x-pascal, Size: 3883 bytes --]

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog	(revision 163079)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2010-08-10  Xinliang David Li  <davidxl@google.com>
+
+	* tree-ssa-loop-ivopts.c (get_address_cost): Properly
+	compute max/min offset in address.
+
 2010-08-10  Nathan Froyd  <froydnj@codesourcery.com>
 
 	* coverage.c (ctr_labels): Delete.
Index: gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/loop-19.c	(revision 163079)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-19.c	(working copy)
@@ -6,7 +6,7 @@
 
 /* { dg-do compile { target { i?86-*-* || { x86_64-*-* || powerpc_hard_double } } } } */
 /* { dg-require-effective-target nonpic } */
-/* { dg-options "-O3 -fdump-tree-optimized" } */
+/* { dg-options "-O3 -fno-prefetch-loop-arrays -fdump-tree-optimized" } */
 
 # define N      2000000
 static double   a[N],c[N];
Index: gcc/testsuite/ChangeLog
===================================================================
--- gcc/testsuite/ChangeLog	(revision 163079)
+++ gcc/testsuite/ChangeLog	(working copy)
@@ -1,3 +1,7 @@
+2010-08-10  xinliang David Li  <davidxl@google.com>
+	* gcc.dg/tree-ssa/loop-19.c: Add option
+	-fno-prefetch-loop-array
+
 2010-08-10  Bernd Schmidt  <bernds@codesourcery.com>
 
 	PR middle-end/45182
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 163079)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -3241,9 +3241,8 @@ get_address_cost (bool symbol_present, b
   if (!data)
     {
       HOST_WIDE_INT i;
-      HOST_WIDE_INT start = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
-      HOST_WIDE_INT rat, off;
-      int old_cse_not_expected;
+      HOST_WIDE_INT rat, off = 0;
+      int old_cse_not_expected, width;
       unsigned sym_p, var_p, off_p, rat_p, add_c;
       rtx seq, addr, base;
       rtx reg0, reg1;
@@ -3252,33 +3251,40 @@ get_address_cost (bool symbol_present, b
 
       reg1 = gen_raw_REG (address_mode, LAST_VIRTUAL_REGISTER + 1);
 
+      width = GET_MODE_BITSIZE (address_mode) - 1;
+      if (width > (HOST_BITS_PER_WIDE_INT - 1))
+	width = HOST_BITS_PER_WIDE_INT - 1;
       addr = gen_rtx_fmt_ee (PLUS, address_mode, reg1, NULL_RTX);
-      for (i = start; i <= 1 << 20; i <<= 1)
+
+      for (i = width; i >= 0; i--)
 	{
-	  XEXP (addr, 1) = gen_int_mode (i, address_mode);
-	  if (!memory_address_addr_space_p (mem_mode, addr, as))
+	  off = -((HOST_WIDE_INT) 1 << i);
+	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
+	  if (memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->max_offset = i == start ? 0 : i >> 1;
-      off = data->max_offset;
+      data->min_offset = (i == -1? 0 : off);
 
-      for (i = start; i <= 1 << 20; i <<= 1)
+      for (i = width; i >= 0; i--)
 	{
-	  XEXP (addr, 1) = gen_int_mode (-i, address_mode);
-	  if (!memory_address_addr_space_p (mem_mode, addr, as))
+	  off = ((HOST_WIDE_INT) 1 << i) - 1;
+	  XEXP (addr, 1) = gen_int_mode (off, address_mode);
+	  if (memory_address_addr_space_p (mem_mode, addr, as))
 	    break;
 	}
-      data->min_offset = i == start ? 0 : -(i >> 1);
+      if (i == -1)
+        off = 0;
+      data->max_offset = off;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
 	  fprintf (dump_file, "get_address_cost:\n");
-	  fprintf (dump_file, "  min offset %s %d\n",
+	  fprintf (dump_file, "  min offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->min_offset);
-	  fprintf (dump_file, "  max offset %s %d\n",
+		   data->min_offset);
+	  fprintf (dump_file, "  max offset %s " HOST_WIDE_INT_PRINT_DEC "\n",
 		   GET_MODE_NAME (mem_mode),
-		   (int) data->max_offset);
+		   data->max_offset);
 	}
 
       rat = 1;


* Re: IVOPT improvement patch
  2010-07-29 14:17                                                   ` H.J. Lu
  2010-07-29 17:00                                                     ` Xinliang David Li
@ 2010-10-28 19:28                                                     ` H.J. Lu
  2011-04-27 13:23                                                     ` H.J. Lu
  2 siblings, 0 replies; 100+ messages in thread
From: H.J. Lu @ 2010-10-28 19:28 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Pat Haugen, GCC Patches, Zdenek Dvorak

On Thu, Jul 29, 2010 at 7:14 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Jul 28, 2010 at 8:50 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Jul 28, 2010 at 6:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Tue, Jul 27, 2010 at 1:20 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> On Tue, Jul 27, 2010 at 12:57 PM, Pat Haugen <pthaugen@us.ibm.com> wrote:
>>>>>>
>>>>>> Thanks Sebastian for testing it out. I also asked Pat to help testing
>>>>>> the patch again on powerpc. I will first split off the unrelated
>>>>>> patches and submit them first (e.g, multiple exit loop handling etc).
>>>>>>
>>>>>
>>>>> There were 2 good improvements on PowerPC, the rest were pretty much a wash
>>>>> (< +/-2%):
>>>>>
>>>>> 410.bwaves      10.0%
>>>>> 434.zeusmp      6.6%
>>>>>
>>>>> One thing I did notice however is that comparing these results to the run I
>>>>> did back in May on an earlier version of the patch is that both
>>>>> improvements dropped. bwaves was 27% on that run and zeusmp was 8.4%. I
>>>>> don't have the old builds around, but could recreate if you're not aware of
>>>>> anything to explain the drop.
>>>>>
>>>>
>>>> Thanks. I will check in this version first and do some triaging on the
>>>> performance drop (with your help).  One thing to be aware is that
>>>> r161844 was checked in during this period of time which might be
>>>> related, but not sure until further investigation -- the two stage
>>>> initial iv set computation introduced by the patch may not be needed
>>>> (if this patch is in).
>>>>
>>>
>>> Your checkin caused:
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45119
>>>
>>
>> This also caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45121
>>
>
> This may also cause:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45131
>

This also caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46200

-- 
H.J.


* Re: IVOPT improvement patch
  2010-05-30  0:22                                 ` Xinliang David Li
       [not found]                                   ` <20100604105451.GB5105@kam.mff.cuni.cz>
@ 2010-12-30 17:23                                   ` H.J. Lu
  1 sibling, 0 replies; 100+ messages in thread
From: H.J. Lu @ 2010-12-30 17:23 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Zdenek Dvorak, GCC Patches

On Sat, May 29, 2010 at 3:20 PM, Xinliang David Li <davidxl@google.com> wrote:
> patch-1 ok for this revision?
>
> David
>
> On Sat, May 29, 2010 at 12:14 PM, Zdenek Dvorak <rakdver@kam.mff.cuni.cz> wrote:
>> Hi,
>>
>>> >> > Also, note that for loops with one exit, your change actually makes the test weaker.
>>> >> > For instance, before your change, we could deduce that
>>> >> >
>>> >> > int a[100];
>>> >> > for (i = 0; i < n; i++)
>>> >> >  a[i] = i;
>>> >> >
>>> >> > iterates at most 100 times.
>>> >>
>>> >> Fixed and added two test cases.
>>> >>
>>> >> (Note -- one more bug in the original code was found and fixed -- the
>>> >> period computation is wrong when step is not power of 2).
>>> >
>>> > that is wrong, the original computation is correct.  If step is (e.g.) odd,
>>> > then it takes (range of type) iterations before the variable achieves the same
>>> > value (that it overflows in the meantime several times does not matter, since
>>> > we are careful to use the type in that overflow has defined semantics, and
>>> > we test for equality in the replacement condition),
>>>
>>> The overflow semantics is indeed different -- it is also true for any
>>> iv cand with non zero base. The period is really LCM (type_range,
>>> step)/step - 1 --- the computation in original code matches this --
>>> but the comment seems wrong.
>>
>> yes, the comment needs to be fixed,
>>
>> Zdenek
>>
>

This patch caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47028


-- 
H.J.


* Re: IVOPT improvement patch
  2010-07-29 14:17                                                   ` H.J. Lu
  2010-07-29 17:00                                                     ` Xinliang David Li
  2010-10-28 19:28                                                     ` H.J. Lu
@ 2011-04-27 13:23                                                     ` H.J. Lu
  2 siblings, 0 replies; 100+ messages in thread
From: H.J. Lu @ 2011-04-27 13:23 UTC (permalink / raw)
  To: Xinliang David Li; +Cc: Pat Haugen, GCC Patches, Zdenek Dvorak

On Thu, Jul 29, 2010 at 7:14 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Jul 28, 2010 at 8:50 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Wed, Jul 28, 2010 at 6:07 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Tue, Jul 27, 2010 at 1:20 PM, Xinliang David Li <davidxl@google.com> wrote:
>>>> On Tue, Jul 27, 2010 at 12:57 PM, Pat Haugen <pthaugen@us.ibm.com> wrote:
>>>>>>
>>>>>> Thanks Sebastian for testing it out. I also asked Pat to help testing
>>>>>> the patch again on powerpc. I will first split off the unrelated
>>>>>> patches and submit them first (e.g, multiple exit loop handling etc).
>>>>>>
>>>>>
>>>>> There were 2 good improvements on PowerPC, the rest were pretty much a wash
>>>>> (< +/-2%):
>>>>>
>>>>> 410.bwaves      10.0%
>>>>> 434.zeusmp      6.6%
>>>>>
>>>>> One thing I did notice however is that comparing these results to the run I
>>>>> did back in May on an earlier version of the patch is that both
>>>>> improvements dropped. bwaves was 27% on that run and zeusmp was 8.4%. I
>>>>> don't have the old builds around, but could recreate if you're not aware of
>>>>> anything to explain the drop.
>>>>>
>>>>
>>>> Thanks. I will check in this version first and do some triaging on the
>>>> performance drop (with your help).  One thing to be aware is that
>>>> r161844 was checked in during this period of time which might be
>>>> related, but not sure until further investigation -- the two stage
>>>> initial iv set computation introduced by the patch may not be needed
>>>> (if this patch is in).
>>>>
>>>
>>> Your checkin caused:
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45119
>>>
>>
>> This also caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45121
>>
>
> This may also cause:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45131
>

This also caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48774

-- 
H.J.


end of thread, other threads:[~2011-04-27 12:44 UTC | newest]

Thread overview: 100+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-11  6:35 IVOPT improvement patch Xinliang David Li
2010-05-11  7:18 ` Zdenek Dvorak
2010-05-11 17:29   ` Xinliang David Li
2010-05-25  0:17     ` Xinliang David Li
2010-05-25 10:46       ` Zdenek Dvorak
2010-05-25 17:39         ` Xinliang David Li
2010-05-25 18:25           ` Zdenek Dvorak
2010-05-25 23:30             ` Xinliang David Li
2010-05-26  2:35               ` Zdenek Dvorak
2010-05-26  3:17                 ` Xinliang David Li
2010-05-27  1:31                 ` Xinliang David Li
2010-05-27  9:12                   ` Zdenek Dvorak
2010-05-27 17:33                     ` Xinliang David Li
2010-05-28  9:14                       ` Zdenek Dvorak
2010-05-28 23:51                         ` Xinliang David Li
2010-05-29 16:57                           ` Zdenek Dvorak
2010-05-29 19:51                             ` Xinliang David Li
2010-05-29 20:18                               ` Zdenek Dvorak
2010-05-30  0:22                                 ` Xinliang David Li
     [not found]                                   ` <20100604105451.GB5105@kam.mff.cuni.cz>
2010-07-21  7:27                                     ` Xinliang David Li
2010-07-26 16:33                                       ` Sebastian Pop
2010-07-26 16:43                                         ` Xinliang David Li
2010-07-27 20:04                                           ` Pat Haugen
2010-07-27 20:25                                             ` Xinliang David Li
2010-07-29  3:50                                               ` H.J. Lu
2010-07-29  5:57                                                 ` H.J. Lu
2010-07-29  7:44                                                   ` Xinliang David Li
2010-07-29  8:28                                                     ` Zdenek Dvorak
2010-07-29 14:37                                                       ` H.J. Lu
2010-07-29 15:27                                                     ` H.J. Lu
2010-07-29 16:09                                                       ` H.J. Lu
2010-07-29 16:17                                                         ` Richard Guenther
2010-07-29 16:55                                                           ` H.J. Lu
2010-07-30  1:04                                                             ` Xinliang David Li
2010-07-30  2:06                                                               ` H.J. Lu
2010-07-30  5:41                                                                 ` Xinliang David Li
2010-07-30  7:19                                                                   ` Jakub Jelinek
2010-07-30 16:45                                                                     ` Xinliang David Li
2010-07-30 15:56                                                                 ` H.J. Lu
2010-07-30 16:58                                                                   ` Xinliang David Li
2010-07-30 17:07                                                                     ` Xinliang David Li
2010-07-30 17:43                                                                       ` H.J. Lu
2010-07-30 18:10                                                                         ` Xinliang David Li
2010-07-30 18:57                                                                           ` H.J. Lu
2010-07-30 21:04                                                                             ` H.J. Lu
2010-07-30 21:13                                                                               ` Xinliang David Li
2010-08-02 21:23                                                                               ` Xinliang David Li
2010-08-09  8:44                                                                                 ` Zdenek Dvorak
2010-08-09 23:07                                                                                   ` Xinliang David Li
2010-08-10  2:37                                                                                     ` Xinliang David Li
2010-08-10 13:13                                                                                       ` Zdenek Dvorak
2010-08-10 13:35                                                                                       ` H.J. Lu
2010-08-10 14:18                                                                                         ` H.J. Lu
2010-08-10 16:31                                                                                           ` Xinliang David Li
2010-08-10 16:38                                                                                             ` H.J. Lu
2010-08-10 17:13                                                                                               ` Xinliang David Li
2010-08-10 17:26                                                                                                 ` H.J. Lu
2010-08-10 17:42                                                                                                   ` Xinliang David Li
2010-08-11  0:45                                                                                                     ` Xinliang David Li
2010-07-30 17:01                                                                   ` Xinliang David Li
2010-07-29 16:11                                                       ` H.J. Lu
2010-07-29 14:17                                                   ` H.J. Lu
2010-07-29 17:00                                                     ` Xinliang David Li
2010-07-29 17:10                                                       ` H.J. Lu
2010-10-28 19:28                                                     ` H.J. Lu
2011-04-27 13:23                                                     ` H.J. Lu
2010-07-30 15:06                                                   ` H.J. Lu
2010-07-30 16:50                                                     ` Xinliang David Li
2010-07-30 18:28                                                       ` Bernd Schmidt
2010-07-30 18:34                                                         ` Xinliang David Li
2010-07-29  7:26                                                 ` Xinliang David Li
2010-12-30 17:23                                   ` H.J. Lu
2010-05-25 18:10       ` Toon Moene
2010-05-27  9:28       ` Zdenek Dvorak
2010-05-27 17:51         ` Xinliang David Li
2010-05-27 22:48           ` Zdenek Dvorak
2010-05-27 23:41             ` Xinliang David Li
2010-05-28  9:57       ` Zdenek Dvorak
2010-06-01 23:13         ` Xinliang David Li
2010-06-02 20:57           ` Zdenek Dvorak
2010-06-03  5:39             ` Xinliang David Li
2010-06-05  9:01       ` Zdenek Dvorak
2010-06-05 22:37         ` Xinliang David Li
2010-05-11  7:26 ` Steven Bosscher
2010-05-11 17:23   ` Xinliang David Li
2010-05-11  8:34 ` Richard Guenther
2010-05-11  9:48   ` Jan Hubicka
2010-05-11 10:04     ` Steven Bosscher
2010-05-11 14:24   ` Peter Bergner
2010-05-11 17:28   ` Xinliang David Li
2010-05-12  8:55     ` Richard Guenther
2010-05-11 17:19 ` Toon Moene
2010-05-11 17:49   ` Xinliang David Li
2010-05-11 21:52     ` Toon Moene
2010-05-11 22:31       ` Xinliang David Li
2010-05-11 22:44         ` Toon Moene
2010-05-13 13:00 ` Toon Moene
2010-05-13 13:30   ` Toon Moene
2010-05-13 16:23     ` Xinliang David Li
2010-05-14  4:26     ` Xinliang David Li
