public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/3] IVOPTS: support profiling
@ 2016-04-29 11:58 marxin
  2016-04-29 11:58 ` [PATCH 3/3] Enhance dumps of IVOPTS marxin
                   ` (3 more replies)
  0 siblings, 4 replies; 34+ messages in thread
From: marxin @ 2016-04-29 11:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: amker.cheng

Hello.

As profile-guided optimization can provide very useful information
about basic block frequencies within a loop, the following patch set
leverages that information. It speeds up a single benchmark from the
upcoming SPECv6 suite by 20% (-O2 -fprofile-generate/-fprofile-use) and
I think it can also improve others (currently measuring numbers for PGO).

The idea is quite simple: each cost (belonging to a BB) is
multiplied by (bb_frequency / header_frequency), which suppresses IV uses
in basic blocks with a low frequency.
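
To make the scaling concrete, here is a minimal standalone sketch. It is
not code from the patch set: the helper name scale_cost_by_frequency, the
plain int types, and the "non-positive header frequency means no profile"
fallback are all assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical helper (not part of the patch): scale the cost of an IV use
// by how often its basic block executes relative to the loop header.
inline int
scale_cost_by_frequency (int cost, int bb_frequency, int header_frequency)
{
  assert (bb_frequency >= 0);

  // Assumption: a non-positive header frequency means no usable profile,
  // so the cost is left unscaled.
  if (header_frequency <= 0)
    return cost;

  // Multiply first, in a wider type, so small ratios are not truncated
  // to zero before the multiplication happens.
  return (int) ((long long) cost * bb_frequency / header_frequency);
}
```

With these assumptions, a use costing 40 in a block that runs on one of
every ten iterations (frequency 100 against a header frequency of 1000)
would be charged 4, while a use in the header itself keeps its full cost.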

The patch set bootstraps on ppc64le-linux-gnu (and also
x86_64-linux-gnu) and introduces no new regressions.

Ready for trunk?
Thanks,
Martin

marxin (3):
  Encapsulate comp_cost within a class with methods.
  Add profiling support for IVOPTS
  Enhance dumps of IVOPTS

 gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C |   2 +-
 gcc/tree-ssa-loop-ivopts.c               | 690 ++++++++++++++++++++++---------
 2 files changed, 491 insertions(+), 201 deletions(-)

-- 
2.8.1

* [PATCH 3/3] Enhance dumps of IVOPTS
  2016-04-29 11:58 [PATCH 0/3] IVOPTS: support profiling marxin
@ 2016-04-29 11:58 ` marxin
  2016-05-06  9:19   ` Martin Liška
  2016-04-29 11:58 ` [PATCH 1/3] Encapsulate comp_cost within a class with methods marxin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 34+ messages in thread
From: marxin @ 2016-04-29 11:58 UTC (permalink / raw)
  To: gcc-patches

gcc/ChangeLog:

2016-04-25  Martin Liska  <mliska@suse.cz>

	* tree-ssa-loop-ivopts.c (struct ivopts_data): Add inv_expr_map.
	(tree_ssa_iv_optimize_init): Initialize it.
	(get_expr_id): Assign expressions to the map.
	(iv_ca_dump): Dump invariant expressions.
	(create_new_ivs): Dump # of inv. expressions and loop niter.
	(tree_ssa_iv_optimize_finalize): Release the newly added map.

gcc/testsuite/ChangeLog:

2016-04-29  Martin Liska  <mliska@suse.cz>

	* g++.dg/tree-ssa/ivopts-3.C: Change test-case to follow
	the new format of dump output.
---
 gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C |  2 +-
 gcc/tree-ssa-loop-ivopts.c               | 28 ++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+), 1 deletion(-)
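
For readers unfamiliar with the id-keyed lookup that inv_expr_map adds, the
following is a rough standalone sketch of the same pattern. It uses
std::vector in place of GCC's vec, and the names inv_expr_entry,
inv_expr_index, record and lookup are made up for the illustration; the
real entries hold a tree rather than a string.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for iv_inv_expr_ent (assumption: simplified to a string).
struct inv_expr_entry
{
  std::string expr;
  unsigned id;
};

// Stand-in for the inv_expr_map member: entries are indexed by their id,
// so the dump code can go from an id back to the expression directly.
struct inv_expr_index
{
  std::vector<const inv_expr_entry *> slots;

  // Grow the vector on demand, then store the entry at its own id.
  void record (const inv_expr_entry *entry)
  {
    assert (entry != nullptr);
    if (entry->id >= slots.size ())
      slots.resize (entry->id + 1, nullptr);
    slots[entry->id] = entry;
  }

  // Return the entry for ID, or nullptr if none was recorded.
  const inv_expr_entry *lookup (unsigned id) const
  {
    return id < slots.size () ? slots[id] : nullptr;
  }
};
```

The hash table remains the owner of the entries in the patch; the vector
only mirrors the pointers, which is why the finalizer merely releases it.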

diff --git a/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C b/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
index 6194e9d..eb72581 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
@@ -72,4 +72,4 @@ int main ( int , char** ) {
 
 // Verify that on x86_64 and i?86 we use a single IV for the innermost loop
 
-// { dg-final { scan-tree-dump "Selected IV set for loop \[0-9\]* at \[^ \]*:64, 1 IVs" "ivopts" { target x86_64-*-* i?86-*-* } } }
+// { dg-final { scan-tree-dump "Selected IV set for loop \[0-9\]* at \[^ \]*:64, 3 avg niters, 1 expressions, 1 IVs" "ivopts" { target x86_64-*-* i?86-*-* } } }
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index af00ff0..52c8184 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -666,6 +666,9 @@ struct ivopts_data
   /* The maximum invariant expression id.  */
   int max_inv_expr_id;
 
+  /* Dictionary of inv_expr with id used as a key.  */
+  vec<iv_inv_expr_ent *> inv_expr_map;
+
   /* The bitmap of indices in version_info whose value was changed.  */
   bitmap relevant;
 
@@ -1186,6 +1189,7 @@ tree_ssa_iv_optimize_init (struct ivopts_data *data)
   data->important_candidates = BITMAP_ALLOC (NULL);
   data->max_inv_id = 0;
   data->niters = NULL;
+  data->inv_expr_map.create (20);
   data->vgroups.create (20);
   data->vcands.create (20);
   data->inv_expr_tab = new hash_table<iv_inv_expr_hasher> (10);
@@ -4812,6 +4816,12 @@ get_expr_id (struct ivopts_data *data, tree expr)
   (*slot)->expr = expr;
   (*slot)->hash = ent.hash;
   (*slot)->id = data->max_inv_expr_id++;
+
+  unsigned id = (*slot)->id;
+  if (id + 1 >= data->inv_expr_map.length ())
+    data->inv_expr_map.safe_grow (id + 1);
+  data->inv_expr_map[id] = *slot;
+
   return (*slot)->id;
 }
 
@@ -6590,6 +6600,20 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
 	fprintf (file, "%s%d", pref, i);
 	pref = ", ";
       }
+
+  if (ivs->num_used_inv_expr)
+    {
+      fprintf (file, "\n  used invariant expressions:\n");
+      for (int i = 0; i <= data->max_inv_expr_id; i++)
+	if (ivs->used_inv_expr[i])
+	  {
+	    fprintf (file, "   inv_expr:%d: \t", i);
+	    print_generic_expr (file, data->inv_expr_map[i]->expr,
+				TDF_SLIM);
+	    fprintf (file, "\n");
+	  }
+    }
+
   fprintf (file, "\n\n");
 }
 
@@ -7251,6 +7275,9 @@ create_new_ivs (struct ivopts_data *data, struct iv_ca *set)
       if (data->loop_loc != UNKNOWN_LOCATION)
 	fprintf (dump_file, " at %s:%d", LOCATION_FILE (data->loop_loc),
 		 LOCATION_LINE (data->loop_loc));
+      fprintf (dump_file, ", %lu avg niters",
+	       avg_loop_niter (data->current_loop));
+      fprintf (dump_file, ", %u expressions", set->num_used_inv_expr);
       fprintf (dump_file, ", %lu IVs:\n", bitmap_count_bits (set->cands));
       EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
         {
@@ -7820,6 +7847,7 @@ tree_ssa_iv_optimize_finalize (struct ivopts_data *data)
   BITMAP_FREE (data->important_candidates);
 
   decl_rtl_to_reset.release ();
+  data->inv_expr_map.release ();
   data->vgroups.release ();
   data->vcands.release ();
   delete data->inv_expr_tab;
-- 
2.8.1

* [PATCH 1/3] Encapsulate comp_cost within a class with methods.
  2016-04-29 11:58 [PATCH 0/3] IVOPTS: support profiling marxin
  2016-04-29 11:58 ` [PATCH 3/3] Enhance dumps of IVOPTS marxin
@ 2016-04-29 11:58 ` marxin
  2016-05-16 10:14   ` Bin.Cheng
  2016-04-29 11:58 ` [PATCH 2/3] Add profiling support for IVOPTS marxin
  2016-05-03  9:28 ` [PATCH 0/3] IVOPTS: support profiling Bin.Cheng
  3 siblings, 1 reply; 34+ messages in thread
From: marxin @ 2016-04-29 11:58 UTC (permalink / raw)
  To: gcc-patches

gcc/ChangeLog:

2016-04-25  Martin Liska  <mliska@suse.cz>

	* tree-ssa-loop-ivopts.c (comp_cost::operator=): New function.
	(comp_cost::infinite_cost_p): Likewise.
	(operator+): Likewise.
	(comp_cost::operator+=): Likewise.
	(comp_cost::operator-=): Likewise.
	(comp_cost::operator/=): Likewise.
	(comp_cost::operator*=): Likewise.
	(operator-): Likewise.
	(operator<): Likewise.
	(operator==): Likewise.
	(operator<=): Likewise.
	(comp_cost::get_cost): Likewise.
	(comp_cost::set_cost): Likewise.
	(comp_cost::get_complexity): Likewise.
	(comp_cost::set_complexity): Likewise.
	(comp_cost::get_scratch): Likewise.
	(comp_cost::set_scratch): Likewise.
	(comp_cost::get_infinite): Likewise.
	(comp_cost::get_no_cost): Likewise.
	(struct ivopts_data): Rename inv_expr_id to max_inv_expr_id.
	(tree_ssa_iv_optimize_init): Use the renamed property.
	(new_cost): Remove.
	(infinite_cost_p): Likewise.
	(add_costs): Likewise.
	(sub_costs): Likewise.
	(compare_costs): Likewise.
	(set_group_iv_cost): Use comp_cost::infinite_cost_p.
	(get_address_cost): Use new comp_cost::comp_cost.
	(get_shiftadd_cost): Likewise.
	(force_expr_to_var_cost): Use new comp_cost::get_no_cost.
	(split_address_cost): Likewise.
	(ptr_difference_cost): Likewise.
	(difference_cost): Likewise.
	(get_expr_id): Use max_inv_expr_id.
	(get_computation_cost_at): Use comp_cost::get_infinite.
	(determine_group_iv_cost_generic): Use comp_cost::get_no_cost.
	(determine_group_iv_cost_address): Likewise.
	(determine_group_iv_cost_cond): Use comp_cost::infinite_cost_p.
	(autoinc_possible_for_pair): Likewise.
	(determine_group_iv_costs): Use new methods of comp_cost.
	(determine_iv_cost): Likewise.
	(cheaper_cost_pair): Use comp_cost operators.
	(iv_ca_recount_cost): Likewise.
	(iv_ca_set_no_cp): Likewise.
	(iv_ca_set_cp): Likewise.
	(iv_ca_cost): Use comp_cost::get_infinite.
	(iv_ca_new): Use comp_cost::get_no_cost.
	(iv_ca_dump): Use new methods of comp_cost.
	(iv_ca_narrow): Use operators of comp_cost.
	(iv_ca_prune): Likewise.
	(iv_ca_replace): Likewise.
	(try_add_cand_for): Likewise.
	(try_improve_iv_set): Likewise.
	(find_optimal_iv_set): Use new methods of comp_cost.
	(free_loop_data): Use renamed max_inv_expr_id.
---
 gcc/tree-ssa-loop-ivopts.c | 548 +++++++++++++++++++++++++++++----------------
 1 file changed, 352 insertions(+), 196 deletions(-)
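
As a reading aid, the semantics the new operators are meant to preserve can
be sketched in isolation. The snippet below is a simplified stand-in, not
the class from the patch: cost_sketch, SKETCH_INFTY and infinite_p are
illustrative names, and the sentinel value is an assumption.

```cpp
#include <cassert>

// Assumption: illustrative sentinel standing in for INFTY.
static const int SKETCH_INFTY = 10000000;

// Simplified stand-in for comp_cost, without the scratch field.
struct cost_sketch
{
  int cost;
  unsigned complexity;

  bool infinite_p () const { return cost == SKETCH_INFTY; }
};

// Addition saturates: anything plus an infinite cost stays infinite.
inline cost_sketch
operator+ (cost_sketch a, cost_sketch b)
{
  if (a.infinite_p () || b.infinite_p ())
    return cost_sketch {SKETCH_INFTY, (unsigned) SKETCH_INFTY};

  a.cost += b.cost;
  a.complexity += b.complexity;
  return a;
}

// Ordering is lexicographic: runtime cost first, complexity as tie-break.
inline bool
operator< (cost_sketch a, cost_sketch b)
{
  if (a.cost == b.cost)
    return a.complexity < b.complexity;
  return a.cost < b.cost;
}

inline bool
operator== (cost_sketch a, cost_sketch b)
{
  return a.cost == b.cost && a.complexity == b.complexity;
}

inline bool
operator<= (cost_sketch a, cost_sketch b)
{
  return a < b || a == b;
}
```

Under this reading, the old test compare_costs (a, b) <= 0 and the new
a <= b select the same candidate, which is the behavior the conversion in
determine_group_iv_cost_cond relies on.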

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 9314363..1e68927 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -173,16 +173,236 @@ enum use_type
 /* Cost of a computation.  */
 struct comp_cost
 {
-  int cost;		/* The runtime cost.  */
-  unsigned complexity;	/* The estimate of the complexity of the code for
-			   the computation (in no concrete units --
-			   complexity field should be larger for more
-			   complex expressions and addressing modes).  */
-  int scratch;		/* Scratch used during cost computation.  */
+  comp_cost (): m_cost (0), m_complexity (0), m_scratch (0)
+  {}
+
+  comp_cost (int cost, unsigned complexity)
+    : m_cost (cost), m_complexity (complexity), m_scratch (0)
+  {}
+
+  comp_cost& operator= (const comp_cost& other);
+
+  /* Returns true if COST is infinite.  */
+  bool infinite_cost_p ();
+
+  /* Adds costs COST1 and COST2.  */
+  friend comp_cost operator+ (comp_cost cost1, comp_cost cost2);
+
+  /* Adds COST to the comp_cost.  */
+  comp_cost operator+= (comp_cost cost);
+
+  /* Adds constant C to this comp_cost.  */
+  comp_cost operator+= (HOST_WIDE_INT c);
+
+  /* Subtracts constant C from this comp_cost.  */
+  comp_cost operator-= (HOST_WIDE_INT c);
+
+  /* Divide the comp_cost by constant C.  */
+  comp_cost operator/= (HOST_WIDE_INT c);
+
+  /* Multiply the comp_cost by constant C.  */
+  comp_cost operator*= (HOST_WIDE_INT c);
+
+  /* Subtracts costs COST1 and COST2.  */
+  friend comp_cost operator- (comp_cost cost1, comp_cost cost2);
+
+  /* Subtracts COST from this comp_cost.  */
+  comp_cost operator-= (comp_cost cost);
+
+  /* Returns true if COST1 is smaller than COST2.  */
+  friend bool operator< (comp_cost cost1, comp_cost cost2);
+
+  /* Returns true if COST1 and COST2 are equal.  */
+  friend bool operator== (comp_cost cost1, comp_cost cost2);
+
+  /* Returns true if COST1 is smaller than or equal to COST2.  */
+  friend bool operator<= (comp_cost cost1, comp_cost cost2);
+
+  /* Return the cost.  */
+  int get_cost ();
+
+  /* Set the cost to C.  */
+  void set_cost (int c);
+
+  /* Return the complexity.  */
+  unsigned get_complexity ();
+
+  /* Set the complexity to C.  */
+  void set_complexity (unsigned c);
+
+  /* Return the scratch.  */
+  int get_scratch ();
+
+  /* Set the scratch to S.  */
+  void set_scratch (unsigned s);
+
+  /* Return infinite comp_cost.  */
+  static comp_cost get_infinite ();
+
+  /* Return empty comp_cost.  */
+  static comp_cost get_no_cost ();
+
+private:
+  int m_cost;		  /* The runtime cost.  */
+  unsigned m_complexity;  /* The estimate of the complexity of the code for
+			     the computation (in no concrete units --
+			     complexity field should be larger for more
+			     complex expressions and addressing modes).  */
+  int m_scratch;	  /* Scratch used during cost computation.  */
 };
 
-static const comp_cost no_cost = {0, 0, 0};
-static const comp_cost infinite_cost = {INFTY, INFTY, INFTY};
+comp_cost&
+comp_cost::operator= (const comp_cost& other)
+{
+  m_cost = other.m_cost;
+  m_complexity = other.m_complexity;
+  m_scratch = other.m_scratch;
+
+  return *this;
+}
+
+bool
+comp_cost::infinite_cost_p ()
+{
+  return m_cost == INFTY;
+}
+
+comp_cost
+operator+ (comp_cost cost1, comp_cost cost2)
+{
+  if (cost1.infinite_cost_p () || cost2.infinite_cost_p ())
+    return comp_cost::get_infinite ();
+
+  cost1.m_cost += cost2.m_cost;
+  cost1.m_complexity += cost2.m_complexity;
+
+  return cost1;
+}
+
+comp_cost
+comp_cost::operator+= (comp_cost cost)
+{
+  *this = *this + cost;
+  return *this;
+}
+
+comp_cost
+comp_cost::operator+= (HOST_WIDE_INT c)
+{
+  this->m_cost += c;
+
+  return *this;
+}
+
+comp_cost
+comp_cost::operator-= (HOST_WIDE_INT c)
+{
+  this->m_cost -= c;
+
+  return *this;
+}
+
+comp_cost
+comp_cost::operator/= (HOST_WIDE_INT c)
+{
+  this->m_cost /= c;
+
+  return *this;
+}
+
+comp_cost
+comp_cost::operator*= (HOST_WIDE_INT c)
+{
+  this->m_cost *= c;
+
+  return *this;
+}
+
+comp_cost
+operator- (comp_cost cost1, comp_cost cost2)
+{
+  cost1.m_cost -= cost2.m_cost;
+  cost1.m_complexity -= cost2.m_complexity;
+
+  return cost1;
+}
+
+comp_cost
+comp_cost::operator-= (comp_cost cost)
+{
+  *this = *this - cost;
+  return *this;
+}
+
+bool
+operator< (comp_cost cost1, comp_cost cost2)
+{
+  if (cost1.m_cost == cost2.m_cost)
+    return cost1.m_complexity < cost2.m_complexity;
+
+  return cost1.m_cost < cost2.m_cost;
+}
+
+bool
+operator== (comp_cost cost1, comp_cost cost2)
+{
+  return cost1.m_cost == cost2.m_cost
+    && cost1.m_complexity == cost2.m_complexity;
+}
+
+bool
+operator<= (comp_cost cost1, comp_cost cost2)
+{
+  return cost1 < cost2 || cost1 == cost2;
+}
+
+int
+comp_cost::get_cost ()
+{
+  return m_cost;
+}
+
+void
+comp_cost::set_cost (int c)
+{
+  m_cost = c;
+}
+
+unsigned
+comp_cost::get_complexity ()
+{
+  return m_complexity;
+}
+
+void
+comp_cost::set_complexity (unsigned c)
+{
+  m_complexity = c;
+}
+
+int
+comp_cost::get_scratch ()
+{
+  return m_scratch;
+}
+
+void
+comp_cost::set_scratch (unsigned s)
+{
+  m_scratch = s;
+}
+
+comp_cost
+comp_cost::get_infinite ()
+{
+  return comp_cost (INFTY, INFTY);
+}
+
+comp_cost
+comp_cost::get_no_cost ()
+{
+  return comp_cost ();
+}
 
 /* The candidate - cost pair.  */
 struct cost_pair
@@ -362,8 +582,8 @@ struct ivopts_data
      by ivopt.  */
   hash_table<iv_inv_expr_hasher> *inv_expr_tab;
 
-  /* Loop invariant expression id.  */
-  int inv_expr_id;
+  /* The maximum invariant expression id.  */
+  int max_inv_expr_id;
 
   /* The bitmap of indices in version_info whose value was changed.  */
   bitmap relevant;
@@ -888,7 +1108,7 @@ tree_ssa_iv_optimize_init (struct ivopts_data *data)
   data->vgroups.create (20);
   data->vcands.create (20);
   data->inv_expr_tab = new hash_table<iv_inv_expr_hasher> (10);
-  data->inv_expr_id = 0;
+  data->max_inv_expr_id = 0;
   data->name_expansion_cache = NULL;
   data->iv_common_cand_tab = new hash_table<iv_common_cand_hasher> (10);
   data->iv_common_cands.create (20);
@@ -3263,64 +3483,6 @@ alloc_use_cost_map (struct ivopts_data *data)
     }
 }
 
-/* Returns description of computation cost of expression whose runtime
-   cost is RUNTIME and complexity corresponds to COMPLEXITY.  */
-
-static comp_cost
-new_cost (unsigned runtime, unsigned complexity)
-{
-  comp_cost cost;
-
-  cost.cost = runtime;
-  cost.complexity = complexity;
-
-  return cost;
-}
-
-/* Returns true if COST is infinite.  */
-
-static bool
-infinite_cost_p (comp_cost cost)
-{
-  return cost.cost == INFTY;
-}
-
-/* Adds costs COST1 and COST2.  */
-
-static comp_cost
-add_costs (comp_cost cost1, comp_cost cost2)
-{
-  if (infinite_cost_p (cost1) || infinite_cost_p (cost2))
-    return infinite_cost;
-
-  cost1.cost += cost2.cost;
-  cost1.complexity += cost2.complexity;
-
-  return cost1;
-}
-/* Subtracts costs COST1 and COST2.  */
-
-static comp_cost
-sub_costs (comp_cost cost1, comp_cost cost2)
-{
-  cost1.cost -= cost2.cost;
-  cost1.complexity -= cost2.complexity;
-
-  return cost1;
-}
-
-/* Returns a negative number if COST1 < COST2, a positive number if
-   COST1 > COST2, and 0 if COST1 = COST2.  */
-
-static int
-compare_costs (comp_cost cost1, comp_cost cost2)
-{
-  if (cost1.cost == cost2.cost)
-    return cost1.complexity - cost2.complexity;
-
-  return cost1.cost - cost2.cost;
-}
-
 /* Sets cost of (GROUP, CAND) pair to COST and record that it depends
    on invariants DEPENDS_ON and that the value used in expressing it
    is VALUE, and in case of iv elimination the comparison operator is COMP.  */
@@ -3333,7 +3495,7 @@ set_group_iv_cost (struct ivopts_data *data,
 {
   unsigned i, s;
 
-  if (infinite_cost_p (cost))
+  if (cost.infinite_cost_p ())
     {
       BITMAP_FREE (depends_on);
       return;
@@ -4149,7 +4311,7 @@ get_address_cost (bool symbol_present, bool var_present,
   else
     acost = data->costs[symbol_present][var_present][offset_p][ratio_p];
   complexity = (symbol_present != 0) + (var_present != 0) + offset_p + ratio_p;
-  return new_cost (cost + acost, complexity);
+  return comp_cost (cost + acost, complexity);
 }
 
  /* Calculate the SPEED or size cost of shiftadd EXPR in MODE.  MULT is the
@@ -4186,12 +4348,12 @@ get_shiftadd_cost (tree expr, machine_mode mode, comp_cost cost0,
                 ? shiftsub1_cost (speed, mode, m)
                 : shiftsub0_cost (speed, mode, m)));
 
-  res = new_cost (MIN (as_cost, sa_cost), 0);
-  res = add_costs (res, mult_in_op1 ? cost0 : cost1);
+  res = comp_cost (MIN (as_cost, sa_cost), 0);
+  res += (mult_in_op1 ? cost0 : cost1);
 
   STRIP_NOPS (multop);
   if (!is_gimple_val (multop))
-    res = add_costs (res, force_expr_to_var_cost (multop, speed));
+    res = res + force_expr_to_var_cost (multop, speed);
 
   *cost = res;
   return true;
@@ -4251,12 +4413,12 @@ force_expr_to_var_cost (tree expr, bool speed)
   STRIP_NOPS (expr);
 
   if (SSA_VAR_P (expr))
-    return no_cost;
+    return comp_cost::get_no_cost ();
 
   if (is_gimple_min_invariant (expr))
     {
       if (TREE_CODE (expr) == INTEGER_CST)
-	return new_cost (integer_cost [speed], 0);
+	return comp_cost (integer_cost [speed], 0);
 
       if (TREE_CODE (expr) == ADDR_EXPR)
 	{
@@ -4265,10 +4427,10 @@ force_expr_to_var_cost (tree expr, bool speed)
 	  if (TREE_CODE (obj) == VAR_DECL
 	      || TREE_CODE (obj) == PARM_DECL
 	      || TREE_CODE (obj) == RESULT_DECL)
-	    return new_cost (symbol_cost [speed], 0);
+	    return comp_cost (symbol_cost [speed], 0);
 	}
 
-      return new_cost (address_cost [speed], 0);
+      return comp_cost (address_cost [speed], 0);
     }
 
   switch (TREE_CODE (expr))
@@ -4292,18 +4454,18 @@ force_expr_to_var_cost (tree expr, bool speed)
 
     default:
       /* Just an arbitrary value, FIXME.  */
-      return new_cost (target_spill_cost[speed], 0);
+      return comp_cost (target_spill_cost[speed], 0);
     }
 
   if (op0 == NULL_TREE
       || TREE_CODE (op0) == SSA_NAME || CONSTANT_CLASS_P (op0))
-    cost0 = no_cost;
+    cost0 = comp_cost::get_no_cost ();
   else
     cost0 = force_expr_to_var_cost (op0, speed);
 
   if (op1 == NULL_TREE
       || TREE_CODE (op1) == SSA_NAME || CONSTANT_CLASS_P (op1))
-    cost1 = no_cost;
+    cost1 = comp_cost::get_no_cost ();
   else
     cost1 = force_expr_to_var_cost (op1, speed);
 
@@ -4314,7 +4476,7 @@ force_expr_to_var_cost (tree expr, bool speed)
     case PLUS_EXPR:
     case MINUS_EXPR:
     case NEGATE_EXPR:
-      cost = new_cost (add_cost (speed, mode), 0);
+      cost = comp_cost (add_cost (speed, mode), 0);
       if (TREE_CODE (expr) != NEGATE_EXPR)
         {
           tree mult = NULL_TREE;
@@ -4337,35 +4499,35 @@ force_expr_to_var_cost (tree expr, bool speed)
 	tree inner_mode, outer_mode;
 	outer_mode = TREE_TYPE (expr);
 	inner_mode = TREE_TYPE (op0);
-	cost = new_cost (convert_cost (TYPE_MODE (outer_mode),
+	cost = comp_cost (convert_cost (TYPE_MODE (outer_mode),
 				       TYPE_MODE (inner_mode), speed), 0);
       }
       break;
 
     case MULT_EXPR:
       if (cst_and_fits_in_hwi (op0))
-	cost = new_cost (mult_by_coeff_cost (int_cst_value (op0),
+	cost = comp_cost (mult_by_coeff_cost (int_cst_value (op0),
 					     mode, speed), 0);
       else if (cst_and_fits_in_hwi (op1))
-	cost = new_cost (mult_by_coeff_cost (int_cst_value (op1),
+	cost = comp_cost (mult_by_coeff_cost (int_cst_value (op1),
 					     mode, speed), 0);
       else
-	return new_cost (target_spill_cost [speed], 0);
+	return comp_cost (target_spill_cost [speed], 0);
       break;
 
     default:
       gcc_unreachable ();
     }
 
-  cost = add_costs (cost, cost0);
-  cost = add_costs (cost, cost1);
+  cost += cost0;
+  cost += cost1;
 
   /* Bound the cost by target_spill_cost.  The parts of complicated
      computations often are either loop invariant or at least can
      be shared between several iv uses, so letting this grow without
      limits would not give reasonable results.  */
-  if (cost.cost > (int) target_spill_cost [speed])
-    cost.cost = target_spill_cost [speed];
+  if (cost.get_cost () > (int) target_spill_cost [speed])
+    cost.set_cost (target_spill_cost [speed]);
 
   return cost;
 }
@@ -4417,7 +4579,7 @@ split_address_cost (struct ivopts_data *data,
       if (depends_on)
 	walk_tree (&addr, find_depends, depends_on, NULL);
 
-      return new_cost (target_spill_cost[data->speed], 0);
+      return comp_cost (target_spill_cost[data->speed], 0);
     }
 
   *offset += bitpos / BITS_PER_UNIT;
@@ -4426,12 +4588,12 @@ split_address_cost (struct ivopts_data *data,
     {
       *symbol_present = true;
       *var_present = false;
-      return no_cost;
+      return comp_cost::get_no_cost ();
     }
 
   *symbol_present = false;
   *var_present = true;
-  return no_cost;
+  return comp_cost::get_no_cost ();
 }
 
 /* Estimates cost of expressing difference of addresses E1 - E2 as
@@ -4456,7 +4618,7 @@ ptr_difference_cost (struct ivopts_data *data,
       *offset += diff;
       *symbol_present = false;
       *var_present = false;
-      return no_cost;
+      return comp_cost::get_no_cost ();
     }
 
   if (integer_zerop (e2))
@@ -4506,7 +4668,7 @@ difference_cost (struct ivopts_data *data,
   if (operand_equal_p (e1, e2, 0))
     {
       *var_present = false;
-      return no_cost;
+      return comp_cost::get_no_cost ();
     }
 
   *var_present = true;
@@ -4517,7 +4679,7 @@ difference_cost (struct ivopts_data *data,
   if (integer_zerop (e1))
     {
       comp_cost cost = force_var_cost (data, e2, depends_on);
-      cost.cost += mult_by_coeff_cost (-1, mode, data->speed);
+      cost += mult_by_coeff_cost (-1, mode, data->speed);
       return cost;
     }
 
@@ -4568,7 +4730,7 @@ get_expr_id (struct ivopts_data *data, tree expr)
   *slot = XNEW (struct iv_inv_expr_ent);
   (*slot)->expr = expr;
   (*slot)->hash = ent.hash;
-  (*slot)->id = data->inv_expr_id++;
+  (*slot)->id = data->max_inv_expr_id++;
   return (*slot)->id;
 }
 
@@ -4709,7 +4871,7 @@ get_computation_cost_at (struct ivopts_data *data,
 
   /* Only consider real candidates.  */
   if (!cand->iv)
-    return infinite_cost;
+    return comp_cost::get_infinite ();
 
   cbase = cand->iv->base;
   cstep = cand->iv->step;
@@ -4718,7 +4880,7 @@ get_computation_cost_at (struct ivopts_data *data,
   if (TYPE_PRECISION (utype) > TYPE_PRECISION (ctype))
     {
       /* We do not have a precision to express the values of use.  */
-      return infinite_cost;
+      return comp_cost::get_infinite ();
     }
 
   if (address_p
@@ -4735,7 +4897,7 @@ get_computation_cost_at (struct ivopts_data *data,
       if (use->iv->base_object
 	  && cand->iv->base_object
 	  && !operand_equal_p (use->iv->base_object, cand->iv->base_object, 0))
-	return infinite_cost;
+	return comp_cost::get_infinite ();
     }
 
   if (TYPE_PRECISION (utype) < TYPE_PRECISION (ctype))
@@ -4756,12 +4918,12 @@ get_computation_cost_at (struct ivopts_data *data,
     cstepi = 0;
 
   if (!constant_multiple_of (ustep, cstep, &rat))
-    return infinite_cost;
+    return comp_cost::get_infinite ();
 
   if (wi::fits_shwi_p (rat))
     ratio = rat.to_shwi ();
   else
-    return infinite_cost;
+    return comp_cost::get_infinite ();
 
   STRIP_NOPS (cbase);
   ctype = TREE_TYPE (cbase);
@@ -4782,7 +4944,7 @@ get_computation_cost_at (struct ivopts_data *data,
 			      ubase, build_int_cst (utype, 0),
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
-      cost.cost /= avg_loop_niter (data->current_loop);
+      cost /= avg_loop_niter (data->current_loop);
     }
   else if (ratio == 1)
     {
@@ -4806,7 +4968,7 @@ get_computation_cost_at (struct ivopts_data *data,
 			      ubase, real_cbase,
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
-      cost.cost /= avg_loop_niter (data->current_loop);
+      cost /= avg_loop_niter (data->current_loop);
     }
   else if (address_p
 	   && !POINTER_TYPE_P (ctype)
@@ -4827,22 +4989,20 @@ get_computation_cost_at (struct ivopts_data *data,
 			      ubase, cbase,
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
-      cost.cost /= avg_loop_niter (data->current_loop);
+      cost /= avg_loop_niter (data->current_loop);
     }
   else
     {
       cost = force_var_cost (data, cbase, depends_on);
-      cost = add_costs (cost,
-			difference_cost (data,
-					 ubase, build_int_cst (utype, 0),
-					 &symbol_present, &var_present,
-					 &offset, depends_on));
-      cost.cost /= avg_loop_niter (data->current_loop);
-      cost.cost += add_cost (data->speed, TYPE_MODE (ctype));
+      cost += difference_cost (data, ubase, build_int_cst (utype, 0),
+			       &symbol_present, &var_present, &offset,
+			       depends_on);
+      cost /= avg_loop_niter (data->current_loop);
+      cost += add_cost (data->speed, TYPE_MODE (ctype));
     }
 
   /* Record setup cost in scrach field.  */
-  cost.scratch = cost.cost;
+  cost.set_scratch (cost.get_cost ());
 
   if (inv_expr_id)
     {
@@ -4862,38 +5022,36 @@ get_computation_cost_at (struct ivopts_data *data,
      (symbol/var1/const parts may be omitted).  If we are looking for an
      address, find the cost of addressing this.  */
   if (address_p)
-    return add_costs (cost,
-		      get_address_cost (symbol_present, var_present,
-					offset, ratio, cstepi,
-					mem_mode,
-					TYPE_ADDR_SPACE (TREE_TYPE (utype)),
-					speed, stmt_is_after_inc,
-					can_autoinc));
+    return cost + get_address_cost (symbol_present, var_present,
+				    offset, ratio, cstepi,
+				    mem_mode,
+				    TYPE_ADDR_SPACE (TREE_TYPE (utype)),
+				    speed, stmt_is_after_inc, can_autoinc);
 
   /* Otherwise estimate the costs for computing the expression.  */
   if (!symbol_present && !var_present && !offset)
     {
       if (ratio != 1)
-	cost.cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
+	cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
       return cost;
     }
 
   /* Symbol + offset should be compile-time computable so consider that they
       are added once to the variable, if present.  */
   if (var_present && (symbol_present || offset))
-    cost.cost += adjust_setup_cost (data,
+    cost += adjust_setup_cost (data,
 				    add_cost (speed, TYPE_MODE (ctype)));
 
   /* Having offset does not affect runtime cost in case it is added to
      symbol, but it increases complexity.  */
   if (offset)
-    cost.complexity++;
+    cost.set_complexity (cost.get_complexity () + 1);
 
-  cost.cost += add_cost (speed, TYPE_MODE (ctype));
+  cost += add_cost (speed, TYPE_MODE (ctype));
 
   aratio = ratio > 0 ? ratio : -ratio;
   if (aratio != 1)
-    cost.cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
+    cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
   return cost;
 
 fallback:
@@ -4905,14 +5063,12 @@ fallback:
     tree comp = get_computation_at (data->current_loop, use, cand, at);
 
     if (!comp)
-      return infinite_cost;
+      return comp_cost::get_infinite ();
 
     if (address_p)
       comp = build_simple_mem_ref (comp);
 
-    cost = new_cost (computation_cost (comp, speed), 0);
-    cost.scratch = 0;
-    return cost;
+    return comp_cost (computation_cost (comp, speed), 0);
   }
 }
 
@@ -4951,14 +5107,14 @@ determine_group_iv_cost_generic (struct ivopts_data *data,
      cost of increment twice -- once at this use and once in the cost of
      the candidate.  */
   if (cand->pos == IP_ORIGINAL && cand->incremented_at == use->stmt)
-    cost = no_cost;
+    cost = comp_cost::get_no_cost ();
   else
     cost = get_computation_cost (data, use, cand, false,
 				 &depends_on, NULL, &inv_expr_id);
 
   set_group_iv_cost (data, group, cand, cost, depends_on,
 		     NULL_TREE, ERROR_MARK, inv_expr_id);
-  return !infinite_cost_p (cost);
+  return !cost.infinite_cost_p ();
 }
 
 /* Determines cost of computing uses in GROUP with CAND in addresses.  */
@@ -4972,27 +5128,27 @@ determine_group_iv_cost_address (struct ivopts_data *data,
   bool can_autoinc, first = true;
   int inv_expr_id = -1;
   struct iv_use *use = group->vuses[0];
-  comp_cost sum_cost = no_cost, cost;
+  comp_cost sum_cost = comp_cost::get_no_cost (), cost;
 
   cost = get_computation_cost (data, use, cand, true,
 			       &depends_on, &can_autoinc, &inv_expr_id);
 
   sum_cost = cost;
-  if (!infinite_cost_p (sum_cost) && cand->ainc_use == use)
+  if (!sum_cost.infinite_cost_p () && cand->ainc_use == use)
     {
       if (can_autoinc)
-	sum_cost.cost -= cand->cost_step;
+	sum_cost -= cand->cost_step;
       /* If we generated the candidate solely for exploiting autoincrement
 	 opportunities, and it turns out it can't be used, set the cost to
 	 infinity to make sure we ignore it.  */
       else if (cand->pos == IP_AFTER_USE || cand->pos == IP_BEFORE_USE)
-	sum_cost = infinite_cost;
+	sum_cost = comp_cost::get_infinite ();
     }
 
   /* Uses in a group can share setup code, so only add setup cost once.  */
-  cost.cost -= cost.scratch;
+  cost -= cost.get_scratch ();
   /* Compute and add costs for rest uses of this group.  */
-  for (i = 1; i < group->vuses.length () && !infinite_cost_p (sum_cost); i++)
+  for (i = 1; i < group->vuses.length () && !sum_cost.infinite_cost_p (); i++)
     {
       struct iv_use *next = group->vuses[i];
 
@@ -5017,15 +5173,15 @@ determine_group_iv_cost_address (struct ivopts_data *data,
 	  cost = get_computation_cost (data, next, cand, true,
 				       NULL, &can_autoinc, NULL);
 	  /* Remove setup cost.  */
-	  if (!infinite_cost_p (cost))
-	    cost.cost -= cost.scratch;
+	  if (!cost.infinite_cost_p ())
+	    cost -= cost.get_scratch ();
 	}
-      sum_cost = add_costs (sum_cost, cost);
+      sum_cost += cost;
     }
   set_group_iv_cost (data, group, cand, sum_cost, depends_on,
 		     NULL_TREE, ERROR_MARK, inv_expr_id);
 
-  return !infinite_cost_p (sum_cost);
+  return !sum_cost.infinite_cost_p ();
 }
 
 /* Computes value of candidate CAND at position AT in iteration NITER, and
@@ -5453,10 +5609,10 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
   if (may_eliminate_iv (data, use, cand, &bound, &comp))
     {
       elim_cost = force_var_cost (data, bound, &depends_on_elim);
-      if (elim_cost.cost == 0)
-        elim_cost.cost = parm_decl_cost (data, bound);
+      if (elim_cost.get_cost () == 0)
+	elim_cost.set_cost (parm_decl_cost (data, bound));
       else if (TREE_CODE (bound) == INTEGER_CST)
-        elim_cost.cost = 0;
+	elim_cost.set_cost (0);
       /* If we replace a loop condition 'i < n' with 'p < base + n',
 	 depends_on_elim will have 'base' and 'n' set, which implies
 	 that both 'base' and 'n' will be live during the loop.	 More likely,
@@ -5470,10 +5626,10 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
 	}
       /* The bound is a loop invariant, so it will be only computed
 	 once.  */
-      elim_cost.cost = adjust_setup_cost (data, elim_cost.cost);
+      elim_cost.set_cost (adjust_setup_cost (data, elim_cost.get_cost ()));
     }
   else
-    elim_cost = infinite_cost;
+    elim_cost = comp_cost::get_infinite ();
 
   /* Try expressing the original giv.  If it is compared with an invariant,
      note that we cannot get rid of it.  */
@@ -5487,11 +5643,11 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
      TODO: The constant that we're subtracting from the cost should
      be target-dependent.  This information should be added to the
      target costs for each backend.  */
-  if (!infinite_cost_p (elim_cost) /* Do not try to decrease infinite! */
+  if (!elim_cost.infinite_cost_p () /* Do not try to decrease infinite! */
       && integer_zerop (*bound_cst)
       && (operand_equal_p (*control_var, cand->var_after, 0)
 	  || operand_equal_p (*control_var, cand->var_before, 0)))
-    elim_cost.cost -= 1;
+    elim_cost -= 1;
 
   express_cost = get_computation_cost (data, use, cand, false,
 				       &depends_on_express, NULL,
@@ -5501,14 +5657,14 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
 
   /* Count the cost of the original bound as well.  */
   bound_cost = force_var_cost (data, *bound_cst, NULL);
-  if (bound_cost.cost == 0)
-    bound_cost.cost = parm_decl_cost (data, *bound_cst);
+  if (bound_cost.get_cost () == 0)
+    bound_cost.set_cost (parm_decl_cost (data, *bound_cst));
   else if (TREE_CODE (*bound_cst) == INTEGER_CST)
-    bound_cost.cost = 0;
-  express_cost.cost += bound_cost.cost;
+    bound_cost.set_cost (0);
+  express_cost += bound_cost;
 
   /* Choose the better approach, preferring the eliminated IV. */
-  if (compare_costs (elim_cost, express_cost) <= 0)
+  if (elim_cost <= express_cost)
     {
       cost = elim_cost;
       depends_on = depends_on_elim;
@@ -5533,7 +5689,7 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
   if (depends_on_express)
     BITMAP_FREE (depends_on_express);
 
-  return !infinite_cost_p (cost);
+  return !cost.infinite_cost_p ();
 }
 
 /* Determines cost of computing uses in GROUP with CAND.  Returns false
@@ -5578,7 +5734,7 @@ autoinc_possible_for_pair (struct ivopts_data *data, struct iv_use *use,
 
   BITMAP_FREE (depends_on);
 
-  return !infinite_cost_p (cost) && can_autoinc;
+  return !cost.infinite_cost_p () && can_autoinc;
 }
 
 /* Examine IP_ORIGINAL candidates to see if they are incremented next to a
@@ -5727,13 +5883,13 @@ determine_group_iv_costs (struct ivopts_data *data)
 	  for (j = 0; j < group->n_map_members; j++)
 	    {
 	      if (!group->cost_map[j].cand
-		  || infinite_cost_p (group->cost_map[j].cost))
+		  || group->cost_map[j].cost.infinite_cost_p ())
 		continue;
 
 	      fprintf (dump_file, "  %d\t%d\t%d\t",
 		       group->cost_map[j].cand->id,
-		       group->cost_map[j].cost.cost,
-		       group->cost_map[j].cost.complexity);
+		       group->cost_map[j].cost.get_cost (),
+		       group->cost_map[j].cost.get_complexity ());
 	      if (group->cost_map[j].depends_on)
 		bitmap_print (dump_file,
 			      group->cost_map[j].depends_on, "","");
@@ -5773,11 +5929,11 @@ determine_iv_cost (struct ivopts_data *data, struct iv_cand *cand)
   /* It will be exceptional that the iv register happens to be initialized with
      the proper value at no cost.  In general, there will at least be a regcopy
      or a const set.  */
-  if (cost_base.cost == 0)
-    cost_base.cost = COSTS_N_INSNS (1);
+  if (cost_base.get_cost () == 0)
+    cost_base.set_cost (COSTS_N_INSNS (1));
   cost_step = add_cost (data->speed, TYPE_MODE (TREE_TYPE (base)));
 
-  cost = cost_step + adjust_setup_cost (data, cost_base.cost);
+  cost = cost_step + adjust_setup_cost (data, cost_base.get_cost ());
 
   /* Prefer the original ivs unless we may gain something by replacing it.
      The reason is to make debugging simpler; so this is not relevant for
@@ -5899,19 +6055,16 @@ determine_set_costs (struct ivopts_data *data)
 static bool
 cheaper_cost_pair (struct cost_pair *a, struct cost_pair *b)
 {
-  int cmp;
-
   if (!a)
     return false;
 
   if (!b)
     return true;
 
-  cmp = compare_costs (a->cost, b->cost);
-  if (cmp < 0)
+  if (a->cost < b->cost)
     return true;
 
-  if (cmp > 0)
+  if (b->cost < a->cost)
     return false;
 
   /* In case the costs are the same, prefer the cheaper candidate.  */
@@ -5937,10 +6090,10 @@ iv_ca_recount_cost (struct ivopts_data *data, struct iv_ca *ivs)
 {
   comp_cost cost = ivs->cand_use_cost;
 
-  cost.cost += ivs->cand_cost;
+  cost += ivs->cand_cost;
 
-  cost.cost += ivopts_global_cost_for_size (data,
-                                            ivs->n_regs + ivs->num_used_inv_expr);
+  cost += ivopts_global_cost_for_size (data,
+				       ivs->n_regs + ivs->num_used_inv_expr);
 
   ivs->cost = cost;
 }
@@ -5994,7 +6147,7 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca *ivs,
       iv_ca_set_remove_invariants (ivs, cp->cand->depends_on);
     }
 
-  ivs->cand_use_cost = sub_costs (ivs->cand_use_cost, cp->cost);
+  ivs->cand_use_cost -= cp->cost;
 
   iv_ca_set_remove_invariants (ivs, cp->depends_on);
 
@@ -6059,7 +6212,7 @@ iv_ca_set_cp (struct ivopts_data *data, struct iv_ca *ivs,
 	  iv_ca_set_add_invariants (ivs, cp->cand->depends_on);
 	}
 
-      ivs->cand_use_cost = add_costs (ivs->cand_use_cost, cp->cost);
+      ivs->cand_use_cost += cp->cost;
       iv_ca_set_add_invariants (ivs, cp->depends_on);
 
       if (cp->inv_expr_id != -1)
@@ -6119,7 +6272,7 @@ iv_ca_cost (struct iv_ca *ivs)
   /* This was a conditional expression but it triggered a bug in
      Sun C 5.5.  */
   if (ivs->bad_groups)
-    return infinite_cost;
+    return comp_cost::get_infinite ();
   else
     return ivs->cost;
 }
@@ -6273,11 +6426,11 @@ iv_ca_new (struct ivopts_data *data)
   nw->cands = BITMAP_ALLOC (NULL);
   nw->n_cands = 0;
   nw->n_regs = 0;
-  nw->cand_use_cost = no_cost;
+  nw->cand_use_cost = comp_cost::get_no_cost ();
   nw->cand_cost = 0;
   nw->n_invariant_uses = XCNEWVEC (unsigned, data->max_inv_id + 1);
-  nw->cost = no_cost;
-  nw->used_inv_expr = XCNEWVEC (unsigned, data->inv_expr_id + 1);
+  nw->cost = comp_cost::get_no_cost ();
+  nw->used_inv_expr = XCNEWVEC (unsigned, data->max_inv_expr_id + 1);
   nw->num_used_inv_expr = 0;
 
   return nw;
@@ -6306,9 +6459,11 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
   unsigned i;
   comp_cost cost = iv_ca_cost (ivs);
 
-  fprintf (file, "  cost: %d (complexity %d)\n", cost.cost, cost.complexity);
+  fprintf (file, "  cost: %d (complexity %d)\n", cost.get_cost (),
+	   cost.get_complexity ());
   fprintf (file, "  cand_cost: %d\n  cand_group_cost: %d (complexity %d)\n",
-           ivs->cand_cost, ivs->cand_use_cost.cost, ivs->cand_use_cost.complexity);
+	   ivs->cand_cost, ivs->cand_use_cost.get_cost (),
+	   ivs->cand_use_cost.get_complexity ());
   bitmap_print (file, ivs->cands, "  candidates: ","\n");
 
   for (i = 0; i < ivs->upto; i++)
@@ -6316,8 +6471,9 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
       struct iv_group *group = data->vgroups[i];
       struct cost_pair *cp = iv_ca_cand_for_group (ivs, group);
       if (cp)
-	fprintf (file, "   group:%d --> iv_cand:%d, cost=(%d,%d)\n",
-		 group->id, cp->cand->id, cp->cost.cost, cp->cost.complexity);
+        fprintf (file, "   group:%d --> iv_cand:%d, cost=(%d,%d)\n",
+		 group->id, cp->cand->id, cp->cost.get_cost (),
+		 cp->cost.get_complexity ());
       else
 	fprintf (file, "   group:%d --> ??\n", group->id);
     }
@@ -6423,7 +6579,7 @@ iv_ca_narrow (struct ivopts_data *data, struct iv_ca *ivs,
 	      iv_ca_set_cp (data, ivs, group, cp);
 	      acost = iv_ca_cost (ivs);
 
-	      if (compare_costs (acost, best_cost) < 0)
+	      if (acost < best_cost)
 		{
 		  best_cost = acost;
 		  new_cp = cp;
@@ -6446,7 +6602,7 @@ iv_ca_narrow (struct ivopts_data *data, struct iv_ca *ivs,
 	      iv_ca_set_cp (data, ivs, group, cp);
 	      acost = iv_ca_cost (ivs);
 
-	      if (compare_costs (acost, best_cost) < 0)
+	      if (acost < best_cost)
 		{
 		  best_cost = acost;
 		  new_cp = cp;
@@ -6459,7 +6615,7 @@ iv_ca_narrow (struct ivopts_data *data, struct iv_ca *ivs,
       if (!new_cp)
 	{
 	  iv_ca_delta_free (delta);
-	  return infinite_cost;
+	  return comp_cost::get_infinite ();
 	}
 
       *delta = iv_ca_delta_add (group, old_cp, new_cp, *delta);
@@ -6498,7 +6654,7 @@ iv_ca_prune (struct ivopts_data *data, struct iv_ca *ivs,
 
       acost = iv_ca_narrow (data, ivs, cand, except_cand, &act_delta);
 
-      if (compare_costs (acost, best_cost) < 0)
+      if (acost < best_cost)
 	{
 	  best_cost = acost;
 	  iv_ca_delta_free (&best_delta);
@@ -6611,7 +6767,7 @@ iv_ca_replace (struct ivopts_data *data, struct iv_ca *ivs,
       iv_ca_delta_commit (data, ivs, act_delta, false);
       act_delta = iv_ca_delta_join (act_delta, tmp_delta);
 
-      if (compare_costs (acost, orig_cost) < 0)
+      if (acost < orig_cost)
 	{
 	  *delta = act_delta;
 	  return acost;
@@ -6680,7 +6836,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
       iv_ca_set_no_cp (data, ivs, group);
       act_delta = iv_ca_delta_add (group, NULL, cp, act_delta);
 
-      if (compare_costs (act_cost, best_cost) < 0)
+      if (act_cost < best_cost)
 	{
 	  best_cost = act_cost;
 
@@ -6691,7 +6847,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
 	iv_ca_delta_free (&act_delta);
     }
 
-  if (infinite_cost_p (best_cost))
+  if (best_cost.infinite_cost_p ())
     {
       for (i = 0; i < group->n_map_members; i++)
 	{
@@ -6720,7 +6876,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
 				       iv_ca_cand_for_group (ivs, group),
 				       cp, act_delta);
 
-	  if (compare_costs (act_cost, best_cost) < 0)
+	  if (act_cost < best_cost)
 	    {
 	      best_cost = act_cost;
 
@@ -6736,7 +6892,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
   iv_ca_delta_commit (data, ivs, best_delta, true);
   iv_ca_delta_free (&best_delta);
 
-  return !infinite_cost_p (best_cost);
+  return !best_cost.infinite_cost_p ();
 }
 
 /* Finds an initial assignment of candidates to uses.  */
@@ -6792,7 +6948,7 @@ try_improve_iv_set (struct ivopts_data *data,
 	  act_delta = iv_ca_delta_join (act_delta, tmp_delta);
 	}
 
-      if (compare_costs (acost, best_cost) < 0)
+      if (acost < best_cost)
 	{
 	  best_cost = acost;
 	  iv_ca_delta_free (&best_delta);
@@ -6826,7 +6982,7 @@ try_improve_iv_set (struct ivopts_data *data,
     }
 
   iv_ca_delta_commit (data, ivs, best_delta, true);
-  gcc_assert (compare_costs (best_cost, iv_ca_cost (ivs)) == 0);
+  gcc_assert (best_cost == iv_ca_cost (ivs));
   iv_ca_delta_free (&best_delta);
   return true;
 }
@@ -6884,19 +7040,19 @@ find_optimal_iv_set (struct ivopts_data *data)
   if (!origset && !set)
     return NULL;
 
-  origcost = origset ? iv_ca_cost (origset) : infinite_cost;
-  cost = set ? iv_ca_cost (set) : infinite_cost;
+  origcost = origset ? iv_ca_cost (origset) : comp_cost::get_infinite ();
+  cost = set ? iv_ca_cost (set) : comp_cost::get_infinite ();
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
       fprintf (dump_file, "Original cost %d (complexity %d)\n\n",
-	       origcost.cost, origcost.complexity);
+	       origcost.get_cost (), origcost.get_complexity ());
       fprintf (dump_file, "Final cost %d (complexity %d)\n\n",
-	       cost.cost, cost.complexity);
+	       cost.get_cost (), cost.get_complexity ());
     }
 
   /* Choose the one with the best cost.  */
-  if (compare_costs (origcost, cost) <= 0)
+  if (origcost <= cost)
     {
       if (set)
 	iv_ca_free (&set);
@@ -7540,7 +7696,7 @@ free_loop_data (struct ivopts_data *data)
   decl_rtl_to_reset.truncate (0);
 
   data->inv_expr_tab->empty ();
-  data->inv_expr_id = 0;
+  data->max_inv_expr_id = 0;
 
   data->iv_common_cand_tab->empty ();
   data->iv_common_cands.truncate (0);
-- 
2.8.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 2/3] Add profiling support for IVOPTS
  2016-04-29 11:58 [PATCH 0/3] IVOPTS: support profiling marxin
  2016-04-29 11:58 ` [PATCH 3/3] Enhance dumps of IVOPTS marxin
  2016-04-29 11:58 ` [PATCH 1/3] Encapsulate comp_cost within a class with methods marxin
@ 2016-04-29 11:58 ` marxin
  2016-05-16 13:56   ` Martin Liška
  2016-05-03  9:28 ` [PATCH 0/3] IVOPTS: support profiling Bin.Cheng
  3 siblings, 1 reply; 34+ messages in thread
From: marxin @ 2016-04-29 11:58 UTC (permalink / raw)
  To: gcc-patches

gcc/ChangeLog:

2016-04-25  Martin Liska  <mliska@suse.cz>

	* tree-ssa-loop-ivopts.c (struct comp_cost): Introduce
	m_cost_scaled and m_frequency fields.
	(comp_cost::operator=): Assign to m_cost_scaled.
	(operator+): Likewise.
	(comp_cost::operator+=): Likewise.
	(comp_cost::operator-=): Likewise.
	(comp_cost::operator/=): Likewise.
	(comp_cost::operator*=): Likewise.
	(operator-): Likewise.
	(comp_cost::set_cost): Likewise.
	(comp_cost::get_cost_scaled): New function.
	(comp_cost::calculate_scaled_cost): Likewise.
	(comp_cost::propagate_scaled_cost): Likewise.
	(comp_cost::get_frequency): Likewise.
	(comp_cost::scale_cost): Likewise.
	(comp_cost::has_frequency): Likewise.
	(get_computation_cost_at): Propagate ratio of frequencies
	of loop header and another basic block.
	(determine_group_iv_costs): Dump new fields.
---
 gcc/tree-ssa-loop-ivopts.c | 130 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 118 insertions(+), 12 deletions(-)
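
Not part of the patch: a minimal standalone sketch of the scaling, assuming
plain double in place of sreal and hypothetical names (scaled_cost_sketch,
scale_by_frequency). Each cost is multiplied by
bb_frequency / header_frequency, and a zero header frequency falls back to a
ratio of 1, as calculate_scaled_cost does in the diff below.

```cpp
#include <cassert>

// Hypothetical stand-in for comp_cost; plain double replaces GCC's sreal.
struct scaled_cost_sketch
{
  int cost;          // unscaled cost, as computed by the cost model
  double frequency;  // bb_frequency / header_frequency
  double scaled;     // cost * frequency
};

// Mirrors the idea of calculate_scaled_cost: a zero header frequency
// falls back to a ratio of 1, which leaves the cost unchanged.
inline scaled_cost_sketch
scale_by_frequency (int cost, int bb_frequency, int header_frequency)
{
  assert (bb_frequency >= 0 && header_frequency >= 0);
  double ratio = header_frequency == 0
    ? 1.0 : static_cast<double> (bb_frequency) / header_frequency;
  return { cost, ratio, cost * ratio };
}
```

For example, a cost of 8 in a block executed a quarter as often as the loop
header (250 vs. 1000) is scaled to 2.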

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1e68927..af00ff0 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -107,6 +107,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-address.h"
 #include "builtins.h"
 #include "tree-vectorizer.h"
+#include "sreal.h"
 
 /* FIXME: Expressions are expanded to RTL in this pass to determine the
    cost of different addressing modes.  This should be moved to a TBD
@@ -173,11 +174,13 @@ enum use_type
 /* Cost of a computation.  */
 struct comp_cost
 {
-  comp_cost (): m_cost (0), m_complexity (0), m_scratch (0)
+  comp_cost (): m_cost (0), m_complexity (0), m_scratch (0),
+    m_frequency (sreal (0)), m_cost_scaled (sreal (0))
   {}
 
   comp_cost (int cost, unsigned complexity)
-    : m_cost (cost), m_complexity (complexity), m_scratch (0)
+    : m_cost (cost), m_complexity (complexity), m_scratch (0),
+      m_frequency (sreal (0)), m_cost_scaled (sreal (0))
   {}
 
   comp_cost& operator= (const comp_cost& other);
@@ -236,6 +239,26 @@ struct comp_cost
   /* Set the scratch to S.  */
   void set_scratch (unsigned s);
 
+  /* Return scaled cost.  */
+  double get_cost_scaled ();
+
+  /* Calculate scaled cost based on frequency of a basic block with
+     frequency equal to NOMINATOR / DENOMINATOR.  */
+  void calculate_scaled_cost (int nominator, int denominator);
+
+  /* Propagate scaled cost which is based on frequency of basic block
+     the cost belongs to.  */
+  void propagate_scaled_cost ();
+
+  /* Return frequency of the cost.  */
+  double get_frequency ();
+
+  /* Scale COST by frequency of the cost.  */
+  const sreal scale_cost (int cost);
+
+  /* Return true if the frequency has a valid value.  */
+  bool has_frequency ();
+
   /* Return infinite comp_cost.  */
   static comp_cost get_infinite ();
 
@@ -249,6 +272,9 @@ private:
 			     complexity field should be larger for more
 			     complex expressions and addressing modes).  */
   int m_scratch;	  /* Scratch used during cost computation.  */
+  sreal m_frequency;	  /* Frequency of the basic block this comp_cost
+			     belongs to.  */
+  sreal m_cost_scaled;	  /* Scaled runtime cost.  */
 };
 
 comp_cost&
@@ -257,6 +283,8 @@ comp_cost::operator= (const comp_cost& other)
   m_cost = other.m_cost;
   m_complexity = other.m_complexity;
   m_scratch = other.m_scratch;
+  m_frequency = other.m_frequency;
+  m_cost_scaled = other.m_cost_scaled;
 
   return *this;
 }
@@ -275,6 +303,7 @@ operator+ (comp_cost cost1, comp_cost cost2)
 
   cost1.m_cost += cost2.m_cost;
   cost1.m_complexity += cost2.m_complexity;
+  cost1.m_cost_scaled += cost2.m_cost_scaled;
 
   return cost1;
 }
@@ -290,6 +319,8 @@ comp_cost
 comp_cost::operator+= (HOST_WIDE_INT c)
 {
   this->m_cost += c;
+  if (has_frequency ())
+    this->m_cost_scaled += scale_cost (c);
 
   return *this;
 }
@@ -298,6 +329,8 @@ comp_cost
 comp_cost::operator-= (HOST_WIDE_INT c)
 {
   this->m_cost -= c;
+  if (has_frequency ())
+    this->m_cost_scaled -= scale_cost (c);
 
   return *this;
 }
@@ -306,6 +339,8 @@ comp_cost
 comp_cost::operator/= (HOST_WIDE_INT c)
 {
   this->m_cost /= c;
+  if (has_frequency ())
+    this->m_cost_scaled /= sreal (c);
 
   return *this;
 }
@@ -314,6 +349,8 @@ comp_cost
 comp_cost::operator*= (HOST_WIDE_INT c)
 {
   this->m_cost *= c;
+  if (has_frequency ())
+    this->m_cost_scaled *= sreal (c);
 
   return *this;
 }
@@ -323,6 +360,7 @@ operator- (comp_cost cost1, comp_cost cost2)
 {
   cost1.m_cost -= cost2.m_cost;
   cost1.m_complexity -= cost2.m_complexity;
+  cost1.m_cost_scaled -= cost2.m_cost_scaled;
 
   return cost1;
 }
@@ -366,6 +404,7 @@ void
 comp_cost::set_cost (int c)
 {
   m_cost = c;
+  m_cost_scaled = scale_cost (c);
 }
 
 unsigned
@@ -392,6 +431,48 @@ comp_cost::set_scratch (unsigned s)
   m_scratch = s;
 }
 
+double
+comp_cost::get_cost_scaled ()
+{
+  return m_cost_scaled.to_double ();
+}
+
+void
+comp_cost::calculate_scaled_cost (int nominator, int denominator)
+{
+  m_frequency = denominator == 0
+    ? sreal (1) : sreal (nominator) / sreal (denominator);
+
+  m_cost_scaled = scale_cost (m_cost);
+}
+
+void
+comp_cost::propagate_scaled_cost ()
+{
+  if (m_cost < 0)
+    return;
+
+  m_cost = m_cost_scaled.to_int ();
+}
+
+double
+comp_cost::get_frequency ()
+{
+  return m_frequency.to_double ();
+}
+
+const sreal
+comp_cost::scale_cost (int cost)
+{
+  return m_frequency * cost;
+}
+
+bool
+comp_cost::has_frequency ()
+{
+  return m_frequency != sreal (0);
+}
+
 comp_cost
 comp_cost::get_infinite ()
 {
@@ -5022,18 +5103,21 @@ get_computation_cost_at (struct ivopts_data *data,
      (symbol/var1/const parts may be omitted).  If we are looking for an
      address, find the cost of addressing this.  */
   if (address_p)
-    return cost + get_address_cost (symbol_present, var_present,
-				    offset, ratio, cstepi,
-				    mem_mode,
-				    TYPE_ADDR_SPACE (TREE_TYPE (utype)),
-				    speed, stmt_is_after_inc, can_autoinc);
+    {
+      cost += get_address_cost (symbol_present, var_present,
+				offset, ratio, cstepi,
+				mem_mode,
+				TYPE_ADDR_SPACE (TREE_TYPE (utype)),
+				speed, stmt_is_after_inc, can_autoinc);
+      goto ret;
+    }
 
   /* Otherwise estimate the costs for computing the expression.  */
   if (!symbol_present && !var_present && !offset)
     {
       if (ratio != 1)
 	cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
-      return cost;
+      goto ret;
     }
 
   /* Symbol + offset should be compile-time computable so consider that they
@@ -5052,7 +5136,8 @@ get_computation_cost_at (struct ivopts_data *data,
   aratio = ratio > 0 ? ratio : -ratio;
   if (aratio != 1)
     cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
-  return cost;
+
+  goto ret;
 
 fallback:
   if (can_autoinc)
@@ -5068,8 +5153,13 @@ fallback:
     if (address_p)
       comp = build_simple_mem_ref (comp);
 
-    return comp_cost (computation_cost (comp, speed), 0);
+    cost = comp_cost (computation_cost (comp, speed), 0);
   }
+
+ret:
+  cost.calculate_scaled_cost (at->bb->frequency,
+			      data->current_loop->header->frequency);
+  return cost;
 }
 
 /* Determines the cost of the computation by that USE is expressed
@@ -5879,16 +5969,19 @@ determine_group_iv_costs (struct ivopts_data *data)
 	  group = data->vgroups[i];
 
 	  fprintf (dump_file, "Group %d:\n", i);
-	  fprintf (dump_file, "  cand\tcost\tcompl.\tdepends on\n");
+	  fprintf (dump_file, "  cand\tcost\tscaled\tfreq\tcompl.\t"
+		   "depends on\n");
 	  for (j = 0; j < group->n_map_members; j++)
 	    {
 	      if (!group->cost_map[j].cand
 		  || group->cost_map[j].cost.infinite_cost_p ())
 		continue;
 
-	      fprintf (dump_file, "  %d\t%d\t%d\t",
+	      fprintf (dump_file, "  %d\t%d\t%2.2f\t%2.2f\t%d\t",
 		       group->cost_map[j].cand->id,
 		       group->cost_map[j].cost.get_cost (),
+		       group->cost_map[j].cost.get_cost_scaled (),
+		       group->cost_map[j].cost.get_frequency (),
 		       group->cost_map[j].cost.get_complexity ());
 	      if (group->cost_map[j].depends_on)
 		bitmap_print (dump_file,
@@ -5903,6 +5996,19 @@ determine_group_iv_costs (struct ivopts_data *data)
 	}
       fprintf (dump_file, "\n");
     }
+
+  for (i = 0; i < data->vgroups.length (); i++)
+    {
+      group = data->vgroups[i];
+      for (j = 0; j < group->n_map_members; j++)
+	{
+	  if (!group->cost_map[j].cand
+	      || group->cost_map[j].cost.infinite_cost_p ())
+	    continue;
+
+	  group->cost_map[j].cost.propagate_scaled_cost ();
+	}
+    }
 }
 
 /* Determines cost of the candidate CAND.  */
-- 
2.8.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/3] IVOPTS: support profiling
  2016-04-29 11:58 [PATCH 0/3] IVOPTS: support profiling marxin
                   ` (2 preceding siblings ...)
  2016-04-29 11:58 ` [PATCH 2/3] Add profiling support for IVOPTS marxin
@ 2016-05-03  9:28 ` Bin.Cheng
  3 siblings, 0 replies; 34+ messages in thread
From: Bin.Cheng @ 2016-05-03  9:28 UTC (permalink / raw)
  To: marxin; +Cc: gcc-patches List

On Fri, Apr 29, 2016 at 12:56 PM, marxin <mliska@suse.cz> wrote:
> Hello.
>
> As profile-guided optimization can provide very useful information
> about basic block frequencies within a loop, following patch set leverages
> that information. It speeds up a single benchmark from upcoming SPECv6
> suite by 20% (-O2 -fprofile-generate/-fprofile-use) and I think it can
> also improve others (currently measuring numbers for PGO).
>
> Idea is quite simple, where each cost (belonging to a BB) is
> multiplied by (bb_frequency / header_frequency), which suppress IV uses
> in basic blocks with a low frequency.
>
> The patch set can bootstrap on ppc64le-linux-gnu (and also
> x86_64-linux-gnu) and no new regression is introduced.
>
> Ready for trunk?
Hi Martin,
Thanks for working on this.  I will first measure it on AArch64 and
read it in details.

Thanks,
bin
> Thanks,
> Martin
>
> marxin (3):
>   Encapsulate comp_cost within a class with methods.
>   Add profiling support for IVOPTS
>   Enhance dumps of IVOPTS
>
>  gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C |   2 +-
>  gcc/tree-ssa-loop-ivopts.c               | 690 ++++++++++++++++++++++---------
>  2 files changed, 491 insertions(+), 201 deletions(-)
>
> --
> 2.8.1
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/3] Enhance dumps of IVOPTS
  2016-04-29 11:58 ` [PATCH 3/3] Enhance dumps of IVOPTS marxin
@ 2016-05-06  9:19   ` Martin Liška
  2016-05-09  9:47     ` Richard Biener
  0 siblings, 1 reply; 34+ messages in thread
From: Martin Liška @ 2016-05-06  9:19 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jan Hubicka

Hi.

Honza asked me to explain the change more verbosely.
The patch simply enhances the verbose dump of IVOPTS so that
the # of iterations is printed. Apart from that, it also prints
the invariant expressions that are used whenever the algorithm
improves the set of candidates it is considering.

The main motivation for doing this was that the optimization sometimes
considers a constant integer to be an invariant expression (Bin Cheng
is working on removing these) and that the cost model considers both
IVs and invariant expressions (IEs) to occupy a register. That is not
ideal, and it sometimes tends to introduce more IVs than one would expect.
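
To illustrate with the numbers from the dump below: iv_ca_recount_cost in the
patch computes the cost of a set as cand_use_cost + cand_cost plus a size term
derived from n_regs + num_used_inv_expr. A minimal sketch, assuming plain ints
instead of comp_cost and treating the size term as an opaque input
(recount_cost_sketch is a hypothetical name, not a GCC function):

```cpp
#include <cassert>

// Hypothetical mirror of iv_ca_recount_cost.  SIZE_COST stands for the
// result of ivopts_global_cost_for_size (data, n_regs + num_used_inv_expr),
// whose internals are not modelled here.
inline int
recount_cost_sketch (int cand_use_cost, int cand_cost, int size_cost)
{
  assert (size_cost >= 0);
  return cand_use_cost + cand_cost + size_cost;
}
```

With cand_group_cost 10 and cand_cost 11, a total of 27 leaves 6 for the size
term, which is the part that grows with every IV and invariant expression
counted as a register.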

=== New format ===:
Improved to:
  cost: 27 (complexity 2)
  cand_cost: 11
  cand_group_cost: 10 (complexity 2)
  candidates: 3, 5
   group:0 --> iv_cand:5, cost=(2,0)
   group:1 --> iv_cand:5, cost=(4,1)
   group:2 --> iv_cand:5, cost=(4,1)
   group:3 --> iv_cand:3, cost=(0,0)
   group:4 --> iv_cand:3, cost=(0,0)
  invariants 1, 6
  used invariant expressions:
   inv_expr:3: 	((sizetype) _976 - (sizetype) _922) * 4
   inv_expr:6: 	((sizetype) _1335 - (sizetype) _922) * 4


Original cost 27 (complexity 2)

Final cost 27 (complexity 2)

Selected IV set for loop 96 at original.f90:820, 5 avg niters, 2 expressions, 2 IVs:

=== Before ===:

Improved to:
  cost: 27 (complexity 2)
  cand_cost: 11
  cand_group_cost: 10 (complexity 2)
  candidates: 3, 5
   group:0 --> iv_cand:5, cost=(2,0)
   group:1 --> iv_cand:5, cost=(4,1)
   group:2 --> iv_cand:5, cost=(4,1)
   group:3 --> iv_cand:3, cost=(0,0)
   group:4 --> iv_cand:3, cost=(0,0)
  invariants 1, 6

Original cost 27 (complexity 2)

Final cost 27 (complexity 2)

Selected IV set for loop 96 at original.f90:820, 2 IVs:


Martin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/3] Enhance dumps of IVOPTS
  2016-05-06  9:19   ` Martin Liška
@ 2016-05-09  9:47     ` Richard Biener
  2016-05-10 13:16       ` Bin.Cheng
  0 siblings, 1 reply; 34+ messages in thread
From: Richard Biener @ 2016-05-09  9:47 UTC (permalink / raw)
  To: Martin Liška; +Cc: GCC Patches, Jan Hubicka

On Fri, May 6, 2016 at 11:19 AM, Martin Liška <mliska@suse.cz> wrote:
> Hi.
>
> Honza asked me to explain the change more verbosely.
> The patch simply enhances the verbose dump of IVOPTS so that
> the # of iterations is printed. Apart from that, it also prints
> the invariant expressions that are used whenever the algorithm
> improves the set of candidates it is considering.
>
> The main motivation for doing this was that the optimization sometimes
> considers a constant integer to be an invariant expression (Bin Cheng
> is working on removing these) and that the cost model considers both
> IVs and invariant expressions (IEs) to occupy a register. That is not
> ideal, and it sometimes tends to introduce more IVs than one would expect.
>
> === New format ===:
> Improved to:
>   cost: 27 (complexity 2)
>   cand_cost: 11
>   cand_group_cost: 10 (complexity 2)
>   candidates: 3, 5
>    group:0 --> iv_cand:5, cost=(2,0)
>    group:1 --> iv_cand:5, cost=(4,1)
>    group:2 --> iv_cand:5, cost=(4,1)
>    group:3 --> iv_cand:3, cost=(0,0)
>    group:4 --> iv_cand:3, cost=(0,0)
>   invariants 1, 6
>   used invariant expressions:
>    inv_expr:3:  ((sizetype) _976 - (sizetype) _922) * 4
>    inv_expr:6:  ((sizetype) _1335 - (sizetype) _922) * 4
>
>
> Original cost 27 (complexity 2)
>
> Final cost 27 (complexity 2)
>
> Selected IV set for loop 96 at original.f90:820, 5 avg niters, 2 expressions, 2 IVs:
>
> === Before ===:
>
> Improved to:
>   cost: 27 (complexity 2)
>   cand_cost: 11
>   cand_group_cost: 10 (complexity 2)
>   candidates: 3, 5
>    group:0 --> iv_cand:5, cost=(2,0)
>    group:1 --> iv_cand:5, cost=(4,1)
>    group:2 --> iv_cand:5, cost=(4,1)
>    group:3 --> iv_cand:3, cost=(0,0)
>    group:4 --> iv_cand:3, cost=(0,0)
>   invariants 1, 6
>
> Original cost 27 (complexity 2)
>
> Final cost 27 (complexity 2)
>
> Selected IV set for loop 96 at original.f90:820, 2 IVs:

But it slows down compile time just for enhanced dump files.  Can you
make the new hash-map conditional on dumping?

Richard.

>
> Martin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/3] Enhance dumps of IVOPTS
  2016-05-09  9:47     ` Richard Biener
@ 2016-05-10 13:16       ` Bin.Cheng
  2016-05-11 14:18         ` Martin Liška
  2016-05-12 12:14         ` Martin Liška
  0 siblings, 2 replies; 34+ messages in thread
From: Bin.Cheng @ 2016-05-10 13:16 UTC (permalink / raw)
  To: Richard Biener; +Cc: Martin Liška, GCC Patches, Jan Hubicka

On Mon, May 9, 2016 at 10:46 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, May 6, 2016 at 11:19 AM, Martin Liška <mliska@suse.cz> wrote:
>> Hi.
>>
>> Honza asked me to explain the change more verbosely.
>> The patch simply enhances the verbose dump of IVOPTS so that
>> the # of iterations is printed. Apart from that, it also prints
>> the invariant expressions that are used whenever the algorithm
>> improves the set of candidates it is considering.
>>
>> The main motivation for doing this was that the optimization sometimes
>> considers a constant integer to be an invariant expression (Bin Cheng
>> is working on removing these) and that the cost model considers both
>> IVs and invariant expressions (IEs) to occupy a register. That is not
>> ideal, and it sometimes tends to introduce more IVs than one would expect.
>>
>> === New format ===:
>> Improved to:
>>   cost: 27 (complexity 2)
>>   cand_cost: 11
>>   cand_group_cost: 10 (complexity 2)
>>   candidates: 3, 5
>>    group:0 --> iv_cand:5, cost=(2,0)
>>    group:1 --> iv_cand:5, cost=(4,1)
>>    group:2 --> iv_cand:5, cost=(4,1)
>>    group:3 --> iv_cand:3, cost=(0,0)
>>    group:4 --> iv_cand:3, cost=(0,0)
>>   invariants 1, 6
>>   used invariant expressions:
>>    inv_expr:3:  ((sizetype) _976 - (sizetype) _922) * 4
>>    inv_expr:6:  ((sizetype) _1335 - (sizetype) _922) * 4
>>
>>
>> Original cost 27 (complexity 2)
>>
>> Final cost 27 (complexity 2)
>>
>> Selected IV set for loop 96 at original.f90:820, 5 avg niters, 2 expressions, 2 IVs:
>>
>> === Before ===:
>>
>> Improved to:
>>   cost: 27 (complexity 2)
>>   cand_cost: 11
>>   cand_group_cost: 10 (complexity 2)
>>   candidates: 3, 5
>>    group:0 --> iv_cand:5, cost=(2,0)
>>    group:1 --> iv_cand:5, cost=(4,1)
>>    group:2 --> iv_cand:5, cost=(4,1)
>>    group:3 --> iv_cand:3, cost=(0,0)
>>    group:4 --> iv_cand:3, cost=(0,0)
>>   invariants 1, 6
>>
>> Original cost 27 (complexity 2)
>>
>> Final cost 27 (complexity 2)
>>
>> Selected IV set for loop 96 at original.f90:820, 2 IVs:
>
> But it slows down compile time just for enhanced dump files.  Can you
> make the new hash-map conditional on dumping?
Hi,

Another way is to remove the use of id for struct iv_inv_expr_ent once
and for all.  We can change iv_ca.used_inv_expr and cost_pair.inv_expr_id
to pointers, rename iv_inv_expr_ent.id to count, and use it to record
the reference count in iv_ca.  That way the if-statement on dump_file
can be avoided.  I also think it simplifies the current code a bit.  For
now, there are id <-> struct maps for different structures in IVOPT,
which makes the code less straightforward.

Thanks,
bin
>
> Richard.
>
>>
>> Martin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/3] Enhance dumps of IVOPTS
  2016-05-10 13:16       ` Bin.Cheng
@ 2016-05-11 14:18         ` Martin Liška
  2016-05-12 12:14         ` Martin Liška
  1 sibling, 0 replies; 34+ messages in thread
From: Martin Liška @ 2016-05-11 14:18 UTC (permalink / raw)
  To: Bin.Cheng, Richard Biener; +Cc: GCC Patches, Jan Hubicka

On 05/10/2016 03:16 PM, Bin.Cheng wrote:
> Another way is to remove the use of id for struct iv_inv_expr_ent once
> and for all.  We can change iv_ca.used_inv_expr and cost_pair.inv_expr_id
> to pointers, rename iv_inv_expr_ent.id to count, and use it to record
> the reference count in iv_ca.  That way the if-statement on dump_file
> can be avoided.  I also think it simplifies the current code a bit.  For
> now, there are id <-> struct maps for different structures in IVOPT,
> which makes the code less straightforward.

Sounds good to me. I will re-implement the dump enhancement in the suggested manner.

Martin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/3] Enhance dumps of IVOPTS
  2016-05-10 13:16       ` Bin.Cheng
  2016-05-11 14:18         ` Martin Liška
@ 2016-05-12 12:14         ` Martin Liška
  2016-05-12 13:51           ` Bin.Cheng
  1 sibling, 1 reply; 34+ messages in thread
From: Martin Liška @ 2016-05-12 12:14 UTC (permalink / raw)
  To: Bin.Cheng, Richard Biener; +Cc: GCC Patches, Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 1394 bytes --]

On 05/10/2016 03:16 PM, Bin.Cheng wrote:
> Another way is to remove the use of id for struct iv_inv_expr_ent once
> and for all.  We can change iv_ca.used_inv_expr and cost_pair.inv_expr_id
> to pointers, rename iv_inv_expr_ent.id to count, and use it to record
> the reference count in iv_ca.  That way the if-statement on dump_file
> can be avoided.  I also think it simplifies the current code a bit.  For
> now, there are id <-> struct maps for different structures in IVOPT,
> which makes the code less straightforward.

Hi.

I'm sending the second version of the patch. I tried to follow your advice, but
because an iv_inv_expr_ent can simultaneously belong to multiple iv_cas,
putting a counter into iv_inv_expr_ent does not work. Instead, I've
decided to replace used_inv_expr with a hash_map that contains the used inv_exprs,
where the value of the map is the number of usages.
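
To illustrate the counting scheme described above, here is a minimal sketch. It
uses std::unordered_map as a stand-in for GCC's hash_map, and the names
inv_expr_ent, assignment, add_use, remove_use and n_used are hypothetical, not
taken from the patch. The point is only that each assignment keeps its own
use counts, so a shared entry needs no counter of its own.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for GCC's iv_inv_expr_ent; only the id matters here.
struct inv_expr_ent
{
  int id;
};

// Hypothetical stand-in for the part of iv_ca that tracks invariant
// expressions.  Each assignment owns its own map, so one entry can be
// counted independently by several assignments.
struct assignment
{
  std::unordered_map<const inv_expr_ent *, unsigned> used_inv_exprs;

  // Record one more use of ENT (similar in role to iv_ca_set_cp).
  void add_use (const inv_expr_ent *ent)
  {
    if (ent != nullptr)
      ++used_inv_exprs[ent];
  }

  // Drop one use of ENT and forget the entry once its count reaches zero
  // (similar in role to iv_ca_set_no_cp).
  void remove_use (const inv_expr_ent *ent)
  {
    if (ent == nullptr)
      return;
    auto it = used_inv_exprs.find (ent);
    assert (it != used_inv_exprs.end ());
    if (--it->second == 0)
      used_inv_exprs.erase (it);
  }

  // Number of distinct expressions in use; no separate counter is needed.
  std::size_t n_used () const
  {
    return used_inv_exprs.size ();
  }
};
```

With this layout, the number of distinct expressions is simply the size of the
map, which is what the cost computation needs.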

Further questions:
+ iv_inv_expr_ent::id can now be removed, as it's used only for dumps such as:
Group 0:
  cand	cost	scaled	freq	compl.	depends on
  5	2	2.00	1.000	
  6	4	4.00	1.001	 inv_expr:0
  7	4	4.00	1.001	 inv_expr:1
  8	4	4.00	1.001	 inv_expr:2

That can be replaced with print_generic_expr, but I think using ids makes the dump
output clearer.
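
A small sketch of the trade-off, assuming a hypothetical inv_expr record that
holds an id and the text print_generic_expr would produce (format_ref and
format_legend are illustrative names, not functions from the patch). Table
cells carry only the short id, and each expression is spelled out once in a
legend:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an invariant expression: a numeric id plus the
// text that a pretty-printer would produce for it.
struct inv_expr
{
  int id;
  std::string text;
};

// Format one cost-table cell using only the short id.
std::string format_ref (const inv_expr &e)
{
  assert (e.id >= 0);  // Ids are assumed to be non-negative.
  return " inv_expr:" + std::to_string (e.id);
}

// Format a legend that spells out each expression exactly once.
std::string format_legend (const std::vector<inv_expr> &exprs)
{
  std::string out;
  for (const inv_expr &e : exprs)
    out += "   inv_expr:" + std::to_string (e.id) + ": \t" + e.text + "\n";
  return out;
}
```

This keeps the per-candidate rows at a fixed width even when an expression is
long.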

+ As check_GNU_style.sh reported multiple 8-space issues (eight spaces that should be a tab)
in hunks I've touched, I decided to fix all such issues in the file. I hope that's fine.

I'm going to test the patch.
Thoughts?

Martin

[-- Attachment #2: 0003-Enhance-dumps-of-IVOPTS.patch --]
[-- Type: text/x-patch, Size: 33232 bytes --]

From ce02c80c053c2a8a63ce6e87f5779a8dc5f470ee Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Mon, 25 Apr 2016 14:29:01 +0200
Subject: [PATCH 3/3] Enhance dumps of IVOPTS

gcc/ChangeLog:

2016-05-12  Martin Liska  <mliska@suse.cz>

	* tree-ssa-loop-ivopts.c (avg_loop_niter): Fix coding style.
	(struct cost_pair): Replace inv_expr_id with a direct pointer
	to an iv_inv_expr_ent.
	(struct iv_inv_expr_ent): Add comment for struct fields.
	(struct iv_ca): Remove used_inv_expr and num_used_inv_expr and
	replace them with a hash_map called used_inv_exprs.
	(niter_for_exit): Fix coding style.
	(determine_base_object): Likewise.
	(alloc_iv): Likewise.
	(find_interesting_uses_outside): Likewise.
	(add_candidate_1): Likewise.
	(add_standard_iv_candidates): Likewise.
	(set_group_iv_cost): Use inv_expr instead of inv_expr_id.
	(prepare_decl_rtl): Fix coding style.
	(get_address_cost): Likewise.
	(get_shiftadd_cost): Likewise.
	(force_expr_to_var_cost): Likewise.
	(compare_aff_trees): Likewise.
	(get_expr_id): Return iv_inv_expr_ent * instead of inv_expr_id.
	(get_loop_invariant_expr_id): Likewise.
	(get_computation_cost_at):
	(get_computation_cost): Replace usage of inv_expr_id (int) with
	inv_expr (iv_inv_expr_ent *).
	(determine_group_iv_cost_generic): Likewise.
	(determine_group_iv_cost_address): Likewise.
	(iv_period): Fix coding style.
	(iv_elimination_compare_lt): Likewise.
	(may_eliminate_iv): Likewise.
	(determine_group_iv_cost_cond): Replace usage of inv_expr_id (int) with
	inv_expr (iv_inv_expr_ent *).
	(determine_group_iv_costs): Likewise.
	(iv_ca_recount_cost): Use used_inv_exprs to determine # of
	used invariant expressions.
	(iv_ca_set_remove_invariants): Fix coding style.
	(iv_ca_set_no_cp): Use newly added hash_map.
	(iv_ca_set_add_invariants): Likewise.
	(iv_ca_set_cp): Likewise.
	(iv_ca_new): Initialize the newly added hash_map.
	(iv_ca_free): Delete it.
	(iv_ca_dump): Fix coding style and dump used invariant
	expressions.
	(iv_ca_extend): Fix coding style.
	(try_add_cand_for): Likewise.
	(create_new_ivs): Display information about # of avg niters and
	# of used invariant expressions.
	(rewrite_use_compare): Fix coding style.

gcc/testsuite/ChangeLog:

2016-04-29  Martin Liska  <mliska@suse.cz>

	* g++.dg/tree-ssa/ivopts-3.C: Change test-case to follow
	the new format of dump output.
---
 gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C |   2 +-
 gcc/tree-ssa-loop-ivopts.c               | 378 ++++++++++++++++---------------
 2 files changed, 201 insertions(+), 179 deletions(-)

diff --git a/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C b/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
index 6194e9d..eb72581 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
@@ -72,4 +72,4 @@ int main ( int , char** ) {
 
 // Verify that on x86_64 and i?86 we use a single IV for the innermost loop
 
-// { dg-final { scan-tree-dump "Selected IV set for loop \[0-9\]* at \[^ \]*:64, 1 IVs" "ivopts" { target x86_64-*-* i?86-*-* } } }
+// { dg-final { scan-tree-dump "Selected IV set for loop \[0-9\]* at \[^ \]*:64, 3 avg niters, 1 expressions, 1 IVs" "ivopts" { target x86_64-*-* i?86-*-* } } }
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 17af590..5a48db2 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -130,7 +130,7 @@ avg_loop_niter (struct loop *loop)
     {
       niter = max_stmt_executions_int (loop);
       if (niter == -1 || niter > AVG_LOOP_NITER (loop))
-        return AVG_LOOP_NITER (loop);
+	return AVG_LOOP_NITER (loop);
     }
 
   return niter;
@@ -485,6 +485,9 @@ comp_cost::get_no_cost ()
   return comp_cost ();
 }
 
+/* Forward declaration.  */
+struct iv_inv_expr_ent;
+
 /* The candidate - cost pair.  */
 struct cost_pair
 {
@@ -496,7 +499,7 @@ struct cost_pair
 			   the final value of the iv.  For iv elimination,
 			   the new bound to compare with.  */
   enum tree_code comp;	/* For iv elimination, the comparison.  */
-  int inv_expr_id;      /* Loop invariant expression id.  */
+  iv_inv_expr_ent *inv_expr; /* Loop invariant expression.  */
 };
 
 /* Use.  */
@@ -608,10 +611,14 @@ iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
 }
 
 /* Loop invariant expression hashtable entry.  */
+
 struct iv_inv_expr_ent
 {
+  /* Tree expression of the entry.  */
   tree expr;
+  /* Unique identifier.  */
   int id;
+  /* Hash value.  */
   hashval_t hash;
 };
 
@@ -744,12 +751,8 @@ struct iv_ca
   /* Number of times each invariant is used.  */
   unsigned *n_invariant_uses;
 
-  /* The array holding the number of uses of each loop
-     invariant expressions created by ivopt.  */
-  unsigned *used_inv_expr;
-
-  /* The number of created loop invariants.  */
-  unsigned num_used_inv_expr;
+  /* Hash map from each used invariant expression to its number of uses.  */
+  hash_map <iv_inv_expr_ent *, unsigned> *used_inv_exprs;
 
   /* Total cost of the assignment.  */
   comp_cost cost;
@@ -1141,8 +1144,8 @@ niter_for_exit (struct ivopts_data *data, edge exit)
   if (!slot)
     {
       /* Try to determine number of iterations.  We cannot safely work with ssa
-         names that appear in phi nodes on abnormal edges, so that we do not
-         create overlapping life ranges for them (PR 27283).  */
+	 names that appear in phi nodes on abnormal edges, so that we do not
+	 create overlapping life ranges for them (PR 27283).  */
       desc = XNEW (struct tree_niter_desc);
       if (!number_of_iterations_exit (data->current_loop,
 				      exit, desc, true)
@@ -1231,7 +1234,7 @@ determine_base_object (tree expr)
 	return determine_base_object (TREE_OPERAND (base, 0));
 
       return fold_convert (ptr_type_node,
-		           build_fold_addr_expr (base));
+			   build_fold_addr_expr (base));
 
     case POINTER_PLUS_EXPR:
       return determine_base_object (TREE_OPERAND (expr, 0));
@@ -1290,7 +1293,7 @@ alloc_iv (struct ivopts_data *data, tree base, tree step,
      By doing this:
        1) More accurate cost can be computed for address expressions;
        2) Duplicate candidates won't be created for bases in different
-          forms, like &a[0] and &a.  */
+	  forms, like &a[0] and &a.  */
   STRIP_NOPS (expr);
   if ((TREE_CODE (expr) == ADDR_EXPR && !DECL_P (TREE_OPERAND (expr, 0)))
       || contain_complex_addr_expr (expr))
@@ -2566,7 +2569,7 @@ find_interesting_uses_outside (struct ivopts_data *data, edge exit)
       phi = psi.phi ();
       def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
       if (!virtual_operand_p (def))
-        find_interesting_uses_op (data, def);
+	find_interesting_uses_op (data, def);
     }
 }
 
@@ -3086,8 +3089,8 @@ add_candidate_1 (struct ivopts_data *data,
 
       if (operand_equal_p (base, cand->iv->base, 0)
 	  && operand_equal_p (step, cand->iv->step, 0)
-          && (TYPE_PRECISION (TREE_TYPE (base))
-              == TYPE_PRECISION (TREE_TYPE (cand->iv->base))))
+	  && (TYPE_PRECISION (TREE_TYPE (base))
+	      == TYPE_PRECISION (TREE_TYPE (cand->iv->base))))
 	break;
     }
 
@@ -3237,14 +3240,14 @@ add_standard_iv_candidates (struct ivopts_data *data)
 
   /* The same for a double-integer type if it is still fast enough.  */
   if (TYPE_PRECISION
-        (long_integer_type_node) > TYPE_PRECISION (integer_type_node)
+	(long_integer_type_node) > TYPE_PRECISION (integer_type_node)
       && TYPE_PRECISION (long_integer_type_node) <= BITS_PER_WORD)
     add_candidate (data, build_int_cst (long_integer_type_node, 0),
 		   build_int_cst (long_integer_type_node, 1), true, NULL);
 
   /* The same for a double-integer type if it is still fast enough.  */
   if (TYPE_PRECISION
-        (long_long_integer_type_node) > TYPE_PRECISION (long_integer_type_node)
+	(long_long_integer_type_node) > TYPE_PRECISION (long_integer_type_node)
       && TYPE_PRECISION (long_long_integer_type_node) <= BITS_PER_WORD)
     add_candidate (data, build_int_cst (long_long_integer_type_node, 0),
 		   build_int_cst (long_long_integer_type_node, 1), true, NULL);
@@ -3572,7 +3575,7 @@ static void
 set_group_iv_cost (struct ivopts_data *data,
 		   struct iv_group *group, struct iv_cand *cand,
 		   comp_cost cost, bitmap depends_on, tree value,
-		   enum tree_code comp, int inv_expr_id)
+		   enum tree_code comp, iv_inv_expr_ent *inv_expr)
 {
   unsigned i, s;
 
@@ -3589,7 +3592,7 @@ set_group_iv_cost (struct ivopts_data *data,
       group->cost_map[cand->id].depends_on = depends_on;
       group->cost_map[cand->id].value = value;
       group->cost_map[cand->id].comp = comp;
-      group->cost_map[cand->id].inv_expr_id = inv_expr_id;
+      group->cost_map[cand->id].inv_expr = inv_expr;
       return;
     }
 
@@ -3610,7 +3613,7 @@ found:
   group->cost_map[i].depends_on = depends_on;
   group->cost_map[i].value = value;
   group->cost_map[i].comp = comp;
-  group->cost_map[i].inv_expr_id = inv_expr_id;
+  group->cost_map[i].inv_expr = inv_expr;
 }
 
 /* Gets cost of (GROUP, CAND) pair.  */
@@ -3697,7 +3700,7 @@ prepare_decl_rtl (tree *expr_p, int *ws, void *data)
 	continue;
       obj = *expr_p;
       if (DECL_P (obj) && HAS_RTL_P (obj) && !DECL_RTL_SET_P (obj))
-        x = produce_memory_decl_rtl (obj, regno);
+	x = produce_memory_decl_rtl (obj, regno);
       break;
 
     case SSA_NAME:
@@ -4151,7 +4154,7 @@ get_address_cost (bool symbol_present, bool var_present,
 	    }
 	}
       if (i == -1)
-        off = 0;
+	off = 0;
       data->max_offset = off;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
@@ -4283,9 +4286,9 @@ get_address_cost (bool symbol_present, bool var_present,
 	 However, the symbol will have to be loaded in any case before the
 	 loop (and quite likely we have it in register already), so it does not
 	 make much sense to penalize them too heavily.  So make some final
-         tweaks for the SYMBOL_PRESENT modes:
+	 tweaks for the SYMBOL_PRESENT modes:
 
-         If VAR_PRESENT is false, and the mode obtained by changing symbol to
+	 If VAR_PRESENT is false, and the mode obtained by changing symbol to
 	 var is cheaper, use this mode with small penalty.
 	 If VAR_PRESENT is true, try whether the mode with
 	 SYMBOL_PRESENT = false is cheaper even with cost of addition, and
@@ -4402,7 +4405,7 @@ get_address_cost (bool symbol_present, bool var_present,
 
 static bool
 get_shiftadd_cost (tree expr, machine_mode mode, comp_cost cost0,
-                   comp_cost cost1, tree mult, bool speed, comp_cost *cost)
+		   comp_cost cost1, tree mult, bool speed, comp_cost *cost)
 {
   comp_cost res;
   tree op1 = TREE_OPERAND (expr, 1);
@@ -4424,10 +4427,10 @@ get_shiftadd_cost (tree expr, machine_mode mode, comp_cost cost0,
   /* If the target has a cheap shift-and-add or shift-and-sub instruction,
      use that in preference to a shift insn followed by an add insn.  */
   sa_cost = (TREE_CODE (expr) != MINUS_EXPR
-             ? shiftadd_cost (speed, mode, m)
-             : (mult_in_op1
-                ? shiftsub1_cost (speed, mode, m)
-                : shiftsub0_cost (speed, mode, m)));
+	     ? shiftadd_cost (speed, mode, m)
+	     : (mult_in_op1
+		? shiftsub1_cost (speed, mode, m)
+		: shiftsub0_cost (speed, mode, m)));
 
   res = comp_cost (MIN (as_cost, sa_cost), 0);
   res += (mult_in_op1 ? cost0 : cost1);
@@ -4559,20 +4562,20 @@ force_expr_to_var_cost (tree expr, bool speed)
     case NEGATE_EXPR:
       cost = comp_cost (add_cost (speed, mode), 0);
       if (TREE_CODE (expr) != NEGATE_EXPR)
-        {
-          tree mult = NULL_TREE;
-          comp_cost sa_cost;
-          if (TREE_CODE (op1) == MULT_EXPR)
-            mult = op1;
-          else if (TREE_CODE (op0) == MULT_EXPR)
-            mult = op0;
-
-          if (mult != NULL_TREE
-              && cst_and_fits_in_hwi (TREE_OPERAND (mult, 1))
-              && get_shiftadd_cost (expr, mode, cost0, cost1, mult,
-                                    speed, &sa_cost))
-            return sa_cost;
-        }
+	{
+	  tree mult = NULL_TREE;
+	  comp_cost sa_cost;
+	  if (TREE_CODE (op1) == MULT_EXPR)
+	    mult = op1;
+	  else if (TREE_CODE (op0) == MULT_EXPR)
+	    mult = op0;
+
+	  if (mult != NULL_TREE
+	      && cst_and_fits_in_hwi (TREE_OPERAND (mult, 1))
+	      && get_shiftadd_cost (expr, mode, cost0, cost1, mult,
+				    speed, &sa_cost))
+	    return sa_cost;
+	}
       break;
 
     CASE_CONVERT:
@@ -4786,17 +4789,17 @@ compare_aff_trees (aff_tree *aff1, aff_tree *aff2)
   for (i = 0; i < aff1->n; i++)
     {
       if (aff1->elts[i].coef != aff2->elts[i].coef)
-        return false;
+	return false;
 
       if (!operand_equal_p (aff1->elts[i].val, aff2->elts[i].val, 0))
-        return false;
+	return false;
     }
   return true;
 }
 
 /* Stores EXPR in DATA->inv_expr_tab, and assigns it an inv_expr_id.  */
 
-static int
+static iv_inv_expr_ent *
 get_expr_id (struct ivopts_data *data, tree expr)
 {
   struct iv_inv_expr_ent ent;
@@ -4806,13 +4809,13 @@ get_expr_id (struct ivopts_data *data, tree expr)
   ent.hash = iterative_hash_expr (expr, 0);
   slot = data->inv_expr_tab->find_slot (&ent, INSERT);
   if (*slot)
-    return (*slot)->id;
+    return *slot;
 
   *slot = XNEW (struct iv_inv_expr_ent);
   (*slot)->expr = expr;
   (*slot)->hash = ent.hash;
   (*slot)->id = data->max_inv_expr_id++;
-  return (*slot)->id;
+  return *slot;
 }
 
 /* Returns the pseudo expr id if expression UBASE - RATIO * CBASE
@@ -4820,10 +4823,10 @@ get_expr_id (struct ivopts_data *data, tree expr)
    ADDRESS_P is a flag indicating if the expression is for address
    computation.  */
 
-static int
+static iv_inv_expr_ent *
 get_loop_invariant_expr_id (struct ivopts_data *data, tree ubase,
-                            tree cbase, HOST_WIDE_INT ratio,
-                            bool address_p)
+			    tree cbase, HOST_WIDE_INT ratio,
+			    bool address_p)
 {
   aff_tree ubase_aff, cbase_aff;
   tree expr, ub, cb;
@@ -4835,7 +4838,7 @@ get_loop_invariant_expr_id (struct ivopts_data *data, tree ubase,
 
   if ((TREE_CODE (ubase) == INTEGER_CST)
       && (TREE_CODE (cbase) == INTEGER_CST))
-    return -1;
+    return NULL;
 
   /* Strips the constant part. */
   if (TREE_CODE (ubase) == PLUS_EXPR
@@ -4843,7 +4846,7 @@ get_loop_invariant_expr_id (struct ivopts_data *data, tree ubase,
       || TREE_CODE (ubase) == POINTER_PLUS_EXPR)
     {
       if (TREE_CODE (TREE_OPERAND (ubase, 1)) == INTEGER_CST)
-        ubase = TREE_OPERAND (ubase, 0);
+	ubase = TREE_OPERAND (ubase, 0);
     }
 
   /* Strips the constant part. */
@@ -4852,60 +4855,60 @@ get_loop_invariant_expr_id (struct ivopts_data *data, tree ubase,
       || TREE_CODE (cbase) == POINTER_PLUS_EXPR)
     {
       if (TREE_CODE (TREE_OPERAND (cbase, 1)) == INTEGER_CST)
-        cbase = TREE_OPERAND (cbase, 0);
+	cbase = TREE_OPERAND (cbase, 0);
     }
 
   if (address_p)
     {
       if (((TREE_CODE (ubase) == SSA_NAME)
-           || (TREE_CODE (ubase) == ADDR_EXPR
-               && is_gimple_min_invariant (ubase)))
-          && (TREE_CODE (cbase) == INTEGER_CST))
-        return -1;
+	   || (TREE_CODE (ubase) == ADDR_EXPR
+	       && is_gimple_min_invariant (ubase)))
+	  && (TREE_CODE (cbase) == INTEGER_CST))
+	return NULL;
 
       if (((TREE_CODE (cbase) == SSA_NAME)
-           || (TREE_CODE (cbase) == ADDR_EXPR
-               && is_gimple_min_invariant (cbase)))
-          && (TREE_CODE (ubase) == INTEGER_CST))
-        return -1;
+	   || (TREE_CODE (cbase) == ADDR_EXPR
+	       && is_gimple_min_invariant (cbase)))
+	  && (TREE_CODE (ubase) == INTEGER_CST))
+	return NULL;
     }
 
   if (ratio == 1)
     {
       if (operand_equal_p (ubase, cbase, 0))
-        return -1;
+	return NULL;
 
       if (TREE_CODE (ubase) == ADDR_EXPR
-          && TREE_CODE (cbase) == ADDR_EXPR)
-        {
-          tree usym, csym;
-
-          usym = TREE_OPERAND (ubase, 0);
-          csym = TREE_OPERAND (cbase, 0);
-          if (TREE_CODE (usym) == ARRAY_REF)
-            {
-              tree ind = TREE_OPERAND (usym, 1);
-              if (TREE_CODE (ind) == INTEGER_CST
-                  && tree_fits_shwi_p (ind)
-                  && tree_to_shwi (ind) == 0)
-                usym = TREE_OPERAND (usym, 0);
-            }
-          if (TREE_CODE (csym) == ARRAY_REF)
-            {
-              tree ind = TREE_OPERAND (csym, 1);
-              if (TREE_CODE (ind) == INTEGER_CST
-                  && tree_fits_shwi_p (ind)
-                  && tree_to_shwi (ind) == 0)
-                csym = TREE_OPERAND (csym, 0);
-            }
-          if (operand_equal_p (usym, csym, 0))
-            return -1;
-        }
+	  && TREE_CODE (cbase) == ADDR_EXPR)
+	{
+	  tree usym, csym;
+
+	  usym = TREE_OPERAND (ubase, 0);
+	  csym = TREE_OPERAND (cbase, 0);
+	  if (TREE_CODE (usym) == ARRAY_REF)
+	    {
+	      tree ind = TREE_OPERAND (usym, 1);
+	      if (TREE_CODE (ind) == INTEGER_CST
+		  && tree_fits_shwi_p (ind)
+		  && tree_to_shwi (ind) == 0)
+		usym = TREE_OPERAND (usym, 0);
+	    }
+	  if (TREE_CODE (csym) == ARRAY_REF)
+	    {
+	      tree ind = TREE_OPERAND (csym, 1);
+	      if (TREE_CODE (ind) == INTEGER_CST
+		  && tree_fits_shwi_p (ind)
+		  && tree_to_shwi (ind) == 0)
+		csym = TREE_OPERAND (csym, 0);
+	    }
+	  if (operand_equal_p (usym, csym, 0))
+	    return NULL;
+	}
       /* Now do more complex comparison  */
       tree_to_aff_combination (ubase, TREE_TYPE (ubase), &ubase_aff);
       tree_to_aff_combination (cbase, TREE_TYPE (cbase), &cbase_aff);
       if (compare_aff_trees (&ubase_aff, &cbase_aff))
-        return -1;
+	return NULL;
     }
 
   tree_to_aff_combination (ub, TREE_TYPE (ub), &ubase_aff);
@@ -4932,7 +4935,7 @@ get_computation_cost_at (struct ivopts_data *data,
 			 struct iv_use *use, struct iv_cand *cand,
 			 bool address_p, bitmap *depends_on, gimple *at,
 			 bool *can_autoinc,
-                         int *inv_expr_id)
+			 iv_inv_expr_ent **inv_expr)
 {
   tree ubase = use->iv->base, ustep = use->iv->step;
   tree cbase, cstep;
@@ -5033,17 +5036,17 @@ get_computation_cost_at (struct ivopts_data *data,
 
       /* Check to see if any adjustment is needed.  */
       if (cstepi == 0 && stmt_is_after_inc)
-        {
-          aff_tree real_cbase_aff;
-          aff_tree cstep_aff;
+	{
+	  aff_tree real_cbase_aff;
+	  aff_tree cstep_aff;
 
-          tree_to_aff_combination (cbase, TREE_TYPE (real_cbase),
-                                   &real_cbase_aff);
-          tree_to_aff_combination (cstep, TREE_TYPE (cstep), &cstep_aff);
+	  tree_to_aff_combination (cbase, TREE_TYPE (real_cbase),
+				   &real_cbase_aff);
+	  tree_to_aff_combination (cstep, TREE_TYPE (cstep), &cstep_aff);
 
-          aff_combination_add (&real_cbase_aff, &cstep_aff);
-          real_cbase = aff_combination_to_tree (&real_cbase_aff);
-        }
+	  aff_combination_add (&real_cbase_aff, &cstep_aff);
+	  real_cbase = aff_combination_to_tree (&real_cbase_aff);
+	}
 
       cost = difference_cost (data,
 			      ubase, real_cbase,
@@ -5087,13 +5090,13 @@ get_computation_cost_at (struct ivopts_data *data,
   /* Record setup cost in scrach field.  */
   cost.set_scratch (cost.get_cost ());
 
-  if (inv_expr_id && depends_on && *depends_on)
+  if (inv_expr && depends_on && *depends_on)
     {
-      *inv_expr_id =
-          get_loop_invariant_expr_id (data, ubase, cbase, ratio, address_p);
+      *inv_expr
+	= get_loop_invariant_expr_id (data, ubase, cbase, ratio, address_p);
       /* Clear depends on.  */
-      if (*inv_expr_id != -1)
-        bitmap_clear (*depends_on);
+      if (*inv_expr != NULL)
+	bitmap_clear (*depends_on);
     }
 
   /* If we are after the increment, the value of the candidate is higher by
@@ -5175,11 +5178,11 @@ static comp_cost
 get_computation_cost (struct ivopts_data *data,
 		      struct iv_use *use, struct iv_cand *cand,
 		      bool address_p, bitmap *depends_on,
-                      bool *can_autoinc, int *inv_expr_id)
+		      bool *can_autoinc, iv_inv_expr_ent **inv_expr)
 {
   return get_computation_cost_at (data,
 				  use, cand, address_p, depends_on, use->stmt,
-				  can_autoinc, inv_expr_id);
+				  can_autoinc, inv_expr);
 }
 
 /* Determines cost of computing the use in GROUP with CAND in a generic
@@ -5190,7 +5193,7 @@ determine_group_iv_cost_generic (struct ivopts_data *data,
 				 struct iv_group *group, struct iv_cand *cand)
 {
   comp_cost cost;
-  int inv_expr_id = -1;
+  iv_inv_expr_ent *inv_expr = NULL;
   bitmap depends_on = NULL;
   struct iv_use *use = group->vuses[0];
 
@@ -5202,10 +5205,10 @@ determine_group_iv_cost_generic (struct ivopts_data *data,
     cost = comp_cost::get_no_cost ();
   else
     cost = get_computation_cost (data, use, cand, false,
-				 &depends_on, NULL, &inv_expr_id);
+				 &depends_on, NULL, &inv_expr);
 
   set_group_iv_cost (data, group, cand, cost, depends_on,
-		     NULL_TREE, ERROR_MARK, inv_expr_id);
+		     NULL_TREE, ERROR_MARK, inv_expr);
   return !cost.infinite_cost_p ();
 }
 
@@ -5218,12 +5221,12 @@ determine_group_iv_cost_address (struct ivopts_data *data,
   unsigned i;
   bitmap depends_on;
   bool can_autoinc, first = true;
-  int inv_expr_id = -1;
+  iv_inv_expr_ent *inv_expr = NULL;
   struct iv_use *use = group->vuses[0];
   comp_cost sum_cost = comp_cost::get_no_cost (), cost;
 
   cost = get_computation_cost (data, use, cand, true,
-			       &depends_on, &can_autoinc, &inv_expr_id);
+			       &depends_on, &can_autoinc, &inv_expr);
 
   sum_cost = cost;
   if (!sum_cost.infinite_cost_p () && cand->ainc_use == use)
@@ -5271,7 +5274,7 @@ determine_group_iv_cost_address (struct ivopts_data *data,
       sum_cost += cost;
     }
   set_group_iv_cost (data, group, cand, sum_cost, depends_on,
-		     NULL_TREE, ERROR_MARK, inv_expr_id);
+		     NULL_TREE, ERROR_MARK, inv_expr);
 
   return !sum_cost.infinite_cost_p ();
 }
@@ -5327,8 +5330,8 @@ iv_period (struct iv *iv)
   pow2div = num_ending_zeros (step);
 
   period = build_low_bits_mask (type,
-                                (TYPE_PRECISION (type)
-                                 - tree_to_uhwi (pow2div)));
+				(TYPE_PRECISION (type)
+				 - tree_to_uhwi (pow2div)));
 
   return period;
 }
@@ -5461,7 +5464,7 @@ difference_cannot_overflow_p (struct ivopts_data *data, tree base, tree offset)
 
 static bool
 iv_elimination_compare_lt (struct ivopts_data *data,
-                           struct iv_cand *cand, enum tree_code *comp_p,
+			   struct iv_cand *cand, enum tree_code *comp_p,
 			   struct tree_niter_desc *niter)
 {
   tree cand_type, a, b, mbz, nit_type = TREE_TYPE (niter->niter), offset;
@@ -5510,10 +5513,10 @@ iv_elimination_compare_lt (struct ivopts_data *data,
 
       /* Handle b < a + 1.  */
       if (TREE_CODE (op1) == PLUS_EXPR && integer_onep (TREE_OPERAND (op1, 1)))
-        {
-          a = TREE_OPERAND (op1, 0);
-          b = TREE_OPERAND (mbz, 0);
-        }
+	{
+	  a = TREE_OPERAND (op1, 0);
+	  b = TREE_OPERAND (mbz, 0);
+	}
       else
 	return false;
     }
@@ -5599,15 +5602,15 @@ may_eliminate_iv (struct ivopts_data *data,
     {
       /* See cand_value_at.  */
       if (stmt_after_increment (loop, cand, use->stmt))
-        {
-          if (!tree_int_cst_lt (desc->niter, period))
-            return false;
-        }
+	{
+	  if (!tree_int_cst_lt (desc->niter, period))
+	    return false;
+	}
       else
-        {
-          if (tree_int_cst_lt (period, desc->niter))
-            return false;
-        }
+	{
+	  if (tree_int_cst_lt (period, desc->niter))
+	    return false;
+	}
     }
 
   /* If not, and if this is the only possible exit of the loop, see whether
@@ -5619,22 +5622,23 @@ may_eliminate_iv (struct ivopts_data *data,
 
       max_niter = desc->max;
       if (stmt_after_increment (loop, cand, use->stmt))
-        max_niter += 1;
+	max_niter += 1;
       period_value = wi::to_widest (period);
       if (wi::gtu_p (max_niter, period_value))
-        {
-          /* See if we can take advantage of inferred loop bound information.  */
-          if (data->loop_single_exit_p)
-            {
-              if (!max_loop_iterations (loop, &max_niter))
-                return false;
-              /* The loop bound is already adjusted by adding 1.  */
-              if (wi::gtu_p (max_niter, period_value))
-                return false;
-            }
-          else
-            return false;
-        }
+	{
+	  /* See if we can take advantage of inferred loop bound
+	     information.  */
+	  if (data->loop_single_exit_p)
+	    {
+	      if (!max_loop_iterations (loop, &max_niter))
+		return false;
+	      /* The loop bound is already adjusted by adding 1.  */
+	      if (wi::gtu_p (max_niter, period_value))
+		return false;
+	    }
+	  else
+	    return false;
+	}
     }
 
   cand_value_at (loop, cand, use->stmt, desc->niter, &bnd);
@@ -5690,7 +5694,7 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
   bitmap depends_on_elim = NULL, depends_on_express = NULL, depends_on;
   comp_cost elim_cost, express_cost, cost, bound_cost;
   bool ok;
-  int elim_inv_expr_id = -1, express_inv_expr_id = -1, inv_expr_id;
+  iv_inv_expr_ent *elim_inv_expr = NULL, *express_inv_expr = NULL, *inv_expr;
   tree *control_var, *bound_cst;
   enum tree_code comp = ERROR_MARK;
   struct iv_use *use = group->vuses[0];
@@ -5710,10 +5714,10 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
 	 that both 'base' and 'n' will be live during the loop.	 More likely,
 	 'base + n' will be loop invariant, resulting in only one live value
 	 during the loop.  So in that case we clear depends_on_elim and set
-        elim_inv_expr_id instead.  */
+	 elim_inv_expr instead.  */
       if (depends_on_elim && bitmap_count_bits (depends_on_elim) > 1)
 	{
-	  elim_inv_expr_id = get_expr_id (data, bound);
+	  elim_inv_expr = get_expr_id (data, bound);
 	  bitmap_clear (depends_on_elim);
 	}
       /* The bound is a loop invariant, so it will be only computed
@@ -5743,7 +5747,7 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
 
   express_cost = get_computation_cost (data, use, cand, false,
 				       &depends_on_express, NULL,
-                                       &express_inv_expr_id);
+				       &express_inv_expr);
   fd_ivopts_data = data;
   walk_tree (&cmp_iv->base, find_depends, &depends_on_express, NULL);
 
@@ -5761,7 +5765,7 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
       cost = elim_cost;
       depends_on = depends_on_elim;
       depends_on_elim = NULL;
-      inv_expr_id = elim_inv_expr_id;
+      inv_expr = elim_inv_expr;
     }
   else
     {
@@ -5770,11 +5774,11 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
       depends_on_express = NULL;
       bound = NULL_TREE;
       comp = ERROR_MARK;
-      inv_expr_id = express_inv_expr_id;
+      inv_expr = express_inv_expr;
     }
 
   set_group_iv_cost (data, group, cand, cost,
-		     depends_on, bound, comp, inv_expr_id);
+		     depends_on, bound, comp, inv_expr);
 
   if (depends_on_elim)
     BITMAP_FREE (depends_on_elim);
@@ -5988,9 +5992,9 @@ determine_group_iv_costs (struct ivopts_data *data)
 	      if (group->cost_map[j].depends_on)
 		bitmap_print (dump_file,
 			      group->cost_map[j].depends_on, "","");
-	      if (group->cost_map[j].inv_expr_id != -1)
+	      if (group->cost_map[j].inv_expr != NULL)
 		fprintf (dump_file, " inv_expr:%d",
-			 group->cost_map[j].inv_expr_id);
+			 group->cost_map[j].inv_expr->id);
 	      fprintf (dump_file, "\n");
 	    }
 
@@ -6201,7 +6205,8 @@ iv_ca_recount_cost (struct ivopts_data *data, struct iv_ca *ivs)
   cost += ivs->cand_cost;
 
   cost += ivopts_global_cost_for_size (data,
-				       ivs->n_regs + ivs->num_used_inv_expr);
+				       ivs->n_regs
+				       + ivs->used_inv_exprs->elements ());
 
   ivs->cost = cost;
 }
@@ -6221,7 +6226,7 @@ iv_ca_set_remove_invariants (struct iv_ca *ivs, bitmap invs)
     {
       ivs->n_invariant_uses[iid]--;
       if (ivs->n_invariant_uses[iid] == 0)
-        ivs->n_regs--;
+	ivs->n_regs--;
     }
 }
 
@@ -6259,11 +6264,12 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca *ivs,
 
   iv_ca_set_remove_invariants (ivs, cp->depends_on);
 
-  if (cp->inv_expr_id != -1)
+  if (cp->inv_expr != NULL)
     {
-      ivs->used_inv_expr[cp->inv_expr_id]--;
-      if (ivs->used_inv_expr[cp->inv_expr_id] == 0)
-        ivs->num_used_inv_expr--;
+      unsigned *slot = ivs->used_inv_exprs->get (cp->inv_expr);
+      --(*slot);
+      if (*slot == 0)
+	ivs->used_inv_exprs->remove (cp->inv_expr);
     }
   iv_ca_recount_cost (data, ivs);
 }
@@ -6283,7 +6289,7 @@ iv_ca_set_add_invariants (struct iv_ca *ivs, bitmap invs)
     {
       ivs->n_invariant_uses[iid]++;
       if (ivs->n_invariant_uses[iid] == 1)
-        ivs->n_regs++;
+	ivs->n_regs++;
     }
 }
 
@@ -6323,12 +6329,11 @@ iv_ca_set_cp (struct ivopts_data *data, struct iv_ca *ivs,
       ivs->cand_use_cost += cp->cost;
       iv_ca_set_add_invariants (ivs, cp->depends_on);
 
-      if (cp->inv_expr_id != -1)
-        {
-          ivs->used_inv_expr[cp->inv_expr_id]++;
-          if (ivs->used_inv_expr[cp->inv_expr_id] == 1)
-            ivs->num_used_inv_expr++;
-        }
+      if (cp->inv_expr != NULL)
+	{
+	  unsigned *slot = &ivs->used_inv_exprs->get_or_insert (cp->inv_expr);
+	  ++(*slot);
+	}
       iv_ca_recount_cost (data, ivs);
     }
 }
@@ -6537,9 +6542,8 @@ iv_ca_new (struct ivopts_data *data)
   nw->cand_use_cost = comp_cost::get_no_cost ();
   nw->cand_cost = 0;
   nw->n_invariant_uses = XCNEWVEC (unsigned, data->max_inv_id + 1);
+  nw->used_inv_exprs = new hash_map <iv_inv_expr_ent *, unsigned> (13);
   nw->cost = comp_cost::get_no_cost ();
-  nw->used_inv_expr = XCNEWVEC (unsigned, data->max_inv_expr_id + 1);
-  nw->num_used_inv_expr = 0;
 
   return nw;
 }
@@ -6553,7 +6557,7 @@ iv_ca_free (struct iv_ca **ivs)
   free ((*ivs)->n_cand_uses);
   BITMAP_FREE ((*ivs)->cands);
   free ((*ivs)->n_invariant_uses);
-  free ((*ivs)->used_inv_expr);
+  delete ((*ivs)->used_inv_exprs);
   free (*ivs);
   *ivs = NULL;
 }
@@ -6579,7 +6583,7 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
       struct iv_group *group = data->vgroups[i];
       struct cost_pair *cp = iv_ca_cand_for_group (ivs, group);
       if (cp)
-        fprintf (file, "   group:%d --> iv_cand:%d, cost=(%d,%d)\n",
+	fprintf (file, "   group:%d --> iv_cand:%d, cost=(%d,%d)\n",
 		 group->id, cp->cand->id, cp->cost.get_cost (),
 		 cp->cost.get_complexity ());
       else
@@ -6592,6 +6596,20 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
 	fprintf (file, "%s%d", pref, i);
 	pref = ", ";
       }
+
+  if (ivs->used_inv_exprs->elements () > 0)
+    {
+      fprintf (file, "\n  used invariant expressions:\n");
+      for (hash_map<iv_inv_expr_ent *, unsigned>::iterator it
+	   = ivs->used_inv_exprs->begin (); it != ivs->used_inv_exprs->end ();
+	   ++it)
+	{
+	  fprintf (file, "   inv_expr:%d: \t", (*it).first->id);
+	  print_generic_expr (file, (*it).first->expr, TDF_SLIM);
+	  fprintf (file, "\n");
+	}
+    }
+
   fprintf (file, "\n\n");
 }
 
@@ -6628,7 +6646,7 @@ iv_ca_extend (struct ivopts_data *data, struct iv_ca *ivs,
 	continue;
 
       if (!min_ncand && !cheaper_cost_pair (new_cp, old_cp))
-        continue;
+	continue;
 
       *delta = iv_ca_delta_add (group, old_cp, new_cp, *delta);
     }
@@ -6932,7 +6950,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
 	continue;
 
       if (iv_ca_cand_used_p (ivs, cand))
-        continue;
+	continue;
 
       cp = get_group_iv_cost (data, group, cand);
       if (!cp)
@@ -6940,7 +6958,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
 
       iv_ca_set_cp (data, ivs, group, cp);
       act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL,
-                               true);
+			       true);
       iv_ca_set_no_cp (data, ivs, group);
       act_delta = iv_ca_delta_add (group, NULL, cp, act_delta);
 
@@ -7253,12 +7271,16 @@ create_new_ivs (struct ivopts_data *data, struct iv_ca *set)
       if (data->loop_loc != UNKNOWN_LOCATION)
 	fprintf (dump_file, " at %s:%d", LOCATION_FILE (data->loop_loc),
 		 LOCATION_LINE (data->loop_loc));
+      fprintf (dump_file, ", %lu avg niters",
+	       avg_loop_niter (data->current_loop));
+      fprintf (dump_file, ", %lu expressions",
+	       set->used_inv_exprs->elements ());
       fprintf (dump_file, ", %lu IVs:\n", bitmap_count_bits (set->cands));
       EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
-        {
-          cand = data->vcands[i];
-          dump_cand (dump_file, cand);
-        }
+	{
+	  cand = data->vcands[i];
+	  dump_cand (dump_file, cand);
+	}
       fprintf (dump_file, "\n");
     }
 }
@@ -7513,10 +7535,10 @@ rewrite_use_compare (struct ivopts_data *data,
       gimple_seq stmts;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
-        {
-          fprintf (dump_file, "Replacing exit test: ");
-          print_gimple_stmt (dump_file, use->stmt, 0, TDF_SLIM);
-        }
+	{
+	  fprintf (dump_file, "Replacing exit test: ");
+	  print_gimple_stmt (dump_file, use->stmt, 0, TDF_SLIM);
+	}
       compare = cp->comp;
       bound = unshare_expr (fold_convert (var_type, bound));
       op = force_gimple_operand (bound, &stmts, true, NULL_TREE);
-- 
2.8.2



* Re: [PATCH 3/3] Enhance dumps of IVOPTS
  2016-05-12 12:14         ` Martin Liška
@ 2016-05-12 13:51           ` Bin.Cheng
  2016-05-12 16:42             ` Martin Liška
  0 siblings, 1 reply; 34+ messages in thread
From: Bin.Cheng @ 2016-05-12 13:51 UTC (permalink / raw)
  To: Martin Liška; +Cc: Richard Biener, GCC Patches, Jan Hubicka

On Thu, May 12, 2016 at 1:13 PM, Martin Liška <mliska@suse.cz> wrote:
> On 05/10/2016 03:16 PM, Bin.Cheng wrote:
>> Another way is to remove the use of id for struct iv_inv_expr_ent once
>> for all.  We can change iv_ca.used_inv_expr and cost_pair.inv_expr_id
>> to pointers, and rename iv_inv_expr_ent.id to count and use this to
>> record reference number in iv_ca.  This if-statement on dump_file can
>> be saved.  Also I think it simplifies current code a bit.  For now,
>> there are id <-> struct maps for different structures in IVOPT which
>> make it not straightforward.
>
> Hi.
>
> I'm sending the second version of the patch. I tried to follow your advice, but
> because an iv_inv_expr_ent can simultaneously belong to multiple iv_cas,
> putting a counter into iv_inv_expr_ent does not work. Instead, I've
> decided to replace used_inv_expr with a hash_map that contains the used inv_exprs
> and where the value of the map is the number of usages.
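
To illustrate why the counter has to live in the set rather than in the
entry, here is a minimal standalone sketch. It uses std::unordered_map as a
stand-in for GCC's hash_map, and the names inv_expr and cand_set are
hypothetical, not taken from the patch:

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for iv_inv_expr_ent; only the id is modelled.
struct inv_expr
{
  int id;
};

// Hypothetical stand-in for iv_ca: every candidate assignment owns its
// own usage counters, so one inv_expr can be shared by several sets.
struct cand_set
{
  std::unordered_map<const inv_expr *, unsigned> used;

  // Record one more use of E in this set.
  void add_use (const inv_expr *e) { ++used[e]; }

  // Drop one use of E; forget E once nothing in this set uses it.
  void remove_use (const inv_expr *e)
  {
    auto it = used.find (e);
    if (it != used.end () && --it->second == 0)
      used.erase (it);
  }

  // Number of distinct invariant expressions used by this set.
  std::size_t n_used () const { return used.size (); }
};
```

With a single counter stored in the entry itself, removing a use from one
set would also change what every other set sees. With a map per set, the
counts stay independent.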
>
> Further questions:
> + iv_inv_expr_ent::id can be now removed as it's used just for purpose of dumps
> Group 0:
>   cand  cost    scaled  freq    compl.  depends on
>   5     2       2.00    1.000
>   6     4       4.00    1.001    inv_expr:0
>   7     4       4.00    1.001    inv_expr:1
>   8     4       4.00    1.001    inv_expr:2
>
> That can be replaced with print_generic_expr, but I think using ids makes the dump
> output more clear.
I am okay with keeping id.  Could you please dump all inv_exprs in a
single section like
<Invariant Exprs>:
inv_expr 0: print_generic_expr
inv_expr 1: ...

Then only dump the id afterwards?

>
> + As check_GNU_style.sh reported multiple 8 spaces issues in hunks I've touched, I decided
> to fix all 8 spaces issues. Hope it's fine.
>
> I'm going to test the patch.
> Thoughts?

Some comments on the patch embedded.

>
> +/* Forward declaration.  */
Not necessary.
> +struct iv_inv_expr_ent;
> +

>
>  /* Stores EXPR in DATA->inv_expr_tab, and assigns it an inv_expr_id.  */
>
> -static int
> +static iv_inv_expr_ent *
>  get_expr_id (struct ivopts_data *data, tree expr)
We are not returning an id any more, so maybe rename it to record_inv_expr or
something similar.

>  {
>    struct iv_inv_expr_ent ent;
> @@ -4806,13 +4809,13 @@ get_expr_id (struct ivopts_data *data, tree expr)
>    ent.hash = iterative_hash_expr (expr, 0);
>    slot = data->inv_expr_tab->find_slot (&ent, INSERT);
>    if (*slot)
> -    return (*slot)->id;
> +    return *slot;
>
>    *slot = XNEW (struct iv_inv_expr_ent);
>    (*slot)->expr = expr;
>    (*slot)->hash = ent.hash;
>    (*slot)->id = data->max_inv_expr_id++;
> -  return (*slot)->id;
> +  return *slot;
This could be changed to
  if (!*slot)
    {
      //new and insert
    }
  return *slot;
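
As a rough standalone illustration of that single-return shape (a sketch
only: std::unordered_map stands in for the GCC hash_table, and the names
entry, table and record are made up for the example):

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical entry: the expression text plus a sequential id.
struct entry
{
  std::string expr;
  int id;
};

// Hypothetical table that hands out one entry per distinct expression.
struct table
{
  std::unordered_map<std::string, std::unique_ptr<entry>> slots;
  int next_id = 0;

  // Find the entry for EXPR, creating it on first use.  There is a
  // single return at the end, whether or not the slot was empty.
  entry *record (const std::string &expr)
  {
    std::unique_ptr<entry> &slot = slots[expr];
    if (!slot)
      slot.reset (new entry{expr, next_id++});
    return slot.get ();
  }
};
```

Calling record twice with the same expression yields the same pointer, and
the id counter only advances when a new entry is created.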
>  }
>
>  /* Returns the pseudo expr id if expression UBASE - RATIO * CBASE
> @@ -4820,10 +4823,10 @@ get_expr_id (struct ivopts_data *data, tree expr)
>     ADDRESS_P is a flag indicating if the expression is for address
>     computation.  */
>
> -static int
> +static iv_inv_expr_ent *
>  get_loop_invariant_expr_id (struct ivopts_data *data, tree ubase,
> -                            tree cbase, HOST_WIDE_INT ratio,
> -                            bool address_p)
> +                tree cbase, HOST_WIDE_INT ratio,
> +                bool address_p)
Rename the function here too.
>  {

> @@ -5988,9 +5992,9 @@ determine_group_iv_costs (struct ivopts_data *data)
>            if (group->cost_map[j].depends_on)
>          bitmap_print (dump_file,
>                    group->cost_map[j].depends_on, "","");
> -          if (group->cost_map[j].inv_expr_id != -1)
> +        if (group->cost_map[j].inv_expr != NULL)
>          fprintf (dump_file, " inv_expr:%d",
> -             group->cost_map[j].inv_expr_id);
> +             group->cost_map[j].inv_expr->id);
Dump inv_expr in a separate column so that it won't appear under depends_on
in the dump.  Also put it before depends_on, which is a bitmap.

Since we are handling this patch before the other two, could you please make
it independent so it can be committed after the rework?

Thanks,
bin

>
> Martin


* Re: [PATCH 3/3] Enhance dumps of IVOPTS
  2016-05-12 13:51           ` Bin.Cheng
@ 2016-05-12 16:42             ` Martin Liška
  2016-05-13  9:43               ` Bin.Cheng
  0 siblings, 1 reply; 34+ messages in thread
From: Martin Liška @ 2016-05-12 16:42 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Richard Biener, GCC Patches, Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 5822 bytes --]

On 05/12/2016 03:51 PM, Bin.Cheng wrote:
> On Thu, May 12, 2016 at 1:13 PM, Martin Liška <mliska@suse.cz> wrote:
>> On 05/10/2016 03:16 PM, Bin.Cheng wrote:
>>> Another way is to remove the use of id for struct iv_inv_expr_ent once
>>> for all.  We can change iv_ca.used_inv_expr and cost_pair.inv_expr_id
>>> to pointers, and rename iv_inv_expr_ent.id to count and use this to
>>> record reference number in iv_ca.  This if-statement on dump_file can
>>> be saved.  Also I think it simplifies current code a bit.  For now,
>>> there are id <-> struct maps for different structures in IVOPT which
>>> make it not straightforward.
>>
>> Hi.
>>
>> I'm sending the second version of the patch. I tried to follow your advice, but
>> because an iv_inv_expr_ent can simultaneously belong to multiple iv_cas,
>> putting a counter into iv_inv_expr_ent does not work. Instead, I've
>> decided to replace used_inv_expr with a hash_map that contains the used inv_exprs
>> and where the value of the map is the number of usages.
>>
>> Further questions:
>> + iv_inv_expr_ent::id can be now removed as it's used just for purpose of dumps
>> Group 0:
>>   cand  cost    scaled  freq    compl.  depends on
>>   5     2       2.00    1.000
>>   6     4       4.00    1.001    inv_expr:0
>>   7     4       4.00    1.001    inv_expr:1
>>   8     4       4.00    1.001    inv_expr:2
>>
>> That can be replaced with print_generic_expr, but I think using ids makes the dump
>> output more clear.
> I am okay with keeping id.  Could you please dump all inv_exprs in a
> single section like
> <Invariant Exprs>:
> inv_expr 0: print_generic_expr
> inv_expr 1: ...
> 
> Then only dump the id afterwards?
> 

Sure, that would definitely be better.

The new dump format looks like this:

<Invariant Expressions>:
inv_expr 0: 	sudoku_351(D) + (sizetype) S.833_774 * 4
inv_expr 1: 	sudoku_351(D) + ((sizetype) S.833_774 * 4 + 18446744073709551580)
inv_expr 2: 	sudoku_351(D) + ((sizetype) S.833_774 + 72) * 4
inv_expr 3: 	sudoku_351(D) + ((sizetype) S.833_774 + 81) * 4
inv_expr 4: 	&A.832 + (sizetype) _377 * 4
inv_expr 5: 	&A.832 + ((sizetype) _377 * 4 + 18446744073709551612)
inv_expr 6: 	&A.832 + ((sizetype) _377 + 8) * 4
inv_expr 7: 	&A.832 + ((sizetype) _377 + 9) * 4

<Group-candidate Costs>:
Group 0:
  cand	cost	scaled	freq	compl.	depends on

...

Improved to:
  cost: 27 (complexity 2)
  cand_cost: 11
  cand_group_cost: 10 (complexity 2)
  candidates: 3, 5
   group:0 --> iv_cand:5, cost=(2,0)
   group:1 --> iv_cand:5, cost=(4,1)
   group:2 --> iv_cand:5, cost=(4,1)
   group:3 --> iv_cand:3, cost=(0,0)
   group:4 --> iv_cand:3, cost=(0,0)
  invariants 1, 6
  invariant expressions 6, 3

The only question here is that, as used_inv_exprs is stored in a hash_map,
the order of the dumped invariant expressions would not be stable. Is that a problem?
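
If a deterministic order turns out to matter, one possible approach is to
copy the keys into a vector and sort them by id before printing, similar in
spirit to the sort_iv_inv_expr_ent comparator in the attached patch. A
minimal standalone sketch, where std::unordered_map stands in for hash_map
and the names inv_expr and sorted_by_id are hypothetical:

```cpp
#include <algorithm>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for iv_inv_expr_ent; only the id is modelled.
struct inv_expr
{
  int id;
};

// Return the keys of USED ordered by ascending id, so that a dump no
// longer depends on the iteration order of the hash container.
std::vector<const inv_expr *>
sorted_by_id (const std::unordered_map<const inv_expr *, unsigned> &used)
{
  std::vector<const inv_expr *> keys;
  keys.reserve (used.size ());
  for (const auto &kv : used)
    keys.push_back (kv.first);

  std::sort (keys.begin (), keys.end (),
             [] (const inv_expr *a, const inv_expr *b)
             { return a->id < b->id; });
  return keys;
}
```

The dump loop would then walk the returned vector instead of iterating the
map directly.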

>>
>> + As check_GNU_style.sh reported multiple 8 spaces issues in hunks I've touched, I decided
>> to fix all 8 spaces issues. Hope it's fine.
>>
>> I'm going to test the patch.
>> Thoughts?
> 
> Some comments on the patch embedded.
> 
>>
>> +/* Forward declaration.  */
> Not necessary.
>> +struct iv_inv_expr_ent;
>> +

I think it's needed because struct cost_pair uses a pointer to iv_inv_expr_ent.
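
A small standalone sketch of the situation, with made-up names
(inv_expr_ent, cost_pair_sketch, id_or_minus_one) rather than the real
structures:

```cpp
// Forward declaration: enough for pointer members, because the compiler
// only needs to know that the name is a type, not its layout.
struct inv_expr_ent;

// Hypothetical, reduced version of a cost pair.
struct cost_pair_sketch
{
  int cost;
  inv_expr_ent *inv_expr;  // pointer to a still-incomplete type
};

// The full definition can follow later in the file.
struct inv_expr_ent
{
  int id;
};

// Members can only be accessed once the type is complete.
int
id_or_minus_one (const cost_pair_sketch &cp)
{
  return cp.inv_expr ? cp.inv_expr->id : -1;
}
```

Writing the member as `struct inv_expr_ent *inv_expr;` should also work
without the separate declaration, since the elaborated type specifier
introduces the name itself.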

> 
>>
>>  /* Stores EXPR in DATA->inv_expr_tab, and assigns it an inv_expr_id.  */
>>
>> -static int
>> +static iv_inv_expr_ent *
>>  get_expr_id (struct ivopts_data *data, tree expr)
> We are not returning id any more, maybe rename to record_inv_expr or else.

Done.

> 
>>  {
>>    struct iv_inv_expr_ent ent;
>> @@ -4806,13 +4809,13 @@ get_expr_id (struct ivopts_data *data, tree expr)
>>    ent.hash = iterative_hash_expr (expr, 0);
>>    slot = data->inv_expr_tab->find_slot (&ent, INSERT);
>>    if (*slot)
>> -    return (*slot)->id;
>> +    return *slot;
>>
>>    *slot = XNEW (struct iv_inv_expr_ent);
>>    (*slot)->expr = expr;
>>    (*slot)->hash = ent.hash;
>>    (*slot)->id = data->max_inv_expr_id++;
>> -  return (*slot)->id;
>> +  return *slot;
> This could be changed to
>   if (!*slot)
>     {
>       //new and insert
>     }
>   return *slot;

Also done.

>>  }
>>
>>  /* Returns the pseudo expr id if expression UBASE - RATIO * CBASE
>> @@ -4820,10 +4823,10 @@ get_expr_id (struct ivopts_data *data, tree expr)
>>     ADDRESS_P is a flag indicating if the expression is for address
>>     computation.  */
>>
>> -static int
>> +static iv_inv_expr_ent *
>>  get_loop_invariant_expr_id (struct ivopts_data *data, tree ubase,
>> -                            tree cbase, HOST_WIDE_INT ratio,
>> -                            bool address_p)
>> +                tree cbase, HOST_WIDE_INT ratio,
>> +                bool address_p)
> Rename function name here too.
>>  {
> 

Likewise.

>> @@ -5988,9 +5992,9 @@ determine_group_iv_costs (struct ivopts_data *data)
>>            if (group->cost_map[j].depends_on)
>>          bitmap_print (dump_file,
>>                    group->cost_map[j].depends_on, "","");
>> -          if (group->cost_map[j].inv_expr_id != -1)
>> +        if (group->cost_map[j].inv_expr != NULL)
>>          fprintf (dump_file, " inv_expr:%d",
>> -             group->cost_map[j].inv_expr_id);
>> +             group->cost_map[j].inv_expr->id);
> Dump inv_expr in another column thus it won't appear under depends_on
> in dump.  Also make it preceding depends_on which is a bitmap.

Sure, the output now looks like this:


<Group-candidate Costs>:
Group 0:
  cand	cost	compl.	inv.ex.	depends on
  0	9	0	0	2
  1	32	0		2
  2	42	0		2
  3	9	1	1	2
  4	42	0		2
  5	9	1	2	2
  6	9	0	0	2
  7	9	1	2	2
  8	9	1	3	2
  9	0	0		2

> 
> While we are on this one before the other two, could you please make
> this independent so it can be committed after rework?

It would be better to start with that one. I'm going to run the regression tests and
then we can install the patch.

Martin

> 
> Thanks,
> bin
> 
>>
>> Martin


[-- Attachment #2: 0001-Enhance-dumps-of-IVOPTS.patch --]
[-- Type: text/x-patch, Size: 37967 bytes --]

From a55fc8b79f325a75769fdb04ae185c5ac9d10476 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Thu, 12 May 2016 18:30:31 +0200
Subject: [PATCH] Enhance dumps of IVOPTS

gcc/ChangeLog:

2016-05-12  Martin Liska  <mliska@suse.cz>

	* tree-ssa-loop-ivopts.c (avg_loop_niter): Fix coding style.
	(struct cost_pair): Change inv_expr_id (int) to inv_expr
	(iv_inv_expr_ent *).
	(struct iv_inv_expr_ent): Comment struct fields.
	(sort_iv_inv_expr_ent): New function.
	(struct ivopts_data): Rename inv_expr_id to max_inv_expr_id.
	(struct iv_ca): Replace used_inv_expr and num_used_inv_expr with
	a hash_map between iv_inv_expr_ent and number of usages.
	(niter_for_exit): Fix coding style.
	(tree_ssa_iv_optimize_init): Use renamed variable.
	(determine_base_object): Fix coding style.
	(alloc_iv): Likewise.
	(find_interesting_uses_outside): Likewise.
	(add_candidate_1): Likewise.
	(add_standard_iv_candidates): Likewise.
	(set_group_iv_cost): Replace inv_expr_id with inv_expr.
	(prepare_decl_rtl): Fix coding style.
	(get_address_cost): Likewise.
	(get_shiftadd_cost): Likewise.
	(force_expr_to_var_cost): Likewise.
	(compare_aff_trees): Likewise.
	(get_expr_id): Rename to record_inv_expr and restructure the function.
	(get_loop_invariant_expr_id): Rename to
	get_loop_invariant_expr.
	(get_computation_cost_at): Replace usage of inv_expr_id with
	inv_expr.
	(get_computation_cost): Likewise.
	(determine_group_iv_cost_generic): Likewise.
	(determine_group_iv_cost_address): Likewise.
	(iv_period): Fix coding style.
	(iv_elimination_compare_lt): Likewise.
	(may_eliminate_iv): Likewise.
	(determine_group_iv_cost_cond): Replace usage of inv_expr_id with
	inv_expr.
	(determine_group_iv_costs): Dump invariant expressions.
	(iv_ca_recount_cost): Use the newly added hash_map.
	(iv_ca_set_remove_invariants): Fix coding style.
	(iv_ca_set_add_invariants): Fix coding style.
	(iv_ca_set_no_cp): Utilize the newly added hash_map for used
	invariants.
	(iv_ca_set_cp): Likewise.
	(iv_ca_new): Initialize the newly added hash_map and remove
	initialization of fields.
	(iv_ca_free): Delete the hash_map.
	(iv_ca_dump): Dump invariant expressions.
	(iv_ca_extend): Fix coding style.
	(try_add_cand_for): Likewise.
	(create_new_ivs): Dump information about # of avg iterations and
	# of used invariant expressions.
	(rewrite_use_compare): Fix coding style.
	(free_loop_data): Set default value for max_inv_expr_id.

gcc/testsuite/ChangeLog:

2016-05-12  Martin Liska  <mliska@suse.cz>

	* g++.dg/tree-ssa/ivopts-3.C: Change test-case to follow
	the new format of dump output.
---
 gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C |   2 +-
 gcc/tree-ssa-loop-ivopts.c               | 458 ++++++++++++++++++-------------
 2 files changed, 261 insertions(+), 199 deletions(-)

diff --git a/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C b/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
index 6194e9d..eb72581 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
@@ -72,4 +72,4 @@ int main ( int , char** ) {
 
 // Verify that on x86_64 and i?86 we use a single IV for the innermost loop
 
-// { dg-final { scan-tree-dump "Selected IV set for loop \[0-9\]* at \[^ \]*:64, 1 IVs" "ivopts" { target x86_64-*-* i?86-*-* } } }
+// { dg-final { scan-tree-dump "Selected IV set for loop \[0-9\]* at \[^ \]*:64, 3 avg niters, 1 expressions, 1 IVs" "ivopts" { target x86_64-*-* i?86-*-* } } }
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index b24cac4..2fe074b 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -129,7 +129,7 @@ avg_loop_niter (struct loop *loop)
     {
       niter = max_stmt_executions_int (loop);
       if (niter == -1 || niter > AVG_LOOP_NITER (loop))
-        return AVG_LOOP_NITER (loop);
+	return AVG_LOOP_NITER (loop);
     }
 
   return niter;
@@ -184,6 +184,9 @@ struct comp_cost
 static const comp_cost no_cost = {0, 0, 0};
 static const comp_cost infinite_cost = {INFTY, INFTY, INFTY};
 
+/* Forward declaration.  */
+struct iv_inv_expr_ent;
+
 /* The candidate - cost pair.  */
 struct cost_pair
 {
@@ -195,7 +198,7 @@ struct cost_pair
 			   the final value of the iv.  For iv elimination,
 			   the new bound to compare with.  */
   enum tree_code comp;	/* For iv elimination, the comparison.  */
-  int inv_expr_id;      /* Loop invariant expression id.  */
+  iv_inv_expr_ent *inv_expr; /* Loop invariant expression.  */
 };
 
 /* Use.  */
@@ -307,13 +310,36 @@ iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
 }
 
 /* Loop invariant expression hashtable entry.  */
+
 struct iv_inv_expr_ent
 {
+  /* Tree expression of the entry.  */
   tree expr;
+  /* Unique identifier.  */
   int id;
+  /* Hash value.  */
   hashval_t hash;
 };
 
+/* Sort iv_inv_expr_ent pair A and B by id field.  */
+
+static int
+sort_iv_inv_expr_ent (const void *a, const void *b)
+{
+  const iv_inv_expr_ent * const *e1 = (const iv_inv_expr_ent * const *) (a);
+  const iv_inv_expr_ent * const *e2 = (const iv_inv_expr_ent * const *) (b);
+
+  unsigned id1 = (*e1)->id;
+  unsigned id2 = (*e2)->id;
+
+  if (id1 < id2)
+    return -1;
+  else if (id1 > id2)
+    return 1;
+  else
+    return 0;
+}
+
 /* Hashtable helpers.  */
 
 struct iv_inv_expr_hasher : free_ptr_hash <iv_inv_expr_ent>
@@ -363,7 +389,7 @@ struct ivopts_data
   hash_table<iv_inv_expr_hasher> *inv_expr_tab;
 
   /* Loop invariant expression id.  */
-  int inv_expr_id;
+  int max_inv_expr_id;
 
   /* The bitmap of indices in version_info whose value was changed.  */
   bitmap relevant;
@@ -443,12 +469,8 @@ struct iv_ca
   /* Number of times each invariant is used.  */
   unsigned *n_invariant_uses;
 
-  /* The array holding the number of uses of each loop
-     invariant expressions created by ivopt.  */
-  unsigned *used_inv_expr;
-
-  /* The number of created loop invariants.  */
-  unsigned num_used_inv_expr;
+  /* Hash map from used invariant expressions to their number of uses.  */
+  hash_map <iv_inv_expr_ent *, unsigned> *used_inv_exprs;
 
   /* Total cost of the assignment.  */
   comp_cost cost;
@@ -840,8 +862,8 @@ niter_for_exit (struct ivopts_data *data, edge exit)
   if (!slot)
     {
       /* Try to determine number of iterations.  We cannot safely work with ssa
-         names that appear in phi nodes on abnormal edges, so that we do not
-         create overlapping life ranges for them (PR 27283).  */
+	 names that appear in phi nodes on abnormal edges, so that we do not
+	 create overlapping life ranges for them (PR 27283).  */
       desc = XNEW (struct tree_niter_desc);
       if (!number_of_iterations_exit (data->current_loop,
 				      exit, desc, true)
@@ -888,7 +910,7 @@ tree_ssa_iv_optimize_init (struct ivopts_data *data)
   data->vgroups.create (20);
   data->vcands.create (20);
   data->inv_expr_tab = new hash_table<iv_inv_expr_hasher> (10);
-  data->inv_expr_id = 0;
+  data->max_inv_expr_id = 0;
   data->name_expansion_cache = NULL;
   data->iv_common_cand_tab = new hash_table<iv_common_cand_hasher> (10);
   data->iv_common_cands.create (20);
@@ -930,7 +952,7 @@ determine_base_object (tree expr)
 	return determine_base_object (TREE_OPERAND (base, 0));
 
       return fold_convert (ptr_type_node,
-		           build_fold_addr_expr (base));
+			   build_fold_addr_expr (base));
 
     case POINTER_PLUS_EXPR:
       return determine_base_object (TREE_OPERAND (expr, 0));
@@ -989,7 +1011,7 @@ alloc_iv (struct ivopts_data *data, tree base, tree step,
      By doing this:
        1) More accurate cost can be computed for address expressions;
        2) Duplicate candidates won't be created for bases in different
-          forms, like &a[0] and &a.  */
+	  forms, like &a[0] and &a.  */
   STRIP_NOPS (expr);
   if ((TREE_CODE (expr) == ADDR_EXPR && !DECL_P (TREE_OPERAND (expr, 0)))
       || contain_complex_addr_expr (expr))
@@ -2265,7 +2287,7 @@ find_interesting_uses_outside (struct ivopts_data *data, edge exit)
       phi = psi.phi ();
       def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
       if (!virtual_operand_p (def))
-        find_interesting_uses_op (data, def);
+	find_interesting_uses_op (data, def);
     }
 }
 
@@ -2785,8 +2807,8 @@ add_candidate_1 (struct ivopts_data *data,
 
       if (operand_equal_p (base, cand->iv->base, 0)
 	  && operand_equal_p (step, cand->iv->step, 0)
-          && (TYPE_PRECISION (TREE_TYPE (base))
-              == TYPE_PRECISION (TREE_TYPE (cand->iv->base))))
+	  && (TYPE_PRECISION (TREE_TYPE (base))
+	      == TYPE_PRECISION (TREE_TYPE (cand->iv->base))))
 	break;
     }
 
@@ -2936,14 +2958,14 @@ add_standard_iv_candidates (struct ivopts_data *data)
 
   /* The same for a double-integer type if it is still fast enough.  */
   if (TYPE_PRECISION
-        (long_integer_type_node) > TYPE_PRECISION (integer_type_node)
+	(long_integer_type_node) > TYPE_PRECISION (integer_type_node)
       && TYPE_PRECISION (long_integer_type_node) <= BITS_PER_WORD)
     add_candidate (data, build_int_cst (long_integer_type_node, 0),
 		   build_int_cst (long_integer_type_node, 1), true, NULL);
 
   /* The same for a double-integer type if it is still fast enough.  */
   if (TYPE_PRECISION
-        (long_long_integer_type_node) > TYPE_PRECISION (long_integer_type_node)
+	(long_long_integer_type_node) > TYPE_PRECISION (long_integer_type_node)
       && TYPE_PRECISION (long_long_integer_type_node) <= BITS_PER_WORD)
     add_candidate (data, build_int_cst (long_long_integer_type_node, 0),
 		   build_int_cst (long_long_integer_type_node, 1), true, NULL);
@@ -3329,7 +3351,7 @@ static void
 set_group_iv_cost (struct ivopts_data *data,
 		   struct iv_group *group, struct iv_cand *cand,
 		   comp_cost cost, bitmap depends_on, tree value,
-		   enum tree_code comp, int inv_expr_id)
+		   enum tree_code comp, iv_inv_expr_ent *inv_expr)
 {
   unsigned i, s;
 
@@ -3346,7 +3368,7 @@ set_group_iv_cost (struct ivopts_data *data,
       group->cost_map[cand->id].depends_on = depends_on;
       group->cost_map[cand->id].value = value;
       group->cost_map[cand->id].comp = comp;
-      group->cost_map[cand->id].inv_expr_id = inv_expr_id;
+      group->cost_map[cand->id].inv_expr = inv_expr;
       return;
     }
 
@@ -3367,7 +3389,7 @@ found:
   group->cost_map[i].depends_on = depends_on;
   group->cost_map[i].value = value;
   group->cost_map[i].comp = comp;
-  group->cost_map[i].inv_expr_id = inv_expr_id;
+  group->cost_map[i].inv_expr = inv_expr;
 }
 
 /* Gets cost of (GROUP, CAND) pair.  */
@@ -3454,7 +3476,7 @@ prepare_decl_rtl (tree *expr_p, int *ws, void *data)
 	continue;
       obj = *expr_p;
       if (DECL_P (obj) && HAS_RTL_P (obj) && !DECL_RTL_SET_P (obj))
-        x = produce_memory_decl_rtl (obj, regno);
+	x = produce_memory_decl_rtl (obj, regno);
       break;
 
     case SSA_NAME:
@@ -3908,7 +3930,7 @@ get_address_cost (bool symbol_present, bool var_present,
 	    }
 	}
       if (i == -1)
-        off = 0;
+	off = 0;
       data->max_offset = off;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
@@ -4040,9 +4062,9 @@ get_address_cost (bool symbol_present, bool var_present,
 	 However, the symbol will have to be loaded in any case before the
 	 loop (and quite likely we have it in register already), so it does not
 	 make much sense to penalize them too heavily.  So make some final
-         tweaks for the SYMBOL_PRESENT modes:
+	 tweaks for the SYMBOL_PRESENT modes:
 
-         If VAR_PRESENT is false, and the mode obtained by changing symbol to
+	 If VAR_PRESENT is false, and the mode obtained by changing symbol to
 	 var is cheaper, use this mode with small penalty.
 	 If VAR_PRESENT is true, try whether the mode with
 	 SYMBOL_PRESENT = false is cheaper even with cost of addition, and
@@ -4159,7 +4181,7 @@ get_address_cost (bool symbol_present, bool var_present,
 
 static bool
 get_shiftadd_cost (tree expr, machine_mode mode, comp_cost cost0,
-                   comp_cost cost1, tree mult, bool speed, comp_cost *cost)
+		   comp_cost cost1, tree mult, bool speed, comp_cost *cost)
 {
   comp_cost res;
   tree op1 = TREE_OPERAND (expr, 1);
@@ -4181,10 +4203,10 @@ get_shiftadd_cost (tree expr, machine_mode mode, comp_cost cost0,
   /* If the target has a cheap shift-and-add or shift-and-sub instruction,
      use that in preference to a shift insn followed by an add insn.  */
   sa_cost = (TREE_CODE (expr) != MINUS_EXPR
-             ? shiftadd_cost (speed, mode, m)
-             : (mult_in_op1
-                ? shiftsub1_cost (speed, mode, m)
-                : shiftsub0_cost (speed, mode, m)));
+	     ? shiftadd_cost (speed, mode, m)
+	     : (mult_in_op1
+		? shiftsub1_cost (speed, mode, m)
+		: shiftsub0_cost (speed, mode, m)));
 
   res = new_cost (MIN (as_cost, sa_cost), 0);
   res = add_costs (res, mult_in_op1 ? cost0 : cost1);
@@ -4316,20 +4338,20 @@ force_expr_to_var_cost (tree expr, bool speed)
     case NEGATE_EXPR:
       cost = new_cost (add_cost (speed, mode), 0);
       if (TREE_CODE (expr) != NEGATE_EXPR)
-        {
-          tree mult = NULL_TREE;
-          comp_cost sa_cost;
-          if (TREE_CODE (op1) == MULT_EXPR)
-            mult = op1;
-          else if (TREE_CODE (op0) == MULT_EXPR)
-            mult = op0;
-
-          if (mult != NULL_TREE
-              && cst_and_fits_in_hwi (TREE_OPERAND (mult, 1))
-              && get_shiftadd_cost (expr, mode, cost0, cost1, mult,
-                                    speed, &sa_cost))
-            return sa_cost;
-        }
+	{
+	  tree mult = NULL_TREE;
+	  comp_cost sa_cost;
+	  if (TREE_CODE (op1) == MULT_EXPR)
+	    mult = op1;
+	  else if (TREE_CODE (op0) == MULT_EXPR)
+	    mult = op0;
+
+	  if (mult != NULL_TREE
+	      && cst_and_fits_in_hwi (TREE_OPERAND (mult, 1))
+	      && get_shiftadd_cost (expr, mode, cost0, cost1, mult,
+				    speed, &sa_cost))
+	    return sa_cost;
+	}
       break;
 
     CASE_CONVERT:
@@ -4543,18 +4565,18 @@ compare_aff_trees (aff_tree *aff1, aff_tree *aff2)
   for (i = 0; i < aff1->n; i++)
     {
       if (aff1->elts[i].coef != aff2->elts[i].coef)
-        return false;
+	return false;
 
       if (!operand_equal_p (aff1->elts[i].val, aff2->elts[i].val, 0))
-        return false;
+	return false;
     }
   return true;
 }
 
-/* Stores EXPR in DATA->inv_expr_tab, and assigns it an inv_expr_id.  */
+/* Stores EXPR in DATA->inv_expr_tab and returns its iv_inv_expr_ent.  */
 
-static int
-get_expr_id (struct ivopts_data *data, tree expr)
+static iv_inv_expr_ent *
+record_inv_expr (struct ivopts_data *data, tree expr)
 {
   struct iv_inv_expr_ent ent;
   struct iv_inv_expr_ent **slot;
@@ -4562,25 +4584,27 @@ get_expr_id (struct ivopts_data *data, tree expr)
   ent.expr = expr;
   ent.hash = iterative_hash_expr (expr, 0);
   slot = data->inv_expr_tab->find_slot (&ent, INSERT);
-  if (*slot)
-    return (*slot)->id;
 
-  *slot = XNEW (struct iv_inv_expr_ent);
-  (*slot)->expr = expr;
-  (*slot)->hash = ent.hash;
-  (*slot)->id = data->inv_expr_id++;
-  return (*slot)->id;
+  if (!*slot)
+    {
+      *slot = XNEW (struct iv_inv_expr_ent);
+      (*slot)->expr = expr;
+      (*slot)->hash = ent.hash;
+      (*slot)->id = data->max_inv_expr_id++;
+    }
+
+  return *slot;
 }
 
-/* Returns the pseudo expr id if expression UBASE - RATIO * CBASE
+/* Returns the invariant expression if expression UBASE - RATIO * CBASE
    requires a new compiler generated temporary.  Returns -1 otherwise.
    ADDRESS_P is a flag indicating if the expression is for address
    computation.  */
 
-static int
-get_loop_invariant_expr_id (struct ivopts_data *data, tree ubase,
-                            tree cbase, HOST_WIDE_INT ratio,
-                            bool address_p)
+static iv_inv_expr_ent *
+get_loop_invariant_expr (struct ivopts_data *data, tree ubase,
+			 tree cbase, HOST_WIDE_INT ratio,
+			 bool address_p)
 {
   aff_tree ubase_aff, cbase_aff;
   tree expr, ub, cb;
@@ -4592,7 +4616,7 @@ get_loop_invariant_expr_id (struct ivopts_data *data, tree ubase,
 
   if ((TREE_CODE (ubase) == INTEGER_CST)
       && (TREE_CODE (cbase) == INTEGER_CST))
-    return -1;
+    return NULL;
 
   /* Strips the constant part. */
   if (TREE_CODE (ubase) == PLUS_EXPR
@@ -4600,7 +4624,7 @@ get_loop_invariant_expr_id (struct ivopts_data *data, tree ubase,
       || TREE_CODE (ubase) == POINTER_PLUS_EXPR)
     {
       if (TREE_CODE (TREE_OPERAND (ubase, 1)) == INTEGER_CST)
-        ubase = TREE_OPERAND (ubase, 0);
+	ubase = TREE_OPERAND (ubase, 0);
     }
 
   /* Strips the constant part. */
@@ -4609,60 +4633,60 @@ get_loop_invariant_expr_id (struct ivopts_data *data, tree ubase,
       || TREE_CODE (cbase) == POINTER_PLUS_EXPR)
     {
       if (TREE_CODE (TREE_OPERAND (cbase, 1)) == INTEGER_CST)
-        cbase = TREE_OPERAND (cbase, 0);
+	cbase = TREE_OPERAND (cbase, 0);
     }
 
   if (address_p)
     {
       if (((TREE_CODE (ubase) == SSA_NAME)
-           || (TREE_CODE (ubase) == ADDR_EXPR
-               && is_gimple_min_invariant (ubase)))
-          && (TREE_CODE (cbase) == INTEGER_CST))
-        return -1;
+	   || (TREE_CODE (ubase) == ADDR_EXPR
+	       && is_gimple_min_invariant (ubase)))
+	  && (TREE_CODE (cbase) == INTEGER_CST))
+	return NULL;
 
       if (((TREE_CODE (cbase) == SSA_NAME)
-           || (TREE_CODE (cbase) == ADDR_EXPR
-               && is_gimple_min_invariant (cbase)))
-          && (TREE_CODE (ubase) == INTEGER_CST))
-        return -1;
+	   || (TREE_CODE (cbase) == ADDR_EXPR
+	       && is_gimple_min_invariant (cbase)))
+	  && (TREE_CODE (ubase) == INTEGER_CST))
+	return NULL;
     }
 
   if (ratio == 1)
     {
       if (operand_equal_p (ubase, cbase, 0))
-        return -1;
+	return NULL;
 
       if (TREE_CODE (ubase) == ADDR_EXPR
-          && TREE_CODE (cbase) == ADDR_EXPR)
-        {
-          tree usym, csym;
-
-          usym = TREE_OPERAND (ubase, 0);
-          csym = TREE_OPERAND (cbase, 0);
-          if (TREE_CODE (usym) == ARRAY_REF)
-            {
-              tree ind = TREE_OPERAND (usym, 1);
-              if (TREE_CODE (ind) == INTEGER_CST
-                  && tree_fits_shwi_p (ind)
-                  && tree_to_shwi (ind) == 0)
-                usym = TREE_OPERAND (usym, 0);
-            }
-          if (TREE_CODE (csym) == ARRAY_REF)
-            {
-              tree ind = TREE_OPERAND (csym, 1);
-              if (TREE_CODE (ind) == INTEGER_CST
-                  && tree_fits_shwi_p (ind)
-                  && tree_to_shwi (ind) == 0)
-                csym = TREE_OPERAND (csym, 0);
-            }
-          if (operand_equal_p (usym, csym, 0))
-            return -1;
-        }
+	  && TREE_CODE (cbase) == ADDR_EXPR)
+	{
+	  tree usym, csym;
+
+	  usym = TREE_OPERAND (ubase, 0);
+	  csym = TREE_OPERAND (cbase, 0);
+	  if (TREE_CODE (usym) == ARRAY_REF)
+	    {
+	      tree ind = TREE_OPERAND (usym, 1);
+	      if (TREE_CODE (ind) == INTEGER_CST
+		  && tree_fits_shwi_p (ind)
+		  && tree_to_shwi (ind) == 0)
+		usym = TREE_OPERAND (usym, 0);
+	    }
+	  if (TREE_CODE (csym) == ARRAY_REF)
+	    {
+	      tree ind = TREE_OPERAND (csym, 1);
+	      if (TREE_CODE (ind) == INTEGER_CST
+		  && tree_fits_shwi_p (ind)
+		  && tree_to_shwi (ind) == 0)
+		csym = TREE_OPERAND (csym, 0);
+	    }
+	  if (operand_equal_p (usym, csym, 0))
+	    return NULL;
+	}
       /* Now do more complex comparison  */
       tree_to_aff_combination (ubase, TREE_TYPE (ubase), &ubase_aff);
       tree_to_aff_combination (cbase, TREE_TYPE (cbase), &cbase_aff);
       if (compare_aff_trees (&ubase_aff, &cbase_aff))
-        return -1;
+	return NULL;
     }
 
   tree_to_aff_combination (ub, TREE_TYPE (ub), &ubase_aff);
@@ -4671,7 +4695,7 @@ get_loop_invariant_expr_id (struct ivopts_data *data, tree ubase,
   aff_combination_scale (&cbase_aff, -1 * ratio);
   aff_combination_add (&ubase_aff, &cbase_aff);
   expr = aff_combination_to_tree (&ubase_aff);
-  return get_expr_id (data, expr);
+  return record_inv_expr (data, expr);
 }
 
 
@@ -4689,7 +4713,7 @@ get_computation_cost_at (struct ivopts_data *data,
 			 struct iv_use *use, struct iv_cand *cand,
 			 bool address_p, bitmap *depends_on, gimple *at,
 			 bool *can_autoinc,
-                         int *inv_expr_id)
+			 iv_inv_expr_ent **inv_expr)
 {
   tree ubase = use->iv->base, ustep = use->iv->step;
   tree cbase, cstep;
@@ -4790,17 +4814,17 @@ get_computation_cost_at (struct ivopts_data *data,
 
       /* Check to see if any adjustment is needed.  */
       if (cstepi == 0 && stmt_is_after_inc)
-        {
-          aff_tree real_cbase_aff;
-          aff_tree cstep_aff;
+	{
+	  aff_tree real_cbase_aff;
+	  aff_tree cstep_aff;
 
-          tree_to_aff_combination (cbase, TREE_TYPE (real_cbase),
-                                   &real_cbase_aff);
-          tree_to_aff_combination (cstep, TREE_TYPE (cstep), &cstep_aff);
+	  tree_to_aff_combination (cbase, TREE_TYPE (real_cbase),
+				   &real_cbase_aff);
+	  tree_to_aff_combination (cstep, TREE_TYPE (cstep), &cstep_aff);
 
-          aff_combination_add (&real_cbase_aff, &cstep_aff);
-          real_cbase = aff_combination_to_tree (&real_cbase_aff);
-        }
+	  aff_combination_add (&real_cbase_aff, &cstep_aff);
+	  real_cbase = aff_combination_to_tree (&real_cbase_aff);
+	}
 
       cost = difference_cost (data,
 			      ubase, real_cbase,
@@ -4846,13 +4870,13 @@ get_computation_cost_at (struct ivopts_data *data,
   /* Record setup cost in scrach field.  */
   cost.scratch = cost.cost;
 
-  if (inv_expr_id && depends_on && *depends_on)
+  if (inv_expr && depends_on && *depends_on)
     {
-      *inv_expr_id =
-          get_loop_invariant_expr_id (data, ubase, cbase, ratio, address_p);
+      *inv_expr = get_loop_invariant_expr (data, ubase, cbase, ratio,
+					   address_p);
       /* Clear depends on.  */
-      if (*inv_expr_id != -1)
-        bitmap_clear (*depends_on);
+      if (*inv_expr != NULL)
+	bitmap_clear (*depends_on);
     }
 
   /* If we are after the increment, the value of the candidate is higher by
@@ -4929,11 +4953,11 @@ static comp_cost
 get_computation_cost (struct ivopts_data *data,
 		      struct iv_use *use, struct iv_cand *cand,
 		      bool address_p, bitmap *depends_on,
-                      bool *can_autoinc, int *inv_expr_id)
+		      bool *can_autoinc, iv_inv_expr_ent **inv_expr)
 {
   return get_computation_cost_at (data,
 				  use, cand, address_p, depends_on, use->stmt,
-				  can_autoinc, inv_expr_id);
+				  can_autoinc, inv_expr);
 }
 
 /* Determines cost of computing the use in GROUP with CAND in a generic
@@ -4944,7 +4968,7 @@ determine_group_iv_cost_generic (struct ivopts_data *data,
 				 struct iv_group *group, struct iv_cand *cand)
 {
   comp_cost cost;
-  int inv_expr_id = -1;
+  iv_inv_expr_ent *inv_expr = NULL;
   bitmap depends_on = NULL;
   struct iv_use *use = group->vuses[0];
 
@@ -4956,10 +4980,10 @@ determine_group_iv_cost_generic (struct ivopts_data *data,
     cost = no_cost;
   else
     cost = get_computation_cost (data, use, cand, false,
-				 &depends_on, NULL, &inv_expr_id);
+				 &depends_on, NULL, &inv_expr);
 
   set_group_iv_cost (data, group, cand, cost, depends_on,
-		     NULL_TREE, ERROR_MARK, inv_expr_id);
+		     NULL_TREE, ERROR_MARK, inv_expr);
   return !infinite_cost_p (cost);
 }
 
@@ -4972,12 +4996,12 @@ determine_group_iv_cost_address (struct ivopts_data *data,
   unsigned i;
   bitmap depends_on;
   bool can_autoinc, first = true;
-  int inv_expr_id = -1;
+  iv_inv_expr_ent *inv_expr = NULL;
   struct iv_use *use = group->vuses[0];
   comp_cost sum_cost = no_cost, cost;
 
   cost = get_computation_cost (data, use, cand, true,
-			       &depends_on, &can_autoinc, &inv_expr_id);
+			       &depends_on, &can_autoinc, &inv_expr);
 
   sum_cost = cost;
   if (!infinite_cost_p (sum_cost) && cand->ainc_use == use)
@@ -5025,7 +5049,7 @@ determine_group_iv_cost_address (struct ivopts_data *data,
       sum_cost = add_costs (sum_cost, cost);
     }
   set_group_iv_cost (data, group, cand, sum_cost, depends_on,
-		     NULL_TREE, ERROR_MARK, inv_expr_id);
+		     NULL_TREE, ERROR_MARK, inv_expr);
 
   return !infinite_cost_p (sum_cost);
 }
@@ -5081,8 +5105,8 @@ iv_period (struct iv *iv)
   pow2div = num_ending_zeros (step);
 
   period = build_low_bits_mask (type,
-                                (TYPE_PRECISION (type)
-                                 - tree_to_uhwi (pow2div)));
+				(TYPE_PRECISION (type)
+				 - tree_to_uhwi (pow2div)));
 
   return period;
 }
@@ -5215,7 +5239,7 @@ difference_cannot_overflow_p (struct ivopts_data *data, tree base, tree offset)
 
 static bool
 iv_elimination_compare_lt (struct ivopts_data *data,
-                           struct iv_cand *cand, enum tree_code *comp_p,
+			   struct iv_cand *cand, enum tree_code *comp_p,
 			   struct tree_niter_desc *niter)
 {
   tree cand_type, a, b, mbz, nit_type = TREE_TYPE (niter->niter), offset;
@@ -5264,10 +5288,10 @@ iv_elimination_compare_lt (struct ivopts_data *data,
 
       /* Handle b < a + 1.  */
       if (TREE_CODE (op1) == PLUS_EXPR && integer_onep (TREE_OPERAND (op1, 1)))
-        {
-          a = TREE_OPERAND (op1, 0);
-          b = TREE_OPERAND (mbz, 0);
-        }
+	{
+	  a = TREE_OPERAND (op1, 0);
+	  b = TREE_OPERAND (mbz, 0);
+	}
       else
 	return false;
     }
@@ -5353,15 +5377,15 @@ may_eliminate_iv (struct ivopts_data *data,
     {
       /* See cand_value_at.  */
       if (stmt_after_increment (loop, cand, use->stmt))
-        {
-          if (!tree_int_cst_lt (desc->niter, period))
-            return false;
-        }
+	{
+	  if (!tree_int_cst_lt (desc->niter, period))
+	    return false;
+	}
       else
-        {
-          if (tree_int_cst_lt (period, desc->niter))
-            return false;
-        }
+	{
+	  if (tree_int_cst_lt (period, desc->niter))
+	    return false;
+	}
     }
 
   /* If not, and if this is the only possible exit of the loop, see whether
@@ -5373,22 +5397,23 @@ may_eliminate_iv (struct ivopts_data *data,
 
       max_niter = desc->max;
       if (stmt_after_increment (loop, cand, use->stmt))
-        max_niter += 1;
+	max_niter += 1;
       period_value = wi::to_widest (period);
       if (wi::gtu_p (max_niter, period_value))
-        {
-          /* See if we can take advantage of inferred loop bound information.  */
-          if (data->loop_single_exit_p)
-            {
-              if (!max_loop_iterations (loop, &max_niter))
-                return false;
-              /* The loop bound is already adjusted by adding 1.  */
-              if (wi::gtu_p (max_niter, period_value))
-                return false;
-            }
-          else
-            return false;
-        }
+	{
+	  /* See if we can take advantage of inferred loop bound
+	     information.  */
+	  if (data->loop_single_exit_p)
+	    {
+	      if (!max_loop_iterations (loop, &max_niter))
+		return false;
+	      /* The loop bound is already adjusted by adding 1.  */
+	      if (wi::gtu_p (max_niter, period_value))
+		return false;
+	    }
+	  else
+	    return false;
+	}
     }
 
   cand_value_at (loop, cand, use->stmt, desc->niter, &bnd);
@@ -5444,7 +5469,7 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
   bitmap depends_on_elim = NULL, depends_on_express = NULL, depends_on;
   comp_cost elim_cost, express_cost, cost, bound_cost;
   bool ok;
-  int elim_inv_expr_id = -1, express_inv_expr_id = -1, inv_expr_id;
+  iv_inv_expr_ent *elim_inv_expr = NULL, *express_inv_expr = NULL, *inv_expr;
   tree *control_var, *bound_cst;
   enum tree_code comp = ERROR_MARK;
   struct iv_use *use = group->vuses[0];
@@ -5456,18 +5481,18 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
     {
       elim_cost = force_var_cost (data, bound, &depends_on_elim);
       if (elim_cost.cost == 0)
-        elim_cost.cost = parm_decl_cost (data, bound);
+	elim_cost.cost = parm_decl_cost (data, bound);
       else if (TREE_CODE (bound) == INTEGER_CST)
-        elim_cost.cost = 0;
+	elim_cost.cost = 0;
       /* If we replace a loop condition 'i < n' with 'p < base + n',
 	 depends_on_elim will have 'base' and 'n' set, which implies
 	 that both 'base' and 'n' will be live during the loop.	 More likely,
 	 'base + n' will be loop invariant, resulting in only one live value
 	 during the loop.  So in that case we clear depends_on_elim and set
-        elim_inv_expr_id instead.  */
+	elim_inv_expr instead.  */
       if (depends_on_elim && bitmap_count_bits (depends_on_elim) > 1)
 	{
-	  elim_inv_expr_id = get_expr_id (data, bound);
+	  elim_inv_expr = record_inv_expr (data, bound);
 	  bitmap_clear (depends_on_elim);
 	}
       /* The bound is a loop invariant, so it will be only computed
@@ -5497,7 +5522,7 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
 
   express_cost = get_computation_cost (data, use, cand, false,
 				       &depends_on_express, NULL,
-                                       &express_inv_expr_id);
+				       &express_inv_expr);
   fd_ivopts_data = data;
   walk_tree (&cmp_iv->base, find_depends, &depends_on_express, NULL);
 
@@ -5515,7 +5540,7 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
       cost = elim_cost;
       depends_on = depends_on_elim;
       depends_on_elim = NULL;
-      inv_expr_id = elim_inv_expr_id;
+      inv_expr = elim_inv_expr;
     }
   else
     {
@@ -5524,11 +5549,11 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
       depends_on_express = NULL;
       bound = NULL_TREE;
       comp = ERROR_MARK;
-      inv_expr_id = express_inv_expr_id;
+      inv_expr = express_inv_expr;
     }
 
   set_group_iv_cost (data, group, cand, cost,
-		     depends_on, bound, comp, inv_expr_id);
+		     depends_on, bound, comp, inv_expr);
 
   if (depends_on_elim)
     BITMAP_FREE (depends_on_elim);
@@ -5718,14 +5743,31 @@ determine_group_iv_costs (struct ivopts_data *data)
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
-      fprintf (dump_file, "<Group-candidate Costs>:\n");
+      fprintf (dump_file, "\n<Invariant Expressions>:\n");
+      auto_vec <iv_inv_expr_ent *> list (data->inv_expr_tab->elements ());
+
+      for (hash_table<iv_inv_expr_hasher>::iterator it
+	   = data->inv_expr_tab->begin (); it != data->inv_expr_tab->end ();
+	   ++it)
+	list.safe_push (*it);
+
+      list.qsort (sort_iv_inv_expr_ent);
+
+      for (i = 0; i < list.length (); ++i)
+	{
+	  fprintf (dump_file, "inv_expr %d: \t", i);
+	  print_generic_expr (dump_file, list[i]->expr, TDF_SLIM);
+	  fprintf (dump_file, "\n");
+	}
+
+      fprintf (dump_file, "\n<Group-candidate Costs>:\n");
 
       for (i = 0; i < data->vgroups.length (); i++)
 	{
 	  group = data->vgroups[i];
 
 	  fprintf (dump_file, "Group %d:\n", i);
-	  fprintf (dump_file, "  cand\tcost\tcompl.\tdepends on\n");
+	  fprintf (dump_file, "  cand\tcost\tcompl.\tinv.ex.\tdepends on\n");
 	  for (j = 0; j < group->n_map_members; j++)
 	    {
 	      if (!group->cost_map[j].cand
@@ -5736,12 +5778,14 @@ determine_group_iv_costs (struct ivopts_data *data)
 		       group->cost_map[j].cand->id,
 		       group->cost_map[j].cost.cost,
 		       group->cost_map[j].cost.complexity);
+	      if (group->cost_map[j].inv_expr != NULL)
+		fprintf (dump_file, "%d\t",
+			 group->cost_map[j].inv_expr->id);
+	      else
+		fprintf (dump_file, "\t");
 	      if (group->cost_map[j].depends_on)
 		bitmap_print (dump_file,
 			      group->cost_map[j].depends_on, "","");
-	      if (group->cost_map[j].inv_expr_id != -1)
-		fprintf (dump_file, " inv_expr:%d",
-			 group->cost_map[j].inv_expr_id);
 	      fprintf (dump_file, "\n");
 	    }
 
@@ -5942,7 +5986,8 @@ iv_ca_recount_cost (struct ivopts_data *data, struct iv_ca *ivs)
   cost.cost += ivs->cand_cost;
 
   cost.cost += ivopts_global_cost_for_size (data,
-                                            ivs->n_regs + ivs->num_used_inv_expr);
+					    ivs->n_regs
+					    + ivs->used_inv_exprs->elements ());
 
   ivs->cost = cost;
 }
@@ -5962,7 +6007,7 @@ iv_ca_set_remove_invariants (struct iv_ca *ivs, bitmap invs)
     {
       ivs->n_invariant_uses[iid]--;
       if (ivs->n_invariant_uses[iid] == 0)
-        ivs->n_regs--;
+	ivs->n_regs--;
     }
 }
 
@@ -6000,11 +6045,12 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca *ivs,
 
   iv_ca_set_remove_invariants (ivs, cp->depends_on);
 
-  if (cp->inv_expr_id != -1)
+  if (cp->inv_expr != NULL)
     {
-      ivs->used_inv_expr[cp->inv_expr_id]--;
-      if (ivs->used_inv_expr[cp->inv_expr_id] == 0)
-        ivs->num_used_inv_expr--;
+      unsigned *slot = ivs->used_inv_exprs->get (cp->inv_expr);
+      --(*slot);
+      if (*slot == 0)
+	ivs->used_inv_exprs->remove (cp->inv_expr);
     }
   iv_ca_recount_cost (data, ivs);
 }
@@ -6024,7 +6070,7 @@ iv_ca_set_add_invariants (struct iv_ca *ivs, bitmap invs)
     {
       ivs->n_invariant_uses[iid]++;
       if (ivs->n_invariant_uses[iid] == 1)
-        ivs->n_regs++;
+	ivs->n_regs++;
     }
 }
 
@@ -6064,12 +6110,11 @@ iv_ca_set_cp (struct ivopts_data *data, struct iv_ca *ivs,
       ivs->cand_use_cost = add_costs (ivs->cand_use_cost, cp->cost);
       iv_ca_set_add_invariants (ivs, cp->depends_on);
 
-      if (cp->inv_expr_id != -1)
-        {
-          ivs->used_inv_expr[cp->inv_expr_id]++;
-          if (ivs->used_inv_expr[cp->inv_expr_id] == 1)
-            ivs->num_used_inv_expr++;
-        }
+      if (cp->inv_expr != NULL)
+	{
+	  unsigned *slot = &ivs->used_inv_exprs->get_or_insert (cp->inv_expr);
+	  ++(*slot);
+	}
       iv_ca_recount_cost (data, ivs);
     }
 }
@@ -6278,9 +6323,8 @@ iv_ca_new (struct ivopts_data *data)
   nw->cand_use_cost = no_cost;
   nw->cand_cost = 0;
   nw->n_invariant_uses = XCNEWVEC (unsigned, data->max_inv_id + 1);
+  nw->used_inv_exprs = new hash_map <iv_inv_expr_ent *, unsigned> (13);
   nw->cost = no_cost;
-  nw->used_inv_expr = XCNEWVEC (unsigned, data->inv_expr_id + 1);
-  nw->num_used_inv_expr = 0;
 
   return nw;
 }
@@ -6294,7 +6338,7 @@ iv_ca_free (struct iv_ca **ivs)
   free ((*ivs)->n_cand_uses);
   BITMAP_FREE ((*ivs)->cands);
   free ((*ivs)->n_invariant_uses);
-  free ((*ivs)->used_inv_expr);
+  delete ((*ivs)->used_inv_exprs);
   free (*ivs);
   *ivs = NULL;
 }
@@ -6304,13 +6348,13 @@ iv_ca_free (struct iv_ca **ivs)
 static void
 iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
 {
-  const char *pref = "  invariants ";
   unsigned i;
   comp_cost cost = iv_ca_cost (ivs);
 
   fprintf (file, "  cost: %d (complexity %d)\n", cost.cost, cost.complexity);
   fprintf (file, "  cand_cost: %d\n  cand_group_cost: %d (complexity %d)\n",
-           ivs->cand_cost, ivs->cand_use_cost.cost, ivs->cand_use_cost.complexity);
+	   ivs->cand_cost, ivs->cand_use_cost.cost,
+	   ivs->cand_use_cost.complexity);
   bitmap_print (file, ivs->cands, "  candidates: ","\n");
 
   for (i = 0; i < ivs->upto; i++)
@@ -6324,12 +6368,26 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
 	fprintf (file, "   group:%d --> ??\n", group->id);
     }
 
+  bool any_invariant = false;
   for (i = 1; i <= data->max_inv_id; i++)
     if (ivs->n_invariant_uses[i])
       {
+	const char *pref = any_invariant ? ", " : "  invariants ";
+	any_invariant = true;
 	fprintf (file, "%s%d", pref, i);
-	pref = ", ";
       }
+
+  if (any_invariant)
+    fprintf (file, "\n");
+
+  const char *pref = "  invariant expressions ";
+  for (hash_map<iv_inv_expr_ent *, unsigned>::iterator it
+       = ivs->used_inv_exprs->begin (); it != ivs->used_inv_exprs->end (); ++it)
+    {
+	fprintf (file, "%s%d", pref, (*it).first->id);
+	pref = ", ";
+    }
+
   fprintf (file, "\n\n");
 }
 
@@ -6366,7 +6424,7 @@ iv_ca_extend (struct ivopts_data *data, struct iv_ca *ivs,
 	continue;
 
       if (!min_ncand && !cheaper_cost_pair (new_cp, old_cp))
-        continue;
+	continue;
 
       *delta = iv_ca_delta_add (group, old_cp, new_cp, *delta);
     }
@@ -6670,7 +6728,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
 	continue;
 
       if (iv_ca_cand_used_p (ivs, cand))
-        continue;
+	continue;
 
       cp = get_group_iv_cost (data, group, cand);
       if (!cp)
@@ -6678,7 +6736,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
 
       iv_ca_set_cp (data, ivs, group, cp);
       act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL,
-                               true);
+			       true);
       iv_ca_set_no_cp (data, ivs, group);
       act_delta = iv_ca_delta_add (group, NULL, cp, act_delta);
 
@@ -6991,12 +7049,16 @@ create_new_ivs (struct ivopts_data *data, struct iv_ca *set)
       if (data->loop_loc != UNKNOWN_LOCATION)
 	fprintf (dump_file, " at %s:%d", LOCATION_FILE (data->loop_loc),
 		 LOCATION_LINE (data->loop_loc));
+      fprintf (dump_file, ", %lu avg niters",
+	       avg_loop_niter (data->current_loop));
+      fprintf (dump_file, ", %lu expressions",
+	       set->used_inv_exprs->elements ());
       fprintf (dump_file, ", %lu IVs:\n", bitmap_count_bits (set->cands));
       EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
-        {
-          cand = data->vcands[i];
-          dump_cand (dump_file, cand);
-        }
+	{
+	  cand = data->vcands[i];
+	  dump_cand (dump_file, cand);
+	}
       fprintf (dump_file, "\n");
     }
 }
@@ -7251,10 +7313,10 @@ rewrite_use_compare (struct ivopts_data *data,
       gimple_seq stmts;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
-        {
-          fprintf (dump_file, "Replacing exit test: ");
-          print_gimple_stmt (dump_file, use->stmt, 0, TDF_SLIM);
-        }
+	{
+	  fprintf (dump_file, "Replacing exit test: ");
+	  print_gimple_stmt (dump_file, use->stmt, 0, TDF_SLIM);
+	}
       compare = cp->comp;
       bound = unshare_expr (fold_convert (var_type, bound));
       op = force_gimple_operand (bound, &stmts, true, NULL_TREE);
@@ -7542,7 +7604,7 @@ free_loop_data (struct ivopts_data *data)
   decl_rtl_to_reset.truncate (0);
 
   data->inv_expr_tab->empty ();
-  data->inv_expr_id = 0;
+  data->max_inv_expr_id = 0;
 
   data->iv_common_cand_tab->empty ();
   data->iv_common_cands.truncate (0);
-- 
2.8.2



* Re: [PATCH 3/3] Enhance dumps of IVOPTS
From: Bin.Cheng @ 2016-05-13  9:43 UTC (permalink / raw)
  To: Martin Liška; +Cc: Richard Biener, GCC Patches, Jan Hubicka

On Thu, May 12, 2016 at 5:41 PM, Martin Liška <mliska@suse.cz> wrote:
> On 05/12/2016 03:51 PM, Bin.Cheng wrote:
>> On Thu, May 12, 2016 at 1:13 PM, Martin Liška <mliska@suse.cz> wrote:
>>> On 05/10/2016 03:16 PM, Bin.Cheng wrote:
>>>> Another way is to remove the use of id for struct iv_inv_expr_ent once
>>>> for all.  We can change iv_ca.used_inv_expr and cost_pair.inv_expr_id
>>>> to pointers, and rename iv_inv_expr_ent.id to count and use this to
>>>> record reference number in iv_ca.  This if-statement on dump_file can
>>>> be saved.  Also I think it simplifies current code a bit.  For now,
>>>> there are id <-> struct maps for different structures in IVOPT which
>>>> make it not straightforward.
>>>
>>> Hi.
>>>
>>> I'm sending the second version of the patch. I tried to follow your advice, but
>>> because an iv_inv_expr_ent can simultaneously belong to multiple iv_cas,
>>> putting a counter into iv_inv_expr_ent does not work. Instead, I've
>>> decided to replace used_inv_expr with a hash_map that contains the used inv_exprs
>>> and where the value of the map is the number of usages.
>>>
>>> Further questions:
>>> + iv_inv_expr_ent::id can be now removed as it's used just for purpose of dumps
>>> Group 0:
>>>   cand  cost    scaled  freq    compl.  depends on
>>>   5     2       2.00    1.000
>>>   6     4       4.00    1.001    inv_expr:0
>>>   7     4       4.00    1.001    inv_expr:1
>>>   8     4       4.00    1.001    inv_expr:2
>>>
>>> That can be replaced with print_generic_expr, but I think using ids makes the dump
>>> output more clear.
>> I am okay with keeping id.  Could you please dump all inv_exprs in a
>> single section like
>> <Invariant Exprs>:
>> inv_expr 0: print_generic_expr
>> inv_expr 1: ...
>>
>> Then only dump the id afterwards?
>>
>
> Sure, it would be definitely better:
>
> The new dump format looks:
>
> <Invariant Expressions>:
> inv_expr 0:     sudoku_351(D) + (sizetype) S.833_774 * 4
> inv_expr 1:     sudoku_351(D) + ((sizetype) S.833_774 * 4 + 18446744073709551580)
> inv_expr 2:     sudoku_351(D) + ((sizetype) S.833_774 + 72) * 4
> inv_expr 3:     sudoku_351(D) + ((sizetype) S.833_774 + 81) * 4
> inv_expr 4:     &A.832 + (sizetype) _377 * 4
> inv_expr 5:     &A.832 + ((sizetype) _377 * 4 + 18446744073709551612)
> inv_expr 6:     &A.832 + ((sizetype) _377 + 8) * 4
> inv_expr 7:     &A.832 + ((sizetype) _377 + 9) * 4
>
> <Group-candidate Costs>:
> Group 0:
>   cand  cost    scaled  freq    compl.  depends on
>
> ...
>
> Improved to:
>   cost: 27 (complexity 2)
>   cand_cost: 11
>   cand_group_cost: 10 (complexity 2)
>   candidates: 3, 5
>    group:0 --> iv_cand:5, cost=(2,0)
>    group:1 --> iv_cand:5, cost=(4,1)
>    group:2 --> iv_cand:5, cost=(4,1)
>    group:3 --> iv_cand:3, cost=(0,0)
>    group:4 --> iv_cand:3, cost=(0,0)
>   invariants 1, 6
>   invariant expressions 6, 3
>
> The only question here is that, as used_inv_exprs are stored in a hash_map,
> the order of the dumped invariants would not be stable. Is that a problem?
It is okay.
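
For comparison, the <Invariant Expressions> section added in
determine_group_iv_costs avoids the ordering question by copying the table
entries into a vector and sorting them before printing.  A minimal sketch of
that pattern, assuming standard containers as stand-ins for GCC's hash_table
and auto_vec.  The inv_expr_ent struct, its expr string, and dump_inv_exprs
are hypothetical names used only for illustration, and sorting by id is an
assumption about what sort_iv_inv_expr_ent compares:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for iv_inv_expr_ent: an id plus a printable form
// of the expression (the real struct holds a tree, not a string).
struct inv_expr_ent
{
  int id;
  std::string expr;
};

// Copy the unordered table into a vector, sort it (by id, an assumption),
// and format one line per entry so the output order is deterministic.
static std::string
dump_inv_exprs (const std::unordered_set<const inv_expr_ent *> &tab)
{
  std::vector<const inv_expr_ent *> list (tab.begin (), tab.end ());
  std::sort (list.begin (), list.end (),
             [] (const inv_expr_ent *a, const inv_expr_ent *b)
             { return a->id < b->id; });

  std::string out = "<Invariant Expressions>:\n";
  for (const inv_expr_ent *e : list)
    out += "inv_expr " + std::to_string (e->id) + ": \t" + e->expr + "\n";
  return out;
}
```

The same copy-and-sort step could be applied to used_inv_exprs in iv_ca_dump
if a stable order were ever wanted there.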

Only nitpicking on this version.

>
>>>
>>> + As check_GNU_style.sh reported multiple 8 spaces issues in hunks I've touched, I decided
>>> to fix all 8 spaces issues. Hope it's fine.
>>>
>>> I'm going to test the patch.
>>> Thoughts?
>>
>> Some comments on the patch embedded.
>>
>>>
>>> +/* Forward declaration.  */
>> Not necessary.
>>> +struct iv_inv_expr_ent;
>>> +
>
> I think it's needed because struct cost_pair uses a pointer to iv_inv_expr_ent.
I mean the comment; the declaration is clearly self-documenting.

> @@ -6000,11 +6045,12 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca *ivs,
>
>    iv_ca_set_remove_invariants (ivs, cp->depends_on);
>
> -  if (cp->inv_expr_id != -1)
> +  if (cp->inv_expr != NULL)
>      {
> -      ivs->used_inv_expr[cp->inv_expr_id]--;
> -      if (ivs->used_inv_expr[cp->inv_expr_id] == 0)
> -        ivs->num_used_inv_expr--;
> +      unsigned *slot = ivs->used_inv_exprs->get (cp->inv_expr);
> +      --(*slot);
> +      if (*slot == 0)
> +    ivs->used_inv_exprs->remove (cp->inv_expr);
I suppose insertion into and removal from a hash_map are not expensive?
I ask because the algorithm performs a lot of these operations.
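
For reference, the counting scheme in this hunk and in iv_ca_set_cp can be
sketched as below.  This is only an illustration under stated assumptions:
std::unordered_map stands in for GCC's hash_map (its insert and erase are
average constant time; the cost of GCC's hash_map is not measured here), and
inv_expr_ent, used_inv_exprs_map, add_use, remove_use and distinct are
hypothetical names, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for iv_inv_expr_ent; only its identity matters here.
struct inv_expr_ent
{
  int id;
};

// Hypothetical stand-in for iv_ca::used_inv_exprs: maps each invariant
// expression currently in use to its number of uses.
struct used_inv_exprs_map
{
  std::unordered_map<const inv_expr_ent *, unsigned> uses;

  // As in iv_ca_set_cp: create the entry on first use, then count the use.
  void add_use (const inv_expr_ent *e)
  {
    ++uses[e];
  }

  // As in iv_ca_set_no_cp: drop the entry once the last use is gone.
  void remove_use (const inv_expr_ent *e)
  {
    auto it = uses.find (e);
    assert (it != uses.end () && it->second > 0);
    if (--it->second == 0)
      uses.erase (it);
  }

  // The number of distinct expressions in use, i.e. the value that
  // iv_ca_recount_cost adds to n_regs via elements ().
  std::size_t distinct () const
  {
    return uses.size ();
  }
};
```

Because entries are erased when their count reaches zero, the map size can
replace the separate num_used_inv_expr counter.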

> @@ -6324,12 +6368,26 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
>      fprintf (file, "   group:%d --> ??\n", group->id);
>      }
>
> +  bool any_invariant = false;
>    for (i = 1; i <= data->max_inv_id; i++)
>      if (ivs->n_invariant_uses[i])
>        {
> +    const char *pref = any_invariant ? ", " : "  invariants ";
> +    any_invariant = true;
>      fprintf (file, "%s%d", pref, i);
> -    pref = ", ";
>        }
> +
> +  if (any_invariant)
> +    fprintf (file, "\n");
> +
To make dump easier to read, we can simply dump invariant
variables/expressions unconditionally.  Also keep invariant variables
and expressions in the same form.
   const char *pref = "";
   //...
   fprintf (file, "  invariant variables: ");
   for (i = 1; i <= data->max_inv_id; i++)
     if (ivs->n_invariant_uses[i])
       {
         fprintf (file, "%s%d", pref, i);
         pref = ", ";
       }
   fprintf (file, "\n");
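
A self-contained sketch of the same idea, assuming a std::string buffer in
place of the dump FILE and a std::vector in place of n_invariant_uses.  The
helper name format_invariants is hypothetical and only illustrates the
"label first, then separator-prefixed ids" idiom:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: always emits the label, then the ids of all
// invariants with a nonzero use count, separated by ", ".
// Index 0 is skipped, matching the loop above that starts at 1.
static std::string
format_invariants (const char *label, const std::vector<unsigned> &n_uses)
{
  std::string out = label;
  const char *pref = "";
  for (std::size_t i = 1; i < n_uses.size (); i++)
    if (n_uses[i])
      {
        out += pref;
        out += std::to_string (i);
        pref = ", ";
      }
  out += "\n";
  return out;
}
```

With no invariants in use, the line still appears with an empty list, so
every dump has the same shape.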

> +  const char *pref = "  invariant expressions ";
> +  for (hash_map<iv_inv_expr_ent *, unsigned>::iterator it
> +       = ivs->used_inv_exprs->begin (); it != ivs->used_inv_exprs->end (); ++it)
> +    {
> +    fprintf (file, "%s%d", pref, (*it).first->id);
> +    pref = ", ";
> +    }
> +
>    fprintf (file, "\n\n");
>  }
>

Okay with the dump change; you may need to update the ChangeLog entry too.

Thanks,
bin


* Re: [PATCH 3/3] Enhance dumps of IVOPTS
From: Martin Liška @ 2016-05-13 10:44 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Richard Biener, GCC Patches, Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 6619 bytes --]

On 05/13/2016 11:43 AM, Bin.Cheng wrote:
> On Thu, May 12, 2016 at 5:41 PM, Martin Liška <mliska@suse.cz> wrote:
>> On 05/12/2016 03:51 PM, Bin.Cheng wrote:
>>> On Thu, May 12, 2016 at 1:13 PM, Martin Liška <mliska@suse.cz> wrote:
>>>> On 05/10/2016 03:16 PM, Bin.Cheng wrote:
>>>>> Another way is to remove the use of id for struct iv_inv_expr_ent once
>>>>> for all.  We can change iv_ca.used_inv_expr and cost_pair.inv_expr_id
>>>>> to pointers, and rename iv_inv_expr_ent.id to count and use this to
>>>>> record reference number in iv_ca.  This if-statement on dump_file can
>>>>> be saved.  Also I think it simplifies current code a bit.  For now,
>>>>> there are id <-> struct maps for different structures in IVOPT which
>>>>> make it not straightforward.
>>>>
>>>> Hi.
>>>>
>>>> I'm sending the second version of the patch. I tried to follow your advice, but
>>>> because an iv_inv_expr_ent can simultaneously belong to multiple iv_cas,
>>>> putting a counter into iv_inv_expr_ent does not work. Instead, I've
>>>> decided to replace used_inv_expr with a hash_map that contains the used inv_exprs
>>>> and where the value of the map is the number of usages.
>>>>
>>>> Further questions:
>>>> + iv_inv_expr_ent::id can be now removed as it's used just for purpose of dumps
>>>> Group 0:
>>>>   cand  cost    scaled  freq    compl.  depends on
>>>>   5     2       2.00    1.000
>>>>   6     4       4.00    1.001    inv_expr:0
>>>>   7     4       4.00    1.001    inv_expr:1
>>>>   8     4       4.00    1.001    inv_expr:2
>>>>
>>>> That can be replaced with print_generic_expr, but I think using ids makes the dump
>>>> output more clear.
>>> I am okay with keeping id.  Could you please dump all inv_exprs in a
>>> single section like
>>> <Invariant Exprs>:
>>> inv_expr 0: print_generic_expr
>>> inv_expr 1: ...
>>>
>>> Then only dump the id afterwards?
>>>
>>
>> Sure, it would be definitely better:
>>
>> The new dump format looks:
>>
>> <Invariant Expressions>:
>> inv_expr 0:     sudoku_351(D) + (sizetype) S.833_774 * 4
>> inv_expr 1:     sudoku_351(D) + ((sizetype) S.833_774 * 4 + 18446744073709551580)
>> inv_expr 2:     sudoku_351(D) + ((sizetype) S.833_774 + 72) * 4
>> inv_expr 3:     sudoku_351(D) + ((sizetype) S.833_774 + 81) * 4
>> inv_expr 4:     &A.832 + (sizetype) _377 * 4
>> inv_expr 5:     &A.832 + ((sizetype) _377 * 4 + 18446744073709551612)
>> inv_expr 6:     &A.832 + ((sizetype) _377 + 8) * 4
>> inv_expr 7:     &A.832 + ((sizetype) _377 + 9) * 4
>>
>> <Group-candidate Costs>:
>> Group 0:
>>   cand  cost    scaled  freq    compl.  depends on
>>
>> ...
>>
>> Improved to:
>>   cost: 27 (complexity 2)
>>   cand_cost: 11
>>   cand_group_cost: 10 (complexity 2)
>>   candidates: 3, 5
>>    group:0 --> iv_cand:5, cost=(2,0)
>>    group:1 --> iv_cand:5, cost=(4,1)
>>    group:2 --> iv_cand:5, cost=(4,1)
>>    group:3 --> iv_cand:3, cost=(0,0)
>>    group:4 --> iv_cand:3, cost=(0,0)
>>   invariants 1, 6
>>   invariant expressions 6, 3
>>
>> The only question here is that, as used_inv_exprs are stored in a hash_map,
>> the order of the dumped invariants would not be stable. Is that a problem?
> It is okay.
> 
> Only nitpicking on this version.
> 
>>
>>>>
>>>> + As check_GNU_style.sh reported multiple 8 spaces issues in hunks I've touched, I decided
>>>> to fix all 8 spaces issues. Hope it's fine.
>>>>
>>>> I'm going to test the patch.
>>>> Thoughts?
>>>
>>> Some comments on the patch embedded.
>>>
>>>>
>>>> +/* Forward declaration.  */
>>> Not necessary.
>>>> +struct iv_inv_expr_ent;
>>>> +
>>
>> I think it's needed because struct cost_pair uses a pointer to iv_inv_expr_ent.
> I mean the comment; the declaration is clearly self-documenting.

Hi.

Yeah, removed.

> 
>> @@ -6000,11 +6045,12 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca *ivs,
>>
>>    iv_ca_set_remove_invariants (ivs, cp->depends_on);
>>
>> -  if (cp->inv_expr_id != -1)
>> +  if (cp->inv_expr != NULL)
>>      {
>> -      ivs->used_inv_expr[cp->inv_expr_id]--;
>> -      if (ivs->used_inv_expr[cp->inv_expr_id] == 0)
>> -        ivs->num_used_inv_expr--;
>> +      unsigned *slot = ivs->used_inv_exprs->get (cp->inv_expr);
>> +      --(*slot);
>> +      if (*slot == 0)
>> +    ivs->used_inv_exprs->remove (cp->inv_expr);
> I suppose insertion/removal of hash_map are not expensive?  Because
> the algorithm causes a lot of these operations.

I think it should be roughly a constant-time operation.

> 
>> @@ -6324,12 +6368,26 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
>>      fprintf (file, "   group:%d --> ??\n", group->id);
>>      }
>>
>> +  bool any_invariant = false;
>>    for (i = 1; i <= data->max_inv_id; i++)
>>      if (ivs->n_invariant_uses[i])
>>        {
>> +    const char *pref = any_invariant ? ", " : "  invariants ";
>> +    any_invariant = true;
>>      fprintf (file, "%s%d", pref, i);
>> -    pref = ", ";
>>        }
>> +
>> +  if (any_invariant)
>> +    fprintf (file, "\n");
>> +
> To make dump easier to read, we can simply dump invariant
> variables/expressions unconditionally.  Also keep invariant variables
> and expressions in the same form.

Sure, that's a good idea!

Sample output:


Initial set of candidates:
  cost: 17 (complexity 0)
  cand_cost: 11
  cand_group_cost: 2 (complexity 0)
  candidates: 1, 5
   group:0 --> iv_cand:5, cost=(2,0)
   group:1 --> iv_cand:1, cost=(0,0)
  invariant variables: 1, 4
  invariant expressions: 

Initial set of candidates:
  cost: 42 (complexity 2)
  cand_cost: 15
  cand_group_cost: 12 (complexity 2)
  candidates: 4, 15, 16
   group:0 --> iv_cand:16, cost=(0,0)
   group:1 --> iv_cand:15, cost=(-1,0)
   group:2 --> iv_cand:4, cost=(0,0)
   group:3 --> iv_cand:15, cost=(9,1)
   group:4 --> iv_cand:15, cost=(4,1)
  invariant variables: 
  invariant expressions: 

>    const char *pref = "";
>    //...
>    fprintf (file, "  invariant variables: ");
>    for (i = 1; i <= data->max_inv_id; i++)
>      if (ivs->n_invariant_uses[i])
>        {
>          fprintf (file, "%s%d", pref, i);
>          pref = ", ";
>        }
>    fprintf (file, "\n");
> 
>> +  const char *pref = "  invariant expressions ";
>> +  for (hash_map<iv_inv_expr_ent *, unsigned>::iterator it
>> +       = ivs->used_inv_exprs->begin (); it != ivs->used_inv_exprs->end (); ++it)
>> +    {
>> +    fprintf (file, "%s%d", pref, (*it).first->id);
>> +    pref = ", ";
>> +    }
>> +
>>    fprintf (file, "\n\n");
>>  }
>>
> 
> Okay with the dump change; you may need to update the ChangeLog entry too.

There's no fundamental change, so I'm not changing the ChangeLog entry.

Thanks for the review, installed as r236200.

Martin

> 
> Thanks,
> bin
> 


[-- Attachment #2: 0001-Enhance-dumps-of-IVOPTS-v4.patch --]
[-- Type: text/x-patch, Size: 37857 bytes --]

From b9cd4e2645ec0a73e4f42d10e67650c462d47b07 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Thu, 12 May 2016 18:30:31 +0200
Subject: [PATCH] Enhance dumps of IVOPTS

gcc/ChangeLog:

2016-05-12  Martin Liska  <mliska@suse.cz>

	* tree-ssa-loop-ivopts.c (avg_loop_niter): Fix coding style.
	(struct cost_pair): Change inv_expr_id (int) to inv_expr
	(iv_inv_expr_ent *).
	(struct iv_inv_expr_ent): Comment struct fields.
	(sort_iv_inv_expr_ent): New function.
	(struct ivopts_data): Rename inv_expr_id to max_inv_expr_id.
	(struct iv_ca): Replace used_inv_expr and num_used_inv_expr with
	a hash_map between iv_inv_expr_ent and number of usages.
	(niter_for_exit): Fix coding style.
	(tree_ssa_iv_optimize_init): Use renamed variable.
	(determine_base_object): Fix coding style.
	(alloc_iv): Likewise.
	(find_interesting_uses_outside): Likewise.
	(add_candidate_1): Likewise.
	(add_standard_iv_candidates): Likewise.
	(set_group_iv_cost): Replace inv_expr_id with inv_expr.
	(prepare_decl_rtl): Fix coding style.
	(get_address_cost): Likewise.
	(get_shiftadd_cost): Likewise.
	(force_expr_to_var_cost): Likewise.
	(compare_aff_trees): Likewise.
	(get_expr_id): Rename to record_inv_expr and restructure the function.
	(get_loop_invariant_expr_id): Renamed to
	get_loop_invariant_expr.
	(get_computation_cost_at): Replace usage of inv_expr_id with
	inv_expr.
	(get_computation_cost): Likewise.
	(determine_group_iv_cost_generic): Likewise.
	(determine_group_iv_cost_address): Likewise.
	(iv_period): Fix coding style.
	(iv_elimination_compare_lt): Likewise.
	(may_eliminate_iv): Likewise.
	(determine_group_iv_cost_cond): Replace usage of inv_expr_id with
	inv_expr.
	(determine_group_iv_costs): Dump invariant expressions.
	(iv_ca_recount_cost): Use the newly added hash_map.
	(iv_ca_set_remove_invariants): Fix coding style.
	(iv_ca_set_add_invariants): Fix coding style.
	(iv_ca_set_no_cp): Utilize the newly added hash_map for used
	invariants.
	(iv_ca_set_cp): Likewise.
	(iv_ca_new): Initialize the newly added hash_map and remove
	initialization of fields.
	(iv_ca_free): Delete the hash_map.
	(iv_ca_dump): Dump invariant expressions.
	(iv_ca_extend): Fix coding style.
	(try_add_cand_for): Likewise.
	(create_new_ivs): Dump information about # of avg iterations and
	# of used invariant expressions.
	(rewrite_use_compare): Fix coding style.
	(free_loop_data): Set default value for max_inv_expr_id.

gcc/testsuite/ChangeLog:

2016-05-12  Martin Liska  <mliska@suse.cz>

	* g++.dg/tree-ssa/ivopts-3.C: Change test-case to follow
	the new format of dump output.
---
 gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C |   2 +-
 gcc/tree-ssa-loop-ivopts.c               | 453 +++++++++++++++++--------------
 2 files changed, 257 insertions(+), 198 deletions(-)

diff --git a/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C b/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
index 6194e9d..eb72581 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
@@ -72,4 +72,4 @@ int main ( int , char** ) {
 
 // Verify that on x86_64 and i?86 we use a single IV for the innermost loop
 
-// { dg-final { scan-tree-dump "Selected IV set for loop \[0-9\]* at \[^ \]*:64, 1 IVs" "ivopts" { target x86_64-*-* i?86-*-* } } }
+// { dg-final { scan-tree-dump "Selected IV set for loop \[0-9\]* at \[^ \]*:64, 3 avg niters, 1 expressions, 1 IVs" "ivopts" { target x86_64-*-* i?86-*-* } } }
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index b24cac4..62b8835 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -129,7 +129,7 @@ avg_loop_niter (struct loop *loop)
     {
       niter = max_stmt_executions_int (loop);
       if (niter == -1 || niter > AVG_LOOP_NITER (loop))
-        return AVG_LOOP_NITER (loop);
+	return AVG_LOOP_NITER (loop);
     }
 
   return niter;
@@ -184,6 +184,8 @@ struct comp_cost
 static const comp_cost no_cost = {0, 0, 0};
 static const comp_cost infinite_cost = {INFTY, INFTY, INFTY};
 
+struct iv_inv_expr_ent;
+
 /* The candidate - cost pair.  */
 struct cost_pair
 {
@@ -195,7 +197,7 @@ struct cost_pair
 			   the final value of the iv.  For iv elimination,
 			   the new bound to compare with.  */
   enum tree_code comp;	/* For iv elimination, the comparison.  */
-  int inv_expr_id;      /* Loop invariant expression id.  */
+  iv_inv_expr_ent *inv_expr; /* Loop invariant expression.  */
 };
 
 /* Use.  */
@@ -307,13 +309,36 @@ iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
 }
 
 /* Loop invariant expression hashtable entry.  */
+
 struct iv_inv_expr_ent
 {
+  /* Tree expression of the entry.  */
   tree expr;
+  /* Unique identifier.  */
   int id;
+  /* Hash value.  */
   hashval_t hash;
 };
 
+/* Sort iv_inv_expr_ent pair A and B by id field.  */
+
+static int
+sort_iv_inv_expr_ent (const void *a, const void *b)
+{
+  const iv_inv_expr_ent * const *e1 = (const iv_inv_expr_ent * const *) (a);
+  const iv_inv_expr_ent * const *e2 = (const iv_inv_expr_ent * const *) (b);
+
+  unsigned id1 = (*e1)->id;
+  unsigned id2 = (*e2)->id;
+
+  if (id1 < id2)
+    return -1;
+  else if (id1 > id2)
+    return 1;
+  else
+    return 0;
+}
+
 /* Hashtable helpers.  */
 
 struct iv_inv_expr_hasher : free_ptr_hash <iv_inv_expr_ent>
@@ -363,7 +388,7 @@ struct ivopts_data
   hash_table<iv_inv_expr_hasher> *inv_expr_tab;
 
   /* Loop invariant expression id.  */
-  int inv_expr_id;
+  int max_inv_expr_id;
 
   /* The bitmap of indices in version_info whose value was changed.  */
   bitmap relevant;
@@ -443,12 +468,8 @@ struct iv_ca
   /* Number of times each invariant is used.  */
   unsigned *n_invariant_uses;
 
-  /* The array holding the number of uses of each loop
-     invariant expressions created by ivopt.  */
-  unsigned *used_inv_expr;
-
-  /* The number of created loop invariants.  */
-  unsigned num_used_inv_expr;
+  /* Hash map from each used invariant expression to its number of uses.  */
+  hash_map <iv_inv_expr_ent *, unsigned> *used_inv_exprs;
 
   /* Total cost of the assignment.  */
   comp_cost cost;
@@ -840,8 +861,8 @@ niter_for_exit (struct ivopts_data *data, edge exit)
   if (!slot)
     {
       /* Try to determine number of iterations.  We cannot safely work with ssa
-         names that appear in phi nodes on abnormal edges, so that we do not
-         create overlapping life ranges for them (PR 27283).  */
+	 names that appear in phi nodes on abnormal edges, so that we do not
+	 create overlapping life ranges for them (PR 27283).  */
       desc = XNEW (struct tree_niter_desc);
       if (!number_of_iterations_exit (data->current_loop,
 				      exit, desc, true)
@@ -888,7 +909,7 @@ tree_ssa_iv_optimize_init (struct ivopts_data *data)
   data->vgroups.create (20);
   data->vcands.create (20);
   data->inv_expr_tab = new hash_table<iv_inv_expr_hasher> (10);
-  data->inv_expr_id = 0;
+  data->max_inv_expr_id = 0;
   data->name_expansion_cache = NULL;
   data->iv_common_cand_tab = new hash_table<iv_common_cand_hasher> (10);
   data->iv_common_cands.create (20);
@@ -930,7 +951,7 @@ determine_base_object (tree expr)
 	return determine_base_object (TREE_OPERAND (base, 0));
 
       return fold_convert (ptr_type_node,
-		           build_fold_addr_expr (base));
+			   build_fold_addr_expr (base));
 
     case POINTER_PLUS_EXPR:
       return determine_base_object (TREE_OPERAND (expr, 0));
@@ -989,7 +1010,7 @@ alloc_iv (struct ivopts_data *data, tree base, tree step,
      By doing this:
        1) More accurate cost can be computed for address expressions;
        2) Duplicate candidates won't be created for bases in different
-          forms, like &a[0] and &a.  */
+	  forms, like &a[0] and &a.  */
   STRIP_NOPS (expr);
   if ((TREE_CODE (expr) == ADDR_EXPR && !DECL_P (TREE_OPERAND (expr, 0)))
       || contain_complex_addr_expr (expr))
@@ -2265,7 +2286,7 @@ find_interesting_uses_outside (struct ivopts_data *data, edge exit)
       phi = psi.phi ();
       def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
       if (!virtual_operand_p (def))
-        find_interesting_uses_op (data, def);
+	find_interesting_uses_op (data, def);
     }
 }
 
@@ -2785,8 +2806,8 @@ add_candidate_1 (struct ivopts_data *data,
 
       if (operand_equal_p (base, cand->iv->base, 0)
 	  && operand_equal_p (step, cand->iv->step, 0)
-          && (TYPE_PRECISION (TREE_TYPE (base))
-              == TYPE_PRECISION (TREE_TYPE (cand->iv->base))))
+	  && (TYPE_PRECISION (TREE_TYPE (base))
+	      == TYPE_PRECISION (TREE_TYPE (cand->iv->base))))
 	break;
     }
 
@@ -2936,14 +2957,14 @@ add_standard_iv_candidates (struct ivopts_data *data)
 
   /* The same for a double-integer type if it is still fast enough.  */
   if (TYPE_PRECISION
-        (long_integer_type_node) > TYPE_PRECISION (integer_type_node)
+	(long_integer_type_node) > TYPE_PRECISION (integer_type_node)
       && TYPE_PRECISION (long_integer_type_node) <= BITS_PER_WORD)
     add_candidate (data, build_int_cst (long_integer_type_node, 0),
 		   build_int_cst (long_integer_type_node, 1), true, NULL);
 
   /* The same for a double-integer type if it is still fast enough.  */
   if (TYPE_PRECISION
-        (long_long_integer_type_node) > TYPE_PRECISION (long_integer_type_node)
+	(long_long_integer_type_node) > TYPE_PRECISION (long_integer_type_node)
       && TYPE_PRECISION (long_long_integer_type_node) <= BITS_PER_WORD)
     add_candidate (data, build_int_cst (long_long_integer_type_node, 0),
 		   build_int_cst (long_long_integer_type_node, 1), true, NULL);
@@ -3329,7 +3350,7 @@ static void
 set_group_iv_cost (struct ivopts_data *data,
 		   struct iv_group *group, struct iv_cand *cand,
 		   comp_cost cost, bitmap depends_on, tree value,
-		   enum tree_code comp, int inv_expr_id)
+		   enum tree_code comp, iv_inv_expr_ent *inv_expr)
 {
   unsigned i, s;
 
@@ -3346,7 +3367,7 @@ set_group_iv_cost (struct ivopts_data *data,
       group->cost_map[cand->id].depends_on = depends_on;
       group->cost_map[cand->id].value = value;
       group->cost_map[cand->id].comp = comp;
-      group->cost_map[cand->id].inv_expr_id = inv_expr_id;
+      group->cost_map[cand->id].inv_expr = inv_expr;
       return;
     }
 
@@ -3367,7 +3388,7 @@ found:
   group->cost_map[i].depends_on = depends_on;
   group->cost_map[i].value = value;
   group->cost_map[i].comp = comp;
-  group->cost_map[i].inv_expr_id = inv_expr_id;
+  group->cost_map[i].inv_expr = inv_expr;
 }
 
 /* Gets cost of (GROUP, CAND) pair.  */
@@ -3454,7 +3475,7 @@ prepare_decl_rtl (tree *expr_p, int *ws, void *data)
 	continue;
       obj = *expr_p;
       if (DECL_P (obj) && HAS_RTL_P (obj) && !DECL_RTL_SET_P (obj))
-        x = produce_memory_decl_rtl (obj, regno);
+	x = produce_memory_decl_rtl (obj, regno);
       break;
 
     case SSA_NAME:
@@ -3908,7 +3929,7 @@ get_address_cost (bool symbol_present, bool var_present,
 	    }
 	}
       if (i == -1)
-        off = 0;
+	off = 0;
       data->max_offset = off;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
@@ -4040,9 +4061,9 @@ get_address_cost (bool symbol_present, bool var_present,
 	 However, the symbol will have to be loaded in any case before the
 	 loop (and quite likely we have it in register already), so it does not
 	 make much sense to penalize them too heavily.  So make some final
-         tweaks for the SYMBOL_PRESENT modes:
+	 tweaks for the SYMBOL_PRESENT modes:
 
-         If VAR_PRESENT is false, and the mode obtained by changing symbol to
+	 If VAR_PRESENT is false, and the mode obtained by changing symbol to
 	 var is cheaper, use this mode with small penalty.
 	 If VAR_PRESENT is true, try whether the mode with
 	 SYMBOL_PRESENT = false is cheaper even with cost of addition, and
@@ -4159,7 +4180,7 @@ get_address_cost (bool symbol_present, bool var_present,
 
 static bool
 get_shiftadd_cost (tree expr, machine_mode mode, comp_cost cost0,
-                   comp_cost cost1, tree mult, bool speed, comp_cost *cost)
+		   comp_cost cost1, tree mult, bool speed, comp_cost *cost)
 {
   comp_cost res;
   tree op1 = TREE_OPERAND (expr, 1);
@@ -4181,10 +4202,10 @@ get_shiftadd_cost (tree expr, machine_mode mode, comp_cost cost0,
   /* If the target has a cheap shift-and-add or shift-and-sub instruction,
      use that in preference to a shift insn followed by an add insn.  */
   sa_cost = (TREE_CODE (expr) != MINUS_EXPR
-             ? shiftadd_cost (speed, mode, m)
-             : (mult_in_op1
-                ? shiftsub1_cost (speed, mode, m)
-                : shiftsub0_cost (speed, mode, m)));
+	     ? shiftadd_cost (speed, mode, m)
+	     : (mult_in_op1
+		? shiftsub1_cost (speed, mode, m)
+		: shiftsub0_cost (speed, mode, m)));
 
   res = new_cost (MIN (as_cost, sa_cost), 0);
   res = add_costs (res, mult_in_op1 ? cost0 : cost1);
@@ -4316,20 +4337,20 @@ force_expr_to_var_cost (tree expr, bool speed)
     case NEGATE_EXPR:
       cost = new_cost (add_cost (speed, mode), 0);
       if (TREE_CODE (expr) != NEGATE_EXPR)
-        {
-          tree mult = NULL_TREE;
-          comp_cost sa_cost;
-          if (TREE_CODE (op1) == MULT_EXPR)
-            mult = op1;
-          else if (TREE_CODE (op0) == MULT_EXPR)
-            mult = op0;
-
-          if (mult != NULL_TREE
-              && cst_and_fits_in_hwi (TREE_OPERAND (mult, 1))
-              && get_shiftadd_cost (expr, mode, cost0, cost1, mult,
-                                    speed, &sa_cost))
-            return sa_cost;
-        }
+	{
+	  tree mult = NULL_TREE;
+	  comp_cost sa_cost;
+	  if (TREE_CODE (op1) == MULT_EXPR)
+	    mult = op1;
+	  else if (TREE_CODE (op0) == MULT_EXPR)
+	    mult = op0;
+
+	  if (mult != NULL_TREE
+	      && cst_and_fits_in_hwi (TREE_OPERAND (mult, 1))
+	      && get_shiftadd_cost (expr, mode, cost0, cost1, mult,
+				    speed, &sa_cost))
+	    return sa_cost;
+	}
       break;
 
     CASE_CONVERT:
@@ -4543,18 +4564,18 @@ compare_aff_trees (aff_tree *aff1, aff_tree *aff2)
   for (i = 0; i < aff1->n; i++)
     {
       if (aff1->elts[i].coef != aff2->elts[i].coef)
-        return false;
+	return false;
 
       if (!operand_equal_p (aff1->elts[i].val, aff2->elts[i].val, 0))
-        return false;
+	return false;
     }
   return true;
 }
 
-/* Stores EXPR in DATA->inv_expr_tab, and assigns it an inv_expr_id.  */
+/* Stores EXPR in DATA->inv_expr_tab and returns a pointer to its entry.  */
 
-static int
-get_expr_id (struct ivopts_data *data, tree expr)
+static iv_inv_expr_ent *
+record_inv_expr (struct ivopts_data *data, tree expr)
 {
   struct iv_inv_expr_ent ent;
   struct iv_inv_expr_ent **slot;
@@ -4562,25 +4583,27 @@ get_expr_id (struct ivopts_data *data, tree expr)
   ent.expr = expr;
   ent.hash = iterative_hash_expr (expr, 0);
   slot = data->inv_expr_tab->find_slot (&ent, INSERT);
-  if (*slot)
-    return (*slot)->id;
 
-  *slot = XNEW (struct iv_inv_expr_ent);
-  (*slot)->expr = expr;
-  (*slot)->hash = ent.hash;
-  (*slot)->id = data->inv_expr_id++;
-  return (*slot)->id;
+  if (!*slot)
+    {
+      *slot = XNEW (struct iv_inv_expr_ent);
+      (*slot)->expr = expr;
+      (*slot)->hash = ent.hash;
+      (*slot)->id = data->max_inv_expr_id++;
+    }
+
+  return *slot;
 }
 
-/* Returns the pseudo expr id if expression UBASE - RATIO * CBASE
+/* Returns the invariant expression if expression UBASE - RATIO * CBASE
    requires a new compiler generated temporary.  Returns -1 otherwise.
    ADDRESS_P is a flag indicating if the expression is for address
    computation.  */
 
-static int
-get_loop_invariant_expr_id (struct ivopts_data *data, tree ubase,
-                            tree cbase, HOST_WIDE_INT ratio,
-                            bool address_p)
+static iv_inv_expr_ent *
+get_loop_invariant_expr (struct ivopts_data *data, tree ubase,
+			 tree cbase, HOST_WIDE_INT ratio,
+			 bool address_p)
 {
   aff_tree ubase_aff, cbase_aff;
   tree expr, ub, cb;
@@ -4592,7 +4615,7 @@ get_loop_invariant_expr_id (struct ivopts_data *data, tree ubase,
 
   if ((TREE_CODE (ubase) == INTEGER_CST)
       && (TREE_CODE (cbase) == INTEGER_CST))
-    return -1;
+    return NULL;
 
   /* Strips the constant part. */
   if (TREE_CODE (ubase) == PLUS_EXPR
@@ -4600,7 +4623,7 @@ get_loop_invariant_expr_id (struct ivopts_data *data, tree ubase,
       || TREE_CODE (ubase) == POINTER_PLUS_EXPR)
     {
       if (TREE_CODE (TREE_OPERAND (ubase, 1)) == INTEGER_CST)
-        ubase = TREE_OPERAND (ubase, 0);
+	ubase = TREE_OPERAND (ubase, 0);
     }
 
   /* Strips the constant part. */
@@ -4609,60 +4632,60 @@ get_loop_invariant_expr_id (struct ivopts_data *data, tree ubase,
       || TREE_CODE (cbase) == POINTER_PLUS_EXPR)
     {
       if (TREE_CODE (TREE_OPERAND (cbase, 1)) == INTEGER_CST)
-        cbase = TREE_OPERAND (cbase, 0);
+	cbase = TREE_OPERAND (cbase, 0);
     }
 
   if (address_p)
     {
       if (((TREE_CODE (ubase) == SSA_NAME)
-           || (TREE_CODE (ubase) == ADDR_EXPR
-               && is_gimple_min_invariant (ubase)))
-          && (TREE_CODE (cbase) == INTEGER_CST))
-        return -1;
+	   || (TREE_CODE (ubase) == ADDR_EXPR
+	       && is_gimple_min_invariant (ubase)))
+	  && (TREE_CODE (cbase) == INTEGER_CST))
+	return NULL;
 
       if (((TREE_CODE (cbase) == SSA_NAME)
-           || (TREE_CODE (cbase) == ADDR_EXPR
-               && is_gimple_min_invariant (cbase)))
-          && (TREE_CODE (ubase) == INTEGER_CST))
-        return -1;
+	   || (TREE_CODE (cbase) == ADDR_EXPR
+	       && is_gimple_min_invariant (cbase)))
+	  && (TREE_CODE (ubase) == INTEGER_CST))
+	return NULL;
     }
 
   if (ratio == 1)
     {
       if (operand_equal_p (ubase, cbase, 0))
-        return -1;
+	return NULL;
 
       if (TREE_CODE (ubase) == ADDR_EXPR
-          && TREE_CODE (cbase) == ADDR_EXPR)
-        {
-          tree usym, csym;
-
-          usym = TREE_OPERAND (ubase, 0);
-          csym = TREE_OPERAND (cbase, 0);
-          if (TREE_CODE (usym) == ARRAY_REF)
-            {
-              tree ind = TREE_OPERAND (usym, 1);
-              if (TREE_CODE (ind) == INTEGER_CST
-                  && tree_fits_shwi_p (ind)
-                  && tree_to_shwi (ind) == 0)
-                usym = TREE_OPERAND (usym, 0);
-            }
-          if (TREE_CODE (csym) == ARRAY_REF)
-            {
-              tree ind = TREE_OPERAND (csym, 1);
-              if (TREE_CODE (ind) == INTEGER_CST
-                  && tree_fits_shwi_p (ind)
-                  && tree_to_shwi (ind) == 0)
-                csym = TREE_OPERAND (csym, 0);
-            }
-          if (operand_equal_p (usym, csym, 0))
-            return -1;
-        }
+	  && TREE_CODE (cbase) == ADDR_EXPR)
+	{
+	  tree usym, csym;
+
+	  usym = TREE_OPERAND (ubase, 0);
+	  csym = TREE_OPERAND (cbase, 0);
+	  if (TREE_CODE (usym) == ARRAY_REF)
+	    {
+	      tree ind = TREE_OPERAND (usym, 1);
+	      if (TREE_CODE (ind) == INTEGER_CST
+		  && tree_fits_shwi_p (ind)
+		  && tree_to_shwi (ind) == 0)
+		usym = TREE_OPERAND (usym, 0);
+	    }
+	  if (TREE_CODE (csym) == ARRAY_REF)
+	    {
+	      tree ind = TREE_OPERAND (csym, 1);
+	      if (TREE_CODE (ind) == INTEGER_CST
+		  && tree_fits_shwi_p (ind)
+		  && tree_to_shwi (ind) == 0)
+		csym = TREE_OPERAND (csym, 0);
+	    }
+	  if (operand_equal_p (usym, csym, 0))
+	    return NULL;
+	}
       /* Now do more complex comparison  */
       tree_to_aff_combination (ubase, TREE_TYPE (ubase), &ubase_aff);
       tree_to_aff_combination (cbase, TREE_TYPE (cbase), &cbase_aff);
       if (compare_aff_trees (&ubase_aff, &cbase_aff))
-        return -1;
+	return NULL;
     }
 
   tree_to_aff_combination (ub, TREE_TYPE (ub), &ubase_aff);
@@ -4671,7 +4694,7 @@ get_loop_invariant_expr_id (struct ivopts_data *data, tree ubase,
   aff_combination_scale (&cbase_aff, -1 * ratio);
   aff_combination_add (&ubase_aff, &cbase_aff);
   expr = aff_combination_to_tree (&ubase_aff);
-  return get_expr_id (data, expr);
+  return record_inv_expr (data, expr);
 }
 
 
@@ -4689,7 +4712,7 @@ get_computation_cost_at (struct ivopts_data *data,
 			 struct iv_use *use, struct iv_cand *cand,
 			 bool address_p, bitmap *depends_on, gimple *at,
 			 bool *can_autoinc,
-                         int *inv_expr_id)
+			 iv_inv_expr_ent **inv_expr)
 {
   tree ubase = use->iv->base, ustep = use->iv->step;
   tree cbase, cstep;
@@ -4790,17 +4813,17 @@ get_computation_cost_at (struct ivopts_data *data,
 
       /* Check to see if any adjustment is needed.  */
       if (cstepi == 0 && stmt_is_after_inc)
-        {
-          aff_tree real_cbase_aff;
-          aff_tree cstep_aff;
+	{
+	  aff_tree real_cbase_aff;
+	  aff_tree cstep_aff;
 
-          tree_to_aff_combination (cbase, TREE_TYPE (real_cbase),
-                                   &real_cbase_aff);
-          tree_to_aff_combination (cstep, TREE_TYPE (cstep), &cstep_aff);
+	  tree_to_aff_combination (cbase, TREE_TYPE (real_cbase),
+				   &real_cbase_aff);
+	  tree_to_aff_combination (cstep, TREE_TYPE (cstep), &cstep_aff);
 
-          aff_combination_add (&real_cbase_aff, &cstep_aff);
-          real_cbase = aff_combination_to_tree (&real_cbase_aff);
-        }
+	  aff_combination_add (&real_cbase_aff, &cstep_aff);
+	  real_cbase = aff_combination_to_tree (&real_cbase_aff);
+	}
 
       cost = difference_cost (data,
 			      ubase, real_cbase,
@@ -4846,13 +4869,13 @@ get_computation_cost_at (struct ivopts_data *data,
   /* Record setup cost in scrach field.  */
   cost.scratch = cost.cost;
 
-  if (inv_expr_id && depends_on && *depends_on)
+  if (inv_expr && depends_on && *depends_on)
     {
-      *inv_expr_id =
-          get_loop_invariant_expr_id (data, ubase, cbase, ratio, address_p);
+      *inv_expr = get_loop_invariant_expr (data, ubase, cbase, ratio,
+					   address_p);
       /* Clear depends on.  */
-      if (*inv_expr_id != -1)
-        bitmap_clear (*depends_on);
+      if (*inv_expr != NULL)
+	bitmap_clear (*depends_on);
     }
 
   /* If we are after the increment, the value of the candidate is higher by
@@ -4929,11 +4952,11 @@ static comp_cost
 get_computation_cost (struct ivopts_data *data,
 		      struct iv_use *use, struct iv_cand *cand,
 		      bool address_p, bitmap *depends_on,
-                      bool *can_autoinc, int *inv_expr_id)
+		      bool *can_autoinc, iv_inv_expr_ent **inv_expr)
 {
   return get_computation_cost_at (data,
 				  use, cand, address_p, depends_on, use->stmt,
-				  can_autoinc, inv_expr_id);
+				  can_autoinc, inv_expr);
 }
 
 /* Determines cost of computing the use in GROUP with CAND in a generic
@@ -4944,7 +4967,7 @@ determine_group_iv_cost_generic (struct ivopts_data *data,
 				 struct iv_group *group, struct iv_cand *cand)
 {
   comp_cost cost;
-  int inv_expr_id = -1;
+  iv_inv_expr_ent *inv_expr = NULL;
   bitmap depends_on = NULL;
   struct iv_use *use = group->vuses[0];
 
@@ -4956,10 +4979,10 @@ determine_group_iv_cost_generic (struct ivopts_data *data,
     cost = no_cost;
   else
     cost = get_computation_cost (data, use, cand, false,
-				 &depends_on, NULL, &inv_expr_id);
+				 &depends_on, NULL, &inv_expr);
 
   set_group_iv_cost (data, group, cand, cost, depends_on,
-		     NULL_TREE, ERROR_MARK, inv_expr_id);
+		     NULL_TREE, ERROR_MARK, inv_expr);
   return !infinite_cost_p (cost);
 }
 
@@ -4972,12 +4995,12 @@ determine_group_iv_cost_address (struct ivopts_data *data,
   unsigned i;
   bitmap depends_on;
   bool can_autoinc, first = true;
-  int inv_expr_id = -1;
+  iv_inv_expr_ent *inv_expr = NULL;
   struct iv_use *use = group->vuses[0];
   comp_cost sum_cost = no_cost, cost;
 
   cost = get_computation_cost (data, use, cand, true,
-			       &depends_on, &can_autoinc, &inv_expr_id);
+			       &depends_on, &can_autoinc, &inv_expr);
 
   sum_cost = cost;
   if (!infinite_cost_p (sum_cost) && cand->ainc_use == use)
@@ -5025,7 +5048,7 @@ determine_group_iv_cost_address (struct ivopts_data *data,
       sum_cost = add_costs (sum_cost, cost);
     }
   set_group_iv_cost (data, group, cand, sum_cost, depends_on,
-		     NULL_TREE, ERROR_MARK, inv_expr_id);
+		     NULL_TREE, ERROR_MARK, inv_expr);
 
   return !infinite_cost_p (sum_cost);
 }
@@ -5081,8 +5104,8 @@ iv_period (struct iv *iv)
   pow2div = num_ending_zeros (step);
 
   period = build_low_bits_mask (type,
-                                (TYPE_PRECISION (type)
-                                 - tree_to_uhwi (pow2div)));
+				(TYPE_PRECISION (type)
+				 - tree_to_uhwi (pow2div)));
 
   return period;
 }
@@ -5215,7 +5238,7 @@ difference_cannot_overflow_p (struct ivopts_data *data, tree base, tree offset)
 
 static bool
 iv_elimination_compare_lt (struct ivopts_data *data,
-                           struct iv_cand *cand, enum tree_code *comp_p,
+			   struct iv_cand *cand, enum tree_code *comp_p,
 			   struct tree_niter_desc *niter)
 {
   tree cand_type, a, b, mbz, nit_type = TREE_TYPE (niter->niter), offset;
@@ -5264,10 +5287,10 @@ iv_elimination_compare_lt (struct ivopts_data *data,
 
       /* Handle b < a + 1.  */
       if (TREE_CODE (op1) == PLUS_EXPR && integer_onep (TREE_OPERAND (op1, 1)))
-        {
-          a = TREE_OPERAND (op1, 0);
-          b = TREE_OPERAND (mbz, 0);
-        }
+	{
+	  a = TREE_OPERAND (op1, 0);
+	  b = TREE_OPERAND (mbz, 0);
+	}
       else
 	return false;
     }
@@ -5353,15 +5376,15 @@ may_eliminate_iv (struct ivopts_data *data,
     {
       /* See cand_value_at.  */
       if (stmt_after_increment (loop, cand, use->stmt))
-        {
-          if (!tree_int_cst_lt (desc->niter, period))
-            return false;
-        }
+	{
+	  if (!tree_int_cst_lt (desc->niter, period))
+	    return false;
+	}
       else
-        {
-          if (tree_int_cst_lt (period, desc->niter))
-            return false;
-        }
+	{
+	  if (tree_int_cst_lt (period, desc->niter))
+	    return false;
+	}
     }
 
   /* If not, and if this is the only possible exit of the loop, see whether
@@ -5373,22 +5396,23 @@ may_eliminate_iv (struct ivopts_data *data,
 
       max_niter = desc->max;
       if (stmt_after_increment (loop, cand, use->stmt))
-        max_niter += 1;
+	max_niter += 1;
       period_value = wi::to_widest (period);
       if (wi::gtu_p (max_niter, period_value))
-        {
-          /* See if we can take advantage of inferred loop bound information.  */
-          if (data->loop_single_exit_p)
-            {
-              if (!max_loop_iterations (loop, &max_niter))
-                return false;
-              /* The loop bound is already adjusted by adding 1.  */
-              if (wi::gtu_p (max_niter, period_value))
-                return false;
-            }
-          else
-            return false;
-        }
+	{
+	  /* See if we can take advantage of inferred loop bound
+	     information.  */
+	  if (data->loop_single_exit_p)
+	    {
+	      if (!max_loop_iterations (loop, &max_niter))
+		return false;
+	      /* The loop bound is already adjusted by adding 1.  */
+	      if (wi::gtu_p (max_niter, period_value))
+		return false;
+	    }
+	  else
+	    return false;
+	}
     }
 
   cand_value_at (loop, cand, use->stmt, desc->niter, &bnd);
@@ -5444,7 +5468,7 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
   bitmap depends_on_elim = NULL, depends_on_express = NULL, depends_on;
   comp_cost elim_cost, express_cost, cost, bound_cost;
   bool ok;
-  int elim_inv_expr_id = -1, express_inv_expr_id = -1, inv_expr_id;
+  iv_inv_expr_ent *elim_inv_expr = NULL, *express_inv_expr = NULL, *inv_expr;
   tree *control_var, *bound_cst;
   enum tree_code comp = ERROR_MARK;
   struct iv_use *use = group->vuses[0];
@@ -5456,18 +5480,18 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
     {
       elim_cost = force_var_cost (data, bound, &depends_on_elim);
       if (elim_cost.cost == 0)
-        elim_cost.cost = parm_decl_cost (data, bound);
+	elim_cost.cost = parm_decl_cost (data, bound);
       else if (TREE_CODE (bound) == INTEGER_CST)
-        elim_cost.cost = 0;
+	elim_cost.cost = 0;
       /* If we replace a loop condition 'i < n' with 'p < base + n',
 	 depends_on_elim will have 'base' and 'n' set, which implies
 	 that both 'base' and 'n' will be live during the loop.	 More likely,
 	 'base + n' will be loop invariant, resulting in only one live value
 	 during the loop.  So in that case we clear depends_on_elim and set
-        elim_inv_expr_id instead.  */
+	elim_inv_expr instead.  */
       if (depends_on_elim && bitmap_count_bits (depends_on_elim) > 1)
 	{
-	  elim_inv_expr_id = get_expr_id (data, bound);
+	  elim_inv_expr = record_inv_expr (data, bound);
 	  bitmap_clear (depends_on_elim);
 	}
       /* The bound is a loop invariant, so it will be only computed
@@ -5497,7 +5521,7 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
 
   express_cost = get_computation_cost (data, use, cand, false,
 				       &depends_on_express, NULL,
-                                       &express_inv_expr_id);
+				       &express_inv_expr);
   fd_ivopts_data = data;
   walk_tree (&cmp_iv->base, find_depends, &depends_on_express, NULL);
 
@@ -5515,7 +5539,7 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
       cost = elim_cost;
       depends_on = depends_on_elim;
       depends_on_elim = NULL;
-      inv_expr_id = elim_inv_expr_id;
+      inv_expr = elim_inv_expr;
     }
   else
     {
@@ -5524,11 +5548,11 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
       depends_on_express = NULL;
       bound = NULL_TREE;
       comp = ERROR_MARK;
-      inv_expr_id = express_inv_expr_id;
+      inv_expr = express_inv_expr;
     }
 
   set_group_iv_cost (data, group, cand, cost,
-		     depends_on, bound, comp, inv_expr_id);
+		     depends_on, bound, comp, inv_expr);
 
   if (depends_on_elim)
     BITMAP_FREE (depends_on_elim);
@@ -5718,14 +5742,31 @@ determine_group_iv_costs (struct ivopts_data *data)
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
-      fprintf (dump_file, "<Group-candidate Costs>:\n");
+      fprintf (dump_file, "\n<Invariant Expressions>:\n");
+      auto_vec <iv_inv_expr_ent *> list (data->inv_expr_tab->elements ());
+
+      for (hash_table<iv_inv_expr_hasher>::iterator it
+	   = data->inv_expr_tab->begin (); it != data->inv_expr_tab->end ();
+	   ++it)
+	list.safe_push (*it);
+
+      list.qsort (sort_iv_inv_expr_ent);
+
+      for (i = 0; i < list.length (); ++i)
+	{
+	  fprintf (dump_file, "inv_expr %d: \t", i);
+	  print_generic_expr (dump_file, list[i]->expr, TDF_SLIM);
+	  fprintf (dump_file, "\n");
+	}
+
+      fprintf (dump_file, "\n<Group-candidate Costs>:\n");
 
       for (i = 0; i < data->vgroups.length (); i++)
 	{
 	  group = data->vgroups[i];
 
 	  fprintf (dump_file, "Group %d:\n", i);
-	  fprintf (dump_file, "  cand\tcost\tcompl.\tdepends on\n");
+	  fprintf (dump_file, "  cand\tcost\tcompl.\tinv.ex.\tdepends on\n");
 	  for (j = 0; j < group->n_map_members; j++)
 	    {
 	      if (!group->cost_map[j].cand
@@ -5736,12 +5777,14 @@ determine_group_iv_costs (struct ivopts_data *data)
 		       group->cost_map[j].cand->id,
 		       group->cost_map[j].cost.cost,
 		       group->cost_map[j].cost.complexity);
+	      if (group->cost_map[j].inv_expr != NULL)
+		fprintf (dump_file, "%d\t",
+			 group->cost_map[j].inv_expr->id);
+	      else
+		fprintf (dump_file, "\t");
 	      if (group->cost_map[j].depends_on)
 		bitmap_print (dump_file,
 			      group->cost_map[j].depends_on, "","");
-	      if (group->cost_map[j].inv_expr_id != -1)
-		fprintf (dump_file, " inv_expr:%d",
-			 group->cost_map[j].inv_expr_id);
 	      fprintf (dump_file, "\n");
 	    }
 
@@ -5942,7 +5985,8 @@ iv_ca_recount_cost (struct ivopts_data *data, struct iv_ca *ivs)
   cost.cost += ivs->cand_cost;
 
   cost.cost += ivopts_global_cost_for_size (data,
-                                            ivs->n_regs + ivs->num_used_inv_expr);
+					    ivs->n_regs
+					    + ivs->used_inv_exprs->elements ());
 
   ivs->cost = cost;
 }
@@ -5962,7 +6006,7 @@ iv_ca_set_remove_invariants (struct iv_ca *ivs, bitmap invs)
     {
       ivs->n_invariant_uses[iid]--;
       if (ivs->n_invariant_uses[iid] == 0)
-        ivs->n_regs--;
+	ivs->n_regs--;
     }
 }
 
@@ -6000,11 +6044,12 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca *ivs,
 
   iv_ca_set_remove_invariants (ivs, cp->depends_on);
 
-  if (cp->inv_expr_id != -1)
+  if (cp->inv_expr != NULL)
     {
-      ivs->used_inv_expr[cp->inv_expr_id]--;
-      if (ivs->used_inv_expr[cp->inv_expr_id] == 0)
-        ivs->num_used_inv_expr--;
+      unsigned *slot = ivs->used_inv_exprs->get (cp->inv_expr);
+      --(*slot);
+      if (*slot == 0)
+	ivs->used_inv_exprs->remove (cp->inv_expr);
     }
   iv_ca_recount_cost (data, ivs);
 }
@@ -6024,7 +6069,7 @@ iv_ca_set_add_invariants (struct iv_ca *ivs, bitmap invs)
     {
       ivs->n_invariant_uses[iid]++;
       if (ivs->n_invariant_uses[iid] == 1)
-        ivs->n_regs++;
+	ivs->n_regs++;
     }
 }
 
@@ -6064,12 +6109,11 @@ iv_ca_set_cp (struct ivopts_data *data, struct iv_ca *ivs,
       ivs->cand_use_cost = add_costs (ivs->cand_use_cost, cp->cost);
       iv_ca_set_add_invariants (ivs, cp->depends_on);
 
-      if (cp->inv_expr_id != -1)
-        {
-          ivs->used_inv_expr[cp->inv_expr_id]++;
-          if (ivs->used_inv_expr[cp->inv_expr_id] == 1)
-            ivs->num_used_inv_expr++;
-        }
+      if (cp->inv_expr != NULL)
+	{
+	  unsigned *slot = &ivs->used_inv_exprs->get_or_insert (cp->inv_expr);
+	  ++(*slot);
+	}
       iv_ca_recount_cost (data, ivs);
     }
 }
@@ -6278,9 +6322,8 @@ iv_ca_new (struct ivopts_data *data)
   nw->cand_use_cost = no_cost;
   nw->cand_cost = 0;
   nw->n_invariant_uses = XCNEWVEC (unsigned, data->max_inv_id + 1);
+  nw->used_inv_exprs = new hash_map <iv_inv_expr_ent *, unsigned> (13);
   nw->cost = no_cost;
-  nw->used_inv_expr = XCNEWVEC (unsigned, data->inv_expr_id + 1);
-  nw->num_used_inv_expr = 0;
 
   return nw;
 }
@@ -6294,7 +6337,7 @@ iv_ca_free (struct iv_ca **ivs)
   free ((*ivs)->n_cand_uses);
   BITMAP_FREE ((*ivs)->cands);
   free ((*ivs)->n_invariant_uses);
-  free ((*ivs)->used_inv_expr);
+  delete ((*ivs)->used_inv_exprs);
   free (*ivs);
   *ivs = NULL;
 }
@@ -6304,13 +6347,13 @@ iv_ca_free (struct iv_ca **ivs)
 static void
 iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
 {
-  const char *pref = "  invariants ";
   unsigned i;
   comp_cost cost = iv_ca_cost (ivs);
 
   fprintf (file, "  cost: %d (complexity %d)\n", cost.cost, cost.complexity);
   fprintf (file, "  cand_cost: %d\n  cand_group_cost: %d (complexity %d)\n",
-           ivs->cand_cost, ivs->cand_use_cost.cost, ivs->cand_use_cost.complexity);
+	   ivs->cand_cost, ivs->cand_use_cost.cost,
+	   ivs->cand_use_cost.complexity);
   bitmap_print (file, ivs->cands, "  candidates: ","\n");
 
   for (i = 0; i < ivs->upto; i++)
@@ -6324,12 +6367,24 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
 	fprintf (file, "   group:%d --> ??\n", group->id);
     }
 
+  const char *pref = "";
+  fprintf (file, "  invariant variables: ");
   for (i = 1; i <= data->max_inv_id; i++)
     if (ivs->n_invariant_uses[i])
       {
 	fprintf (file, "%s%d", pref, i);
 	pref = ", ";
       }
+
+  pref = "";
+  fprintf (file, "\n  invariant expressions: ");
+  for (hash_map<iv_inv_expr_ent *, unsigned>::iterator it
+       = ivs->used_inv_exprs->begin (); it != ivs->used_inv_exprs->end (); ++it)
+    {
+	fprintf (file, "%s%d", pref, (*it).first->id);
+	pref = ", ";
+    }
+
   fprintf (file, "\n\n");
 }
 
@@ -6366,7 +6421,7 @@ iv_ca_extend (struct ivopts_data *data, struct iv_ca *ivs,
 	continue;
 
       if (!min_ncand && !cheaper_cost_pair (new_cp, old_cp))
-        continue;
+	continue;
 
       *delta = iv_ca_delta_add (group, old_cp, new_cp, *delta);
     }
@@ -6670,7 +6725,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
 	continue;
 
       if (iv_ca_cand_used_p (ivs, cand))
-        continue;
+	continue;
 
       cp = get_group_iv_cost (data, group, cand);
       if (!cp)
@@ -6678,7 +6733,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
 
       iv_ca_set_cp (data, ivs, group, cp);
       act_cost = iv_ca_extend (data, ivs, cand, &act_delta, NULL,
-                               true);
+			       true);
       iv_ca_set_no_cp (data, ivs, group);
       act_delta = iv_ca_delta_add (group, NULL, cp, act_delta);
 
@@ -6991,12 +7046,16 @@ create_new_ivs (struct ivopts_data *data, struct iv_ca *set)
       if (data->loop_loc != UNKNOWN_LOCATION)
 	fprintf (dump_file, " at %s:%d", LOCATION_FILE (data->loop_loc),
 		 LOCATION_LINE (data->loop_loc));
+      fprintf (dump_file, ", %lu avg niters",
+	       avg_loop_niter (data->current_loop));
+      fprintf (dump_file, ", %lu expressions",
+	       set->used_inv_exprs->elements ());
       fprintf (dump_file, ", %lu IVs:\n", bitmap_count_bits (set->cands));
       EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
-        {
-          cand = data->vcands[i];
-          dump_cand (dump_file, cand);
-        }
+	{
+	  cand = data->vcands[i];
+	  dump_cand (dump_file, cand);
+	}
       fprintf (dump_file, "\n");
     }
 }
@@ -7251,10 +7310,10 @@ rewrite_use_compare (struct ivopts_data *data,
       gimple_seq stmts;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
-        {
-          fprintf (dump_file, "Replacing exit test: ");
-          print_gimple_stmt (dump_file, use->stmt, 0, TDF_SLIM);
-        }
+	{
+	  fprintf (dump_file, "Replacing exit test: ");
+	  print_gimple_stmt (dump_file, use->stmt, 0, TDF_SLIM);
+	}
       compare = cp->comp;
       bound = unshare_expr (fold_convert (var_type, bound));
       op = force_gimple_operand (bound, &stmts, true, NULL_TREE);
@@ -7542,7 +7601,7 @@ free_loop_data (struct ivopts_data *data)
   decl_rtl_to_reset.truncate (0);
 
   data->inv_expr_tab->empty ();
-  data->inv_expr_id = 0;
+  data->max_inv_expr_id = 0;
 
   data->iv_common_cand_tab->empty ();
   data->iv_common_cands.truncate (0);
-- 
2.8.2
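
The per-set reference counting that the patch above moves from an id-indexed array into a hash_map can be sketched in isolation. This is only an illustration under stated assumptions: it uses std::unordered_map as a stand-in for GCC's hash_map, and the names inv_expr and used_exprs are hypothetical and do not appear in the patch.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for iv_inv_expr_ent; only the id is modelled.
struct inv_expr
{
  int id;
};

// Hypothetical stand-in for the used_inv_exprs member of iv_ca.
struct used_exprs
{
  std::unordered_map<const inv_expr *, unsigned> counts;

  // Analogous to iv_ca_set_cp: operator[] value-initializes a missing
  // count to 0 before the increment.
  void add (const inv_expr *e) { ++counts[e]; }

  // Analogous to iv_ca_set_no_cp: drop the entry once the last use is gone.
  void remove (const inv_expr *e)
  {
    auto it = counts.find (e);
    if (it == counts.end ())
      return;  // Unlike the patch, this sketch tolerates a missing entry.
    if (--it->second == 0)
      counts.erase (it);
  }

  // Number of distinct expressions in use, like elements () in the patch.
  std::size_t elements () const { return counts.size (); }
};
```

Because an entry is erased as soon as its count reaches zero, the number of entries is the number of distinct expressions in use, so no separate counter such as num_used_inv_expr is needed.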


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/3] Enhance dumps of IVOPTS
  2016-05-13 10:44                 ` Martin Liška
@ 2016-05-13 12:12                   ` H.J. Lu
  2016-05-13 12:39                     ` Martin Liška
  0 siblings, 1 reply; 34+ messages in thread
From: H.J. Lu @ 2016-05-13 12:12 UTC (permalink / raw)
  To: Martin Liška; +Cc: Bin.Cheng, Richard Biener, GCC Patches, Jan Hubicka

On Fri, May 13, 2016 at 3:44 AM, Martin Liška <mliska@suse.cz> wrote:
> On 05/13/2016 11:43 AM, Bin.Cheng wrote:
>> On Thu, May 12, 2016 at 5:41 PM, Martin Liška <mliska@suse.cz> wrote:
>>> On 05/12/2016 03:51 PM, Bin.Cheng wrote:
>>>> On Thu, May 12, 2016 at 1:13 PM, Martin Liška <mliska@suse.cz> wrote:
>>>>> On 05/10/2016 03:16 PM, Bin.Cheng wrote:
>>>>>> Another way is to remove the use of id for struct iv_inv_expr_ent once
>>>>>> for all.  We can change iv_ca.used_inv_expr and cost_pair.inv_expr_id
>>>>>> to pointers, and rename iv_inv_expr_ent.id to count and use this to
>>>>>> record reference number in iv_ca.  This if-statement on dump_file can
>>>>>> be saved.  Also I think it simplifies current code a bit.  For now,
>>>>>> there are id <-> struct maps for different structures in IVOPT which
>>>>>> make it not straightforward.
>>>>>
>>>>> Hi.
>>>>>
>>>>> I'm sending the second version of the patch. I tried to follow your advice, but
>>>>> because an iv_inv_expr_ent can simultaneously belong to multiple iv_cas,
>>>>> putting a counter into iv_inv_expr_ent does not work. Instead, I've
>>>>> decided to replace used_inv_expr with a hash_map that contains the used inv_exprs
>>>>> and whose value is the number of uses.
>>>>>
>>>>> Further questions:
>>>>> + iv_inv_expr_ent::id can be now removed as it's used just for purpose of dumps
>>>>> Group 0:
>>>>>   cand  cost    scaled  freq    compl.  depends on
>>>>>   5     2       2.00    1.000
>>>>>   6     4       4.00    1.001    inv_expr:0
>>>>>   7     4       4.00    1.001    inv_expr:1
>>>>>   8     4       4.00    1.001    inv_expr:2
>>>>>
>>>>> That can be replaced with print_generic_expr, but I think using ids makes the dump
>>>>> output more clear.
>>>> I am okay with keeping id.  Could you please dump all inv_exprs in a
>>>> single section like
>>>> <Invariant Exprs>:
>>>> inv_expr 0: print_generic_expr
>>>> inv_expr 1: ...
>>>>
>>>> Then only dump the id afterwards?
>>>>
>>>
>>> Sure, it would be definitely better:
>>>
>>> The new dump format looks:
>>>
>>> <Invariant Expressions>:
>>> inv_expr 0:     sudoku_351(D) + (sizetype) S.833_774 * 4
>>> inv_expr 1:     sudoku_351(D) + ((sizetype) S.833_774 * 4 + 18446744073709551580)
>>> inv_expr 2:     sudoku_351(D) + ((sizetype) S.833_774 + 72) * 4
>>> inv_expr 3:     sudoku_351(D) + ((sizetype) S.833_774 + 81) * 4
>>> inv_expr 4:     &A.832 + (sizetype) _377 * 4
>>> inv_expr 5:     &A.832 + ((sizetype) _377 * 4 + 18446744073709551612)
>>> inv_expr 6:     &A.832 + ((sizetype) _377 + 8) * 4
>>> inv_expr 7:     &A.832 + ((sizetype) _377 + 9) * 4
>>>
>>> <Group-candidate Costs>:
>>> Group 0:
>>>   cand  cost    scaled  freq    compl.  depends on
>>>
>>> ...
>>>
>>> Improved to:
>>>   cost: 27 (complexity 2)
>>>   cand_cost: 11
>>>   cand_group_cost: 10 (complexity 2)
>>>   candidates: 3, 5
>>>    group:0 --> iv_cand:5, cost=(2,0)
>>>    group:1 --> iv_cand:5, cost=(4,1)
>>>    group:2 --> iv_cand:5, cost=(4,1)
>>>    group:3 --> iv_cand:3, cost=(0,0)
>>>    group:4 --> iv_cand:3, cost=(0,0)
>>>   invariants 1, 6
>>>   invariant expressions 6, 3
>>>
>>> The only question here is that, as used_inv_exprs are stored in a hash_map,
>>> the order of dumped invariants would not be stable. Is that a problem?
>> It is okay.
>>
>> Only nitpicking on this version.
>>
>>>
>>>>>
>>>>> + As check_GNU_style.sh reported multiple 8 spaces issues in hunks I've touched, I decided
>>>>> to fix all 8 spaces issues. Hope it's fine.
>>>>>
>>>>> I'm going to test the patch.
>>>>> Thoughts?
>>>>
>>>> Some comments on the patch embedded.
>>>>
>>>>>
>>>>> +/* Forward declaration.  */
>>>> Not necessary.
>>>>> +struct iv_inv_expr_ent;
>>>>> +
>>>
>>> I think it's needed because struct cost_pair uses a pointer to iv_inv_expr_ent.
>> I mean the comment, clearly the declaration is self-documented.
>
> Hi.
>
> Yeah, removed.
>
>>
>>> @@ -6000,11 +6045,12 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca *ivs,
>>>
>>>    iv_ca_set_remove_invariants (ivs, cp->depends_on);
>>>
>>> -  if (cp->inv_expr_id != -1)
>>> +  if (cp->inv_expr != NULL)
>>>      {
>>> -      ivs->used_inv_expr[cp->inv_expr_id]--;
>>> -      if (ivs->used_inv_expr[cp->inv_expr_id] == 0)
>>> -        ivs->num_used_inv_expr--;
>>> +      unsigned *slot = ivs->used_inv_exprs->get (cp->inv_expr);
>>> +      --(*slot);
>>> +      if (*slot == 0)
>>> +    ivs->used_inv_exprs->remove (cp->inv_expr);
>> I suppose insertion/removal of hash_map are not expensive?  Because
>> the algorithm causes a lot of these operations.
>
> I think it should be ~ a constant operation.
>
>>
>>> @@ -6324,12 +6368,26 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
>>>      fprintf (file, "   group:%d --> ??\n", group->id);
>>>      }
>>>
>>> +  bool any_invariant = false;
>>>    for (i = 1; i <= data->max_inv_id; i++)
>>>      if (ivs->n_invariant_uses[i])
>>>        {
>>> +    const char *pref = any_invariant ? ", " : "  invariants ";
>>> +    any_invariant = true;
>>>      fprintf (file, "%s%d", pref, i);
>>> -    pref = ", ";
>>>        }
>>> +
>>> +  if (any_invariant)
>>> +    fprintf (file, "\n");
>>> +
>> To make dump easier to read, we can simply dump invariant
>> variables/expressions unconditionally.  Also keep invariant variables
>> and expressions in the same form.
>
> Sure, that's a good idea!
>
> Sample output:
>
>
> Initial set of candidates:
>   cost: 17 (complexity 0)
>   cand_cost: 11
>   cand_group_cost: 2 (complexity 0)
>   candidates: 1, 5
>    group:0 --> iv_cand:5, cost=(2,0)
>    group:1 --> iv_cand:1, cost=(0,0)
>   invariant variables: 1, 4
>   invariant expressions:
>
> Initial set of candidates:
>   cost: 42 (complexity 2)
>   cand_cost: 15
>   cand_group_cost: 12 (complexity 2)
>   candidates: 4, 15, 16
>    group:0 --> iv_cand:16, cost=(0,0)
>    group:1 --> iv_cand:15, cost=(-1,0)
>    group:2 --> iv_cand:4, cost=(0,0)
>    group:3 --> iv_cand:15, cost=(9,1)
>    group:4 --> iv_cand:15, cost=(4,1)
>   invariant variables:
>   invariant expressions:
>
>>    const char *pref = "";
>>    //...
>>    fprintf (file, "  invariant variables: ");
>>    for (i = 1; i <= data->max_inv_id; i++)
>>      if (ivs->n_invariant_uses[i])
>>        {
>>      fprintf (file, "%s%d", pref, i);
>>     pref = ", ";
>>        }
>>    fprintf (file, "\n");
>>
>>> +  const char *pref = "  invariant expressions ";
>>> +  for (hash_map<iv_inv_expr_ent *, unsigned>::iterator it
>>> +       = ivs->used_inv_exprs->begin (); it != ivs->used_inv_exprs->end (); ++it)
>>> +    {
>>> +    fprintf (file, "%s%d", pref, (*it).first->id);
>>> +    pref = ", ";
>>> +    }
>>> +
>>>    fprintf (file, "\n\n");
>>>  }
>>>
>>
>> Okay with the dump change,  you may need to update Changelog entry too.
>
> There's no fundamental change, thus not changing the ChangeLog entry.
>
> Thanks for the review, installed as r236200.
>

It failed to build on 32-bit hosts:

../../src-trunk/gcc/tree-ssa-loop-ivopts.c: In function ‘void
create_new_ivs(ivopts_data*, iv_ca*)’:
../../src-trunk/gcc/tree-ssa-loop-ivopts.c:7050:44: error: format
‘%lu’ expects argument of type ‘long unsigned
int’, but argument 3 has type ‘long long int’
[-Werror=format=]
         avg_loop_niter (data->current_loop));
                                            ^
../../src-trunk/gcc/tree-ssa-loop-ivopts.c:7052:41: error: format
‘%lu’ expects argument of type ‘long unsigned
int’, but argument 3 has type ‘size_t {aka unsigned
int}’ [-Werror=format=]
         set->used_inv_exprs->elements ());
                                         ^



-- 
H.J.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/3] Enhance dumps of IVOPTS
  2016-05-13 12:12                   ` H.J. Lu
@ 2016-05-13 12:39                     ` Martin Liška
  2016-05-13 12:44                       ` Kyrill Tkachov
  0 siblings, 1 reply; 34+ messages in thread
From: Martin Liška @ 2016-05-13 12:39 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Bin.Cheng, Richard Biener, GCC Patches, Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 8006 bytes --]

On 05/13/2016 02:11 PM, H.J. Lu wrote:
> On Fri, May 13, 2016 at 3:44 AM, Martin Liška <mliska@suse.cz> wrote:
>> On 05/13/2016 11:43 AM, Bin.Cheng wrote:
>>> On Thu, May 12, 2016 at 5:41 PM, Martin Liška <mliska@suse.cz> wrote:
>>>> On 05/12/2016 03:51 PM, Bin.Cheng wrote:
>>>>> On Thu, May 12, 2016 at 1:13 PM, Martin Liška <mliska@suse.cz> wrote:
>>>>>> On 05/10/2016 03:16 PM, Bin.Cheng wrote:
>>>>>>> Another way is to remove the use of id for struct iv_inv_expr_ent once
>>>>>>> for all.  We can change iv_ca.used_inv_expr and cost_pair.inv_expr_id
>>>>>>> to pointers, and rename iv_inv_expr_ent.id to count and use this to
>>>>>>> record reference number in iv_ca.  This if-statement on dump_file can
>>>>>>> be saved.  Also I think it simplifies current code a bit.  For now,
>>>>>>> there are id <-> struct maps for different structures in IVOPT which
>>>>>>> make it not straightforward.
>>>>>>
>>>>>> Hi.
>>>>>>
>>>>>> I'm sending the second version of the patch. I tried to follow your advice, but
>>>>>> because an iv_inv_expr_ent can simultaneously belong to multiple iv_cas,
>>>>>> putting a counter into iv_inv_expr_ent does not work. Instead, I've
>>>>>> decided to replace used_inv_expr with a hash_map that contains the used inv_exprs
>>>>>> and whose value is the number of uses.
>>>>>>
>>>>>> Further questions:
>>>>>> + iv_inv_expr_ent::id can be now removed as it's used just for purpose of dumps
>>>>>> Group 0:
>>>>>>   cand  cost    scaled  freq    compl.  depends on
>>>>>>   5     2       2.00    1.000
>>>>>>   6     4       4.00    1.001    inv_expr:0
>>>>>>   7     4       4.00    1.001    inv_expr:1
>>>>>>   8     4       4.00    1.001    inv_expr:2
>>>>>>
>>>>>> That can be replaced with print_generic_expr, but I think using ids makes the dump
>>>>>> output more clear.
>>>>> I am okay with keeping id.  Could you please dump all inv_exprs in a
>>>>> single section like
>>>>> <Invariant Exprs>:
>>>>> inv_expr 0: print_generic_expr
>>>>> inv_expr 1: ...
>>>>>
>>>>> Then only dump the id afterwards?
>>>>>
>>>>
>>>> Sure, it would be definitely better:
>>>>
>>>> The new dump format looks:
>>>>
>>>> <Invariant Expressions>:
>>>> inv_expr 0:     sudoku_351(D) + (sizetype) S.833_774 * 4
>>>> inv_expr 1:     sudoku_351(D) + ((sizetype) S.833_774 * 4 + 18446744073709551580)
>>>> inv_expr 2:     sudoku_351(D) + ((sizetype) S.833_774 + 72) * 4
>>>> inv_expr 3:     sudoku_351(D) + ((sizetype) S.833_774 + 81) * 4
>>>> inv_expr 4:     &A.832 + (sizetype) _377 * 4
>>>> inv_expr 5:     &A.832 + ((sizetype) _377 * 4 + 18446744073709551612)
>>>> inv_expr 6:     &A.832 + ((sizetype) _377 + 8) * 4
>>>> inv_expr 7:     &A.832 + ((sizetype) _377 + 9) * 4
>>>>
>>>> <Group-candidate Costs>:
>>>> Group 0:
>>>>   cand  cost    scaled  freq    compl.  depends on
>>>>
>>>> ...
>>>>
>>>> Improved to:
>>>>   cost: 27 (complexity 2)
>>>>   cand_cost: 11
>>>>   cand_group_cost: 10 (complexity 2)
>>>>   candidates: 3, 5
>>>>    group:0 --> iv_cand:5, cost=(2,0)
>>>>    group:1 --> iv_cand:5, cost=(4,1)
>>>>    group:2 --> iv_cand:5, cost=(4,1)
>>>>    group:3 --> iv_cand:3, cost=(0,0)
>>>>    group:4 --> iv_cand:3, cost=(0,0)
>>>>   invariants 1, 6
>>>>   invariant expressions 6, 3
>>>>
>>>> The only question here is that, as used_inv_exprs are stored in a hash_map,
>>>> the order of dumped invariants would not be stable. Is that a problem?
>>> It is okay.
>>>
>>> Only nitpicking on this version.
>>>
>>>>
>>>>>>
>>>>>> + As check_GNU_style.sh reported multiple 8 spaces issues in hunks I've touched, I decided
>>>>>> to fix all 8 spaces issues. Hope it's fine.
>>>>>>
>>>>>> I'm going to test the patch.
>>>>>> Thoughts?
>>>>>
>>>>> Some comments on the patch embedded.
>>>>>
>>>>>>
>>>>>> +/* Forward declaration.  */
>>>>> Not necessary.
>>>>>> +struct iv_inv_expr_ent;
>>>>>> +
>>>>
>>>> I think it's needed because struct cost_pair uses a pointer to iv_inv_expr_ent.
>>> I mean the comment, clearly the declaration is self-documented.
>>
>> Hi.
>>
>> Yeah, removed.
>>
>>>
>>>> @@ -6000,11 +6045,12 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca *ivs,
>>>>
>>>>    iv_ca_set_remove_invariants (ivs, cp->depends_on);
>>>>
>>>> -  if (cp->inv_expr_id != -1)
>>>> +  if (cp->inv_expr != NULL)
>>>>      {
>>>> -      ivs->used_inv_expr[cp->inv_expr_id]--;
>>>> -      if (ivs->used_inv_expr[cp->inv_expr_id] == 0)
>>>> -        ivs->num_used_inv_expr--;
>>>> +      unsigned *slot = ivs->used_inv_exprs->get (cp->inv_expr);
>>>> +      --(*slot);
>>>> +      if (*slot == 0)
>>>> +    ivs->used_inv_exprs->remove (cp->inv_expr);
>>> I suppose insertion/removal of hash_map are not expensive?  Because
>>> the algorithm causes a lot of these operations.
>>
>> I think it should be ~ a constant operation.
>>
>>>
>>>> @@ -6324,12 +6368,26 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
>>>>      fprintf (file, "   group:%d --> ??\n", group->id);
>>>>      }
>>>>
>>>> +  bool any_invariant = false;
>>>>    for (i = 1; i <= data->max_inv_id; i++)
>>>>      if (ivs->n_invariant_uses[i])
>>>>        {
>>>> +    const char *pref = any_invariant ? ", " : "  invariants ";
>>>> +    any_invariant = true;
>>>>      fprintf (file, "%s%d", pref, i);
>>>> -    pref = ", ";
>>>>        }
>>>> +
>>>> +  if (any_invariant)
>>>> +    fprintf (file, "\n");
>>>> +
>>> To make dump easier to read, we can simply dump invariant
>>> variables/expressions unconditionally.  Also keep invariant variables
>>> and expressions in the same form.
>>
>> Sure, that's a good idea!
>>
>> Sample output:
>>
>>
>> Initial set of candidates:
>>   cost: 17 (complexity 0)
>>   cand_cost: 11
>>   cand_group_cost: 2 (complexity 0)
>>   candidates: 1, 5
>>    group:0 --> iv_cand:5, cost=(2,0)
>>    group:1 --> iv_cand:1, cost=(0,0)
>>   invariant variables: 1, 4
>>   invariant expressions:
>>
>> Initial set of candidates:
>>   cost: 42 (complexity 2)
>>   cand_cost: 15
>>   cand_group_cost: 12 (complexity 2)
>>   candidates: 4, 15, 16
>>    group:0 --> iv_cand:16, cost=(0,0)
>>    group:1 --> iv_cand:15, cost=(-1,0)
>>    group:2 --> iv_cand:4, cost=(0,0)
>>    group:3 --> iv_cand:15, cost=(9,1)
>>    group:4 --> iv_cand:15, cost=(4,1)
>>   invariant variables:
>>   invariant expressions:
>>
>>>    const char *pref = "";
>>>    //...
>>>    fprintf (file, "  invariant variables: ");
>>>    for (i = 1; i <= data->max_inv_id; i++)
>>>      if (ivs->n_invariant_uses[i])
>>>        {
>>>      fprintf (file, "%s%d", pref, i);
>>>     pref = ", ";
>>>        }
>>>    fprintf (file, "\n");
>>>
>>>> +  const char *pref = "  invariant expressions ";
>>>> +  for (hash_map<iv_inv_expr_ent *, unsigned>::iterator it
>>>> +       = ivs->used_inv_exprs->begin (); it != ivs->used_inv_exprs->end (); ++it)
>>>> +    {
>>>> +    fprintf (file, "%s%d", pref, (*it).first->id);
>>>> +    pref = ", ";
>>>> +    }
>>>> +
>>>>    fprintf (file, "\n\n");
>>>>  }
>>>>
>>>
>>> Okay with the dump change,  you may need to update Changelog entry too.
>>
>> There's no fundamental change, thus not changing the ChangeLog entry.
>>
>> Thanks for the review, installed as r236200.
>>
> 
> It failed to build on 32-bit hosts:
> 
> ../../src-trunk/gcc/tree-ssa-loop-ivopts.c: In function ‘void
> create_new_ivs(ivopts_data*, iv_ca*)’:
> ../../src-trunk/gcc/tree-ssa-loop-ivopts.c:7050:44: error: format
> ‘%lu’ expects argument of type ‘long unsigned
> int’, but argument 3 has type ‘long long int’
> [-Werror=format=]
>          avg_loop_niter (data->current_loop));
>                                             ^
> ../../src-trunk/gcc/tree-ssa-loop-ivopts.c:7052:41: error: format
> ‘%lu’ expects argument of type ‘long unsigned
> int’, but argument 3 has type ‘size_t {aka unsigned
> int}’ [-Werror=format=]
>          set->used_inv_exprs->elements ());
>                                          ^
> 
> 
> 

Hi.

Thanks for the heads-up. Can you please test the following patch?

Thanks,
Martin

[-- Attachment #2: 0001-IVOPTS-dump-fall-out.patch --]
[-- Type: text/x-patch, Size: 1252 bytes --]

From 32a2da9a7f327f60401cec5c7ab2fa5ba633561f Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Fri, 13 May 2016 14:37:45 +0200
Subject: [PATCH] IVOPTS dump fall-out

gcc/ChangeLog:

2016-05-13  Martin Liska  <mliska@suse.cz>

	* tree-ssa-loop-ivopts.c (create_new_ivs): Use PRIu64 and PRId64
	in printf format.
---
 gcc/tree-ssa-loop-ivopts.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 62b8835..abfe73d 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -7046,9 +7046,9 @@ create_new_ivs (struct ivopts_data *data, struct iv_ca *set)
       if (data->loop_loc != UNKNOWN_LOCATION)
 	fprintf (dump_file, " at %s:%d", LOCATION_FILE (data->loop_loc),
 		 LOCATION_LINE (data->loop_loc));
-      fprintf (dump_file, ", %lu avg niters",
+      fprintf (dump_file, ", %" PRId64 " avg niters",
 	       avg_loop_niter (data->current_loop));
-      fprintf (dump_file, ", %lu expressions",
+      fprintf (dump_file, ", %" PRIu64 " expressions",
 	       set->used_inv_exprs->elements ());
       fprintf (dump_file, ", %lu IVs:\n", bitmap_count_bits (set->cands));
       EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
-- 
2.8.2


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/3] Enhance dumps of IVOPTS
  2016-05-13 12:39                     ` Martin Liška
@ 2016-05-13 12:44                       ` Kyrill Tkachov
  2016-05-13 12:47                         ` Richard Biener
  0 siblings, 1 reply; 34+ messages in thread
From: Kyrill Tkachov @ 2016-05-13 12:44 UTC (permalink / raw)
  To: Martin Liška, H.J. Lu
  Cc: Bin.Cheng, Richard Biener, GCC Patches, Jan Hubicka

Hi Martin,

On 13/05/16 13:39, Martin Liška wrote:
> On 05/13/2016 02:11 PM, H.J. Lu wrote:
>> On Fri, May 13, 2016 at 3:44 AM, Martin Liška <mliska@suse.cz> wrote:
>>> On 05/13/2016 11:43 AM, Bin.Cheng wrote:
>>>> On Thu, May 12, 2016 at 5:41 PM, Martin Liška <mliska@suse.cz> wrote:
>>>>> On 05/12/2016 03:51 PM, Bin.Cheng wrote:
>>>>>> On Thu, May 12, 2016 at 1:13 PM, Martin Liška <mliska@suse.cz> wrote:
>>>>>>> On 05/10/2016 03:16 PM, Bin.Cheng wrote:
>>>>>>>> Another way is to remove the use of id for struct iv_inv_expr_ent once
>>>>>>>> for all.  We can change iv_ca.used_inv_expr and cost_pair.inv_expr_id
>>>>>>>> to pointers, and rename iv_inv_expr_ent.id to count and use this to
>>>>>>>> record reference number in iv_ca.  This if-statement on dump_file can
>>>>>>>> be saved.  Also I think it simplifies current code a bit.  For now,
>>>>>>>> there are id <-> struct maps for different structures in IVOPT which
>>>>>>>> make it not straightforward.
>>>>>>> Hi.
>>>>>>>
>>>>>>> I'm sending the second version of the patch. I tried to follow your advice, but
>>>>>>> because an iv_inv_expr_ent can simultaneously belong to multiple iv_cas,
>>>>>>> putting a counter into iv_inv_expr_ent does not work. Instead, I've
>>>>>>> decided to replace used_inv_expr with a hash_map that contains the used inv_exprs
>>>>>>> and whose value is the number of uses.
>>>>>>>
>>>>>>> Further questions:
>>>>>>> + iv_inv_expr_ent::id can be now removed as it's used just for purpose of dumps
>>>>>>> Group 0:
>>>>>>>    cand  cost    scaled  freq    compl.  depends on
>>>>>>>    5     2       2.00    1.000
>>>>>>>    6     4       4.00    1.001    inv_expr:0
>>>>>>>    7     4       4.00    1.001    inv_expr:1
>>>>>>>    8     4       4.00    1.001    inv_expr:2
>>>>>>>
>>>>>>> That can be replaced with print_generic_expr, but I think using ids makes the dump
>>>>>>> output more clear.
>>>>>> I am okay with keeping id.  Could you please dump all inv_exprs in a
>>>>>> single section like
>>>>>> <Invariant Exprs>:
>>>>>> inv_expr 0: print_generic_expr
>>>>>> inv_expr 1: ...
>>>>>>
>>>>>> Then only dump the id afterwards?
>>>>>>
>>>>> Sure, it would be definitely better:
>>>>>
>>>>> The new dump format looks:
>>>>>
>>>>> <Invariant Expressions>:
>>>>> inv_expr 0:     sudoku_351(D) + (sizetype) S.833_774 * 4
>>>>> inv_expr 1:     sudoku_351(D) + ((sizetype) S.833_774 * 4 + 18446744073709551580)
>>>>> inv_expr 2:     sudoku_351(D) + ((sizetype) S.833_774 + 72) * 4
>>>>> inv_expr 3:     sudoku_351(D) + ((sizetype) S.833_774 + 81) * 4
>>>>> inv_expr 4:     &A.832 + (sizetype) _377 * 4
>>>>> inv_expr 5:     &A.832 + ((sizetype) _377 * 4 + 18446744073709551612)
>>>>> inv_expr 6:     &A.832 + ((sizetype) _377 + 8) * 4
>>>>> inv_expr 7:     &A.832 + ((sizetype) _377 + 9) * 4
>>>>>
>>>>> <Group-candidate Costs>:
>>>>> Group 0:
>>>>>    cand  cost    scaled  freq    compl.  depends on
>>>>>
>>>>> ...
>>>>>
>>>>> Improved to:
>>>>>    cost: 27 (complexity 2)
>>>>>    cand_cost: 11
>>>>>    cand_group_cost: 10 (complexity 2)
>>>>>    candidates: 3, 5
>>>>>     group:0 --> iv_cand:5, cost=(2,0)
>>>>>     group:1 --> iv_cand:5, cost=(4,1)
>>>>>     group:2 --> iv_cand:5, cost=(4,1)
>>>>>     group:3 --> iv_cand:3, cost=(0,0)
>>>>>     group:4 --> iv_cand:3, cost=(0,0)
>>>>>    invariants 1, 6
>>>>>    invariant expressions 6, 3
>>>>>
>>>>> The only question here is that, as used_inv_exprs are stored in a hash_map,
>>>>> the order of dumped invariants would not be stable. Is that a problem?
>>>> It is okay.
>>>>
>>>> Only nitpicking on this version.
>>>>
>>>>>>> + As check_GNU_style.sh reported multiple 8 spaces issues in hunks I've touched, I decided
>>>>>>> to fix all 8 spaces issues. Hope it's fine.
>>>>>>>
>>>>>>> I'm going to test the patch.
>>>>>>> Thoughts?
>>>>>> Some comments on the patch embedded.
>>>>>>
>>>>>>> +/* Forward declaration.  */
>>>>>> Not necessary.
>>>>>>> +struct iv_inv_expr_ent;
>>>>>>> +
>>>>> I think it's needed because struct cost_pair uses a pointer to iv_inv_expr_ent.
>>>> I mean the comment, clearly the declaration is self-documented.
>>> Hi.
>>>
>>> Yeah, removed.
>>>
>>>>> @@ -6000,11 +6045,12 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca *ivs,
>>>>>
>>>>>     iv_ca_set_remove_invariants (ivs, cp->depends_on);
>>>>>
>>>>> -  if (cp->inv_expr_id != -1)
>>>>> +  if (cp->inv_expr != NULL)
>>>>>       {
>>>>> -      ivs->used_inv_expr[cp->inv_expr_id]--;
>>>>> -      if (ivs->used_inv_expr[cp->inv_expr_id] == 0)
>>>>> -        ivs->num_used_inv_expr--;
>>>>> +      unsigned *slot = ivs->used_inv_exprs->get (cp->inv_expr);
>>>>> +      --(*slot);
>>>>> +      if (*slot == 0)
>>>>> +    ivs->used_inv_exprs->remove (cp->inv_expr);
>>>> I suppose insertion/removal of hash_map are not expensive?  Because
>>>> the algorithm causes a lot of these operations.
>>> I think it should be ~ a constant operation.
>>>
>>>>> @@ -6324,12 +6368,26 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
>>>>>       fprintf (file, "   group:%d --> ??\n", group->id);
>>>>>       }
>>>>>
>>>>> +  bool any_invariant = false;
>>>>>     for (i = 1; i <= data->max_inv_id; i++)
>>>>>       if (ivs->n_invariant_uses[i])
>>>>>         {
>>>>> +    const char *pref = any_invariant ? ", " : "  invariants ";
>>>>> +    any_invariant = true;
>>>>>       fprintf (file, "%s%d", pref, i);
>>>>> -    pref = ", ";
>>>>>         }
>>>>> +
>>>>> +  if (any_invariant)
>>>>> +    fprintf (file, "\n");
>>>>> +
>>>> To make dump easier to read, we can simply dump invariant
>>>> variables/expressions unconditionally.  Also keep invariant variables
>>>> and expressions in the same form.
>>> Sure, that's a good idea!
>>>
>>> Sample output:
>>>
>>>
>>> Initial set of candidates:
>>>    cost: 17 (complexity 0)
>>>    cand_cost: 11
>>>    cand_group_cost: 2 (complexity 0)
>>>    candidates: 1, 5
>>>     group:0 --> iv_cand:5, cost=(2,0)
>>>     group:1 --> iv_cand:1, cost=(0,0)
>>>    invariant variables: 1, 4
>>>    invariant expressions:
>>>
>>> Initial set of candidates:
>>>    cost: 42 (complexity 2)
>>>    cand_cost: 15
>>>    cand_group_cost: 12 (complexity 2)
>>>    candidates: 4, 15, 16
>>>     group:0 --> iv_cand:16, cost=(0,0)
>>>     group:1 --> iv_cand:15, cost=(-1,0)
>>>     group:2 --> iv_cand:4, cost=(0,0)
>>>     group:3 --> iv_cand:15, cost=(9,1)
>>>     group:4 --> iv_cand:15, cost=(4,1)
>>>    invariant variables:
>>>    invariant expressions:
>>>
>>>>     const char *pref = "";
>>>>     //...
>>>>     fprintf (file, "  invariant variables: ");
>>>>     for (i = 1; i <= data->max_inv_id; i++)
>>>>       if (ivs->n_invariant_uses[i])
>>>>         {
>>>>       fprintf (file, "%s%d", pref, i);
>>>>      pref = ", ";
>>>>         }
>>>>     fprintf (file, "\n");
>>>>
>>>>> +  const char *pref = "  invariant expressions ";
>>>>> +  for (hash_map<iv_inv_expr_ent *, unsigned>::iterator it
>>>>> +       = ivs->used_inv_exprs->begin (); it != ivs->used_inv_exprs->end (); ++it)
>>>>> +    {
>>>>> +    fprintf (file, "%s%d", pref, (*it).first->id);
>>>>> +    pref = ", ";
>>>>> +    }
>>>>> +
>>>>>     fprintf (file, "\n\n");
>>>>>   }
>>>>>
>>>> Okay with the dump change,  you may need to update Changelog entry too.
>>> There's no fundamental change, thus not changing the ChangeLog entry.
>>>
>>> Thanks for the review, installed as r236200.
>>>
>> It failed to build on 32-bit hosts:
>>
>> ../../src-trunk/gcc/tree-ssa-loop-ivopts.c: In function ‘void
>> create_new_ivs(ivopts_data*, iv_ca*)’:
>> ../../src-trunk/gcc/tree-ssa-loop-ivopts.c:7050:44: error: format
>> ‘%lu’ expects argument of type ‘long unsigned
>> int’, but argument 3 has type ‘long long int’
>> [-Werror=format=]
>>           avg_loop_niter (data->current_loop));
>>                                              ^
>> ../../src-trunk/gcc/tree-ssa-loop-ivopts.c:7052:41: error: format
>> ‘%lu’ expects argument of type ‘long unsigned
>> int’, but argument 3 has type ‘size_t {aka unsigned
>> int}’ [-Werror=format=]
>>           set->used_inv_exprs->elements ());
>>                                           ^
>>
>>
>>
> Hi.
> Thanks for heads up, can you please test the following patch?
>
> Thanks,
> Martin

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 62b8835..abfe73d 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -7046,9 +7046,9 @@ create_new_ivs (struct ivopts_data *data, struct iv_ca *set)
        if (data->loop_loc != UNKNOWN_LOCATION)
  	fprintf (dump_file, " at %s:%d", LOCATION_FILE (data->loop_loc),
  		 LOCATION_LINE (data->loop_loc));
-      fprintf (dump_file, ", %lu avg niters",
+      fprintf (dump_file, ", %" PRId64 " avg niters",
  	       avg_loop_niter (data->current_loop));
-      fprintf (dump_file, ", %lu expressions",
+      fprintf (dump_file, ", %" PRIu64 " expressions",


I believe hwint.h defines HOST_WIDE_INT_PRINT_DEC and HOST_WIDE_INT_PRINT_UNSIGNED
for the HOST_WIDE_INT print formats, though I don't know how strictly their use
is enforced in the codebase.

Kyrill

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/3] Enhance dumps of IVOPTS
  2016-05-13 12:44                       ` Kyrill Tkachov
@ 2016-05-13 12:47                         ` Richard Biener
  2016-05-13 12:51                           ` Martin Liška
  0 siblings, 1 reply; 34+ messages in thread
From: Richard Biener @ 2016-05-13 12:47 UTC (permalink / raw)
  To: Kyrill Tkachov
  Cc: Martin Liška, H.J. Lu, Bin.Cheng, GCC Patches, Jan Hubicka

On Fri, May 13, 2016 at 2:43 PM, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
> Hi Martin,
>
>
> On 13/05/16 13:39, Martin Liška wrote:
>>
>> On 05/13/2016 02:11 PM, H.J. Lu wrote:
>>>
>>> On Fri, May 13, 2016 at 3:44 AM, Martin Liška <mliska@suse.cz> wrote:
>>>>
>>>> On 05/13/2016 11:43 AM, Bin.Cheng wrote:
>>>>>
>>>>> On Thu, May 12, 2016 at 5:41 PM, Martin Liška <mliska@suse.cz> wrote:
>>>>>>
>>>>>> On 05/12/2016 03:51 PM, Bin.Cheng wrote:
>>>>>>>
>>>>>>> On Thu, May 12, 2016 at 1:13 PM, Martin Liška <mliska@suse.cz> wrote:
>>>>>>>>
>>>>>>>> On 05/10/2016 03:16 PM, Bin.Cheng wrote:
>>>>>>>>>
>>>>>>>>> Another way is to remove the use of id for struct iv_inv_expr_ent
>>>>>>>>> once
>>>>>>>>> for all.  We can change iv_ca.used_inv_expr and
>>>>>>>>> cost_pair.inv_expr_id
>>>>>>>>> to pointers, and rename iv_inv_expr_ent.id to count and use this to
>>>>>>>>> record reference number in iv_ca.  This if-statement on dump_file
>>>>>>>>> can
>>>>>>>>> be saved.  Also I think it simplifies current code a bit.  For now,
>>>>>>>>> there are id <-> struct maps for different structures in IVOPT
>>>>>>>>> which
>>>>>>>>> make it not straightforward.
>>>>>>>>
>>>>>>>> Hi.
>>>>>>>>
>>>>>>>> I'm sending the second version of the patch. I tried to follow your
>>>>>>>> advice, but because an iv_inv_expr_ent can simultaneously belong to
>>>>>>>> multiple iv_cas, putting a counter in iv_inv_expr_ent does not work.
>>>>>>>> Instead, I've decided to replace used_inv_expr with a hash_map that
>>>>>>>> contains the used inv_exprs and where the value of the map is the
>>>>>>>> number of usages.
>>>>>>>>
>>>>>>>> Further questions:
>>>>>>>> + iv_inv_expr_ent::id can be now removed as it's used just for
>>>>>>>> purpose of dumps
>>>>>>>> Group 0:
>>>>>>>>    cand  cost    scaled  freq    compl.  depends on
>>>>>>>>    5     2       2.00    1.000
>>>>>>>>    6     4       4.00    1.001    inv_expr:0
>>>>>>>>    7     4       4.00    1.001    inv_expr:1
>>>>>>>>    8     4       4.00    1.001    inv_expr:2
>>>>>>>>
>>>>>>>> That can be replaced with print_generic_expr, but I think using ids
>>>>>>>> makes the dump
>>>>>>>> output more clear.
>>>>>>>
>>>>>>> I am okay with keeping id.  Could you please dump all inv_exprs in a
>>>>>>> single section like
>>>>>>> <Invariant Exprs>:
>>>>>>> inv_expr 0: print_generic_expr
>>>>>>> inv_expr 1: ...
>>>>>>>
>>>>>>> Then only dump the id afterwards?
>>>>>>>
>>>>>> Sure, it would be definitely better:
>>>>>>
>>>>>> The new dump format looks:
>>>>>>
>>>>>> <Invariant Expressions>:
>>>>>> inv_expr 0:     sudoku_351(D) + (sizetype) S.833_774 * 4
>>>>>> inv_expr 1:     sudoku_351(D) + ((sizetype) S.833_774 * 4 +
>>>>>> 18446744073709551580)
>>>>>> inv_expr 2:     sudoku_351(D) + ((sizetype) S.833_774 + 72) * 4
>>>>>> inv_expr 3:     sudoku_351(D) + ((sizetype) S.833_774 + 81) * 4
>>>>>> inv_expr 4:     &A.832 + (sizetype) _377 * 4
>>>>>> inv_expr 5:     &A.832 + ((sizetype) _377 * 4 + 18446744073709551612)
>>>>>> inv_expr 6:     &A.832 + ((sizetype) _377 + 8) * 4
>>>>>> inv_expr 7:     &A.832 + ((sizetype) _377 + 9) * 4
>>>>>>
>>>>>> <Group-candidate Costs>:
>>>>>> Group 0:
>>>>>>    cand  cost    scaled  freq    compl.  depends on
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> Improved to:
>>>>>>    cost: 27 (complexity 2)
>>>>>>    cand_cost: 11
>>>>>>    cand_group_cost: 10 (complexity 2)
>>>>>>    candidates: 3, 5
>>>>>>     group:0 --> iv_cand:5, cost=(2,0)
>>>>>>     group:1 --> iv_cand:5, cost=(4,1)
>>>>>>     group:2 --> iv_cand:5, cost=(4,1)
>>>>>>     group:3 --> iv_cand:3, cost=(0,0)
>>>>>>     group:4 --> iv_cand:3, cost=(0,0)
>>>>>>    invariants 1, 6
>>>>>>    invariant expressions 6, 3
>>>>>>
>>>>>> The only question here is that as used_inv_exprs are stored in a
>>>>>> hash_map,
>>>>>> order of dumped invariants would not be stable. Is it problem?
>>>>>
>>>>> It is okay.
>>>>>
>>>>> Only nitpicking on this version.
>>>>>
>>>>>>>> + As check_GNU_style.sh reported multiple 8 spaces issues in hunks
>>>>>>>> I've touched, I decided
>>>>>>>> to fix all 8 spaces issues. Hope it's fine.
>>>>>>>>
>>>>>>>> I'm going to test the patch.
>>>>>>>> Thoughts?
>>>>>>>
>>>>>>> Some comments on the patch embedded.
>>>>>>>
>>>>>>>> +/* Forward declaration.  */
>>>>>>>
>>>>>>> Not necessary.
>>>>>>>>
>>>>>>>> +struct iv_inv_expr_ent;
>>>>>>>> +
>>>>>>
>>>>>> I think it's needed because struct cost_pair uses a pointer to
>>>>>> iv_inv_expr_ent.
>>>>>
>>>>> I mean the comment, clearly the declaration is self-documented.
>>>>
>>>> Hi.
>>>>
>>>> Yeah, removed.
>>>>
>>>>>> @@ -6000,11 +6045,12 @@ iv_ca_set_no_cp (struct ivopts_data *data,
>>>>>> struct iv_ca *ivs,
>>>>>>
>>>>>>     iv_ca_set_remove_invariants (ivs, cp->depends_on);
>>>>>>
>>>>>> -  if (cp->inv_expr_id != -1)
>>>>>> +  if (cp->inv_expr != NULL)
>>>>>>       {
>>>>>> -      ivs->used_inv_expr[cp->inv_expr_id]--;
>>>>>> -      if (ivs->used_inv_expr[cp->inv_expr_id] == 0)
>>>>>> -        ivs->num_used_inv_expr--;
>>>>>> +      unsigned *slot = ivs->used_inv_exprs->get (cp->inv_expr);
>>>>>> +      --(*slot);
>>>>>> +      if (*slot == 0)
>>>>>> +    ivs->used_inv_exprs->remove (cp->inv_expr);
>>>>>
>>>>> I suppose insertion/removal of hash_map are not expensive?  Because
>>>>> the algorithm causes a lot of these operations.
>>>>
>>>> I think it should be ~ a constant operation.
>>>>
>>>>>> @@ -6324,12 +6368,26 @@ iv_ca_dump (struct ivopts_data *data, FILE
>>>>>> *file, struct iv_ca *ivs)
>>>>>>       fprintf (file, "   group:%d --> ??\n", group->id);
>>>>>>       }
>>>>>>
>>>>>> +  bool any_invariant = false;
>>>>>>     for (i = 1; i <= data->max_inv_id; i++)
>>>>>>       if (ivs->n_invariant_uses[i])
>>>>>>         {
>>>>>> +    const char *pref = any_invariant ? ", " : "  invariants ";
>>>>>> +    any_invariant = true;
>>>>>>       fprintf (file, "%s%d", pref, i);
>>>>>> -    pref = ", ";
>>>>>>         }
>>>>>> +
>>>>>> +  if (any_invariant)
>>>>>> +    fprintf (file, "\n");
>>>>>> +
>>>>>
>>>>> To make dump easier to read, we can simply dump invariant
>>>>> variables/expressions unconditionally.  Also keep invariant variables
>>>>> and expressions in the same form.
>>>>
>>>> Sure, that's a good idea!
>>>>
>>>> Sample output:
>>>>
>>>>
>>>> Initial set of candidates:
>>>>    cost: 17 (complexity 0)
>>>>    cand_cost: 11
>>>>    cand_group_cost: 2 (complexity 0)
>>>>    candidates: 1, 5
>>>>     group:0 --> iv_cand:5, cost=(2,0)
>>>>     group:1 --> iv_cand:1, cost=(0,0)
>>>>    invariant variables: 1, 4
>>>>    invariant expressions:
>>>>
>>>> Initial set of candidates:
>>>>    cost: 42 (complexity 2)
>>>>    cand_cost: 15
>>>>    cand_group_cost: 12 (complexity 2)
>>>>    candidates: 4, 15, 16
>>>>     group:0 --> iv_cand:16, cost=(0,0)
>>>>     group:1 --> iv_cand:15, cost=(-1,0)
>>>>     group:2 --> iv_cand:4, cost=(0,0)
>>>>     group:3 --> iv_cand:15, cost=(9,1)
>>>>     group:4 --> iv_cand:15, cost=(4,1)
>>>>    invariant variables:
>>>>    invariant expressions:
>>>>
>>>>>     const char *pref = "";
>>>>>     //...
>>>>>     fprintf (file, "  invariant variables: "
>>>>>     for (i = 1; i <= data->max_inv_id; i++)
>>>>>       if (ivs->n_invariant_uses[i])
>>>>>         {
>>>>>       fprintf (file, "%s%d", pref, i);
>>>>>      pref = ", ";
>>>>>         }
>>>>>     fprintf (file, "\n");
>>>>>
>>>>>> +  const char *pref = "  invariant expressions ";
>>>>>> +  for (hash_map<iv_inv_expr_ent *, unsigned>::iterator it
>>>>>> +       = ivs->used_inv_exprs->begin (); it !=
>>>>>> ivs->used_inv_exprs->end (); ++it)
>>>>>> +    {
>>>>>> +    fprintf (file, "%s%d", pref, (*it).first->id);
>>>>>> +    pref = ", ";
>>>>>> +    }
>>>>>> +
>>>>>>     fprintf (file, "\n\n");
>>>>>>   }
>>>>>>
>>>>> Okay with the dump change,  you may need to update Changelog entry too.
>>>>
>>>> There's no fundamental change, thus not changing the ChangeLog entry.
>>>>
>>>> Thanks for the review, installed as r236200.
>>>>
>>> It failed to build on 32-bit hosts:
>>>
>>> ../../src-trunk/gcc/tree-ssa-loop-ivopts.c: In function ‘void
>>> create_new_ivs(ivopts_data*, iv_ca*)’:
>>> ../../src-trunk/gcc/tree-ssa-loop-ivopts.c:7050:44: error: format
>>> ‘%lu’ expects argument of type ‘long unsigned
>>> int’, but argument 3 has type ‘long long int’
>>> [-Werror=format=]
>>>           avg_loop_niter (data->current_loop));
>>>                                              ^
>>> ../../src-trunk/gcc/tree-ssa-loop-ivopts.c:7052:41: error: format
>>> ‘%lu’ expects argument of type ‘long unsigned
>>> int’, but argument 3 has type ‘size_t {aka unsigned
>>> int}’ [-Werror=format=]
>>>           set->used_inv_exprs->elements ());
>>>                                           ^
>>>
>>>
>>>
>> Hi.
>> Thanks for heads up, can you please test the following patch?
>>
>> Thanks,
>> Martin
>
>
> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
> index 62b8835..abfe73d 100644
> --- a/gcc/tree-ssa-loop-ivopts.c
> +++ b/gcc/tree-ssa-loop-ivopts.c
> @@ -7046,9 +7046,9 @@ create_new_ivs (struct ivopts_data *data, struct iv_ca
> *set)
>        if (data->loop_loc != UNKNOWN_LOCATION)
>         fprintf (dump_file, " at %s:%d", LOCATION_FILE (data->loop_loc),
>                  LOCATION_LINE (data->loop_loc));
> -      fprintf (dump_file, ", %lu avg niters",
> +      fprintf (dump_file, ", %" PRId64 " avg niters",
>                avg_loop_niter (data->current_loop));
> -      fprintf (dump_file, ", %lu expressions",
> +      fprintf (dump_file, ", %" PRIu64 " expressions",
>
>
> I believe hwint.h defines HOST_WIDE_INT_PRINT_DEC and
> HOST_WIDE_INT_PRINT_UNSIGNED
> for the HOST_WIDE_INT print formats, though I don't know how strictly their
> use
> is enforced in the codebase.

Use them for HOST_WIDE_INT printing, for [u]int64_t use the PRI stuff.

Richard.

> Kyrill
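
The rule of thumb above (PRI macros for [u]int64_t values) can be
illustrated outside of GCC with a minimal sketch. It uses only the
standard <cinttypes> macros; the function name format_dump_summary is
hypothetical, and the explicit widening of size_t to uint64_t is an
assumption made here because size_t has no PRI macro of its own.
Inside GCC, a HOST_WIDE_INT value would instead be printed with the
HOST_WIDE_INT_PRINT_* macros mentioned in the thread.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: format a signed 64-bit iteration count and a size_t
// element count without depending on the host's width of 'long'.
std::string
format_dump_summary (std::int64_t avg_niters, std::size_t n_exprs)
{
  char buf[128];
  // PRId64/PRIu64 expand to the correct length modifier for the 64-bit
  // types on this host.  size_t is widened explicitly so that the argument
  // type always matches the PRIu64 conversion, also on 32-bit hosts.
  int n = std::snprintf (buf, sizeof buf,
                         "%" PRId64 " avg niters, %" PRIu64 " expressions",
                         avg_niters, static_cast<std::uint64_t> (n_exprs));
  // The buffer is large enough for two 64-bit numbers plus the fixed text.
  assert (n >= 0 && static_cast<std::size_t> (n) < sizeof buf);
  return std::string (buf);
}
```

Calling format_dump_summary (12, 3) yields the text
"12 avg niters, 3 expressions" regardless of whether long is 32 or 64
bits wide, which is the property the original %lu format lacked.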

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/3] Enhance dumps of IVOPTS
  2016-05-13 12:47                         ` Richard Biener
@ 2016-05-13 12:51                           ` Martin Liška
  2016-05-13 14:17                             ` H.J. Lu
  0 siblings, 1 reply; 34+ messages in thread
From: Martin Liška @ 2016-05-13 12:51 UTC (permalink / raw)
  To: Richard Biener, Kyrill Tkachov
  Cc: H.J. Lu, Bin.Cheng, GCC Patches, Jan Hubicka

On 05/13/2016 02:46 PM, Richard Biener wrote:
> Use them for HOST_WIDE_INT printing, for [u]int64_t use the PRI stuff.
> 
> Richard.

Thank you both, installed as r236208.

Martin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/3] Enhance dumps of IVOPTS
  2016-05-13 12:51                           ` Martin Liška
@ 2016-05-13 14:17                             ` H.J. Lu
  2016-05-13 14:46                               ` H.J. Lu
  0 siblings, 1 reply; 34+ messages in thread
From: H.J. Lu @ 2016-05-13 14:17 UTC (permalink / raw)
  To: Martin Liška
  Cc: Richard Biener, Kyrill Tkachov, Bin.Cheng, GCC Patches, Jan Hubicka

On Fri, May 13, 2016 at 5:51 AM, Martin Liška <mliska@suse.cz> wrote:
> On 05/13/2016 02:46 PM, Richard Biener wrote:
>> Use them for HOST_WIDE_INT printing, for [u]int64_t use the PRI stuff.
>>
>> Richard.
>
> Thank you both, installed as r236208.
>

It isn't fixed:

/export/gnu/import/git/sources/gcc/gcc/tree-ssa-loop-ivopts.c:7052:41:
error: format ‘%llu’ expects argument of type ‘long long unsigned
int’, but argument 3 has type ‘size_t {aka unsigned int}’
[-Werror=format=]
         set->used_inv_exprs->elements ());


-- 
H.J.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/3] Enhance dumps of IVOPTS
  2016-05-13 14:17                             ` H.J. Lu
@ 2016-05-13 14:46                               ` H.J. Lu
  0 siblings, 0 replies; 34+ messages in thread
From: H.J. Lu @ 2016-05-13 14:46 UTC (permalink / raw)
  To: Martin Liška
  Cc: Richard Biener, Kyrill Tkachov, Bin.Cheng, GCC Patches, Jan Hubicka

On Fri, May 13, 2016 at 7:17 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, May 13, 2016 at 5:51 AM, Martin Liška <mliska@suse.cz> wrote:
>> On 05/13/2016 02:46 PM, Richard Biener wrote:
>>> Use them for HOST_WIDE_INT printing, for [u]int64_t use the PRI stuff.
>>>
>>> Richard.
>>
>> Thank you both, installed as r236208.
>>
>
> It isn't fixed:
>
> /export/gnu/import/git/sources/gcc/gcc/tree-ssa-loop-ivopts.c:7052:41:
> error: format ‘%llu’ expects argument of type ‘long long unsigned
> int’, but argument 3 has type ‘size_t {aka unsigned int}’
> [-Werror=format=]
>          set->used_inv_exprs->elements ());
>

I am going to check this in as an obvious fix.


-- 
H.J.
---
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 2b2115f..e8953a0 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -7048,8 +7048,8 @@ create_new_ivs (struct ivopts_data *data, struct iv_ca *set)
  LOCATION_LINE (data->loop_loc));
       fprintf (dump_file, ", " HOST_WIDE_INT_PRINT_DEC " avg niters",
        avg_loop_niter (data->current_loop));
-      fprintf (dump_file, ", %" PRIu64 " expressions",
-       set->used_inv_exprs->elements ());
+      fprintf (dump_file, ", " HOST_WIDE_INT_PRINT_UNSIGNED " expressions",
+       (unsigned HOST_WIDE_INT) set->used_inv_exprs->elements ());
       fprintf (dump_file, ", %lu IVs:\n", bitmap_count_bits (set->cands));
       EXECUTE_IF_SET_IN_BITMAP (set->cands, 0, i, bi)
  {

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/3] Encapsulate comp_cost within a class with methods.
  2016-04-29 11:58 ` [PATCH 1/3] Encapsulate comp_cost within a class with methods marxin
@ 2016-05-16 10:14   ` Bin.Cheng
  2016-05-16 13:55     ` Martin Liška
  0 siblings, 1 reply; 34+ messages in thread
From: Bin.Cheng @ 2016-05-16 10:14 UTC (permalink / raw)
  To: marxin; +Cc: gcc-patches List

On Mon, Apr 25, 2016 at 10:42 AM, marxin <mliska@suse.cz> wrote:
> gcc/ChangeLog:
>
> 2016-04-25  Martin Liska  <mliska@suse.cz>
>
>         * tree-ssa-loop-ivopts.c (comp_cost::operator=): New function.
>         (comp_cost::infinite_cost_p): Likewise.
>         (operator+): Likewise.
>         (comp_cost::operator+=): Likewise.
>         (comp_cost::operator-=): Likewise.
>         (comp_cost::operator/=): Likewise.
>         (comp_cost::operator*=): Likewise.
>         (operator-): Likewise.
>         (operator<): Likewise.
>         (operator==): Likewise.
>         (operator<=): Likewise.
>         (comp_cost::get_cost): Likewise.
>         (comp_cost::set_cost): Likewise.
>         (comp_cost::get_complexity): Likewise.
>         (comp_cost::set_complexity): Likewise.
>         (comp_cost::get_scratch): Likewise.
>         (comp_cost::set_scratch): Likewise.
>         (comp_cost::get_infinite): Likewise.
>         (comp_cost::get_no_cost): Likewise.
>         (struct ivopts_data): Rename inv_expr_id to max_inv_expr_id;
>         (tree_ssa_iv_optimize_init): Use the renamed property.
>         (new_cost): Remove.
>         (infinite_cost_p): Likewise.
>         (add_costs): Likewise.
>         (sub_costs): Likewise.
>         (compare_costs): Likewise.
>         (set_group_iv_cost): Use comp_cost::infinite_cost_p.
>         (get_address_cost): Use new comp_cost::comp_cost.
>         (get_shiftadd_cost): Likewise.
>         (force_expr_to_var_cost): Use new comp_cost::get_no_cost.
>         (split_address_cost): Likewise.
>         (ptr_difference_cost): Likewise.
>         (difference_cost): Likewise.
>         (get_expr_id): Use max_inv_expr_id.
>         (get_computation_cost_at): Use comp_cost::get_infinite.
>         (determine_group_iv_cost_generic): Use comp_cost::get_no_cost.
>         (determine_group_iv_cost_address): Likewise.
>         (determine_group_iv_cost_cond): Use comp_cost::infinite_cost_p.
>         (autoinc_possible_for_pair): Likewise.
>         (determine_group_iv_costs): Use new methods of comp_cost.
>         (determine_iv_cost): Likewise.
>         (cheaper_cost_pair): Use comp_cost operators.
>         (iv_ca_recount_cost): Likewise.
>         (iv_ca_set_no_cp): Likewise.
>         (iv_ca_set_cp): Likewise.
>         (iv_ca_cost): Use comp_cost::get_infinite.
>         (iv_ca_new): Use comp_cost::get_no_cost.
>         (iv_ca_dump): Use new methods of comp_cost.
>         (iv_ca_narrow): Use operators of comp_cost.
>         (iv_ca_prune): Likewise.
>         (iv_ca_replace): Likewise.
>         (try_add_cand_for): Likewise.
>         (try_improve_iv_set): Likewise.
>         (find_optimal_iv_set): Use new methods of comp_cost.
>         (free_loop_data): Use renamed max_inv_expr_id.
> ---
Hi Martin,
Could you please rebase this patch and the profiling one against
latest trunk?  The third patch was applied before these two now.

Thanks,
bin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/3] Encapsulate comp_cost within a class with methods.
  2016-05-16 10:14   ` Bin.Cheng
@ 2016-05-16 13:55     ` Martin Liška
  2016-05-19 10:23       ` Martin Liška
  0 siblings, 1 reply; 34+ messages in thread
From: Martin Liška @ 2016-05-16 13:55 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: gcc-patches List

[-- Attachment #1: Type: text/plain, Size: 266 bytes --]

On 05/16/2016 12:13 PM, Bin.Cheng wrote:
> Hi Martin,
> Could you please rebase this patch and the profiling one against
> latest trunk?  The third patch was applied before these two now.
> 
> Thanks,
> bin

Hello.

Sending the rebased version of the patch.

Martin

[-- Attachment #2: 0002-Add-profiling-support-for-IVOPTS-v2.patch --]
[-- Type: text/x-patch, Size: 8832 bytes --]

From a91b1578f3907e05543b2acea0081b6e4744ade9 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Mon, 16 May 2016 15:52:56 +0200
Subject: [PATCH 2/2] Add profiling support for IVOPTS

gcc/ChangeLog:

2016-04-25  Martin Liska  <mliska@suse.cz>

	* tree-ssa-loop-ivopts.c (struct comp_cost): Introduce
	m_cost_scaled and m_frequency fields.
	(comp_cost::operator=): Assign to m_cost_scaled.
	(operator+): Likewise.
	(comp_cost::operator+=): Likewise.
	(comp_cost::operator-=): Likewise.
	(comp_cost::operator/=): Likewise.
	(comp_cost::operator*=): Likewise.
	(operator-): Likewise.
	(comp_cost::set_cost): Likewise.
	(comp_cost::get_cost_scaled): New function.
	(comp_cost::calculate_scaled_cost): Likewise.
	(comp_cost::propagate_scaled_cost): Likewise.
	(comp_cost::get_frequency): Likewise.
	(comp_cost::scale_cost): Likewise.
	(comp_cost::has_frequency): Likewise.
	(get_computation_cost_at): Propagate ratio of frequencies
	of loop header and another basic block.
	(determine_group_iv_costs): Dump new fields.
---
 gcc/tree-ssa-loop-ivopts.c | 130 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 118 insertions(+), 12 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 876e6ed..3a80a23 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -107,6 +107,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-address.h"
 #include "builtins.h"
 #include "tree-vectorizer.h"
+#include "sreal.h"
 
 /* FIXME: Expressions are expanded to RTL in this pass to determine the
    cost of different addressing modes.  This should be moved to a TBD
@@ -173,11 +174,13 @@ enum use_type
 /* Cost of a computation.  */
 struct comp_cost
 {
-  comp_cost (): m_cost (0), m_complexity (0), m_scratch (0)
+  comp_cost (): m_cost (0), m_complexity (0), m_scratch (0),
+    m_frequency (sreal (0)), m_cost_scaled (sreal (0))
   {}
 
   comp_cost (int cost, unsigned complexity)
-    : m_cost (cost), m_complexity (complexity), m_scratch (0)
+    : m_cost (cost), m_complexity (complexity), m_scratch (0),
+      m_frequency (sreal (0)), m_cost_scaled (sreal (0))
   {}
 
   comp_cost& operator= (const comp_cost& other);
@@ -236,6 +239,26 @@ struct comp_cost
   /* Set the scratch to S.  */
   void set_scratch (unsigned s);
 
+  /* Return scaled cost.  */
+  double get_cost_scaled ();
+
+  /* Calculate scaled cost based on frequency of a basic block with
+     frequency equal to NOMINATOR / DENOMINATOR.  */
+  void calculate_scaled_cost (int nominator, int denominator);
+
+  /* Propagate scaled cost which is based on frequency of basic block
+     the cost belongs to.  */
+  void propagate_scaled_cost ();
+
+  /* Return frequency of the cost.  */
+  double get_frequency ();
+
+  /* Scale COST by frequency of the cost.  */
+  const sreal scale_cost (int cost);
+
+  /* Return true if the frequency has a valid value.  */
+  bool has_frequency ();
+
   /* Return infinite comp_cost.  */
   static comp_cost get_infinite ();
 
@@ -249,6 +272,9 @@ private:
 			     complexity field should be larger for more
 			     complex expressions and addressing modes).  */
   int m_scratch;	  /* Scratch used during cost computation.  */
+  sreal m_frequency;	  /* Frequency of the basic block this comp_cost
+			     belongs to.  */
+  sreal m_cost_scaled;	  /* Scaled runtime cost.  */
 };
 
 comp_cost&
@@ -257,6 +283,8 @@ comp_cost::operator= (const comp_cost& other)
   m_cost = other.m_cost;
   m_complexity = other.m_complexity;
   m_scratch = other.m_scratch;
+  m_frequency = other.m_frequency;
+  m_cost_scaled = other.m_cost_scaled;
 
   return *this;
 }
@@ -275,6 +303,7 @@ operator+ (comp_cost cost1, comp_cost cost2)
 
   cost1.m_cost += cost2.m_cost;
   cost1.m_complexity += cost2.m_complexity;
+  cost1.m_cost_scaled += cost2.m_cost_scaled;
 
   return cost1;
 }
@@ -290,6 +319,8 @@ comp_cost
 comp_cost::operator+= (HOST_WIDE_INT c)
 {
   this->m_cost += c;
+  if (has_frequency ())
+    this->m_cost_scaled += scale_cost (c);
 
   return *this;
 }
@@ -298,6 +329,8 @@ comp_cost
 comp_cost::operator-= (HOST_WIDE_INT c)
 {
   this->m_cost -= c;
+  if (has_frequency ())
+    this->m_cost_scaled -= scale_cost (c);
 
   return *this;
 }
@@ -306,6 +339,8 @@ comp_cost
 comp_cost::operator/= (HOST_WIDE_INT c)
 {
   this->m_cost /= c;
+  if (has_frequency ())
+    this->m_cost_scaled /= scale_cost (c);
 
   return *this;
 }
@@ -314,6 +349,8 @@ comp_cost
 comp_cost::operator*= (HOST_WIDE_INT c)
 {
   this->m_cost *= c;
+  if (has_frequency ())
+    this->m_cost_scaled *= scale_cost (c);
 
   return *this;
 }
@@ -323,6 +360,7 @@ operator- (comp_cost cost1, comp_cost cost2)
 {
   cost1.m_cost -= cost2.m_cost;
   cost1.m_complexity -= cost2.m_complexity;
+  cost1.m_cost_scaled -= cost2.m_cost_scaled;
 
   return cost1;
 }
@@ -366,6 +404,7 @@ void
 comp_cost::set_cost (int c)
 {
   m_cost = c;
+  m_cost_scaled = scale_cost (c);
 }
 
 unsigned
@@ -392,6 +431,48 @@ comp_cost::set_scratch (unsigned s)
   m_scratch = s;
 }
 
+double
+comp_cost::get_cost_scaled ()
+{
+  return m_cost_scaled.to_double ();
+}
+
+void
+comp_cost::calculate_scaled_cost (int nominator, int denominator)
+{
+  m_frequency = denominator == 0
+    ? sreal (1) : sreal (nominator) / sreal (denominator);
+
+  m_cost_scaled = scale_cost (m_cost);
+}
+
+void
+comp_cost::propagate_scaled_cost ()
+{
+  if (m_cost < 0)
+    return;
+
+  m_cost = m_cost_scaled.to_int ();
+}
+
+double
+comp_cost::get_frequency ()
+{
+  return m_frequency.to_double ();
+}
+
+const sreal
+comp_cost::scale_cost (int cost)
+{
+  return m_frequency * cost;
+}
+
+bool
+comp_cost::has_frequency ()
+{
+  return m_frequency != sreal (0);
+}
+
 comp_cost
 comp_cost::get_infinite ()
 {
@@ -5047,18 +5128,21 @@ get_computation_cost_at (struct ivopts_data *data,
      (symbol/var1/const parts may be omitted).  If we are looking for an
      address, find the cost of addressing this.  */
   if (address_p)
-    return cost + get_address_cost (symbol_present, var_present,
-				    offset, ratio, cstepi,
-				    mem_mode,
-				    TYPE_ADDR_SPACE (TREE_TYPE (utype)),
-				    speed, stmt_is_after_inc, can_autoinc);
+    {
+      cost += get_address_cost (symbol_present, var_present,
+				offset, ratio, cstepi,
+				mem_mode,
+				TYPE_ADDR_SPACE (TREE_TYPE (utype)),
+				speed, stmt_is_after_inc, can_autoinc);
+      goto ret;
+    }
 
   /* Otherwise estimate the costs for computing the expression.  */
   if (!symbol_present && !var_present && !offset)
     {
       if (ratio != 1)
 	cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
-      return cost;
+      goto ret;
     }
 
   /* Symbol + offset should be compile-time computable so consider that they
@@ -5077,7 +5161,8 @@ get_computation_cost_at (struct ivopts_data *data,
   aratio = ratio > 0 ? ratio : -ratio;
   if (aratio != 1)
     cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
-  return cost;
+
+  goto ret;
 
 fallback:
   if (can_autoinc)
@@ -5093,8 +5178,13 @@ fallback:
     if (address_p)
       comp = build_simple_mem_ref (comp);
 
-    return comp_cost (computation_cost (comp, speed), 0);
+    cost = comp_cost (computation_cost (comp, speed), 0);
   }
+
+ret:
+  cost.calculate_scaled_cost (at->bb->frequency,
+			      data->current_loop->header->frequency);
+  return cost;
 }
 
 /* Determines the cost of the computation by that USE is expressed
@@ -5922,16 +6012,19 @@ determine_group_iv_costs (struct ivopts_data *data)
 	  group = data->vgroups[i];
 
 	  fprintf (dump_file, "Group %d:\n", i);
-	  fprintf (dump_file, "  cand\tcost\tcompl.\tinv.ex.\tdepends on\n");
+	  fprintf (dump_file, "  cand\tcost\tscaled\tfreq.\tcompl.\tinv.ex."
+		   "\tdepends on\n");
 	  for (j = 0; j < group->n_map_members; j++)
 	    {
 	      if (!group->cost_map[j].cand
 		  || group->cost_map[j].cost.infinite_cost_p ())
 		continue;
 
-	      fprintf (dump_file, "  %d\t%d\t%d\t",
+	      fprintf (dump_file, "  %d\t%d\t%2.2f\t%2.2f\t%d\t",
 		       group->cost_map[j].cand->id,
 		       group->cost_map[j].cost.get_cost (),
+		       group->cost_map[j].cost.get_cost_scaled (),
+		       group->cost_map[j].cost.get_frequency (),
 		       group->cost_map[j].cost.get_complexity ());
 	      if (group->cost_map[j].inv_expr != NULL)
 		fprintf (dump_file, "%d\t",
@@ -5948,6 +6041,19 @@ determine_group_iv_costs (struct ivopts_data *data)
 	}
       fprintf (dump_file, "\n");
     }
+
+  for (i = 0; i < data->vgroups.length (); i++)
+    {
+      group = data->vgroups[i];
+      for (j = 0; j < group->n_map_members; j++)
+	{
+	  if (!group->cost_map[j].cand
+	      || group->cost_map[j].cost.infinite_cost_p ())
+	    continue;
+
+	  group->cost_map[j].cost.propagate_scaled_cost ();
+	}
+    }
 }
 
 /* Determines cost of the candidate CAND.  */
-- 
2.8.2
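
For readers without a GCC tree at hand, the scaling step performed by
calculate_scaled_cost in the patch above can be sketched in isolation.
This is only an illustration under stated assumptions: plain double
stands in for GCC's sreal, and the names scaled_cost and
scale_by_frequency are hypothetical, not part of the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the fields the patch adds to comp_cost.
// The patch stores these as sreal; double is used here as an assumption
// so the sketch builds outside of GCC.
struct scaled_cost
{
  int cost;          // raw cost, as before the patch
  double frequency;  // bb_frequency / header_frequency
  double scaled;     // cost * frequency
};

// Scale COST by the ratio of the use's basic block frequency to the loop
// header frequency.  As in the patch, a zero denominator falls back to a
// ratio of 1, which leaves the cost unchanged.
scaled_cost
scale_by_frequency (int cost, int bb_frequency, int header_frequency)
{
  assert (bb_frequency >= 0 && header_frequency >= 0);
  double freq = header_frequency == 0
    ? 1.0
    : static_cast<double> (bb_frequency) / header_frequency;
  return scaled_cost { cost, freq, cost * freq };
}
```

With a block executed half as often as the loop header,
scale_by_frequency (4, 500, 1000) gives a frequency of 0.5 and a scaled
cost of 2, so IV uses in cold blocks weigh less in candidate selection.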


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/3] Add profiling support for IVOPTS
  2016-04-29 11:58 ` [PATCH 2/3] Add profiling support for IVOPTS marxin
@ 2016-05-16 13:56   ` Martin Liška
  2016-05-16 22:27     ` Bin.Cheng
  0 siblings, 1 reply; 34+ messages in thread
From: Martin Liška @ 2016-05-16 13:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: Bin.Cheng

[-- Attachment #1: Type: text/plain, Size: 58 bytes --]

Hello.

Sending the rebased version of the patch.

Martin

[-- Attachment #2: 0002-Add-profiling-support-for-IVOPTS-v2.patch --]
[-- Type: text/x-patch, Size: 8832 bytes --]

From a91b1578f3907e05543b2acea0081b6e4744ade9 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Mon, 16 May 2016 15:52:56 +0200
Subject: [PATCH 2/2] Add profiling support for IVOPTS

gcc/ChangeLog:

2016-04-25  Martin Liska  <mliska@suse.cz>

	* tree-ssa-loop-ivopts.c (struct comp_cost): Introduce
	m_cost_scaled and m_frequency fields.
	(comp_cost::operator=): Assign to m_cost_scaled.
	(operator+): Likewise.
	(comp_cost::operator+=): Likewise.
	(comp_cost::operator-=): Likewise.
	(comp_cost::operator/=): Likewise.
	(comp_cost::operator*=): Likewise.
	(operator-): Likewise.
	(comp_cost::set_cost): Likewise.
	(comp_cost::get_cost_scaled): New function.
	(comp_cost::calculate_scaled_cost): Likewise.
	(comp_cost::propagate_scaled_cost): Likewise.
	(comp_cost::get_frequency): Likewise.
	(comp_cost::scale_cost): Likewise.
	(comp_cost::has_frequency): Likewise.
	(get_computation_cost_at): Propagate ratio of frequencies
	of loop header and another basic block.
	(determine_group_iv_costs): Dump new fields.
---
 gcc/tree-ssa-loop-ivopts.c | 130 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 118 insertions(+), 12 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 876e6ed..3a80a23 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -107,6 +107,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-address.h"
 #include "builtins.h"
 #include "tree-vectorizer.h"
+#include "sreal.h"
 
 /* FIXME: Expressions are expanded to RTL in this pass to determine the
    cost of different addressing modes.  This should be moved to a TBD
@@ -173,11 +174,13 @@ enum use_type
 /* Cost of a computation.  */
 struct comp_cost
 {
-  comp_cost (): m_cost (0), m_complexity (0), m_scratch (0)
+  comp_cost (): m_cost (0), m_complexity (0), m_scratch (0),
+    m_frequency (sreal (0)), m_cost_scaled (sreal (0))
   {}
 
   comp_cost (int cost, unsigned complexity)
-    : m_cost (cost), m_complexity (complexity), m_scratch (0)
+    : m_cost (cost), m_complexity (complexity), m_scratch (0),
+      m_frequency (sreal (0)), m_cost_scaled (sreal (0))
   {}
 
   comp_cost& operator= (const comp_cost& other);
@@ -236,6 +239,26 @@ struct comp_cost
   /* Set the scratch to S.  */
   void set_scratch (unsigned s);
 
+  /* Return scaled cost.  */
+  double get_cost_scaled ();
+
+  /* Calculate scaled cost based on frequency of a basic block with
+     frequency equal to NOMINATOR / DENOMINATOR.  */
+  void calculate_scaled_cost (int nominator, int denominator);
+
+  /* Propagate scaled cost which is based on frequency of basic block
+     the cost belongs to.  */
+  void propagate_scaled_cost ();
+
+  /* Return frequency of the cost.  */
+  double get_frequency ();
+
+  /* Scale COST by frequency of the cost.  */
+  const sreal scale_cost (int cost);
+
+  /* Return true if the frequency has a valid value.  */
+  bool has_frequency ();
+
   /* Return infinite comp_cost.  */
   static comp_cost get_infinite ();
 
@@ -249,6 +272,9 @@ private:
 			     complexity field should be larger for more
 			     complex expressions and addressing modes).  */
   int m_scratch;	  /* Scratch used during cost computation.  */
+  sreal m_frequency;	  /* Frequency of the basic block this comp_cost
+			     belongs to.  */
+  sreal m_cost_scaled;	  /* Scaled runtime cost.  */
 };
 
 comp_cost&
@@ -257,6 +283,8 @@ comp_cost::operator= (const comp_cost& other)
   m_cost = other.m_cost;
   m_complexity = other.m_complexity;
   m_scratch = other.m_scratch;
+  m_frequency = other.m_frequency;
+  m_cost_scaled = other.m_cost_scaled;
 
   return *this;
 }
@@ -275,6 +303,7 @@ operator+ (comp_cost cost1, comp_cost cost2)
 
   cost1.m_cost += cost2.m_cost;
   cost1.m_complexity += cost2.m_complexity;
+  cost1.m_cost_scaled += cost2.m_cost_scaled;
 
   return cost1;
 }
@@ -290,6 +319,8 @@ comp_cost
 comp_cost::operator+= (HOST_WIDE_INT c)
 {
   this->m_cost += c;
+  if (has_frequency ())
+    this->m_cost_scaled += scale_cost (c);
 
   return *this;
 }
@@ -298,6 +329,8 @@ comp_cost
 comp_cost::operator-= (HOST_WIDE_INT c)
 {
   this->m_cost -= c;
+  if (has_frequency ())
+    this->m_cost_scaled -= scale_cost (c);
 
   return *this;
 }
@@ -306,6 +339,8 @@ comp_cost
 comp_cost::operator/= (HOST_WIDE_INT c)
 {
   this->m_cost /= c;
+  if (has_frequency ())
+    this->m_cost_scaled /= scale_cost (c);
 
   return *this;
 }
@@ -314,6 +349,8 @@ comp_cost
 comp_cost::operator*= (HOST_WIDE_INT c)
 {
   this->m_cost *= c;
+  if (has_frequency ())
+    this->m_cost_scaled *= scale_cost (c);
 
   return *this;
 }
@@ -323,6 +360,7 @@ operator- (comp_cost cost1, comp_cost cost2)
 {
   cost1.m_cost -= cost2.m_cost;
   cost1.m_complexity -= cost2.m_complexity;
+  cost1.m_cost_scaled -= cost2.m_cost_scaled;
 
   return cost1;
 }
@@ -366,6 +404,7 @@ void
 comp_cost::set_cost (int c)
 {
   m_cost = c;
+  m_cost_scaled = scale_cost (c);
 }
 
 unsigned
@@ -392,6 +431,48 @@ comp_cost::set_scratch (unsigned s)
   m_scratch = s;
 }
 
+double
+comp_cost::get_cost_scaled ()
+{
+  return m_cost_scaled.to_double ();
+}
+
+void
+comp_cost::calculate_scaled_cost (int nominator, int denominator)
+{
+  m_frequency = denominator == 0
+    ? sreal (1) : sreal (nominator) / sreal (denominator);
+
+  m_cost_scaled = scale_cost (m_cost);
+}
+
+void
+comp_cost::propagate_scaled_cost ()
+{
+  if (m_cost < 0)
+    return;
+
+  m_cost = m_cost_scaled.to_int ();
+}
+
+double
+comp_cost::get_frequency ()
+{
+  return m_frequency.to_double ();
+}
+
+const sreal
+comp_cost::scale_cost (int cost)
+{
+  return m_frequency * cost;
+}
+
+bool
+comp_cost::has_frequency ()
+{
+  return m_frequency != sreal (0);
+}
+
 comp_cost
 comp_cost::get_infinite ()
 {
@@ -5047,18 +5128,21 @@ get_computation_cost_at (struct ivopts_data *data,
      (symbol/var1/const parts may be omitted).  If we are looking for an
      address, find the cost of addressing this.  */
   if (address_p)
-    return cost + get_address_cost (symbol_present, var_present,
-				    offset, ratio, cstepi,
-				    mem_mode,
-				    TYPE_ADDR_SPACE (TREE_TYPE (utype)),
-				    speed, stmt_is_after_inc, can_autoinc);
+    {
+      cost += get_address_cost (symbol_present, var_present,
+				offset, ratio, cstepi,
+				mem_mode,
+				TYPE_ADDR_SPACE (TREE_TYPE (utype)),
+				speed, stmt_is_after_inc, can_autoinc);
+      goto ret;
+    }
 
   /* Otherwise estimate the costs for computing the expression.  */
   if (!symbol_present && !var_present && !offset)
     {
       if (ratio != 1)
 	cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
-      return cost;
+      goto ret;
     }
 
   /* Symbol + offset should be compile-time computable so consider that they
@@ -5077,7 +5161,8 @@ get_computation_cost_at (struct ivopts_data *data,
   aratio = ratio > 0 ? ratio : -ratio;
   if (aratio != 1)
     cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
-  return cost;
+
+  goto ret;
 
 fallback:
   if (can_autoinc)
@@ -5093,8 +5178,13 @@ fallback:
     if (address_p)
       comp = build_simple_mem_ref (comp);
 
-    return comp_cost (computation_cost (comp, speed), 0);
+    cost = comp_cost (computation_cost (comp, speed), 0);
   }
+
+ret:
+  cost.calculate_scaled_cost (at->bb->frequency,
+			      data->current_loop->header->frequency);
+  return cost;
 }
 
 /* Determines the cost of the computation by that USE is expressed
@@ -5922,16 +6012,19 @@ determine_group_iv_costs (struct ivopts_data *data)
 	  group = data->vgroups[i];
 
 	  fprintf (dump_file, "Group %d:\n", i);
-	  fprintf (dump_file, "  cand\tcost\tcompl.\tinv.ex.\tdepends on\n");
+	  fprintf (dump_file, "  cand\tcost\tscaled\tfreq.\tcompl.\tinv.ex."
+		   "\tdepends on\n");
 	  for (j = 0; j < group->n_map_members; j++)
 	    {
 	      if (!group->cost_map[j].cand
 		  || group->cost_map[j].cost.infinite_cost_p ())
 		continue;
 
-	      fprintf (dump_file, "  %d\t%d\t%d\t",
+	      fprintf (dump_file, "  %d\t%d\t%2.2f\t%2.2f\t%d\t",
 		       group->cost_map[j].cand->id,
 		       group->cost_map[j].cost.get_cost (),
+		       group->cost_map[j].cost.get_cost_scaled (),
+		       group->cost_map[j].cost.get_frequency (),
 		       group->cost_map[j].cost.get_complexity ());
 	      if (group->cost_map[j].inv_expr != NULL)
 		fprintf (dump_file, "%d\t",
@@ -5948,6 +6041,19 @@ determine_group_iv_costs (struct ivopts_data *data)
 	}
       fprintf (dump_file, "\n");
     }
+
+  for (i = 0; i < data->vgroups.length (); i++)
+    {
+      group = data->vgroups[i];
+      for (j = 0; j < group->n_map_members; j++)
+	{
+	  if (!group->cost_map[j].cand
+	      || group->cost_map[j].cost.infinite_cost_p ())
+	    continue;
+
+	  group->cost_map[j].cost.propagate_scaled_cost ();
+	}
+    }
 }
 
 /* Determines cost of the candidate CAND.  */
-- 
2.8.2


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/3] Add profiling support for IVOPTS
  2016-05-16 13:56   ` Martin Liška
@ 2016-05-16 22:27     ` Bin.Cheng
  2016-05-19 10:28       ` Martin Liška
  0 siblings, 1 reply; 34+ messages in thread
From: Bin.Cheng @ 2016-05-16 22:27 UTC (permalink / raw)
  To: Martin Liška; +Cc: gcc-patches List

> As profile-guided optimization can provide very useful information
> about basic block frequencies within a loop, following patch set leverages
> that information. It speeds up a single benchmark from upcoming SPECv6
> suite by 20% (-O2 -profile-generate/-fprofile use) and I think it can
> also improve others (currently measuring numbers for PGO).
Hi,
Is this 20% improvement from this patch alone, or does it include the
improvement from existing PGO?

For the patch:
> +
> +  /* Return true if the frequency has a valid value.  */
> +  bool has_frequency ();
> +
>    /* Return infinite comp_cost.  */
>    static comp_cost get_infinite ();
>
> @@ -249,6 +272,9 @@ private:
>       complexity field should be larger for more
>       complex expressions and addressing modes).  */
>    int m_scratch;  /* Scratch used during cost computation.  */
> +  sreal m_frequency;  /* Frequency of the basic block this comp_cost
> +     belongs to.  */
> +  sreal m_cost_scaled;  /* Scaled runtime cost.  */
IMHO we shouldn't embed frequency in comp_cost, nor record the scaled
cost in it.  I would suggest we compute the cost and amortize it
over frequency in get_computation_cost_at before storing it into
comp_cost.  That is, once the cost is computed/stored in comp_cost, it is
already scaled with frequency.  One argument is that frequency info is only
valid for the use's statement/basic_block; it doesn't really have a clear
meaning in the comp_cost structure.  Outside of function
get_computation_cost_at, I find it hard to understand/remember
what comp_cost.m_frequency means and where it came from.
There are other reasons embedded in the comments below.
>
>
>  comp_cost&
> @@ -257,6 +283,8 @@ comp_cost::operator= (const comp_cost& other)
>    m_cost = other.m_cost;
>    m_complexity = other.m_complexity;
>    m_scratch = other.m_scratch;
> +  m_frequency = other.m_frequency;
> +  m_cost_scaled = other.m_cost_scaled;
>
>    return *this;
>  }
> @@ -275,6 +303,7 @@ operator+ (comp_cost cost1, comp_cost cost2)
>
>    cost1.m_cost += cost2.m_cost;
>    cost1.m_complexity += cost2.m_complexity;
> +  cost1.m_cost_scaled += cost2.m_cost_scaled;
>
>    return cost1;
>  }
> @@ -290,6 +319,8 @@ comp_cost
>  comp_cost::operator+= (HOST_WIDE_INT c)
This operator and the ones below need to check for infinite cost first
and return immediately.
>  {
>    this->m_cost += c;
> +  if (has_frequency ())
> +    this->m_cost_scaled += scale_cost (c);
>
>    return *this;
>  }
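
The infinite-cost check requested above for operator+= (and the
operators that follow) could look roughly like this.  It is only a sketch
on a simplified stand-in type: toy_cost, its fields, and the INFTY value
are assumptions for illustration, and plain double replaces sreal.

```cpp
namespace sketch {

// Assumed sentinel for "infinite" cost; the value is illustrative.
const int INFTY = 10000000;

// Simplified stand-in for comp_cost (not the GCC type).
struct toy_cost
{
  int cost = 0;              // unscaled runtime cost
  double frequency = 0.0;    // 0 means "no frequency recorded"
  double cost_scaled = 0.0;  // cost * frequency

  bool infinite_cost_p () const { return cost == INFTY; }

  toy_cost &operator+= (int c)
  {
    // Check first: an infinite cost must stay infinite.
    if (infinite_cost_p ())
      return *this;

    cost += c;
    if (frequency != 0.0)
      cost_scaled += frequency * c;
    return *this;
  }
};

} // namespace sketch
```

The same early return would apply to operator-=, operator/= and
operator*=.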
> @@ -5047,18 +5128,21 @@ get_computation_cost_at (struct ivopts_data *data,
>       (symbol/var1/const parts may be omitted).  If we are looking for an
>       address, find the cost of addressing this.  */
>    if (address_p)
> -    return cost + get_address_cost (symbol_present, var_present,
> -    offset, ratio, cstepi,
> -    mem_mode,
> -    TYPE_ADDR_SPACE (TREE_TYPE (utype)),
> -    speed, stmt_is_after_inc, can_autoinc);
> +    {
> +      cost += get_address_cost (symbol_present, var_present,
> + offset, ratio, cstepi,
> + mem_mode,
> + TYPE_ADDR_SPACE (TREE_TYPE (utype)),
> + speed, stmt_is_after_inc, can_autoinc);
> +      goto ret;
> +    }
>
>    /* Otherwise estimate the costs for computing the expression.  */
>    if (!symbol_present && !var_present && !offset)
>      {
>        if (ratio != 1)
>   cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
> -      return cost;
> +      goto ret;
>      }
>
>    /* Symbol + offset should be compile-time computable so consider that they
> @@ -5077,7 +5161,8 @@ get_computation_cost_at (struct ivopts_data *data,
>    aratio = ratio > 0 ? ratio : -ratio;
>    if (aratio != 1)
>      cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
> -  return cost;
> +
> +  goto ret;
>
>  fallback:
>    if (can_autoinc)
> @@ -5093,8 +5178,13 @@ fallback:
>      if (address_p)
>        comp = build_simple_mem_ref (comp);
>
> -    return comp_cost (computation_cost (comp, speed), 0);
> +    cost = comp_cost (computation_cost (comp, speed), 0);
>    }
> +
> +ret:
> +  cost.calculate_scaled_cost (at->bb->frequency,
> +      data->current_loop->header->frequency);
Here cost consists of two parts.  One is for loop invariant
computation: we amortize it against avg_loop_niter and record register
pressure (either via invariant variables or invariant expressions) for
it; the other is the loop variant part.  For the first part, we should
not scale it using frequency, since we have already assumed it would
be hoisted out of the loop.  No matter where the use is, a hoisted loop
invariant has the same frequency as the loop header.  This is the second
reason I want to factor frequency out of comp_cost.  It's easier to
scale with frequency only where it's necessary.
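
As a rough illustration of the two-part split described above, the sketch
below scales only the loop variant part by bb_frequency / header_frequency
and leaves the already amortized invariant part alone.  The split_cost
struct and scaled_total helper are hypothetical; the patch keeps a single
comp_cost and has no such helper.

```cpp
namespace sketch {

// Hypothetical split of a use's cost into its two parts.
struct split_cost
{
  int setup;          // loop invariant part, already amortized over
                      // avg_loop_niter; assumed hoisted out of the loop
  int per_iteration;  // loop variant part, paid where the use executes
};

// Scale only the per-iteration part by bb_freq / header_freq.
// A zero header frequency is treated as ratio 1, mirroring the
// denominator == 0 case in the patch.
inline int
scaled_total (split_cost c, int bb_freq, int header_freq)
{
  if (header_freq == 0)
    return c.setup + c.per_iteration;

  double ratio = static_cast<double> (bb_freq) / header_freq;
  // Round to nearest; the rounding choice is an assumption.
  return c.setup + static_cast<int> (c.per_iteration * ratio + 0.5);
}

} // namespace sketch
```

For a use in a block executed on a quarter of the iterations,
scaled_total ({3, 10}, 2500, 10000) keeps the setup cost of 3 and reduces
the per-iteration cost of 10 to about a quarter.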

> +  return cost;
>  }
>
>  /* Determines the cost of the computation by that USE is expressed
> @@ -5922,16 +6012,19 @@ determine_group_iv_costs (struct ivopts_data *data)
>    group = data->vgroups[i];
>
>    fprintf (dump_file, "Group %d:\n", i);
> -  fprintf (dump_file, "  cand\tcost\tcompl.\tinv.ex.\tdepends on\n");
> +  fprintf (dump_file, "  cand\tcost\tscaled\tfreq.\tcompl.\tinv.ex."
> +   "\tdepends on\n");
>    for (j = 0; j < group->n_map_members; j++)
>      {
>        if (!group->cost_map[j].cand
>    || group->cost_map[j].cost.infinite_cost_p ())
>   continue;
>
> -      fprintf (dump_file, "  %d\t%d\t%d\t",
> +      fprintf (dump_file, "  %d\t%d\t%2.2f\t%2.2f\t%d\t",
>         group->cost_map[j].cand->id,
>         group->cost_map[j].cost.get_cost (),
> +       group->cost_map[j].cost.get_cost_scaled (),
> +       group->cost_map[j].cost.get_frequency (),
>         group->cost_map[j].cost.get_complexity ());
>        if (group->cost_map[j].inv_expr != NULL)
>   fprintf (dump_file, "%d\t",
> @@ -5948,6 +6041,19 @@ determine_group_iv_costs (struct ivopts_data *data)
>   }
>        fprintf (dump_file, "\n");
>      }
> +
> +  for (i = 0; i < data->vgroups.length (); i++)
> +    {
> +      group = data->vgroups[i];
> +      for (j = 0; j < group->n_map_members; j++)
> + {
> +  if (!group->cost_map[j].cand
> +      || group->cost_map[j].cost.infinite_cost_p ())
> +    continue;
> +
> +  group->cost_map[j].cost.propagate_scaled_cost ();
> + }
> +    }
This is wrong.  m_frequency and m_cost_scaled are initialized to
sreal(0) by default, and are never changed later for conditional
iv_use.  As a matter of fact, costs computed for all conditional
iv_uses are wrong (value is 0).  This makes the observed improvement
not that promising.  Considering code generation is very sensitive to
cost computation, it may just hit some special cases.  Either way we
need more work/investigation on the impact of this patch.
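
The zeroing effect can be reproduced on a small stand-in type.  This is a
sketch only: toy_cost is hypothetical, double replaces sreal, and the two
methods just mirror the logic of the quoted calculate_scaled_cost and
propagate_scaled_cost.

```cpp
namespace sketch {

// Simplified stand-in for the patched comp_cost (not the GCC type).
struct toy_cost
{
  int cost;
  double frequency = 0.0;    // default, as in the patch's constructors
  double cost_scaled = 0.0;  // default, as in the patch's constructors

  explicit toy_cost (int c) : cost (c) {}

  void calculate_scaled_cost (int nominator, int denominator)
  {
    frequency = denominator == 0
      ? 1.0 : static_cast<double> (nominator) / denominator;
    cost_scaled = frequency * cost;
  }

  void propagate_scaled_cost ()
  {
    if (cost < 0)
      return;
    // Overwrites cost even if calculate_scaled_cost was never called.
    cost = static_cast<int> (cost_scaled);
  }
};

} // namespace sketch
```

A cost that never passes through calculate_scaled_cost keeps
cost_scaled at 0, so propagate_scaled_cost replaces its cost with 0; a
cost that does pass through it is scaled as intended.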

Again, I would suggest we factor frequency out of comp_cost and
only scale the cost in place when we compute the cost for each use.  Then
we can measure the impact on code generation.

Thanks,
bin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/3] Encapsulate comp_cost within a class with methods.
  2016-05-16 13:55     ` Martin Liška
@ 2016-05-19 10:23       ` Martin Liška
  2016-05-19 11:24         ` Bin.Cheng
  0 siblings, 1 reply; 34+ messages in thread
From: Martin Liška @ 2016-05-19 10:23 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: gcc-patches List

[-- Attachment #1: Type: text/plain, Size: 589 bytes --]

On 05/16/2016 03:55 PM, Martin Liška wrote:
> On 05/16/2016 12:13 PM, Bin.Cheng wrote:
>> Hi Martin,
>> Could you please rebase this patch and the profiling one against
>> latest trunk?  The third patch was applied before these two now.
>>
>> Thanks,
>> bin
> 
> Hello.
> 
> Sending the rebased version of the patch.
> 
> Martin
> 

Hello.

As I've dramatically changed the 2/3 PATCH, a class encapsulation is no longer needed.
Thus, I've reduced this patch to just the member functions/operators that are useful
in my eyes. It's up to Bin whether to merge the patch.

Martin

[-- Attachment #2: 0001-IVOPTS-make-comp_cost-in-a-more-c-fashion-v3.patch --]
[-- Type: text/x-patch, Size: 31152 bytes --]

From 2f759a3cbd5bcf2bfd0717a7f910efe19581636e Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Tue, 17 May 2016 13:52:11 +0200
Subject: [PATCH 1/4] IVOPTS: make comp_cost in a more c++ fashion.

gcc/ChangeLog:

2016-05-17  Martin Liska  <mliska@suse.cz>

	* tree-ssa-loop-ivopts.c (comp_cost::infinite_cost_p): New
	function.
	(operator+): Likewise.
	(operator-): Likewise.
	(comp_cost::operator+=): Likewise.
	(comp_cost::operator-=): Likewise.
	(comp_cost::operator/=): Likewise.
	(comp_cost::operator*=): Likewise.
	(operator<): Likewise.
	(operator==): Likewise.
	(operator<=): Likewise.
	(comp_cost::get_infinite): Likewise.
	(comp_cost::get_no_cost): Likewise.
	(new_cost): Remove.
	(infinite_cost_p): Likewise.
	(add_costs): Likewise.
	(sub_costs): Likewise.
	(compare_costs): Likewise.
	(set_group_iv_cost): Use the newly introduced functions.
	(get_address_cost): Likewise.
	(get_shiftadd_cost): Likewise.
	(force_expr_to_var_cost): Likewise.
	(split_address_cost): Likewise.
	(ptr_difference_cost): Likewise.
	(difference_cost): Likewise.
	(get_computation_cost_at): Likewise.
	(determine_group_iv_cost_generic): Likewise.
	(determine_group_iv_cost_address): Likewise.
	(determine_group_iv_cost_cond): Likewise.
	(autoinc_possible_for_pair): Likewise.
	(determine_group_iv_costs): Likewise.
	(cheaper_cost_pair): Likewise.
	(iv_ca_recount_cost): Likewise.
	(iv_ca_set_no_cp): Likewise.
	(iv_ca_set_cp): Likewise.
	(iv_ca_cost): Likewise.
	(iv_ca_new): Likewise.
	(iv_ca_dump): Likewise.
	(iv_ca_narrow): Likewise.
	(iv_ca_prune): Likewise.
	(iv_ca_replace): Likewise.
	(try_add_cand_for): Likewise.
	(try_improve_iv_set): Likewise.
	(find_optimal_iv_set): Likewise.
---
 gcc/tree-ssa-loop-ivopts.c | 439 ++++++++++++++++++++++++++++-----------------
 1 file changed, 271 insertions(+), 168 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 9ce6b64..f48b2f6 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -173,16 +173,184 @@ enum use_type
 /* Cost of a computation.  */
 struct comp_cost
 {
+  comp_cost (): cost (0), complexity (0), scratch (0)
+  {}
+
+  comp_cost (int cost, unsigned complexity)
+    : cost (cost), complexity (complexity), scratch (0)
+  {}
+
+  /* Returns true if COST is infinite.  */
+  bool infinite_cost_p ();
+
+  /* Adds costs COST1 and COST2.  */
+  friend comp_cost operator+ (comp_cost cost1, comp_cost cost2);
+
+  /* Adds COST to the comp_cost.  */
+  comp_cost operator+= (comp_cost cost);
+
+  /* Adds constant C to this comp_cost.  */
+  comp_cost operator+= (HOST_WIDE_INT c);
+
+  /* Subtracts constant C from this comp_cost.  */
+  comp_cost operator-= (HOST_WIDE_INT c);
+
+  /* Divide the comp_cost by constant C.  */
+  comp_cost operator/= (HOST_WIDE_INT c);
+
+  /* Multiply the comp_cost by constant C.  */
+  comp_cost operator*= (HOST_WIDE_INT c);
+
+  /* Subtracts costs COST1 and COST2.  */
+  friend comp_cost operator- (comp_cost cost1, comp_cost cost2);
+
+  /* Subtracts COST from this comp_cost.  */
+  comp_cost operator-= (comp_cost cost);
+
+  /* Returns true if COST1 is smaller than COST2.  */
+  friend bool operator< (comp_cost cost1, comp_cost cost2);
+
+  /* Returns true if COST1 and COST2 are equal.  */
+  friend bool operator== (comp_cost cost1, comp_cost cost2);
+
+  /* Returns true if COST1 is smaller than or equal to COST2.  */
+  friend bool operator<= (comp_cost cost1, comp_cost cost2);
+
+  /* Return infinite comp_cost.  */
+  static comp_cost get_infinite ();
+
+  /* Return empty comp_cost.  */
+  static comp_cost get_no_cost ();
+
   int cost;		/* The runtime cost.  */
-  unsigned complexity;	/* The estimate of the complexity of the code for
+  unsigned complexity;  /* The estimate of the complexity of the code for
 			   the computation (in no concrete units --
 			   complexity field should be larger for more
 			   complex expressions and addressing modes).  */
   int scratch;		/* Scratch used during cost computation.  */
 };
 
-static const comp_cost no_cost = {0, 0, 0};
-static const comp_cost infinite_cost = {INFTY, INFTY, INFTY};
+bool
+comp_cost::infinite_cost_p ()
+{
+  return cost == INFTY;
+}
+
+comp_cost
+operator+ (comp_cost cost1, comp_cost cost2)
+{
+  if (cost1.infinite_cost_p () || cost2.infinite_cost_p ())
+    return comp_cost::get_infinite ();
+
+  cost1.cost += cost2.cost;
+  cost1.complexity += cost2.complexity;
+
+  return cost1;
+}
+
+comp_cost
+operator- (comp_cost cost1, comp_cost cost2)
+{
+  if (cost1.infinite_cost_p () || cost2.infinite_cost_p ())
+    return comp_cost::get_infinite ();
+
+  cost1.cost -= cost2.cost;
+  cost1.complexity -= cost2.complexity;
+
+  return cost1;
+}
+
+comp_cost
+comp_cost::operator+= (comp_cost cost)
+{
+  *this = *this + cost;
+  return *this;
+}
+
+comp_cost
+comp_cost::operator+= (HOST_WIDE_INT c)
+{
+  if (infinite_cost_p ())
+    return *this;
+
+  this->cost += c;
+
+  return *this;
+}
+
+comp_cost
+comp_cost::operator-= (HOST_WIDE_INT c)
+{
+  if (infinite_cost_p ())
+    return *this;
+
+  this->cost -= c;
+
+  return *this;
+}
+
+comp_cost
+comp_cost::operator/= (HOST_WIDE_INT c)
+{
+  if (infinite_cost_p ())
+    return *this;
+
+  this->cost /= c;
+
+  return *this;
+}
+
+comp_cost
+comp_cost::operator*= (HOST_WIDE_INT c)
+{
+  if (infinite_cost_p ())
+    return *this;
+
+  this->cost *= c;
+
+  return *this;
+}
+
+comp_cost
+comp_cost::operator-= (comp_cost cost)
+{
+  *this = *this - cost;
+  return *this;
+}
+
+bool
+operator< (comp_cost cost1, comp_cost cost2)
+{
+  if (cost1.cost == cost2.cost)
+    return cost1.complexity < cost2.complexity;
+
+  return cost1.cost < cost2.cost;
+}
+
+bool
+operator== (comp_cost cost1, comp_cost cost2)
+{
+  return cost1.cost == cost2.cost
+    && cost1.complexity == cost2.complexity;
+}
+
+bool
+operator<= (comp_cost cost1, comp_cost cost2)
+{
+  return cost1 < cost2 || cost1 == cost2;
+}
+
+comp_cost
+comp_cost::get_infinite ()
+{
+  return comp_cost (INFTY, INFTY);
+}
+
+comp_cost
+comp_cost::get_no_cost ()
+{
+  return comp_cost ();
+}
 
 struct iv_inv_expr_ent;
 
@@ -3284,64 +3452,6 @@ alloc_use_cost_map (struct ivopts_data *data)
     }
 }
 
-/* Returns description of computation cost of expression whose runtime
-   cost is RUNTIME and complexity corresponds to COMPLEXITY.  */
-
-static comp_cost
-new_cost (unsigned runtime, unsigned complexity)
-{
-  comp_cost cost;
-
-  cost.cost = runtime;
-  cost.complexity = complexity;
-
-  return cost;
-}
-
-/* Returns true if COST is infinite.  */
-
-static bool
-infinite_cost_p (comp_cost cost)
-{
-  return cost.cost == INFTY;
-}
-
-/* Adds costs COST1 and COST2.  */
-
-static comp_cost
-add_costs (comp_cost cost1, comp_cost cost2)
-{
-  if (infinite_cost_p (cost1) || infinite_cost_p (cost2))
-    return infinite_cost;
-
-  cost1.cost += cost2.cost;
-  cost1.complexity += cost2.complexity;
-
-  return cost1;
-}
-/* Subtracts costs COST1 and COST2.  */
-
-static comp_cost
-sub_costs (comp_cost cost1, comp_cost cost2)
-{
-  cost1.cost -= cost2.cost;
-  cost1.complexity -= cost2.complexity;
-
-  return cost1;
-}
-
-/* Returns a negative number if COST1 < COST2, a positive number if
-   COST1 > COST2, and 0 if COST1 = COST2.  */
-
-static int
-compare_costs (comp_cost cost1, comp_cost cost2)
-{
-  if (cost1.cost == cost2.cost)
-    return cost1.complexity - cost2.complexity;
-
-  return cost1.cost - cost2.cost;
-}
-
 /* Sets cost of (GROUP, CAND) pair to COST and record that it depends
    on invariants DEPENDS_ON and that the value used in expressing it
    is VALUE, and in case of iv elimination the comparison operator is COMP.  */
@@ -3354,7 +3464,7 @@ set_group_iv_cost (struct ivopts_data *data,
 {
   unsigned i, s;
 
-  if (infinite_cost_p (cost))
+  if (cost.infinite_cost_p ())
     {
       BITMAP_FREE (depends_on);
       return;
@@ -4170,7 +4280,7 @@ get_address_cost (bool symbol_present, bool var_present,
   else
     acost = data->costs[symbol_present][var_present][offset_p][ratio_p];
   complexity = (symbol_present != 0) + (var_present != 0) + offset_p + ratio_p;
-  return new_cost (cost + acost, complexity);
+  return comp_cost (cost + acost, complexity);
 }
 
  /* Calculate the SPEED or size cost of shiftadd EXPR in MODE.  MULT is the
@@ -4207,12 +4317,12 @@ get_shiftadd_cost (tree expr, machine_mode mode, comp_cost cost0,
 		? shiftsub1_cost (speed, mode, m)
 		: shiftsub0_cost (speed, mode, m)));
 
-  res = new_cost (MIN (as_cost, sa_cost), 0);
-  res = add_costs (res, mult_in_op1 ? cost0 : cost1);
+  res = comp_cost (MIN (as_cost, sa_cost), 0);
+  res += (mult_in_op1 ? cost0 : cost1);
 
   STRIP_NOPS (multop);
   if (!is_gimple_val (multop))
-    res = add_costs (res, force_expr_to_var_cost (multop, speed));
+    res += force_expr_to_var_cost (multop, speed);
 
   *cost = res;
   return true;
@@ -4272,12 +4382,12 @@ force_expr_to_var_cost (tree expr, bool speed)
   STRIP_NOPS (expr);
 
   if (SSA_VAR_P (expr))
-    return no_cost;
+    return comp_cost::get_no_cost ();
 
   if (is_gimple_min_invariant (expr))
     {
       if (TREE_CODE (expr) == INTEGER_CST)
-	return new_cost (integer_cost [speed], 0);
+	return comp_cost (integer_cost [speed], 0);
 
       if (TREE_CODE (expr) == ADDR_EXPR)
 	{
@@ -4286,10 +4396,10 @@ force_expr_to_var_cost (tree expr, bool speed)
 	  if (TREE_CODE (obj) == VAR_DECL
 	      || TREE_CODE (obj) == PARM_DECL
 	      || TREE_CODE (obj) == RESULT_DECL)
-	    return new_cost (symbol_cost [speed], 0);
+	    return comp_cost (symbol_cost [speed], 0);
 	}
 
-      return new_cost (address_cost [speed], 0);
+      return comp_cost (address_cost [speed], 0);
     }
 
   switch (TREE_CODE (expr))
@@ -4313,18 +4423,18 @@ force_expr_to_var_cost (tree expr, bool speed)
 
     default:
       /* Just an arbitrary value, FIXME.  */
-      return new_cost (target_spill_cost[speed], 0);
+      return comp_cost (target_spill_cost[speed], 0);
     }
 
   if (op0 == NULL_TREE
       || TREE_CODE (op0) == SSA_NAME || CONSTANT_CLASS_P (op0))
-    cost0 = no_cost;
+    cost0 = comp_cost::get_no_cost ();
   else
     cost0 = force_expr_to_var_cost (op0, speed);
 
   if (op1 == NULL_TREE
       || TREE_CODE (op1) == SSA_NAME || CONSTANT_CLASS_P (op1))
-    cost1 = no_cost;
+    cost1 = comp_cost::get_no_cost ();
   else
     cost1 = force_expr_to_var_cost (op1, speed);
 
@@ -4335,7 +4445,7 @@ force_expr_to_var_cost (tree expr, bool speed)
     case PLUS_EXPR:
     case MINUS_EXPR:
     case NEGATE_EXPR:
-      cost = new_cost (add_cost (speed, mode), 0);
+      cost = comp_cost (add_cost (speed, mode), 0);
       if (TREE_CODE (expr) != NEGATE_EXPR)
 	{
 	  tree mult = NULL_TREE;
@@ -4358,28 +4468,28 @@ force_expr_to_var_cost (tree expr, bool speed)
 	tree inner_mode, outer_mode;
 	outer_mode = TREE_TYPE (expr);
 	inner_mode = TREE_TYPE (op0);
-	cost = new_cost (convert_cost (TYPE_MODE (outer_mode),
+	cost = comp_cost (convert_cost (TYPE_MODE (outer_mode),
 				       TYPE_MODE (inner_mode), speed), 0);
       }
       break;
 
     case MULT_EXPR:
       if (cst_and_fits_in_hwi (op0))
-	cost = new_cost (mult_by_coeff_cost (int_cst_value (op0),
+	cost = comp_cost (mult_by_coeff_cost (int_cst_value (op0),
 					     mode, speed), 0);
       else if (cst_and_fits_in_hwi (op1))
-	cost = new_cost (mult_by_coeff_cost (int_cst_value (op1),
+	cost = comp_cost (mult_by_coeff_cost (int_cst_value (op1),
 					     mode, speed), 0);
       else
-	return new_cost (target_spill_cost [speed], 0);
+	return comp_cost (target_spill_cost [speed], 0);
       break;
 
     default:
       gcc_unreachable ();
     }
 
-  cost = add_costs (cost, cost0);
-  cost = add_costs (cost, cost1);
+  cost += cost0;
+  cost += cost1;
 
   /* Bound the cost by target_spill_cost.  The parts of complicated
      computations often are either loop invariant or at least can
@@ -4438,7 +4548,7 @@ split_address_cost (struct ivopts_data *data,
       if (depends_on)
 	walk_tree (&addr, find_depends, depends_on, NULL);
 
-      return new_cost (target_spill_cost[data->speed], 0);
+      return comp_cost (target_spill_cost[data->speed], 0);
     }
 
   *offset += bitpos / BITS_PER_UNIT;
@@ -4447,12 +4557,12 @@ split_address_cost (struct ivopts_data *data,
     {
       *symbol_present = true;
       *var_present = false;
-      return no_cost;
+      return comp_cost::get_no_cost ();
     }
 
   *symbol_present = false;
   *var_present = true;
-  return no_cost;
+  return comp_cost::get_no_cost ();
 }
 
 /* Estimates cost of expressing difference of addresses E1 - E2 as
@@ -4477,7 +4587,7 @@ ptr_difference_cost (struct ivopts_data *data,
       *offset += diff;
       *symbol_present = false;
       *var_present = false;
-      return no_cost;
+      return comp_cost::get_no_cost ();
     }
 
   if (integer_zerop (e2))
@@ -4527,7 +4637,7 @@ difference_cost (struct ivopts_data *data,
   if (operand_equal_p (e1, e2, 0))
     {
       *var_present = false;
-      return no_cost;
+      return comp_cost::get_no_cost ();
     }
 
   *var_present = true;
@@ -4538,7 +4648,7 @@ difference_cost (struct ivopts_data *data,
   if (integer_zerop (e1))
     {
       comp_cost cost = force_var_cost (data, e2, depends_on);
-      cost.cost += mult_by_coeff_cost (-1, mode, data->speed);
+      cost += mult_by_coeff_cost (-1, mode, data->speed);
       return cost;
     }
 
@@ -4732,7 +4842,7 @@ get_computation_cost_at (struct ivopts_data *data,
 
   /* Only consider real candidates.  */
   if (!cand->iv)
-    return infinite_cost;
+    return comp_cost::get_infinite ();
 
   cbase = cand->iv->base;
   cstep = cand->iv->step;
@@ -4741,7 +4851,7 @@ get_computation_cost_at (struct ivopts_data *data,
   if (TYPE_PRECISION (utype) > TYPE_PRECISION (ctype))
     {
       /* We do not have a precision to express the values of use.  */
-      return infinite_cost;
+      return comp_cost::get_infinite ();
     }
 
   if (address_p
@@ -4758,7 +4868,7 @@ get_computation_cost_at (struct ivopts_data *data,
       if (use->iv->base_object
 	  && cand->iv->base_object
 	  && !operand_equal_p (use->iv->base_object, cand->iv->base_object, 0))
-	return infinite_cost;
+	return comp_cost::get_infinite ();
     }
 
   if (TYPE_PRECISION (utype) < TYPE_PRECISION (ctype))
@@ -4779,12 +4889,12 @@ get_computation_cost_at (struct ivopts_data *data,
     cstepi = 0;
 
   if (!constant_multiple_of (ustep, cstep, &rat))
-    return infinite_cost;
+    return comp_cost::get_infinite ();
 
   if (wi::fits_shwi_p (rat))
     ratio = rat.to_shwi ();
   else
-    return infinite_cost;
+    return comp_cost::get_infinite ();
 
   STRIP_NOPS (cbase);
   ctype = TREE_TYPE (cbase);
@@ -4805,7 +4915,7 @@ get_computation_cost_at (struct ivopts_data *data,
 			      ubase, build_int_cst (utype, 0),
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
-      cost.cost /= avg_loop_niter (data->current_loop);
+      cost /= avg_loop_niter (data->current_loop);
     }
   else if (ratio == 1)
     {
@@ -4829,7 +4939,7 @@ get_computation_cost_at (struct ivopts_data *data,
 			      ubase, real_cbase,
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
-      cost.cost /= avg_loop_niter (data->current_loop);
+      cost /= avg_loop_niter (data->current_loop);
     }
   else if (address_p
 	   && !POINTER_TYPE_P (ctype)
@@ -4852,21 +4962,19 @@ get_computation_cost_at (struct ivopts_data *data,
 			      ubase, real_cbase,
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
-      cost.cost /= avg_loop_niter (data->current_loop);
+      cost /= avg_loop_niter (data->current_loop);
     }
   else
     {
       cost = force_var_cost (data, cbase, depends_on);
-      cost = add_costs (cost,
-			difference_cost (data,
-					 ubase, build_int_cst (utype, 0),
-					 &symbol_present, &var_present,
-					 &offset, depends_on));
-      cost.cost /= avg_loop_niter (data->current_loop);
-      cost.cost += add_cost (data->speed, TYPE_MODE (ctype));
+      cost += difference_cost (data, ubase, build_int_cst (utype, 0),
+			       &symbol_present, &var_present, &offset,
+			       depends_on);
+      cost /= avg_loop_niter (data->current_loop);
+      cost += add_cost (data->speed, TYPE_MODE (ctype));
     }
 
-  /* Record setup cost in scrach field.  */
+  /* Record setup cost in scratch field.  */
   cost.scratch = cost.cost;
 
   if (inv_expr && depends_on && *depends_on)
@@ -4887,26 +4995,24 @@ get_computation_cost_at (struct ivopts_data *data,
      (symbol/var1/const parts may be omitted).  If we are looking for an
      address, find the cost of addressing this.  */
   if (address_p)
-    return add_costs (cost,
-		      get_address_cost (symbol_present, var_present,
-					offset, ratio, cstepi,
-					mem_mode,
-					TYPE_ADDR_SPACE (TREE_TYPE (utype)),
-					speed, stmt_is_after_inc,
-					can_autoinc));
+    return cost + get_address_cost (symbol_present, var_present,
+				    offset, ratio, cstepi,
+				    mem_mode,
+				    TYPE_ADDR_SPACE (TREE_TYPE (utype)),
+				    speed, stmt_is_after_inc, can_autoinc);
 
   /* Otherwise estimate the costs for computing the expression.  */
   if (!symbol_present && !var_present && !offset)
     {
       if (ratio != 1)
-	cost.cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
+	cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
       return cost;
     }
 
   /* Symbol + offset should be compile-time computable so consider that they
       are added once to the variable, if present.  */
   if (var_present && (symbol_present || offset))
-    cost.cost += adjust_setup_cost (data,
+    cost += adjust_setup_cost (data,
 				    add_cost (speed, TYPE_MODE (ctype)));
 
   /* Having offset does not affect runtime cost in case it is added to
@@ -4914,11 +5020,11 @@ get_computation_cost_at (struct ivopts_data *data,
   if (offset)
     cost.complexity++;
 
-  cost.cost += add_cost (speed, TYPE_MODE (ctype));
+  cost += add_cost (speed, TYPE_MODE (ctype));
 
   aratio = ratio > 0 ? ratio : -ratio;
   if (aratio != 1)
-    cost.cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
+    cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
   return cost;
 
 fallback:
@@ -4930,14 +5036,12 @@ fallback:
     tree comp = get_computation_at (data->current_loop, use, cand, at);
 
     if (!comp)
-      return infinite_cost;
+      return comp_cost::get_infinite ();
 
     if (address_p)
       comp = build_simple_mem_ref (comp);
 
-    cost = new_cost (computation_cost (comp, speed), 0);
-    cost.scratch = 0;
-    return cost;
+    return comp_cost (computation_cost (comp, speed), 0);
   }
 }
 
@@ -4976,14 +5080,14 @@ determine_group_iv_cost_generic (struct ivopts_data *data,
      cost of increment twice -- once at this use and once in the cost of
      the candidate.  */
   if (cand->pos == IP_ORIGINAL && cand->incremented_at == use->stmt)
-    cost = no_cost;
+    cost = comp_cost::get_no_cost ();
   else
     cost = get_computation_cost (data, use, cand, false,
 				 &depends_on, NULL, &inv_expr);
 
   set_group_iv_cost (data, group, cand, cost, depends_on,
 		     NULL_TREE, ERROR_MARK, inv_expr);
-  return !infinite_cost_p (cost);
+  return !cost.infinite_cost_p ();
 }
 
 /* Determines cost of computing uses in GROUP with CAND in addresses.  */
@@ -4997,27 +5101,27 @@ determine_group_iv_cost_address (struct ivopts_data *data,
   bool can_autoinc, first = true;
   iv_inv_expr_ent *inv_expr = NULL;
   struct iv_use *use = group->vuses[0];
-  comp_cost sum_cost = no_cost, cost;
+  comp_cost sum_cost = comp_cost::get_no_cost (), cost;
 
   cost = get_computation_cost (data, use, cand, true,
 			       &depends_on, &can_autoinc, &inv_expr);
 
   sum_cost = cost;
-  if (!infinite_cost_p (sum_cost) && cand->ainc_use == use)
+  if (!sum_cost.infinite_cost_p () && cand->ainc_use == use)
     {
       if (can_autoinc)
-	sum_cost.cost -= cand->cost_step;
+	sum_cost -= cand->cost_step;
       /* If we generated the candidate solely for exploiting autoincrement
 	 opportunities, and it turns out it can't be used, set the cost to
 	 infinity to make sure we ignore it.  */
       else if (cand->pos == IP_AFTER_USE || cand->pos == IP_BEFORE_USE)
-	sum_cost = infinite_cost;
+	sum_cost = comp_cost::get_infinite ();
     }
 
   /* Uses in a group can share setup code, so only add setup cost once.  */
-  cost.cost -= cost.scratch;
+  cost -= cost.scratch;
   /* Compute and add costs for rest uses of this group.  */
-  for (i = 1; i < group->vuses.length () && !infinite_cost_p (sum_cost); i++)
+  for (i = 1; i < group->vuses.length () && !sum_cost.infinite_cost_p (); i++)
     {
       struct iv_use *next = group->vuses[i];
 
@@ -5042,15 +5146,15 @@ determine_group_iv_cost_address (struct ivopts_data *data,
 	  cost = get_computation_cost (data, next, cand, true,
 				       NULL, &can_autoinc, NULL);
 	  /* Remove setup cost.  */
-	  if (!infinite_cost_p (cost))
-	    cost.cost -= cost.scratch;
+	  if (!cost.infinite_cost_p ())
+	    cost -= cost.scratch;
 	}
-      sum_cost = add_costs (sum_cost, cost);
+      sum_cost += cost;
     }
   set_group_iv_cost (data, group, cand, sum_cost, depends_on,
 		     NULL_TREE, ERROR_MARK, inv_expr);
 
-  return !infinite_cost_p (sum_cost);
+  return !sum_cost.infinite_cost_p ();
 }
 
 /* Computes value of candidate CAND at position AT in iteration NITER, and
@@ -5499,7 +5603,7 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
       elim_cost.cost = adjust_setup_cost (data, elim_cost.cost);
     }
   else
-    elim_cost = infinite_cost;
+    elim_cost = comp_cost::get_infinite ();
 
   /* Try expressing the original giv.  If it is compared with an invariant,
      note that we cannot get rid of it.  */
@@ -5513,11 +5617,11 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
      TODO: The constant that we're subtracting from the cost should
      be target-dependent.  This information should be added to the
      target costs for each backend.  */
-  if (!infinite_cost_p (elim_cost) /* Do not try to decrease infinite! */
+  if (!elim_cost.infinite_cost_p () /* Do not try to decrease infinite! */
       && integer_zerop (*bound_cst)
       && (operand_equal_p (*control_var, cand->var_after, 0)
 	  || operand_equal_p (*control_var, cand->var_before, 0)))
-    elim_cost.cost -= 1;
+    elim_cost -= 1;
 
   express_cost = get_computation_cost (data, use, cand, false,
 				       &depends_on_express, NULL,
@@ -5531,10 +5635,10 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
     bound_cost.cost = parm_decl_cost (data, *bound_cst);
   else if (TREE_CODE (*bound_cst) == INTEGER_CST)
     bound_cost.cost = 0;
-  express_cost.cost += bound_cost.cost;
+  express_cost += bound_cost;
 
   /* Choose the better approach, preferring the eliminated IV. */
-  if (compare_costs (elim_cost, express_cost) <= 0)
+  if (elim_cost <= express_cost)
     {
       cost = elim_cost;
       depends_on = depends_on_elim;
@@ -5559,7 +5663,7 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
   if (depends_on_express)
     BITMAP_FREE (depends_on_express);
 
-  return !infinite_cost_p (cost);
+  return !cost.infinite_cost_p ();
 }
 
 /* Determines cost of computing uses in GROUP with CAND.  Returns false
@@ -5604,7 +5708,7 @@ autoinc_possible_for_pair (struct ivopts_data *data, struct iv_use *use,
 
   BITMAP_FREE (depends_on);
 
-  return !infinite_cost_p (cost) && can_autoinc;
+  return !cost.infinite_cost_p () && can_autoinc;
 }
 
 /* Examine IP_ORIGINAL candidates to see if they are incremented next to a
@@ -5770,7 +5874,7 @@ determine_group_iv_costs (struct ivopts_data *data)
 	  for (j = 0; j < group->n_map_members; j++)
 	    {
 	      if (!group->cost_map[j].cand
-		  || infinite_cost_p (group->cost_map[j].cost))
+		  || group->cost_map[j].cost.infinite_cost_p ())
 		continue;
 
 	      fprintf (dump_file, "  %d\t%d\t%d\t",
@@ -5944,19 +6048,16 @@ determine_set_costs (struct ivopts_data *data)
 static bool
 cheaper_cost_pair (struct cost_pair *a, struct cost_pair *b)
 {
-  int cmp;
-
   if (!a)
     return false;
 
   if (!b)
     return true;
 
-  cmp = compare_costs (a->cost, b->cost);
-  if (cmp < 0)
+  if (a->cost < b->cost)
     return true;
 
-  if (cmp > 0)
+  if (b->cost < a->cost)
     return false;
 
   /* In case the costs are the same, prefer the cheaper candidate.  */
@@ -5982,11 +6083,11 @@ iv_ca_recount_cost (struct ivopts_data *data, struct iv_ca *ivs)
 {
   comp_cost cost = ivs->cand_use_cost;
 
-  cost.cost += ivs->cand_cost;
+  cost+= ivs->cand_cost;
 
-  cost.cost += ivopts_global_cost_for_size (data,
-					    ivs->n_regs
-					    + ivs->used_inv_exprs->elements ());
+  cost += ivopts_global_cost_for_size (data,
+				       ivs->n_regs
+				       + ivs->used_inv_exprs->elements ());
 
   ivs->cost = cost;
 }
@@ -6040,7 +6141,7 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca *ivs,
       iv_ca_set_remove_invariants (ivs, cp->cand->depends_on);
     }
 
-  ivs->cand_use_cost = sub_costs (ivs->cand_use_cost, cp->cost);
+  ivs->cand_use_cost -= cp->cost;
 
   iv_ca_set_remove_invariants (ivs, cp->depends_on);
 
@@ -6106,7 +6207,7 @@ iv_ca_set_cp (struct ivopts_data *data, struct iv_ca *ivs,
 	  iv_ca_set_add_invariants (ivs, cp->cand->depends_on);
 	}
 
-      ivs->cand_use_cost = add_costs (ivs->cand_use_cost, cp->cost);
+      ivs->cand_use_cost += cp->cost;
       iv_ca_set_add_invariants (ivs, cp->depends_on);
 
       if (cp->inv_expr != NULL)
@@ -6165,7 +6266,7 @@ iv_ca_cost (struct iv_ca *ivs)
   /* This was a conditional expression but it triggered a bug in
      Sun C 5.5.  */
   if (ivs->bad_groups)
-    return infinite_cost;
+    return comp_cost::get_infinite ();
   else
     return ivs->cost;
 }
@@ -6319,11 +6420,11 @@ iv_ca_new (struct ivopts_data *data)
   nw->cands = BITMAP_ALLOC (NULL);
   nw->n_cands = 0;
   nw->n_regs = 0;
-  nw->cand_use_cost = no_cost;
+  nw->cand_use_cost = comp_cost::get_no_cost ();
   nw->cand_cost = 0;
   nw->n_invariant_uses = XCNEWVEC (unsigned, data->max_inv_id + 1);
   nw->used_inv_exprs = new hash_map <iv_inv_expr_ent *, unsigned> (13);
-  nw->cost = no_cost;
+  nw->cost = comp_cost::get_no_cost ();
 
   return nw;
 }
@@ -6350,7 +6451,8 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
   unsigned i;
   comp_cost cost = iv_ca_cost (ivs);
 
-  fprintf (file, "  cost: %d (complexity %d)\n", cost.cost, cost.complexity);
+  fprintf (file, "  cost: %d (complexity %d)\n", cost.cost,
+	   cost.complexity);
   fprintf (file, "  cand_cost: %d\n  cand_group_cost: %d (complexity %d)\n",
 	   ivs->cand_cost, ivs->cand_use_cost.cost,
 	   ivs->cand_use_cost.complexity);
@@ -6361,8 +6463,9 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
       struct iv_group *group = data->vgroups[i];
       struct cost_pair *cp = iv_ca_cand_for_group (ivs, group);
       if (cp)
-	fprintf (file, "   group:%d --> iv_cand:%d, cost=(%d,%d)\n",
-		 group->id, cp->cand->id, cp->cost.cost, cp->cost.complexity);
+        fprintf (file, "   group:%d --> iv_cand:%d, cost=(%d,%d)\n",
+		 group->id, cp->cand->id, cp->cost.cost,
+		 cp->cost.complexity);
       else
 	fprintf (file, "   group:%d --> ??\n", group->id);
     }
@@ -6480,7 +6583,7 @@ iv_ca_narrow (struct ivopts_data *data, struct iv_ca *ivs,
 	      iv_ca_set_cp (data, ivs, group, cp);
 	      acost = iv_ca_cost (ivs);
 
-	      if (compare_costs (acost, best_cost) < 0)
+	      if (acost < best_cost)
 		{
 		  best_cost = acost;
 		  new_cp = cp;
@@ -6503,7 +6606,7 @@ iv_ca_narrow (struct ivopts_data *data, struct iv_ca *ivs,
 	      iv_ca_set_cp (data, ivs, group, cp);
 	      acost = iv_ca_cost (ivs);
 
-	      if (compare_costs (acost, best_cost) < 0)
+	      if (acost < best_cost)
 		{
 		  best_cost = acost;
 		  new_cp = cp;
@@ -6516,7 +6619,7 @@ iv_ca_narrow (struct ivopts_data *data, struct iv_ca *ivs,
       if (!new_cp)
 	{
 	  iv_ca_delta_free (delta);
-	  return infinite_cost;
+	  return comp_cost::get_infinite ();
 	}
 
       *delta = iv_ca_delta_add (group, old_cp, new_cp, *delta);
@@ -6555,7 +6658,7 @@ iv_ca_prune (struct ivopts_data *data, struct iv_ca *ivs,
 
       acost = iv_ca_narrow (data, ivs, cand, except_cand, &act_delta);
 
-      if (compare_costs (acost, best_cost) < 0)
+      if (acost < best_cost)
 	{
 	  best_cost = acost;
 	  iv_ca_delta_free (&best_delta);
@@ -6668,7 +6771,7 @@ iv_ca_replace (struct ivopts_data *data, struct iv_ca *ivs,
       iv_ca_delta_commit (data, ivs, act_delta, false);
       act_delta = iv_ca_delta_join (act_delta, tmp_delta);
 
-      if (compare_costs (acost, orig_cost) < 0)
+      if (acost < orig_cost)
 	{
 	  *delta = act_delta;
 	  return acost;
@@ -6737,7 +6840,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
       iv_ca_set_no_cp (data, ivs, group);
       act_delta = iv_ca_delta_add (group, NULL, cp, act_delta);
 
-      if (compare_costs (act_cost, best_cost) < 0)
+      if (act_cost < best_cost)
 	{
 	  best_cost = act_cost;
 
@@ -6748,7 +6851,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
 	iv_ca_delta_free (&act_delta);
     }
 
-  if (infinite_cost_p (best_cost))
+  if (best_cost.infinite_cost_p ())
     {
       for (i = 0; i < group->n_map_members; i++)
 	{
@@ -6777,7 +6880,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
 				       iv_ca_cand_for_group (ivs, group),
 				       cp, act_delta);
 
-	  if (compare_costs (act_cost, best_cost) < 0)
+	  if (act_cost < best_cost)
 	    {
 	      best_cost = act_cost;
 
@@ -6793,7 +6896,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
   iv_ca_delta_commit (data, ivs, best_delta, true);
   iv_ca_delta_free (&best_delta);
 
-  return !infinite_cost_p (best_cost);
+  return !best_cost.infinite_cost_p ();
 }
 
 /* Finds an initial assignment of candidates to uses.  */
@@ -6849,7 +6952,7 @@ try_improve_iv_set (struct ivopts_data *data,
 	  act_delta = iv_ca_delta_join (act_delta, tmp_delta);
 	}
 
-      if (compare_costs (acost, best_cost) < 0)
+      if (acost < best_cost)
 	{
 	  best_cost = acost;
 	  iv_ca_delta_free (&best_delta);
@@ -6883,7 +6986,7 @@ try_improve_iv_set (struct ivopts_data *data,
     }
 
   iv_ca_delta_commit (data, ivs, best_delta, true);
-  gcc_assert (compare_costs (best_cost, iv_ca_cost (ivs)) == 0);
+  gcc_assert (best_cost == iv_ca_cost (ivs));
   iv_ca_delta_free (&best_delta);
   return true;
 }
@@ -6941,8 +7044,8 @@ find_optimal_iv_set (struct ivopts_data *data)
   if (!origset && !set)
     return NULL;
 
-  origcost = origset ? iv_ca_cost (origset) : infinite_cost;
-  cost = set ? iv_ca_cost (set) : infinite_cost;
+  origcost = origset ? iv_ca_cost (origset) : comp_cost::get_infinite ();
+  cost = set ? iv_ca_cost (set) : comp_cost::get_infinite ();
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -6953,7 +7056,7 @@ find_optimal_iv_set (struct ivopts_data *data)
     }
 
   /* Choose the one with the best cost.  */
-  if (compare_costs (origcost, cost) <= 0)
+  if (origcost <= cost)
     {
       if (set)
 	iv_ca_free (&set);
-- 
2.8.2
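
Throughout the hunks above, calls such as compare_costs (a, b) < 0 become
a < b. The operator definitions themselves are not part of this excerpt, so
the following is only a minimal sketch of how such operators could look. It
assumes costs are ordered by the cost field first, with complexity used only
to break ties, and it uses a simplified stand-in for comp_cost with two int
fields.

```cpp
// Simplified stand-in for comp_cost (an assumption for illustration; the
// real structure also carries a scratch field).
struct comp_cost
{
  int cost;        // Runtime cost.
  int complexity;  // Tie-breaker: estimated complexity of the expression.
};

// Order by cost first; complexity only decides between equal costs.
inline bool
operator< (const comp_cost &a, const comp_cost &b)
{
  if (a.cost == b.cost)
    return a.complexity < b.complexity;

  return a.cost < b.cost;
}

// Two costs are equal only if both fields match.
inline bool
operator== (const comp_cost &a, const comp_cost &b)
{
  return a.cost == b.cost && a.complexity == b.complexity;
}

// Derived from the two operators above so the three stay consistent.
inline bool
operator<= (const comp_cost &a, const comp_cost &b)
{
  return a < b || a == b;
}
```

With this shape, a check like elim_cost <= express_cost prefers the first
operand on a tie, which matches the "<= 0" tests it replaces.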


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/3] Add profiling support for IVOPTS
  2016-05-16 22:27     ` Bin.Cheng
@ 2016-05-19 10:28       ` Martin Liška
  2016-05-20 10:04         ` Bin.Cheng
  2016-05-24 10:19         ` Bin.Cheng
  0 siblings, 2 replies; 34+ messages in thread
From: Martin Liška @ 2016-05-19 10:28 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: gcc-patches List

[-- Attachment #1: Type: text/plain, Size: 7556 bytes --]

On 05/17/2016 12:27 AM, Bin.Cheng wrote:
>> As profile-guided optimization can provide very useful information
>> about basic block frequencies within a loop, following patch set leverages
>> that information. It speeds up a single benchmark from upcoming SPECv6
>> suite by 20% (-O2 -fprofile-generate/-fprofile-use) and I think it can
>> also improve others (currently measuring numbers for PGO).
> Hi,
> Is this 20% improvement from this patch, or does it include the
> existing PGO's improvement?

Hello.

The measurement shows that current trunk (compared to the GCC 6 branch)
has already significantly improved the benchmark with PGO.
Currently, my patch improves the PGO result by ~5% with -O2, but our plan
is to improve the static profile so that it can also benefit from the patch.

> 
> For the patch:
>> +
>> +  /* Return true if the frequency has a valid value.  */
>> +  bool has_frequency ();
>> +
>>    /* Return infinite comp_cost.  */
>>    static comp_cost get_infinite ();
>>
>> @@ -249,6 +272,9 @@ private:
>>       complexity field should be larger for more
>>       complex expressions and addressing modes).  */
>>    int m_scratch;  /* Scratch used during cost computation.  */
>> +  sreal m_frequency;  /* Frequency of the basic block this comp_cost
>> +     belongs to.  */
>> +  sreal m_cost_scaled;  /* Scaled runtime cost.  */
> IMHO we shouldn't embed frequency in comp_cost, nor record the scaled
> cost in it.  I would suggest we compute the cost and amortize it
> over frequency in get_computation_cost_at before storing it into
> comp_cost.  That is, once the cost is computed/stored in comp_cost, it is
> already scaled by frequency.  One argument is that frequency info is only
> valid for the use's statement/basic block; it doesn't have a clear
> meaning in the comp_cost structure.  Outside of
> get_computation_cost_at, I find it hard to understand/remember
> what comp_cost.m_frequency means and where it came from.
> There are other reasons in the comments below.
>>
>>
>>  comp_cost&
>> @@ -257,6 +283,8 @@ comp_cost::operator= (const comp_cost& other)
>>    m_cost = other.m_cost;
>>    m_complexity = other.m_complexity;
>>    m_scratch = other.m_scratch;
>> +  m_frequency = other.m_frequency;
>> +  m_cost_scaled = other.m_cost_scaled;
>>
>>    return *this;
>>  }
>> @@ -275,6 +303,7 @@ operator+ (comp_cost cost1, comp_cost cost2)
>>
>>    cost1.m_cost += cost2.m_cost;
>>    cost1.m_complexity += cost2.m_complexity;
>> +  cost1.m_cost_scaled += cost2.m_cost_scaled;
>>
>>    return cost1;
>>  }
>> @@ -290,6 +319,8 @@ comp_cost
>>  comp_cost::operator+= (HOST_WIDE_INT c)
> This operator and the ones below need to check for infinite cost first
> and return immediately.
>>  {
>>    this->m_cost += c;
>> +  if (has_frequency ())
>> +    this->m_cost_scaled += scale_cost (c);
>>
>>    return *this;
>>  }
>> @@ -5047,18 +5128,21 @@ get_computation_cost_at (struct ivopts_data *data,
>>       (symbol/var1/const parts may be omitted).  If we are looking for an
>>       address, find the cost of addressing this.  */
>>    if (address_p)
>> -    return cost + get_address_cost (symbol_present, var_present,
>> -    offset, ratio, cstepi,
>> -    mem_mode,
>> -    TYPE_ADDR_SPACE (TREE_TYPE (utype)),
>> -    speed, stmt_is_after_inc, can_autoinc);
>> +    {
>> +      cost += get_address_cost (symbol_present, var_present,
>> + offset, ratio, cstepi,
>> + mem_mode,
>> + TYPE_ADDR_SPACE (TREE_TYPE (utype)),
>> + speed, stmt_is_after_inc, can_autoinc);
>> +      goto ret;
>> +    }
>>
>>    /* Otherwise estimate the costs for computing the expression.  */
>>    if (!symbol_present && !var_present && !offset)
>>      {
>>        if (ratio != 1)
>>   cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
>> -      return cost;
>> +      goto ret;
>>      }
>>
>>    /* Symbol + offset should be compile-time computable so consider that they
>> @@ -5077,7 +5161,8 @@ get_computation_cost_at (struct ivopts_data *data,
>>    aratio = ratio > 0 ? ratio : -ratio;
>>    if (aratio != 1)
>>      cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
>> -  return cost;
>> +
>> +  goto ret;
>>
>>  fallback:
>>    if (can_autoinc)
>> @@ -5093,8 +5178,13 @@ fallback:
>>      if (address_p)
>>        comp = build_simple_mem_ref (comp);
>>
>> -    return comp_cost (computation_cost (comp, speed), 0);
>> +    cost = comp_cost (computation_cost (comp, speed), 0);
>>    }
>> +
>> +ret:
>> +  cost.calculate_scaled_cost (at->bb->frequency,
>> +      data->current_loop->header->frequency);
> Here cost consists of two parts.  One is for the loop-invariant
> computation: we amortize it against avg_loop_niter and record register
> pressure (either via invariant variables or invariant expressions) for
> it;  the other is the loop-variant part.  We should not scale the first
> part by frequency, since we have already assumed it would
> be hoisted out of the loop.  No matter where the use is, a hoisted loop
> invariant has the same frequency as the loop header.  This is the second
> reason I want to factor frequency out of comp_cost.  It's easier to
> scale with frequency only where it's necessary.
> 
>> +  return cost;
>>  }
>>
>>  /* Determines the cost of the computation by that USE is expressed
>> @@ -5922,16 +6012,19 @@ determine_group_iv_costs (struct ivopts_data *data)
>>    group = data->vgroups[i];
>>
>>    fprintf (dump_file, "Group %d:\n", i);
>> -  fprintf (dump_file, "  cand\tcost\tcompl.\tinv.ex.\tdepends on\n");
>> +  fprintf (dump_file, "  cand\tcost\tscaled\tfreq.\tcompl.\tinv.ex."
>> +   "\tdepends on\n");
>>    for (j = 0; j < group->n_map_members; j++)
>>      {
>>        if (!group->cost_map[j].cand
>>    || group->cost_map[j].cost.infinite_cost_p ())
>>   continue;
>>
>> -      fprintf (dump_file, "  %d\t%d\t%d\t",
>> +      fprintf (dump_file, "  %d\t%d\t%2.2f\t%2.2f\t%d\t",
>>         group->cost_map[j].cand->id,
>>         group->cost_map[j].cost.get_cost (),
>> +       group->cost_map[j].cost.get_cost_scaled (),
>> +       group->cost_map[j].cost.get_frequency (),
>>         group->cost_map[j].cost.get_complexity ());
>>        if (group->cost_map[j].inv_expr != NULL)
>>   fprintf (dump_file, "%d\t",
>> @@ -5948,6 +6041,19 @@ determine_group_iv_costs (struct ivopts_data *data)
>>   }
>>        fprintf (dump_file, "\n");
>>      }
>> +
>> +  for (i = 0; i < data->vgroups.length (); i++)
>> +    {
>> +      group = data->vgroups[i];
>> +      for (j = 0; j < group->n_map_members; j++)
>> + {
>> +  if (!group->cost_map[j].cand
>> +      || group->cost_map[j].cost.infinite_cost_p ())
>> +    continue;
>> +
>> +  group->cost_map[j].cost.propagate_scaled_cost ();
>> + }
>> +    }
> This is wrong.  m_frequency and m_cost_scaled are initialized to
> sreal(0) by default, and are never changed later for conditional
> iv_use.  As a matter of fact, costs computed for all conditional
> iv_uses are wrong (value is 0).  This makes the observed improvement
> not that promising.  Considering code generation is very sensitive to
> cost computation, it may just hit some special cases.  Either way we
> need more work/investigation on the impact of this patch.
> 
> Again, I would suggest we factor frequency out of comp_cost and
> only scale the cost in place when we compute the cost for each use.  Then
> we can measure the impact on code generation.
> 
> Thanks,
> bin
> 

All remarks were applied in the third version of the patch. Together with the
previous patch, it survives bootstrap and regression tests on x86_64-linux-gnu.
I'm going to re-test the patch on SPEC benchmarks.

Martin


[-- Attachment #2: 0002-Add-profiling-support-for-IVOPTS-v3.patch --]
[-- Type: text/x-patch, Size: 3026 bytes --]

From 24e5d3f6747c77d1437feab11ff1e3888779b4d4 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Tue, 17 May 2016 15:22:43 +0200
Subject: [PATCH 2/4] Add profiling support for IVOPTS

gcc/ChangeLog:

2016-05-17  Martin Liska  <mliska@suse.cz>

	* tree-ssa-loop-ivopts.c (get_computation_cost_at): Scale
	computed costs by frequency of BB they belong to.
---
 gcc/tree-ssa-loop-ivopts.c | 42 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index f48b2f6..8a82831 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -4995,18 +4995,21 @@ get_computation_cost_at (struct ivopts_data *data,
      (symbol/var1/const parts may be omitted).  If we are looking for an
      address, find the cost of addressing this.  */
   if (address_p)
-    return cost + get_address_cost (symbol_present, var_present,
-				    offset, ratio, cstepi,
-				    mem_mode,
-				    TYPE_ADDR_SPACE (TREE_TYPE (utype)),
-				    speed, stmt_is_after_inc, can_autoinc);
+    {
+      cost += get_address_cost (symbol_present, var_present,
+				offset, ratio, cstepi,
+				mem_mode,
+				TYPE_ADDR_SPACE (TREE_TYPE (utype)),
+				speed, stmt_is_after_inc, can_autoinc);
+      goto ret;
+    }
 
   /* Otherwise estimate the costs for computing the expression.  */
   if (!symbol_present && !var_present && !offset)
     {
       if (ratio != 1)
 	cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
-      return cost;
+      goto ret;
     }
 
   /* Symbol + offset should be compile-time computable so consider that they
@@ -5025,7 +5028,7 @@ get_computation_cost_at (struct ivopts_data *data,
   aratio = ratio > 0 ? ratio : -ratio;
   if (aratio != 1)
     cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
-  return cost;
+  goto ret;
 
 fallback:
   if (can_autoinc)
@@ -5041,8 +5044,31 @@ fallback:
     if (address_p)
       comp = build_simple_mem_ref (comp);
 
-    return comp_cost (computation_cost (comp, speed), 0);
+    cost = comp_cost (computation_cost (comp, speed), 0);
   }
+
+ret:
+  /* Scale (multiply) the computed cost (except scratch part that should be
+     hoisted out of a loop) by at->bb->frequency / header->frequency,
+     which makes expected cost more accurate.  */
+   int loop_freq = data->current_loop->header->frequency;
+   int bb_freq = at->bb->frequency;
+   if (loop_freq != 0)
+     {
+       gcc_assert (cost.scratch <= cost.cost);
+       int scaled_cost
+	 = cost.scratch + (cost.cost - cost.scratch) * bb_freq / loop_freq;
+
+       if (dump_file && (dump_flags & TDF_DETAILS))
+	 fprintf (dump_file, "Scaling iv_use based on cand %d "
+		  "by %2.2f: %d (scratch: %d) -> %d (%d/%d)\n",
+		  cand->id, 1.0f * bb_freq / loop_freq, cost.cost,
+		  cost.scratch, scaled_cost, bb_freq, loop_freq);
+
+       cost.cost = scaled_cost;
+     }
+
+   return cost;
 }
 
 /* Determines the cost of the computation by that USE is expressed
-- 
2.8.2


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/3] Encapsulate comp_cost within a class with methods.
  2016-05-19 10:23       ` Martin Liška
@ 2016-05-19 11:24         ` Bin.Cheng
  2016-05-26 21:02           ` Martin Liška
  0 siblings, 1 reply; 34+ messages in thread
From: Bin.Cheng @ 2016-05-19 11:24 UTC (permalink / raw)
  To: Martin Liška; +Cc: gcc-patches List

On Thu, May 19, 2016 at 11:23 AM, Martin Liška <mliska@suse.cz> wrote:
> On 05/16/2016 03:55 PM, Martin Liška wrote:
>> On 05/16/2016 12:13 PM, Bin.Cheng wrote:
>>> Hi Martin,
>>> Could you please rebase this patch and the profiling one against
>>> latest trunk?  The third patch was applied before these two now.
>>>
>>> Thanks,
>>> bin
>>
>> Hello.
>>
>> Sending the rebased version of the patch.
>>
>> Martin
>>
>
> Hello.
>
> As I've dramatically changed PATCH 2/3, a class encapsulation is no longer needed.
> Thus, I've reduced this patch to just the member functions/operators that are useful
> in my eyes. It's up to Bin whether to merge the patch.
Yes, I think we want to C++-ify such structures.

> +comp_cost
> +operator- (comp_cost cost1, comp_cost cost2)
> +{
> +  if (cost1.infinite_cost_p () || cost2.infinite_cost_p ())
> +    return comp_cost::get_infinite ();
> +
> +  cost1.cost -= cost2.cost;
> +  cost1.complexity -= cost2.complexity;
> +
> +  return cost1;
> +}
For subtraction, should we ever expect the second operand to be infinite?
Maybe add an assertion that it is not, in case anything goes wrong here.
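
A minimal sketch of what that could look like, under stated assumptions: the
struct below is a simplified stand-in for comp_cost, the INFTY value is
chosen only for illustration, and a plain assert stands in for gcc_assert.

```cpp
#include <cassert>

// Illustrative sentinel for an infinite cost (assumed value).
const int INFTY = 10000000;

// Simplified stand-in for comp_cost.
struct comp_cost
{
  int cost;
  int complexity;

  bool infinite_cost_p () const { return cost == INFTY; }
};

inline comp_cost
operator- (comp_cost cost1, comp_cost cost2)
{
  // An infinite cost stays infinite whatever is subtracted from it.
  if (cost1.infinite_cost_p ())
    return cost1;

  // Subtracting an infinite cost has no sensible meaning, so fail loudly
  // instead of silently returning infinity.
  assert (!cost2.infinite_cost_p ());

  cost1.cost -= cost2.cost;
  cost1.complexity -= cost2.complexity;

  return cost1;
}
```

The difference from the quoted version is that an infinite second operand
now trips the assertion rather than producing an infinite result.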

> +comp_cost
> +comp_cost::get_infinite ()
> +{
> +  return comp_cost (INFTY, INFTY);
> +}
> +
> +comp_cost
> +comp_cost::get_no_cost ()
> +{
> +  return comp_cost ();
> +}
I think we can keep the original global variables no_cost and
infinite_cost, and drop these two methods.
>
> @@ -5982,11 +6083,11 @@ iv_ca_recount_cost (struct ivopts_data *data, struct iv_ca *ivs)
>  {
>    comp_cost cost = ivs->cand_use_cost;
>
> -  cost.cost += ivs->cand_cost;
> +  cost+= ivs->cand_cost;
Missing space before "+=".

This is pure refactoring, so could you please make sure there is no fallout
by simply comparing SPEC code generation/disassembly?  I am asking
because cost computation is sensitive; last time we didn't catch a "*"
character typo in the dump info improvement patch.

Okay with the above changes, unless somebody else has comments on the C++
part (which I know very little about).

Thanks,
bin
>
> Martin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/3] Add profiling support for IVOPTS
  2016-05-19 10:28       ` Martin Liška
@ 2016-05-20 10:04         ` Bin.Cheng
  2016-05-24 10:19         ` Bin.Cheng
  1 sibling, 0 replies; 34+ messages in thread
From: Bin.Cheng @ 2016-05-20 10:04 UTC (permalink / raw)
  To: Martin Liška; +Cc: gcc-patches List

On Thu, May 19, 2016 at 11:28 AM, Martin Liška <mliska@suse.cz> wrote:
> On 05/17/2016 12:27 AM, Bin.Cheng wrote:
>>> As profile-guided optimization can provide very useful information
>>> about basic block frequencies within a loop, following patch set leverages
>>> that information. It speeds up a single benchmark from upcoming SPECv6
>>> suite by 20% (-O2 -fprofile-generate/-fprofile-use) and I think it can
>>> also improve others (currently measuring numbers for PGO).
>> Hi,
>> Is this 20% improvement from this patch, or does it include the
>> existing PGO's improvement?
>
> Hello.
>
> The measurement shows that current trunk (compared to the GCC 6 branch)
> has already significantly improved the benchmark with PGO.
> Currently, my patch improves the PGO result by ~5% with -O2, but our plan
> is to improve the static profile so that it can also benefit from the patch.
>
>>
>> For the patch:
>>> +
>>> +  /* Return true if the frequency has a valid value.  */
>>> +  bool has_frequency ();
>>> +
>>>    /* Return infinite comp_cost.  */
>>>    static comp_cost get_infinite ();
>>>
>>> @@ -249,6 +272,9 @@ private:
>>>       complexity field should be larger for more
>>>       complex expressions and addressing modes).  */
>>>    int m_scratch;  /* Scratch used during cost computation.  */
>>> +  sreal m_frequency;  /* Frequency of the basic block this comp_cost
>>> +     belongs to.  */
>>> +  sreal m_cost_scaled;  /* Scaled runtime cost.  */
>> IMHO we shouldn't embed frequency in comp_cost, nor record the scaled
>> cost in it.  I would suggest we compute the cost and amortize it
>> over frequency in get_computation_cost_at before storing it into
>> comp_cost.  That is, once the cost is computed/stored in comp_cost, it is
>> already scaled by frequency.  One argument is that frequency info is only
>> valid for the use's statement/basic block; it doesn't have a clear
>> meaning in the comp_cost structure.  Outside of
>> get_computation_cost_at, I find it hard to understand/remember
>> what comp_cost.m_frequency means and where it came from.
>> There are other reasons in the comments below.
>>>
>>>
>>>  comp_cost&
>>> @@ -257,6 +283,8 @@ comp_cost::operator= (const comp_cost& other)
>>>    m_cost = other.m_cost;
>>>    m_complexity = other.m_complexity;
>>>    m_scratch = other.m_scratch;
>>> +  m_frequency = other.m_frequency;
>>> +  m_cost_scaled = other.m_cost_scaled;
>>>
>>>    return *this;
>>>  }
>>> @@ -275,6 +303,7 @@ operator+ (comp_cost cost1, comp_cost cost2)
>>>
>>>    cost1.m_cost += cost2.m_cost;
>>>    cost1.m_complexity += cost2.m_complexity;
>>> +  cost1.m_cost_scaled += cost2.m_cost_scaled;
>>>
>>>    return cost1;
>>>  }
>>> @@ -290,6 +319,8 @@ comp_cost
>>>  comp_cost::operator+= (HOST_WIDE_INT c)
>> This operator and the ones below need to check for infinite cost first
>> and return immediately.
>>>  {
>>>    this->m_cost += c;
>>> +  if (has_frequency ())
>>> +    this->m_cost_scaled += scale_cost (c);
>>>
>>>    return *this;
>>>  }
>>> @@ -5047,18 +5128,21 @@ get_computation_cost_at (struct ivopts_data *data,
>>>       (symbol/var1/const parts may be omitted).  If we are looking for an
>>>       address, find the cost of addressing this.  */
>>>    if (address_p)
>>> -    return cost + get_address_cost (symbol_present, var_present,
>>> -    offset, ratio, cstepi,
>>> -    mem_mode,
>>> -    TYPE_ADDR_SPACE (TREE_TYPE (utype)),
>>> -    speed, stmt_is_after_inc, can_autoinc);
>>> +    {
>>> +      cost += get_address_cost (symbol_present, var_present,
>>> + offset, ratio, cstepi,
>>> + mem_mode,
>>> + TYPE_ADDR_SPACE (TREE_TYPE (utype)),
>>> + speed, stmt_is_after_inc, can_autoinc);
>>> +      goto ret;
>>> +    }
>>>
>>>    /* Otherwise estimate the costs for computing the expression.  */
>>>    if (!symbol_present && !var_present && !offset)
>>>      {
>>>        if (ratio != 1)
>>>   cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
>>> -      return cost;
>>> +      goto ret;
>>>      }
>>>
>>>    /* Symbol + offset should be compile-time computable so consider that they
>>> @@ -5077,7 +5161,8 @@ get_computation_cost_at (struct ivopts_data *data,
>>>    aratio = ratio > 0 ? ratio : -ratio;
>>>    if (aratio != 1)
>>>      cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
>>> -  return cost;
>>> +
>>> +  goto ret;
>>>
>>>  fallback:
>>>    if (can_autoinc)
>>> @@ -5093,8 +5178,13 @@ fallback:
>>>      if (address_p)
>>>        comp = build_simple_mem_ref (comp);
>>>
>>> -    return comp_cost (computation_cost (comp, speed), 0);
>>> +    cost = comp_cost (computation_cost (comp, speed), 0);
>>>    }
>>> +
>>> +ret:
>>> +  cost.calculate_scaled_cost (at->bb->frequency,
>>> +      data->current_loop->header->frequency);
>> Here cost consists of two parts.  One is for the loop-invariant
>> computation: we amortize it against avg_loop_niter and record register
>> pressure (either via invariant variables or invariant expressions) for
>> it;  the other is the loop-variant part.  We should not scale the first
>> part by frequency, since we have already assumed it would
>> be hoisted out of the loop.  No matter where the use is, a hoisted loop
>> invariant has the same frequency as the loop header.  This is the second
>> reason I want to factor frequency out of comp_cost.  It's easier to
>> scale with frequency only where it's necessary.
>>
>>> +  return cost;
>>>  }
>>>
>>>  /* Determines the cost of the computation by that USE is expressed
>>> @@ -5922,16 +6012,19 @@ determine_group_iv_costs (struct ivopts_data *data)
>>>    group = data->vgroups[i];
>>>
>>>    fprintf (dump_file, "Group %d:\n", i);
>>> -  fprintf (dump_file, "  cand\tcost\tcompl.\tinv.ex.\tdepends on\n");
>>> +  fprintf (dump_file, "  cand\tcost\tscaled\tfreq.\tcompl.\tinv.ex."
>>> +   "\tdepends on\n");
>>>    for (j = 0; j < group->n_map_members; j++)
>>>      {
>>>        if (!group->cost_map[j].cand
>>>    || group->cost_map[j].cost.infinite_cost_p ())
>>>   continue;
>>>
>>> -      fprintf (dump_file, "  %d\t%d\t%d\t",
>>> +      fprintf (dump_file, "  %d\t%d\t%2.2f\t%2.2f\t%d\t",
>>>         group->cost_map[j].cand->id,
>>>         group->cost_map[j].cost.get_cost (),
>>> +       group->cost_map[j].cost.get_cost_scaled (),
>>> +       group->cost_map[j].cost.get_frequency (),
>>>         group->cost_map[j].cost.get_complexity ());
>>>        if (group->cost_map[j].inv_expr != NULL)
>>>   fprintf (dump_file, "%d\t",
>>> @@ -5948,6 +6041,19 @@ determine_group_iv_costs (struct ivopts_data *data)
>>>   }
>>>        fprintf (dump_file, "\n");
>>>      }
>>> +
>>> +  for (i = 0; i < data->vgroups.length (); i++)
>>> +    {
>>> +      group = data->vgroups[i];
>>> +      for (j = 0; j < group->n_map_members; j++)
>>> + {
>>> +  if (!group->cost_map[j].cand
>>> +      || group->cost_map[j].cost.infinite_cost_p ())
>>> +    continue;
>>> +
>>> +  group->cost_map[j].cost.propagate_scaled_cost ();
>>> + }
>>> +    }
>> This is wrong.  m_frequency and m_cost_scaled are initialized to
>> sreal(0) by default, and are never changed later for conditional
>> iv_use.  As a matter of fact, costs computed for all conditional
>> iv_uses are wrong (value is 0).  This makes the observed improvement
>> not that promising.  Considering code generation is very sensitive to
>> cost computation, it may just hit some special cases.  Either way we
>> need more work/investigation on the impact of this patch.
>>
>> Again, I would suggest we factor frequency out of comp_cost and
>> only scale the cost in place when we compute cost for each use.  Then
>> we can measure what the impact on code generation is.
>>
>> Thanks,
>> bin
>>
>
> All remarks were applied in the third version of the patch. Together with the previous
> patch, it survives bootstrap and regression tests on x86_64-linux-gnu.
> I'm going to re-test the patch on SPEC benchmarks.
Thanks for working on this.  I will run programs to see how it affects
code generation.

Thanks,
bin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/3] Add profiling support for IVOPTS
  2016-05-19 10:28       ` Martin Liška
  2016-05-20 10:04         ` Bin.Cheng
@ 2016-05-24 10:19         ` Bin.Cheng
  2016-05-24 10:33           ` Bin.Cheng
                             ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread
From: Bin.Cheng @ 2016-05-24 10:19 UTC (permalink / raw)
  To: Martin Liška; +Cc: gcc-patches List

On Thu, May 19, 2016 at 11:28 AM, Martin Liška <mliska@suse.cz> wrote:
> On 05/17/2016 12:27 AM, Bin.Cheng wrote:
>>> As profile-guided optimization can provide very useful information
>>> about basic block frequencies within a loop, following patch set leverages
>>> that information. It speeds up a single benchmark from upcoming SPECv6
>>> suite by 20% (-O2 -fprofile-generate/-fprofile-use) and I think it can
>>> also improve others (currently measuring numbers for PGO).
>> Hi,
>> Is this 20% improvement from this patch, or does it include the
>> existing PGO's improvement?
>
> Hello.
>
> It shows that current trunk (compared to GCC 6 branch)
> has significantly improved the benchmark with PGO.
> Currently, my patch improves PGO by ~5% w/ -O2, but our plan is to
> improve static profile that would utilize the patch.
>
>>
>> For the patch:
>>> +
>>> +  /* Return true if the frequency has a valid value.  */
>>> +  bool has_frequency ();
>>> +
>>>    /* Return infinite comp_cost.  */
>>>    static comp_cost get_infinite ();
>>>
>>> @@ -249,6 +272,9 @@ private:
>>>       complexity field should be larger for more
>>>       complex expressions and addressing modes).  */
>>>    int m_scratch;  /* Scratch used during cost computation.  */
>>> +  sreal m_frequency;  /* Frequency of the basic block this comp_cost
>>> +     belongs to.  */
>>> +  sreal m_cost_scaled;  /* Scalled runtime cost.  */
>> IMHO we shouldn't embed frequency in comp_cost, nor record scaled
>> cost in it.  I would suggest we compute cost and amortize the cost
>> over frequency in get_computation_cost_at before storing it into
>> comp_cost.  That is, once cost is computed/stored in comp_cost, it is
>> already scaled with frequency.  One argument is that frequency info is only
>> valid for the use's statement/basic_block; it really doesn't have a clear
>> meaning in the comp_cost structure.  Outside of function
>> get_computation_cost_at, I found it hard to understand/remember
>> what comp_cost.m_frequency means and where it came from.
>> There are other reasons embedded in the comments below.
>>>
>>>
>>>  comp_cost&
>>> @@ -257,6 +283,8 @@ comp_cost::operator= (const comp_cost& other)
>>>    m_cost = other.m_cost;
>>>    m_complexity = other.m_complexity;
>>>    m_scratch = other.m_scratch;
>>> +  m_frequency = other.m_frequency;
>>> +  m_cost_scaled = other.m_cost_scaled;
>>>
>>>    return *this;
>>>  }
>>> @@ -275,6 +303,7 @@ operator+ (comp_cost cost1, comp_cost cost2)
>>>
>>>    cost1.m_cost += cost2.m_cost;
>>>    cost1.m_complexity += cost2.m_complexity;
>>> +  cost1.m_cost_scaled += cost2.m_cost_scaled;
>>>
>>>    return cost1;
>>>  }
>>> @@ -290,6 +319,8 @@ comp_cost
>>>  comp_cost::operator+= (HOST_WIDE_INT c)
>> This operator and the ones below need to check for infinite cost first
>> and return immediately.
>>>  {
>>>    this->m_cost += c;
>>> +  if (has_frequency ())
>>> +    this->m_cost_scaled += scale_cost (c);
>>>
>>>    return *this;
>>>  }
>>> @@ -5047,18 +5128,21 @@ get_computation_cost_at (struct ivopts_data *data,
>>>       (symbol/var1/const parts may be omitted).  If we are looking for an
>>>       address, find the cost of addressing this.  */
>>>    if (address_p)
>>> -    return cost + get_address_cost (symbol_present, var_present,
>>> -    offset, ratio, cstepi,
>>> -    mem_mode,
>>> -    TYPE_ADDR_SPACE (TREE_TYPE (utype)),
>>> -    speed, stmt_is_after_inc, can_autoinc);
>>> +    {
>>> +      cost += get_address_cost (symbol_present, var_present,
>>> + offset, ratio, cstepi,
>>> + mem_mode,
>>> + TYPE_ADDR_SPACE (TREE_TYPE (utype)),
>>> + speed, stmt_is_after_inc, can_autoinc);
>>> +      goto ret;
>>> +    }
>>>
>>>    /* Otherwise estimate the costs for computing the expression.  */
>>>    if (!symbol_present && !var_present && !offset)
>>>      {
>>>        if (ratio != 1)
>>>   cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
>>> -      return cost;
>>> +      goto ret;
>>>      }
>>>
>>>    /* Symbol + offset should be compile-time computable so consider that they
>>> @@ -5077,7 +5161,8 @@ get_computation_cost_at (struct ivopts_data *data,
>>>    aratio = ratio > 0 ? ratio : -ratio;
>>>    if (aratio != 1)
>>>      cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
>>> -  return cost;
>>> +
>>> +  goto ret;
>>>
>>>  fallback:
>>>    if (can_autoinc)
>>> @@ -5093,8 +5178,13 @@ fallback:
>>>      if (address_p)
>>>        comp = build_simple_mem_ref (comp);
>>>
>>> -    return comp_cost (computation_cost (comp, speed), 0);
>>> +    cost = comp_cost (computation_cost (comp, speed), 0);
>>>    }
>>> +
>>> +ret:
>>> +  cost.calculate_scaled_cost (at->bb->frequency,
>>> +      data->current_loop->header->frequency);
>> Here cost consists of two parts.  One is for loop invariant
>> computation, we amortize it against avg_loop_niter and record register
>> pressure (either via invariant variables or invariant expressions) for
>> it;  the other is loop variant part.  For the first part, we should
>> not scale it using frequency, since we have already assumed it would
>> be hoisted out of loop.  No matter where the use is, hoisted loop
>> invariant has the same frequency as loop header.  This is the second
>> reason I want to factor frequency out of comp_cost.  It's easier to
>> scale with frequency only when it's necessary.
>>
>>> +  return cost;
>>>  }
>>>
>>>  /* Determines the cost of the computation by that USE is expressed
>>> @@ -5922,16 +6012,19 @@ determine_group_iv_costs (struct ivopts_data *data)
>>>    group = data->vgroups[i];
>>>
>>>    fprintf (dump_file, "Group %d:\n", i);
>>> -  fprintf (dump_file, "  cand\tcost\tcompl.\tinv.ex.\tdepends on\n");
>>> +  fprintf (dump_file, "  cand\tcost\tscaled\tfreq.\tcompl.\tinv.ex."
>>> +   "\tdepends on\n");
>>>    for (j = 0; j < group->n_map_members; j++)
>>>      {
>>>        if (!group->cost_map[j].cand
>>>    || group->cost_map[j].cost.infinite_cost_p ())
>>>   continue;
>>>
>>> -      fprintf (dump_file, "  %d\t%d\t%d\t",
>>> +      fprintf (dump_file, "  %d\t%d\t%2.2f\t%2.2f\t%d\t",
>>>         group->cost_map[j].cand->id,
>>>         group->cost_map[j].cost.get_cost (),
>>> +       group->cost_map[j].cost.get_cost_scaled (),
>>> +       group->cost_map[j].cost.get_frequency (),
>>>         group->cost_map[j].cost.get_complexity ());
>>>        if (group->cost_map[j].inv_expr != NULL)
>>>   fprintf (dump_file, "%d\t",
>>> @@ -5948,6 +6041,19 @@ determine_group_iv_costs (struct ivopts_data *data)
>>>   }
>>>        fprintf (dump_file, "\n");
>>>      }
>>> +
>>> +  for (i = 0; i < data->vgroups.length (); i++)
>>> +    {
>>> +      group = data->vgroups[i];
>>> +      for (j = 0; j < group->n_map_members; j++)
>>> + {
>>> +  if (!group->cost_map[j].cand
>>> +      || group->cost_map[j].cost.infinite_cost_p ())
>>> +    continue;
>>> +
>>> +  group->cost_map[j].cost.propagate_scaled_cost ();
>>> + }
>>> +    }
>> This is wrong.  m_frequency and m_cost_scaled are initialized to
>> sreal(0) by default, and are never changed later for conditional
>> iv_use.  As a matter of fact, costs computed for all conditional
>> iv_uses are wrong (value is 0).  This makes the observed improvement
>> not that promising.  Considering code generation is very sensitive to
>> cost computation, it may just hit some special cases.  Either way we
>> need more work/investigation on the impact of this patch.
>>
>> Again, I would suggest we factor frequency out of comp_cost and
>> only scale the cost in place when we compute cost for each use.  Then
>> we can measure what the impact on code generation is.
>>
>> Thanks,
>> bin
>>
>
> All remarks were applied in the third version of the patch. Together with the previous
> patch, it survives bootstrap and regression tests on x86_64-linux-gnu.
> I'm going to re-test the patch on SPEC benchmarks.
> +
> +ret:
> +  /* Scale (multiply) the computed cost (except scratch part that should be
> +     hoisted out of a loop) by at->bb->frequency / header->frequency,
> +     which makes expected cost more accurate.  */
> +   int loop_freq = data->current_loop->header->frequency;
> +   int bb_freq = at->bb->frequency;
> +   if (loop_freq != 0)
> +     {
> +       gcc_assert (cost.scratch <= cost.cost);
> +       int scaled_cost
> +     = cost.scratch + (cost.cost - cost.scratch) * bb_freq / loop_freq;
> +
> +       if (dump_file && (dump_flags & TDF_DETAILS))
> +     fprintf (dump_file, "Scaling iv_use based on cand %d "
> +          "by %2.2f: %d (scratch: %d) -> %d (%d/%d)\n",
> +          cand->id, 1.0f * bb_freq / loop_freq, cost.cost,
> +          cost.scratch, scaled_cost, bb_freq, loop_freq);
> +
> +       cost.cost = scaled_cost;
> +     }
> +
> +   return cost;
Hi,
Could you please factor this out into a function and remove the goto
statements?  Okay with this change if there is no fallout in the
benchmarks you run.
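
To make the request concrete, here is a minimal sketch of what such a helper could look like.  It is not the committed GCC code: the comp_cost struct is a simplified stand-in, the helper name scale_cost_by_frequency is hypothetical, and frequencies are passed in as plain ints rather than read from basic blocks.  Only the loop-variant part of the cost is scaled; the scratch part, assumed to be hoisted out of the loop, is left alone.

```cpp
#include <cassert>

// Simplified stand-in for GCC's comp_cost; only the fields used here.
struct comp_cost
{
  int cost;     // Total runtime cost, including the invariant part.
  int scratch;  // Part assumed to be hoisted out of the loop.
};

// Hypothetical helper: scale the loop-variant part of COST by
// BB_FREQ / LOOP_FREQ and keep the hoisted (scratch) part unscaled.
// With LOOP_FREQ == 0 there is no usable profile, so COST is returned
// unchanged.
comp_cost
scale_cost_by_frequency (comp_cost cost, int bb_freq, int loop_freq)
{
  if (loop_freq == 0)
    return cost;

  assert (cost.scratch <= cost.cost);
  cost.cost
    = cost.scratch + (cost.cost - cost.scratch) * bb_freq / loop_freq;
  return cost;
}
```

With a helper of this shape, each exit path in get_computation_cost_at could return the scaled value directly, so the ret label would no longer be needed.  For example, a cost of 10 with scratch 2, used in a block with frequency 2500 inside a loop whose header has frequency 10000, becomes 2 + 8 * 2500 / 10000 = 4.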

Thanks,
bin
>
> Martin
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/3] Add profiling support for IVOPTS
  2016-05-24 10:19         ` Bin.Cheng
@ 2016-05-24 10:33           ` Bin.Cheng
  2016-05-24 11:01           ` Bin.Cheng
  2016-05-30 19:51           ` Martin Liška
  2 siblings, 0 replies; 34+ messages in thread
From: Bin.Cheng @ 2016-05-24 10:33 UTC (permalink / raw)
  To: Martin Liška; +Cc: gcc-patches List

On Thu, May 19, 2016 at 11:28 AM, Martin Liška <mliska@suse.cz> wrote:
> On 05/17/2016 12:27 AM, Bin.Cheng wrote:
>>> As profile-guided optimization can provide very useful information
>>> about basic block frequencies within a loop, following patch set leverages
>>> that information. It speeds up a single benchmark from upcoming SPECv6
>>> suite by 20% (-O2 -fprofile-generate/-fprofile-use) and I think it can
>>> also improve others (currently measuring numbers for PGO).
>> Hi,
>> Is this 20% improvement from this patch, or does it include the
>> existing PGO's improvement?
>
> Hello.
>
> It shows that current trunk (compared to GCC 6 branch)
> has significantly improved the benchmark with PGO.
> Currently, my patch improves PGO by ~5% w/ -O2, but our plan is to
> improve static profile that would utilize the patch.
>
>>
>> For the patch:
>>> +
>>> +  /* Return true if the frequency has a valid value.  */
>>> +  bool has_frequency ();
>>> +
>>>    /* Return infinite comp_cost.  */
>>>    static comp_cost get_infinite ();
>>>
>>> @@ -249,6 +272,9 @@ private:
>>>       complexity field should be larger for more
>>>       complex expressions and addressing modes).  */
>>>    int m_scratch;  /* Scratch used during cost computation.  */
>>> +  sreal m_frequency;  /* Frequency of the basic block this comp_cost
>>> +     belongs to.  */
>>> +  sreal m_cost_scaled;  /* Scalled runtime cost.  */
>> IMHO we shouldn't embed frequency in comp_cost, nor record scaled
>> cost in it.  I would suggest we compute cost and amortize the cost
>> over frequency in get_computation_cost_at before storing it into
>> comp_cost.  That is, once cost is computed/stored in comp_cost, it is
>> already scaled with frequency.  One argument is that frequency info is only
>> valid for the use's statement/basic_block; it really doesn't have a clear
>> meaning in the comp_cost structure.  Outside of function
>> get_computation_cost_at, I found it hard to understand/remember
>> what comp_cost.m_frequency means and where it came from.
>> There are other reasons embedded in the comments below.
>>>
>>>
>>>  comp_cost&
>>> @@ -257,6 +283,8 @@ comp_cost::operator= (const comp_cost& other)
>>>    m_cost = other.m_cost;
>>>    m_complexity = other.m_complexity;
>>>    m_scratch = other.m_scratch;
>>> +  m_frequency = other.m_frequency;
>>> +  m_cost_scaled = other.m_cost_scaled;
>>>
>>>    return *this;
>>>  }
>>> @@ -275,6 +303,7 @@ operator+ (comp_cost cost1, comp_cost cost2)
>>>
>>>    cost1.m_cost += cost2.m_cost;
>>>    cost1.m_complexity += cost2.m_complexity;
>>> +  cost1.m_cost_scaled += cost2.m_cost_scaled;
>>>
>>>    return cost1;
>>>  }
>>> @@ -290,6 +319,8 @@ comp_cost
>>>  comp_cost::operator+= (HOST_WIDE_INT c)
>> This operator and the ones below need to check for infinite cost first
>> and return immediately.
>>>  {
>>>    this->m_cost += c;
>>> +  if (has_frequency ())
>>> +    this->m_cost_scaled += scale_cost (c);
>>>
>>>    return *this;
>>>  }
>>> @@ -5047,18 +5128,21 @@ get_computation_cost_at (struct ivopts_data *data,
>>>       (symbol/var1/const parts may be omitted).  If we are looking for an
>>>       address, find the cost of addressing this.  */
>>>    if (address_p)
>>> -    return cost + get_address_cost (symbol_present, var_present,
>>> -    offset, ratio, cstepi,
>>> -    mem_mode,
>>> -    TYPE_ADDR_SPACE (TREE_TYPE (utype)),
>>> -    speed, stmt_is_after_inc, can_autoinc);
>>> +    {
>>> +      cost += get_address_cost (symbol_present, var_present,
>>> + offset, ratio, cstepi,
>>> + mem_mode,
>>> + TYPE_ADDR_SPACE (TREE_TYPE (utype)),
>>> + speed, stmt_is_after_inc, can_autoinc);
>>> +      goto ret;
>>> +    }
>>>
>>>    /* Otherwise estimate the costs for computing the expression.  */
>>>    if (!symbol_present && !var_present && !offset)
>>>      {
>>>        if (ratio != 1)
>>>   cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
>>> -      return cost;
>>> +      goto ret;
>>>      }
>>>
>>>    /* Symbol + offset should be compile-time computable so consider that they
>>> @@ -5077,7 +5161,8 @@ get_computation_cost_at (struct ivopts_data *data,
>>>    aratio = ratio > 0 ? ratio : -ratio;
>>>    if (aratio != 1)
>>>      cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
>>> -  return cost;
>>> +
>>> +  goto ret;
>>>
>>>  fallback:
>>>    if (can_autoinc)
>>> @@ -5093,8 +5178,13 @@ fallback:
>>>      if (address_p)
>>>        comp = build_simple_mem_ref (comp);
>>>
>>> -    return comp_cost (computation_cost (comp, speed), 0);
>>> +    cost = comp_cost (computation_cost (comp, speed), 0);
>>>    }
>>> +
>>> +ret:
>>> +  cost.calculate_scaled_cost (at->bb->frequency,
>>> +      data->current_loop->header->frequency);
>> Here cost consists of two parts.  One is for loop invariant
>> computation, we amortize it against avg_loop_niter and record register
>> pressure (either via invariant variables or invariant expressions) for
>> it;  the other is loop variant part.  For the first part, we should
>> not scale it using frequency, since we have already assumed it would
>> be hoisted out of loop.  No matter where the use is, hoisted loop
>> invariant has the same frequency as loop header.  This is the second
>> reason I want to factor frequency out of comp_cost.  It's easier to
>> scale with frequency only when it's necessary.
>>
>>> +  return cost;
>>>  }
>>>
>>>  /* Determines the cost of the computation by that USE is expressed
>>> @@ -5922,16 +6012,19 @@ determine_group_iv_costs (struct ivopts_data *data)
>>>    group = data->vgroups[i];
>>>
>>>    fprintf (dump_file, "Group %d:\n", i);
>>> -  fprintf (dump_file, "  cand\tcost\tcompl.\tinv.ex.\tdepends on\n");
>>> +  fprintf (dump_file, "  cand\tcost\tscaled\tfreq.\tcompl.\tinv.ex."
>>> +   "\tdepends on\n");
>>>    for (j = 0; j < group->n_map_members; j++)
>>>      {
>>>        if (!group->cost_map[j].cand
>>>    || group->cost_map[j].cost.infinite_cost_p ())
>>>   continue;
>>>
>>> -      fprintf (dump_file, "  %d\t%d\t%d\t",
>>> +      fprintf (dump_file, "  %d\t%d\t%2.2f\t%2.2f\t%d\t",
>>>         group->cost_map[j].cand->id,
>>>         group->cost_map[j].cost.get_cost (),
>>> +       group->cost_map[j].cost.get_cost_scaled (),
>>> +       group->cost_map[j].cost.get_frequency (),
>>>         group->cost_map[j].cost.get_complexity ());
>>>        if (group->cost_map[j].inv_expr != NULL)
>>>   fprintf (dump_file, "%d\t",
>>> @@ -5948,6 +6041,19 @@ determine_group_iv_costs (struct ivopts_data *data)
>>>   }
>>>        fprintf (dump_file, "\n");
>>>      }
>>> +
>>> +  for (i = 0; i < data->vgroups.length (); i++)
>>> +    {
>>> +      group = data->vgroups[i];
>>> +      for (j = 0; j < group->n_map_members; j++)
>>> + {
>>> +  if (!group->cost_map[j].cand
>>> +      || group->cost_map[j].cost.infinite_cost_p ())
>>> +    continue;
>>> +
>>> +  group->cost_map[j].cost.propagate_scaled_cost ();
>>> + }
>>> +    }
>> This is wrong.  m_frequency and m_cost_scaled are initialized to
>> sreal(0) by default, and are never changed later for conditional
>> iv_use.  As a matter of fact, costs computed for all conditional
>> iv_uses are wrong (value is 0).  This makes the observed improvement
>> not that promising.  Considering code generation is very sensitive to
>> cost computation, it may just hit some special cases.  Either way we
>> need more work/investigation on the impact of this patch.
>>
>> Again, I would suggest we factor frequency out of comp_cost and
>> only scale the cost in place when we compute cost for each use.  Then
>> we can measure what the impact on code generation is.
>>
>> Thanks,
>> bin
>>
>
> All remarks were applied in the third version of the patch. Together with the previous
> patch, it survives bootstrap and regression tests on x86_64-linux-gnu.
> I'm going to re-test the patch on SPEC benchmarks.
> +
> +ret:
> +  /* Scale (multiply) the computed cost (except scratch part that should be
> +     hoisted out of a loop) by at->bb->frequency / header->frequency,
> +     which makes expected cost more accurate.  */
> +   int loop_freq = data->current_loop->header->frequency;
> +   int bb_freq = at->bb->frequency;
> +   if (loop_freq != 0)
> +     {
> +       gcc_assert (cost.scratch <= cost.cost);
> +       int scaled_cost
> +     = cost.scratch + (cost.cost - cost.scratch) * bb_freq / loop_freq;
> +
> +       if (dump_file && (dump_flags & TDF_DETAILS))
> +     fprintf (dump_file, "Scaling iv_use based on cand %d "
> +          "by %2.2f: %d (scratch: %d) -> %d (%d/%d)\n",
> +          cand->id, 1.0f * bb_freq / loop_freq, cost.cost,
> +          cost.scratch, scaled_cost, bb_freq, loop_freq);
> +
> +       cost.cost = scaled_cost;
> +     }
> +
> +   return cost;
Hi,
Could you please factor this out into a function and remove the goto
statements?  Okay with this change if there is no fallout in the
benchmarks you run.

Thanks,
bin
>
> Martin
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/3] Add profiling support for IVOPTS
  2016-05-24 10:19         ` Bin.Cheng
  2016-05-24 10:33           ` Bin.Cheng
@ 2016-05-24 11:01           ` Bin.Cheng
  2016-05-30 19:51           ` Martin Liška
  2 siblings, 0 replies; 34+ messages in thread
From: Bin.Cheng @ 2016-05-24 11:01 UTC (permalink / raw)
  To: Martin Liška; +Cc: gcc-patches List

On Thu, May 19, 2016 at 11:28 AM, Martin Liška <mliska@suse.cz> wrote:
> On 05/17/2016 12:27 AM, Bin.Cheng wrote:
>>> As profile-guided optimization can provide very useful information
>>> about basic block frequencies within a loop, following patch set leverages
>>> that information. It speeds up a single benchmark from upcoming SPECv6
>>> suite by 20% (-O2 -fprofile-generate/-fprofile-use) and I think it can
>>> also improve others (currently measuring numbers for PGO).
>> Hi,
>> Is this 20% improvement from this patch, or does it include the
>> existing PGO's improvement?
>
> Hello.
>
> It shows that current trunk (compared to GCC 6 branch)
> has significantly improved the benchmark with PGO.
> Currently, my patch improves PGO by ~5% w/ -O2, but our plan is to
> improve static profile that would utilize the patch.
>
>>
>> For the patch:
>>> +
>>> +  /* Return true if the frequency has a valid value.  */
>>> +  bool has_frequency ();
>>> +
>>>    /* Return infinite comp_cost.  */
>>>    static comp_cost get_infinite ();
>>>
>>> @@ -249,6 +272,9 @@ private:
>>>       complexity field should be larger for more
>>>       complex expressions and addressing modes).  */
>>>    int m_scratch;  /* Scratch used during cost computation.  */
>>> +  sreal m_frequency;  /* Frequency of the basic block this comp_cost
>>> +     belongs to.  */
>>> +  sreal m_cost_scaled;  /* Scalled runtime cost.  */
>> IMHO we shouldn't embed frequency in comp_cost, nor record scaled
>> cost in it.  I would suggest we compute cost and amortize the cost
>> over frequency in get_computation_cost_at before storing it into
>> comp_cost.  That is, once cost is computed/stored in comp_cost, it is
>> already scaled with frequency.  One argument is that frequency info is only
>> valid for the use's statement/basic_block; it really doesn't have a clear
>> meaning in the comp_cost structure.  Outside of function
>> get_computation_cost_at, I found it hard to understand/remember
>> what comp_cost.m_frequency means and where it came from.
>> There are other reasons embedded in the comments below.
>>>
>>>
>>>  comp_cost&
>>> @@ -257,6 +283,8 @@ comp_cost::operator= (const comp_cost& other)
>>>    m_cost = other.m_cost;
>>>    m_complexity = other.m_complexity;
>>>    m_scratch = other.m_scratch;
>>> +  m_frequency = other.m_frequency;
>>> +  m_cost_scaled = other.m_cost_scaled;
>>>
>>>    return *this;
>>>  }
>>> @@ -275,6 +303,7 @@ operator+ (comp_cost cost1, comp_cost cost2)
>>>
>>>    cost1.m_cost += cost2.m_cost;
>>>    cost1.m_complexity += cost2.m_complexity;
>>> +  cost1.m_cost_scaled += cost2.m_cost_scaled;
>>>
>>>    return cost1;
>>>  }
>>> @@ -290,6 +319,8 @@ comp_cost
>>>  comp_cost::operator+= (HOST_WIDE_INT c)
>> This operator and the ones below need to check for infinite cost first
>> and return immediately.
>>>  {
>>>    this->m_cost += c;
>>> +  if (has_frequency ())
>>> +    this->m_cost_scaled += scale_cost (c);
>>>
>>>    return *this;
>>>  }
>>> @@ -5047,18 +5128,21 @@ get_computation_cost_at (struct ivopts_data *data,
>>>       (symbol/var1/const parts may be omitted).  If we are looking for an
>>>       address, find the cost of addressing this.  */
>>>    if (address_p)
>>> -    return cost + get_address_cost (symbol_present, var_present,
>>> -    offset, ratio, cstepi,
>>> -    mem_mode,
>>> -    TYPE_ADDR_SPACE (TREE_TYPE (utype)),
>>> -    speed, stmt_is_after_inc, can_autoinc);
>>> +    {
>>> +      cost += get_address_cost (symbol_present, var_present,
>>> + offset, ratio, cstepi,
>>> + mem_mode,
>>> + TYPE_ADDR_SPACE (TREE_TYPE (utype)),
>>> + speed, stmt_is_after_inc, can_autoinc);
>>> +      goto ret;
>>> +    }
>>>
>>>    /* Otherwise estimate the costs for computing the expression.  */
>>>    if (!symbol_present && !var_present && !offset)
>>>      {
>>>        if (ratio != 1)
>>>   cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
>>> -      return cost;
>>> +      goto ret;
>>>      }
>>>
>>>    /* Symbol + offset should be compile-time computable so consider that they
>>> @@ -5077,7 +5161,8 @@ get_computation_cost_at (struct ivopts_data *data,
>>>    aratio = ratio > 0 ? ratio : -ratio;
>>>    if (aratio != 1)
>>>      cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
>>> -  return cost;
>>> +
>>> +  goto ret;
>>>
>>>  fallback:
>>>    if (can_autoinc)
>>> @@ -5093,8 +5178,13 @@ fallback:
>>>      if (address_p)
>>>        comp = build_simple_mem_ref (comp);
>>>
>>> -    return comp_cost (computation_cost (comp, speed), 0);
>>> +    cost = comp_cost (computation_cost (comp, speed), 0);
>>>    }
>>> +
>>> +ret:
>>> +  cost.calculate_scaled_cost (at->bb->frequency,
>>> +      data->current_loop->header->frequency);
>> Here cost consists of two parts.  One is for loop invariant
>> computation, we amortize it against avg_loop_niter and record register
>> pressure (either via invariant variables or invariant expressions) for
>> it;  the other is loop variant part.  For the first part, we should
>> not scale it using frequency, since we have already assumed it would
>> be hoisted out of loop.  No matter where the use is, hoisted loop
>> invariant has the same frequency as loop header.  This is the second
>> reason I want to factor frequency out of comp_cost.  It's easier to
>> scale with frequency only when it's necessary.
>>
>>> +  return cost;
>>>  }
>>>
>>>  /* Determines the cost of the computation by that USE is expressed
>>> @@ -5922,16 +6012,19 @@ determine_group_iv_costs (struct ivopts_data *data)
>>>    group = data->vgroups[i];
>>>
>>>    fprintf (dump_file, "Group %d:\n", i);
>>> -  fprintf (dump_file, "  cand\tcost\tcompl.\tinv.ex.\tdepends on\n");
>>> +  fprintf (dump_file, "  cand\tcost\tscaled\tfreq.\tcompl.\tinv.ex."
>>> +   "\tdepends on\n");
>>>    for (j = 0; j < group->n_map_members; j++)
>>>      {
>>>        if (!group->cost_map[j].cand
>>>    || group->cost_map[j].cost.infinite_cost_p ())
>>>   continue;
>>>
>>> -      fprintf (dump_file, "  %d\t%d\t%d\t",
>>> +      fprintf (dump_file, "  %d\t%d\t%2.2f\t%2.2f\t%d\t",
>>>         group->cost_map[j].cand->id,
>>>         group->cost_map[j].cost.get_cost (),
>>> +       group->cost_map[j].cost.get_cost_scaled (),
>>> +       group->cost_map[j].cost.get_frequency (),
>>>         group->cost_map[j].cost.get_complexity ());
>>>        if (group->cost_map[j].inv_expr != NULL)
>>>   fprintf (dump_file, "%d\t",
>>> @@ -5948,6 +6041,19 @@ determine_group_iv_costs (struct ivopts_data *data)
>>>   }
>>>        fprintf (dump_file, "\n");
>>>      }
>>> +
>>> +  for (i = 0; i < data->vgroups.length (); i++)
>>> +    {
>>> +      group = data->vgroups[i];
>>> +      for (j = 0; j < group->n_map_members; j++)
>>> + {
>>> +  if (!group->cost_map[j].cand
>>> +      || group->cost_map[j].cost.infinite_cost_p ())
>>> +    continue;
>>> +
>>> +  group->cost_map[j].cost.propagate_scaled_cost ();
>>> + }
>>> +    }
>> This is wrong.  m_frequency and m_cost_scaled are initialized to
>> sreal(0) by default, and are never changed later for conditional
>> iv_use.  As a matter of fact, costs computed for all conditional
>> iv_uses are wrong (value is 0).  This makes the observed improvement
>> not that promising.  Considering code generation is very sensitive to
>> cost computation, it may just hit some special cases.  Either way we
>> need more work/investigation on the impact of this patch.
>>
>> Again, I would suggest we factor frequency out of comp_cost and
>> only scale the cost in place when we compute cost for each use.  Then
>> we can measure what the impact on code generation is.
>>
>> Thanks,
>> bin
>>
>
> All remarks were applied in the third version of the patch. Together with the previous
> patch, it survives bootstrap and regression tests on x86_64-linux-gnu.
> I'm going to re-test the patch on SPEC benchmarks.
> +
> +ret:
> +  /* Scale (multiply) the computed cost (except scratch part that should be
> +     hoisted out of a loop) by at->bb->frequency / header->frequency,
> +     which makes expected cost more accurate.  */
> +   int loop_freq = data->current_loop->header->frequency;
> +   int bb_freq = at->bb->frequency;
> +   if (loop_freq != 0)
> +     {
> +       gcc_assert (cost.scratch <= cost.cost);
> +       int scaled_cost
> +     = cost.scratch + (cost.cost - cost.scratch) * bb_freq / loop_freq;
> +
> +       if (dump_file && (dump_flags & TDF_DETAILS))
> +     fprintf (dump_file, "Scaling iv_use based on cand %d "
> +          "by %2.2f: %d (scratch: %d) -> %d (%d/%d)\n",
> +          cand->id, 1.0f * bb_freq / loop_freq, cost.cost,
> +          cost.scratch, scaled_cost, bb_freq, loop_freq);
> +
> +       cost.cost = scaled_cost;
> +     }
> +
> +   return cost;
Hi,
Could you please factor this out into a function and remove the goto
statements?  Okay with this change if there is no fallout in the
benchmarks you run.

Thanks,
bin
>
> Martin
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/3] Encapsulate comp_cost within a class with methods.
  2016-05-19 11:24         ` Bin.Cheng
@ 2016-05-26 21:02           ` Martin Liška
  0 siblings, 0 replies; 34+ messages in thread
From: Martin Liška @ 2016-05-26 21:02 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: gcc-patches List

[-- Attachment #1: Type: text/plain, Size: 2303 bytes --]

On 05/19/2016 01:24 PM, Bin.Cheng wrote:
> On Thu, May 19, 2016 at 11:23 AM, Martin Liška <mliska@suse.cz> wrote:
>> On 05/16/2016 03:55 PM, Martin Liška wrote:
>>> On 05/16/2016 12:13 PM, Bin.Cheng wrote:
>>>> Hi Martin,
>>>> Could you please rebase this patch and the profiling one against
>>>> latest trunk?  The third patch was applied before these two now.
>>>>
>>>> Thanks,
>>>> bin
>>>
>>> Hello.
>>>
>>> Sending the rebased version of the patch.
>>>
>>> Martin
>>>
>>
>> Hello.
>>
>> As I've dramatically changed the 2/3 PATCH, a class encapsulation is not needed any longer.
>> Thus, I've reduced this patch just to usage of member function/operators that are useful
>> in my eyes. It's up to Bin whether to merge the patch.
> Yes, I think we want c++-ify such structures.
> 
>> +comp_cost
>> +operator- (comp_cost cost1, comp_cost cost2)
>> +{
>> +  if (cost1.infinite_cost_p () || cost2.infinite_cost_p ())
>> +    return comp_cost::get_infinite ();
>> +
>> +  cost1.cost -= cost2.cost;
>> +  cost1.complexity -= cost2.complexity;
>> +
>> +  return cost1;
>> +}
> For subtraction, should we expect the second operand as infinite?
> Maybe add an assertion for it in case anything goes wrong here.

Hi.

Done.

> 
>> +comp_cost
>> +comp_cost::get_infinite ()
>> +{
>> +  return comp_cost (INFTY, INFTY);
>> +}
>> +
>> +comp_cost
>> +comp_cost::get_no_cost ()
>> +{
>> +  return comp_cost ();
>> +}
> I think we may keep the original global variables for
> no_cost&infinite_cost, and save these two methods.

Likewise.

>>
>> @@ -5982,11 +6083,11 @@ iv_ca_recount_cost (struct ivopts_data *data, struct iv_ca *ivs)
>>  {
>>    comp_cost cost = ivs->cand_use_cost;
>>
>> -  cost.cost += ivs->cand_cost;
>> +  cost+= ivs->cand_cost;
> Space.

Likewise.

> 
> This is pure refactoring, could you please make sure there is no fallout
> by simply comparing SPEC code generation/disassembly?  I am asking
> since cost computation is sensitive, last time we didn't catch a "*"
> character typo in dump info improvement patch.

I've just verified that code generation for SPECv6 is unchanged and I'm going
to install the patch.

Thanks,
Martin


> 
> Okay with above changes, unless somebody else has comment on the C++
> part (which I know very little about).
> 
> Thanks,
> bin
>>
>> Martin
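
For readers following the review, here is a minimal standalone sketch of the
operator semantics discussed above: addition saturates at the infinite cost,
subtraction asserts that the second operand is finite, and ordering is
lexicographic on (cost, complexity).  The names cost_sketch, INFTY_SKETCH and
infinite_sketch are hypothetical stand-ins chosen for illustration, and the
INFTY value is an assumption; this is not the code from the attached patch.

```cpp
#include <cassert>

// Hypothetical stand-in for GCC's INFTY; the value is an assumption.
static const int INFTY_SKETCH = 10000000;

// Simplified cost record: runtime cost plus a tie-breaking complexity.
struct cost_sketch
{
  int cost;
  unsigned complexity;

  cost_sketch () : cost (0), complexity (0) {}
  cost_sketch (int c, unsigned k) : cost (c), complexity (k) {}

  // A cost is infinite when its runtime part equals the sentinel.
  bool infinite_p () const { return cost == INFTY_SKETCH; }
};

static const cost_sketch infinite_sketch (INFTY_SKETCH, INFTY_SKETCH);

// Addition saturates: anything plus infinity stays infinite.
cost_sketch operator+ (cost_sketch a, cost_sketch b)
{
  if (a.infinite_p () || b.infinite_p ())
    return infinite_sketch;
  a.cost += b.cost;
  a.complexity += b.complexity;
  return a;
}

// Subtraction keeps an infinite left operand infinite and asserts
// that the right operand is finite, as requested in the review.
cost_sketch operator- (cost_sketch a, cost_sketch b)
{
  if (a.infinite_p ())
    return infinite_sketch;
  assert (!b.infinite_p ());
  a.cost -= b.cost;
  a.complexity -= b.complexity;
  return a;
}

// Compare runtime cost first; complexity only breaks ties.
bool operator< (cost_sketch a, cost_sketch b)
{
  if (a.cost == b.cost)
    return a.complexity < b.complexity;
  return a.cost < b.cost;
}

bool operator== (cost_sketch a, cost_sketch b)
{
  return a.cost == b.cost && a.complexity == b.complexity;
}

bool operator<= (cost_sketch a, cost_sketch b)
{
  return a < b || a == b;
}
```

With these definitions, a call site that used to read
compare_costs (a, b) < 0 becomes a < b, and add_costs (a, b) becomes a + b.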


[-- Attachment #2: 0001-IVOPTS-make-comp_cost-in-a-more-c-fashion.patch --]
[-- Type: text/x-patch, Size: 25584 bytes --]

From 6379f77c195ed128c4886c07747bf9b8b678c75c Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Tue, 17 May 2016 13:52:11 +0200
Subject: [PATCH] IVOPTS: make comp_cost in a more c++ fashion.

gcc/ChangeLog:

2016-05-17  Martin Liska  <mliska@suse.cz>

	* tree-ssa-loop-ivopts.c (comp_cost::infinite_cost_p): New
	function.
	(operator+): Likewise.
	(operator-): Likewise.
	(comp_cost::operator+=): Likewise.
	(comp_cost::operator-=): Likewise.
	(comp_cost::operator/=): Likewise.
	(comp_cost::operator*=): Likewise.
	(operator<): Likewise.
	(operator==): Likewise.
	(operator<=): Likewise.
	(new_cost): Remove.
	(infinite_cost_p): Likewise.
	(add_costs): Likewise.
	(sub_costs): Likewise.
	(compare_costs): Likewise.
	(set_group_iv_cost): Use the newly introduced functions.
	(get_address_cost): Likewise.
	(get_shiftadd_cost): Likewise.
	(force_expr_to_var_cost): Likewise.
	(split_address_cost): Likewise.
	(ptr_difference_cost): Likewise.
	(difference_cost): Likewise.
	(get_computation_cost_at): Likewise.
	(determine_group_iv_cost_generic): Likewise.
	(determine_group_iv_cost_address): Likewise.
	(determine_group_iv_cost_cond): Likewise.
	(autoinc_possible_for_pair): Likewise.
	(determine_group_iv_costs): Likewise.
	(cheaper_cost_pair): Likewise.
	(iv_ca_recount_cost): Likewise.
	(iv_ca_set_no_cp): Likewise.
	(iv_ca_set_cp): Likewise.
	(iv_ca_cost): Likewise.
	(iv_ca_new): Likewise.
	(iv_ca_dump): Likewise.
	(iv_ca_narrow): Likewise.
	(iv_ca_prune): Likewise.
	(iv_ca_replace): Likewise.
	(try_add_cand_for): Likewise.
	(try_improve_iv_set): Likewise.
	(find_optimal_iv_set): Likewise.
---
 gcc/tree-ssa-loop-ivopts.c | 380 ++++++++++++++++++++++++++++-----------------
 1 file changed, 235 insertions(+), 145 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 9ce6b64..83b9aaf 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -173,16 +173,171 @@ enum use_type
 /* Cost of a computation.  */
 struct comp_cost
 {
+  comp_cost (): cost (0), complexity (0), scratch (0)
+  {}
+
+  comp_cost (int cost, unsigned complexity, int scratch = 0)
+    : cost (cost), complexity (complexity), scratch (scratch)
+  {}
+
+  /* Returns true if COST is infinite.  */
+  bool infinite_cost_p ();
+
+  /* Adds costs COST1 and COST2.  */
+  friend comp_cost operator+ (comp_cost cost1, comp_cost cost2);
+
+  /* Adds COST to the comp_cost.  */
+  comp_cost operator+= (comp_cost cost);
+
+  /* Adds constant C to this comp_cost.  */
+  comp_cost operator+= (HOST_WIDE_INT c);
+
+  /* Subtracts constant C from this comp_cost.  */
+  comp_cost operator-= (HOST_WIDE_INT c);
+
+  /* Divide the comp_cost by constant C.  */
+  comp_cost operator/= (HOST_WIDE_INT c);
+
+  /* Multiply the comp_cost by constant C.  */
+  comp_cost operator*= (HOST_WIDE_INT c);
+
+  /* Subtracts costs COST1 and COST2.  */
+  friend comp_cost operator- (comp_cost cost1, comp_cost cost2);
+
+  /* Subtracts COST from this comp_cost.  */
+  comp_cost operator-= (comp_cost cost);
+
+  /* Returns true if COST1 is smaller than COST2.  */
+  friend bool operator< (comp_cost cost1, comp_cost cost2);
+
+  /* Returns true if COST1 and COST2 are equal.  */
+  friend bool operator== (comp_cost cost1, comp_cost cost2);
+
+  /* Returns true if COST1 is smaller than or equal to COST2.  */
+  friend bool operator<= (comp_cost cost1, comp_cost cost2);
+
   int cost;		/* The runtime cost.  */
-  unsigned complexity;	/* The estimate of the complexity of the code for
+  unsigned complexity;  /* The estimate of the complexity of the code for
 			   the computation (in no concrete units --
 			   complexity field should be larger for more
 			   complex expressions and addressing modes).  */
   int scratch;		/* Scratch used during cost computation.  */
 };
 
-static const comp_cost no_cost = {0, 0, 0};
-static const comp_cost infinite_cost = {INFTY, INFTY, INFTY};
+static const comp_cost no_cost;
+static const comp_cost infinite_cost (INFTY, INFTY, INFTY);
+
+bool
+comp_cost::infinite_cost_p ()
+{
+  return cost == INFTY;
+}
+
+comp_cost
+operator+ (comp_cost cost1, comp_cost cost2)
+{
+  if (cost1.infinite_cost_p () || cost2.infinite_cost_p ())
+    return infinite_cost;
+
+  cost1.cost += cost2.cost;
+  cost1.complexity += cost2.complexity;
+
+  return cost1;
+}
+
+comp_cost
+operator- (comp_cost cost1, comp_cost cost2)
+{
+  if (cost1.infinite_cost_p ())
+    return infinite_cost;
+
+  gcc_assert (!cost2.infinite_cost_p ());
+
+  cost1.cost -= cost2.cost;
+  cost1.complexity -= cost2.complexity;
+
+  return cost1;
+}
+
+comp_cost
+comp_cost::operator+= (comp_cost cost)
+{
+  *this = *this + cost;
+  return *this;
+}
+
+comp_cost
+comp_cost::operator+= (HOST_WIDE_INT c)
+{
+  if (infinite_cost_p ())
+    return *this;
+
+  this->cost += c;
+
+  return *this;
+}
+
+comp_cost
+comp_cost::operator-= (HOST_WIDE_INT c)
+{
+  if (infinite_cost_p ())
+    return *this;
+
+  this->cost -= c;
+
+  return *this;
+}
+
+comp_cost
+comp_cost::operator/= (HOST_WIDE_INT c)
+{
+  if (infinite_cost_p ())
+    return *this;
+
+  this->cost /= c;
+
+  return *this;
+}
+
+comp_cost
+comp_cost::operator*= (HOST_WIDE_INT c)
+{
+  if (infinite_cost_p ())
+    return *this;
+
+  this->cost *= c;
+
+  return *this;
+}
+
+comp_cost
+comp_cost::operator-= (comp_cost cost)
+{
+  *this = *this - cost;
+  return *this;
+}
+
+bool
+operator< (comp_cost cost1, comp_cost cost2)
+{
+  if (cost1.cost == cost2.cost)
+    return cost1.complexity < cost2.complexity;
+
+  return cost1.cost < cost2.cost;
+}
+
+bool
+operator== (comp_cost cost1, comp_cost cost2)
+{
+  return cost1.cost == cost2.cost
+    && cost1.complexity == cost2.complexity;
+}
+
+bool
+operator<= (comp_cost cost1, comp_cost cost2)
+{
+  return cost1 < cost2 || cost1 == cost2;
+}
 
 struct iv_inv_expr_ent;
 
@@ -3284,64 +3439,6 @@ alloc_use_cost_map (struct ivopts_data *data)
     }
 }
 
-/* Returns description of computation cost of expression whose runtime
-   cost is RUNTIME and complexity corresponds to COMPLEXITY.  */
-
-static comp_cost
-new_cost (unsigned runtime, unsigned complexity)
-{
-  comp_cost cost;
-
-  cost.cost = runtime;
-  cost.complexity = complexity;
-
-  return cost;
-}
-
-/* Returns true if COST is infinite.  */
-
-static bool
-infinite_cost_p (comp_cost cost)
-{
-  return cost.cost == INFTY;
-}
-
-/* Adds costs COST1 and COST2.  */
-
-static comp_cost
-add_costs (comp_cost cost1, comp_cost cost2)
-{
-  if (infinite_cost_p (cost1) || infinite_cost_p (cost2))
-    return infinite_cost;
-
-  cost1.cost += cost2.cost;
-  cost1.complexity += cost2.complexity;
-
-  return cost1;
-}
-/* Subtracts costs COST1 and COST2.  */
-
-static comp_cost
-sub_costs (comp_cost cost1, comp_cost cost2)
-{
-  cost1.cost -= cost2.cost;
-  cost1.complexity -= cost2.complexity;
-
-  return cost1;
-}
-
-/* Returns a negative number if COST1 < COST2, a positive number if
-   COST1 > COST2, and 0 if COST1 = COST2.  */
-
-static int
-compare_costs (comp_cost cost1, comp_cost cost2)
-{
-  if (cost1.cost == cost2.cost)
-    return cost1.complexity - cost2.complexity;
-
-  return cost1.cost - cost2.cost;
-}
-
 /* Sets cost of (GROUP, CAND) pair to COST and record that it depends
    on invariants DEPENDS_ON and that the value used in expressing it
    is VALUE, and in case of iv elimination the comparison operator is COMP.  */
@@ -3354,7 +3451,7 @@ set_group_iv_cost (struct ivopts_data *data,
 {
   unsigned i, s;
 
-  if (infinite_cost_p (cost))
+  if (cost.infinite_cost_p ())
     {
       BITMAP_FREE (depends_on);
       return;
@@ -4170,7 +4267,7 @@ get_address_cost (bool symbol_present, bool var_present,
   else
     acost = data->costs[symbol_present][var_present][offset_p][ratio_p];
   complexity = (symbol_present != 0) + (var_present != 0) + offset_p + ratio_p;
-  return new_cost (cost + acost, complexity);
+  return comp_cost (cost + acost, complexity);
 }
 
  /* Calculate the SPEED or size cost of shiftadd EXPR in MODE.  MULT is the
@@ -4207,12 +4304,12 @@ get_shiftadd_cost (tree expr, machine_mode mode, comp_cost cost0,
 		? shiftsub1_cost (speed, mode, m)
 		: shiftsub0_cost (speed, mode, m)));
 
-  res = new_cost (MIN (as_cost, sa_cost), 0);
-  res = add_costs (res, mult_in_op1 ? cost0 : cost1);
+  res = comp_cost (MIN (as_cost, sa_cost), 0);
+  res += (mult_in_op1 ? cost0 : cost1);
 
   STRIP_NOPS (multop);
   if (!is_gimple_val (multop))
-    res = add_costs (res, force_expr_to_var_cost (multop, speed));
+    res += force_expr_to_var_cost (multop, speed);
 
   *cost = res;
   return true;
@@ -4277,7 +4374,7 @@ force_expr_to_var_cost (tree expr, bool speed)
   if (is_gimple_min_invariant (expr))
     {
       if (TREE_CODE (expr) == INTEGER_CST)
-	return new_cost (integer_cost [speed], 0);
+	return comp_cost (integer_cost [speed], 0);
 
       if (TREE_CODE (expr) == ADDR_EXPR)
 	{
@@ -4286,10 +4383,10 @@ force_expr_to_var_cost (tree expr, bool speed)
 	  if (TREE_CODE (obj) == VAR_DECL
 	      || TREE_CODE (obj) == PARM_DECL
 	      || TREE_CODE (obj) == RESULT_DECL)
-	    return new_cost (symbol_cost [speed], 0);
+	    return comp_cost (symbol_cost [speed], 0);
 	}
 
-      return new_cost (address_cost [speed], 0);
+      return comp_cost (address_cost [speed], 0);
     }
 
   switch (TREE_CODE (expr))
@@ -4313,7 +4410,7 @@ force_expr_to_var_cost (tree expr, bool speed)
 
     default:
       /* Just an arbitrary value, FIXME.  */
-      return new_cost (target_spill_cost[speed], 0);
+      return comp_cost (target_spill_cost[speed], 0);
     }
 
   if (op0 == NULL_TREE
@@ -4335,7 +4432,7 @@ force_expr_to_var_cost (tree expr, bool speed)
     case PLUS_EXPR:
     case MINUS_EXPR:
     case NEGATE_EXPR:
-      cost = new_cost (add_cost (speed, mode), 0);
+      cost = comp_cost (add_cost (speed, mode), 0);
       if (TREE_CODE (expr) != NEGATE_EXPR)
 	{
 	  tree mult = NULL_TREE;
@@ -4358,28 +4455,28 @@ force_expr_to_var_cost (tree expr, bool speed)
 	tree inner_mode, outer_mode;
 	outer_mode = TREE_TYPE (expr);
 	inner_mode = TREE_TYPE (op0);
-	cost = new_cost (convert_cost (TYPE_MODE (outer_mode),
+	cost = comp_cost (convert_cost (TYPE_MODE (outer_mode),
 				       TYPE_MODE (inner_mode), speed), 0);
       }
       break;
 
     case MULT_EXPR:
       if (cst_and_fits_in_hwi (op0))
-	cost = new_cost (mult_by_coeff_cost (int_cst_value (op0),
+	cost = comp_cost (mult_by_coeff_cost (int_cst_value (op0),
 					     mode, speed), 0);
       else if (cst_and_fits_in_hwi (op1))
-	cost = new_cost (mult_by_coeff_cost (int_cst_value (op1),
+	cost = comp_cost (mult_by_coeff_cost (int_cst_value (op1),
 					     mode, speed), 0);
       else
-	return new_cost (target_spill_cost [speed], 0);
+	return comp_cost (target_spill_cost [speed], 0);
       break;
 
     default:
       gcc_unreachable ();
     }
 
-  cost = add_costs (cost, cost0);
-  cost = add_costs (cost, cost1);
+  cost += cost0;
+  cost += cost1;
 
   /* Bound the cost by target_spill_cost.  The parts of complicated
      computations often are either loop invariant or at least can
@@ -4438,7 +4535,7 @@ split_address_cost (struct ivopts_data *data,
       if (depends_on)
 	walk_tree (&addr, find_depends, depends_on, NULL);
 
-      return new_cost (target_spill_cost[data->speed], 0);
+      return comp_cost (target_spill_cost[data->speed], 0);
     }
 
   *offset += bitpos / BITS_PER_UNIT;
@@ -4538,7 +4635,7 @@ difference_cost (struct ivopts_data *data,
   if (integer_zerop (e1))
     {
       comp_cost cost = force_var_cost (data, e2, depends_on);
-      cost.cost += mult_by_coeff_cost (-1, mode, data->speed);
+      cost += mult_by_coeff_cost (-1, mode, data->speed);
       return cost;
     }
 
@@ -4805,7 +4902,7 @@ get_computation_cost_at (struct ivopts_data *data,
 			      ubase, build_int_cst (utype, 0),
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
-      cost.cost /= avg_loop_niter (data->current_loop);
+      cost /= avg_loop_niter (data->current_loop);
     }
   else if (ratio == 1)
     {
@@ -4829,7 +4926,7 @@ get_computation_cost_at (struct ivopts_data *data,
 			      ubase, real_cbase,
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
-      cost.cost /= avg_loop_niter (data->current_loop);
+      cost /= avg_loop_niter (data->current_loop);
     }
   else if (address_p
 	   && !POINTER_TYPE_P (ctype)
@@ -4852,21 +4949,19 @@ get_computation_cost_at (struct ivopts_data *data,
 			      ubase, real_cbase,
 			      &symbol_present, &var_present, &offset,
 			      depends_on);
-      cost.cost /= avg_loop_niter (data->current_loop);
+      cost /= avg_loop_niter (data->current_loop);
     }
   else
     {
       cost = force_var_cost (data, cbase, depends_on);
-      cost = add_costs (cost,
-			difference_cost (data,
-					 ubase, build_int_cst (utype, 0),
-					 &symbol_present, &var_present,
-					 &offset, depends_on));
-      cost.cost /= avg_loop_niter (data->current_loop);
-      cost.cost += add_cost (data->speed, TYPE_MODE (ctype));
+      cost += difference_cost (data, ubase, build_int_cst (utype, 0),
+			       &symbol_present, &var_present, &offset,
+			       depends_on);
+      cost /= avg_loop_niter (data->current_loop);
+      cost += add_cost (data->speed, TYPE_MODE (ctype));
     }
 
-  /* Record setup cost in scrach field.  */
+  /* Record setup cost in scratch field.  */
   cost.scratch = cost.cost;
 
   if (inv_expr && depends_on && *depends_on)
@@ -4887,26 +4982,24 @@ get_computation_cost_at (struct ivopts_data *data,
      (symbol/var1/const parts may be omitted).  If we are looking for an
      address, find the cost of addressing this.  */
   if (address_p)
-    return add_costs (cost,
-		      get_address_cost (symbol_present, var_present,
-					offset, ratio, cstepi,
-					mem_mode,
-					TYPE_ADDR_SPACE (TREE_TYPE (utype)),
-					speed, stmt_is_after_inc,
-					can_autoinc));
+    return cost + get_address_cost (symbol_present, var_present,
+				    offset, ratio, cstepi,
+				    mem_mode,
+				    TYPE_ADDR_SPACE (TREE_TYPE (utype)),
+				    speed, stmt_is_after_inc, can_autoinc);
 
   /* Otherwise estimate the costs for computing the expression.  */
   if (!symbol_present && !var_present && !offset)
     {
       if (ratio != 1)
-	cost.cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
+	cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
       return cost;
     }
 
   /* Symbol + offset should be compile-time computable so consider that they
       are added once to the variable, if present.  */
   if (var_present && (symbol_present || offset))
-    cost.cost += adjust_setup_cost (data,
+    cost += adjust_setup_cost (data,
 				    add_cost (speed, TYPE_MODE (ctype)));
 
   /* Having offset does not affect runtime cost in case it is added to
@@ -4914,11 +5007,11 @@ get_computation_cost_at (struct ivopts_data *data,
   if (offset)
     cost.complexity++;
 
-  cost.cost += add_cost (speed, TYPE_MODE (ctype));
+  cost += add_cost (speed, TYPE_MODE (ctype));
 
   aratio = ratio > 0 ? ratio : -ratio;
   if (aratio != 1)
-    cost.cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
+    cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
   return cost;
 
 fallback:
@@ -4935,9 +5028,7 @@ fallback:
     if (address_p)
       comp = build_simple_mem_ref (comp);
 
-    cost = new_cost (computation_cost (comp, speed), 0);
-    cost.scratch = 0;
-    return cost;
+    return comp_cost (computation_cost (comp, speed), 0);
   }
 }
 
@@ -4983,7 +5074,7 @@ determine_group_iv_cost_generic (struct ivopts_data *data,
 
   set_group_iv_cost (data, group, cand, cost, depends_on,
 		     NULL_TREE, ERROR_MARK, inv_expr);
-  return !infinite_cost_p (cost);
+  return !cost.infinite_cost_p ();
 }
 
 /* Determines cost of computing uses in GROUP with CAND in addresses.  */
@@ -5003,10 +5094,10 @@ determine_group_iv_cost_address (struct ivopts_data *data,
 			       &depends_on, &can_autoinc, &inv_expr);
 
   sum_cost = cost;
-  if (!infinite_cost_p (sum_cost) && cand->ainc_use == use)
+  if (!sum_cost.infinite_cost_p () && cand->ainc_use == use)
     {
       if (can_autoinc)
-	sum_cost.cost -= cand->cost_step;
+	sum_cost -= cand->cost_step;
       /* If we generated the candidate solely for exploiting autoincrement
 	 opportunities, and it turns out it can't be used, set the cost to
 	 infinity to make sure we ignore it.  */
@@ -5015,9 +5106,9 @@ determine_group_iv_cost_address (struct ivopts_data *data,
     }
 
   /* Uses in a group can share setup code, so only add setup cost once.  */
-  cost.cost -= cost.scratch;
+  cost -= cost.scratch;
   /* Compute and add costs for rest uses of this group.  */
-  for (i = 1; i < group->vuses.length () && !infinite_cost_p (sum_cost); i++)
+  for (i = 1; i < group->vuses.length () && !sum_cost.infinite_cost_p (); i++)
     {
       struct iv_use *next = group->vuses[i];
 
@@ -5042,15 +5133,15 @@ determine_group_iv_cost_address (struct ivopts_data *data,
 	  cost = get_computation_cost (data, next, cand, true,
 				       NULL, &can_autoinc, NULL);
 	  /* Remove setup cost.  */
-	  if (!infinite_cost_p (cost))
-	    cost.cost -= cost.scratch;
+	  if (!cost.infinite_cost_p ())
+	    cost -= cost.scratch;
 	}
-      sum_cost = add_costs (sum_cost, cost);
+      sum_cost += cost;
     }
   set_group_iv_cost (data, group, cand, sum_cost, depends_on,
 		     NULL_TREE, ERROR_MARK, inv_expr);
 
-  return !infinite_cost_p (sum_cost);
+  return !sum_cost.infinite_cost_p ();
 }
 
 /* Computes value of candidate CAND at position AT in iteration NITER, and
@@ -5513,11 +5604,11 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
      TODO: The constant that we're subtracting from the cost should
      be target-dependent.  This information should be added to the
      target costs for each backend.  */
-  if (!infinite_cost_p (elim_cost) /* Do not try to decrease infinite! */
+  if (!elim_cost.infinite_cost_p () /* Do not try to decrease infinite! */
       && integer_zerop (*bound_cst)
       && (operand_equal_p (*control_var, cand->var_after, 0)
 	  || operand_equal_p (*control_var, cand->var_before, 0)))
-    elim_cost.cost -= 1;
+    elim_cost -= 1;
 
   express_cost = get_computation_cost (data, use, cand, false,
 				       &depends_on_express, NULL,
@@ -5531,10 +5622,10 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
     bound_cost.cost = parm_decl_cost (data, *bound_cst);
   else if (TREE_CODE (*bound_cst) == INTEGER_CST)
     bound_cost.cost = 0;
-  express_cost.cost += bound_cost.cost;
+  express_cost += bound_cost;
 
   /* Choose the better approach, preferring the eliminated IV. */
-  if (compare_costs (elim_cost, express_cost) <= 0)
+  if (elim_cost <= express_cost)
     {
       cost = elim_cost;
       depends_on = depends_on_elim;
@@ -5559,7 +5650,7 @@ determine_group_iv_cost_cond (struct ivopts_data *data,
   if (depends_on_express)
     BITMAP_FREE (depends_on_express);
 
-  return !infinite_cost_p (cost);
+  return !cost.infinite_cost_p ();
 }
 
 /* Determines cost of computing uses in GROUP with CAND.  Returns false
@@ -5604,7 +5695,7 @@ autoinc_possible_for_pair (struct ivopts_data *data, struct iv_use *use,
 
   BITMAP_FREE (depends_on);
 
-  return !infinite_cost_p (cost) && can_autoinc;
+  return !cost.infinite_cost_p () && can_autoinc;
 }
 
 /* Examine IP_ORIGINAL candidates to see if they are incremented next to a
@@ -5770,7 +5861,7 @@ determine_group_iv_costs (struct ivopts_data *data)
 	  for (j = 0; j < group->n_map_members; j++)
 	    {
 	      if (!group->cost_map[j].cand
-		  || infinite_cost_p (group->cost_map[j].cost))
+		  || group->cost_map[j].cost.infinite_cost_p ())
 		continue;
 
 	      fprintf (dump_file, "  %d\t%d\t%d\t",
@@ -5944,19 +6035,16 @@ determine_set_costs (struct ivopts_data *data)
 static bool
 cheaper_cost_pair (struct cost_pair *a, struct cost_pair *b)
 {
-  int cmp;
-
   if (!a)
     return false;
 
   if (!b)
     return true;
 
-  cmp = compare_costs (a->cost, b->cost);
-  if (cmp < 0)
+  if (a->cost < b->cost)
     return true;
 
-  if (cmp > 0)
+  if (b->cost < a->cost)
     return false;
 
   /* In case the costs are the same, prefer the cheaper candidate.  */
@@ -5982,11 +6070,11 @@ iv_ca_recount_cost (struct ivopts_data *data, struct iv_ca *ivs)
 {
   comp_cost cost = ivs->cand_use_cost;
 
-  cost.cost += ivs->cand_cost;
+  cost += ivs->cand_cost;
 
-  cost.cost += ivopts_global_cost_for_size (data,
-					    ivs->n_regs
-					    + ivs->used_inv_exprs->elements ());
+  cost += ivopts_global_cost_for_size (data,
+				       ivs->n_regs
+				       + ivs->used_inv_exprs->elements ());
 
   ivs->cost = cost;
 }
@@ -6040,7 +6128,7 @@ iv_ca_set_no_cp (struct ivopts_data *data, struct iv_ca *ivs,
       iv_ca_set_remove_invariants (ivs, cp->cand->depends_on);
     }
 
-  ivs->cand_use_cost = sub_costs (ivs->cand_use_cost, cp->cost);
+  ivs->cand_use_cost -= cp->cost;
 
   iv_ca_set_remove_invariants (ivs, cp->depends_on);
 
@@ -6106,7 +6194,7 @@ iv_ca_set_cp (struct ivopts_data *data, struct iv_ca *ivs,
 	  iv_ca_set_add_invariants (ivs, cp->cand->depends_on);
 	}
 
-      ivs->cand_use_cost = add_costs (ivs->cand_use_cost, cp->cost);
+      ivs->cand_use_cost += cp->cost;
       iv_ca_set_add_invariants (ivs, cp->depends_on);
 
       if (cp->inv_expr != NULL)
@@ -6350,7 +6438,8 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
   unsigned i;
   comp_cost cost = iv_ca_cost (ivs);
 
-  fprintf (file, "  cost: %d (complexity %d)\n", cost.cost, cost.complexity);
+  fprintf (file, "  cost: %d (complexity %d)\n", cost.cost,
+	   cost.complexity);
   fprintf (file, "  cand_cost: %d\n  cand_group_cost: %d (complexity %d)\n",
 	   ivs->cand_cost, ivs->cand_use_cost.cost,
 	   ivs->cand_use_cost.complexity);
@@ -6361,8 +6450,9 @@ iv_ca_dump (struct ivopts_data *data, FILE *file, struct iv_ca *ivs)
       struct iv_group *group = data->vgroups[i];
       struct cost_pair *cp = iv_ca_cand_for_group (ivs, group);
       if (cp)
-	fprintf (file, "   group:%d --> iv_cand:%d, cost=(%d,%d)\n",
-		 group->id, cp->cand->id, cp->cost.cost, cp->cost.complexity);
+        fprintf (file, "   group:%d --> iv_cand:%d, cost=(%d,%d)\n",
+		 group->id, cp->cand->id, cp->cost.cost,
+		 cp->cost.complexity);
       else
 	fprintf (file, "   group:%d --> ??\n", group->id);
     }
@@ -6480,7 +6570,7 @@ iv_ca_narrow (struct ivopts_data *data, struct iv_ca *ivs,
 	      iv_ca_set_cp (data, ivs, group, cp);
 	      acost = iv_ca_cost (ivs);
 
-	      if (compare_costs (acost, best_cost) < 0)
+	      if (acost < best_cost)
 		{
 		  best_cost = acost;
 		  new_cp = cp;
@@ -6503,7 +6593,7 @@ iv_ca_narrow (struct ivopts_data *data, struct iv_ca *ivs,
 	      iv_ca_set_cp (data, ivs, group, cp);
 	      acost = iv_ca_cost (ivs);
 
-	      if (compare_costs (acost, best_cost) < 0)
+	      if (acost < best_cost)
 		{
 		  best_cost = acost;
 		  new_cp = cp;
@@ -6555,7 +6645,7 @@ iv_ca_prune (struct ivopts_data *data, struct iv_ca *ivs,
 
       acost = iv_ca_narrow (data, ivs, cand, except_cand, &act_delta);
 
-      if (compare_costs (acost, best_cost) < 0)
+      if (acost < best_cost)
 	{
 	  best_cost = acost;
 	  iv_ca_delta_free (&best_delta);
@@ -6668,7 +6758,7 @@ iv_ca_replace (struct ivopts_data *data, struct iv_ca *ivs,
       iv_ca_delta_commit (data, ivs, act_delta, false);
       act_delta = iv_ca_delta_join (act_delta, tmp_delta);
 
-      if (compare_costs (acost, orig_cost) < 0)
+      if (acost < orig_cost)
 	{
 	  *delta = act_delta;
 	  return acost;
@@ -6737,7 +6827,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
       iv_ca_set_no_cp (data, ivs, group);
       act_delta = iv_ca_delta_add (group, NULL, cp, act_delta);
 
-      if (compare_costs (act_cost, best_cost) < 0)
+      if (act_cost < best_cost)
 	{
 	  best_cost = act_cost;
 
@@ -6748,7 +6838,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
 	iv_ca_delta_free (&act_delta);
     }
 
-  if (infinite_cost_p (best_cost))
+  if (best_cost.infinite_cost_p ())
     {
       for (i = 0; i < group->n_map_members; i++)
 	{
@@ -6777,7 +6867,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
 				       iv_ca_cand_for_group (ivs, group),
 				       cp, act_delta);
 
-	  if (compare_costs (act_cost, best_cost) < 0)
+	  if (act_cost < best_cost)
 	    {
 	      best_cost = act_cost;
 
@@ -6793,7 +6883,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
   iv_ca_delta_commit (data, ivs, best_delta, true);
   iv_ca_delta_free (&best_delta);
 
-  return !infinite_cost_p (best_cost);
+  return !best_cost.infinite_cost_p ();
 }
 
 /* Finds an initial assignment of candidates to uses.  */
@@ -6849,7 +6939,7 @@ try_improve_iv_set (struct ivopts_data *data,
 	  act_delta = iv_ca_delta_join (act_delta, tmp_delta);
 	}
 
-      if (compare_costs (acost, best_cost) < 0)
+      if (acost < best_cost)
 	{
 	  best_cost = acost;
 	  iv_ca_delta_free (&best_delta);
@@ -6883,7 +6973,7 @@ try_improve_iv_set (struct ivopts_data *data,
     }
 
   iv_ca_delta_commit (data, ivs, best_delta, true);
-  gcc_assert (compare_costs (best_cost, iv_ca_cost (ivs)) == 0);
+  gcc_assert (best_cost == iv_ca_cost (ivs));
   iv_ca_delta_free (&best_delta);
   return true;
 }
@@ -6953,7 +7043,7 @@ find_optimal_iv_set (struct ivopts_data *data)
     }
 
   /* Choose the one with the best cost.  */
-  if (compare_costs (origcost, cost) <= 0)
+  if (origcost <= cost)
     {
       if (set)
 	iv_ca_free (&set);
-- 
2.8.2



* Re: [PATCH 2/3] Add profiling support for IVOPTS
  2016-05-24 10:19         ` Bin.Cheng
  2016-05-24 10:33           ` Bin.Cheng
  2016-05-24 11:01           ` Bin.Cheng
@ 2016-05-30 19:51           ` Martin Liška
  2 siblings, 0 replies; 34+ messages in thread
From: Martin Liška @ 2016-05-30 19:51 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: gcc-patches List

[-- Attachment #1: Type: text/plain, Size: 449 bytes --]

On 05/24/2016 12:11 PM, Bin.Cheng wrote:
> Hi,
> Could you please factor out this as a function and remove the goto
> statements?  Okay with this change if no fallout in benchmarks you
> run.
> 
> Thanks,
> bin

Hi.

Thanks for the review. I've just verified that it does not introduce any
regression on SPECv6 and that it improves a couple of SPEC2006 benchmarks with
PGO. I'm going to install the patch and make a control run of benchmarks.

Thanks
Martin
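
As a worked illustration of the scaling rule this patch applies, the sketch
below multiplies only the in-loop part of a cost (cost minus scratch) by
bb_freq / loop_freq and leaves the setup part untouched.  The type
scaled_cost_sketch and the function scale_cost_sketch are hypothetical names
used for illustration; the arithmetic follows the formula in the attached
patch, but the frequencies are passed as plain integers rather than read from
basic blocks.

```cpp
#include <cassert>

// Simplified cost record: COST is the total, SCRATCH is the setup part
// that is expected to be hoisted out of the loop.
struct scaled_cost_sketch
{
  int cost;
  int scratch;
};

// Scale the in-loop part (cost - scratch) by bb_freq / loop_freq.
// A zero loop_freq leaves the cost unchanged.  Integer division
// truncates, so small in-loop costs in cold blocks can round to zero.
static scaled_cost_sketch
scale_cost_sketch (scaled_cost_sketch c, int bb_freq, int loop_freq)
{
  if (loop_freq != 0)
    {
      assert (c.scratch <= c.cost);
      c.cost = c.scratch + (c.cost - c.scratch) * bb_freq / loop_freq;
    }
  return c;
}
```

For example, a use with cost 14 and scratch 4 in a block that runs a tenth as
often as the loop header (bb_freq 100, loop_freq 1000) is scaled to
4 + 10 * 100 / 1000 = 5, while the same use in the header itself keeps cost 14.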

[-- Attachment #2: 0002-Add-profiling-support-for-IVOPTS-final.patch --]
[-- Type: text/x-patch, Size: 4117 bytes --]

From 2991622862dd934e464f542e9e58270bf0088544 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Tue, 17 May 2016 15:22:43 +0200
Subject: [PATCH 1/5] Add profiling support for IVOPTS

gcc/ChangeLog:

2016-05-17  Martin Liska  <mliska@suse.cz>

	* tree-ssa-loop-ivopts.c (get_computation_cost_at): Scale
	computed costs by frequency of BB they belong to.
	(get_scaled_computation_cost_at): New function.
---
 gcc/tree-ssa-loop-ivopts.c | 62 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 46 insertions(+), 16 deletions(-)

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index d770ec9..a541ef8 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -4794,7 +4794,33 @@ get_loop_invariant_expr (struct ivopts_data *data, tree ubase,
   return record_inv_expr (data, expr);
 }
 
+/* Scale (multiply) the computed COST (except scratch part that should be
+   hoisted out of a loop) by AT->frequency / header->frequency,
+   which makes expected cost more accurate.  */
 
+static comp_cost
+get_scaled_computation_cost_at (ivopts_data *data, gimple *at, iv_cand *cand,
+				comp_cost cost)
+{
+   int loop_freq = data->current_loop->header->frequency;
+   int bb_freq = at->bb->frequency;
+   if (loop_freq != 0)
+     {
+       gcc_assert (cost.scratch <= cost.cost);
+       int scaled_cost
+	 = cost.scratch + (cost.cost - cost.scratch) * bb_freq / loop_freq;
+
+       if (dump_file && (dump_flags & TDF_DETAILS))
+	 fprintf (dump_file, "Scaling iv_use based on cand %d "
+		  "by %2.2f: %d (scratch: %d) -> %d (%d/%d)\n",
+		  cand->id, 1.0f * bb_freq / loop_freq, cost.cost,
+		  cost.scratch, scaled_cost, bb_freq, loop_freq);
+
+       cost.cost = scaled_cost;
+     }
+
+  return cost;
+}
 
 /* Determines the cost of the computation by that USE is expressed
    from induction variable CAND.  If ADDRESS_P is true, we just need
@@ -4982,18 +5008,21 @@ get_computation_cost_at (struct ivopts_data *data,
      (symbol/var1/const parts may be omitted).  If we are looking for an
      address, find the cost of addressing this.  */
   if (address_p)
-    return cost + get_address_cost (symbol_present, var_present,
-				    offset, ratio, cstepi,
-				    mem_mode,
-				    TYPE_ADDR_SPACE (TREE_TYPE (utype)),
-				    speed, stmt_is_after_inc, can_autoinc);
+    {
+      cost += get_address_cost (symbol_present, var_present,
+				offset, ratio, cstepi,
+				mem_mode,
+				TYPE_ADDR_SPACE (TREE_TYPE (utype)),
+				speed, stmt_is_after_inc, can_autoinc);
+      return get_scaled_computation_cost_at (data, at, cand, cost);
+    }
 
   /* Otherwise estimate the costs for computing the expression.  */
   if (!symbol_present && !var_present && !offset)
     {
       if (ratio != 1)
 	cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
-      return cost;
+      return get_scaled_computation_cost_at (data, at, cand, cost);
     }
 
   /* Symbol + offset should be compile-time computable so consider that they
@@ -5012,24 +5041,25 @@ get_computation_cost_at (struct ivopts_data *data,
   aratio = ratio > 0 ? ratio : -ratio;
   if (aratio != 1)
     cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
-  return cost;
+
+  return get_scaled_computation_cost_at (data, at, cand, cost);
 
 fallback:
   if (can_autoinc)
     *can_autoinc = false;
 
-  {
-    /* Just get the expression, expand it and measure the cost.  */
-    tree comp = get_computation_at (data->current_loop, use, cand, at);
+  /* Just get the expression, expand it and measure the cost.  */
+  tree comp = get_computation_at (data->current_loop, use, cand, at);
 
-    if (!comp)
-      return infinite_cost;
+  if (!comp)
+    return infinite_cost;
+
+  if (address_p)
+    comp = build_simple_mem_ref (comp);
 
-    if (address_p)
-      comp = build_simple_mem_ref (comp);
+  cost = comp_cost (computation_cost (comp, speed), 0);
 
-    return comp_cost (computation_cost (comp, speed), 0);
-  }
+  return get_scaled_computation_cost_at (data, at, cand, cost);
 }
 
 /* Determines the cost of the computation by that USE is expressed
-- 
2.8.3
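
For readers following the hunks above, here is a minimal standalone sketch of the scaling idea from the cover letter: a use's cost is multiplied by (bb_frequency / header_frequency). It is not the patch's code. The names sketch_cost and scale_by_frequency are hypothetical. Leaving the scratch portion unscaled and skipping the scaling when the header frequency is zero are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the patch's comp_cost; only the fields used here.
struct sketch_cost
{
  int cost;     // total cost of expressing the use
  int scratch;  // portion assumed (for this sketch) not to depend on frequency
};

// Scale the frequency-dependent part of COST by bb_freq / loop_freq.
// A loop_freq of zero is treated as "no usable profile" and COST is
// returned unchanged (an assumption of this sketch).
static sketch_cost
scale_by_frequency (sketch_cost cost, int bb_freq, int loop_freq)
{
  if (loop_freq == 0)
    return cost;

  assert (cost.scratch <= cost.cost);
  // Integer arithmetic: multiply before dividing to limit truncation.
  cost.cost = cost.scratch + (cost.cost - cost.scratch) * bb_freq / loop_freq;
  return cost;
}
```

With these assumptions, a use with cost 10 and scratch 2 in a block that runs a quarter as often as the loop header ends up with cost 4, so candidates serving cold blocks look cheaper to the selection algorithm than candidates serving hot ones.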



end of thread, other threads:[~2016-05-30 16:05 UTC | newest]

Thread overview: 34+ messages
2016-04-29 11:58 [PATCH 0/3] IVOPTS: support profiling marxin
2016-04-29 11:58 ` [PATCH 3/3] Enhance dumps of IVOPTS marxin
2016-05-06  9:19   ` Martin Liška
2016-05-09  9:47     ` Richard Biener
2016-05-10 13:16       ` Bin.Cheng
2016-05-11 14:18         ` Martin Liška
2016-05-12 12:14         ` Martin Liška
2016-05-12 13:51           ` Bin.Cheng
2016-05-12 16:42             ` Martin Liška
2016-05-13  9:43               ` Bin.Cheng
2016-05-13 10:44                 ` Martin Liška
2016-05-13 12:12                   ` H.J. Lu
2016-05-13 12:39                     ` Martin Liška
2016-05-13 12:44                       ` Kyrill Tkachov
2016-05-13 12:47                         ` Richard Biener
2016-05-13 12:51                           ` Martin Liška
2016-05-13 14:17                             ` H.J. Lu
2016-05-13 14:46                               ` H.J. Lu
2016-04-29 11:58 ` [PATCH 1/3] Encapsulate comp_cost within a class with methods marxin
2016-05-16 10:14   ` Bin.Cheng
2016-05-16 13:55     ` Martin Liška
2016-05-19 10:23       ` Martin Liška
2016-05-19 11:24         ` Bin.Cheng
2016-05-26 21:02           ` Martin Liška
2016-04-29 11:58 ` [PATCH 2/3] Add profiling support for IVOPTS marxin
2016-05-16 13:56   ` Martin Liška
2016-05-16 22:27     ` Bin.Cheng
2016-05-19 10:28       ` Martin Liška
2016-05-20 10:04         ` Bin.Cheng
2016-05-24 10:19         ` Bin.Cheng
2016-05-24 10:33           ` Bin.Cheng
2016-05-24 11:01           ` Bin.Cheng
2016-05-30 19:51           ` Martin Liška
2016-05-03  9:28 ` [PATCH 0/3] IVOPTS: support profiling Bin.Cheng
