public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH PR52272]Be smart when adding iv candidates
@ 2015-11-04 10:18 Bin Cheng
  2015-11-06 13:24 ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Bin Cheng @ 2015-11-04 10:18 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3488 bytes --]

Hi,
PR52272 reported a performance regression in spec2006/410.bwaves once GCC is
prevented from representing address of one memory object using address of
another memory object.  Also as I commented in that PR, we have two possible
fixes for this:
1) Improve how TMR.base is deduced, so that we can represent addr of mem obj
using another one, while not breaking PR50955.
2) Add iv candidates with base object stripped.  In this way, we use the
common base-stripped part to represent all address expressions, in the form
of [base_1 + common], [base_2 + common], ..., [base_n + common].

In terms of code generation, method 2) is at least as good as 1), actually
better in my opinion.  The problem of 2) is we need to tell when iv
candidates should be added for the common part and when shouldn't.  This
issue can be generalized and described as: We know IVO tries to add
candidates by deriving from iv uses.  One disadvantage is that candidates
are derived from iv use independently.  It doesn't take common sub
expression among different iv uses into consideration.  As a result,
candidate for common sub expression is not added, while many useless
candidates are added.

As a matter of fact, candidate derived from iv use is useful only if it's
common enough and could be shared among different uses.  A candidate is most
likely useless if it's derived from a single use and could not be shared by
others.  This patch works in this way by firstly recording all kinds
candidates derived from iv uses, then adding candidates for common ones.

The patch improves 410.bwaves by 3-4% on x86_64.  I also saw regression for
400.perlbench and small regression for 401.bzip on x86_64, but I can confirm
they are false alarms caused by align issues. 
For aarch64, fp cases are obviously improved for both spec2000 and spec2006.
Also the patch causes 2-3% regression for 459.GemsFDTD, which I think is
another irrelevant issue caused by heuristic candidate selecting algorithm.
Unfortunately, I don't have fix to it currently.

This patch may add more candidates in some cases, but generally candidates
number is smaller because we don't need to add useless candidates now.
Statistic data shows there are quite fewer loops with more than 30
candidates when building spec2k6 on x86_64 using this patch.

Bootstrap and test on x86_64.  I will re-test it against latest trunk on
AArch64.  Is it OK?

Thanks,
bin

2015-11-03  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/52272
	* tree-ssa-loop-ivopts.c (struct iv_common_cand): New struct.
	(struct iv_common_cand_hasher): New struct.
	(iv_common_cand_hasher::hash): New function.
	(iv_common_cand_hasher::equal): New function.
	(struct ivopts_data): New fields, iv_common_cand_tab and
	iv_common_cands.
	(tree_ssa_iv_optimize_init): Initialize above fields.
	(record_common_cand, common_cand_cmp): New functions.
	(add_iv_candidate_derived_from_uses): New function.
	(add_iv_candidate_for_use): Record iv_common_cands derived from
	iv use in hash table, instead of adding candidates directly.
	(add_iv_candidate_for_uses): Call
add_iv_candidate_derived_from_uses.
	(record_important_candidates): Add important candidates to iv uses'
	related_cands.  Always keep related_cands for future use.
	(try_add_cand_for): Use iv uses' related_cands.
	(free_loop_data, tree_ssa_iv_optimize_finalize): Release new fields
	in struct ivopts_data, iv_common_cand_tab and iv_common_cands.

[-- Attachment #2: pr52272-20151103.txt --]
[-- Type: text/plain, Size: 10307 bytes --]

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1f952a7..2417d4d 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -247,6 +247,45 @@ struct iv_cand
 			   smaller type.  */
 };
 
+/* Hashtable entry for common candidate derived from iv uses.  */
+struct iv_common_cand
+{
+  tree base;
+  tree step;
+  /* IV uses from which this common candidate is derived.  */
+  vec<iv_use *> uses;
+  hashval_t hash;
+};
+
+/* Hashtable helpers.  */
+
+struct iv_common_cand_hasher : free_ptr_hash <iv_common_cand>
+{
+  static inline hashval_t hash (const iv_common_cand *);
+  static inline bool equal (const iv_common_cand *, const iv_common_cand *);
+};
+
+/* Hash function for possible common candidates.  */
+
+inline hashval_t
+iv_common_cand_hasher::hash (const iv_common_cand *ccand)
+{
+  return ccand->hash;
+}
+
+/* Hash table equality function for common candidates.  */
+
+inline bool
+iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
+			   const iv_common_cand *ccand2)
+{
+  return ccand1->hash == ccand2->hash
+	 && operand_equal_p (ccand1->base, ccand2->base, 0)
+	 && operand_equal_p (ccand1->step, ccand2->step, 0)
+	 && TYPE_PRECISION (TREE_TYPE (ccand1->base))
+	      == TYPE_PRECISION (TREE_TYPE (ccand2->base));
+}
+
 /* Loop invariant expression hashtable entry.  */
 struct iv_inv_expr_ent
 {
@@ -255,8 +294,6 @@ struct iv_inv_expr_ent
   hashval_t hash;
 };
 
-/* The data used by the induction variable optimizations.  */
-
 /* Hashtable helpers.  */
 
 struct iv_inv_expr_hasher : free_ptr_hash <iv_inv_expr_ent>
@@ -323,6 +360,12 @@ struct ivopts_data
   /* Cache used by tree_to_aff_combination_expand.  */
   hash_map<tree, name_expansion *> *name_expansion_cache;
 
+  /* The hashtable of common candidates derived from iv uses.  */
+  hash_table<iv_common_cand_hasher> *iv_common_cand_tab;
+
+  /* The common candidates.  */
+  vec<iv_common_cand *> iv_common_cands;
+
   /* The maximum invariant id.  */
   unsigned max_inv_id;
 
@@ -894,6 +937,8 @@ tree_ssa_iv_optimize_init (struct ivopts_data *data)
   data->inv_expr_tab = new hash_table<iv_inv_expr_hasher> (10);
   data->inv_expr_id = 0;
   data->name_expansion_cache = NULL;
+  data->iv_common_cand_tab = new hash_table<iv_common_cand_hasher> (10);
+  data->iv_common_cands.create (20);
   decl_rtl_to_reset.create (20);
   gcc_obstack_init (&data->iv_obstack);
 }
@@ -3051,6 +3096,96 @@ add_iv_candidate_for_bivs (struct ivopts_data *data)
     }
 }
 
+/* Record common candidate {BASE, STEP} derived from USE in hashtable.  */
+
+static void
+record_common_cand (struct ivopts_data *data, tree base,
+		    tree step, struct iv_use *use)
+{
+  struct iv_common_cand ent;
+  struct iv_common_cand **slot;
+
+  gcc_assert (use != NULL);
+
+  ent.base = base;
+  ent.step = step;
+  ent.hash = iterative_hash_expr (base, 0);
+  ent.hash = iterative_hash_expr (step, ent.hash);
+
+  slot = data->iv_common_cand_tab->find_slot (&ent, INSERT);
+  if (*slot == NULL)
+    {
+      *slot = XNEW (struct iv_common_cand);
+      (*slot)->base = base;
+      (*slot)->step = step;
+      (*slot)->uses.create (8);
+      (*slot)->hash = ent.hash;
+      data->iv_common_cands.safe_push ((*slot));
+    }
+  (*slot)->uses.safe_push (use);
+  return;
+}
+
+/* Comparison function used to sort common candidates.  */
+
+static int
+common_cand_cmp (const void *p1, const void *p2)
+{
+  unsigned n1, n2;
+  const struct iv_common_cand *const *const ccand1
+    = (const struct iv_common_cand *const *)p1;
+  const struct iv_common_cand *const *const ccand2
+    = (const struct iv_common_cand *const *)p2;
+
+  n1 = (*ccand1)->uses.length ();
+  n2 = (*ccand2)->uses.length ();
+  return n2 - n1;
+}
+
+/* Adds IV candidates based on common candidated recorded.  */
+
+static void
+add_iv_candidate_derived_from_uses (struct ivopts_data *data)
+{
+  unsigned i, j;
+  struct iv_cand *cand_1, *cand_2;
+
+  data->iv_common_cands.qsort (common_cand_cmp);
+  for (i = 0; i < data->iv_common_cands.length (); i++)
+    {
+      struct iv_common_cand *ptr = data->iv_common_cands[i];
+
+      /* Only add IV candidate if it's derived from multiple uses.  */
+      if (ptr->uses.length () <= 1)
+	break;
+
+      cand_1 = NULL;
+      cand_2 = NULL;
+      if (ip_normal_pos (data->current_loop))
+	cand_1 = add_candidate_1 (data, ptr->base, ptr->step,
+				  false, IP_NORMAL, NULL, NULL);
+
+      if (ip_end_pos (data->current_loop)
+	  && allow_ip_end_pos_p (data->current_loop))
+	cand_2 = add_candidate_1 (data, ptr->base, ptr->step,
+				  false, IP_END, NULL, NULL);
+
+      /* Bind deriving uses and the new candidates.  */
+      for (j = 0; j < ptr->uses.length (); j++)
+	{
+	  struct iv_use *use = ptr->uses[j];
+	  if (cand_1)
+	    bitmap_set_bit (use->related_cands, cand_1->id);
+	  if (cand_2)
+	    bitmap_set_bit (use->related_cands, cand_2->id);
+	}
+    }
+
+  /* Release data since it is useless from this point.  */
+  data->iv_common_cand_tab->empty ();
+  data->iv_common_cands.truncate (0);
+}
+
 /* Adds candidates based on the value of USE's iv.  */
 
 static void
@@ -3063,19 +3198,59 @@ add_iv_candidate_for_use (struct ivopts_data *data, struct iv_use *use)
 
   add_candidate (data, iv->base, iv->step, false, use);
 
-  /* The same, but with initial value zero.  Make such variable important,
-     since it is generic enough so that possibly many uses may be based
-     on it.  */
+  /* Record common candidate for use in case it can be shared by others.  */
+  record_common_cand (data, iv->base, iv->step, use);
+
+  /* Record common candidate with initial value zero.  */
   basetype = TREE_TYPE (iv->base);
   if (POINTER_TYPE_P (basetype))
     basetype = sizetype;
-  add_candidate (data, build_int_cst (basetype, 0), iv->step, true, use);
+  record_common_cand (data, build_int_cst (basetype, 0), iv->step, use);
+
+  /* Record common candidate with constant offset stripped in base.  */
+    {
+      base = strip_offset (iv->base, &offset);
+      if (offset || base != iv->base)
+	record_common_cand (data, base, iv->step, use);
+    }
+
+  /* Record common candidate with base_object removed in base.  */
+  if (iv->base_object != NULL)
+    {
+      unsigned i;
+      aff_tree aff_base;
+      tree step, base_object = iv->base_object;
 
-  /* Third, try removing the constant offset.  Make sure to even
-     add a candidate for &a[0] vs. (T *)&a.  */
-  base = strip_offset (iv->base, &offset);
-  if (offset || base != iv->base)
-    add_candidate (data, base, iv->step, false, use);
+      base = iv->base;
+      step = iv->step;
+      STRIP_NOPS (base);
+      STRIP_NOPS (step);
+      STRIP_NOPS (base_object);
+      tree_to_aff_combination (base, TREE_TYPE (base), &aff_base);
+      for (i = 0; i < aff_base.n; i++)
+	{
+	  if (aff_base.elts[i].coef != 1)
+	    continue;
+
+	  if (operand_equal_p (aff_base.elts[i].val, base_object, 0))
+	    break;
+	}
+      if (i < aff_base.n)
+	{
+	  aff_combination_remove_elt (&aff_base, i);
+	  base = aff_combination_to_tree (&aff_base);
+	  basetype = TREE_TYPE (base);
+	  if (POINTER_TYPE_P (basetype))
+	    basetype = sizetype;
+
+	  step = fold_convert (basetype, step);
+	  record_common_cand (data, base, step, use);
+	  /* Also record common candidate with offset stripped.  */
+	  base = strip_offset (base, &offset);
+	  if (offset)
+	    record_common_cand (data, base, step, use);
+	}
+    }
 
   /* At last, add auto-incremental candidates.  Make such variables
      important since other iv uses with same base object may be based
@@ -3111,10 +3286,10 @@ add_iv_candidate_for_uses (struct ivopts_data *data)
 	  gcc_unreachable ();
 	}
     }
+  add_iv_candidate_derived_from_uses (data);
 }
 
-/* Record important candidates and add them to related_cands bitmaps
-   if needed.  */
+/* Record important candidates and add them to related_cands bitmaps.  */
 
 static void
 record_important_candidates (struct ivopts_data *data)
@@ -3133,22 +3308,11 @@ record_important_candidates (struct ivopts_data *data)
   data->consider_all_candidates = (n_iv_cands (data)
 				   <= CONSIDER_ALL_CANDIDATES_BOUND);
 
-  if (data->consider_all_candidates)
-    {
-      /* We will not need "related_cands" bitmaps in this case,
-	 so release them to decrease peak memory consumption.  */
-      for (i = 0; i < n_iv_uses (data); i++)
-	{
-	  use = iv_use (data, i);
-	  BITMAP_FREE (use->related_cands);
-	}
-    }
-  else
+  /* Add important candidates to uses' related_cands bitmaps.  */
+  for (i = 0; i < n_iv_uses (data); i++)
     {
-      /* Add important candidates to the related_cands bitmaps.  */
-      for (i = 0; i < n_iv_uses (data); i++)
-	bitmap_ior_into (iv_use (data, i)->related_cands,
-			 data->important_candidates);
+      use = iv_use (data, i);
+      bitmap_ior_into (use->related_cands, data->important_candidates);
     }
 }
 
@@ -6519,7 +6683,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
      too many ivs.  The approach from few ivs to more seems more likely to be
      successful -- starting from few ivs, replacing an expensive use by a
      specific iv should always be a win.  */
-  EXECUTE_IF_SET_IN_BITMAP (data->important_candidates, 0, i, bi)
+  EXECUTE_IF_SET_IN_BITMAP (use->related_cands, 0, i, bi)
     {
       cand = iv_cand (data, i);
 
@@ -7428,6 +7592,9 @@ free_loop_data (struct ivopts_data *data)
 
   data->inv_expr_tab->empty ();
   data->inv_expr_id = 0;
+
+  data->iv_common_cand_tab->empty ();
+  data->iv_common_cands.truncate (0);
 }
 
 /* Finalizes data structures used by the iv optimization pass.  LOOPS is the
@@ -7447,6 +7614,9 @@ tree_ssa_iv_optimize_finalize (struct ivopts_data *data)
   delete data->inv_expr_tab;
   data->inv_expr_tab = NULL;
   free_affine_expand_cache (&data->name_expansion_cache);
+  delete data->iv_common_cand_tab;
+  data->iv_common_cand_tab = NULL;
+  data->iv_common_cands.release ();
   obstack_free (&data->iv_obstack, NULL);
 }
 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH PR52272]Be smart when adding iv candidates
  2015-11-04 10:18 [PATCH PR52272]Be smart when adding iv candidates Bin Cheng
@ 2015-11-06 13:24 ` Richard Biener
  2015-11-08  2:59   ` Bin.Cheng
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2015-11-06 13:24 UTC (permalink / raw)
  To: Bin Cheng; +Cc: GCC Patches

On Wed, Nov 4, 2015 at 11:18 AM, Bin Cheng <bin.cheng@arm.com> wrote:
> Hi,
> PR52272 reported a performance regression in spec2006/410.bwaves once GCC is
> prevented from representing address of one memory object using address of
> another memory object.  Also as I commented in that PR, we have two possible
> fixes for this:
> 1) Improve how TMR.base is deduced, so that we can represent addr of mem obj
> using another one, while not breaking PR50955.
> 2) Add iv candidates with base object stripped.  In this way, we use the
> common base-stripped part to represent all address expressions, in the form
> of [base_1 + common], [base_2 + common], ..., [base_n + common].
>
> In terms of code generation, method 2) is at least as good as 1), actually
> better in my opinion.  The problem of 2) is we need to tell when iv
> candidates should be added for the common part and when shouldn't.  This
> issue can be generalized and described as: We know IVO tries to add
> candidates by deriving from iv uses.  One disadvantage is that candidates
> are derived from iv use independently.  It doesn't take common sub
> expression among different iv uses into consideration.  As a result,
> candidate for common sub expression is not added, while many useless
> candidates are added.
>
> As a matter of fact, candidate derived from iv use is useful only if it's
> common enough and could be shared among different uses.  A candidate is most
> likely useless if it's derived from a single use and could not be shared by
> others.  This patch works in this way by firstly recording all kinds
> candidates derived from iv uses, then adding candidates for common ones.
>
> The patch improves 410.bwaves by 3-4% on x86_64.  I also saw regression for
> 400.perlbench and small regression for 401.bzip on x86_64, but I can confirm
> they are false alarms caused by align issues.
> For aarch64, fp cases are obviously improved for both spec2000 and spec2006.
> Also the patch causes 2-3% regression for 459.GemsFDTD, which I think is
> another irrelevant issue caused by heuristic candidate selecting algorithm.
> Unfortunately, I don't have fix to it currently.
>
> This patch may add more candidates in some cases, but generally candidates
> number is smaller because we don't need to add useless candidates now.
> Statistic data shows there are quite fewer loops with more than 30
> candidates when building spec2k6 on x86_64 using this patch.
>
> Bootstrap and test on x86_64.  I will re-test it against latest trunk on
> AArch64.  Is it OK?

+inline bool
+iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
+                          const iv_common_cand *ccand2)
+{
+  return ccand1->hash == ccand2->hash
+        && operand_equal_p (ccand1->base, ccand2->base, 0)
+        && operand_equal_p (ccand1->step, ccand2->step, 0)
+        && TYPE_PRECISION (TREE_TYPE (ccand1->base))
+             == TYPE_PRECISION (TREE_TYPE (ccand2->base));

I'm wondering on the TYPE_PRECISION check.  a) why is that needed?
and b) what kind of tree is base so that it is safe to inspect TYPE_PRECISION
unconditionally?

+  slot = data->iv_common_cand_tab->find_slot (&ent, INSERT);
+  if (*slot == NULL)
+    {
+      *slot = XNEW (struct iv_common_cand);

allocate from the IV obstack instead?  I see we do a lot of heap allocations
in IVOPTs, so we can improve that as followup as well.

We probably should empty the obstack after each processed loop.

Thanks,
Richard.


> Thanks,
> bin
>
> 2015-11-03  Bin Cheng  <bin.cheng@arm.com>
>
>         PR tree-optimization/52272
>         * tree-ssa-loop-ivopts.c (struct iv_common_cand): New struct.
>         (struct iv_common_cand_hasher): New struct.
>         (iv_common_cand_hasher::hash): New function.
>         (iv_common_cand_hasher::equal): New function.
>         (struct ivopts_data): New fields, iv_common_cand_tab and
>         iv_common_cands.
>         (tree_ssa_iv_optimize_init): Initialize above fields.
>         (record_common_cand, common_cand_cmp): New functions.
>         (add_iv_candidate_derived_from_uses): New function.
>         (add_iv_candidate_for_use): Record iv_common_cands derived from
>         iv use in hash table, instead of adding candidates directly.
>         (add_iv_candidate_for_uses): Call
> add_iv_candidate_derived_from_uses.
>         (record_important_candidates): Add important candidates to iv uses'
>         related_cands.  Always keep related_cands for future use.
>         (try_add_cand_for): Use iv uses' related_cands.
>         (free_loop_data, tree_ssa_iv_optimize_finalize): Release new fields
>         in struct ivopts_data, iv_common_cand_tab and iv_common_cands.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH PR52272]Be smart when adding iv candidates
  2015-11-06 13:24 ` Richard Biener
@ 2015-11-08  2:59   ` Bin.Cheng
  2015-11-08  9:11     ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Bin.Cheng @ 2015-11-08  2:59 UTC (permalink / raw)
  To: Richard Biener; +Cc: Bin Cheng, GCC Patches

On Fri, Nov 6, 2015 at 9:24 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Wed, Nov 4, 2015 at 11:18 AM, Bin Cheng <bin.cheng@arm.com> wrote:
>> Hi,
>> PR52272 reported a performance regression in spec2006/410.bwaves once GCC is
>> prevented from representing address of one memory object using address of
>> another memory object.  Also as I commented in that PR, we have two possible
>> fixes for this:
>> 1) Improve how TMR.base is deduced, so that we can represent addr of mem obj
>> using another one, while not breaking PR50955.
>> 2) Add iv candidates with base object stripped.  In this way, we use the
>> common base-stripped part to represent all address expressions, in the form
>> of [base_1 + common], [base_2 + common], ..., [base_n + common].
>>
>> In terms of code generation, method 2) is at least as good as 1), actually
>> better in my opinion.  The problem of 2) is we need to tell when iv
>> candidates should be added for the common part and when shouldn't.  This
>> issue can be generalized and described as: We know IVO tries to add
>> candidates by deriving from iv uses.  One disadvantage is that candidates
>> are derived from iv use independently.  It doesn't take common sub
>> expression among different iv uses into consideration.  As a result,
>> candidate for common sub expression is not added, while many useless
>> candidates are added.
>>
>> As a matter of fact, candidate derived from iv use is useful only if it's
>> common enough and could be shared among different uses.  A candidate is most
>> likely useless if it's derived from a single use and could not be shared by
>> others.  This patch works in this way by firstly recording all kinds
>> candidates derived from iv uses, then adding candidates for common ones.
>>
>> The patch improves 410.bwaves by 3-4% on x86_64.  I also saw regression for
>> 400.perlbench and small regression for 401.bzip on x86_64, but I can confirm
>> they are false alarms caused by align issues.
>> For aarch64, fp cases are obviously improved for both spec2000 and spec2006.
>> Also the patch causes 2-3% regression for 459.GemsFDTD, which I think is
>> another irrelevant issue caused by heuristic candidate selecting algorithm.
>> Unfortunately, I don't have fix to it currently.
>>
>> This patch may add more candidates in some cases, but generally candidates
>> number is smaller because we don't need to add useless candidates now.
>> Statistic data shows there are quite fewer loops with more than 30
>> candidates when building spec2k6 on x86_64 using this patch.
>>
>> Bootstrap and test on x86_64.  I will re-test it against latest trunk on
>> AArch64.  Is it OK?
>
> +inline bool
> +iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
> +                          const iv_common_cand *ccand2)
> +{
> +  return ccand1->hash == ccand2->hash
> +        && operand_equal_p (ccand1->base, ccand2->base, 0)
> +        && operand_equal_p (ccand1->step, ccand2->step, 0)
> +        && TYPE_PRECISION (TREE_TYPE (ccand1->base))
> +             == TYPE_PRECISION (TREE_TYPE (ccand2->base));
>
Hi Richard,
Thanks for reviewing.

> I'm wondering on the TYPE_PRECISION check.  a) why is that needed?
Because operand_equal_p doesn't check type precision for constant int
nodes, and IVO needs to take precision into consideration.

> and b) what kind of tree is base so that it is safe to inspect TYPE_PRECISION
> unconditionally?
Both SCEV and IVO work on expressions with type satisfying
POINTER_TYPE_P or INTEGRAL_TYPE_P, so it's safe to access precision
unconditionally?

>
> +  slot = data->iv_common_cand_tab->find_slot (&ent, INSERT);
> +  if (*slot == NULL)
> +    {
> +      *slot = XNEW (struct iv_common_cand);
>
> allocate from the IV obstack instead?  I see we do a lot of heap allocations
> in IVOPTs, so we can improve that as followup as well.
>
Yes, small structures in IVO like iv, iv_use, iv_cand, iv_common_cand
are better to be allocated in obstack.  Actually I have already make
that change to struct iv.  others will be followup too.

Thanks,
bin
> We probably should empty the obstack after each processed loop.
>
> Thanks,
> Richard.
>
>
>> Thanks,
>> bin
>>
>> 2015-11-03  Bin Cheng  <bin.cheng@arm.com>
>>
>>         PR tree-optimization/52272
>>         * tree-ssa-loop-ivopts.c (struct iv_common_cand): New struct.
>>         (struct iv_common_cand_hasher): New struct.
>>         (iv_common_cand_hasher::hash): New function.
>>         (iv_common_cand_hasher::equal): New function.
>>         (struct ivopts_data): New fields, iv_common_cand_tab and
>>         iv_common_cands.
>>         (tree_ssa_iv_optimize_init): Initialize above fields.
>>         (record_common_cand, common_cand_cmp): New functions.
>>         (add_iv_candidate_derived_from_uses): New function.
>>         (add_iv_candidate_for_use): Record iv_common_cands derived from
>>         iv use in hash table, instead of adding candidates directly.
>>         (add_iv_candidate_for_uses): Call
>> add_iv_candidate_derived_from_uses.
>>         (record_important_candidates): Add important candidates to iv uses'
>>         related_cands.  Always keep related_cands for future use.
>>         (try_add_cand_for): Use iv uses' related_cands.
>>         (free_loop_data, tree_ssa_iv_optimize_finalize): Release new fields
>>         in struct ivopts_data, iv_common_cand_tab and iv_common_cands.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH PR52272]Be smart when adding iv candidates
  2015-11-08  2:59   ` Bin.Cheng
@ 2015-11-08  9:11     ` Richard Biener
  2015-11-09 15:24       ` Bernd Schmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2015-11-08  9:11 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Bin Cheng, GCC Patches

On November 8, 2015 3:58:57 AM GMT+01:00, "Bin.Cheng" <amker.cheng@gmail.com> wrote:
>On Fri, Nov 6, 2015 at 9:24 PM, Richard Biener
><richard.guenther@gmail.com> wrote:
>> On Wed, Nov 4, 2015 at 11:18 AM, Bin Cheng <bin.cheng@arm.com> wrote:
>>> Hi,
>>> PR52272 reported a performance regression in spec2006/410.bwaves
>once GCC is
>>> prevented from representing address of one memory object using
>address of
>>> another memory object.  Also as I commented in that PR, we have two
>possible
>>> fixes for this:
>>> 1) Improve how TMR.base is deduced, so that we can represent addr of
>mem obj
>>> using another one, while not breaking PR50955.
>>> 2) Add iv candidates with base object stripped.  In this way, we use
>the
>>> common base-stripped part to represent all address expressions, in
>the form
>>> of [base_1 + common], [base_2 + common], ..., [base_n + common].
>>>
>>> In terms of code generation, method 2) is at least as good as 1),
>actually
>>> better in my opinion.  The problem of 2) is we need to tell when iv
>>> candidates should be added for the common part and when shouldn't. 
>This
>>> issue can be generalized and described as: We know IVO tries to add
>>> candidates by deriving from iv uses.  One disadvantage is that
>candidates
>>> are derived from iv use independently.  It doesn't take common sub
>>> expression among different iv uses into consideration.  As a result,
>>> candidate for common sub expression is not added, while many useless
>>> candidates are added.
>>>
>>> As a matter of fact, candidate derived from iv use is useful only if
>it's
>>> common enough and could be shared among different uses.  A candidate
>is most
>>> likely useless if it's derived from a single use and could not be
>shared by
>>> others.  This patch works in this way by firstly recording all kinds
>>> candidates derived from iv uses, then adding candidates for common
>ones.
>>>
>>> The patch improves 410.bwaves by 3-4% on x86_64.  I also saw
>regression for
>>> 400.perlbench and small regression for 401.bzip on x86_64, but I can
>confirm
>>> they are false alarms caused by align issues.
>>> For aarch64, fp cases are obviously improved for both spec2000 and
>spec2006.
>>> Also the patch causes 2-3% regression for 459.GemsFDTD, which I
>think is
>>> another irrelevant issue caused by heuristic candidate selecting
>algorithm.
>>> Unfortunately, I don't have fix to it currently.
>>>
>>> This patch may add more candidates in some cases, but generally
>candidates
>>> number is smaller because we don't need to add useless candidates
>now.
>>> Statistic data shows there are quite fewer loops with more than 30
>>> candidates when building spec2k6 on x86_64 using this patch.
>>>
>>> Bootstrap and test on x86_64.  I will re-test it against latest
>trunk on
>>> AArch64.  Is it OK?
>>
>> +inline bool
>> +iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
>> +                          const iv_common_cand *ccand2)
>> +{
>> +  return ccand1->hash == ccand2->hash
>> +        && operand_equal_p (ccand1->base, ccand2->base, 0)
>> +        && operand_equal_p (ccand1->step, ccand2->step, 0)
>> +        && TYPE_PRECISION (TREE_TYPE (ccand1->base))
>> +             == TYPE_PRECISION (TREE_TYPE (ccand2->base));
>>
>Hi Richard,
>Thanks for reviewing.
>
>> I'm wondering on the TYPE_PRECISION check.  a) why is that needed?
>Because operand_equal_p doesn't check type precision for constant int
>nodes, and IVO needs to take precision into consideration.

Ok

>> and b) what kind of tree is base so that it is safe to inspect
>TYPE_PRECISION
>> unconditionally?
>Both SCEV and IVO work on expressions with type satisfying
>POINTER_TYPE_P or INTEGRAL_TYPE_P, so it's safe to access precision
>unconditionally?

Yes.  Patch is OK then.

Richard.
>
>>
>> +  slot = data->iv_common_cand_tab->find_slot (&ent, INSERT);
>> +  if (*slot == NULL)
>> +    {
>> +      *slot = XNEW (struct iv_common_cand);
>>
>> allocate from the IV obstack instead?  I see we do a lot of heap
>allocations
>> in IVOPTs, so we can improve that as followup as well.
>>
>Yes, small structures in IVO like iv, iv_use, iv_cand, iv_common_cand
>are better to be allocated in obstack.  Actually I have already make
>that change to struct iv.  others will be followup too.
>
>Thanks,
>bin
>> We probably should empty the obstack after each processed loop.
>>
>> Thanks,
>> Richard.
>>
>>
>>> Thanks,
>>> bin
>>>
>>> 2015-11-03  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>         PR tree-optimization/52272
>>>         * tree-ssa-loop-ivopts.c (struct iv_common_cand): New
>struct.
>>>         (struct iv_common_cand_hasher): New struct.
>>>         (iv_common_cand_hasher::hash): New function.
>>>         (iv_common_cand_hasher::equal): New function.
>>>         (struct ivopts_data): New fields, iv_common_cand_tab and
>>>         iv_common_cands.
>>>         (tree_ssa_iv_optimize_init): Initialize above fields.
>>>         (record_common_cand, common_cand_cmp): New functions.
>>>         (add_iv_candidate_derived_from_uses): New function.
>>>         (add_iv_candidate_for_use): Record iv_common_cands derived
>from
>>>         iv use in hash table, instead of adding candidates directly.
>>>         (add_iv_candidate_for_uses): Call
>>> add_iv_candidate_derived_from_uses.
>>>         (record_important_candidates): Add important candidates to
>iv uses'
>>>         related_cands.  Always keep related_cands for future use.
>>>         (try_add_cand_for): Use iv uses' related_cands.
>>>         (free_loop_data, tree_ssa_iv_optimize_finalize): Release new
>fields
>>>         in struct ivopts_data, iv_common_cand_tab and
>iv_common_cands.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH PR52272]Be smart when adding iv candidates
  2015-11-08  9:11     ` Richard Biener
@ 2015-11-09 15:24       ` Bernd Schmidt
  2015-11-10  1:26         ` Bin.Cheng
  0 siblings, 1 reply; 11+ messages in thread
From: Bernd Schmidt @ 2015-11-09 15:24 UTC (permalink / raw)
  To: Richard Biener, Bin.Cheng; +Cc: Bin Cheng, GCC Patches

On 11/08/2015 10:11 AM, Richard Biener wrote:
> On November 8, 2015 3:58:57 AM GMT+01:00, "Bin.Cheng" <amker.cheng@gmail.com> wrote:
>>> +inline bool
>>> +iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
>>> +                          const iv_common_cand *ccand2)
>>> +{
>>> +  return ccand1->hash == ccand2->hash
>>> +        && operand_equal_p (ccand1->base, ccand2->base, 0)
>>> +        && operand_equal_p (ccand1->step, ccand2->step, 0)
>>> +        && TYPE_PRECISION (TREE_TYPE (ccand1->base))
>>> +             == TYPE_PRECISION (TREE_TYPE (ccand2->base));
>>>
> Yes.  Patch is OK then.

Doesn't follow the formatting rules though in the quoted piece.


Bernd

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH PR52272]Be smart when adding iv candidates
  2015-11-09 15:24       ` Bernd Schmidt
@ 2015-11-10  1:26         ` Bin.Cheng
  2015-11-10  8:25           ` Bin.Cheng
  0 siblings, 1 reply; 11+ messages in thread
From: Bin.Cheng @ 2015-11-10  1:26 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Richard Biener, Bin Cheng, GCC Patches

On Mon, Nov 9, 2015 at 11:24 PM, Bernd Schmidt <bschmidt@redhat.com> wrote:
> On 11/08/2015 10:11 AM, Richard Biener wrote:
>>
>> On November 8, 2015 3:58:57 AM GMT+01:00, "Bin.Cheng"
>> <amker.cheng@gmail.com> wrote:
>>>>
>>>> +inline bool
>>>> +iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
>>>> +                          const iv_common_cand *ccand2)
>>>> +{
>>>> +  return ccand1->hash == ccand2->hash
>>>> +        && operand_equal_p (ccand1->base, ccand2->base, 0)
>>>> +        && operand_equal_p (ccand1->step, ccand2->step, 0)
>>>> +        && TYPE_PRECISION (TREE_TYPE (ccand1->base))
>>>> +             == TYPE_PRECISION (TREE_TYPE (ccand2->base));
>>>>
>> Yes.  Patch is OK then.
>
>
> Doesn't follow the formatting rules though in the quoted piece.

Hi Bernd,
Thanks for reviewing.  I haven't committed it yet, could you please
point out which quoted piece is so that I can update patch?

Thanks,
bin
>
>
> Bernd
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH PR52272]Be smart when adding iv candidates
  2015-11-10  1:26         ` Bin.Cheng
@ 2015-11-10  8:25           ` Bin.Cheng
  2015-11-10 10:06             ` Bernd Schmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Bin.Cheng @ 2015-11-10  8:25 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Richard Biener, Bin Cheng, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1183 bytes --]

On Tue, Nov 10, 2015 at 9:26 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Mon, Nov 9, 2015 at 11:24 PM, Bernd Schmidt <bschmidt@redhat.com> wrote:
>> On 11/08/2015 10:11 AM, Richard Biener wrote:
>>>
>>> On November 8, 2015 3:58:57 AM GMT+01:00, "Bin.Cheng"
>>> <amker.cheng@gmail.com> wrote:
>>>>>
>>>>> +inline bool
>>>>> +iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
>>>>> +                          const iv_common_cand *ccand2)
>>>>> +{
>>>>> +  return ccand1->hash == ccand2->hash
>>>>> +        && operand_equal_p (ccand1->base, ccand2->base, 0)
>>>>> +        && operand_equal_p (ccand1->step, ccand2->step, 0)
>>>>> +        && TYPE_PRECISION (TREE_TYPE (ccand1->base))
>>>>> +             == TYPE_PRECISION (TREE_TYPE (ccand2->base));
>>>>>
>>> Yes.  Patch is OK then.
>>
>>
>> Doesn't follow the formatting rules though in the quoted piece.
>
> Hi Bernd,
> Thanks for reviewing.  I haven't committed it yet, could you please
> point out which quoted piece is so that I can update patch?
Ah, the part quoted in review message, I was stupid and tried to find
quoted part in my patch...  I can see the problem now, here is the
updated patch.

Thanks,
bin

[-- Attachment #2: pr52272-20151104.txt --]
[-- Type: text/plain, Size: 9994 bytes --]

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1f952a7..aecba12 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -247,6 +247,45 @@ struct iv_cand
 			   smaller type.  */
 };
 
+/* Hashtable entry for common candidate derived from iv uses.  */
+struct iv_common_cand
+{
+  tree base;
+  tree step;
+  /* IV uses from which this common candidate is derived.  */
+  vec<iv_use *> uses;
+  hashval_t hash;
+};
+
+/* Hashtable helpers.  */
+
+struct iv_common_cand_hasher : free_ptr_hash <iv_common_cand>
+{
+  static inline hashval_t hash (const iv_common_cand *);
+  static inline bool equal (const iv_common_cand *, const iv_common_cand *);
+};
+
+/* Hash function for possible common candidates.  */
+
+inline hashval_t
+iv_common_cand_hasher::hash (const iv_common_cand *ccand)
+{
+  return ccand->hash;
+}
+
+/* Hash table equality function for common candidates.  */
+
+inline bool
+iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
+			      const iv_common_cand *ccand2)
+{
+  return ccand1->hash == ccand2->hash
+	 && operand_equal_p (ccand1->base, ccand2->base, 0)
+	 && operand_equal_p (ccand1->step, ccand2->step, 0)
+	 && TYPE_PRECISION (TREE_TYPE (ccand1->base))
+	      == TYPE_PRECISION (TREE_TYPE (ccand2->base));
+}
+
 /* Loop invariant expression hashtable entry.  */
 struct iv_inv_expr_ent
 {
@@ -255,8 +294,6 @@ struct iv_inv_expr_ent
   hashval_t hash;
 };
 
-/* The data used by the induction variable optimizations.  */
-
 /* Hashtable helpers.  */
 
 struct iv_inv_expr_hasher : free_ptr_hash <iv_inv_expr_ent>
@@ -323,6 +360,12 @@ struct ivopts_data
   /* Cache used by tree_to_aff_combination_expand.  */
   hash_map<tree, name_expansion *> *name_expansion_cache;
 
+  /* The hashtable of common candidates derived from iv uses.  */
+  hash_table<iv_common_cand_hasher> *iv_common_cand_tab;
+
+  /* The common candidates.  */
+  vec<iv_common_cand *> iv_common_cands;
+
   /* The maximum invariant id.  */
   unsigned max_inv_id;
 
@@ -894,6 +937,8 @@ tree_ssa_iv_optimize_init (struct ivopts_data *data)
   data->inv_expr_tab = new hash_table<iv_inv_expr_hasher> (10);
   data->inv_expr_id = 0;
   data->name_expansion_cache = NULL;
+  data->iv_common_cand_tab = new hash_table<iv_common_cand_hasher> (10);
+  data->iv_common_cands.create (20);
   decl_rtl_to_reset.create (20);
   gcc_obstack_init (&data->iv_obstack);
 }
@@ -3051,6 +3096,96 @@ add_iv_candidate_for_bivs (struct ivopts_data *data)
     }
 }
 
+/* Record common candidate {BASE, STEP} derived from USE in hashtable.  */
+
+static void
+record_common_cand (struct ivopts_data *data, tree base,
+		    tree step, struct iv_use *use)
+{
+  struct iv_common_cand ent;
+  struct iv_common_cand **slot;
+
+  gcc_assert (use != NULL);
+
+  ent.base = base;
+  ent.step = step;
+  ent.hash = iterative_hash_expr (base, 0);
+  ent.hash = iterative_hash_expr (step, ent.hash);
+
+  slot = data->iv_common_cand_tab->find_slot (&ent, INSERT);
+  if (*slot == NULL)
+    {
+      *slot = XNEW (struct iv_common_cand);
+      (*slot)->base = base;
+      (*slot)->step = step;
+      (*slot)->uses.create (8);
+      (*slot)->hash = ent.hash;
+      data->iv_common_cands.safe_push ((*slot));
+    }
+  (*slot)->uses.safe_push (use);
+  return;
+}
+
+/* Comparison function used to sort common candidates.  */
+
+static int
+common_cand_cmp (const void *p1, const void *p2)
+{
+  unsigned n1, n2;
+  const struct iv_common_cand *const *const ccand1
+    = (const struct iv_common_cand *const *)p1;
+  const struct iv_common_cand *const *const ccand2
+    = (const struct iv_common_cand *const *)p2;
+
+  n1 = (*ccand1)->uses.length ();
+  n2 = (*ccand2)->uses.length ();
+  return n2 - n1;
+}
+
+/* Adds IV candidates based on common candidated recorded.  */
+
+static void
+add_iv_candidate_derived_from_uses (struct ivopts_data *data)
+{
+  unsigned i, j;
+  struct iv_cand *cand_1, *cand_2;
+
+  data->iv_common_cands.qsort (common_cand_cmp);
+  for (i = 0; i < data->iv_common_cands.length (); i++)
+    {
+      struct iv_common_cand *ptr = data->iv_common_cands[i];
+
+      /* Only add IV candidate if it's derived from multiple uses.  */
+      if (ptr->uses.length () <= 1)
+	break;
+
+      cand_1 = NULL;
+      cand_2 = NULL;
+      if (ip_normal_pos (data->current_loop))
+	cand_1 = add_candidate_1 (data, ptr->base, ptr->step,
+				  false, IP_NORMAL, NULL, NULL);
+
+      if (ip_end_pos (data->current_loop)
+	  && allow_ip_end_pos_p (data->current_loop))
+	cand_2 = add_candidate_1 (data, ptr->base, ptr->step,
+				  false, IP_END, NULL, NULL);
+
+      /* Bind deriving uses and the new candidates.  */
+      for (j = 0; j < ptr->uses.length (); j++)
+	{
+	  struct iv_use *use = ptr->uses[j];
+	  if (cand_1)
+	    bitmap_set_bit (use->related_cands, cand_1->id);
+	  if (cand_2)
+	    bitmap_set_bit (use->related_cands, cand_2->id);
+	}
+    }
+
+  /* Release data since it is useless from this point.  */
+  data->iv_common_cand_tab->empty ();
+  data->iv_common_cands.truncate (0);
+}
+
 /* Adds candidates based on the value of USE's iv.  */
 
 static void
@@ -3063,19 +3198,59 @@ add_iv_candidate_for_use (struct ivopts_data *data, struct iv_use *use)
 
   add_candidate (data, iv->base, iv->step, false, use);
 
-  /* The same, but with initial value zero.  Make such variable important,
-     since it is generic enough so that possibly many uses may be based
-     on it.  */
+  /* Record common candidate for use in case it can be shared by others.  */
+  record_common_cand (data, iv->base, iv->step, use);
+
+  /* Record common candidate with initial value zero.  */
   basetype = TREE_TYPE (iv->base);
   if (POINTER_TYPE_P (basetype))
     basetype = sizetype;
-  add_candidate (data, build_int_cst (basetype, 0), iv->step, true, use);
+  record_common_cand (data, build_int_cst (basetype, 0), iv->step, use);
+
+  /* Record common candidate with constant offset stripped in base.  */
+    {
+      base = strip_offset (iv->base, &offset);
+      if (offset || base != iv->base)
+	record_common_cand (data, base, iv->step, use);
+    }
+
+  /* Record common candidate with base_object removed in base.  */
+  if (iv->base_object != NULL)
+    {
+      unsigned i;
+      aff_tree aff_base;
+      tree step, base_object = iv->base_object;
 
-  /* Third, try removing the constant offset.  Make sure to even
-     add a candidate for &a[0] vs. (T *)&a.  */
-  base = strip_offset (iv->base, &offset);
-  if (offset || base != iv->base)
-    add_candidate (data, base, iv->step, false, use);
+      base = iv->base;
+      step = iv->step;
+      STRIP_NOPS (base);
+      STRIP_NOPS (step);
+      STRIP_NOPS (base_object);
+      tree_to_aff_combination (base, TREE_TYPE (base), &aff_base);
+      for (i = 0; i < aff_base.n; i++)
+	{
+	  if (aff_base.elts[i].coef != 1)
+	    continue;
+
+	  if (operand_equal_p (aff_base.elts[i].val, base_object, 0))
+	    break;
+	}
+      if (i < aff_base.n)
+	{
+	  aff_combination_remove_elt (&aff_base, i);
+	  base = aff_combination_to_tree (&aff_base);
+	  basetype = TREE_TYPE (base);
+	  if (POINTER_TYPE_P (basetype))
+	    basetype = sizetype;
+
+	  step = fold_convert (basetype, step);
+	  record_common_cand (data, base, step, use);
+	  /* Also record common candidate with offset stripped.  */
+	  base = strip_offset (base, &offset);
+	  if (offset)
+	    record_common_cand (data, base, step, use);
+	}
+    }
 
   /* At last, add auto-incremental candidates.  Make such variables
      important since other iv uses with same base object may be based
@@ -3111,10 +3286,10 @@ add_iv_candidate_for_uses (struct ivopts_data *data)
 	  gcc_unreachable ();
 	}
     }
+  add_iv_candidate_derived_from_uses (data);
 }
 
-/* Record important candidates and add them to related_cands bitmaps
-   if needed.  */
+/* Record important candidates and add them to related_cands bitmaps.  */
 
 static void
 record_important_candidates (struct ivopts_data *data)
@@ -3133,22 +3308,11 @@ record_important_candidates (struct ivopts_data *data)
   data->consider_all_candidates = (n_iv_cands (data)
 				   <= CONSIDER_ALL_CANDIDATES_BOUND);
 
-  if (data->consider_all_candidates)
-    {
-      /* We will not need "related_cands" bitmaps in this case,
-	 so release them to decrease peak memory consumption.  */
-      for (i = 0; i < n_iv_uses (data); i++)
-	{
-	  use = iv_use (data, i);
-	  BITMAP_FREE (use->related_cands);
-	}
-    }
-  else
+  /* Add important candidates to uses' related_cands bitmaps.  */
+  for (i = 0; i < n_iv_uses (data); i++)
     {
-      /* Add important candidates to the related_cands bitmaps.  */
-      for (i = 0; i < n_iv_uses (data); i++)
-	bitmap_ior_into (iv_use (data, i)->related_cands,
-			 data->important_candidates);
+      use = iv_use (data, i);
+      bitmap_ior_into (use->related_cands, data->important_candidates);
     }
 }
 
@@ -6519,7 +6683,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
      too many ivs.  The approach from few ivs to more seems more likely to be
      successful -- starting from few ivs, replacing an expensive use by a
      specific iv should always be a win.  */
-  EXECUTE_IF_SET_IN_BITMAP (data->important_candidates, 0, i, bi)
+  EXECUTE_IF_SET_IN_BITMAP (use->related_cands, 0, i, bi)
     {
       cand = iv_cand (data, i);
 
@@ -7428,6 +7592,9 @@ free_loop_data (struct ivopts_data *data)
 
   data->inv_expr_tab->empty ();
   data->inv_expr_id = 0;
+
+  data->iv_common_cand_tab->empty ();
+  data->iv_common_cands.truncate (0);
 }
 
 /* Finalizes data structures used by the iv optimization pass.  LOOPS is the
@@ -7447,6 +7614,9 @@ tree_ssa_iv_optimize_finalize (struct ivopts_data *data)
   delete data->inv_expr_tab;
   data->inv_expr_tab = NULL;
   free_affine_expand_cache (&data->name_expansion_cache);
+  delete data->iv_common_cand_tab;
+  data->iv_common_cand_tab = NULL;
+  data->iv_common_cands.release ();
   obstack_free (&data->iv_obstack, NULL);
 }
 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH PR52272]Be smart when adding iv candidates
  2015-11-10  8:25           ` Bin.Cheng
@ 2015-11-10 10:06             ` Bernd Schmidt
  2015-11-10 10:19               ` Bin.Cheng
  0 siblings, 1 reply; 11+ messages in thread
From: Bernd Schmidt @ 2015-11-10 10:06 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Richard Biener, Bin Cheng, GCC Patches

On 11/10/2015 09:25 AM, Bin.Cheng wrote:
>> Thanks for reviewing.  I haven't committed it yet, could you please
>> point out which quoted piece is so that I can update patch?

Sorry, I thought it was pretty obvious...

> +{
> +  return ccand1->hash == ccand2->hash
> +	 && operand_equal_p (ccand1->base, ccand2->base, 0)
> +	 && operand_equal_p (ccand1->step, ccand2->step, 0)
> +	 && TYPE_PRECISION (TREE_TYPE (ccand1->base))
> +	      == TYPE_PRECISION (TREE_TYPE (ccand2->base));
> +}
> +

Multi-line expressions should be wrapped in parentheses so that 
emacs/indent can format them automatically. Two sets of parens are 
needed for this. Operators should then line up appropriately.


Bernd

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH PR52272]Be smart when adding iv candidates
  2015-11-10 10:06             ` Bernd Schmidt
@ 2015-11-10 10:19               ` Bin.Cheng
  2015-11-18 15:50                 ` Bernd Schmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Bin.Cheng @ 2015-11-10 10:19 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Richard Biener, Bin Cheng, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 916 bytes --]

On Tue, Nov 10, 2015 at 6:06 PM, Bernd Schmidt <bschmidt@redhat.com> wrote:
> On 11/10/2015 09:25 AM, Bin.Cheng wrote:
>>>
>>> Thanks for reviewing.  I haven't committed it yet, could you please
>>> point out which quoted piece is so that I can update patch?
>
>
> Sorry, I thought it was pretty obvious...
>
>> +{
>> +  return ccand1->hash == ccand2->hash
>> +        && operand_equal_p (ccand1->base, ccand2->base, 0)
>> +        && operand_equal_p (ccand1->step, ccand2->step, 0)
>> +        && TYPE_PRECISION (TREE_TYPE (ccand1->base))
>> +             == TYPE_PRECISION (TREE_TYPE (ccand2->base));
>> +}
>> +
>
>
> Multi-line expressions should be wrapped in parentheses so that emacs/indent
> can format them automatically. Two sets of parens are needed for this.
> Operators should then line up appropriately.
Ah, thanks for teaching.  Here is the updated patch, hoping it's correct.

Thanks,
bin
>
>
> Bernd

[-- Attachment #2: pr52272-20151104.txt --]
[-- Type: text/plain, Size: 10001 bytes --]

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1f952a7..a00e33c 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -247,6 +247,45 @@ struct iv_cand
 			   smaller type.  */
 };
 
+/* Hashtable entry for common candidate derived from iv uses.  */
+struct iv_common_cand
+{
+  tree base;
+  tree step;
+  /* IV uses from which this common candidate is derived.  */
+  vec<iv_use *> uses;
+  hashval_t hash;
+};
+
+/* Hashtable helpers.  */
+
+struct iv_common_cand_hasher : free_ptr_hash <iv_common_cand>
+{
+  static inline hashval_t hash (const iv_common_cand *);
+  static inline bool equal (const iv_common_cand *, const iv_common_cand *);
+};
+
+/* Hash function for possible common candidates.  */
+
+inline hashval_t
+iv_common_cand_hasher::hash (const iv_common_cand *ccand)
+{
+  return ccand->hash;
+}
+
+/* Hash table equality function for common candidates.  */
+
+inline bool
+iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
+			      const iv_common_cand *ccand2)
+{
+  return (ccand1->hash == ccand2->hash
+	  && operand_equal_p (ccand1->base, ccand2->base, 0)
+	  && operand_equal_p (ccand1->step, ccand2->step, 0)
+	  && (TYPE_PRECISION (TREE_TYPE (ccand1->base))
+	      == TYPE_PRECISION (TREE_TYPE (ccand2->base))));
+}
+
 /* Loop invariant expression hashtable entry.  */
 struct iv_inv_expr_ent
 {
@@ -255,8 +294,6 @@ struct iv_inv_expr_ent
   hashval_t hash;
 };
 
-/* The data used by the induction variable optimizations.  */
-
 /* Hashtable helpers.  */
 
 struct iv_inv_expr_hasher : free_ptr_hash <iv_inv_expr_ent>
@@ -323,6 +360,12 @@ struct ivopts_data
   /* Cache used by tree_to_aff_combination_expand.  */
   hash_map<tree, name_expansion *> *name_expansion_cache;
 
+  /* The hashtable of common candidates derived from iv uses.  */
+  hash_table<iv_common_cand_hasher> *iv_common_cand_tab;
+
+  /* The common candidates.  */
+  vec<iv_common_cand *> iv_common_cands;
+
   /* The maximum invariant id.  */
   unsigned max_inv_id;
 
@@ -894,6 +937,8 @@ tree_ssa_iv_optimize_init (struct ivopts_data *data)
   data->inv_expr_tab = new hash_table<iv_inv_expr_hasher> (10);
   data->inv_expr_id = 0;
   data->name_expansion_cache = NULL;
+  data->iv_common_cand_tab = new hash_table<iv_common_cand_hasher> (10);
+  data->iv_common_cands.create (20);
   decl_rtl_to_reset.create (20);
   gcc_obstack_init (&data->iv_obstack);
 }
@@ -3051,6 +3096,96 @@ add_iv_candidate_for_bivs (struct ivopts_data *data)
     }
 }
 
+/* Record common candidate {BASE, STEP} derived from USE in hashtable.  */
+
+static void
+record_common_cand (struct ivopts_data *data, tree base,
+		    tree step, struct iv_use *use)
+{
+  struct iv_common_cand ent;
+  struct iv_common_cand **slot;
+
+  gcc_assert (use != NULL);
+
+  ent.base = base;
+  ent.step = step;
+  ent.hash = iterative_hash_expr (base, 0);
+  ent.hash = iterative_hash_expr (step, ent.hash);
+
+  slot = data->iv_common_cand_tab->find_slot (&ent, INSERT);
+  if (*slot == NULL)
+    {
+      *slot = XNEW (struct iv_common_cand);
+      (*slot)->base = base;
+      (*slot)->step = step;
+      (*slot)->uses.create (8);
+      (*slot)->hash = ent.hash;
+      data->iv_common_cands.safe_push ((*slot));
+    }
+  (*slot)->uses.safe_push (use);
+  return;
+}
+
+/* Comparison function used to sort common candidates.  */
+
+static int
+common_cand_cmp (const void *p1, const void *p2)
+{
+  unsigned n1, n2;
+  const struct iv_common_cand *const *const ccand1
+    = (const struct iv_common_cand *const *)p1;
+  const struct iv_common_cand *const *const ccand2
+    = (const struct iv_common_cand *const *)p2;
+
+  n1 = (*ccand1)->uses.length ();
+  n2 = (*ccand2)->uses.length ();
+  return n2 - n1;
+}
+
+/* Adds IV candidates based on common candidated recorded.  */
+
+static void
+add_iv_candidate_derived_from_uses (struct ivopts_data *data)
+{
+  unsigned i, j;
+  struct iv_cand *cand_1, *cand_2;
+
+  data->iv_common_cands.qsort (common_cand_cmp);
+  for (i = 0; i < data->iv_common_cands.length (); i++)
+    {
+      struct iv_common_cand *ptr = data->iv_common_cands[i];
+
+      /* Only add IV candidate if it's derived from multiple uses.  */
+      if (ptr->uses.length () <= 1)
+	break;
+
+      cand_1 = NULL;
+      cand_2 = NULL;
+      if (ip_normal_pos (data->current_loop))
+	cand_1 = add_candidate_1 (data, ptr->base, ptr->step,
+				  false, IP_NORMAL, NULL, NULL);
+
+      if (ip_end_pos (data->current_loop)
+	  && allow_ip_end_pos_p (data->current_loop))
+	cand_2 = add_candidate_1 (data, ptr->base, ptr->step,
+				  false, IP_END, NULL, NULL);
+
+      /* Bind deriving uses and the new candidates.  */
+      for (j = 0; j < ptr->uses.length (); j++)
+	{
+	  struct iv_use *use = ptr->uses[j];
+	  if (cand_1)
+	    bitmap_set_bit (use->related_cands, cand_1->id);
+	  if (cand_2)
+	    bitmap_set_bit (use->related_cands, cand_2->id);
+	}
+    }
+
+  /* Release data since it is useless from this point.  */
+  data->iv_common_cand_tab->empty ();
+  data->iv_common_cands.truncate (0);
+}
+
 /* Adds candidates based on the value of USE's iv.  */
 
 static void
@@ -3063,19 +3198,59 @@ add_iv_candidate_for_use (struct ivopts_data *data, struct iv_use *use)
 
   add_candidate (data, iv->base, iv->step, false, use);
 
-  /* The same, but with initial value zero.  Make such variable important,
-     since it is generic enough so that possibly many uses may be based
-     on it.  */
+  /* Record common candidate for use in case it can be shared by others.  */
+  record_common_cand (data, iv->base, iv->step, use);
+
+  /* Record common candidate with initial value zero.  */
   basetype = TREE_TYPE (iv->base);
   if (POINTER_TYPE_P (basetype))
     basetype = sizetype;
-  add_candidate (data, build_int_cst (basetype, 0), iv->step, true, use);
+  record_common_cand (data, build_int_cst (basetype, 0), iv->step, use);
+
+  /* Record common candidate with constant offset stripped in base.  */
+    {
+      base = strip_offset (iv->base, &offset);
+      if (offset || base != iv->base)
+	record_common_cand (data, base, iv->step, use);
+    }
+
+  /* Record common candidate with base_object removed in base.  */
+  if (iv->base_object != NULL)
+    {
+      unsigned i;
+      aff_tree aff_base;
+      tree step, base_object = iv->base_object;
 
-  /* Third, try removing the constant offset.  Make sure to even
-     add a candidate for &a[0] vs. (T *)&a.  */
-  base = strip_offset (iv->base, &offset);
-  if (offset || base != iv->base)
-    add_candidate (data, base, iv->step, false, use);
+      base = iv->base;
+      step = iv->step;
+      STRIP_NOPS (base);
+      STRIP_NOPS (step);
+      STRIP_NOPS (base_object);
+      tree_to_aff_combination (base, TREE_TYPE (base), &aff_base);
+      for (i = 0; i < aff_base.n; i++)
+	{
+	  if (aff_base.elts[i].coef != 1)
+	    continue;
+
+	  if (operand_equal_p (aff_base.elts[i].val, base_object, 0))
+	    break;
+	}
+      if (i < aff_base.n)
+	{
+	  aff_combination_remove_elt (&aff_base, i);
+	  base = aff_combination_to_tree (&aff_base);
+	  basetype = TREE_TYPE (base);
+	  if (POINTER_TYPE_P (basetype))
+	    basetype = sizetype;
+
+	  step = fold_convert (basetype, step);
+	  record_common_cand (data, base, step, use);
+	  /* Also record common candidate with offset stripped.  */
+	  base = strip_offset (base, &offset);
+	  if (offset)
+	    record_common_cand (data, base, step, use);
+	}
+    }
 
   /* At last, add auto-incremental candidates.  Make such variables
      important since other iv uses with same base object may be based
@@ -3111,10 +3286,10 @@ add_iv_candidate_for_uses (struct ivopts_data *data)
 	  gcc_unreachable ();
 	}
     }
+  add_iv_candidate_derived_from_uses (data);
 }
 
-/* Record important candidates and add them to related_cands bitmaps
-   if needed.  */
+/* Record important candidates and add them to related_cands bitmaps.  */
 
 static void
 record_important_candidates (struct ivopts_data *data)
@@ -3133,22 +3308,11 @@ record_important_candidates (struct ivopts_data *data)
   data->consider_all_candidates = (n_iv_cands (data)
 				   <= CONSIDER_ALL_CANDIDATES_BOUND);
 
-  if (data->consider_all_candidates)
-    {
-      /* We will not need "related_cands" bitmaps in this case,
-	 so release them to decrease peak memory consumption.  */
-      for (i = 0; i < n_iv_uses (data); i++)
-	{
-	  use = iv_use (data, i);
-	  BITMAP_FREE (use->related_cands);
-	}
-    }
-  else
+  /* Add important candidates to uses' related_cands bitmaps.  */
+  for (i = 0; i < n_iv_uses (data); i++)
     {
-      /* Add important candidates to the related_cands bitmaps.  */
-      for (i = 0; i < n_iv_uses (data); i++)
-	bitmap_ior_into (iv_use (data, i)->related_cands,
-			 data->important_candidates);
+      use = iv_use (data, i);
+      bitmap_ior_into (use->related_cands, data->important_candidates);
     }
 }
 
@@ -6519,7 +6683,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
      too many ivs.  The approach from few ivs to more seems more likely to be
      successful -- starting from few ivs, replacing an expensive use by a
      specific iv should always be a win.  */
-  EXECUTE_IF_SET_IN_BITMAP (data->important_candidates, 0, i, bi)
+  EXECUTE_IF_SET_IN_BITMAP (use->related_cands, 0, i, bi)
     {
       cand = iv_cand (data, i);
 
@@ -7428,6 +7592,9 @@ free_loop_data (struct ivopts_data *data)
 
   data->inv_expr_tab->empty ();
   data->inv_expr_id = 0;
+
+  data->iv_common_cand_tab->empty ();
+  data->iv_common_cands.truncate (0);
 }
 
 /* Finalizes data structures used by the iv optimization pass.  LOOPS is the
@@ -7447,6 +7614,9 @@ tree_ssa_iv_optimize_finalize (struct ivopts_data *data)
   delete data->inv_expr_tab;
   data->inv_expr_tab = NULL;
   free_affine_expand_cache (&data->name_expansion_cache);
+  delete data->iv_common_cand_tab;
+  data->iv_common_cand_tab = NULL;
+  data->iv_common_cands.release ();
   obstack_free (&data->iv_obstack, NULL);
 }
 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH PR52272]Be smart when adding iv candidates
  2015-11-10 10:19               ` Bin.Cheng
@ 2015-11-18 15:50                 ` Bernd Schmidt
  2015-11-20  9:07                   ` Bin.Cheng
  0 siblings, 1 reply; 11+ messages in thread
From: Bernd Schmidt @ 2015-11-18 15:50 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Richard Biener, Bin Cheng, GCC Patches

On 11/10/2015 11:19 AM, Bin.Cheng wrote:
> On Tue, Nov 10, 2015 at 6:06 PM, Bernd Schmidt <bschmidt@redhat.com> wrote:
>>
>> Multi-line expressions should be wrapped in parentheses so that emacs/indent
>> can format them automatically. Two sets of parens are needed for this.
>> Operators should then line up appropriately.
> Ah, thanks for teaching.  Here is the updated patch, hoping it's correct.
It looks like you're waiting to check it in - Richard's earlier approval 
still holds.


Bernd

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH PR52272]Be smart when adding iv candidates
  2015-11-18 15:50                 ` Bernd Schmidt
@ 2015-11-20  9:07                   ` Bin.Cheng
  0 siblings, 0 replies; 11+ messages in thread
From: Bin.Cheng @ 2015-11-20  9:07 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Richard Biener, Bin Cheng, GCC Patches

On Wed, Nov 18, 2015 at 11:50 PM, Bernd Schmidt <bschmidt@redhat.com> wrote:
> On 11/10/2015 11:19 AM, Bin.Cheng wrote:
>>
>> On Tue, Nov 10, 2015 at 6:06 PM, Bernd Schmidt <bschmidt@redhat.com>
>> wrote:
>>>
>>>
>>> Multi-line expressions should be wrapped in parentheses so that
>>> emacs/indent
>>> can format them automatically. Two sets of parens are needed for this.
>>> Operators should then line up appropriately.
>>
>> Ah, thanks for teaching.  Here is the updated patch, hoping it's correct.
>
> It looks like you're waiting to check it in - Richard's earlier approval
> still holds.
Thanks, applied as r230647.
>
>
> Bernd
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-11-20  9:07 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-04 10:18 [PATCH PR52272]Be smart when adding iv candidates Bin Cheng
2015-11-06 13:24 ` Richard Biener
2015-11-08  2:59   ` Bin.Cheng
2015-11-08  9:11     ` Richard Biener
2015-11-09 15:24       ` Bernd Schmidt
2015-11-10  1:26         ` Bin.Cheng
2015-11-10  8:25           ` Bin.Cheng
2015-11-10 10:06             ` Bernd Schmidt
2015-11-10 10:19               ` Bin.Cheng
2015-11-18 15:50                 ` Bernd Schmidt
2015-11-20  9:07                   ` Bin.Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).