public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "Bin Cheng" <bin.cheng@arm.com>
To: <gcc-patches@gcc.gnu.org>
Subject: [PATCH PR52272]Be smart when adding iv candidates
Date: Wed, 04 Nov 2015 10:18:00 -0000	[thread overview]
Message-ID: <000901d116ea$19388d70$4ba9a850$@arm.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 3488 bytes --]

Hi,
PR52272 reported a performance regression in spec2006/410.bwaves once GCC is
prevented from representing address of one memory object using address of
another memory object.  Also as I commented in that PR, we have two possible
fixes for this:
1) Improve how TMR.base is deduced, so that we can represent addr of mem obj
using another one, while not breaking PR50955.
2) Add iv candidates with base object stripped.  In this way, we use the
common base-stripped part to represent all address expressions, in the form
of [base_1 + common], [base_2 + common], ..., [base_n + common].

In terms of code generation, method 2) is at least as good as 1), actually
better in my opinion.  The problem of 2) is we need to tell when iv
candidates should be added for the common part and when shouldn't.  This
issue can be generalized and described as: We know IVO tries to add
candidates by deriving from iv uses.  One disadvantage is that candidates
are derived from iv use independently.  It doesn't take common sub
expression among different iv uses into consideration.  As a result,
candidate for common sub expression is not added, while many useless
candidates are added.

As a matter of fact, candidate derived from iv use is useful only if it's
common enough and could be shared among different uses.  A candidate is most
likely useless if it's derived from a single use and could not be shared by
others.  This patch works in this way by firstly recording all kinds
candidates derived from iv uses, then adding candidates for common ones.

The patch improves 410.bwaves by 3-4% on x86_64.  I also saw regression for
400.perlbench and small regression for 401.bzip on x86_64, but I can confirm
they are false alarms caused by align issues. 
For aarch64, fp cases are obviously improved for both spec2000 and spec2006.
Also the patch causes 2-3% regression for 459.GemsFDTD, which I think is
another irrelevant issue caused by heuristic candidate selecting algorithm.
Unfortunately, I don't have fix to it currently.

This patch may add more candidates in some cases, but generally candidates
number is smaller because we don't need to add useless candidates now.
Statistic data shows there are quite fewer loops with more than 30
candidates when building spec2k6 on x86_64 using this patch.

Bootstrap and test on x86_64.  I will re-test it against latest trunk on
AArch64.  Is it OK?

Thanks,
bin

2015-11-03  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/52272
	* tree-ssa-loop-ivopts.c (struct iv_common_cand): New struct.
	(struct iv_common_cand_hasher): New struct.
	(iv_common_cand_hasher::hash): New function.
	(iv_common_cand_hasher::equal): New function.
	(struct ivopts_data): New fields, iv_common_cand_tab and
	iv_common_cands.
	(tree_ssa_iv_optimize_init): Initialize above fields.
	(record_common_cand, common_cand_cmp): New functions.
	(add_iv_candidate_derived_from_uses): New function.
	(add_iv_candidate_for_use): Record iv_common_cands derived from
	iv use in hash table, instead of adding candidates directly.
	(add_iv_candidate_for_uses): Call
add_iv_candidate_derived_from_uses.
	(record_important_candidates): Add important candidates to iv uses'
	related_cands.  Always keep related_cands for future use.
	(try_add_cand_for): Use iv uses' related_cands.
	(free_loop_data, tree_ssa_iv_optimize_finalize): Release new fields
	in struct ivopts_data, iv_common_cand_tab and iv_common_cands.

[-- Attachment #2: pr52272-20151103.txt --]
[-- Type: text/plain, Size: 10307 bytes --]

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1f952a7..2417d4d 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -247,6 +247,45 @@ struct iv_cand
 			   smaller type.  */
 };
 
+/* Hashtable entry for common candidate derived from iv uses.  */
+struct iv_common_cand
+{
+  tree base;
+  tree step;
+  /* IV uses from which this common candidate is derived.  */
+  vec<iv_use *> uses;
+  hashval_t hash;
+};
+
+/* Hashtable helpers.  */
+
+struct iv_common_cand_hasher : free_ptr_hash <iv_common_cand>
+{
+  static inline hashval_t hash (const iv_common_cand *);
+  static inline bool equal (const iv_common_cand *, const iv_common_cand *);
+};
+
+/* Hash function for possible common candidates.  */
+
+inline hashval_t
+iv_common_cand_hasher::hash (const iv_common_cand *ccand)
+{
+  return ccand->hash;
+}
+
+/* Hash table equality function for common candidates.  */
+
+inline bool
+iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
+			   const iv_common_cand *ccand2)
+{
+  return ccand1->hash == ccand2->hash
+	 && operand_equal_p (ccand1->base, ccand2->base, 0)
+	 && operand_equal_p (ccand1->step, ccand2->step, 0)
+	 && TYPE_PRECISION (TREE_TYPE (ccand1->base))
+	      == TYPE_PRECISION (TREE_TYPE (ccand2->base));
+}
+
 /* Loop invariant expression hashtable entry.  */
 struct iv_inv_expr_ent
 {
@@ -255,8 +294,6 @@ struct iv_inv_expr_ent
   hashval_t hash;
 };
 
-/* The data used by the induction variable optimizations.  */
-
 /* Hashtable helpers.  */
 
 struct iv_inv_expr_hasher : free_ptr_hash <iv_inv_expr_ent>
@@ -323,6 +360,12 @@ struct ivopts_data
   /* Cache used by tree_to_aff_combination_expand.  */
   hash_map<tree, name_expansion *> *name_expansion_cache;
 
+  /* The hashtable of common candidates derived from iv uses.  */
+  hash_table<iv_common_cand_hasher> *iv_common_cand_tab;
+
+  /* The common candidates.  */
+  vec<iv_common_cand *> iv_common_cands;
+
   /* The maximum invariant id.  */
   unsigned max_inv_id;
 
@@ -894,6 +937,8 @@ tree_ssa_iv_optimize_init (struct ivopts_data *data)
   data->inv_expr_tab = new hash_table<iv_inv_expr_hasher> (10);
   data->inv_expr_id = 0;
   data->name_expansion_cache = NULL;
+  data->iv_common_cand_tab = new hash_table<iv_common_cand_hasher> (10);
+  data->iv_common_cands.create (20);
   decl_rtl_to_reset.create (20);
   gcc_obstack_init (&data->iv_obstack);
 }
@@ -3051,6 +3096,96 @@ add_iv_candidate_for_bivs (struct ivopts_data *data)
     }
 }
 
+/* Record common candidate {BASE, STEP} derived from USE in hashtable.  */
+
+static void
+record_common_cand (struct ivopts_data *data, tree base,
+		    tree step, struct iv_use *use)
+{
+  struct iv_common_cand ent;
+  struct iv_common_cand **slot;
+
+  gcc_assert (use != NULL);
+
+  ent.base = base;
+  ent.step = step;
+  ent.hash = iterative_hash_expr (base, 0);
+  ent.hash = iterative_hash_expr (step, ent.hash);
+
+  slot = data->iv_common_cand_tab->find_slot (&ent, INSERT);
+  if (*slot == NULL)
+    {
+      *slot = XNEW (struct iv_common_cand);
+      (*slot)->base = base;
+      (*slot)->step = step;
+      (*slot)->uses.create (8);
+      (*slot)->hash = ent.hash;
+      data->iv_common_cands.safe_push ((*slot));
+    }
+  (*slot)->uses.safe_push (use);
+  return;
+}
+
+/* Comparison function used to sort common candidates.  */
+
+static int
+common_cand_cmp (const void *p1, const void *p2)
+{
+  unsigned n1, n2;
+  const struct iv_common_cand *const *const ccand1
+    = (const struct iv_common_cand *const *)p1;
+  const struct iv_common_cand *const *const ccand2
+    = (const struct iv_common_cand *const *)p2;
+
+  n1 = (*ccand1)->uses.length ();
+  n2 = (*ccand2)->uses.length ();
+  return n2 - n1;
+}
+
+/* Adds IV candidates based on common candidated recorded.  */
+
+static void
+add_iv_candidate_derived_from_uses (struct ivopts_data *data)
+{
+  unsigned i, j;
+  struct iv_cand *cand_1, *cand_2;
+
+  data->iv_common_cands.qsort (common_cand_cmp);
+  for (i = 0; i < data->iv_common_cands.length (); i++)
+    {
+      struct iv_common_cand *ptr = data->iv_common_cands[i];
+
+      /* Only add IV candidate if it's derived from multiple uses.  */
+      if (ptr->uses.length () <= 1)
+	break;
+
+      cand_1 = NULL;
+      cand_2 = NULL;
+      if (ip_normal_pos (data->current_loop))
+	cand_1 = add_candidate_1 (data, ptr->base, ptr->step,
+				  false, IP_NORMAL, NULL, NULL);
+
+      if (ip_end_pos (data->current_loop)
+	  && allow_ip_end_pos_p (data->current_loop))
+	cand_2 = add_candidate_1 (data, ptr->base, ptr->step,
+				  false, IP_END, NULL, NULL);
+
+      /* Bind deriving uses and the new candidates.  */
+      for (j = 0; j < ptr->uses.length (); j++)
+	{
+	  struct iv_use *use = ptr->uses[j];
+	  if (cand_1)
+	    bitmap_set_bit (use->related_cands, cand_1->id);
+	  if (cand_2)
+	    bitmap_set_bit (use->related_cands, cand_2->id);
+	}
+    }
+
+  /* Release data since it is useless from this point.  */
+  data->iv_common_cand_tab->empty ();
+  data->iv_common_cands.truncate (0);
+}
+
 /* Adds candidates based on the value of USE's iv.  */
 
 static void
@@ -3063,19 +3198,59 @@ add_iv_candidate_for_use (struct ivopts_data *data, struct iv_use *use)
 
   add_candidate (data, iv->base, iv->step, false, use);
 
-  /* The same, but with initial value zero.  Make such variable important,
-     since it is generic enough so that possibly many uses may be based
-     on it.  */
+  /* Record common candidate for use in case it can be shared by others.  */
+  record_common_cand (data, iv->base, iv->step, use);
+
+  /* Record common candidate with initial value zero.  */
   basetype = TREE_TYPE (iv->base);
   if (POINTER_TYPE_P (basetype))
     basetype = sizetype;
-  add_candidate (data, build_int_cst (basetype, 0), iv->step, true, use);
+  record_common_cand (data, build_int_cst (basetype, 0), iv->step, use);
+
+  /* Record common candidate with constant offset stripped in base.  */
+    {
+      base = strip_offset (iv->base, &offset);
+      if (offset || base != iv->base)
+	record_common_cand (data, base, iv->step, use);
+    }
+
+  /* Record common candidate with base_object removed in base.  */
+  if (iv->base_object != NULL)
+    {
+      unsigned i;
+      aff_tree aff_base;
+      tree step, base_object = iv->base_object;
 
-  /* Third, try removing the constant offset.  Make sure to even
-     add a candidate for &a[0] vs. (T *)&a.  */
-  base = strip_offset (iv->base, &offset);
-  if (offset || base != iv->base)
-    add_candidate (data, base, iv->step, false, use);
+      base = iv->base;
+      step = iv->step;
+      STRIP_NOPS (base);
+      STRIP_NOPS (step);
+      STRIP_NOPS (base_object);
+      tree_to_aff_combination (base, TREE_TYPE (base), &aff_base);
+      for (i = 0; i < aff_base.n; i++)
+	{
+	  if (aff_base.elts[i].coef != 1)
+	    continue;
+
+	  if (operand_equal_p (aff_base.elts[i].val, base_object, 0))
+	    break;
+	}
+      if (i < aff_base.n)
+	{
+	  aff_combination_remove_elt (&aff_base, i);
+	  base = aff_combination_to_tree (&aff_base);
+	  basetype = TREE_TYPE (base);
+	  if (POINTER_TYPE_P (basetype))
+	    basetype = sizetype;
+
+	  step = fold_convert (basetype, step);
+	  record_common_cand (data, base, step, use);
+	  /* Also record common candidate with offset stripped.  */
+	  base = strip_offset (base, &offset);
+	  if (offset)
+	    record_common_cand (data, base, step, use);
+	}
+    }
 
   /* At last, add auto-incremental candidates.  Make such variables
      important since other iv uses with same base object may be based
@@ -3111,10 +3286,10 @@ add_iv_candidate_for_uses (struct ivopts_data *data)
 	  gcc_unreachable ();
 	}
     }
+  add_iv_candidate_derived_from_uses (data);
 }
 
-/* Record important candidates and add them to related_cands bitmaps
-   if needed.  */
+/* Record important candidates and add them to related_cands bitmaps.  */
 
 static void
 record_important_candidates (struct ivopts_data *data)
@@ -3133,22 +3308,11 @@ record_important_candidates (struct ivopts_data *data)
   data->consider_all_candidates = (n_iv_cands (data)
 				   <= CONSIDER_ALL_CANDIDATES_BOUND);
 
-  if (data->consider_all_candidates)
-    {
-      /* We will not need "related_cands" bitmaps in this case,
-	 so release them to decrease peak memory consumption.  */
-      for (i = 0; i < n_iv_uses (data); i++)
-	{
-	  use = iv_use (data, i);
-	  BITMAP_FREE (use->related_cands);
-	}
-    }
-  else
+  /* Add important candidates to uses' related_cands bitmaps.  */
+  for (i = 0; i < n_iv_uses (data); i++)
     {
-      /* Add important candidates to the related_cands bitmaps.  */
-      for (i = 0; i < n_iv_uses (data); i++)
-	bitmap_ior_into (iv_use (data, i)->related_cands,
-			 data->important_candidates);
+      use = iv_use (data, i);
+      bitmap_ior_into (use->related_cands, data->important_candidates);
     }
 }
 
@@ -6519,7 +6683,7 @@ try_add_cand_for (struct ivopts_data *data, struct iv_ca *ivs,
      too many ivs.  The approach from few ivs to more seems more likely to be
      successful -- starting from few ivs, replacing an expensive use by a
      specific iv should always be a win.  */
-  EXECUTE_IF_SET_IN_BITMAP (data->important_candidates, 0, i, bi)
+  EXECUTE_IF_SET_IN_BITMAP (use->related_cands, 0, i, bi)
     {
       cand = iv_cand (data, i);
 
@@ -7428,6 +7592,9 @@ free_loop_data (struct ivopts_data *data)
 
   data->inv_expr_tab->empty ();
   data->inv_expr_id = 0;
+
+  data->iv_common_cand_tab->empty ();
+  data->iv_common_cands.truncate (0);
 }
 
 /* Finalizes data structures used by the iv optimization pass.  LOOPS is the
@@ -7447,6 +7614,9 @@ tree_ssa_iv_optimize_finalize (struct ivopts_data *data)
   delete data->inv_expr_tab;
   data->inv_expr_tab = NULL;
   free_affine_expand_cache (&data->name_expansion_cache);
+  delete data->iv_common_cand_tab;
+  data->iv_common_cand_tab = NULL;
+  data->iv_common_cands.release ();
   obstack_free (&data->iv_obstack, NULL);
 }
 

             reply	other threads:[~2015-11-04 10:18 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-04 10:18 Bin Cheng [this message]
2015-11-06 13:24 ` Richard Biener
2015-11-08  2:59   ` Bin.Cheng
2015-11-08  9:11     ` Richard Biener
2015-11-09 15:24       ` Bernd Schmidt
2015-11-10  1:26         ` Bin.Cheng
2015-11-10  8:25           ` Bin.Cheng
2015-11-10 10:06             ` Bernd Schmidt
2015-11-10 10:19               ` Bin.Cheng
2015-11-18 15:50                 ` Bernd Schmidt
2015-11-20  9:07                   ` Bin.Cheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='000901d116ea$19388d70$4ba9a850$@arm.com' \
    --to=bin.cheng@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).