public inbox for gcc-cvs@sourceware.org
help / color / mirror / Atom feed
* [gcc r12-6087] Don't move cold code out of loop by checking bb count
@ 2021-12-21  3:50 Xiong Hu Luo
  0 siblings, 0 replies; only message in thread
From: Xiong Hu Luo @ 2021-12-21  3:50 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:51a24e4a984536371eaf146a922124af4a6982da

commit r12-6087-g51a24e4a984536371eaf146a922124af4a6982da
Author: Xionghu Luo <luoxhu@linux.ibm.com>
Date:   Tue Dec 7 19:24:35 2021 -0600

    Don't move cold code out of loop by checking bb count
    
    v8 changes:
    1. Use hotter_than_inner_loop instead of colder to store a hotter loop
    nearest to loop.
    2. Update the logic in fill_coldest_and_hotter_out_loop and
    get_coldest_out_loop to make common case O(1).
    3. Update function argument bb_colder_than_loop_preheader.
    4. Make cached array to vec<class *loop> for index checking.
    
    v7 changes:
    1. Refine get_coldest_out_loop to replace loop with checking
    pre-computed coldest_outermost_loop and colder_than_inner_loop.
    2. Add function fill_cold_out_loop, compute coldest_outermost_loop and
    colder_than_inner_loop recursively without loop.
    
    v6 changes:
    1. Add function fill_coldest_out_loop to pre compute the coldest
    outermost loop for each loop.
    2. Rename find_coldest_out_loop to get_coldest_out_loop.
    3. Add testcase ssa-lim-22.c to differentiate with ssa-lim-19.c.
    
    v5 changes:
    1. Refine comments for new functions.
    2. Use basic_block instead of count in bb_colder_than_loop_preheader
    to align with function name.
    3. Refine with simpler implementation for get_coldest_out_loop and
    ref_in_loop_hot_body::operator for better understanding.
    
    v4 changes:
    1. Sort out profile_count comparision to function bb_cold_than_loop_preheader.
    2. Update ref_in_loop_hot_body::operator () to find cold_loop before compare.
    3. Split RTL invariant motion part out.
    4. Remove aux changes.
    
    v3 changes:
    1. Handle max_loop in determine_max_movement instead of outermost_invariant_loop.
    2. Remove unnecessary changes.
    3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in can_sm_ref_p.
    4. "gsi_next (&bsi);" in move_computations_worker is kept since it caused
    infinite loop when implementing v1 and the iteration is missed to be
    updated actually.
    
    v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
    v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html
    v3: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580211.html
    v4: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581231.html
    v5: https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581961.html
    ...
    v8: https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586209.html
    
    There was a patch trying to avoid move cold block out of loop:
    
    https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
    
    Richard suggested to "never hoist anything from a bb with lower execution
    frequency to a bb with higher one in LIM invariantness_dom_walker
    before_dom_children".
    
    In gimple LIM analysis, add get_coldest_out_loop to move invariants to
    expected target loop, if profile count of the loop bb is colder
    than target loop preheader, it won't be hoisted out of loop.
    Likely for store motion, if all locations of the REF in loop is cold,
    don't do store motion of it.
    
    SPEC2017 performance evaluation shows 1% performance improvement for
    intrate GEOMEAN and no obvious regression for others.  Especially,
    500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is
    largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00%
    on P8LE.
    
    gcc/ChangeLog:
    
    2021-12-21  Xionghu Luo  <luoxhu@linux.ibm.com>
    
            * tree-ssa-loop-im.c (bb_colder_than_loop_preheader): New
            function.
            (get_coldest_out_loop): New function.
            (determine_max_movement): Use get_coldest_out_loop.
            (move_computations_worker): Adjust and fix iteration udpate.
            (class ref_in_loop_hot_body): New functor.
            (ref_in_loop_hot_body::operator): New.
            (can_sm_ref_p): Use for_all_locs_in_loop.
            (fill_coldest_and_hotter_out_loop): New.
            (tree_ssa_lim_finalize): Free coldest_outermost_loop and
            hotter_than_inner_loop.
            (loop_invariant_motion_in_fun): Call fill_coldest_and_hotter_out_loop.
    
    gcc/testsuite/ChangeLog:
    
    2021-12-21  Xionghu Luo  <luoxhu@linux.ibm.com>
    
            * gcc.dg/tree-ssa/recip-3.c: Adjust.
            * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
            * gcc.dg/tree-ssa/ssa-lim-20.c: New test.
            * gcc.dg/tree-ssa/ssa-lim-21.c: New test.
            * gcc.dg/tree-ssa/ssa-lim-22.c: New test.
            * gcc.dg/tree-ssa/ssa-lim-23.c: New test.

Diff:
---
 gcc/testsuite/gcc.dg/tree-ssa/recip-3.c    |   2 +-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c |  29 ++++++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c |  25 +++++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c |  35 +++++++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c |  32 ++++++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-23.c |  21 ++++
 gcc/tree-ssa-loop-im.c                     | 152 ++++++++++++++++++++++++++++-
 7 files changed, 293 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
index 638bf38db8c..641c91e719e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
@@ -23,4 +23,4 @@ float h ()
 	F[0] += E / d;
 }
 
-/* { dg-final { scan-tree-dump-times " / " 1 "recip" } } */
+/* { dg-final { scan-tree-dump-times " / " 5 "recip" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
new file mode 100644
index 00000000000..51c1913d003
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+volatile int x;
+void
+bar (int, char *, char *);
+void
+foo (int *a, int n, int m, int s, int t)
+{
+  int i;
+  int j;
+  int k;
+
+  for (i = 0; i < m; i++) // Loop 1
+    {
+      if (__builtin_expect (x, 0))
+	for (j = 0; j < n; j++) // Loop 2
+	  for (k = 0; k < n; k++) // Loop 3
+	    {
+	      bar (s / 5, "one", "two");
+	      a[t] = s;
+	    }
+      a[t] = t;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "out of loop 2" 4 "lim2" } } */
+/* { dg-final { scan-tree-dump-times "out of loop 1" 3 "lim2" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
new file mode 100644
index 00000000000..bc60a040a70
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
@@ -0,0 +1,25 @@
+/* { dg-do compile  } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+/* Test that `count' is not hoisted out of loop when bb is cold.  */
+
+int count;
+volatile int x;
+
+struct obj {
+  int data;
+  struct obj *next;
+
+} *q;
+
+void
+func (int m)
+{
+  struct obj *p;
+  for (int i = 0; i < m; i++)
+    if (__builtin_expect (x, 0))
+      count++;
+
+}
+
+/* { dg-final { scan-tree-dump-not "Executing store motion of" "lim2"  }  } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
new file mode 100644
index 00000000000..ffe6f8f699d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
@@ -0,0 +1,35 @@
+/* { dg-do compile  } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+/* Test that `data' and 'data1' is not hoisted out of inner loop and outer loop
+   when it is in cold loop.  */
+
+int count;
+volatile int x;
+
+struct obj {
+  int data;
+  int data1;
+  struct obj *next;
+};
+
+void
+func (int m, int n, int k, struct obj *a)
+{
+  struct obj *q = a;
+  for (int j = 0; j < m; j++)
+    if (__builtin_expect (m, 0))
+      for (int i = 0; i < m; i++)
+	{
+	  if (__builtin_expect (x, 0))
+	    {
+	      count++;
+	      q->data += 3; /* Not hoisted out to inner loop. */
+	    }
+	  count += n;
+	  q->data1 += k; /* Not hoisted out to outer loop. */
+	}
+}
+
+/* { dg-final { scan-tree-dump-not "Executing store motion of" "lim2"  }  } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c
new file mode 100644
index 00000000000..16ba4ceb8ab
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-22.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+volatile int x;
+volatile int y;
+void
+bar (int, char *, char *);
+void
+foo (int *a, int n, int m, int s, int t)
+{
+  int i;
+  int j;
+  int k;
+
+  for (i = 0; i < m; i++) // Loop 1
+    {
+      if (__builtin_expect (x, 0))
+	for (j = 0; j < n; j++) // Loop 2
+	  if (__builtin_expect (y, 0))
+	    for (k = 0; k < n; k++) // Loop 3
+	      {
+		bar (s / 5, "one", "two");
+		a[t] = s;
+	      }
+      a[t] = t;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "out of loop 3" 4 "lim2" } } */
+/* { dg-final { scan-tree-dump-times "out of loop 1" 3 "lim2" } } */
+
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-23.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-23.c
new file mode 100644
index 00000000000..e7880746422
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-23.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+volatile int x;
+void
+bar (int, char *, char *);
+void
+foo (int *a, int n, int k)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+    {
+      if (__builtin_expect (x, 0))
+	bar (k / 5, "one", "two");
+      a[i] = k;
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "out of loop 1" "lim2" } } */
+
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 682406d26c1..b952386a9b1 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -146,6 +146,11 @@ public:
 enum dep_kind { lim_raw, sm_war, sm_waw };
 enum dep_state { dep_unknown, dep_independent, dep_dependent };
 
+/* coldest outermost loop for given loop.  */
+vec<class loop *> coldest_outermost_loop;
+/* hotter outer loop nearest to given loop.  */
+vec<class loop *> hotter_than_inner_loop;
+
 /* Populate the loop dependence cache of REF for LOOP, KIND with STATE.  */
 
 static void
@@ -417,6 +422,63 @@ movement_possibility (gimple *stmt)
   return ret;
 }
 
+/* Compare the profile count inequality of bb and loop's preheader, it is
+   three-state as stated in profile-count.h, FALSE is returned if inequality
+   cannot be decided.  */
+bool
+bb_colder_than_loop_preheader (basic_block bb, class loop *loop)
+{
+  gcc_assert (bb && loop);
+  return bb->count < loop_preheader_edge (loop)->src->count;
+}
+
+/* Check coldest loop between OUTERMOST_LOOP and LOOP by comparing profile
+   count.
+  It does three steps check:
+  1) Check whether CURR_BB is cold in it's own loop_father, if it is cold, just
+  return NULL which means it should not be moved out at all;
+  2) CURR_BB is NOT cold, check if pre-computed COLDEST_LOOP is outside of
+  OUTERMOST_LOOP, if it is inside of OUTERMOST_LOOP, return the COLDEST_LOOP;
+  3) If COLDEST_LOOP is outside of OUTERMOST_LOOP, check whether there is a
+  hotter loop between OUTERMOST_LOOP and loop in pre-computed
+  HOTTER_THAN_INNER_LOOP, return it's nested inner loop, otherwise return
+  OUTERMOST_LOOP.
+  At last, the coldest_loop is inside of OUTERMOST_LOOP, just return it as
+  the hoist target.  */
+
+static class loop *
+get_coldest_out_loop (class loop *outermost_loop, class loop *loop,
+		      basic_block curr_bb)
+{
+  gcc_assert (outermost_loop == loop
+	      || flow_loop_nested_p (outermost_loop, loop));
+
+  /* If bb_colder_than_loop_preheader returns false due to three-state
+    comparision, OUTERMOST_LOOP is returned finally to preserve the behavior.
+    Otherwise, return the coldest loop between OUTERMOST_LOOP and LOOP.  */
+  if (curr_bb && bb_colder_than_loop_preheader (curr_bb, loop))
+    return NULL;
+
+  class loop *coldest_loop = coldest_outermost_loop[loop->num];
+  if (loop_depth (coldest_loop) < loop_depth (outermost_loop))
+    {
+      class loop *hotter_loop = hotter_than_inner_loop[loop->num];
+      if (!hotter_loop
+	  || loop_depth (hotter_loop) < loop_depth (outermost_loop))
+	return outermost_loop;
+
+      /*  hotter_loop is between OUTERMOST_LOOP and LOOP like:
+	[loop tree root, ..., coldest_loop, ..., outermost_loop, ...,
+	hotter_loop, second_coldest_loop, ..., loop]
+	return second_coldest_loop to be the hoist target.  */
+      class loop *aloop;
+      for (aloop = hotter_loop->inner; aloop; aloop = aloop->next)
+	if (aloop == loop || flow_loop_nested_p (aloop, loop))
+	  return aloop;
+    }
+  return coldest_loop;
+}
+
 /* Suppose that operand DEF is used inside the LOOP.  Returns the outermost
    loop to that we could move the expression using DEF if it did not have
    other operands, i.e. the outermost loop enclosing LOOP in that the value
@@ -685,7 +747,9 @@ determine_max_movement (gimple *stmt, bool must_preserve_exec)
     level = ALWAYS_EXECUTED_IN (bb);
   else
     level = superloop_at_depth (loop, 1);
-  lim_data->max_loop = level;
+  lim_data->max_loop = get_coldest_out_loop (level, loop, bb);
+  if (!lim_data->max_loop)
+    return false;
 
   if (gphi *phi = dyn_cast <gphi *> (stmt))
     {
@@ -1217,7 +1281,10 @@ move_computations_worker (basic_block bb)
       /* We do not really want to move conditionals out of the loop; we just
 	 placed it here to force its operands to be moved if necessary.  */
       if (gimple_code (stmt) == GIMPLE_COND)
-	continue;
+	{
+	  gsi_next (&bsi);
+	  continue;
+	}
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
@@ -3023,6 +3090,26 @@ ref_indep_loop_p (class loop *loop, im_mem_ref *ref, dep_kind kind)
   return indep_p;
 }
 
+class ref_in_loop_hot_body
+{
+public:
+  ref_in_loop_hot_body (class loop *loop_) : l (loop_) {}
+  bool operator () (mem_ref_loc *loc);
+  class loop *l;
+};
+
+/* Check the coldest loop between loop L and innermost loop.  If there is one
+   cold loop between L and INNER_LOOP, store motion can be performed, otherwise
+   no cold loop means no store motion.  get_coldest_out_loop also handles cases
+   when l is inner_loop.  */
+bool
+ref_in_loop_hot_body::operator () (mem_ref_loc *loc)
+{
+  basic_block curr_bb = gimple_bb (loc->stmt);
+  class loop *inner_loop = curr_bb->loop_father;
+  return get_coldest_out_loop (l, inner_loop, curr_bb);
+}
+
 
 /* Returns true if we can perform store motion of REF from LOOP.  */
 
@@ -3077,6 +3164,12 @@ can_sm_ref_p (class loop *loop, im_mem_ref *ref)
   if (!ref_indep_loop_p (loop, ref, sm_war))
     return false;
 
+  /* Verify whether the candidate is hot for LOOP.  Only do store motion if the
+    candidate's profile count is hot.  Statement in cold BB shouldn't be moved
+    out of it's loop_father.  */
+  if (!for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body (loop)))
+    return false;
+
   return true;
 }
 
@@ -3289,6 +3382,48 @@ fill_always_executed_in (void)
     fill_always_executed_in_1 (loop, contains_call);
 }
 
+/* Find the coldest loop preheader for LOOP, also find the nearest hotter loop
+   to LOOP.  Then recursively iterate each inner loop.  */
+
+void
+fill_coldest_and_hotter_out_loop (class loop *coldest_loop,
+				  class loop *hotter_loop, class loop *loop)
+{
+  if (bb_colder_than_loop_preheader (loop_preheader_edge (loop)->src,
+				     coldest_loop))
+    coldest_loop = loop;
+
+  coldest_outermost_loop[loop->num] = coldest_loop;
+
+  hotter_than_inner_loop[loop->num] = NULL;
+  class loop *outer_loop = loop_outer (loop);
+  if (hotter_loop
+      && bb_colder_than_loop_preheader (loop_preheader_edge (loop)->src,
+					hotter_loop))
+    hotter_than_inner_loop[loop->num] = hotter_loop;
+
+  if (outer_loop && outer_loop != current_loops->tree_root
+      && bb_colder_than_loop_preheader (loop_preheader_edge (loop)->src,
+					outer_loop))
+    hotter_than_inner_loop[loop->num] = outer_loop;
+
+  if (dump_enabled_p ())
+    {
+      dump_printf (MSG_NOTE, "loop %d's coldest_outermost_loop is %d, ",
+		   loop->num, coldest_loop->num);
+      if (hotter_than_inner_loop[loop->num])
+	dump_printf (MSG_NOTE, "hotter_than_inner_loop is %d\n",
+		     hotter_than_inner_loop[loop->num]->num);
+      else
+	dump_printf (MSG_NOTE, "hotter_than_inner_loop is NULL\n");
+    }
+
+  class loop *inner_loop;
+  for (inner_loop = loop->inner; inner_loop; inner_loop = inner_loop->next)
+    fill_coldest_and_hotter_out_loop (coldest_loop,
+				      hotter_than_inner_loop[loop->num],
+				      inner_loop);
+}
 
 /* Compute the global information needed by the loop invariant motion pass.  */
 
@@ -3373,6 +3508,9 @@ tree_ssa_lim_finalize (void)
     free_affine_expand_cache (&memory_accesses.ttae_cache);
 
   free (bb_loop_postorder);
+
+  coldest_outermost_loop.release ();
+  hotter_than_inner_loop.release ();
 }
 
 /* Moves invariants from loops.  Only "expensive" invariants are moved out --
@@ -3392,6 +3530,16 @@ loop_invariant_motion_in_fun (function *fun, bool store_motion)
   /* Fills ALWAYS_EXECUTED_IN information for basic blocks.  */
   fill_always_executed_in ();
 
+  /* Pre-compute coldest outermost loop and nearest hotter loop of each loop.
+   */
+  class loop *loop;
+  coldest_outermost_loop.create (number_of_loops (cfun));
+  coldest_outermost_loop.safe_grow_cleared (number_of_loops (cfun));
+  hotter_than_inner_loop.create (number_of_loops (cfun));
+  hotter_than_inner_loop.safe_grow_cleared (number_of_loops (cfun));
+  for (loop = current_loops->tree_root->inner; loop != NULL; loop = loop->next)
+    fill_coldest_and_hotter_out_loop (loop, NULL, loop);
+
   int *rpo = XNEWVEC (int, last_basic_block_for_fn (fun));
   int n = pre_and_rev_post_order_compute_fn (fun, NULL, rpo, false);


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2021-12-21  3:50 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-21  3:50 [gcc r12-6087] Don't move cold code out of loop by checking bb count Xiong Hu Luo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).