public inbox for gcc-cvs@sourceware.org
help / color / mirror / Atom feed
* [gcc r14-162] Update loop estimate after header duplication
@ 2023-04-22  7:21 Jan Hubicka
  0 siblings, 0 replies; only message in thread
From: Jan Hubicka @ 2023-04-22  7:21 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:cda246f8b421ba855a9e5f9d7bfcd0fc49b7bd4b

commit r14-162-gcda246f8b421ba855a9e5f9d7bfcd0fc49b7bd4b
Author: Jan Hubicka <jh@suse.cz>
Date:   Sat Apr 22 09:20:45 2023 +0200

    Update loop estimate after header duplication
    
    Loop header copying implements partial loop peelng.  If all exits of the loop
    are peeled (which is the common case) the number of iterations decreases by 1.
    Without noting this, for loops iterating zero times, we end up re-peeling them
    later in the loop peeling pass which is wasteful.
    
    This patch commonizes the code for esitmate update and adds logic to detect
    when all (likely) exits were peeled by loop-ch.
    
    We are still wrong about update of estimate however: if the exits behave
    randomly with given probability, loop peeling does not decrease expected
    iteration counts, just decreases probability that loop will be executed.
    In this case we thus incorrectly decrease any_estimate. Doing so however
    at least help us to not peel or optimize hard the lop later.
    
    If the loop iterates precisely the estimated nuner of iterations. the estimate
    decreases, but we are wrong about decreasing the header frequncy.  We already
    have logic that tries to prove that loop exit will not be taken in peeled out
    iterations and it may make sense to special case this.
    
    I also fixed problem where we had off by one error in iteration count updating.
    It makes perfect sense to expect loop to have 0 iterations.  However if bounds
    drops to negative, we lose info about the loop behaviour (since we have no
    profile data reaching the loop body).
    
    Bootstrapped/regtested x86_64-linux, comitted.
    Honza
    
    gcc/ChangeLog:
    
    2023-04-22  Jan Hubicka  <hubicka@ucw.cz>
                Ondrej Kubanek  <kubanek0ondrej@gmail.com>
    
            * cfgloopmanip.h (adjust_loop_info_after_peeling): Declare.
            * tree-ssa-loop-ch.cc (ch_base::copy_headers): Fix updating of
            loop profile and bounds after header duplication.
            * tree-ssa-loop-ivcanon.cc (adjust_loop_info_after_peeling):
            Break out from try_peel_loop; fix handling of 0 iterations.
            (try_peel_loop): Use adjust_loop_info_after_peeling.
    
    gcc/testsuite/ChangeLog:
    
    2023-04-22  Jan Hubicka  <hubicka@ucw.cz>
                Ondrej Kubanek  <kubanek0ondrej@gmail.com>
    
            * gcc.dg/tree-ssa/peel1.c: Decrease number of peels by 1.
            * gcc.dg/unroll-8.c: Decrease loop iteration estimate.
            * gcc.dg/tree-prof/peel-2.c: New test.

Diff:
---
 gcc/cfgloopmanip.h                      |   1 +
 gcc/testsuite/gcc.dg/tree-prof/peel-2.c |  21 ++++++
 gcc/testsuite/gcc.dg/tree-ssa/peel1.c   |   4 +-
 gcc/testsuite/gcc.dg/unroll-8.c         |   2 +-
 gcc/tree-ssa-loop-ch.cc                 | 112 +++++++++++++++++++++++++++++---
 gcc/tree-ssa-loop-ivcanon.cc            |  76 +++++++++++++++-------
 6 files changed, 178 insertions(+), 38 deletions(-)

diff --git a/gcc/cfgloopmanip.h b/gcc/cfgloopmanip.h
index 93e417fcb7d..c40cfeae0e3 100644
--- a/gcc/cfgloopmanip.h
+++ b/gcc/cfgloopmanip.h
@@ -59,5 +59,6 @@ class loop * loop_version (class loop *, void *,
 			    basic_block *,
 			    profile_probability, profile_probability,
 			    profile_probability, profile_probability, bool);
+void adjust_loop_info_after_peeling (class loop *loop, int npeel, bool precise);
 
 #endif /* GCC_CFGLOOPMANIP_H */
diff --git a/gcc/testsuite/gcc.dg/tree-prof/peel-2.c b/gcc/testsuite/gcc.dg/tree-prof/peel-2.c
new file mode 100644
index 00000000000..ac417fb3b57
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/peel-2.c
@@ -0,0 +1,21 @@
+/* { dg-options "-O3 -fdump-tree-cunroll-details -fno-unroll-loops -fpeel-loops -fdump-tree-ch2-details-blocks" } */
+int a[100];
+int n = 1000000;
+int zeroc;
+int
+main()
+{
+	a[0]=1;
+	for (int i = 0; i < n; i++)
+	{
+	  int j;
+	  for (j = 0; a[j]; j++);
+	  zeroc+=j;
+	  asm __volatile__ ("":::"memory");
+	}
+	return 0;
+}
+/* { dg-final-use { scan-tree-dump "Peeled loop 2, 1 times" "cunroll" } } */
+/* { dg-final-use { scan-tree-dump "Peeled likely exits: likely decreased number of iterations of loop 1" "ch2" } } */
+/* { dg-final-use { scan-tree-dump "Peeled all exits: decreased number of iterations of loop 2" "ch2" } } */
+/* { dg-final-use { scan-tree-dump-not "Invalid sum" "ch2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/peel1.c b/gcc/testsuite/gcc.dg/tree-ssa/peel1.c
index 36f3ea063ea..dc5848cb5c5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/peel1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/peel1.c
@@ -7,5 +7,5 @@ void add(struct foo *a,int l)
   for (i=0;i<l;i++)
     a->a[i]++;
 }
-/* { dg-final { scan-tree-dump "Loop 1 likely iterates at most 3 times." "cunroll"} } */
-/* { dg-final { scan-tree-dump "Peeled loop 1, 4 times." "cunroll"} } */
+/* { dg-final { scan-tree-dump "Loop 1 likely iterates at most 2 times." "cunroll"} } */
+/* { dg-final { scan-tree-dump "Peeled loop 1, 3 times." "cunroll"} } */
diff --git a/gcc/testsuite/gcc.dg/unroll-8.c b/gcc/testsuite/gcc.dg/unroll-8.c
index b16df672833..dfcfe2eebf0 100644
--- a/gcc/testsuite/gcc.dg/unroll-8.c
+++ b/gcc/testsuite/gcc.dg/unroll-8.c
@@ -8,7 +8,7 @@ int t(struct a *a, int n)
     a->a[i]++;
 }
 /* { dg-final { scan-rtl-dump-not "Unrolled loop" "loop2_unroll" } } */
-/* { dg-final { scan-rtl-dump "likely upper bound: 7" "loop2_unroll" } } */
+/* { dg-final { scan-rtl-dump "likely upper bound: 6" "loop2_unroll" } } */
 /* { dg-final { scan-rtl-dump "realistic bound: -1" "loop2_unroll" } } */
 /* { dg-final { scan-rtl-dump "Not unrolling loop, doesn't roll" "loop2_unroll" } } */
 /* { dg-final { scan-rtl-dump-not "Invalid sum" "loop2_unroll" } } */
diff --git a/gcc/tree-ssa-loop-ch.cc b/gcc/tree-ssa-loop-ch.cc
index 0025bc16e43..2fad2a3d7b6 100644
--- a/gcc/tree-ssa-loop-ch.cc
+++ b/gcc/tree-ssa-loop-ch.cc
@@ -381,7 +381,7 @@ unsigned int
 ch_base::copy_headers (function *fun)
 {
   basic_block header;
-  edge exit, entry;
+  edge exit, nonexit, entry;
   basic_block *bbs, *copied_bbs;
   unsigned n_bbs;
   unsigned bbs_size;
@@ -453,8 +453,16 @@ ch_base::copy_headers (function *fun)
 	 the header to have just a single successor and copying up to
 	 postdominator.  */
 
-      exit = NULL;
+      nonexit = NULL;
       n_bbs = 0;
+      int nexits = 0;
+      profile_count exit_count = profile_count::zero ();
+      profile_count entry_count = profile_count::zero ();
+      edge e;
+      edge_iterator ei;
+      FOR_EACH_EDGE (e, ei, loop->header->preds)
+	if (e->src != loop->latch)
+	  entry_count += e->count ();
       while (should_duplicate_loop_header_p (header, loop, &remaining_limit))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
@@ -463,15 +471,23 @@ ch_base::copy_headers (function *fun)
 	  /* Find a successor of header that is inside a loop; i.e. the new
 	     header after the condition is copied.  */
 	  if (flow_bb_inside_loop_p (loop, EDGE_SUCC (header, 0)->dest))
-	    exit = EDGE_SUCC (header, 0);
+	    {
+	      nonexit = EDGE_SUCC (header, 0);
+	      exit = EDGE_SUCC (header, 1);
+	    }
 	  else
-	    exit = EDGE_SUCC (header, 1);
+	    {
+	      nonexit = EDGE_SUCC (header, 1);
+	      exit = EDGE_SUCC (header, 0);
+	    }
+	  exit_count += exit->count ();
+	  nexits++;
 	  bbs[n_bbs++] = header;
 	  gcc_assert (bbs_size > n_bbs);
-	  header = exit->dest;
+	  header = nonexit->dest;
 	}
 
-      if (!exit)
+      if (!nonexit)
 	continue;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
@@ -483,9 +499,9 @@ ch_base::copy_headers (function *fun)
 
       /* Ensure that the header will have just the latch as a predecessor
 	 inside the loop.  */
-      if (!single_pred_p (exit->dest))
+      if (!single_pred_p (nonexit->dest))
 	{
-	  header = split_edge (exit);
+	  header = split_edge (nonexit);
 	  exit = single_pred_edge (header);
 	}
 
@@ -550,8 +566,6 @@ ch_base::copy_headers (function *fun)
       /* Find correct latch.  We only duplicate chain of conditionals so
 	 there should be precisely two edges to the new header.  One entry
 	 edge and one to latch.  */
-      edge_iterator ei;
-      edge e;
       FOR_EACH_EDGE (e, ei, loop->header->preds)
 	if (header != e->src)
 	  {
@@ -569,7 +583,85 @@ ch_base::copy_headers (function *fun)
 	  else
 	    fprintf (dump_file, "Loop %d is still not do-while loop.\n",
 		     loop->num);
+	  fprintf (dump_file, "Exit count: ");
+	  exit_count.dump (dump_file);
+	  fprintf (dump_file, "\nEntry count: ");
+	  entry_count.dump (dump_file);
+	  fprintf (dump_file, "\n");
+	}
+
+      /* We possibly decreased number of itrations by 1.  */
+      auto_vec<edge> exits = get_loop_exit_edges (loop);
+      bool precise = (nexits == (int) exits.length ());
+      if (!get_max_loop_iterations_int (loop))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, "Loop %d no longer loops.\n", loop->num);
+	  /* TODO: We can unloop like in tree-ssa-loop-ivcanon.  */
+	  precise = false;
+	}
+      /* Check that loop may not terminate in other way than via
+	 basic blocks we duplicated.  */
+      if (precise)
+	{
+	  basic_block *bbs = get_loop_body (loop);
+	  for (unsigned i = 0; i < loop->num_nodes && precise; ++i)
+	   {
+	     basic_block bb = bbs[i];
+	     bool found_exit = false;
+	     FOR_EACH_EDGE (e, ei, bb->succs)
+	      if (!flow_bb_inside_loop_p (loop, e->dest))
+		{
+		  found_exit = true;
+		  break;
+		}
+	     /* If BB has exit, it was duplicated.  */
+	     if (found_exit)
+	       continue;
+	     /* Give up on irreducible loops.  */
+	     if (bb->flags & BB_IRREDUCIBLE_LOOP)
+	       {
+		 precise = false;
+		 break;
+	       }
+	     /* Check that inner loops are finite.  */
+	     for (class loop *l = bb->loop_father; l != loop && precise;
+		  l = loop_outer (l))
+	       if (!l->finite_p)
+		 {
+		   precise = false;
+		   break;
+		 }
+	     /* Verify that there is no statement that may be terminate
+		execution in a way not visible to CFG.  */
+	     for (gimple_stmt_iterator bsi = gsi_start_bb (bb);
+		  !gsi_end_p (bsi); gsi_next (&bsi))
+	       if (stmt_can_terminate_bb_p (gsi_stmt (bsi)))
+		 precise = false;
+	   }
+	}
+      if (precise)
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file,
+		     "Peeled all exits:"
+		     " decreased number of iterations of loop %d by 1.\n",
+		     loop->num);
+	  adjust_loop_info_after_peeling (loop, 1, true);
 	}
+      else if (exit_count >= entry_count.apply_scale (9, 10))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file,
+		     "Peeled likely exits: likely decreased number "
+		     "of iterations of loop %d by 1.\n", loop->num);
+	  adjust_loop_info_after_peeling (loop, 1, false);
+	}
+      else if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "Not decreased number"
+		 " of iterations of loop %d; likely exits remains.\n",
+		 loop->num);
 
       changed = true;
     }
diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
index ddc31a82fdc..9f72d534b7c 100644
--- a/gcc/tree-ssa-loop-ivcanon.cc
+++ b/gcc/tree-ssa-loop-ivcanon.cc
@@ -980,6 +980,56 @@ estimated_peeled_sequence_size (struct loop_size *size,
 			     	       - size->eliminated_by_peeling), 1);
 }
 
+/* Update loop estimates after peeling LOOP by NPEEL.
+   If PRECISE is false only likely exists were duplicated and thus
+   do not update any estimates that are supposed to be always reliable.  */
+void
+adjust_loop_info_after_peeling (class loop *loop, int npeel, bool precise)
+{
+  if (loop->any_estimate)
+    {
+      /* Since peeling is mostly about loops where first few
+	 iterations are special, it is not quite correct to
+	 assume that the remaining iterations will behave
+	 the same way.  However we do not have better info
+	 so update the esitmate, since it is likely better
+	 than keeping it as it is.
+
+	 Remove it if it looks wrong.
+
+	 TODO: We likely want to special case the situation where
+	 peeling is optimizing out exit edges and only update
+	 estimates here.  */
+      if (wi::leu_p (npeel, loop->nb_iterations_estimate))
+	loop->nb_iterations_estimate -= npeel;
+      else
+	loop->any_estimate = false;
+    }
+  if (loop->any_upper_bound && precise)
+    {
+      if (wi::leu_p (npeel, loop->nb_iterations_upper_bound))
+	loop->nb_iterations_upper_bound -= npeel;
+      else
+	{
+	  /* Peeling maximal number of iterations or more
+	     makes no sense and is a bug.
+	     We should peel completely.  */
+	  gcc_unreachable ();
+	}
+    }
+  if (loop->any_likely_upper_bound)
+    {
+      if (wi::leu_p (npeel, loop->nb_iterations_likely_upper_bound))
+	loop->nb_iterations_likely_upper_bound -= npeel;
+      else
+	{
+	  loop->any_estimate = true;
+	  loop->nb_iterations_estimate = 0;
+	  loop->nb_iterations_likely_upper_bound = 0;
+	}
+    }
+}
+
 /* If the loop is expected to iterate N times and is
    small enough, duplicate the loop body N+1 times before
    the loop itself.  This way the hot path will never
@@ -1109,31 +1159,7 @@ try_peel_loop (class loop *loop,
       fprintf (dump_file, "Peeled loop %d, %i times.\n",
 	       loop->num, (int) npeel);
     }
-  if (loop->any_estimate)
-    {
-      if (wi::ltu_p (npeel, loop->nb_iterations_estimate))
-        loop->nb_iterations_estimate -= npeel;
-      else
-	loop->nb_iterations_estimate = 0;
-    }
-  if (loop->any_upper_bound)
-    {
-      if (wi::ltu_p (npeel, loop->nb_iterations_upper_bound))
-        loop->nb_iterations_upper_bound -= npeel;
-      else
-        loop->nb_iterations_upper_bound = 0;
-    }
-  if (loop->any_likely_upper_bound)
-    {
-      if (wi::ltu_p (npeel, loop->nb_iterations_likely_upper_bound))
-	loop->nb_iterations_likely_upper_bound -= npeel;
-      else
-	{
-	  loop->any_estimate = true;
-	  loop->nb_iterations_estimate = 0;
-	  loop->nb_iterations_likely_upper_bound = 0;
-	}
-    }
+  adjust_loop_info_after_peeling (loop, npeel, true);
   profile_count entry_count = profile_count::zero ();
 
   edge e;

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2023-04-22  7:21 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-22  7:21 [gcc r14-162] Update loop estimate after header duplication Jan Hubicka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).