public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][GRAPHITE] Speedup SCOP detection some more, add region handling to domwalk
@ 2017-09-27 11:09 Richard Biener
  2017-09-28 20:13 ` Sebastian Pop
  2017-09-28 20:29 ` Sebastian Pop
  0 siblings, 2 replies; 3+ messages in thread
From: Richard Biener @ 2017-09-27 11:09 UTC (permalink / raw)
  To: gcc-patches


This removes another quadraticness from SCOP detection, gather_bbs
domwalk.  This is done by enhancing domwalk to handle SEME regions
via a special return value from before_dom_children.

With this I'm now confident to remove the 
PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION parameter and its associated limit.
Being there I've adjusted PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS to its
documented default value which enables 90 more loos to be processed
in SPEC CPU 2006.  I've also made a value of zero magic in disabling
the limit (a trick commonly used in GCC).

Statistics I have gathered a few patches before for SPEC CPU 2006:

1255 multi-loop SESEs in SCOP processing
max. params 34, 3 scops >= 20, 15 scops >= 10, 33 scops >= 8
max. drs per scop 869, 10 scops >= 100
max. pbbs per scop 36, 12 scops >= 10
919 SCOPs fail in build_alias_sets

which shows the default for PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP
is reasonable (if tuned to SPEC CPU 2006).

I've also included the hunk that allows -fgraphite-identity
to work ontop of -floop-nest-optimize and for -floop-nest-optimize
-ftree-parallelize-all also make sure to code-gen loops that
end up not transformed.

Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC CPU 2006
tested, applied to trunk.

Richard.

2017-09-27  Richard Biener  <rguenther@suse.de>

	* doc/invoke.texi (graphite-max-bbs-per-function): Remove.
	(graphite-max-nb-scop-params): Document special value zero.
	* domwalk.h (dom_walker::STOP): New symbolical constant.
	(dom_walker::dom_walker): Add optional parameter for bb to
	RPO mapping.
	(dom_walker::~dom_walker): Declare.
	(dom_walker::before_dom_children): Document STOP return value.
	(dom_walker::m_user_bb_to_rpo): New member.
	(dom_walker::m_bb_to_rpo): Likewise.
	* domwalk.c (dom_walker::dom_walker): Compute bb to RPO
	mapping here if not provided by the user.
	(dom_walker::~dom_walker): Free bb to RPO mapping if not
	provided by the user.
	(dom_walker::STOP): Define.
	(dom_walker::walk): Do not compute bb to RPO mapping here.
	Support STOP return value from before_dom_children to stop
	walking.
	* graphite-optimize-isl.c (optimize_isl): If the schedule
	is the same still generate code if -fgraphite-identity
	or -floop-parallelize-all are given.
	* graphite-scop-detection.c: Include cfganal.h.
	(gather_bbs::gather_bbs): Get and pass through bb to RPO
	mapping.
	(gather_bbs::before_dom_children): Return STOP for BBs
	not in the region.
	(build_scops): Compute bb to RPO mapping and pass it to
	the domwalk.  Treat --param graphite-max-nb-scop-params=0
	as not limiting the number of params.
	* graphite.c (graphite_initialize): Remove limit on the
	number of basic-blocks in a function.
	* params.def (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION): Remove.
	(PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS): Adjust to documented
	default value of 10.

Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 253224)
+++ gcc/doc/invoke.texi	(working copy)
@@ -10512,13 +10512,9 @@ sequence pairs.  This option only applie
 @item graphite-max-nb-scop-params
 To avoid exponential effects in the Graphite loop transforms, the
 number of parameters in a Static Control Part (SCoP) is bounded.  The
-default value is 10 parameters.  A variable whose value is unknown at
-compilation time and defined outside a SCoP is a parameter of the SCoP.
-
-@item graphite-max-bbs-per-function
-To avoid exponential effects in the detection of SCoPs, the size of
-the functions analyzed by Graphite is bounded.  The default value is
-100 basic blocks.
+default value is 10 parameters, a value of zero can be used to lift
+the bound.  A variable whose value is unknown at compilation time and
+defined outside a SCoP is a parameter of the SCoP.
 
 @item loop-block-tile-size
 Loop blocking or strip mining transforms, enabled with
Index: gcc/domwalk.c
===================================================================
--- gcc/domwalk.c	(revision 253224)
+++ gcc/domwalk.c	(working copy)
@@ -174,13 +174,29 @@ sort_bbs_postorder (basic_block *bbs, in
    If SKIP_UNREACHBLE_BLOCKS is true, then we need to set
    EDGE_EXECUTABLE on every edge in the CFG. */
 dom_walker::dom_walker (cdi_direction direction,
-			bool skip_unreachable_blocks)
+			bool skip_unreachable_blocks,
+			int *bb_index_to_rpo)
   : m_dom_direction (direction),
     m_skip_unreachable_blocks (skip_unreachable_blocks),
-    m_unreachable_dom (NULL)
+    m_user_bb_to_rpo (bb_index_to_rpo != NULL),
+    m_unreachable_dom (NULL),
+    m_bb_to_rpo (bb_index_to_rpo)
 {
+  /* Compute the basic-block index to RPO mapping if not provided by
+     the user.  */
+  if (! m_bb_to_rpo && direction == CDI_DOMINATORS)
+    {
+      int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
+      int postorder_num = pre_and_rev_post_order_compute (NULL, postorder,
+							  true);
+      m_bb_to_rpo = XNEWVEC (int, last_basic_block_for_fn (cfun));
+      for (int i = 0; i < postorder_num; ++i)
+	m_bb_to_rpo[postorder[i]] = i;
+      free (postorder);
+    }
+
   /* If we are not skipping unreachable blocks, then there is nothing
-     to do.  */
+     further to do.  */
   if (!m_skip_unreachable_blocks)
     return;
 
@@ -194,6 +210,14 @@ dom_walker::dom_walker (cdi_direction di
     }
 }
 
+/* Destructor.  */
+
+dom_walker::~dom_walker ()
+{
+  if (! m_user_bb_to_rpo)
+    free (m_bb_to_rpo);
+}
+
 /* Return TRUE if BB is reachable, false otherwise.  */
 
 bool
@@ -254,6 +278,8 @@ dom_walker::propagate_unreachable_to_edg
     m_unreachable_dom = bb;
 }
 
+const edge dom_walker::STOP = (edge)-1;
+
 /* Recursively walk the dominator tree.
    BB is the basic block we are currently visiting.  */
 
@@ -264,17 +290,7 @@ dom_walker::walk (basic_block bb)
   basic_block *worklist = XNEWVEC (basic_block,
 				   n_basic_blocks_for_fn (cfun) * 2);
   int sp = 0;
-  int *postorder, postorder_num;
-
-  if (m_dom_direction == CDI_DOMINATORS)
-    {
-      postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
-      postorder_num = pre_and_rev_post_order_compute (NULL, postorder, true);
-      bb_postorder = XNEWVEC (int, last_basic_block_for_fn (cfun));
-      for (int i = 0; i < postorder_num; ++i)
-	bb_postorder[postorder[i]] = i;
-      free (postorder);
-    }
+  bb_postorder = m_bb_to_rpo;
 
   while (true)
     {
@@ -283,13 +299,14 @@ dom_walker::walk (basic_block bb)
 	  || bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)
 	  || bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
 	{
+	  edge taken_edge = NULL;
 
 	  /* Callback for subclasses to do custom things before we have walked
 	     the dominator children, but before we walk statements.  */
 	  if (this->bb_reachable (cfun, bb))
 	    {
-	      edge taken_edge = before_dom_children (bb);
-	      if (taken_edge)
+	      taken_edge = before_dom_children (bb);
+	      if (taken_edge && taken_edge != STOP)
 		{
 		  edge_iterator ei;
 		  edge e;
@@ -306,12 +323,17 @@ dom_walker::walk (basic_block bb)
 	  worklist[sp++] = bb;
 	  worklist[sp++] = NULL;
 
-	  int saved_sp = sp;
-	  for (dest = first_dom_son (m_dom_direction, bb);
-	       dest; dest = next_dom_son (m_dom_direction, dest))
-	    worklist[sp++] = dest;
-	  if (sp - saved_sp > 1 && m_dom_direction == CDI_DOMINATORS)
-	    sort_bbs_postorder (&worklist[saved_sp], sp - saved_sp);
+	  /* If the callback returned NONE then we are supposed to
+	     stop and not even propagate EDGE_EXECUTABLE further.  */
+	  if (taken_edge != STOP)
+	    {
+	      int saved_sp = sp;
+	      for (dest = first_dom_son (m_dom_direction, bb);
+		   dest; dest = next_dom_son (m_dom_direction, dest))
+		worklist[sp++] = dest;
+	      if (sp - saved_sp > 1 && m_dom_direction == CDI_DOMINATORS)
+		sort_bbs_postorder (&worklist[saved_sp], sp - saved_sp);
+	    }
 	}
       /* NULL is used to mark pop operations in the recursion stack.  */
       while (sp > 0 && !worklist[sp - 1])
@@ -331,10 +353,6 @@ dom_walker::walk (basic_block bb)
       else
 	break;
     }
-  if (m_dom_direction == CDI_DOMINATORS)
-    {
-      free (bb_postorder);
-      bb_postorder = NULL;
-    }
+  bb_postorder = NULL;
   free (worklist);
 }
Index: gcc/domwalk.h
===================================================================
--- gcc/domwalk.h	(revision 253224)
+++ gcc/domwalk.h	(working copy)
@@ -30,14 +30,22 @@ along with GCC; see the file COPYING3.
 class dom_walker
 {
 public:
+  static const edge STOP;
+
   /* Use SKIP_UNREACHBLE_BLOCKS = true when your client can discover
      that some edges are not executable.
 
      If a client can discover that a COND, SWITCH or GOTO has a static
      target in the before_dom_children callback, the taken edge should
      be returned.  The generic walker will clear EDGE_EXECUTABLE on all
-     edges it can determine are not executable.  */
-  dom_walker (cdi_direction direction, bool skip_unreachable_blocks = false);
+     edges it can determine are not executable.
+     
+     You can provide a mapping of basic-block index to RPO if you
+     have that readily available or you do multiple walks.  */
+  dom_walker (cdi_direction direction, bool skip_unreachable_blocks = false,
+	      int *bb_index_to_rpo = NULL);
+
+  ~dom_walker ();
 
   /* Walk the dominator tree.  */
   void walk (basic_block);
@@ -48,7 +56,10 @@ public:
      edges, NULL otherwise.  When skipping unreachable blocks, the walker
      uses the taken edge information to clear EDGE_EXECUTABLE on the other
      edges, exposing unreachable blocks.  A NULL return value means all
-     outgoing edges should still be considered executable.  */
+     outgoing edges should still be considered executable.  A return value
+     of STOP means to stop the domwalk from processing dominated blocks from
+     here.  This can be used to process a SEME region only (note domwalk
+     will still do work linear in function size).  */
   virtual edge before_dom_children (basic_block) { return NULL; }
 
   /* Function to call after the recursive walk of the dominator children.  */
@@ -61,7 +72,9 @@ private:
      dominator tree.  */
   const ENUM_BITFIELD (cdi_direction) m_dom_direction : 2;
   bool m_skip_unreachable_blocks;
+  bool m_user_bb_to_rpo;
   basic_block m_unreachable_dom;
+  int *m_bb_to_rpo;
 
   /* Query whether or not the given block is reachable or not.  */
   bool bb_reachable (struct function *, basic_block);
Index: gcc/graphite-optimize-isl.c
===================================================================
--- gcc/graphite-optimize-isl.c	(revision 253224)
+++ gcc/graphite-optimize-isl.c	(working copy)
@@ -189,7 +189,7 @@ optimize_isl (scop_p scop)
 	print_schedule_ast (dump_file, scop->original_schedule, scop);
       isl_schedule_free (scop->transformed_schedule);
       scop->transformed_schedule = isl_schedule_copy (scop->original_schedule);
-      return false;
+      return flag_graphite_identity || flag_loop_parallelize_all;
     }
 
   return true;
Index: gcc/graphite-scop-detection.c
===================================================================
--- gcc/graphite-scop-detection.c	(revision 253224)
+++ gcc/graphite-scop-detection.c	(working copy)
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "tree-ssa-propagate.h"
 #include "gimple-pretty-print.h"
+#include "cfganal.h"
 #include "graphite.h"
 
 class debug_printer
@@ -1544,7 +1545,7 @@ build_alias_set (scop_p scop)
 class gather_bbs : public dom_walker
 {
 public:
-  gather_bbs (cdi_direction, scop_p);
+  gather_bbs (cdi_direction, scop_p, int *);
 
   virtual edge before_dom_children (basic_block);
   virtual void after_dom_children (basic_block);
@@ -1554,8 +1555,8 @@ private:
   scop_p scop;
 };
 }
-gather_bbs::gather_bbs (cdi_direction direction, scop_p scop)
-  : dom_walker (direction), scop (scop)
+gather_bbs::gather_bbs (cdi_direction direction, scop_p scop, int *bb_to_rpo)
+  : dom_walker (direction, false, bb_to_rpo), scop (scop)
 {
 }
 
@@ -1589,7 +1590,7 @@ gather_bbs::before_dom_children (basic_b
 {
   sese_info_p region = scop->scop_info;
   if (!bb_in_sese_p (bb, region->region))
-    return NULL;
+    return dom_walker::STOP;
 
   record_loop_in_sese (bb, region);
 
@@ -1708,6 +1709,15 @@ build_scops (vec<scop_p> *scops)
 
   /* Now create scops from the lightweight SESEs.  */
   vec<sese_l> scops_l = sb.get_scops ();
+
+  /* Domwalk needs a bb to RPO mapping.  Compute it once here.  */
+  int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
+  int postorder_num = pre_and_rev_post_order_compute (NULL, postorder, true);
+  int *bb_to_rpo = XNEWVEC (int, last_basic_block_for_fn (cfun));
+  for (int i = 0; i < postorder_num; ++i)
+    bb_to_rpo[postorder[i]] = i;
+  free (postorder);
+
   int i;
   sese_l *s;
   FOR_EACH_VEC_ELT (scops_l, i, s)
@@ -1715,7 +1725,7 @@ build_scops (vec<scop_p> *scops)
       scop_p scop = new_scop (s->entry, s->exit);
 
       /* Record all basic blocks and their conditions in REGION.  */
-      gather_bbs (CDI_DOMINATORS, scop).walk (s->entry->dest);
+      gather_bbs (CDI_DOMINATORS, scop, bb_to_rpo).walk (s->entry->dest);
 
       /* domwalk does not fulfil our code-generations constraints on the
          order of pbb which is to produce sth like execution order, delaying
@@ -1757,8 +1767,8 @@ build_scops (vec<scop_p> *scops)
 
       find_scop_parameters (scop);
       graphite_dim_t max_dim = PARAM_VALUE (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS);
-
-      if (scop_nb_params (scop) > max_dim)
+      if (max_dim > 0
+	  && scop_nb_params (scop) > max_dim)
 	{
 	  DEBUG_PRINT (dp << "[scop-detection-fail] too many parameters: "
 			  << scop_nb_params (scop)
@@ -1771,6 +1781,7 @@ build_scops (vec<scop_p> *scops)
       scops->safe_push (scop);
     }
 
+  free (bb_to_rpo);
   DEBUG_PRINT (dp << "number of SCoPs: " << (scops ? scops->length () : 0););
 }
 
Index: gcc/graphite.c
===================================================================
--- gcc/graphite.c	(revision 253224)
+++ gcc/graphite.c	(working copy)
@@ -218,14 +218,9 @@ static bool
 graphite_initialize (void)
 {
   int min_loops = PARAM_VALUE (PARAM_GRAPHITE_MIN_LOOPS_PER_FUNCTION);
-  int max_bbs = PARAM_VALUE (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION);
-  int nbbs = n_basic_blocks_for_fn (cfun);
   int nloops = number_of_loops (cfun);
 
-  if (nloops <= min_loops
-      /* FIXME: This limit on the number of basic blocks of a function
-	 should be removed when the SCOP detection is faster.  */
-      || (nbbs > max_bbs))
+  if (nloops <= min_loops)
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
@@ -234,10 +229,6 @@ graphite_initialize (void)
 		     "PARAM_GRAPHITE_MIN_LOOPS_PER_FUNCTION = %d.\n",
 		     min_loops);
 
-	  else if (nbbs > max_bbs)
-	    fprintf (dump_file, "\nFunction has too many basic blocks: "
-		     "PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION = %d.\n", max_bbs);
-
 	  fprintf (dump_file, "\nnumber of SCoPs: 0\n");
 	  print_global_statistics (dump_file);
 	}
Index: gcc/params.def
===================================================================
--- gcc/params.def	(revision 253224)
+++ gcc/params.def	(working copy)
@@ -873,14 +873,7 @@ DEFPARAM (PARAM_LOOP_BLOCK_TILE_SIZE,
 DEFPARAM (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS,
 	  "graphite-max-nb-scop-params",
 	  "maximum number of parameters in a SCoP.",
-	  7, 0, 0)
-
-/* Maximal number of basic blocks in the functions analyzed by Graphite.  */
-
-DEFPARAM (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION,
-	  "graphite-max-bbs-per-function",
-	  "maximum number of basic blocks per function to be analyzed by Graphite.",
-	  100, 0, 0)
+	  10, 0, 0)
 
 /* Maximal number of array references in a scop.  */
 

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-09-28 20:29 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-27 11:09 [PATCH][GRAPHITE] Speedup SCOP detection some more, add region handling to domwalk Richard Biener
2017-09-28 20:13 ` Sebastian Pop
2017-09-28 20:29 ` Sebastian Pop

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).