From: Richard Biener <rguenther@suse.de>
To: gcc-patches@gcc.gnu.org
Subject: [PATCH][GRAPHITE] Speedup SCOP detection some more, add region handling to domwalk
Date: Wed, 27 Sep 2017 11:09:00 -0000 [thread overview]
Message-ID: <alpine.LSU.2.20.1709271302380.26836@zhemvz.fhfr.qr> (raw)
This removes another quadraticness from SCOP detection, gather_bbs
domwalk. This is done by enhancing domwalk to handle SEME regions
via a special return value from before_dom_children.
With this I'm now confident to remove the
PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION parameter and its associated limit.
Being there I've adjusted PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS to its
documented default value which enables 90 more loos to be processed
in SPEC CPU 2006. I've also made a value of zero magic in disabling
the limit (a trick commonly used in GCC).
Statistics I have gathered a few patches before for SPEC CPU 2006:
1255 multi-loop SESEs in SCOP processing
max. params 34, 3 scops >= 20, 15 scops >= 10, 33 scops >= 8
max. drs per scop 869, 10 scops >= 100
max. pbbs per scop 36, 12 scops >= 10
919 SCOPs fail in build_alias_sets
which shows the default for PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP
is reasonable (if tuned to SPEC CPU 2006).
I've also included the hunk that allows -fgraphite-identity
to work ontop of -floop-nest-optimize and for -floop-nest-optimize
-ftree-parallelize-all also make sure to code-gen loops that
end up not transformed.
Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC CPU 2006
tested, applied to trunk.
Richard.
2017-09-27 Richard Biener <rguenther@suse.de>
* doc/invoke.texi (graphite-max-bbs-per-function): Remove.
(graphite-max-nb-scop-params): Document special value zero.
* domwalk.h (dom_walker::STOP): New symbolical constant.
(dom_walker::dom_walker): Add optional parameter for bb to
RPO mapping.
(dom_walker::~dom_walker): Declare.
(dom_walker::before_dom_children): Document STOP return value.
(dom_walker::m_user_bb_to_rpo): New member.
(dom_walker::m_bb_to_rpo): Likewise.
* domwalk.c (dom_walker::dom_walker): Compute bb to RPO
mapping here if not provided by the user.
(dom_walker::~dom_walker): Free bb to RPO mapping if not
provided by the user.
(dom_walker::STOP): Define.
(dom_walker::walk): Do not compute bb to RPO mapping here.
Support STOP return value from before_dom_children to stop
walking.
* graphite-optimize-isl.c (optimize_isl): If the schedule
is the same still generate code if -fgraphite-identity
or -floop-parallelize-all are given.
* graphite-scop-detection.c: Include cfganal.h.
(gather_bbs::gather_bbs): Get and pass through bb to RPO
mapping.
(gather_bbs::before_dom_children): Return STOP for BBs
not in the region.
(build_scops): Compute bb to RPO mapping and pass it to
the domwalk. Treat --param graphite-max-nb-scop-params=0
as not limiting the number of params.
* graphite.c (graphite_initialize): Remove limit on the
number of basic-blocks in a function.
* params.def (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION): Remove.
(PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS): Adjust to documented
default value of 10.
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi (revision 253224)
+++ gcc/doc/invoke.texi (working copy)
@@ -10512,13 +10512,9 @@ sequence pairs. This option only applie
@item graphite-max-nb-scop-params
To avoid exponential effects in the Graphite loop transforms, the
number of parameters in a Static Control Part (SCoP) is bounded. The
-default value is 10 parameters. A variable whose value is unknown at
-compilation time and defined outside a SCoP is a parameter of the SCoP.
-
-@item graphite-max-bbs-per-function
-To avoid exponential effects in the detection of SCoPs, the size of
-the functions analyzed by Graphite is bounded. The default value is
-100 basic blocks.
+default value is 10 parameters, a value of zero can be used to lift
+the bound. A variable whose value is unknown at compilation time and
+defined outside a SCoP is a parameter of the SCoP.
@item loop-block-tile-size
Loop blocking or strip mining transforms, enabled with
Index: gcc/domwalk.c
===================================================================
--- gcc/domwalk.c (revision 253224)
+++ gcc/domwalk.c (working copy)
@@ -174,13 +174,29 @@ sort_bbs_postorder (basic_block *bbs, in
If SKIP_UNREACHBLE_BLOCKS is true, then we need to set
EDGE_EXECUTABLE on every edge in the CFG. */
dom_walker::dom_walker (cdi_direction direction,
- bool skip_unreachable_blocks)
+ bool skip_unreachable_blocks,
+ int *bb_index_to_rpo)
: m_dom_direction (direction),
m_skip_unreachable_blocks (skip_unreachable_blocks),
- m_unreachable_dom (NULL)
+ m_user_bb_to_rpo (bb_index_to_rpo != NULL),
+ m_unreachable_dom (NULL),
+ m_bb_to_rpo (bb_index_to_rpo)
{
+ /* Compute the basic-block index to RPO mapping if not provided by
+ the user. */
+ if (! m_bb_to_rpo && direction == CDI_DOMINATORS)
+ {
+ int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
+ int postorder_num = pre_and_rev_post_order_compute (NULL, postorder,
+ true);
+ m_bb_to_rpo = XNEWVEC (int, last_basic_block_for_fn (cfun));
+ for (int i = 0; i < postorder_num; ++i)
+ m_bb_to_rpo[postorder[i]] = i;
+ free (postorder);
+ }
+
/* If we are not skipping unreachable blocks, then there is nothing
- to do. */
+ further to do. */
if (!m_skip_unreachable_blocks)
return;
@@ -194,6 +210,14 @@ dom_walker::dom_walker (cdi_direction di
}
}
+/* Destructor. */
+
+dom_walker::~dom_walker ()
+{
+ if (! m_user_bb_to_rpo)
+ free (m_bb_to_rpo);
+}
+
/* Return TRUE if BB is reachable, false otherwise. */
bool
@@ -254,6 +278,8 @@ dom_walker::propagate_unreachable_to_edg
m_unreachable_dom = bb;
}
+const edge dom_walker::STOP = (edge)-1;
+
/* Recursively walk the dominator tree.
BB is the basic block we are currently visiting. */
@@ -264,17 +290,7 @@ dom_walker::walk (basic_block bb)
basic_block *worklist = XNEWVEC (basic_block,
n_basic_blocks_for_fn (cfun) * 2);
int sp = 0;
- int *postorder, postorder_num;
-
- if (m_dom_direction == CDI_DOMINATORS)
- {
- postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
- postorder_num = pre_and_rev_post_order_compute (NULL, postorder, true);
- bb_postorder = XNEWVEC (int, last_basic_block_for_fn (cfun));
- for (int i = 0; i < postorder_num; ++i)
- bb_postorder[postorder[i]] = i;
- free (postorder);
- }
+ bb_postorder = m_bb_to_rpo;
while (true)
{
@@ -283,13 +299,14 @@ dom_walker::walk (basic_block bb)
|| bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)
|| bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
{
+ edge taken_edge = NULL;
/* Callback for subclasses to do custom things before we have walked
the dominator children, but before we walk statements. */
if (this->bb_reachable (cfun, bb))
{
- edge taken_edge = before_dom_children (bb);
- if (taken_edge)
+ taken_edge = before_dom_children (bb);
+ if (taken_edge && taken_edge != STOP)
{
edge_iterator ei;
edge e;
@@ -306,12 +323,17 @@ dom_walker::walk (basic_block bb)
worklist[sp++] = bb;
worklist[sp++] = NULL;
- int saved_sp = sp;
- for (dest = first_dom_son (m_dom_direction, bb);
- dest; dest = next_dom_son (m_dom_direction, dest))
- worklist[sp++] = dest;
- if (sp - saved_sp > 1 && m_dom_direction == CDI_DOMINATORS)
- sort_bbs_postorder (&worklist[saved_sp], sp - saved_sp);
+ /* If the callback returned NONE then we are supposed to
+ stop and not even propagate EDGE_EXECUTABLE further. */
+ if (taken_edge != STOP)
+ {
+ int saved_sp = sp;
+ for (dest = first_dom_son (m_dom_direction, bb);
+ dest; dest = next_dom_son (m_dom_direction, dest))
+ worklist[sp++] = dest;
+ if (sp - saved_sp > 1 && m_dom_direction == CDI_DOMINATORS)
+ sort_bbs_postorder (&worklist[saved_sp], sp - saved_sp);
+ }
}
/* NULL is used to mark pop operations in the recursion stack. */
while (sp > 0 && !worklist[sp - 1])
@@ -331,10 +353,6 @@ dom_walker::walk (basic_block bb)
else
break;
}
- if (m_dom_direction == CDI_DOMINATORS)
- {
- free (bb_postorder);
- bb_postorder = NULL;
- }
+ bb_postorder = NULL;
free (worklist);
}
Index: gcc/domwalk.h
===================================================================
--- gcc/domwalk.h (revision 253224)
+++ gcc/domwalk.h (working copy)
@@ -30,14 +30,22 @@ along with GCC; see the file COPYING3.
class dom_walker
{
public:
+ static const edge STOP;
+
/* Use SKIP_UNREACHBLE_BLOCKS = true when your client can discover
that some edges are not executable.
If a client can discover that a COND, SWITCH or GOTO has a static
target in the before_dom_children callback, the taken edge should
be returned. The generic walker will clear EDGE_EXECUTABLE on all
- edges it can determine are not executable. */
- dom_walker (cdi_direction direction, bool skip_unreachable_blocks = false);
+ edges it can determine are not executable.
+
+ You can provide a mapping of basic-block index to RPO if you
+ have that readily available or you do multiple walks. */
+ dom_walker (cdi_direction direction, bool skip_unreachable_blocks = false,
+ int *bb_index_to_rpo = NULL);
+
+ ~dom_walker ();
/* Walk the dominator tree. */
void walk (basic_block);
@@ -48,7 +56,10 @@ public:
edges, NULL otherwise. When skipping unreachable blocks, the walker
uses the taken edge information to clear EDGE_EXECUTABLE on the other
edges, exposing unreachable blocks. A NULL return value means all
- outgoing edges should still be considered executable. */
+ outgoing edges should still be considered executable. A return value
+ of STOP means to stop the domwalk from processing dominated blocks from
+ here. This can be used to process a SEME region only (note domwalk
+ will still do work linear in function size). */
virtual edge before_dom_children (basic_block) { return NULL; }
/* Function to call after the recursive walk of the dominator children. */
@@ -61,7 +72,9 @@ private:
dominator tree. */
const ENUM_BITFIELD (cdi_direction) m_dom_direction : 2;
bool m_skip_unreachable_blocks;
+ bool m_user_bb_to_rpo;
basic_block m_unreachable_dom;
+ int *m_bb_to_rpo;
/* Query whether or not the given block is reachable or not. */
bool bb_reachable (struct function *, basic_block);
Index: gcc/graphite-optimize-isl.c
===================================================================
--- gcc/graphite-optimize-isl.c (revision 253224)
+++ gcc/graphite-optimize-isl.c (working copy)
@@ -189,7 +189,7 @@ optimize_isl (scop_p scop)
print_schedule_ast (dump_file, scop->original_schedule, scop);
isl_schedule_free (scop->transformed_schedule);
scop->transformed_schedule = isl_schedule_copy (scop->original_schedule);
- return false;
+ return flag_graphite_identity || flag_loop_parallelize_all;
}
return true;
Index: gcc/graphite-scop-detection.c
===================================================================
--- gcc/graphite-scop-detection.c (revision 253224)
+++ gcc/graphite-scop-detection.c (working copy)
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.
#include "tree-pass.h"
#include "tree-ssa-propagate.h"
#include "gimple-pretty-print.h"
+#include "cfganal.h"
#include "graphite.h"
class debug_printer
@@ -1544,7 +1545,7 @@ build_alias_set (scop_p scop)
class gather_bbs : public dom_walker
{
public:
- gather_bbs (cdi_direction, scop_p);
+ gather_bbs (cdi_direction, scop_p, int *);
virtual edge before_dom_children (basic_block);
virtual void after_dom_children (basic_block);
@@ -1554,8 +1555,8 @@ private:
scop_p scop;
};
}
-gather_bbs::gather_bbs (cdi_direction direction, scop_p scop)
- : dom_walker (direction), scop (scop)
+gather_bbs::gather_bbs (cdi_direction direction, scop_p scop, int *bb_to_rpo)
+ : dom_walker (direction, false, bb_to_rpo), scop (scop)
{
}
@@ -1589,7 +1590,7 @@ gather_bbs::before_dom_children (basic_b
{
sese_info_p region = scop->scop_info;
if (!bb_in_sese_p (bb, region->region))
- return NULL;
+ return dom_walker::STOP;
record_loop_in_sese (bb, region);
@@ -1708,6 +1709,15 @@ build_scops (vec<scop_p> *scops)
/* Now create scops from the lightweight SESEs. */
vec<sese_l> scops_l = sb.get_scops ();
+
+ /* Domwalk needs a bb to RPO mapping. Compute it once here. */
+ int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
+ int postorder_num = pre_and_rev_post_order_compute (NULL, postorder, true);
+ int *bb_to_rpo = XNEWVEC (int, last_basic_block_for_fn (cfun));
+ for (int i = 0; i < postorder_num; ++i)
+ bb_to_rpo[postorder[i]] = i;
+ free (postorder);
+
int i;
sese_l *s;
FOR_EACH_VEC_ELT (scops_l, i, s)
@@ -1715,7 +1725,7 @@ build_scops (vec<scop_p> *scops)
scop_p scop = new_scop (s->entry, s->exit);
/* Record all basic blocks and their conditions in REGION. */
- gather_bbs (CDI_DOMINATORS, scop).walk (s->entry->dest);
+ gather_bbs (CDI_DOMINATORS, scop, bb_to_rpo).walk (s->entry->dest);
/* domwalk does not fulfil our code-generations constraints on the
order of pbb which is to produce sth like execution order, delaying
@@ -1757,8 +1767,8 @@ build_scops (vec<scop_p> *scops)
find_scop_parameters (scop);
graphite_dim_t max_dim = PARAM_VALUE (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS);
-
- if (scop_nb_params (scop) > max_dim)
+ if (max_dim > 0
+ && scop_nb_params (scop) > max_dim)
{
DEBUG_PRINT (dp << "[scop-detection-fail] too many parameters: "
<< scop_nb_params (scop)
@@ -1771,6 +1781,7 @@ build_scops (vec<scop_p> *scops)
scops->safe_push (scop);
}
+ free (bb_to_rpo);
DEBUG_PRINT (dp << "number of SCoPs: " << (scops ? scops->length () : 0););
}
Index: gcc/graphite.c
===================================================================
--- gcc/graphite.c (revision 253224)
+++ gcc/graphite.c (working copy)
@@ -218,14 +218,9 @@ static bool
graphite_initialize (void)
{
int min_loops = PARAM_VALUE (PARAM_GRAPHITE_MIN_LOOPS_PER_FUNCTION);
- int max_bbs = PARAM_VALUE (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION);
- int nbbs = n_basic_blocks_for_fn (cfun);
int nloops = number_of_loops (cfun);
- if (nloops <= min_loops
- /* FIXME: This limit on the number of basic blocks of a function
- should be removed when the SCOP detection is faster. */
- || (nbbs > max_bbs))
+ if (nloops <= min_loops)
{
if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -234,10 +229,6 @@ graphite_initialize (void)
"PARAM_GRAPHITE_MIN_LOOPS_PER_FUNCTION = %d.\n",
min_loops);
- else if (nbbs > max_bbs)
- fprintf (dump_file, "\nFunction has too many basic blocks: "
- "PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION = %d.\n", max_bbs);
-
fprintf (dump_file, "\nnumber of SCoPs: 0\n");
print_global_statistics (dump_file);
}
Index: gcc/params.def
===================================================================
--- gcc/params.def (revision 253224)
+++ gcc/params.def (working copy)
@@ -873,14 +873,7 @@ DEFPARAM (PARAM_LOOP_BLOCK_TILE_SIZE,
DEFPARAM (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS,
"graphite-max-nb-scop-params",
"maximum number of parameters in a SCoP.",
- 7, 0, 0)
-
-/* Maximal number of basic blocks in the functions analyzed by Graphite. */
-
-DEFPARAM (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION,
- "graphite-max-bbs-per-function",
- "maximum number of basic blocks per function to be analyzed by Graphite.",
- 100, 0, 0)
+ 10, 0, 0)
/* Maximal number of array references in a scop. */
next reply other threads:[~2017-09-27 11:09 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-27 11:09 Richard Biener [this message]
2017-09-28 20:13 ` Sebastian Pop
2017-09-28 20:29 ` Sebastian Pop
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LSU.2.20.1709271302380.26836@zhemvz.fhfr.qr \
--to=rguenther@suse.de \
--cc=gcc-patches@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).