* [PATCH][GRAPHITE] Speedup SCOP detection some more, add region handling to domwalk
@ 2017-09-27 11:09 Richard Biener
2017-09-28 20:13 ` Sebastian Pop
2017-09-28 20:29 ` Sebastian Pop
0 siblings, 2 replies; 3+ messages in thread
From: Richard Biener @ 2017-09-27 11:09 UTC (permalink / raw)
To: gcc-patches
This removes another quadraticness from SCOP detection, gather_bbs
domwalk. This is done by enhancing domwalk to handle SEME regions
via a special return value from before_dom_children.
With this I'm now confident to remove the
PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION parameter and its associated limit.
Being there I've adjusted PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS to its
documented default value which enables 90 more loos to be processed
in SPEC CPU 2006. I've also made a value of zero magic in disabling
the limit (a trick commonly used in GCC).
Statistics I have gathered a few patches before for SPEC CPU 2006:
1255 multi-loop SESEs in SCOP processing
max. params 34, 3 scops >= 20, 15 scops >= 10, 33 scops >= 8
max. drs per scop 869, 10 scops >= 100
max. pbbs per scop 36, 12 scops >= 10
919 SCOPs fail in build_alias_sets
which shows the default for PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP
is reasonable (if tuned to SPEC CPU 2006).
I've also included the hunk that allows -fgraphite-identity
to work ontop of -floop-nest-optimize and for -floop-nest-optimize
-ftree-parallelize-all also make sure to code-gen loops that
end up not transformed.
Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC CPU 2006
tested, applied to trunk.
Richard.
2017-09-27 Richard Biener <rguenther@suse.de>
* doc/invoke.texi (graphite-max-bbs-per-function): Remove.
(graphite-max-nb-scop-params): Document special value zero.
* domwalk.h (dom_walker::STOP): New symbolical constant.
(dom_walker::dom_walker): Add optional parameter for bb to
RPO mapping.
(dom_walker::~dom_walker): Declare.
(dom_walker::before_dom_children): Document STOP return value.
(dom_walker::m_user_bb_to_rpo): New member.
(dom_walker::m_bb_to_rpo): Likewise.
* domwalk.c (dom_walker::dom_walker): Compute bb to RPO
mapping here if not provided by the user.
(dom_walker::~dom_walker): Free bb to RPO mapping if not
provided by the user.
(dom_walker::STOP): Define.
(dom_walker::walk): Do not compute bb to RPO mapping here.
Support STOP return value from before_dom_children to stop
walking.
* graphite-optimize-isl.c (optimize_isl): If the schedule
is the same still generate code if -fgraphite-identity
or -floop-parallelize-all are given.
* graphite-scop-detection.c: Include cfganal.h.
(gather_bbs::gather_bbs): Get and pass through bb to RPO
mapping.
(gather_bbs::before_dom_children): Return STOP for BBs
not in the region.
(build_scops): Compute bb to RPO mapping and pass it to
the domwalk. Treat --param graphite-max-nb-scop-params=0
as not limiting the number of params.
* graphite.c (graphite_initialize): Remove limit on the
number of basic-blocks in a function.
* params.def (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION): Remove.
(PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS): Adjust to documented
default value of 10.
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi (revision 253224)
+++ gcc/doc/invoke.texi (working copy)
@@ -10512,13 +10512,9 @@ sequence pairs. This option only applie
@item graphite-max-nb-scop-params
To avoid exponential effects in the Graphite loop transforms, the
number of parameters in a Static Control Part (SCoP) is bounded. The
-default value is 10 parameters. A variable whose value is unknown at
-compilation time and defined outside a SCoP is a parameter of the SCoP.
-
-@item graphite-max-bbs-per-function
-To avoid exponential effects in the detection of SCoPs, the size of
-the functions analyzed by Graphite is bounded. The default value is
-100 basic blocks.
+default value is 10 parameters, a value of zero can be used to lift
+the bound. A variable whose value is unknown at compilation time and
+defined outside a SCoP is a parameter of the SCoP.
@item loop-block-tile-size
Loop blocking or strip mining transforms, enabled with
Index: gcc/domwalk.c
===================================================================
--- gcc/domwalk.c (revision 253224)
+++ gcc/domwalk.c (working copy)
@@ -174,13 +174,29 @@ sort_bbs_postorder (basic_block *bbs, in
If SKIP_UNREACHBLE_BLOCKS is true, then we need to set
EDGE_EXECUTABLE on every edge in the CFG. */
dom_walker::dom_walker (cdi_direction direction,
- bool skip_unreachable_blocks)
+ bool skip_unreachable_blocks,
+ int *bb_index_to_rpo)
: m_dom_direction (direction),
m_skip_unreachable_blocks (skip_unreachable_blocks),
- m_unreachable_dom (NULL)
+ m_user_bb_to_rpo (bb_index_to_rpo != NULL),
+ m_unreachable_dom (NULL),
+ m_bb_to_rpo (bb_index_to_rpo)
{
+ /* Compute the basic-block index to RPO mapping if not provided by
+ the user. */
+ if (! m_bb_to_rpo && direction == CDI_DOMINATORS)
+ {
+ int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
+ int postorder_num = pre_and_rev_post_order_compute (NULL, postorder,
+ true);
+ m_bb_to_rpo = XNEWVEC (int, last_basic_block_for_fn (cfun));
+ for (int i = 0; i < postorder_num; ++i)
+ m_bb_to_rpo[postorder[i]] = i;
+ free (postorder);
+ }
+
/* If we are not skipping unreachable blocks, then there is nothing
- to do. */
+ further to do. */
if (!m_skip_unreachable_blocks)
return;
@@ -194,6 +210,14 @@ dom_walker::dom_walker (cdi_direction di
}
}
+/* Destructor. */
+
+dom_walker::~dom_walker ()
+{
+ if (! m_user_bb_to_rpo)
+ free (m_bb_to_rpo);
+}
+
/* Return TRUE if BB is reachable, false otherwise. */
bool
@@ -254,6 +278,8 @@ dom_walker::propagate_unreachable_to_edg
m_unreachable_dom = bb;
}
+const edge dom_walker::STOP = (edge)-1;
+
/* Recursively walk the dominator tree.
BB is the basic block we are currently visiting. */
@@ -264,17 +290,7 @@ dom_walker::walk (basic_block bb)
basic_block *worklist = XNEWVEC (basic_block,
n_basic_blocks_for_fn (cfun) * 2);
int sp = 0;
- int *postorder, postorder_num;
-
- if (m_dom_direction == CDI_DOMINATORS)
- {
- postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
- postorder_num = pre_and_rev_post_order_compute (NULL, postorder, true);
- bb_postorder = XNEWVEC (int, last_basic_block_for_fn (cfun));
- for (int i = 0; i < postorder_num; ++i)
- bb_postorder[postorder[i]] = i;
- free (postorder);
- }
+ bb_postorder = m_bb_to_rpo;
while (true)
{
@@ -283,13 +299,14 @@ dom_walker::walk (basic_block bb)
|| bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)
|| bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
{
+ edge taken_edge = NULL;
/* Callback for subclasses to do custom things before we have walked
the dominator children, but before we walk statements. */
if (this->bb_reachable (cfun, bb))
{
- edge taken_edge = before_dom_children (bb);
- if (taken_edge)
+ taken_edge = before_dom_children (bb);
+ if (taken_edge && taken_edge != STOP)
{
edge_iterator ei;
edge e;
@@ -306,12 +323,17 @@ dom_walker::walk (basic_block bb)
worklist[sp++] = bb;
worklist[sp++] = NULL;
- int saved_sp = sp;
- for (dest = first_dom_son (m_dom_direction, bb);
- dest; dest = next_dom_son (m_dom_direction, dest))
- worklist[sp++] = dest;
- if (sp - saved_sp > 1 && m_dom_direction == CDI_DOMINATORS)
- sort_bbs_postorder (&worklist[saved_sp], sp - saved_sp);
+ /* If the callback returned NONE then we are supposed to
+ stop and not even propagate EDGE_EXECUTABLE further. */
+ if (taken_edge != STOP)
+ {
+ int saved_sp = sp;
+ for (dest = first_dom_son (m_dom_direction, bb);
+ dest; dest = next_dom_son (m_dom_direction, dest))
+ worklist[sp++] = dest;
+ if (sp - saved_sp > 1 && m_dom_direction == CDI_DOMINATORS)
+ sort_bbs_postorder (&worklist[saved_sp], sp - saved_sp);
+ }
}
/* NULL is used to mark pop operations in the recursion stack. */
while (sp > 0 && !worklist[sp - 1])
@@ -331,10 +353,6 @@ dom_walker::walk (basic_block bb)
else
break;
}
- if (m_dom_direction == CDI_DOMINATORS)
- {
- free (bb_postorder);
- bb_postorder = NULL;
- }
+ bb_postorder = NULL;
free (worklist);
}
Index: gcc/domwalk.h
===================================================================
--- gcc/domwalk.h (revision 253224)
+++ gcc/domwalk.h (working copy)
@@ -30,14 +30,22 @@ along with GCC; see the file COPYING3.
class dom_walker
{
public:
+ static const edge STOP;
+
/* Use SKIP_UNREACHBLE_BLOCKS = true when your client can discover
that some edges are not executable.
If a client can discover that a COND, SWITCH or GOTO has a static
target in the before_dom_children callback, the taken edge should
be returned. The generic walker will clear EDGE_EXECUTABLE on all
- edges it can determine are not executable. */
- dom_walker (cdi_direction direction, bool skip_unreachable_blocks = false);
+ edges it can determine are not executable.
+
+ You can provide a mapping of basic-block index to RPO if you
+ have that readily available or you do multiple walks. */
+ dom_walker (cdi_direction direction, bool skip_unreachable_blocks = false,
+ int *bb_index_to_rpo = NULL);
+
+ ~dom_walker ();
/* Walk the dominator tree. */
void walk (basic_block);
@@ -48,7 +56,10 @@ public:
edges, NULL otherwise. When skipping unreachable blocks, the walker
uses the taken edge information to clear EDGE_EXECUTABLE on the other
edges, exposing unreachable blocks. A NULL return value means all
- outgoing edges should still be considered executable. */
+ outgoing edges should still be considered executable. A return value
+ of STOP means to stop the domwalk from processing dominated blocks from
+ here. This can be used to process a SEME region only (note domwalk
+ will still do work linear in function size). */
virtual edge before_dom_children (basic_block) { return NULL; }
/* Function to call after the recursive walk of the dominator children. */
@@ -61,7 +72,9 @@ private:
dominator tree. */
const ENUM_BITFIELD (cdi_direction) m_dom_direction : 2;
bool m_skip_unreachable_blocks;
+ bool m_user_bb_to_rpo;
basic_block m_unreachable_dom;
+ int *m_bb_to_rpo;
/* Query whether or not the given block is reachable or not. */
bool bb_reachable (struct function *, basic_block);
Index: gcc/graphite-optimize-isl.c
===================================================================
--- gcc/graphite-optimize-isl.c (revision 253224)
+++ gcc/graphite-optimize-isl.c (working copy)
@@ -189,7 +189,7 @@ optimize_isl (scop_p scop)
print_schedule_ast (dump_file, scop->original_schedule, scop);
isl_schedule_free (scop->transformed_schedule);
scop->transformed_schedule = isl_schedule_copy (scop->original_schedule);
- return false;
+ return flag_graphite_identity || flag_loop_parallelize_all;
}
return true;
Index: gcc/graphite-scop-detection.c
===================================================================
--- gcc/graphite-scop-detection.c (revision 253224)
+++ gcc/graphite-scop-detection.c (working copy)
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.
#include "tree-pass.h"
#include "tree-ssa-propagate.h"
#include "gimple-pretty-print.h"
+#include "cfganal.h"
#include "graphite.h"
class debug_printer
@@ -1544,7 +1545,7 @@ build_alias_set (scop_p scop)
class gather_bbs : public dom_walker
{
public:
- gather_bbs (cdi_direction, scop_p);
+ gather_bbs (cdi_direction, scop_p, int *);
virtual edge before_dom_children (basic_block);
virtual void after_dom_children (basic_block);
@@ -1554,8 +1555,8 @@ private:
scop_p scop;
};
}
-gather_bbs::gather_bbs (cdi_direction direction, scop_p scop)
- : dom_walker (direction), scop (scop)
+gather_bbs::gather_bbs (cdi_direction direction, scop_p scop, int *bb_to_rpo)
+ : dom_walker (direction, false, bb_to_rpo), scop (scop)
{
}
@@ -1589,7 +1590,7 @@ gather_bbs::before_dom_children (basic_b
{
sese_info_p region = scop->scop_info;
if (!bb_in_sese_p (bb, region->region))
- return NULL;
+ return dom_walker::STOP;
record_loop_in_sese (bb, region);
@@ -1708,6 +1709,15 @@ build_scops (vec<scop_p> *scops)
/* Now create scops from the lightweight SESEs. */
vec<sese_l> scops_l = sb.get_scops ();
+
+ /* Domwalk needs a bb to RPO mapping. Compute it once here. */
+ int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
+ int postorder_num = pre_and_rev_post_order_compute (NULL, postorder, true);
+ int *bb_to_rpo = XNEWVEC (int, last_basic_block_for_fn (cfun));
+ for (int i = 0; i < postorder_num; ++i)
+ bb_to_rpo[postorder[i]] = i;
+ free (postorder);
+
int i;
sese_l *s;
FOR_EACH_VEC_ELT (scops_l, i, s)
@@ -1715,7 +1725,7 @@ build_scops (vec<scop_p> *scops)
scop_p scop = new_scop (s->entry, s->exit);
/* Record all basic blocks and their conditions in REGION. */
- gather_bbs (CDI_DOMINATORS, scop).walk (s->entry->dest);
+ gather_bbs (CDI_DOMINATORS, scop, bb_to_rpo).walk (s->entry->dest);
/* domwalk does not fulfil our code-generations constraints on the
order of pbb which is to produce sth like execution order, delaying
@@ -1757,8 +1767,8 @@ build_scops (vec<scop_p> *scops)
find_scop_parameters (scop);
graphite_dim_t max_dim = PARAM_VALUE (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS);
-
- if (scop_nb_params (scop) > max_dim)
+ if (max_dim > 0
+ && scop_nb_params (scop) > max_dim)
{
DEBUG_PRINT (dp << "[scop-detection-fail] too many parameters: "
<< scop_nb_params (scop)
@@ -1771,6 +1781,7 @@ build_scops (vec<scop_p> *scops)
scops->safe_push (scop);
}
+ free (bb_to_rpo);
DEBUG_PRINT (dp << "number of SCoPs: " << (scops ? scops->length () : 0););
}
Index: gcc/graphite.c
===================================================================
--- gcc/graphite.c (revision 253224)
+++ gcc/graphite.c (working copy)
@@ -218,14 +218,9 @@ static bool
graphite_initialize (void)
{
int min_loops = PARAM_VALUE (PARAM_GRAPHITE_MIN_LOOPS_PER_FUNCTION);
- int max_bbs = PARAM_VALUE (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION);
- int nbbs = n_basic_blocks_for_fn (cfun);
int nloops = number_of_loops (cfun);
- if (nloops <= min_loops
- /* FIXME: This limit on the number of basic blocks of a function
- should be removed when the SCOP detection is faster. */
- || (nbbs > max_bbs))
+ if (nloops <= min_loops)
{
if (dump_file && (dump_flags & TDF_DETAILS))
{
@@ -234,10 +229,6 @@ graphite_initialize (void)
"PARAM_GRAPHITE_MIN_LOOPS_PER_FUNCTION = %d.\n",
min_loops);
- else if (nbbs > max_bbs)
- fprintf (dump_file, "\nFunction has too many basic blocks: "
- "PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION = %d.\n", max_bbs);
-
fprintf (dump_file, "\nnumber of SCoPs: 0\n");
print_global_statistics (dump_file);
}
Index: gcc/params.def
===================================================================
--- gcc/params.def (revision 253224)
+++ gcc/params.def (working copy)
@@ -873,14 +873,7 @@ DEFPARAM (PARAM_LOOP_BLOCK_TILE_SIZE,
DEFPARAM (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS,
"graphite-max-nb-scop-params",
"maximum number of parameters in a SCoP.",
- 7, 0, 0)
-
-/* Maximal number of basic blocks in the functions analyzed by Graphite. */
-
-DEFPARAM (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION,
- "graphite-max-bbs-per-function",
- "maximum number of basic blocks per function to be analyzed by Graphite.",
- 100, 0, 0)
+ 10, 0, 0)
/* Maximal number of array references in a scop. */
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH][GRAPHITE] Speedup SCOP detection some more, add region handling to domwalk
2017-09-27 11:09 [PATCH][GRAPHITE] Speedup SCOP detection some more, add region handling to domwalk Richard Biener
@ 2017-09-28 20:13 ` Sebastian Pop
2017-09-28 20:29 ` Sebastian Pop
1 sibling, 0 replies; 3+ messages in thread
From: Sebastian Pop @ 2017-09-28 20:13 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches
On Wed, Sep 27, 2017 at 6:07 AM, Richard Biener <rguenther@suse.de> wrote:
>
> This removes another quadraticness from SCOP detection, gather_bbs
> domwalk. This is done by enhancing domwalk to handle SEME regions
> via a special return value from before_dom_children.
>
> With this I'm now confident to remove the
> PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION parameter and its associated limit.
> Being there I've adjusted PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS to its
> documented default value which enables 90 more loos to be processed
> in SPEC CPU 2006. I've also made a value of zero magic in disabling
> the limit (a trick commonly used in GCC).
>
> Statistics I have gathered a few patches before for SPEC CPU 2006:
>
> 1255 multi-loop SESEs in SCOP processing
> max. params 34, 3 scops >= 20, 15 scops >= 10, 33 scops >= 8
> max. drs per scop 869, 10 scops >= 100
> max. pbbs per scop 36, 12 scops >= 10
> 919 SCOPs fail in build_alias_sets
>
> which shows the default for PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP
> is reasonable (if tuned to SPEC CPU 2006).
>
> I've also included the hunk that allows -fgraphite-identity
> to work ontop of -floop-nest-optimize and for -floop-nest-optimize
> -ftree-parallelize-all also make sure to code-gen loops that
> end up not transformed.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC CPU 2006
> tested, applied to trunk.
>
> Richard.
>
> 2017-09-27 Richard Biener <rguenther@suse.de>
>
> * doc/invoke.texi (graphite-max-bbs-per-function): Remove.
> (graphite-max-nb-scop-params): Document special value zero.
> * domwalk.h (dom_walker::STOP): New symbolical constant.
> (dom_walker::dom_walker): Add optional parameter for bb to
> RPO mapping.
> (dom_walker::~dom_walker): Declare.
> (dom_walker::before_dom_children): Document STOP return value.
> (dom_walker::m_user_bb_to_rpo): New member.
> (dom_walker::m_bb_to_rpo): Likewise.
> * domwalk.c (dom_walker::dom_walker): Compute bb to RPO
> mapping here if not provided by the user.
> (dom_walker::~dom_walker): Free bb to RPO mapping if not
> provided by the user.
> (dom_walker::STOP): Define.
> (dom_walker::walk): Do not compute bb to RPO mapping here.
> Support STOP return value from before_dom_children to stop
> walking.
> * graphite-optimize-isl.c (optimize_isl): If the schedule
> is the same still generate code if -fgraphite-identity
> or -floop-parallelize-all are given.
> * graphite-scop-detection.c: Include cfganal.h.
> (gather_bbs::gather_bbs): Get and pass through bb to RPO
> mapping.
> (gather_bbs::before_dom_children): Return STOP for BBs
> not in the region.
> (build_scops): Compute bb to RPO mapping and pass it to
> the domwalk. Treat --param graphite-max-nb-scop-params=0
> as not limiting the number of params.
> * graphite.c (graphite_initialize): Remove limit on the
> number of basic-blocks in a function.
> * params.def (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION): Remove.
> (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS): Adjust to documented
> default value of 10.
The patch looks good. Thanks!
>
> Index: gcc/doc/invoke.texi
> ===================================================================
> --- gcc/doc/invoke.texi (revision 253224)
> +++ gcc/doc/invoke.texi (working copy)
> @@ -10512,13 +10512,9 @@ sequence pairs. This option only applie
> @item graphite-max-nb-scop-params
> To avoid exponential effects in the Graphite loop transforms, the
> number of parameters in a Static Control Part (SCoP) is bounded. The
> -default value is 10 parameters.
Now that we have "compute-out" functionality in all supported
versions of isl, let's remove this parameter.
We needed this in the past when isl was not able to stop an
exponential computation, and that happened when operating
on large dimension spaces.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH][GRAPHITE] Speedup SCOP detection some more, add region handling to domwalk
2017-09-27 11:09 [PATCH][GRAPHITE] Speedup SCOP detection some more, add region handling to domwalk Richard Biener
2017-09-28 20:13 ` Sebastian Pop
@ 2017-09-28 20:29 ` Sebastian Pop
1 sibling, 0 replies; 3+ messages in thread
From: Sebastian Pop @ 2017-09-28 20:29 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches
On Wed, Sep 27, 2017 at 6:07 AM, Richard Biener <rguenther@suse.de> wrote:
> /* Maximal number of array references in a scop. */
>
DEFPARAM (PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP,
"graphite-max-arrays-per-scop",
"maximum number of arrays per scop.",
100, 0, 0)
Let's also remove this param as we now have max-isl-operations.
Thanks,
Sebastian
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2017-09-28 20:29 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-27 11:09 [PATCH][GRAPHITE] Speedup SCOP detection some more, add region handling to domwalk Richard Biener
2017-09-28 20:13 ` Sebastian Pop
2017-09-28 20:29 ` Sebastian Pop
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).