From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 44530 invoked by alias); 27 Sep 2017 11:09:41 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 41698 invoked by uid 89); 27 Sep 2017 11:08:04 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-16.1 required=5.0 tests=BAYES_00,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_ASCII_DIVIDERS,RP_MATCHES_RCVD,SPF_PASS autolearn=ham version=3.3.2 spammy=mining, walked, 8737, walker X-HELO: mx1.suse.de Received: from mx2.suse.de (HELO mx1.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 27 Sep 2017 11:08:02 +0000 Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 10446AAAD for ; Wed, 27 Sep 2017 11:07:58 +0000 (UTC) Date: Wed, 27 Sep 2017 11:09:00 -0000 From: Richard Biener To: gcc-patches@gcc.gnu.org Subject: [PATCH][GRAPHITE] Speedup SCOP detection some more, add region handling to domwalk Message-ID: User-Agent: Alpine 2.20 (LSU 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-SW-Source: 2017-09/txt/msg01809.txt.bz2 This removes another quadraticness from SCOP detection, gather_bbs domwalk. This is done by enhancing domwalk to handle SEME regions via a special return value from before_dom_children. With this I'm now confident to remove the PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION parameter and its associated limit. Being there I've adjusted PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS to its documented default value which enables 90 more loos to be processed in SPEC CPU 2006. I've also made a value of zero magic in disabling the limit (a trick commonly used in GCC). Statistics I have gathered a few patches before for SPEC CPU 2006: 1255 multi-loop SESEs in SCOP processing max. params 34, 3 scops >= 20, 15 scops >= 10, 33 scops >= 8 max. drs per scop 869, 10 scops >= 100 max. pbbs per scop 36, 12 scops >= 10 919 SCOPs fail in build_alias_sets which shows the default for PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP is reasonable (if tuned to SPEC CPU 2006). I've also included the hunk that allows -fgraphite-identity to work ontop of -floop-nest-optimize and for -floop-nest-optimize -ftree-parallelize-all also make sure to code-gen loops that end up not transformed. Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC CPU 2006 tested, applied to trunk. Richard. 2017-09-27 Richard Biener * doc/invoke.texi (graphite-max-bbs-per-function): Remove. (graphite-max-nb-scop-params): Document special value zero. * domwalk.h (dom_walker::STOP): New symbolical constant. (dom_walker::dom_walker): Add optional parameter for bb to RPO mapping. (dom_walker::~dom_walker): Declare. (dom_walker::before_dom_children): Document STOP return value. (dom_walker::m_user_bb_to_rpo): New member. (dom_walker::m_bb_to_rpo): Likewise. * domwalk.c (dom_walker::dom_walker): Compute bb to RPO mapping here if not provided by the user. (dom_walker::~dom_walker): Free bb to RPO mapping if not provided by the user. (dom_walker::STOP): Define. (dom_walker::walk): Do not compute bb to RPO mapping here. Support STOP return value from before_dom_children to stop walking. * graphite-optimize-isl.c (optimize_isl): If the schedule is the same still generate code if -fgraphite-identity or -floop-parallelize-all are given. * graphite-scop-detection.c: Include cfganal.h. (gather_bbs::gather_bbs): Get and pass through bb to RPO mapping. (gather_bbs::before_dom_children): Return STOP for BBs not in the region. (build_scops): Compute bb to RPO mapping and pass it to the domwalk. Treat --param graphite-max-nb-scop-params=0 as not limiting the number of params. * graphite.c (graphite_initialize): Remove limit on the number of basic-blocks in a function. * params.def (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION): Remove. (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS): Adjust to documented default value of 10. Index: gcc/doc/invoke.texi =================================================================== --- gcc/doc/invoke.texi (revision 253224) +++ gcc/doc/invoke.texi (working copy) @@ -10512,13 +10512,9 @@ sequence pairs. This option only applie @item graphite-max-nb-scop-params To avoid exponential effects in the Graphite loop transforms, the number of parameters in a Static Control Part (SCoP) is bounded. The -default value is 10 parameters. A variable whose value is unknown at -compilation time and defined outside a SCoP is a parameter of the SCoP. - -@item graphite-max-bbs-per-function -To avoid exponential effects in the detection of SCoPs, the size of -the functions analyzed by Graphite is bounded. The default value is -100 basic blocks. +default value is 10 parameters, a value of zero can be used to lift +the bound. A variable whose value is unknown at compilation time and +defined outside a SCoP is a parameter of the SCoP. @item loop-block-tile-size Loop blocking or strip mining transforms, enabled with Index: gcc/domwalk.c =================================================================== --- gcc/domwalk.c (revision 253224) +++ gcc/domwalk.c (working copy) @@ -174,13 +174,29 @@ sort_bbs_postorder (basic_block *bbs, in If SKIP_UNREACHBLE_BLOCKS is true, then we need to set EDGE_EXECUTABLE on every edge in the CFG. */ dom_walker::dom_walker (cdi_direction direction, - bool skip_unreachable_blocks) + bool skip_unreachable_blocks, + int *bb_index_to_rpo) : m_dom_direction (direction), m_skip_unreachable_blocks (skip_unreachable_blocks), - m_unreachable_dom (NULL) + m_user_bb_to_rpo (bb_index_to_rpo != NULL), + m_unreachable_dom (NULL), + m_bb_to_rpo (bb_index_to_rpo) { + /* Compute the basic-block index to RPO mapping if not provided by + the user. */ + if (! m_bb_to_rpo && direction == CDI_DOMINATORS) + { + int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun)); + int postorder_num = pre_and_rev_post_order_compute (NULL, postorder, + true); + m_bb_to_rpo = XNEWVEC (int, last_basic_block_for_fn (cfun)); + for (int i = 0; i < postorder_num; ++i) + m_bb_to_rpo[postorder[i]] = i; + free (postorder); + } + /* If we are not skipping unreachable blocks, then there is nothing - to do. */ + further to do. */ if (!m_skip_unreachable_blocks) return; @@ -194,6 +210,14 @@ dom_walker::dom_walker (cdi_direction di } } +/* Destructor. */ + +dom_walker::~dom_walker () +{ + if (! m_user_bb_to_rpo) + free (m_bb_to_rpo); +} + /* Return TRUE if BB is reachable, false otherwise. */ bool @@ -254,6 +278,8 @@ dom_walker::propagate_unreachable_to_edg m_unreachable_dom = bb; } +const edge dom_walker::STOP = (edge)-1; + /* Recursively walk the dominator tree. BB is the basic block we are currently visiting. */ @@ -264,17 +290,7 @@ dom_walker::walk (basic_block bb) basic_block *worklist = XNEWVEC (basic_block, n_basic_blocks_for_fn (cfun) * 2); int sp = 0; - int *postorder, postorder_num; - - if (m_dom_direction == CDI_DOMINATORS) - { - postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun)); - postorder_num = pre_and_rev_post_order_compute (NULL, postorder, true); - bb_postorder = XNEWVEC (int, last_basic_block_for_fn (cfun)); - for (int i = 0; i < postorder_num; ++i) - bb_postorder[postorder[i]] = i; - free (postorder); - } + bb_postorder = m_bb_to_rpo; while (true) { @@ -283,13 +299,14 @@ dom_walker::walk (basic_block bb) || bb == ENTRY_BLOCK_PTR_FOR_FN (cfun) || bb == EXIT_BLOCK_PTR_FOR_FN (cfun)) { + edge taken_edge = NULL; /* Callback for subclasses to do custom things before we have walked the dominator children, but before we walk statements. */ if (this->bb_reachable (cfun, bb)) { - edge taken_edge = before_dom_children (bb); - if (taken_edge) + taken_edge = before_dom_children (bb); + if (taken_edge && taken_edge != STOP) { edge_iterator ei; edge e; @@ -306,12 +323,17 @@ dom_walker::walk (basic_block bb) worklist[sp++] = bb; worklist[sp++] = NULL; - int saved_sp = sp; - for (dest = first_dom_son (m_dom_direction, bb); - dest; dest = next_dom_son (m_dom_direction, dest)) - worklist[sp++] = dest; - if (sp - saved_sp > 1 && m_dom_direction == CDI_DOMINATORS) - sort_bbs_postorder (&worklist[saved_sp], sp - saved_sp); + /* If the callback returned NONE then we are supposed to + stop and not even propagate EDGE_EXECUTABLE further. */ + if (taken_edge != STOP) + { + int saved_sp = sp; + for (dest = first_dom_son (m_dom_direction, bb); + dest; dest = next_dom_son (m_dom_direction, dest)) + worklist[sp++] = dest; + if (sp - saved_sp > 1 && m_dom_direction == CDI_DOMINATORS) + sort_bbs_postorder (&worklist[saved_sp], sp - saved_sp); + } } /* NULL is used to mark pop operations in the recursion stack. */ while (sp > 0 && !worklist[sp - 1]) @@ -331,10 +353,6 @@ dom_walker::walk (basic_block bb) else break; } - if (m_dom_direction == CDI_DOMINATORS) - { - free (bb_postorder); - bb_postorder = NULL; - } + bb_postorder = NULL; free (worklist); } Index: gcc/domwalk.h =================================================================== --- gcc/domwalk.h (revision 253224) +++ gcc/domwalk.h (working copy) @@ -30,14 +30,22 @@ along with GCC; see the file COPYING3. class dom_walker { public: + static const edge STOP; + /* Use SKIP_UNREACHBLE_BLOCKS = true when your client can discover that some edges are not executable. If a client can discover that a COND, SWITCH or GOTO has a static target in the before_dom_children callback, the taken edge should be returned. The generic walker will clear EDGE_EXECUTABLE on all - edges it can determine are not executable. */ - dom_walker (cdi_direction direction, bool skip_unreachable_blocks = false); + edges it can determine are not executable. + + You can provide a mapping of basic-block index to RPO if you + have that readily available or you do multiple walks. */ + dom_walker (cdi_direction direction, bool skip_unreachable_blocks = false, + int *bb_index_to_rpo = NULL); + + ~dom_walker (); /* Walk the dominator tree. */ void walk (basic_block); @@ -48,7 +56,10 @@ public: edges, NULL otherwise. When skipping unreachable blocks, the walker uses the taken edge information to clear EDGE_EXECUTABLE on the other edges, exposing unreachable blocks. A NULL return value means all - outgoing edges should still be considered executable. */ + outgoing edges should still be considered executable. A return value + of STOP means to stop the domwalk from processing dominated blocks from + here. This can be used to process a SEME region only (note domwalk + will still do work linear in function size). */ virtual edge before_dom_children (basic_block) { return NULL; } /* Function to call after the recursive walk of the dominator children. */ @@ -61,7 +72,9 @@ private: dominator tree. */ const ENUM_BITFIELD (cdi_direction) m_dom_direction : 2; bool m_skip_unreachable_blocks; + bool m_user_bb_to_rpo; basic_block m_unreachable_dom; + int *m_bb_to_rpo; /* Query whether or not the given block is reachable or not. */ bool bb_reachable (struct function *, basic_block); Index: gcc/graphite-optimize-isl.c =================================================================== --- gcc/graphite-optimize-isl.c (revision 253224) +++ gcc/graphite-optimize-isl.c (working copy) @@ -189,7 +189,7 @@ optimize_isl (scop_p scop) print_schedule_ast (dump_file, scop->original_schedule, scop); isl_schedule_free (scop->transformed_schedule); scop->transformed_schedule = isl_schedule_copy (scop->original_schedule); - return false; + return flag_graphite_identity || flag_loop_parallelize_all; } return true; Index: gcc/graphite-scop-detection.c =================================================================== --- gcc/graphite-scop-detection.c (revision 253224) +++ gcc/graphite-scop-detection.c (working copy) @@ -48,6 +48,7 @@ along with GCC; see the file COPYING3. #include "tree-pass.h" #include "tree-ssa-propagate.h" #include "gimple-pretty-print.h" +#include "cfganal.h" #include "graphite.h" class debug_printer @@ -1544,7 +1545,7 @@ build_alias_set (scop_p scop) class gather_bbs : public dom_walker { public: - gather_bbs (cdi_direction, scop_p); + gather_bbs (cdi_direction, scop_p, int *); virtual edge before_dom_children (basic_block); virtual void after_dom_children (basic_block); @@ -1554,8 +1555,8 @@ private: scop_p scop; }; } -gather_bbs::gather_bbs (cdi_direction direction, scop_p scop) - : dom_walker (direction), scop (scop) +gather_bbs::gather_bbs (cdi_direction direction, scop_p scop, int *bb_to_rpo) + : dom_walker (direction, false, bb_to_rpo), scop (scop) { } @@ -1589,7 +1590,7 @@ gather_bbs::before_dom_children (basic_b { sese_info_p region = scop->scop_info; if (!bb_in_sese_p (bb, region->region)) - return NULL; + return dom_walker::STOP; record_loop_in_sese (bb, region); @@ -1708,6 +1709,15 @@ build_scops (vec *scops) /* Now create scops from the lightweight SESEs. */ vec scops_l = sb.get_scops (); + + /* Domwalk needs a bb to RPO mapping. Compute it once here. */ + int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (cfun)); + int postorder_num = pre_and_rev_post_order_compute (NULL, postorder, true); + int *bb_to_rpo = XNEWVEC (int, last_basic_block_for_fn (cfun)); + for (int i = 0; i < postorder_num; ++i) + bb_to_rpo[postorder[i]] = i; + free (postorder); + int i; sese_l *s; FOR_EACH_VEC_ELT (scops_l, i, s) @@ -1715,7 +1725,7 @@ build_scops (vec *scops) scop_p scop = new_scop (s->entry, s->exit); /* Record all basic blocks and their conditions in REGION. */ - gather_bbs (CDI_DOMINATORS, scop).walk (s->entry->dest); + gather_bbs (CDI_DOMINATORS, scop, bb_to_rpo).walk (s->entry->dest); /* domwalk does not fulfil our code-generations constraints on the order of pbb which is to produce sth like execution order, delaying @@ -1757,8 +1767,8 @@ build_scops (vec *scops) find_scop_parameters (scop); graphite_dim_t max_dim = PARAM_VALUE (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS); - - if (scop_nb_params (scop) > max_dim) + if (max_dim > 0 + && scop_nb_params (scop) > max_dim) { DEBUG_PRINT (dp << "[scop-detection-fail] too many parameters: " << scop_nb_params (scop) @@ -1771,6 +1781,7 @@ build_scops (vec *scops) scops->safe_push (scop); } + free (bb_to_rpo); DEBUG_PRINT (dp << "number of SCoPs: " << (scops ? scops->length () : 0);); } Index: gcc/graphite.c =================================================================== --- gcc/graphite.c (revision 253224) +++ gcc/graphite.c (working copy) @@ -218,14 +218,9 @@ static bool graphite_initialize (void) { int min_loops = PARAM_VALUE (PARAM_GRAPHITE_MIN_LOOPS_PER_FUNCTION); - int max_bbs = PARAM_VALUE (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION); - int nbbs = n_basic_blocks_for_fn (cfun); int nloops = number_of_loops (cfun); - if (nloops <= min_loops - /* FIXME: This limit on the number of basic blocks of a function - should be removed when the SCOP detection is faster. */ - || (nbbs > max_bbs)) + if (nloops <= min_loops) { if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -234,10 +229,6 @@ graphite_initialize (void) "PARAM_GRAPHITE_MIN_LOOPS_PER_FUNCTION = %d.\n", min_loops); - else if (nbbs > max_bbs) - fprintf (dump_file, "\nFunction has too many basic blocks: " - "PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION = %d.\n", max_bbs); - fprintf (dump_file, "\nnumber of SCoPs: 0\n"); print_global_statistics (dump_file); } Index: gcc/params.def =================================================================== --- gcc/params.def (revision 253224) +++ gcc/params.def (working copy) @@ -873,14 +873,7 @@ DEFPARAM (PARAM_LOOP_BLOCK_TILE_SIZE, DEFPARAM (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS, "graphite-max-nb-scop-params", "maximum number of parameters in a SCoP.", - 7, 0, 0) - -/* Maximal number of basic blocks in the functions analyzed by Graphite. */ - -DEFPARAM (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION, - "graphite-max-bbs-per-function", - "maximum number of basic blocks per function to be analyzed by Graphite.", - 100, 0, 0) + 10, 0, 0) /* Maximal number of array references in a scop. */