From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 118369 invoked by alias); 19 Aug 2015 21:46:14 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 118359 invoked by uid 89); 19 Aug 2015 21:46:14 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.2 required=5.0 tests=AWL,BAYES_50,KAM_LAZY_DOMAIN_SECURITY,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=no version=3.3.2 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Wed, 19 Aug 2015 21:46:12 +0000 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) by mx1.redhat.com (Postfix) with ESMTPS id 1BE948E220; Wed, 19 Aug 2015 21:46:11 +0000 (UTC) Received: from localhost.localdomain (ovpn-113-201.phx2.redhat.com [10.3.113.201]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t7JLkAP4020678; Wed, 19 Aug 2015 17:46:10 -0400 Subject: Re: [Patch,tree-optimization]: Add new path Splitting pass on tree ssa representation To: Ajit Kumar Agarwal , Richard Biener References: <37378DC5BCD0EE48BA4B082E0B55DFAA41F3F56C@XAP-PVEXMBX02.xlnx.xilinx.com> <37378DC5BCD0EE48BA4B082E0B55DFAA41F41150@XAP-PVEXMBX02.xlnx.xilinx.com> <37378DC5BCD0EE48BA4B082E0B55DFAA41F42541@XAP-PVEXMBX02.xlnx.xilinx.com> <37378DC5BCD0EE48BA4B082E0B55DFAA4295155D@XAP-PVEXMBX02.xlnx.xilinx.com> <37378DC5BCD0EE48BA4B082E0B55DFAA4295ADCB@XAP-PVEXMBX02.xlnx.xilinx.com> Cc: GCC Patches , Vinod Kathail , Shail Aditya Gupta , Vidhumouli Hunsigida , Nagaraju Mekala From: Jeff Law Message-ID: <55D4F921.2020708@redhat.com> Date: Wed, 19 Aug 2015 21:53:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0 MIME-Version: 1.0 In-Reply-To: <37378DC5BCD0EE48BA4B082E0B55DFAA4295ADCB@XAP-PVEXMBX02.xlnx.xilinx.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2015-08/txt/msg01141.txt.bz2 On 08/15/2015 11:01 AM, Ajit Kumar Agarwal wrote: > > > From cf2b64cc1d6623424d770f2a9ea257eb7e58e887 Mon Sep 17 00:00:00 2001 > From: Ajit Kumar Agarwal > Date: Sat, 15 Aug 2015 18:19:14 +0200 > Subject: [PATCH] [Patch,tree-optimization]: Add new path Splitting pass on > tree ssa representation. > > Added a new pass on path splitting on tree SSA representation. The path > splitting optimization does the CFG transformation of join block of the > if-then-else same as the loop latch node is moved and merged with the > predecessor blocks after preserving the SSA representation. > > ChangeLog: > 2015-08-15 Ajit Agarwal > > * gcc/Makefile.in: Add the build of the new file > tree-ssa-path-split.c Instead: * Makefile.in (OBJS): Add tree-ssa-path-split.o. > * gcc/opts.c (OPT_ftree_path_split) : Add an entry for > Path splitting pass with optimization flag greater and > equal to O2. * opts.c (default_options_table): Add entry for path splitting optimization at -O2 and above. > * gcc/passes.def (path_split): add new path splitting pass. Capitalize "add". > * gcc/tree-ssa-path-split.c: New. Use "New file". > * gcc/tracer.c (transform_duplicate): New. Use "New function". > * gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c: New. > * gcc/testsuite/gcc.dg/path-split-1.c: New. These belong in gcc/testsuite/ChangeLog and remove the "gcc/testsuite" prefix. > * gcc/doc/invoke.texi > (ftree-path-split): Document. > (fdump-tree-path_split): Document. Should just be two lines instead of three. And more generally, there's no need to prefix ChangeLog entries with "gcc/". Now that the ChangeLog nits are out of the way, let's get to stuff that's more interesting. > > Signed-off-by:Ajit Agarwalajitkum@xilinx.com > --- > gcc/Makefile.in | 1 + > gcc/common.opt | 4 + > gcc/doc/invoke.texi | 16 +- > gcc/opts.c | 1 + > gcc/passes.def | 1 + > gcc/testsuite/gcc.dg/path-split-1.c | 65 ++++++ > gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c | 60 +++++ > gcc/timevar.def | 1 + > gcc/tracer.c | 37 +-- > gcc/tree-pass.h | 1 + > gcc/tree-ssa-path-split.c | 330 +++++++++++++++++++++++++++ > 11 files changed, 503 insertions(+), 14 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/path-split-1.c > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/path-split-2.c > create mode 100644 gcc/tree-ssa-path-split.c > > diff --git a/gcc/common.opt b/gcc/common.opt > index e80eadf..1d02582 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -2378,6 +2378,10 @@ ftree-vrp > Common Report Var(flag_tree_vrp) Init(0) Optimization > Perform Value Range Propagation on trees > > +ftree-path-split > +Common Report Var(flag_tree_path_split) Init(0) Optimization > +Perform Path Splitting Maybe "Perform Path Splitting for loop backedges" or something which is a little more descriptive. The above isn't exactly right, so don't use it as-is. > @@ -9068,6 +9075,13 @@ enabled by default at @option{-O2} and higher. Null pointer check > elimination is only done if @option{-fdelete-null-pointer-checks} is > enabled. > > +@item -ftree-path-split > +@opindex ftree-path-split > +Perform Path Splitting on trees. The join blocks of IF-THEN-ELSE same > +as loop latch node is moved to its predecessor and the loop latch node > +will be forwarding block. This is enabled by default at @option{-O2} > +and higher. Needs some work. Maybe something along the lines of When two paths of execution merge immediately before a loop latch node, try to duplicate the merge node into the two paths. > diff --git a/gcc/passes.def b/gcc/passes.def > index 6b66f8f..20ddf3d 100644 > --- a/gcc/passes.def > +++ b/gcc/passes.def > @@ -82,6 +82,7 @@ along with GCC; see the file COPYING3. If not see > NEXT_PASS (pass_ccp); > /* After CCP we rewrite no longer addressed locals into SSA > form if possible. */ > + NEXT_PASS (pass_path_split); > NEXT_PASS (pass_forwprop); > NEXT_PASS (pass_sra_early); I can't recall if we've discussed the location of the pass at all. I'm not objecting to this location, but would like to hear why you chose this particular location in the optimization pipeline. > /* pass_build_ealias is a dummy pass that ensures that we > diff --git a/gcc/testsuite/gcc.dg/path-split-1.c b/gcc/testsuite/gcc.dg/path-split-1.c ISTM the two tests should be combined into a single test. I didn't see a functional difference in the test() function between those two tests. I believe you can still create/scan debugging dumps with dg-do run test. > +DEFTIMEVAR (TV_TREE_PATH_SPLIT , "tree path_split") tree path split rather than using underscores > diff --git a/gcc/tracer.c b/gcc/tracer.c > index cad7ab1..206692f 100644 > --- a/gcc/tracer.c > +++ b/gcc/tracer.c > @@ -58,11 +58,13 @@ > #include "fibonacci_heap.h" > > static int count_insns (basic_block); > -static bool ignore_bb_p (const_basic_block); > +bool ignore_bb_p (const_basic_block); > static bool better_p (const_edge, const_edge); > static edge find_best_successor (basic_block); > static edge find_best_predecessor (basic_block); > static int find_trace (basic_block, basic_block *); > +basic_block transform_duplicate(basic_block bb, > + basic_block bb2); Please create a tracer.h and put the newly exported prototypes in tracer.h. Then include tracer.h in tracer.c and tree-ssa-path-split.c. > @@ -224,6 +226,24 @@ find_trace (basic_block bb, basic_block *trace) > return i; > } > > +/* Transform the block that needs to be duplicated. */ > + > +basic_block > +transform_duplicate(basic_block bb, > + basic_block bb2) Space between the name of the function and first paren. It looks like these two lines should be joined. Single space between the type and the name of the argument. Ultimately there's not a lot of shared code between the tracer and path splitting, which is basically what I expected. Nevertheless, sharing a single implementation of those routines is probably wise. > + > +#include "config.h" > +#include "system.h" > +#include "coretypes.h" > +#include "backend.h" > +#include "cfghooks.h" > +#include "tree.h" > +#include "gimple.h" > +#include "rtl.h" > +#include "ssa.h" > +#include "flags.h" > +#include "alias.h" > +#include "fold-const.h" > +#include "stor-layout.h" > +#include "calls.h" > +#include "cfganal.h" > +#include "internal-fn.h" > +#include "gimple-fold.h" > +#include "tree-eh.h" > +#include "gimple-iterator.h" > +#include "gimple-walk.h" > +#include "tree-cfg.h" > +#include "tree-ssa-loop-manip.h" > +#include "tree-ssa-loop-niter.h" > +#include "tree-ssa-loop.h" > +#include "tree-into-ssa.h" > +#include "tree-ssa.h" > +#include "tree-pass.h" > +#include "tree-dump.h" > +#include "gimple-pretty-print.h" > +#include "diagnostic-core.h" > +#include "intl.h" > +#include "cfgloop.h" > +#include "tree-scalar-evolution.h" > +#include "tree-ssa-propagate.h" > +#include "tree-chrec.h" > +#include "insn-config.h" > +#include "expmed.h" > +#include "dojump.h" > +#include "explow.h" > +#include "emit-rtl.h" > +#include "varasm.h" > +#include "stmt.h" > +#include "expr.h" > +#include "insn-codes.h" > +#include "optabs.h" > +#include "fibonacci_heap.h" I'm having a hard time seeing how all these are needed. Especially the RTL related includes. These really need to be trimmed. > + > +extern basic_block transform_duplicate(basic_block bb, basic_block bb2); > +extern bool ignore_bb_p (const_basic_block bb); With the prototypes moved to tracer.h, you won't want the extern declarations in here. > + > +/* This function gets the join blocks same as the source > + node of the loop latch nodes and form the trace with > + the join block and its predecessor. */ I'm having a hard time parsing this comment. Please try to rewrite it to be clearer. I suspect this routine is more complicated than it needs to be because of how you're searching for our subgraphs. > + > +static int > +find_trace_loop_latch_same_as_join_blk (basic_block bb, > + basic_block *trace) > +{ > + vec bbs; > + basic_block bb1; > + unsigned int i; > + edge_iterator ei; > + edge e1; > + bool found = false; > + int n = 0; > + > + bbs = get_all_dominated_blocks (CDI_DOMINATORS, > + bb ); > + FOR_EACH_VEC_ELT (bbs, i, bb1) > + { > + FOR_EACH_EDGE (e1, ei, bb->succs) > + { > + if ( bb1 == e1->dest) > + { > + found = true; > + } > + } > + > + if (!found && bb1 != bb) > + { > + found = false; > + FOR_EACH_EDGE (e1, ei, bb1->succs) > + { > + if (e1->flags & (EDGE_DFS_BACK)) > + { > + trace[1] = e1->src; > + n++; > + found = true; > + } > + } It seems to me all this can be changed to look for the backedges via the loop tree. > + > + if (found && EDGE_COUNT(bb1->preds) == 2) Space between EDGE_COUNT and the open paren. You'd want to keep this test, then do some kind of dominance test like we talked about earlier in the discussion of this patch. Jeff > + { > + FOR_EACH_EDGE (e1, ei, bb1->preds) > + { > + if (single_succ_p(e1->src) && > + single_succ_edge (e1->src)->flags & EDGE_FALLTHRU) And you'd keep these tests in some form. > + > +/* This function performs the feasibility tests for path splitting > + to perform. Return false if the feasibility for path splitting > + is not done and returns true if the feasbility for path splitting > + is done. Following feasibility tests are performed. > + > + 1. Return false if the join block has rhs casting for assign > + gimple statements. > + 2. If the number of phis is greater than 1 or the phi node in > + the join block has virtual operand return false. > + 3. Return true if the number of sequential statements is > + greater than 2. > + 4. If the phi result in the phi node of the join block is not > + used inside the same join block return false. > + 7. Otherwise returns true. */ These seem totally arbitrary. What's the reason behind each of these restrictions? None should be a correctness requirement AFAICT. Others (like the number of statements) should probably be conditionalized on size vs speed optimizations. The join > + same as the source of the loop latch node is identified along > + with their predecessors. I couldn't parse this sentence. Based on the feasibility tests for > + path splitting the path splitting is performed by adding the > + join blocks into the predecessors after propagating the phi > + result with the corresponding phi arguments for the predecessors. > + The tree-cfg-cleanup will merge the blocks in the predecessors > + path and the update-ssa will update the ssa representation after > + the path splitting is performed. */ It would probably help to show a visual representation of what's happening. ie, show a little CFG before and after. > + > +static bool > +perform_path_splitting () > +{ > + bool changed = false; > + basic_block trace[2] = {NULL,NULL}; > + basic_block bb; > + auto_vec*> blocks; > + blocks.safe_grow_cleared (last_basic_block_for_fn (cfun)); > + fibonacci_heap heap (LONG_MIN); > + > + initialize_original_copy_tables(); > + calculate_dominance_info (CDI_DOMINATORS); > + > + FOR_EACH_BB_FN (bb, cfun) > + { > + if (!ignore_bb_p (bb)) > + blocks[bb->index] = heap.insert (-bb->frequency, bb); > + } > + > + while (!heap.empty()) Wouldn't it be easier to walk the loop tree? We've got a fairly specific subgraph that we're looking for. Why not walk the loop tree, using the latch as a point to search backwards for the shape of the subgraph we're interested in transforming? That would seem to be a lot less expensive than looking at every block looking for the start of the subgraphs we're interested in. That'd also seem to eliminate all the fibheap stuff. > + > +static unsigned int > +execute_path_split (void) > +{ > + bool changed; > + > + if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1) > + return 0; > + > + mark_dfs_back_edges (); ISTM you could check the return value from mark_dfs_back_edges and do nothing if no back edges were found. > + > +static bool > +gate_path_split(void) > +{ > + return flag_tree_path_split != 0; > +} > + > +namespace { > + > +const pass_data pass_data_path_split = > +{ > + GIMPLE_PASS, /* type */ > + "path_split", /* name */ > + OPTGROUP_NONE, /* optinfo_flags */ > + TV_TREE_PATH_SPLIT, /* tv_id */ > + 0, /* properties_required */ > + 0, /* properties_provided */ > + 0, /* properties_destroyed */ > + 0, /* todo_flags_start */ > + TODO_update_ssa, /* todo_flags_finish */ It seems to me that you're probably missing some properties_required. You depend on a proper CFG and SSA graph, so at the least PROP_cfg and PROP_ssa. Presumably you don't set TODO_cleanup_cfg so that you can avoid the cfg cleanup if nothing gets changed. Is there some reason you don't do the same for the SSA graph update? Jeff