From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 5807 invoked by alias); 23 Nov 2014 22:22:17 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 5797 invoked by uid 89); 23 Nov 2014 22:22:16 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.6 required=5.0 tests=BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-ig0-f170.google.com Received: from mail-ig0-f170.google.com (HELO mail-ig0-f170.google.com) (209.85.213.170) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Sun, 23 Nov 2014 22:22:12 +0000 Received: by mail-ig0-f170.google.com with SMTP id r2so4193029igi.3 for ; Sun, 23 Nov 2014 14:22:10 -0800 (PST) X-Received: by 10.50.3.8 with SMTP id 8mr8768422igy.45.1416781330784; Sun, 23 Nov 2014 14:22:10 -0800 (PST) Received: from f1.c.bardezibar.internal (81.37.148.146.bc.googleusercontent.com. [146.148.37.81]) by mx.google.com with ESMTPSA id mx10sm3567417igb.21.2014.11.23.14.22.09 for (version=SSLv3 cipher=RC4-SHA bits=128/128); Sun, 23 Nov 2014 14:22:09 -0800 (PST) Date: Mon, 24 Nov 2014 00:06:00 -0000 From: Sebastian Pop To: Jeff Law Cc: Richard Biener , James Greenhalgh , Steve Ellcey , GCC Patches Subject: Re: [Patch] Improving jump-thread pass for PR 54742 Message-ID: <20141123222207.GA24073@f1.c.bardezibar.internal> References: <20140821094109.GA20536@arm.com> <53FB73C0.7090507@redhat.com> <20140926201451.GA16704@f1.c.bardezibar.internal> <20141026213433.GA22140@f1.c.bardezibar.internal> <20141111011404.GA23088@f1.c.bardezibar.internal> <20141118221933.GA7298@f1.c.bardezibar.internal> <5470E495.7070601@redhat.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="wac7ysb48OaltWcw" Content-Disposition: inline In-Reply-To: <5470E495.7070601@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-IsSubscribed: yes X-SW-Source: 2014-11/txt/msg02955.txt.bz2 --wac7ysb48OaltWcw Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-length: 2876 Jeff Law wrote: > >PS: I have run some perf analysis with the patch: > >- on a bootstrap of GCC I see 3209 FSM jump threads > >- libpng and libjpeg contain FSM jump threads, the perf increase is in the 1% > > (measured on simulators and reduced data sets) > >- coremark gets jump threaded (as expected) > >- I'm setting up the llvm test-suite and I will report perf differences > So that's *far* more jump threads than I would expect this to find > in a bootstrap of GCC -- like 3 orders of magnitude more than I'd > expect to find. The second patch attached limits the search for FSM jump threads to loops. With that patch, we are now down to 470 jump threads in an x86_64-linux bootstrap (and 424 jump threads on powerpc64-linux bootstrap.) > I haven't dug deep, but the first level analysis is not encouraging. > > Basically I used the trunk compiler with and without your patch to > build gcc-4.7.3's cc1 (4.7.3 simply because that's what I last used > this testing framework). So that gives me two cc1s that I then use > to compile a bunch of .i files under valgrind's (cachegrind) > control. > > valgrind --tool=cachegrind --cache-sim=no --branch-sim=yes ...... > > That gives me two hunks of data for each input file I test. > Specifically I get the dynamic number of instructions and the > dynamic number of branches executed. > > For jump threading those values correspond directly to the effect > we're looking for -- a reduction in dynamic conditional jumps and a > reduction in dynamic instructions executed. Typically the change in > dynamic instructions executed is 2-3X the change in dynamic > conditional jumps -- which makes sense as removing the conditional > jump usually means we remove a comparison and possibly some setup > code as well. > > Consistently with your patch those values get worse. Across the > entire set of .i files I get > > For the trunk: > > instructions:1339016494968 > branches: 243568982489 > > With your patch: > > instructions:1339739533291 > branches: 243806615986 > > > So that's 723038323 more instructions and 237633497 more branches > after installing your patch. While we're looking a just under .1% > regression in dynamic branches, that's a terrible result for this > work. One of the reasons I think we see more branches is that in sese region copying we do not use the knowledge of the value of the condition for the last branch in a jump-thread path: we rely on other propagation passes to remove the branch. The last attached patch adds: /* Remove the last branch in the jump thread path. */ remove_ctrl_stmt_and_useless_edges (region_copy[n_region - 1], exit->dest); Please let me know if the attached patches are producing better results on gcc. I rebased the original patch on trunk and all patches bootstrapped together on x86_64-linux and powerpc64-linux. Thanks, Sebastian --wac7ysb48OaltWcw Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="0001-extend-jump-thread-for-finite-state-automata-PR-5474.patch" Content-length: 16089 >From 120d5490598b1a09a06c04796b4fda46be7fd7db Mon Sep 17 00:00:00 2001 From: Sebastian Pop Date: Fri, 26 Sep 2014 14:54:20 -0500 Subject: [PATCH 1/4] extend jump thread for finite state automata PR 54742 Adapted from a patch from James Greenhalgh. * params.def (max-fsm-thread-path-insns, max-fsm-thread-length, max-fsm-thread-paths): New. * doc/invoke.texi (max-fsm-thread-path-insns, max-fsm-thread-length, max-fsm-thread-paths): Documented. * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c: New. * tree-cfg.c (gimple_duplicate_sese_region): Save and restore loop header and latch. * tree-ssa-threadedge.c (simplify_control_stmt_condition): Restore the original value of cond when simplification fails. (fsm_find_thread_path): New. (fsm_find_control_statement_thread_paths): New. (fsm_thread_through_normal_block): Call find_control_statement_thread_paths. * tree-ssa-threadupdate.c (dump_jump_thread_path): Pretty print EDGE_START_FSM_THREAD. (thread_through_all_blocks): Generate code for EDGE_START_FSM_THREAD edges calling gimple_duplicate_sese_region. * tree-ssa-threadupdate.h (jump_thread_edge_type): Add EDGE_START_FSM_THREAD. --- gcc/doc/invoke.texi | 12 ++ gcc/params.def | 15 ++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c | 38 +++++ gcc/tree-cfg.c | 26 ++- gcc/tree-ssa-threadedge.c | 203 ++++++++++++++++++++++- gcc/tree-ssa-threadupdate.c | 52 +++++- gcc/tree-ssa-threadupdate.h | 1 + 7 files changed, 342 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 89edddb..074183f 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -10624,6 +10624,18 @@ large and significantly increase compile time at optimization level @option{-O1} and higher. This parameter is a maximum nubmer of statements in a single generated constructor. Default value is 5000. +@item max-fsm-thread-path-insns +Maximum number of instructions to copy when duplicating blocks on a +finite state automaton jump thread path. The default is 100. + +@item max-fsm-thread-length +Maximum number of basic blocks on a finite state automaton jump thread +path. The default is 10. + +@item max-fsm-thread-paths +Maximum number of new jump thread paths to create for a finite state +automaton. The default is 50. + @end table @end table diff --git a/gcc/params.def b/gcc/params.def index 6c71326..37741d3 100644 --- a/gcc/params.def +++ b/gcc/params.def @@ -1140,6 +1140,21 @@ DEFPARAM (PARAM_CHKP_MAX_CTOR_SIZE, "Maximum number of statements to be included into a single static " "constructor generated by Pointer Bounds Checker", 5000, 100, 0) + +DEFPARAM (PARAM_MAX_FSM_THREAD_PATH_INSNS, + "max-fsm-thread-path-insns", + "Maximum number of instructions to copy when duplicating blocks on a finite state automaton jump thread path", + 100, 1, 999999) + +DEFPARAM (PARAM_MAX_FSM_THREAD_LENGTH, + "max-fsm-thread-length", + "Maximum number of basic blocks on a finite state automaton jump thread path", + 10, 1, 999999) + +DEFPARAM (PARAM_MAX_FSM_THREAD_PATHS, + "max-fsm-thread-paths", + "Maximum number of new jump thread paths to create for a finite state automaton", + 50, 1, 999999) /* Local variables: diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c new file mode 100644 index 0000000..310d3db --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-6.c @@ -0,0 +1,38 @@ +int sum0, sum1, sum2, sum3; +int foo (char *s, char **ret) +{ + int state=0; + char c; + + for (; *s && state != 4; s++) + { + c = *s; + if (c == '*') + { + s++; + break; + } + switch (state) + { + case 0: + if (c == '+') + state = 1; + else if (c != '-') + sum0+=c; + break; + case 1: + if (c == '+') + state = 2; + else if (c == '-') + state = 0; + else + sum1+=c; + break; + default: + break; + } + + } + *ret = s; + return state; +} diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c index e78554f..6d96c52 100644 --- a/gcc/tree-cfg.c +++ b/gcc/tree-cfg.c @@ -5967,10 +5967,12 @@ gimple_duplicate_sese_region (edge entry, edge exit, { unsigned i; bool free_region_copy = false, copying_header = false; + bool save_loop_details = false; struct loop *loop = entry->dest->loop_father; edge exit_copy; vec doms; edge redirected; + int memo_loop_header_no = 0, memo_loop_latch_no = 0; int total_freq = 0, entry_freq = 0; gcov_type total_count = 0, entry_count = 0; @@ -5988,9 +5990,20 @@ gimple_duplicate_sese_region (edge entry, edge exit, if (region[i]->loop_father != loop) return false; - if (region[i] != entry->dest - && region[i] == loop->header) - return false; + /* If we are copying a region that starts and ends in an arbirary place in + the loop: keep track of which block will become our loop header. */ + if (region[i] != entry->dest && region[i] == loop->header) + { + save_loop_details = true; + memo_loop_header_no = i; + } + + /* And which block will become our loop latch. */ + if (region[i] != entry->src && region[i] == loop->latch) + { + save_loop_details = true; + memo_loop_latch_no = i; + } } /* In case the function is used for loop header copying (which is the primary @@ -6073,6 +6086,13 @@ gimple_duplicate_sese_region (edge entry, edge exit, loop->latch = exit->src; } + /* Restore loop details if we were asked to save them. */ + if (save_loop_details) + { + loop->header = region[memo_loop_header_no]; + loop->latch = region[memo_loop_latch_no]; + } + /* Redirect the entry and add the phi node arguments. */ redirected = redirect_edge_and_branch (entry, get_bb_copy (entry->dest)); gcc_assert (redirected != NULL); diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c index 8b0b7b8..3939a74 100644 --- a/gcc/tree-ssa-threadedge.c +++ b/gcc/tree-ssa-threadedge.c @@ -56,6 +56,8 @@ along with GCC; see the file COPYING3. If not see #include "params.h" #include "tree-ssa-threadedge.h" #include "builtins.h" +#include "cfg.h" +#include "cfganal.h" /* To avoid code explosion due to jump threading, we limit the number of statements we are going to copy. This variable @@ -661,6 +663,7 @@ simplify_control_stmt_condition (edge e, rather than use a relational operator. These are simpler to handle. */ if (TREE_CODE (cond) == SSA_NAME) { + tree original_lhs = cond; cached_lhs = cond; /* Get the variable's current value from the equivalence chains. @@ -689,6 +692,12 @@ simplify_control_stmt_condition (edge e, pass specific callback to try and simplify it further. */ if (cached_lhs && ! is_gimple_min_invariant (cached_lhs)) cached_lhs = (*simplify) (stmt, stmt); + + /* We couldn't find an invariant. But, callers of this + function may be able to do something useful with the + unmodified destination. */ + if (!cached_lhs) + cached_lhs = original_lhs; } else cached_lhs = NULL; @@ -948,6 +957,175 @@ thread_around_empty_blocks (edge taken_edge, return false; } +/* Return true if there is at least one path from START_BB to END_BB. + VISITED_BBS is used to make sure we don't fall into an infinite loop. */ + +static bool +fsm_find_thread_path (basic_block start_bb, basic_block end_bb, + vec *&path, + hash_set *visited_bbs, int n_insns) +{ + if (start_bb == end_bb) + { + vec_safe_push (path, start_bb); + return true; + } + + if (!visited_bbs->add (start_bb)) + { + edge e; + edge_iterator ei; + FOR_EACH_EDGE (e, ei, start_bb->succs) + if (fsm_find_thread_path (e->dest, end_bb, path, visited_bbs, n_insns)) + { + vec_safe_push (path, start_bb); + return true; + } + } + + return false; +} + +static int max_threaded_paths; + +/* We trace the value of the variable EXPR back through any phi nodes looking + for places where it gets a constant value and save the path. Stop after + having recorded MAX_PATHS jump threading paths. */ + +static void +fsm_find_control_statement_thread_paths (tree expr, + hash_set *visited_phis, + vec *&path) +{ + tree var = SSA_NAME_VAR (expr); + gimple def_stmt = SSA_NAME_DEF_STMT (expr); + basic_block var_bb = gimple_bb (def_stmt); + + if (var == NULL || var_bb == NULL) + return; + + vec *next_path; + vec_alloc (next_path, n_basic_blocks_for_fn (cfun)); + + basic_block last_bb_in_path = path->last (); + + /* Put the path from var_bb to last_bb_in_path into next_path. */ + if (var_bb != last_bb_in_path) + { + edge e; + int e_count = 0; + edge_iterator ei; + + FOR_EACH_EDGE (e, ei, last_bb_in_path->preds) + { + hash_set *visited_bbs = new hash_set; + + if (fsm_find_thread_path (var_bb, e->src, next_path, visited_bbs, 0)) + ++e_count; + + delete visited_bbs; + + /* If there is more than one path, stop. */ + if (e_count > 1) + { + vec_free (next_path); + return; + } + } + } + + /* Visit PHI nodes once. */ + if (gimple_code (def_stmt) != GIMPLE_PHI + || visited_phis->add (def_stmt)) + { + vec_free (next_path); + return; + } + + gphi *phi = as_a (def_stmt); + + /* Append all the nodes from next_path to path. */ + vec_safe_splice (path, next_path); + gcc_assert (path->last () == var_bb); + + /* Iterate over the arguments of PHI. */ + unsigned int i; + for (i = 0; i < gimple_phi_num_args (phi); i++) + { + tree arg = gimple_phi_arg_def (phi, i); + basic_block bbi = gimple_phi_arg_edge (phi, i)->src; + + /* Skip edges pointing outside the current loop. */ + if (!arg || var_bb->loop_father != bbi->loop_father) + continue; + + /* Add BBI to the path. */ + vec_safe_push (path, bbi); + + if (TREE_CODE (arg) == INTEGER_CST) + { + int j, n = path->length (); + vec *jump_thread_path + = new vec (); + int joiners = 0; + + for (j = 0; j < n - 1; j++) + { + edge e = find_edge ((*path)[n - j - 1], + (*path)[n - j - 2]); + gcc_assert (e); + enum jump_thread_edge_type kind; + + if (j == 0) + kind = EDGE_START_FSM_THREAD; + else if (single_pred_p (e->src)) + kind = EDGE_NO_COPY_SRC_BLOCK; + else { + kind = EDGE_COPY_SRC_JOINER_BLOCK; + ++joiners; + } + + jump_thread_edge *x = new jump_thread_edge (e, kind); + jump_thread_path->safe_push (x); + } + + /* Add the edge taken when the control variable has value ARG. */ + edge taken_edge = find_taken_edge ((*path)[0], arg); + jump_thread_edge *x + = new jump_thread_edge (taken_edge, EDGE_NO_COPY_SRC_BLOCK); + jump_thread_path->safe_push (x); + + /* A path with less than 3 nodes should not be jump-threaded. */ + if (n > 2 && n < PARAM_VALUE (PARAM_MAX_FSM_THREAD_LENGTH) + && max_threaded_paths > 0) + { + int n_insns = 0; + gimple_stmt_iterator gsi; + + for (j = 1; j < n - 1; j++) + for (gsi = gsi_start_bb ((*path)[j]); !gsi_end_p (gsi); + gsi_next (&gsi)) + ++n_insns; + + if (n_insns < PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS)) + { + register_jump_thread (jump_thread_path); + --max_threaded_paths; + } + } + } + else if (TREE_CODE (arg) == SSA_NAME) + fsm_find_control_statement_thread_paths (arg, visited_phis, path); + + /* Remove BBI from the path. */ + path->pop (); + } + + /* Remove all the nodes that we added from next_path. */ + vec_safe_truncate (path, (path->length () - next_path->length ())); + vec_free (next_path); +} + /* We are exiting E->src, see if E->dest ends with a conditional jump which has a known value when reached via E. @@ -1033,7 +1211,10 @@ thread_through_normal_block (edge e, cond = simplify_control_stmt_condition (e, stmt, dummy_cond, simplify, handle_dominating_asserts); - if (cond && is_gimple_min_invariant (cond)) + if (!cond) + return 0; + + if (is_gimple_min_invariant (cond)) { edge taken_edge = find_taken_edge (e->dest, cond); basic_block dest = (taken_edge ? taken_edge->dest : NULL); @@ -1079,6 +1260,26 @@ thread_through_normal_block (edge e, backedge_seen_p); return 1; } + + if (TREE_CODE (cond) != SSA_NAME + || e->dest->loop_father != e->src->loop_father) + return 0; + + /* When COND cannot be simplified, try to find paths from a control + statement back through the PHI nodes which would affect that control + statement. */ + vec *bb_path; + vec_alloc (bb_path, n_basic_blocks_for_fn (cfun)); + vec_safe_push (bb_path, e->dest); + hash_set *visited_phis = new hash_set; + + max_threaded_paths = PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATHS); + fsm_find_control_statement_thread_paths (cond, visited_phis, bb_path); + + delete visited_phis; + vec_free (bb_path); + + return -1; } return 0; } diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index ca0b8bf..a453b5e 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -167,8 +167,9 @@ dump_jump_thread_path (FILE *dump_file, vec path, bool registering) { fprintf (dump_file, - " %s jump thread: (%d, %d) incoming edge; ", + " %s%s jump thread: (%d, %d) incoming edge; ", (registering ? "Registering" : "Cancelling"), + (path[0]->type == EDGE_START_FSM_THREAD ? " FSM": ""), path[0]->e->src->index, path[0]->e->dest->index); for (unsigned int i = 1; i < path.length (); i++) @@ -2343,6 +2344,55 @@ thread_through_all_blocks (bool may_peel_loop_headers) threaded_blocks = BITMAP_ALLOC (NULL); memset (&thread_stats, 0, sizeof (thread_stats)); + for (i = 0; i < paths.length ();) + { + vec *path = paths[i]; + edge entry = (*path)[0]->e; + + if ((*path)[0]->type != EDGE_START_FSM_THREAD + /* Do not jump-thread twice from the same block. */ + || bitmap_bit_p (threaded_blocks, entry->src->index)) { + i++; + continue; + } + + unsigned len = path->length (); + edge exit = (*path)[len - 1]->e; + basic_block *region = XNEWVEC (basic_block, len - 1); + + for (unsigned int j = 0; j < len - 1; j++) + region[j] = (*path)[j]->e->dest; + + bool success = gimple_duplicate_sese_region (entry, exit, region, + len - 1, NULL, 0); + delete_jump_thread_path (path); + paths.unordered_remove (i); + + if (success) + { + /* We do not update dominance info. */ + free_dominance_info (CDI_DOMINATORS); + bitmap_set_bit (threaded_blocks, entry->src->index); + } + } + + for (i = 0; i < paths.length ();) + { + vec *path = paths[i]; + edge entry = (*path)[0]->e; + + /* Do not jump-thread twice from the same block. */ + if (bitmap_bit_p (threaded_blocks, entry->src->index)) + { + delete_jump_thread_path (path); + paths.unordered_remove (i); + } + else + i++; + } + + bitmap_clear (threaded_blocks); + mark_threaded_blocks (threaded_blocks); initialize_original_copy_tables (); diff --git a/gcc/tree-ssa-threadupdate.h b/gcc/tree-ssa-threadupdate.h index 426aca5..42c3a9e 100644 --- a/gcc/tree-ssa-threadupdate.h +++ b/gcc/tree-ssa-threadupdate.h @@ -26,6 +26,7 @@ extern bool thread_through_all_blocks (bool); enum jump_thread_edge_type { EDGE_START_JUMP_THREAD, + EDGE_START_FSM_THREAD, EDGE_COPY_SRC_BLOCK, EDGE_COPY_SRC_JOINER_BLOCK, EDGE_NO_COPY_SRC_BLOCK -- 1.9.1 --wac7ysb48OaltWcw Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="0002-look-for-fsm-threads-only-in-loops.patch" Content-length: 3822 >From b3c22ccf4ba3a26ba7b2ac3760059032235f5089 Mon Sep 17 00:00:00 2001 From: Sebastian Pop Date: Sun, 23 Nov 2014 01:03:40 -0600 Subject: [PATCH 2/4] look for fsm threads only in loops --- gcc/tree-ssa-threadedge.c | 84 +++++++++++++++++++++++++++-------------------- 1 file changed, 49 insertions(+), 35 deletions(-) diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c index 3939a74..41b6494 100644 --- a/gcc/tree-ssa-threadedge.c +++ b/gcc/tree-ssa-threadedge.c @@ -1064,36 +1064,7 @@ fsm_find_control_statement_thread_paths (tree expr, if (TREE_CODE (arg) == INTEGER_CST) { - int j, n = path->length (); - vec *jump_thread_path - = new vec (); - int joiners = 0; - - for (j = 0; j < n - 1; j++) - { - edge e = find_edge ((*path)[n - j - 1], - (*path)[n - j - 2]); - gcc_assert (e); - enum jump_thread_edge_type kind; - - if (j == 0) - kind = EDGE_START_FSM_THREAD; - else if (single_pred_p (e->src)) - kind = EDGE_NO_COPY_SRC_BLOCK; - else { - kind = EDGE_COPY_SRC_JOINER_BLOCK; - ++joiners; - } - - jump_thread_edge *x = new jump_thread_edge (e, kind); - jump_thread_path->safe_push (x); - } - - /* Add the edge taken when the control variable has value ARG. */ - edge taken_edge = find_taken_edge ((*path)[0], arg); - jump_thread_edge *x - = new jump_thread_edge (taken_edge, EDGE_NO_COPY_SRC_BLOCK); - jump_thread_path->safe_push (x); + int n = path->length (); /* A path with less than 3 nodes should not be jump-threaded. */ if (n > 2 && n < PARAM_VALUE (PARAM_MAX_FSM_THREAD_LENGTH) @@ -1101,14 +1072,56 @@ fsm_find_control_statement_thread_paths (tree expr, { int n_insns = 0; gimple_stmt_iterator gsi; + int j; + loop_p loop = (*path)[0]->loop_father; + bool path_crosses_loops = false; for (j = 1; j < n - 1; j++) - for (gsi = gsi_start_bb ((*path)[j]); !gsi_end_p (gsi); - gsi_next (&gsi)) - ++n_insns; + { + basic_block bb = (*path)[j]; + if (bb->loop_father != loop) + { + path_crosses_loops = true; + break; + } + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); + gsi_next (&gsi)) + ++n_insns; + } - if (n_insns < PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS)) + if (!path_crosses_loops + && n_insns < PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS)) { + vec *jump_thread_path + = new vec (); + int joiners = 0; + + for (j = 0; j < n - 1; j++) + { + edge e = find_edge ((*path)[n - j - 1], + (*path)[n - j - 2]); + gcc_assert (e); + enum jump_thread_edge_type kind; + + if (j == 0) + kind = EDGE_START_FSM_THREAD; + else if (single_pred_p (e->src)) + kind = EDGE_NO_COPY_SRC_BLOCK; + else { + kind = EDGE_COPY_SRC_JOINER_BLOCK; + ++joiners; + } + + jump_thread_edge *x = new jump_thread_edge (e, kind); + jump_thread_path->safe_push (x); + } + + /* Add the edge taken when the control variable has value ARG. */ + edge taken_edge = find_taken_edge ((*path)[0], arg); + jump_thread_edge *x + = new jump_thread_edge (taken_edge, EDGE_NO_COPY_SRC_BLOCK); + jump_thread_path->safe_push (x); + register_jump_thread (jump_thread_path); --max_threaded_paths; } @@ -1262,7 +1275,8 @@ thread_through_normal_block (edge e, } if (TREE_CODE (cond) != SSA_NAME - || e->dest->loop_father != e->src->loop_father) + || e->dest->loop_father != e->src->loop_father + || loop_depth (e->dest->loop_father) == 0) return 0; /* When COND cannot be simplified, try to find paths from a control -- 1.9.1 --wac7ysb48OaltWcw Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="0003-add-FSM-debug.patch" Content-length: 1065 >From 40531e183620fbcbfa678ddceaacbecd69a2b087 Mon Sep 17 00:00:00 2001 From: Sebastian Pop Date: Sun, 23 Nov 2014 01:04:05 -0600 Subject: [PATCH 3/4] add FSM debug --- gcc/tree-ssa-threadupdate.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index a453b5e..dd2b518 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -2365,15 +2365,17 @@ thread_through_all_blocks (bool may_peel_loop_headers) bool success = gimple_duplicate_sese_region (entry, exit, region, len - 1, NULL, 0); - delete_jump_thread_path (path); - paths.unordered_remove (i); - if (success) { + dump_jump_thread_path (stderr, *path, false); + /* We do not update dominance info. */ free_dominance_info (CDI_DOMINATORS); bitmap_set_bit (threaded_blocks, entry->src->index); } + + delete_jump_thread_path (path); + paths.unordered_remove (i); } for (i = 0; i < paths.length ();) -- 1.9.1 --wac7ysb48OaltWcw Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="0004-make-copied-region-single-entry-and-remove-last-cond.patch" Content-length: 6779 >From b9b6155099d81b5ee6322e8bba2e3ba5d4f00b6e Mon Sep 17 00:00:00 2001 From: Sebastian Pop Date: Sun, 23 Nov 2014 10:52:11 -0600 Subject: [PATCH 4/4] make copied region single entry and remove last condition stmt --- gcc/tree-cfg.c | 2 +- gcc/tree-cfg.h | 1 + gcc/tree-ssa-threadupdate.c | 151 +++++++++++++++++++++++++++++++++++++++++++- 3 files changed, 151 insertions(+), 3 deletions(-) diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c index 6d96c52..ffa5162 100644 --- a/gcc/tree-cfg.c +++ b/gcc/tree-cfg.c @@ -2666,7 +2666,7 @@ reinstall_phi_args (edge new_edge, edge old_edge) near its "logical" location. This is of most help to humans looking at debugging dumps. */ -static basic_block +basic_block split_edge_bb_loc (edge edge_in) { basic_block dest = edge_in->dest; diff --git a/gcc/tree-cfg.h b/gcc/tree-cfg.h index 626e973..51f0899 100644 --- a/gcc/tree-cfg.h +++ b/gcc/tree-cfg.h @@ -67,6 +67,7 @@ extern void verify_gimple_in_cfg (struct function *, bool); extern tree gimple_block_label (basic_block); extern void add_phi_args_after_copy_bb (basic_block); extern void add_phi_args_after_copy (basic_block *, unsigned, edge); +extern basic_block split_edge_bb_loc (edge); extern bool gimple_duplicate_sese_region (edge, edge, basic_block *, unsigned, basic_block *, bool); extern bool gimple_duplicate_sese_tail (edge, edge, basic_block *, unsigned, diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index dd2b518..3ee2117 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -2318,6 +2318,153 @@ bb_ends_with_multiway_branch (basic_block bb ATTRIBUTE_UNUSED) return false; } +/* Duplicates a Single Entry Multiple Exit REGION (set of N_REGION basic + blocks). The ENTRY edge is redirected to the duplicate of the region. If + REGION is not a Single Entry region, ignore any incoming edges other than + ENTRY: this makes the copied region a Single Entry region. + + Remove the last conditional statement in the last basic block in the REGION, + and create a single fallthru edge pointing to the same destination as the + EXIT edge. + + The new basic blocks are stored to REGION_COPY in the same order as they had + in REGION, provided that REGION_COPY is not NULL. + + Returns false if it is unable to copy the region, true otherwise. */ + +static bool +duplicate_seme_region (edge entry, edge exit, + basic_block *region, unsigned n_region, + basic_block *region_copy) +{ + unsigned i; + bool free_region_copy = false, copying_header = false; + struct loop *loop = entry->dest->loop_father; + edge exit_copy; + edge redirected; + int total_freq = 0, entry_freq = 0; + gcov_type total_count = 0, entry_count = 0; + + if (!can_copy_bbs_p (region, n_region)) + return false; + + /* Some sanity checking. Note that we do not check for all possible + missuses of the functions. I.e. if you ask to copy something weird, + it will work, but the state of structures probably will not be + correct. */ + for (i = 0; i < n_region; i++) + { + /* We do not handle subloops, i.e. all the blocks must belong to the + same loop. */ + if (region[i]->loop_father != loop) + return false; + } + + initialize_original_copy_tables (); + + if (copying_header) + set_loop_copy (loop, loop_outer (loop)); + else + set_loop_copy (loop, loop); + + if (!region_copy) + { + region_copy = XNEWVEC (basic_block, n_region); + free_region_copy = true; + } + + if (entry->dest->count) + { + total_count = entry->dest->count; + entry_count = entry->count; + /* Fix up corner cases, to avoid division by zero or creation of negative + frequencies. */ + if (entry_count > total_count) + entry_count = total_count; + } + else + { + total_freq = entry->dest->frequency; + entry_freq = EDGE_FREQUENCY (entry); + /* Fix up corner cases, to avoid division by zero or creation of negative + frequencies. */ + if (total_freq == 0) + total_freq = 1; + else if (entry_freq > total_freq) + entry_freq = total_freq; + } + + copy_bbs (region, n_region, region_copy, &exit, 1, &exit_copy, loop, + split_edge_bb_loc (entry), 0); + if (total_count) + { + scale_bbs_frequencies_gcov_type (region, n_region, + total_count - entry_count, + total_count); + scale_bbs_frequencies_gcov_type (region_copy, n_region, entry_count, + total_count); + } + else + { + scale_bbs_frequencies_int (region, n_region, total_freq - entry_freq, + total_freq); + scale_bbs_frequencies_int (region_copy, n_region, entry_freq, total_freq); + } + +#ifdef ENABLE_CHECKING + /* Make sure no edge other than ENTRY is entering the copied region. */ + for (i = 0; i < n_region; i++) + { + edge e; + edge_iterator ei; + basic_block bb = region_copy[i]; + + if (single_pred_p (bb)) + continue; + + for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); ei_next (&ei)) + { + basic_block x = e->src; + bool found = false; + + for (unsigned j = 0; j < n_region; j++) + if (x == region_copy[j]) + { + found = true; + break; + } + + gcc_assert (found); + } + } +#endif + + /* Remove the last branch in the jump thread path. */ + remove_ctrl_stmt_and_useless_edges (region_copy[n_region - 1], exit->dest); + edge e = make_edge (region_copy[n_region - 1], exit->dest, EDGE_FALLTHRU); + + if (e) { + rescan_loop_exit (e, true, false); + e->probability = REG_BR_PROB_BASE; + e->count = region_copy[n_region - 1]->count; + //copy_phi_args (e->dest, rd->path->last ()->e, e, rd->path, idx); + } + + /* Redirect the entry and add the phi node arguments. */ + redirected = redirect_edge_and_branch (entry, get_bb_copy (entry->dest)); + gcc_assert (redirected != NULL); + flush_pending_stmts (entry); + + /* Add the other PHI node arguments. */ + add_phi_args_after_copy (region_copy, n_region, NULL); + + if (free_region_copy) + free (region_copy); + + free_original_copy_tables (); + return true; +} + /* Walk through all blocks and thread incoming edges to the appropriate outgoing edge for each edge pair recorded in THREADED_EDGES. @@ -2363,8 +2510,8 @@ thread_through_all_blocks (bool may_peel_loop_headers) for (unsigned int j = 0; j < len - 1; j++) region[j] = (*path)[j]->e->dest; - bool success = gimple_duplicate_sese_region (entry, exit, region, - len - 1, NULL, 0); + bool success = duplicate_seme_region (entry, exit, region, + len - 1, NULL); if (success) { dump_jump_thread_path (stderr, *path, false); -- 1.9.1 --wac7ysb48OaltWcw--