public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* RFC: Patch for switch elimination (PR 54742)
@ 2014-08-12 17:47 Steve Ellcey
  2014-08-12 17:56 ` Jakub Jelinek
  2014-08-12 18:31 ` Jeff Law
  0 siblings, 2 replies; 28+ messages in thread
From: Steve Ellcey @ 2014-08-12 17:47 UTC (permalink / raw)
  To: GCC Patches; +Cc: Jeff Law, james.greenhalgh

[-- Attachment #1: Type: text/plain, Size: 1815 bytes --]

After talking to Jeff Law at the GCC Cauldron I have updated my switch
shortcut plugin pass to try and address this optimization in the hopes of
getting it added to GCC as a static pass.  I fixed the code to build with
the various C++ changes that have been happening in GCC but the current
version I have included in this email is not working yet.  When I run it
on coremark I get errors like:

core_state.c: In function 'core_bench_state':
core_state.c:43:8: error: size of loop 16 should be 13, not 5
 ee_u16 core_bench_state(ee_u32 blksize, ee_u8 *memblock, 
        ^
core_state.c:43:8: error: bb 15 does not belong to loop 16
core_state.c:43:8: error: bb 113 does not belong to loop 16
core_state.c:43:8: error: bb 118 does not belong to loop 16
core_state.c:43:8: error: bb 117 does not belong to loop 16
(etc) 

Apparently there have been some changes to the loop information that
is built since GCC 4.9.  I had hoped that adding a call to fix_loop_structure
after recalculating the dominance information would fix this but it didn't.

Does anyone have any ideas on how to fix up the loop information that my
optimization has messed as it duplicates blocks?  I tried adding a call
'loop_optimizer_init (LOOPS_NORMAL)' before the fix_loop_structure but that
did not seem to have any affect.

Steve Ellcey
sellcey@mips.com

2014-08-12  Steve Ellcey  <sellcey@mips.com>

	PR tree-opt/54742
	* Makefile.in (OBJS): Add tree-switch-shortcut.o.
	* common.opt (ftree-switch-shortcut): New.
	* opts.c (default_options_table): Add OPT_ftree_switch_shortcut.
	* params.def (PARAM_MAX_SWITCH_INSNS): New.
	(PARAM_MAX_SWITCH_PATHS): New.
	* passes.def (pass_tree_switch_shortcut): New.
	* timevar.def (TV_TREE_SWITCH_SHORTCUT): New.
	* tree-pass.h (make_pass_tree_switch_shortcut): New.
	* tree-switch-shortcut.c: New.


[-- Attachment #2: diff.out --]
[-- Type: text/x-patch, Size: 15328 bytes --]

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 31c1f4d..94e8ec4 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1411,6 +1411,7 @@ OBJS = \
 	tree-scalar-evolution.o \
 	tree-sra.o \
 	tree-switch-conversion.o \
+	tree-switch-shortcut.o \
 	tree-ssa-address.o \
 	tree-ssa-alias.o \
 	tree-ssa-ccp.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 0c4f86b..fe0664a 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2249,6 +2249,10 @@ ftree-sra
 Common Report Var(flag_tree_sra) Optimization
 Perform scalar replacement of aggregates
 
+ftree-switch-shortcut
+Common Report Var(flag_tree_switch_shortcut) Init(0) Optimization
+Convert jumps to switch statements into jumps to case statement.
+
 ftree-ter
 Common Report Var(flag_tree_ter) Optimization
 Replace temporary expressions in the SSA->normal pass
diff --git a/gcc/opts.c b/gcc/opts.c
index be1867c..f1ac2e5 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -514,6 +514,7 @@ static const struct default_options default_options_table[] =
     { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
     { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_ftree_switch_shortcut, NULL, 1 },
 
     /* -Ofast adds optimizations to -O3.  */
     { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
diff --git a/gcc/params.def b/gcc/params.def
index cad00e2..65377d3 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1058,6 +1058,20 @@ DEFPARAM (PARAM_MAX_SLSR_CANDIDATE_SCAN,
 	  "strength reduction",
 	  50, 1, 999999)
 
+/* Maximum number of instructions to duplicate when shortcutting a switch.  */
+DEFPARAM (PARAM_MAX_SWITCH_INSNS,
+	  "max-switch-insns",
+	  "Maximum number of instructions to duplicate when "
+	  "shortcutting a switch statement",
+	  100, 1, 999999)
+
+/* Maximum number of paths to duplicate when shortcutting a switch.  */
+DEFPARAM (PARAM_MAX_SWITCH_PATHS,
+	  "max-switch-paths",
+	  "Maximum number of new paths to create when"
+	  " shortcutting a switch statement",
+	  50, 1, 999999)
+
 DEFPARAM (PARAM_ASAN_STACK,
          "asan-stack",
          "Enable asan stack protection",
diff --git a/gcc/passes.def b/gcc/passes.def
index f13df6c..8bbf2d0 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -157,6 +157,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_cselim);
       NEXT_PASS (pass_copy_prop);
       NEXT_PASS (pass_tree_ifcombine);
+      NEXT_PASS (pass_tree_switch_shortcut);
       NEXT_PASS (pass_phiopt);
       NEXT_PASS (pass_tail_recursion);
       NEXT_PASS (pass_ch);
diff --git a/gcc/timevar.def b/gcc/timevar.def
index a04d05c..d9ee915 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -170,6 +170,7 @@ DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
 DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
 DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
 DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
+DEFTIMEVAR (TV_TREE_SWITCH_SHORTCUT  , "switch statement shortcuts")
 DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
 DEFTIMEVAR (TV_TREE_VECTORIZATION    , "tree vectorization")
 DEFTIMEVAR (TV_TREE_SLP_VECTORIZATION, "tree slp vectorization")
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 1477d1f..f898e27 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -575,6 +575,7 @@ extern gimple_opt_pass *make_pass_early_inline (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_inline_parameters (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_tree_switch_shortcut (gcc::context *ctxt);
 
 /* Current optimization pass.  */
 extern opt_pass *current_pass;
diff --git a/gcc/tree-switch-shortcut.c b/gcc/tree-switch-shortcut.c
new file mode 100644
index 0000000..8ecb74b
--- /dev/null
+++ b/gcc/tree-switch-shortcut.c
@@ -0,0 +1,373 @@
+/* Switch shortcutting optimization for GNU C
+   Copyright (C) 2013 Free Software Foundation, Inc.
+   Contributed by Steve Ellcey (sellcey@imgtec.com).
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* This file implements an optimization where, when a variable is set
+   to a constant value and there is a path that leads from this definition
+   to a switch statement that uses that variable as its controlling expression
+   we duplicate the blocks on this path and change the switch goto to a
+   direct goto to the label of the switch block that control would goto based
+   on the value of the variable.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "params.h"
+#include "flags.h"
+#include "tree.h"
+#include "tree-pass.h"
+#include "basic-block.h"
+#include "function.h"
+#include "hash-table.h"
+#include "tree-ssa-alias.h"
+#include "tree-cfg.h"
+#include "tree-ssa-operands.h"
+#include "tree-inline.h"
+#include "gimple-expr.h"
+#include "is-a.h"
+#include "gimple.h"
+#include "tree-phinodes.h"
+#include "gimple-iterator.h"
+#include "gimple-ssa.h"
+#include "ssa-iterators.h"
+#include "tree-into-ssa.h"
+#include "cfgloop.h"
+
+/* Helper function for find_path, visited_bbs is used to make sure we don't
+   fall into an infinite loop.  */
+
+static int
+find_path_1(basic_block start_bb, basic_block end_bb, hash_set<basic_block> *visited_bbs)
+{
+  edge_iterator ei;
+  edge e;
+
+  if (start_bb == end_bb) return 1;
+
+  if (!visited_bbs->add(start_bb))
+    {
+      FOR_EACH_EDGE (e, ei, start_bb->succs)
+	if (find_path_1 (e->dest, end_bb, visited_bbs))
+	  return 1;
+    }
+    return 0;
+}
+
+/* Return 1 if there is a path from start_bb to end_bb and 0 if there
+   is not.  There may be multiple paths from start_bb to end_bb.  */
+
+static int
+find_path(basic_block start_bb, basic_block end_bb)
+{
+  edge_iterator ei;
+  edge e;
+  hash_set<basic_block> visited_bbs;
+  int p = 0;
+
+  if (start_bb == end_bb) return 1;
+
+  if (!visited_bbs.add(start_bb))
+    {
+      FOR_EACH_EDGE (e, ei, start_bb->succs)
+	if (find_path_1 (e->dest, end_bb, &visited_bbs))
+	  {
+	    p = 1;
+	    break;
+	  }
+    }
+  return p;
+}
+
+
+/* We save the paths we want to copy in bbs_list_array.  n_bbs_list is the
+   number of paths saved, bbs_list_array[i] is the list of basic blocks in
+   one path.  Each path starts with the block where a variable is assigned
+   a constant value (bbs_list_array[i][0]) and ends with the switch statement
+   block (bbs_list_array[i][bbs_list_size[i]-2]) followed by the block that
+   the switch statement is going to go to given the constant value of the
+   variable (bbs_list_array[i][bbs_list_size[i]-1]).  */
+
+static basic_block **bbs_list_array;
+static int *val_array;
+static int *bbs_list_size;
+static int max_path_count;
+static int max_insn_count;
+static int n_bbs_list;
+
+/* bbs_list[0] is the block with the switch statement,
+   bbs_list[n-1] is the block where the switch statement variable is assigned
+     a constant value,
+   The entries in between make a (reverse) path between the two.
+
+   We don't want to change bb_list, we want to leave that alone and
+   and copy the path to bbs_list_array so that we wind up with a list (array)
+   of paths that we want to update.  We also want to add the block that the
+   switch is going to go to on to the list so that we know which exit from
+   the switch statement is important.  */
+
+static void
+save_new_path (basic_block *bbs_list, int n, tree val)
+{
+  int i;
+  int insn_count;
+  basic_block bb;
+  edge switch_taken_edge;
+  gimple_stmt_iterator gsi;
+
+  if (n <= 1) return;
+
+  if (n_bbs_list >= max_path_count)
+    return;
+
+  /* Put the blocks in 'correct' order and add in where we want to go after
+     the switch statement, We want to leave bbs_list untouched for future
+     calls.  */
+
+  bbs_list_array[n_bbs_list] = XNEWVEC (basic_block, n+1);
+  for (i = 0; i < n; i++)
+    bbs_list_array[n_bbs_list][i] = bbs_list[n-i-1];
+
+  switch_taken_edge = find_taken_edge (bbs_list[0], val);
+  bbs_list_array[n_bbs_list][n] = switch_taken_edge->dest;
+
+  bbs_list_size[n_bbs_list] = n + 1;
+  val_array[n_bbs_list] = (int) TREE_INT_CST_LOW (val);
+
+  /* Count how many instructions are in the blocks we are going to
+     duplicate and if there are too many do not save this path
+     (return without incrementing n_bbs_list).  */
+
+  insn_count = 0;
+  for (i = 1; i < n; i++)
+    {
+      bb = bbs_list_array[n_bbs_list][i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	insn_count += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);
+    }
+
+  if (insn_count > max_insn_count)
+    return;
+
+  n_bbs_list = n_bbs_list + 1;
+}
+
+/* switch_stmt is a switch statement whose switch index expression
+   is the variable expr.  We trace the value of the variable back
+   through any phi nodes looking for places where it gets a constant
+   value and save the path in bbs_list.  Then we call save_new_path
+   to create a list of such paths.  */
+
+static void
+process_switch (tree expr, gimple switch_stmt,
+		hash_set<gimple> *visited_phis,
+	        basic_block *bbs_list, int n)
+{
+  gimple def_stmt;
+  tree var;
+  unsigned int i;
+  edge e;
+  edge_iterator ei;
+  basic_block bbx;
+  basic_block var_bb;
+  int e_count;
+
+  gcc_assert (gimple_code (switch_stmt) == GIMPLE_SWITCH);
+  var = SSA_NAME_VAR (expr);
+  def_stmt = SSA_NAME_DEF_STMT (expr);
+  var_bb = gimple_bb (def_stmt);
+
+  if (var == NULL || var_bb == NULL) return;
+
+  /* We have a variable definition (var) that is defined in var_bb,
+     We want to put the path from var_bb to the current bb into the
+     bbs_list.  If there is more then one path, skip this and don't
+     try to do the optimization.  */
+
+  bbx = bbs_list[n-1];
+  while (bbx != var_bb)
+   {
+     e_count = 0;
+     FOR_EACH_EDGE (e, ei, bbx->preds)
+       {
+         if (find_path (var_bb, e->src))
+	   {
+      	     bbs_list[n] = e->src;
+      	     n = n + 1;
+	     e_count = e_count + 1;
+    	   }
+       }
+     if (e_count != 1) return;
+     bbx = bbs_list[n-1];
+   }
+
+  if ((gimple_code (def_stmt) == GIMPLE_PHI)
+       && !visited_phis->add (def_stmt))
+    {
+      for (i = 0; i < gimple_phi_num_args (def_stmt); i++)
+	{
+	  tree arg = gimple_phi_arg_def (def_stmt, i);
+	  if (arg && (TREE_CODE (arg) == INTEGER_CST))
+	    {
+	      /* const char *name = IDENTIFIER_POINTER (DECL_NAME (var)); */
+	      bbs_list[n] = gimple_phi_arg_edge (def_stmt, i)->src;
+	      save_new_path(bbs_list, n + 1, arg);
+	    }
+	  else if (arg && (TREE_CODE (arg) == SSA_NAME))
+	    {
+		  bbs_list[n] = gimple_phi_arg_edge (def_stmt, i)->src;
+		  process_switch (arg, switch_stmt, visited_phis, bbs_list, n+1);
+	    }
+	}
+    }
+}
+
+/* Find paths that lead from blocks where a variable is assigned a constant
+   value to a switch statement where that variable is used as the switch
+   index.  Save the paths in bbs_list_array so that they can be processed
+   by copy_switch_paths.  */
+
+static unsigned int
+find_switch_shortcuts (function *fun)
+{
+  basic_block bb;
+  hash_set<gimple> visited_phis;
+  basic_block *bbs_list;
+  int n = 1;
+
+  bbs_list = XNEWVEC (basic_block, n_basic_blocks_for_fn (fun));
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      gimple stmt = last_stmt (bb);
+      if (stmt && gimple_code (stmt) == GIMPLE_SWITCH)
+	{
+	  tree op = gimple_switch_index (stmt);
+	  tree var = SSA_NAME_VAR (op);
+	  if (var)
+	    {
+	      bbs_list[0] = bb;
+	      process_switch (op, stmt, &visited_phis, bbs_list, n);
+	    }
+	}
+    }
+  XDELETEVEC (bbs_list);
+  return 0;
+}
+
+/* Call gimple_duplicate_sese_region to douplicate the blocks in bb_list.
+   We free and recalculate all ssa and dominance information afterwords
+   because the region being copied is not really SESE and so we cannot
+   trust gimple_duplicate_sese_region to correctly update the dataflow
+   information.  */
+
+static void
+duplicate_blocks (basic_block *bb_list, int bb_count)
+{
+  edge orig_edge, exit_edge;
+
+  orig_edge = find_edge (bb_list[0], bb_list[1]);
+  exit_edge = find_edge (bb_list[bb_count-2], bb_list[bb_count-1]);
+  gimple_duplicate_sese_region (orig_edge, exit_edge, &bb_list[1], bb_count-2, NULL, false);
+  free_dominance_info (CDI_DOMINATORS);
+  update_ssa (TODO_update_ssa);
+  calculate_dominance_info (CDI_DOMINATORS);
+  fix_loop_structure (NULL);
+}
+
+/* Go through the paths saved in bbs_list_array and make copies of them.  */
+
+static void
+copy_switch_paths (void)
+{
+  int i;
+
+  /* Process each path in bbs_list_size.  */
+  for (i = 0; i < n_bbs_list; i++)
+    {
+    /* For each path in bbs_list_size loop through and copy each block in
+       the path (except the first on where the constant is assigned and
+       the final one where the switch statement goes to.  */
+
+    if (!single_pred_p (bbs_list_array[i][1]))
+      duplicate_blocks (bbs_list_array[i], bbs_list_size[i]);
+    }
+}
+
+
+/* Main entry for the tree if-conversion pass.  */
+
+namespace {
+
+const pass_data pass_data_tree_switch_shortcut =
+{
+  GIMPLE_PASS, /* type */
+  "switch_shortcut", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_TREE_SWITCH_SHORTCUT, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */
+};
+
+class pass_tree_switch_shortcut : public gimple_opt_pass
+{
+public:
+  pass_tree_switch_shortcut (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_tree_switch_shortcut, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return flag_tree_switch_shortcut;
+    }
+  virtual unsigned int execute (function *);
+
+}; // class pass_tree_switch_shortcut
+
+unsigned int
+pass_tree_switch_shortcut::execute (function *fun)
+{
+  int i;
+
+  n_bbs_list = 0;
+  max_insn_count = PARAM_VALUE (PARAM_MAX_SWITCH_INSNS);
+  max_path_count = PARAM_VALUE (PARAM_MAX_SWITCH_PATHS);
+  val_array = XNEWVEC (int, max_path_count);
+  bbs_list_size = XNEWVEC (int, max_path_count); 
+  bbs_list_array = XNEWVEC (basic_block *, max_path_count);
+  find_switch_shortcuts (fun);
+  copy_switch_paths ();
+  XDELETEVEC (val_array);
+  XDELETEVEC (bbs_list_size);
+  for (i = 0; i < n_bbs_list; i++)
+    XDELETEVEC (bbs_list_array[i]); 
+  XDELETEVEC (bbs_list_array);
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_tree_switch_shortcut (gcc::context *ctxt)
+{
+  return new pass_tree_switch_shortcut (ctxt);
+}

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-12 17:47 RFC: Patch for switch elimination (PR 54742) Steve Ellcey
@ 2014-08-12 17:56 ` Jakub Jelinek
  2014-08-12 18:31 ` Jeff Law
  1 sibling, 0 replies; 28+ messages in thread
From: Jakub Jelinek @ 2014-08-12 17:56 UTC (permalink / raw)
  To: Steve Ellcey; +Cc: GCC Patches, Jeff Law, james.greenhalgh

On Tue, Aug 12, 2014 at 10:46:46AM -0700, Steve Ellcey wrote:
> --- /dev/null
> +++ b/gcc/tree-switch-shortcut.c
> +/* This file implements an optimization where, when a variable is set
> +   to a constant value and there is a path that leads from this definition
> +   to a switch statement that uses that variable as its controlling expression
> +   we duplicate the blocks on this path and change the switch goto to a
> +   direct goto to the label of the switch block that control would goto based
> +   on the value of the variable.  */

Would be nice to have some short C example here in the comment.

> +static int
> +find_path_1(basic_block start_bb, basic_block end_bb, hash_set<basic_block> *visited_bbs)
> +{
> +  edge_iterator ei;
> +  edge e;
> +
> +  if (start_bb == end_bb) return 1;

Sorry for pedantry, but you don't have space before ( in many places,
the find_path_1 line is too long.

> +
> +  if (!visited_bbs->add(start_bb))
> +    {
> +      FOR_EACH_EDGE (e, ei, start_bb->succs)
> +	if (find_path_1 (e->dest, end_bb, visited_bbs))
> +	  return 1;
> +    }
> +    return 0;

return 0 not below if.

> +static int
> +find_path(basic_block start_bb, basic_block end_bb)

Again missing space.

> +  if (!visited_bbs.add(start_bb))

Likewise.

> +static basic_block **bbs_list_array;
> +static int *val_array;
> +static int *bbs_list_size;
> +static int max_path_count;
> +static int max_insn_count;
> +static int n_bbs_list;

For a simple pass like this, is it really necessary to add new global
variables?

> +     FOR_EACH_EDGE (e, ei, bbx->preds)
> +       {
> +         if (find_path (var_bb, e->src))
> +	   {
> +      	     bbs_list[n] = e->src;
> +      	     n = n + 1;
> +	     e_count = e_count + 1;
> +    	   }
> +       }

Weird indentation.

> +  if ((gimple_code (def_stmt) == GIMPLE_PHI)

Extraneous parens around the equality check.

> +	  else if (arg && (TREE_CODE (arg) == SSA_NAME))

Likewise.

> +	    {
> +		  bbs_list[n] = gimple_phi_arg_edge (def_stmt, i)->src;
> +		  process_switch (arg, switch_stmt, visited_phis, bbs_list, n+1);

Weird indentation.

	Jakub

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-12 17:47 RFC: Patch for switch elimination (PR 54742) Steve Ellcey
  2014-08-12 17:56 ` Jakub Jelinek
@ 2014-08-12 18:31 ` Jeff Law
  2014-08-12 20:23   ` Richard Biener
  1 sibling, 1 reply; 28+ messages in thread
From: Jeff Law @ 2014-08-12 18:31 UTC (permalink / raw)
  To: Steve Ellcey, GCC Patches; +Cc: james.greenhalgh

On 08/12/14 11:46, Steve Ellcey wrote:
> After talking to Jeff Law at the GCC Cauldron I have updated my switch
> shortcut plugin pass to try and address this optimization in the hopes of
> getting it added to GCC as a static pass.  I fixed the code to build with
> the various C++ changes that have been happening in GCC but the current
> version I have included in this email is not working yet.  When I run it
> on coremark I get errors like:
>
> core_state.c: In function 'core_bench_state':
> core_state.c:43:8: error: size of loop 16 should be 13, not 5
>   ee_u16 core_bench_state(ee_u32 blksize, ee_u8 *memblock,
>          ^
> core_state.c:43:8: error: bb 15 does not belong to loop 16
> core_state.c:43:8: error: bb 113 does not belong to loop 16
> core_state.c:43:8: error: bb 118 does not belong to loop 16
> core_state.c:43:8: error: bb 117 does not belong to loop 16
> (etc)
>
> Apparently there have been some changes to the loop information that
> is built since GCC 4.9.  I had hoped that adding a call to fix_loop_structure
> after recalculating the dominance information would fix this but it didn't.
>
> Does anyone have any ideas on how to fix up the loop information that my
> optimization has messed as it duplicates blocks?  I tried adding a call
> 'loop_optimizer_init (LOOPS_NORMAL)' before the fix_loop_structure but that
> did not seem to have any affect.
Try setting the header & latch fields for the loop structure to NULL, 
then call loops_set_state (LOOPS_NEED_FIXUP).

Jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-12 18:31 ` Jeff Law
@ 2014-08-12 20:23   ` Richard Biener
  2014-08-12 20:40     ` Jeff Law
  0 siblings, 1 reply; 28+ messages in thread
From: Richard Biener @ 2014-08-12 20:23 UTC (permalink / raw)
  To: Jeff Law, Steve Ellcey, GCC Patches; +Cc: james.greenhalgh

On August 12, 2014 8:31:16 PM CEST, Jeff Law <law@redhat.com> wrote:
>On 08/12/14 11:46, Steve Ellcey wrote:
>> After talking to Jeff Law at the GCC Cauldron I have updated my
>switch
>> shortcut plugin pass to try and address this optimization in the
>hopes of
>> getting it added to GCC as a static pass.  I fixed the code to build
>with
>> the various C++ changes that have been happening in GCC but the
>current
>> version I have included in this email is not working yet.  When I run
>it
>> on coremark I get errors like:
>>
>> core_state.c: In function 'core_bench_state':
>> core_state.c:43:8: error: size of loop 16 should be 13, not 5
>>   ee_u16 core_bench_state(ee_u32 blksize, ee_u8 *memblock,
>>          ^
>> core_state.c:43:8: error: bb 15 does not belong to loop 16
>> core_state.c:43:8: error: bb 113 does not belong to loop 16
>> core_state.c:43:8: error: bb 118 does not belong to loop 16
>> core_state.c:43:8: error: bb 117 does not belong to loop 16
>> (etc)
>>
>> Apparently there have been some changes to the loop information that
>> is built since GCC 4.9.  I had hoped that adding a call to
>fix_loop_structure
>> after recalculating the dominance information would fix this but it
>didn't.
>>
>> Does anyone have any ideas on how to fix up the loop information that
>my
>> optimization has messed as it duplicates blocks?  I tried adding a
>call
>> 'loop_optimizer_init (LOOPS_NORMAL)' before the fix_loop_structure
>but that
>> did not seem to have any affect.
>Try setting the header & latch fields for the loop structure to NULL, 
>then call loops_set_state (LOOPS_NEED_FIXUP).

But that is _not_ the appropriate way of keeping loops preserved!

Richard.

>Jeff


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-12 20:23   ` Richard Biener
@ 2014-08-12 20:40     ` Jeff Law
  2014-08-12 22:45       ` Steve Ellcey
                         ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Jeff Law @ 2014-08-12 20:40 UTC (permalink / raw)
  To: Richard Biener, Steve Ellcey, GCC Patches; +Cc: james.greenhalgh

On 08/12/14 14:23, Richard Biener wrote:
> On August 12, 2014 8:31:16 PM CEST, Jeff Law <law@redhat.com> wrote:
>> On 08/12/14 11:46, Steve Ellcey wrote:
>>> After talking to Jeff Law at the GCC Cauldron I have updated my
>> switch
>>> shortcut plugin pass to try and address this optimization in the
>> hopes of
>>> getting it added to GCC as a static pass.  I fixed the code to build
>> with
>>> the various C++ changes that have been happening in GCC but the
>> current
>>> version I have included in this email is not working yet.  When I run
>> it
>>> on coremark I get errors like:
>>>
>>> core_state.c: In function 'core_bench_state':
>>> core_state.c:43:8: error: size of loop 16 should be 13, not 5
>>>    ee_u16 core_bench_state(ee_u32 blksize, ee_u8 *memblock,
>>>           ^
>>> core_state.c:43:8: error: bb 15 does not belong to loop 16
>>> core_state.c:43:8: error: bb 113 does not belong to loop 16
>>> core_state.c:43:8: error: bb 118 does not belong to loop 16
>>> core_state.c:43:8: error: bb 117 does not belong to loop 16
>>> (etc)
>>>
>>> Apparently there have been some changes to the loop information that
>>> is built since GCC 4.9.  I had hoped that adding a call to
>> fix_loop_structure
>>> after recalculating the dominance information would fix this but it
>> didn't.
>>>
>>> Does anyone have any ideas on how to fix up the loop information that
>> my
>>> optimization has messed as it duplicates blocks?  I tried adding a
>> call
>>> 'loop_optimizer_init (LOOPS_NORMAL)' before the fix_loop_structure
>> but that
>>> did not seem to have any affect.
>> Try setting the header & latch fields for the loop structure to NULL,
>> then call loops_set_state (LOOPS_NEED_FIXUP).
>
> But that is _not_ the appropriate way of keeping loops preserved!
I think that's done when we've scrogged the loop beyond repair and want 
the structures rebuilt.  IIRC, that's what you recommended we do.

jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-12 20:40     ` Jeff Law
@ 2014-08-12 22:45       ` Steve Ellcey
  2014-08-13 20:55         ` Sebastian Pop
  2014-08-13  2:54       ` Bin.Cheng
  2014-08-13  9:44       ` Richard Biener
  2 siblings, 1 reply; 28+ messages in thread
From: Steve Ellcey @ 2014-08-12 22:45 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, GCC Patches, james.greenhalgh, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1366 bytes --]

On Tue, 2014-08-12 at 14:40 -0600, Jeff Law wrote:
> On 08/12/14 14:23, Richard Biener wrote:
> > On August 12, 2014 8:31:16 PM CEST, Jeff Law <law@redhat.com> wrote:
> >> Try setting the header & latch fields for the loop structure to NULL,
> >> then call loops_set_state (LOOPS_NEED_FIXUP).
> >
> > But that is _not_ the appropriate way of keeping loops preserved!
> I think that's done when we've scrogged the loop beyond repair and want 
> the structures rebuilt.  IIRC, that's what you recommended we do.
> 
> jeff

Well, I don't know if it is the 'right' thing to do, but it does seem to
work.  Here is a new patch that seems to be working and that also
addresses the comments that Jakub Jelinek had.  I want to do more
testing before I officially submit it but I thought I would put it out
here for others to look over again while I do that.

Steve Ellcey
sellcey@mips.com

2014-08-12  Steve Ellcey  <sellcey@mips.com>

	PR tree-opt/54742
	* Makefile.in (OBJS): Add tree-switch-shortcut.o.
	* common.opt (ftree-switch-shortcut): New.
	* opts.c (default_options_table): Add OPT_ftree_switch_shortcut.
	* params.def (PARAM_MAX_SWITCH_INSNS): New.
	(PARAM_MAX_SWITCH_PATHS): New.
	* passes.def (pass_tree_switch_shortcut): New.
	* timevar.def (TV_TREE_SWITCH_SHORTCUT): New.
	* tree-pass.h (make_pass_tree_switch_shortcut): New.
	* tree-switch-shortcut.c: New.


[-- Attachment #2: diff.out --]
[-- Type: text/x-patch, Size: 17495 bytes --]

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 31c1f4d..94e8ec4 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1411,6 +1411,7 @@ OBJS = \
 	tree-scalar-evolution.o \
 	tree-sra.o \
 	tree-switch-conversion.o \
+	tree-switch-shortcut.o \
 	tree-ssa-address.o \
 	tree-ssa-alias.o \
 	tree-ssa-ccp.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 0c4f86b..fe0664a 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2249,6 +2249,10 @@ ftree-sra
 Common Report Var(flag_tree_sra) Optimization
 Perform scalar replacement of aggregates
 
+ftree-switch-shortcut
+Common Report Var(flag_tree_switch_shortcut) Init(0) Optimization
+Convert jumps to switch statements into jumps to case statement.
+
 ftree-ter
 Common Report Var(flag_tree_ter) Optimization
 Replace temporary expressions in the SSA->normal pass
diff --git a/gcc/opts.c b/gcc/opts.c
index be1867c..f1ac2e5 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -514,6 +514,7 @@ static const struct default_options default_options_table[] =
     { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
     { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
     { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
+    { OPT_LEVELS_3_PLUS, OPT_ftree_switch_shortcut, NULL, 1 },
 
     /* -Ofast adds optimizations to -O3.  */
     { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
diff --git a/gcc/params.def b/gcc/params.def
index cad00e2..65377d3 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1058,6 +1058,20 @@ DEFPARAM (PARAM_MAX_SLSR_CANDIDATE_SCAN,
 	  "strength reduction",
 	  50, 1, 999999)
 
+/* Maximum number of instructions to duplicate when shortcutting a switch.  */
+DEFPARAM (PARAM_MAX_SWITCH_INSNS,
+	  "max-switch-insns",
+	  "Maximum number of instructions to duplicate when "
+	  "shortcutting a switch statement",
+	  100, 1, 999999)
+
+/* Maximum number of paths to duplicate when shortcutting a switch.  */
+DEFPARAM (PARAM_MAX_SWITCH_PATHS,
+	  "max-switch-paths",
+	  "Maximum number of new paths to create when"
+	  " shortcutting a switch statement",
+	  50, 1, 999999)
+
 DEFPARAM (PARAM_ASAN_STACK,
          "asan-stack",
          "Enable asan stack protection",
diff --git a/gcc/passes.def b/gcc/passes.def
index f13df6c..8bbf2d0 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -157,6 +157,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_cselim);
       NEXT_PASS (pass_copy_prop);
       NEXT_PASS (pass_tree_ifcombine);
+      NEXT_PASS (pass_tree_switch_shortcut);
       NEXT_PASS (pass_phiopt);
       NEXT_PASS (pass_tail_recursion);
       NEXT_PASS (pass_ch);
diff --git a/gcc/timevar.def b/gcc/timevar.def
index a04d05c..d9ee915 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -170,6 +170,7 @@ DEFTIMEVAR (TV_TREE_LOOP_IVCANON     , "tree canonical iv")
 DEFTIMEVAR (TV_SCEV_CONST            , "scev constant prop")
 DEFTIMEVAR (TV_TREE_LOOP_UNSWITCH    , "tree loop unswitching")
 DEFTIMEVAR (TV_COMPLETE_UNROLL       , "complete unrolling")
+DEFTIMEVAR (TV_TREE_SWITCH_SHORTCUT  , "switch statement shortcuts")
 DEFTIMEVAR (TV_TREE_PARALLELIZE_LOOPS, "tree parallelize loops")
 DEFTIMEVAR (TV_TREE_VECTORIZATION    , "tree vectorization")
 DEFTIMEVAR (TV_TREE_SLP_VECTORIZATION, "tree slp vectorization")
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 1477d1f..f898e27 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -575,6 +575,7 @@ extern gimple_opt_pass *make_pass_early_inline (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_inline_parameters (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_tree_switch_shortcut (gcc::context *ctxt);
 
 /* Current optimization pass.  */
 extern opt_pass *current_pass;
diff --git a/gcc/tree-switch-shortcut.c b/gcc/tree-switch-shortcut.c
new file mode 100644
index 0000000..21d48f8
--- /dev/null
+++ b/gcc/tree-switch-shortcut.c
@@ -0,0 +1,438 @@
+/* Switch shortcutting optimization for GNU C
+   Copyright (C) 2013 Free Software Foundation, Inc.
+   Contributed by Steve Ellcey (steve.ellcey@imgtec.com).
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* This file implements an optimization where, when a variable is set
+   to a constant value and there is a path that leads from that definition
+   to a switch statement that uses that variable as its controlling expression
+   we duplicate the blocks on this path and change the jump to the switch
+   statement with a direct jump to the label of the switch block that control
+   would goto based on the value of the variable.  This can come up in
+   loops/switch statements that implement state machines.
+
+   Example (modified from PR 54742):
+
+   foo(char *str) {
+     int sum=0;
+     int state=0;
+     char *s=str;
+     for (; *s; s++) {
+       char c=*s;
+       <CODE BLOCK 1>
+       switch (state) {
+         case 0:
+           if (c == '+')       { state = 1; sum += 9; }
+           else if (c != '-')  { state = 2; sum += 3; }
+           break;
+         case 1:
+           if (c == '+')       { state = 2; sum += 4; }
+           else if (c == '-')  { state = 0; sum += 7; }
+           break;
+         case 2:
+           if (c == '+')       { state = 0; sum += 8; }
+           else if (c == '-')  { state = 1; sum += 2; }
+           break;
+       }
+       <CODE BLOCK 2>
+     }
+     return state;
+   }
+
+  This pass will convert the code inside 'case 0' to something like:
+
+    case 0:
+      if (c == '+')      { state = 1; sum += 9;
+                           <CODE BLOCK 2>
+                           s++; if (!s) goto loop_exit;
+                           <CODE BLOCK 1>
+                           goto case_1; }
+      else if (c != '-') { state = 2; sum += 3;
+                           <CODE BLOCK 2>
+                           s++; if (!s) goto loop_exit;
+                           <CODE BLOCK 1>
+                           goto case_2; }
+      else               { <CODE BLOCK 2>
+			   s++; if (!s) goto exit;
+                           <CODE BLOCK 1>
+                           goto case_0; }
+
+Similar transformations would apply to the other parts of the switch
+statement.  This obviously can lead to a lot of code duplication but
+it can also result in faster code since we are replacing two jumps
+(one indirect) with a single direct jump.  */
+ 
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "params.h"
+#include "flags.h"
+#include "tree.h"
+#include "tree-pass.h"
+#include "basic-block.h"
+#include "function.h"
+#include "hash-table.h"
+#include "tree-ssa-alias.h"
+#include "tree-cfg.h"
+#include "tree-ssa-operands.h"
+#include "tree-inline.h"
+#include "gimple-expr.h"
+#include "is-a.h"
+#include "gimple.h"
+#include "tree-phinodes.h"
+#include "gimple-iterator.h"
+#include "gimple-ssa.h"
+#include "ssa-iterators.h"
+#include "tree-into-ssa.h"
+#include "cfgloop.h"
+
+/* Helper function for find_path, visited_bbs is used to make sure we don't
+   fall into an infinite loop.  */
+
+static int
+find_path_1 (basic_block start_bb, basic_block end_bb,
+	     hash_set<basic_block> *visited_bbs)
+{
+  edge_iterator ei;
+  edge e;
+
+  if (start_bb == end_bb) return 1;
+
+  if (!visited_bbs->add (start_bb))
+    {
+      FOR_EACH_EDGE (e, ei, start_bb->succs)
+	if (find_path_1 (e->dest, end_bb, visited_bbs))
+	  return 1;
+    }
+  return 0;
+}
+
+/* Return 1 if there is a path from start_bb to end_bb and 0 if there
+   is not.  There may be multiple paths from start_bb to end_bb.  */
+
+static int
+find_path (basic_block start_bb, basic_block end_bb)
+{
+  edge_iterator ei;
+  edge e;
+  hash_set<basic_block> visited_bbs;
+  int p = 0;
+
+  if (start_bb == end_bb) return 1;
+
+  if (!visited_bbs.add (start_bb))
+    {
+      FOR_EACH_EDGE (e, ei, start_bb->succs)
+	if (find_path_1 (e->dest, end_bb, &visited_bbs))
+	  {
+	    p = 1;
+	    break;
+	  }
+    }
+  return p;
+}
+
+
+/* We save the paths we want to copy in bbs_list_array.  n_bbs_list is the
+   number of paths saved, bbs_list_array[i] is the list of basic blocks in
+   one path.  Each path starts with the block where a variable is assigned
+   a constant value (bbs_list_array[i][0]) and ends with the switch statement
+   block (bbs_list_array[i][bbs_list_size[i]-2]) followed by the block that
+   the switch statement is going to go to given the constant value of the
+   variable (bbs_list_array[i][bbs_list_size[i]-1]).  */
+
+struct path_info
+{
+  basic_block **bbs_list_array;
+  int *val_array;
+  int *bbs_list_size;
+  int max_path_count;
+  int max_insn_count;
+  int n_bbs_list;
+};
+
+/* bbs_list[0] is the block with the switch statement,
+   bbs_list[n-1] is the block where the switch statement variable is assigned
+     a constant value,
+   The entries in between make a (reverse) path between the two.
+
+   We don't want to change bb_list, we want to leave that alone and
+   and copy the path to bbs_list_array so that we wind up with a list (array)
+   of paths that we want to update.  We also want to add the block that the
+   switch is going to go to on to the list so that we know which exit from
+   the switch statement is important.  */
+
+static void
+save_new_path (basic_block *bbs_list, int n, tree val, path_info *pi)
+{
+  int i;
+  int insn_count;
+  basic_block bb;
+  edge switch_taken_edge;
+  gimple_stmt_iterator gsi;
+
+  if (n <= 1) return;
+
+  if (pi->n_bbs_list >= pi->max_path_count)
+    return;
+
+  /* Put the blocks in 'correct' order and add in where we want to go after
+     the switch statement, We want to leave bbs_list untouched for future
+     calls.  */
+
+  pi->bbs_list_array[pi->n_bbs_list] = XNEWVEC (basic_block, n+1);
+  for (i = 0; i < n; i++)
+    pi->bbs_list_array[pi->n_bbs_list][i] = bbs_list[n-i-1];
+
+  switch_taken_edge = find_taken_edge (bbs_list[0], val);
+  pi->bbs_list_array[pi->n_bbs_list][n] = switch_taken_edge->dest;
+
+  pi->bbs_list_size[pi->n_bbs_list] = n + 1;
+  pi->val_array[pi->n_bbs_list] = (int) TREE_INT_CST_LOW (val);
+
+  /* Count how many instructions are in the blocks we are going to
+     duplicate and if there are too many do not save this path
+     (return without incrementing n_bbs_list).  */
+
+  insn_count = 0;
+  for (i = 1; i < n; i++)
+    {
+      bb = pi->bbs_list_array[pi->n_bbs_list][i];
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	insn_count += estimate_num_insns (gsi_stmt (gsi), &eni_size_weights);
+    }
+
+  if (insn_count > pi->max_insn_count)
+    return;
+
+  pi->n_bbs_list = pi->n_bbs_list + 1;
+}
+
+/* switch_stmt is a switch statement whose switch index expression
+   is the variable expr.  We trace the value of the variable back
+   through any phi nodes looking for places where it gets a constant
+   value and save the path in bbs_list.  Then we call save_new_path
+   to create a list of such paths.  */
+
+static void
+process_switch (tree expr, gimple switch_stmt,
+		hash_set<gimple> *visited_phis,
+	        basic_block *bbs_list, int n,
+		path_info *pi)
+{
+  gimple def_stmt;
+  tree var;
+  unsigned int i;
+  edge e;
+  edge_iterator ei;
+  basic_block bbx;
+  basic_block var_bb;
+  int e_count;
+
+  gcc_assert (gimple_code (switch_stmt) == GIMPLE_SWITCH);
+  var = SSA_NAME_VAR (expr);
+  def_stmt = SSA_NAME_DEF_STMT (expr);
+  var_bb = gimple_bb (def_stmt);
+
+  if (var == NULL || var_bb == NULL) return;
+
+  /* We have a variable definition (var) that is defined in var_bb,
+     We want to put the path from var_bb to the current bb into the
+     bbs_list.  If there is more then one path, skip this and don't
+     try to do the optimization.  */
+
+  bbx = bbs_list[n-1];
+  while (bbx != var_bb)
+    {
+      e_count = 0;
+      FOR_EACH_EDGE (e, ei, bbx->preds)
+	if (find_path (var_bb, e->src))
+	  {
+	    bbs_list[n] = e->src;
+	    n = n + 1;
+	    e_count = e_count + 1;
+	  }
+      if (e_count != 1) return;
+      bbx = bbs_list[n-1];
+    }
+
+  if (gimple_code (def_stmt) == GIMPLE_PHI
+      && !visited_phis->add (def_stmt))
+    {
+      for (i = 0; i < gimple_phi_num_args (def_stmt); i++)
+	{
+	  tree arg = gimple_phi_arg_def (def_stmt, i);
+	  if (arg && TREE_CODE (arg) == INTEGER_CST)
+	    {
+	      /* const char *name = IDENTIFIER_POINTER (DECL_NAME (var)); */
+	      bbs_list[n] = gimple_phi_arg_edge (def_stmt, i)->src;
+	      save_new_path (bbs_list, n + 1, arg, pi);
+	    }
+	  else if (arg && TREE_CODE (arg) == SSA_NAME)
+	    {
+	      bbs_list[n] = gimple_phi_arg_edge (def_stmt, i)->src;
+	      process_switch (arg, switch_stmt, visited_phis, bbs_list, n+1, pi);
+	    }
+	}
+    }
+}
+
+/* Find paths that lead from blocks where a variable is assigned a constant
+   value to a switch statement where that variable is used as the switch
+   index.  Save the paths in bbs_list_array so that they can be processed
+   by copy_switch_paths.  */
+
+static unsigned int
+find_switch_shortcuts (function *fun, path_info *pi)
+{
+  basic_block bb;
+  hash_set<gimple> visited_phis;
+  basic_block *bbs_list;
+  int n = 1;
+
+  bbs_list = XNEWVEC (basic_block, n_basic_blocks_for_fn (fun));
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      gimple stmt = last_stmt (bb);
+      if (stmt && gimple_code (stmt) == GIMPLE_SWITCH)
+	{
+	  tree op = gimple_switch_index (stmt);
+	  tree var = SSA_NAME_VAR (op);
+	  if (var)
+	    {
+	      bbs_list[0] = bb;
+	      process_switch (op, stmt, &visited_phis, bbs_list, n, pi);
+	    }
+	}
+    }
+  XDELETEVEC (bbs_list);
+  return 0;
+}
+
+/* Call gimple_duplicate_sese_region to douplicate the blocks in bb_list.
+   We free and recalculate all ssa and dominance information afterwords
+   because the region being copied is not really SESE and so we cannot
+   trust gimple_duplicate_sese_region to correctly update the dataflow
+   information.  */
+
+static void
+duplicate_blocks (basic_block *bb_list, int bb_count)
+{
+  edge orig_edge, exit_edge;
+  loop_p loop;
+
+  orig_edge = find_edge (bb_list[0], bb_list[1]);
+  exit_edge = find_edge (bb_list[bb_count-2], bb_list[bb_count-1]);
+  gimple_duplicate_sese_region (orig_edge, exit_edge, &bb_list[1],
+				bb_count-2, NULL, false);
+  free_dominance_info (CDI_DOMINATORS);
+  update_ssa (TODO_update_ssa);
+  calculate_dominance_info (CDI_DOMINATORS);
+  FOR_EACH_LOOP (loop, 0)
+    {
+      loop->latch = NULL;
+      loop->header = NULL;
+    }
+  loops_state_set (LOOPS_NEED_FIXUP);
+}
+
+/* Go through the paths saved in bbs_list_array and make copies of them.  */
+
+static void
+copy_switch_paths (path_info *pi)
+{
+  int i;
+
+  /* Process each path in bbs_list_size.  */
+  for (i = 0; i < pi->n_bbs_list; i++)
+    {
+    /* For each path in bbs_list_size loop through and copy each block in
+       the path (except the first on where the constant is assigned and
+       the final one where the switch statement goes to.  */
+
+    if (!single_pred_p (pi->bbs_list_array[i][1]))
+      duplicate_blocks (pi->bbs_list_array[i], pi->bbs_list_size[i]);
+    }
+}
+
+
+/* Main entry for the tree if-conversion pass.  */
+
+namespace {
+
+const pass_data pass_data_tree_switch_shortcut =
+{
+  GIMPLE_PASS, /* type */
+  "switch_shortcut", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_TREE_SWITCH_SHORTCUT, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */
+};
+
+class pass_tree_switch_shortcut : public gimple_opt_pass
+{
+public:
+  pass_tree_switch_shortcut (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_tree_switch_shortcut, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return flag_tree_switch_shortcut;
+    }
+  virtual unsigned int execute (function *);
+
+}; // class pass_tree_switch_shortcut
+
+unsigned int
+pass_tree_switch_shortcut::execute (function *fun)
+{
+  int i;
+  path_info *pi;
+
+  pi = XNEW (path_info);
+  pi->n_bbs_list = 0;
+  pi->max_insn_count = PARAM_VALUE (PARAM_MAX_SWITCH_INSNS);
+  pi->max_path_count = PARAM_VALUE (PARAM_MAX_SWITCH_PATHS);
+  pi->val_array = XNEWVEC (int, pi->max_path_count);
+  pi->bbs_list_size = XNEWVEC (int, pi->max_path_count); 
+  pi->bbs_list_array = XNEWVEC (basic_block *, pi->max_path_count);
+  find_switch_shortcuts (fun, pi);
+  copy_switch_paths (pi);
+  XDELETEVEC (pi->val_array);
+  XDELETEVEC (pi->bbs_list_size);
+  for (i = 0; i < pi->n_bbs_list; i++)
+    XDELETEVEC (pi->bbs_list_array[i]); 
+  XDELETEVEC (pi->bbs_list_array);
+  XDELETE (pi);
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_tree_switch_shortcut (gcc::context *ctxt)
+{
+  return new pass_tree_switch_shortcut (ctxt);
+}

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-12 20:40     ` Jeff Law
  2014-08-12 22:45       ` Steve Ellcey
@ 2014-08-13  2:54       ` Bin.Cheng
  2014-08-13  9:52         ` Richard Biener
  2014-08-13  9:44       ` Richard Biener
  2 siblings, 1 reply; 28+ messages in thread
From: Bin.Cheng @ 2014-08-13  2:54 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, Steve Ellcey, GCC Patches, james.greenhalgh

On Wed, Aug 13, 2014 at 4:40 AM, Jeff Law <law@redhat.com> wrote:
> On 08/12/14 14:23, Richard Biener wrote:
>>
>> On August 12, 2014 8:31:16 PM CEST, Jeff Law <law@redhat.com> wrote:
>>>
>>> On 08/12/14 11:46, Steve Ellcey wrote:
>>>>
>>>> After talking to Jeff Law at the GCC Cauldron I have updated my
>>>
>>> switch
>>>>
>>>> shortcut plugin pass to try and address this optimization in the
>>>
>>> hopes of
>>>>
>>>> getting it added to GCC as a static pass.  I fixed the code to build
>>>
>>> with
>>>>
>>>> the various C++ changes that have been happening in GCC but the
>>>
>>> current
>>>>
>>>> version I have included in this email is not working yet.  When I run
>>>
>>> it
>>>>
>>>> on coremark I get errors like:
>>>>
>>>> core_state.c: In function 'core_bench_state':
>>>> core_state.c:43:8: error: size of loop 16 should be 13, not 5
>>>>    ee_u16 core_bench_state(ee_u32 blksize, ee_u8 *memblock,
>>>>           ^
>>>> core_state.c:43:8: error: bb 15 does not belong to loop 16
>>>> core_state.c:43:8: error: bb 113 does not belong to loop 16
>>>> core_state.c:43:8: error: bb 118 does not belong to loop 16
>>>> core_state.c:43:8: error: bb 117 does not belong to loop 16
>>>> (etc)
>>>>
>>>> Apparently there have been some changes to the loop information that
>>>> is built since GCC 4.9.  I had hoped that adding a call to
>>>
>>> fix_loop_structure
>>>>
>>>> after recalculating the dominance information would fix this but it
>>>
>>> didn't.
>>>>
>>>>
>>>> Does anyone have any ideas on how to fix up the loop information that
>>>
>>> my
>>>>
>>>> optimization has messed as it duplicates blocks?  I tried adding a
>>>
>>> call
>>>>
>>>> 'loop_optimizer_init (LOOPS_NORMAL)' before the fix_loop_structure
>>>
>>> but that
>>>>
>>>> did not seem to have any affect.
>>>
>>> Try setting the header & latch fields for the loop structure to NULL,
>>> then call loops_set_state (LOOPS_NEED_FIXUP).
>>
>>
>> But that is _not_ the appropriate way of keeping loops preserved!
>
> I think that's done when we've scrogged the loop beyond repair and want the
> structures rebuilt.  IIRC, that's what you recommended we do.

Sorry for disturbing with this patch irrelevant question.  If I
understand correctly, we are now trying best to preserve loop
structure and LOOPS_NEED_FIXUP still works.  My question is how we
decide when we can rebuild loop structure and when we need to preserve
it?  If there is meta data stored in loop structure, does that mean it
should never be re-built?

Thanks,
bin

>
> jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-12 20:40     ` Jeff Law
  2014-08-12 22:45       ` Steve Ellcey
  2014-08-13  2:54       ` Bin.Cheng
@ 2014-08-13  9:44       ` Richard Biener
  2014-09-03 21:22         ` Jeff Law
  2 siblings, 1 reply; 28+ messages in thread
From: Richard Biener @ 2014-08-13  9:44 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steve Ellcey, GCC Patches, james.greenhalgh

On Tue, Aug 12, 2014 at 10:40 PM, Jeff Law <law@redhat.com> wrote:
> On 08/12/14 14:23, Richard Biener wrote:
>>
>> On August 12, 2014 8:31:16 PM CEST, Jeff Law <law@redhat.com> wrote:
>>>
>>> On 08/12/14 11:46, Steve Ellcey wrote:
>>>>
>>>> After talking to Jeff Law at the GCC Cauldron I have updated my
>>>
>>> switch
>>>>
>>>> shortcut plugin pass to try and address this optimization in the
>>>
>>> hopes of
>>>>
>>>> getting it added to GCC as a static pass.  I fixed the code to build
>>>
>>> with
>>>>
>>>> the various C++ changes that have been happening in GCC but the
>>>
>>> current
>>>>
>>>> version I have included in this email is not working yet.  When I run
>>>
>>> it
>>>>
>>>> on coremark I get errors like:
>>>>
>>>> core_state.c: In function 'core_bench_state':
>>>> core_state.c:43:8: error: size of loop 16 should be 13, not 5
>>>>    ee_u16 core_bench_state(ee_u32 blksize, ee_u8 *memblock,
>>>>           ^
>>>> core_state.c:43:8: error: bb 15 does not belong to loop 16
>>>> core_state.c:43:8: error: bb 113 does not belong to loop 16
>>>> core_state.c:43:8: error: bb 118 does not belong to loop 16
>>>> core_state.c:43:8: error: bb 117 does not belong to loop 16
>>>> (etc)
>>>>
>>>> Apparently there have been some changes to the loop information that
>>>> is built since GCC 4.9.  I had hoped that adding a call to
>>>
>>> fix_loop_structure
>>>>
>>>> after recalculating the dominance information would fix this but it
>>>
>>> didn't.
>>>>
>>>>
>>>> Does anyone have any ideas on how to fix up the loop information that
>>>
>>> my
>>>>
>>>> optimization has messed as it duplicates blocks?  I tried adding a
>>>
>>> call
>>>>
>>>> 'loop_optimizer_init (LOOPS_NORMAL)' before the fix_loop_structure
>>>
>>> but that
>>>>
>>>> did not seem to have any affect.
>>>
>>> Try setting the header & latch fields for the loop structure to NULL,
>>> then call loops_set_state (LOOPS_NEED_FIXUP).
>>
>>
>> But that is _not_ the appropriate way of keeping loops preserved!
>
> I think that's done when we've scrogged the loop beyond repair and want the
> structures rebuilt.  IIRC, that's what you recommended we do.

I don't see that this pass should "scrog" a loop beyond repair.  Btw,
the "proper" way of just fixing loops up (assuming that all loop
headers are still at their appropriate place) is to _just_ do
loops_set_state (LOOPS_NEED_FIXUP).

The next pass doing a loop_optimizer_init () then will fixup loops.

So the case you mention is when you totally screw the loop by
creating new entry edges that not go to its header for example.
But still this is in theory very bad as you cause important annotations
to be lost.  If the loop is truly gone, ok, but if it just re-materializes
then you've done a bad job here.  Consider the case where a
loop becomes a loop nest - you'd want to preserve the loop header
as the header of the outer loop (which you'd have to identify its
header in some way - dominator checks to the rescue!) and let
fixup discover the new inner loop.

Yes, we may have little utility for dealing with the more complex
cases and I've been hesitant to enforce not dropping loops on the
floor an ICE (well, mainly because we can't even bootstrap with
such check ...), but in the end we should arrive there.

Richard.

> jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-13  2:54       ` Bin.Cheng
@ 2014-08-13  9:52         ` Richard Biener
  2014-08-14 17:45           ` Steve Ellcey
  0 siblings, 1 reply; 28+ messages in thread
From: Richard Biener @ 2014-08-13  9:52 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Jeff Law, Steve Ellcey, GCC Patches, james.greenhalgh

On Wed, Aug 13, 2014 at 4:54 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Wed, Aug 13, 2014 at 4:40 AM, Jeff Law <law@redhat.com> wrote:
>> On 08/12/14 14:23, Richard Biener wrote:
>>>
>>> On August 12, 2014 8:31:16 PM CEST, Jeff Law <law@redhat.com> wrote:
>>>>
>>>> On 08/12/14 11:46, Steve Ellcey wrote:
>>>>>
>>>>> After talking to Jeff Law at the GCC Cauldron I have updated my
>>>>
>>>> switch
>>>>>
>>>>> shortcut plugin pass to try and address this optimization in the
>>>>
>>>> hopes of
>>>>>
>>>>> getting it added to GCC as a static pass.  I fixed the code to build
>>>>
>>>> with
>>>>>
>>>>> the various C++ changes that have been happening in GCC but the
>>>>
>>>> current
>>>>>
>>>>> version I have included in this email is not working yet.  When I run
>>>>
>>>> it
>>>>>
>>>>> on coremark I get errors like:
>>>>>
>>>>> core_state.c: In function 'core_bench_state':
>>>>> core_state.c:43:8: error: size of loop 16 should be 13, not 5
>>>>>    ee_u16 core_bench_state(ee_u32 blksize, ee_u8 *memblock,
>>>>>           ^
>>>>> core_state.c:43:8: error: bb 15 does not belong to loop 16
>>>>> core_state.c:43:8: error: bb 113 does not belong to loop 16
>>>>> core_state.c:43:8: error: bb 118 does not belong to loop 16
>>>>> core_state.c:43:8: error: bb 117 does not belong to loop 16
>>>>> (etc)
>>>>>
>>>>> Apparently there have been some changes to the loop information that
>>>>> is built since GCC 4.9.  I had hoped that adding a call to
>>>>
>>>> fix_loop_structure
>>>>>
>>>>> after recalculating the dominance information would fix this but it
>>>>
>>>> didn't.
>>>>>
>>>>>
>>>>> Does anyone have any ideas on how to fix up the loop information that
>>>>
>>>> my
>>>>>
>>>>> optimization has messed as it duplicates blocks?  I tried adding a
>>>>
>>>> call
>>>>>
>>>>> 'loop_optimizer_init (LOOPS_NORMAL)' before the fix_loop_structure
>>>>
>>>> but that
>>>>>
>>>>> did not seem to have any affect.
>>>>
>>>> Try setting the header & latch fields for the loop structure to NULL,
>>>> then call loops_set_state (LOOPS_NEED_FIXUP).
>>>
>>>
>>> But that is _not_ the appropriate way of keeping loops preserved!
>>
>> I think that's done when we've scrogged the loop beyond repair and want the
>> structures rebuilt.  IIRC, that's what you recommended we do.
>
> Sorry for disturbing with this patch irrelevant question.  If I
> understand correctly, we are now trying best to preserve loop
> structure and LOOPS_NEED_FIXUP still works.  My question is how we
> decide when we can rebuild loop structure and when we need to preserve
> it?  If there is meta data stored in loop structure, does that mean it
> should never be re-built?

Meta-data is associated with loops and ultimately loops are associated
with their header basic-block.  LOOPS_NEED_FIXUP tells the
machinery to re-scan the body for loops but keep the existing
header basic-block to loop structure association.  The only case it
does not do that is when you explicitely NULL a loop->header (you
signal the loop is now no longer a loop) or when the discovery process
discovers a loop which header basic-block didn't previously have a
loop associated with or if a loops recorded header basic-block is
no longer a loop header (by means of finding a latch edge going to it).

I'd like to make both discovering a new loop and removing an existing
loop (that wasn't marked for removal) an error.  All that passes need
to do is properly allocate a new loop structure for new loops headers
and mark loops that no longer are loops properly.  And of course
do it "the right way".

Note that the suspicious activity is logged in passes dump-files
(well, in the pass performing the fix_loop_structure call) with
TDF_DETAILS level as "fix_loop_structure: removing loop %d"
and "flow_loops_find: discovered new loop %d with header %d".

One issue with marking loops for removal is that we do that by
NULL-ing its header which makes spurious removal detection
impossible (as we no longer know its former header).  We should
fix that implementation detail.

Richard.

>
> Thanks,
> bin
>
>>
>> jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-12 22:45       ` Steve Ellcey
@ 2014-08-13 20:55         ` Sebastian Pop
  2014-08-13 21:06           ` Jeff Law
  0 siblings, 1 reply; 28+ messages in thread
From: Sebastian Pop @ 2014-08-13 20:55 UTC (permalink / raw)
  To: Steve Ellcey
  Cc: Jeff Law, Richard Biener, GCC Patches, james.greenhalgh, Jakub Jelinek

Steve Ellcey wrote:
> +/* This file implements an optimization where, when a variable is set
> +   to a constant value and there is a path that leads from that definition
> +   to a switch statement that uses that variable as its controlling expression
> +   we duplicate the blocks on this path and change the jump to the switch
> +   statement with a direct jump to the label of the switch block that control
> +   would goto based on the value of the variable.  This can come up in
> +   loops/switch statements that implement state machines.

Can you explain why the jump-threading pass cannot do this?  Why do we need
another pass to handle a corner case that jump-thread does not catch yet?

> +
> +   Example (modified from PR 54742):
> +
> +   foo(char *str) {
> +     int sum=0;
> +     int state=0;
> +     char *s=str;
> +     for (; *s; s++) {
> +       char c=*s;
> +       <CODE BLOCK 1>

From what I understand, Jeff has fixed jump-threading to catch exactly the
pattern in this example.  What jump-threading doesn't do yet is duplicate the
path when <CODE BLOCK 1> is a condition, like in the example in comment 27:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742#c27
(see the if (c == '*') ):

int sum0, sum1, sum2, sum3;
int foo(char * s, char** ret)
{
  int state=0;
  char c;

  for (; *s && state != 4; s++)
    {
      c = *s;
      if (c == '*')
	{
	  s++;
	  break;
	}
      switch (state) {
	case 0:
	  if (c == '+') state = 1;
	  else if (c != '-') sum0+=c;
	  break;
	case 1:
	  if (c == '+') state = 2;
	  else if (c == '-') state = 0;
	  else sum1+=c;
	  break;
	case 2:
	  if (c == '+') state = 3;
	  else if (c == '-') state = 1;
	  else sum2+=c;
	  break;
	case 3:
	  if (c == '-') state = 2;
	  else if (c == 'x') state = 4;
	  break;
	default:
	  break;
      }
    }
  *ret = s;
  return state;
}


> +       switch (state) {
> +         case 0:
> +           if (c == '+')       { state = 1; sum += 9; }
> +           else if (c != '-')  { state = 2; sum += 3; }
> +           break;
> +         case 1:
> +           if (c == '+')       { state = 2; sum += 4; }
> +           else if (c == '-')  { state = 0; sum += 7; }
> +           break;
> +         case 2:
> +           if (c == '+')       { state = 0; sum += 8; }
> +           else if (c == '-')  { state = 1; sum += 2; }
> +           break;
> +       }
> +       <CODE BLOCK 2>
> +     }
> +     return state;
> +   }
> +
> +  This pass will convert the code inside 'case 0' to something like:
> +
> +    case 0:
> +      if (c == '+')      { state = 1; sum += 9;
> +                           <CODE BLOCK 2>
> +                           s++; if (!s) goto loop_exit;
> +                           <CODE BLOCK 1>
> +                           goto case_1; }
> +      else if (c != '-') { state = 2; sum += 3;
> +                           <CODE BLOCK 2>
> +                           s++; if (!s) goto loop_exit;
> +                           <CODE BLOCK 1>
> +                           goto case_2; }
> +      else               { <CODE BLOCK 2>
> +			   s++; if (!s) goto exit;
> +                           <CODE BLOCK 1>
> +                           goto case_0; }
> +
> +Similar transformations would apply to the other parts of the switch
> +statement.  This obviously can lead to a lot of code duplication but
> +it can also result in faster code since we are replacing two jumps
> +(one indirect) with a single direct jump.  */

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-13 20:55         ` Sebastian Pop
@ 2014-08-13 21:06           ` Jeff Law
  2014-08-13 21:33             ` Sebastian Pop
  2014-08-14 10:32             ` Richard Biener
  0 siblings, 2 replies; 28+ messages in thread
From: Jeff Law @ 2014-08-13 21:06 UTC (permalink / raw)
  To: Sebastian Pop, Steve Ellcey
  Cc: Richard Biener, GCC Patches, james.greenhalgh, Jakub Jelinek

On 08/13/14 14:55, Sebastian Pop wrote:
> Steve Ellcey wrote:
>> +/* This file implements an optimization where, when a variable is set
>> +   to a constant value and there is a path that leads from that definition
>> +   to a switch statement that uses that variable as its controlling expression
>> +   we duplicate the blocks on this path and change the jump to the switch
>> +   statement with a direct jump to the label of the switch block that control
>> +   would goto based on the value of the variable.  This can come up in
>> +   loops/switch statements that implement state machines.
>
> Can you explain why the jump-threading pass cannot do this?  Why do we need
> another pass to handle a corner case that jump-thread does not catch yet?
I'm pretty sure jump threading *could* handle it, but after looking at 
the full testcase when it was posted, I'm not sure it's *wise* to handle 
this in jump threading.

The core issue is we have to look even deeper into the CFG than was 
originally envisioned and do quite a bit more block duplication than was 
originally envisioned.  That's going to have a notable compile-time cost 
(and to a lesser extent issues around codesize).

It's unfortunate that the testcase when we first looked at this was 
over-simplified and not actually representative of what happens in 
Coremark.  Had I seen the Coremark testcase, I probably wouldn't have 
suggested we tackle the problem in jump threading and the extensions I 
made to jump threading would _not_ have included aggressively following 
backedges in the CFG.

You'll note in a separate thread Steve and I discussed this during 
Cauldron and it was at my recommendation Steve resurrected his proof of 
concept plugin and started beating it into shape.

jeff


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-13 21:06           ` Jeff Law
@ 2014-08-13 21:33             ` Sebastian Pop
  2014-08-14 10:32             ` Richard Biener
  1 sibling, 0 replies; 28+ messages in thread
From: Sebastian Pop @ 2014-08-13 21:33 UTC (permalink / raw)
  To: Jeff Law
  Cc: Steve Ellcey, Richard Biener, GCC Patches, james.greenhalgh,
	Jakub Jelinek

Jeff Law wrote:
> I'm pretty sure jump threading *could* handle it, but after looking
> at the full testcase when it was posted, I'm not sure it's *wise* to
> handle this in jump threading.

Thanks for clearing my doubts.

Sebastian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-13 21:06           ` Jeff Law
  2014-08-13 21:33             ` Sebastian Pop
@ 2014-08-14 10:32             ` Richard Biener
  2014-08-14 15:56               ` Jeff Law
  1 sibling, 1 reply; 28+ messages in thread
From: Richard Biener @ 2014-08-14 10:32 UTC (permalink / raw)
  To: Jeff Law
  Cc: Sebastian Pop, Steve Ellcey, GCC Patches, james.greenhalgh,
	Jakub Jelinek

On Wed, Aug 13, 2014 at 11:06 PM, Jeff Law <law@redhat.com> wrote:
> On 08/13/14 14:55, Sebastian Pop wrote:
>>
>> Steve Ellcey wrote:
>>>
>>> +/* This file implements an optimization where, when a variable is set
>>> +   to a constant value and there is a path that leads from that
>>> definition
>>> +   to a switch statement that uses that variable as its controlling
>>> expression
>>> +   we duplicate the blocks on this path and change the jump to the
>>> switch
>>> +   statement with a direct jump to the label of the switch block that
>>> control
>>> +   would goto based on the value of the variable.  This can come up in
>>> +   loops/switch statements that implement state machines.
>>
>>
>> Can you explain why the jump-threading pass cannot do this?  Why do we
>> need
>> another pass to handle a corner case that jump-thread does not catch yet?
>
> I'm pretty sure jump threading *could* handle it, but after looking at the
> full testcase when it was posted, I'm not sure it's *wise* to handle this in
> jump threading.
>
> The core issue is we have to look even deeper into the CFG than was
> originally envisioned and do quite a bit more block duplication than was
> originally envisioned.  That's going to have a notable compile-time cost
> (and to a lesser extent issues around codesize).
>
> It's unfortunate that the testcase when we first looked at this was
> over-simplified and not actually representative of what happens in Coremark.
> Had I seen the Coremark testcase, I probably wouldn't have suggested we
> tackle the problem in jump threading and the extensions I made to jump
> threading would _not_ have included aggressively following backedges in the
> CFG.
>
> You'll note in a separate thread Steve and I discussed this during Cauldron
> and it was at my recommendation Steve resurrected his proof of concept
> plugin and started beating it into shape.

But do we really want a pass just to help coremark?

Richard.

> jeff
>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-14 10:32             ` Richard Biener
@ 2014-08-14 15:56               ` Jeff Law
  2014-08-14 16:29                 ` David Malcolm
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff Law @ 2014-08-14 15:56 UTC (permalink / raw)
  To: Richard Biener
  Cc: Sebastian Pop, Steve Ellcey, GCC Patches, james.greenhalgh,
	Jakub Jelinek

On 08/14/14 04:32, Richard Biener wrote:
>> You'll note in a separate thread Steve and I discussed this during Cauldron
>> and it was at my recommendation Steve resurrected his proof of concept
>> plugin and started beating it into shape.
>
> But do we really want a pass just to help coremark?
And that's the biggest argument against Steve's work.  In theory it 
should be applicable to other FSMs, but nobody's come forth with 
additional testcases from real world applications.

jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-14 16:29                 ` David Malcolm
@ 2014-08-14 16:21                   ` Jeff Law
  2014-08-14 18:25                     ` Steve Ellcey
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff Law @ 2014-08-14 16:21 UTC (permalink / raw)
  To: David Malcolm
  Cc: Richard Biener, Sebastian Pop, Steve Ellcey, GCC Patches,
	james.greenhalgh, Jakub Jelinek

On 08/14/14 10:12, David Malcolm wrote:
> On Thu, 2014-08-14 at 09:56 -0600, Jeff Law wrote:
>> On 08/14/14 04:32, Richard Biener wrote:
>>>> You'll note in a separate thread Steve and I discussed this during Cauldron
>>>> and it was at my recommendation Steve resurrected his proof of concept
>>>> plugin and started beating it into shape.
>>>
>>> But do we really want a pass just to help coremark?
>> And that's the biggest argument against Steve's work.  In theory it
>> should be applicable to other FSMs, but nobody's come forth with
>> additional testcases from real world applications.
>
> Maybe a regex library?  Perhaps:
> http://vcs.pcre.org/viewvc/code/trunk/pcre_dfa_exec.c?revision=1477 ?
The key is that at least some states tell you at compile time what state 
you'll be in during the next loop iteration.  Thus instead of coming 
around the loop, evaluating the switch condition, then doing the 
multi-way branch, we just directly jump to the case for the next iteration.

I've never looked at the PCRE code to know if it's got cases like that.

jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-14 15:56               ` Jeff Law
@ 2014-08-14 16:29                 ` David Malcolm
  2014-08-14 16:21                   ` Jeff Law
  0 siblings, 1 reply; 28+ messages in thread
From: David Malcolm @ 2014-08-14 16:29 UTC (permalink / raw)
  To: Jeff Law
  Cc: Richard Biener, Sebastian Pop, Steve Ellcey, GCC Patches,
	james.greenhalgh, Jakub Jelinek

On Thu, 2014-08-14 at 09:56 -0600, Jeff Law wrote:
> On 08/14/14 04:32, Richard Biener wrote:
> >> You'll note in a separate thread Steve and I discussed this during Cauldron
> >> and it was at my recommendation Steve resurrected his proof of concept
> >> plugin and started beating it into shape.
> >
> > But do we really want a pass just to help coremark?
> And that's the biggest argument against Steve's work.  In theory it 
> should be applicable to other FSMs, but nobody's come forth with 
> additional testcases from real world applications.

Maybe a regex library?  Perhaps:
http://vcs.pcre.org/viewvc/code/trunk/pcre_dfa_exec.c?revision=1477 ?


Dave


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-13  9:52         ` Richard Biener
@ 2014-08-14 17:45           ` Steve Ellcey
  0 siblings, 0 replies; 28+ messages in thread
From: Steve Ellcey @ 2014-08-14 17:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: Bin.Cheng, Jeff Law, GCC Patches, james.greenhalgh

On Wed, 2014-08-13 at 11:52 +0200, Richard Biener wrote:
> On Wed, Aug 13, 2014 at 4:54 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> > On Wed, Aug 13, 2014 at 4:40 AM, Jeff Law <law@redhat.com> wrote:
> >> On 08/12/14 14:23, Richard Biener wrote:
> >>> On August 12, 2014 8:31:16 PM CEST, Jeff Law <law@redhat.com> wrote:
> >>>> On 08/12/14 11:46, Steve Ellcey wrote:
> >>>>
> >>>> Try setting the header & latch fields for the loop structure to NULL,
> >>>> then call loops_set_state (LOOPS_NEED_FIXUP).
> >>>
> >>>
> >>> But that is _not_ the appropriate way of keeping loops preserved!
> >>
> >> I think that's done when we've scrogged the loop beyond repair and want the
> >> structures rebuilt.  IIRC, that's what you recommended we do.

An update on this part of the patch.  I found that just calling
'loops_set_state (LOOPS_NEED_FIXUP)' without setting the header and
latch fields to NULL first is enough to fix the problem I had earlier.
I thought I had tried that before but either I was calling something
else or I had put the call in the wrong place but it is working now.


Steve Ellcey

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-14 16:21                   ` Jeff Law
@ 2014-08-14 18:25                     ` Steve Ellcey
  2014-08-14 18:45                       ` Sebastian Pop
                                         ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Steve Ellcey @ 2014-08-14 18:25 UTC (permalink / raw)
  To: Jeff Law
  Cc: David Malcolm, Richard Biener, Sebastian Pop, GCC Patches,
	james.greenhalgh, Jakub Jelinek

On Thu, 2014-08-14 at 10:21 -0600, Jeff Law wrote:
> On 08/14/14 10:12, David Malcolm wrote:
> > On Thu, 2014-08-14 at 09:56 -0600, Jeff Law wrote:
> >> On 08/14/14 04:32, Richard Biener wrote:
> >>>> You'll note in a separate thread Steve and I discussed this during Cauldron
> >>>> and it was at my recommendation Steve resurrected his proof of concept
> >>>> plugin and started beating it into shape.
> >>>
> >>> But do we really want a pass just to help coremark?
> >> And that's the biggest argument against Steve's work.  In theory it
> >> should be applicable to other FSMs, but nobody's come forth with
> >> additional testcases from real world applications.
> >
> > Maybe a regex library?  Perhaps:
> > http://vcs.pcre.org/viewvc/code/trunk/pcre_dfa_exec.c?revision=1477 ?
> The key is that at least some states tell you at compile time what state 
> you'll be in during the next loop iteration.  Thus instead of coming 
> around the loop, evaluating the switch condition, then doing the 
> multi-way branch, we just directly jump to the case for the next iteration.
> 
> I've never looked at the PCRE code to know if it's got cases like that.
> 
> jeff

I compiled PCRE but it never triggered this optimization (even if I
bumped up the parameters for instruction counts and paths).

I understand the desire not to add optimizations just for benchmarks but
we do know other compilers have added this optimization for coremark
(See
http://community.arm.com/groups/embedded/blog/2013/02/21/coremark-and-compiler-performance) and the 13 people on the CC list for this bug certainly shows interest in having it even if it is just for a benchmark.  Does 'competing against other compilers' sound better then 'optimizing for a benchmark'?

Steve Ellcey
sellcey@mips.com

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-14 18:25                     ` Steve Ellcey
@ 2014-08-14 18:45                       ` Sebastian Pop
  2014-08-15 10:07                         ` Richard Biener
  2014-08-15 10:13                       ` Richard Biener
  2014-08-15 15:58                       ` Ramana Radhakrishnan
  2 siblings, 1 reply; 28+ messages in thread
From: Sebastian Pop @ 2014-08-14 18:45 UTC (permalink / raw)
  To: Steve Ellcey
  Cc: Jeff Law, David Malcolm, Richard Biener, GCC Patches,
	james.greenhalgh, Jakub Jelinek

Steve Ellcey wrote:
> I understand the desire not to add optimizations just for benchmarks but
> we do know other compilers have added this optimization for coremark
> (See
> http://community.arm.com/groups/embedded/blog/2013/02/21/coremark-and-compiler-performance)
> and the 13 people on the CC list for this bug certainly shows interest in
> having it even if it is just for a benchmark.  Does 'competing against other
> compilers' sound better then 'optimizing for a benchmark'?

I definitely would like to see GCC trunk do this transform.  What about we
integrate the new pass, and then when jump-threading manages to catch the
coremark loop, we remove the pass?

Thanks,
Sebastian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-14 18:45                       ` Sebastian Pop
@ 2014-08-15 10:07                         ` Richard Biener
  2014-08-15 13:26                           ` Jeff Law
  0 siblings, 1 reply; 28+ messages in thread
From: Richard Biener @ 2014-08-15 10:07 UTC (permalink / raw)
  To: Sebastian Pop
  Cc: Steve Ellcey, Jeff Law, David Malcolm, GCC Patches,
	james.greenhalgh, Jakub Jelinek

On Thu, Aug 14, 2014 at 8:45 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> Steve Ellcey wrote:
>> I understand the desire not to add optimizations just for benchmarks but
>> we do know other compilers have added this optimization for coremark
>> (See
>> http://community.arm.com/groups/embedded/blog/2013/02/21/coremark-and-compiler-performance)
>> and the 13 people on the CC list for this bug certainly shows interest in
>> having it even if it is just for a benchmark.  Does 'competing against other
>> compilers' sound better then 'optimizing for a benchmark'?
>
> I definitely would like to see GCC trunk do this transform.  What about we
> integrate the new pass, and then when jump-threading manages to catch the
> coremark loop, we remove the pass?

It never worked that way.

A new pass takes compile-time, if we disable it by default it won't help
coremark (and it will bitrot quickly).

So - please fix DOM instead.

Richard.

> Thanks,
> Sebastian

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-14 18:25                     ` Steve Ellcey
  2014-08-14 18:45                       ` Sebastian Pop
@ 2014-08-15 10:13                       ` Richard Biener
  2014-08-15 10:44                         ` Richard Biener
  2014-08-15 15:58                       ` Ramana Radhakrishnan
  2 siblings, 1 reply; 28+ messages in thread
From: Richard Biener @ 2014-08-15 10:13 UTC (permalink / raw)
  To: Steve Ellcey
  Cc: Jeff Law, David Malcolm, Sebastian Pop, GCC Patches,
	james.greenhalgh, Jakub Jelinek

On Thu, Aug 14, 2014 at 8:25 PM, Steve Ellcey <sellcey@mips.com> wrote:
> On Thu, 2014-08-14 at 10:21 -0600, Jeff Law wrote:
>> On 08/14/14 10:12, David Malcolm wrote:
>> > On Thu, 2014-08-14 at 09:56 -0600, Jeff Law wrote:
>> >> On 08/14/14 04:32, Richard Biener wrote:
>> >>>> You'll note in a separate thread Steve and I discussed this during Cauldron
>> >>>> and it was at my recommendation Steve resurrected his proof of concept
>> >>>> plugin and started beating it into shape.
>> >>>
>> >>> But do we really want a pass just to help coremark?
>> >> And that's the biggest argument against Steve's work.  In theory it
>> >> should be applicable to other FSMs, but nobody's come forth with
>> >> additional testcases from real world applications.
>> >
>> > Maybe a regex library?  Perhaps:
>> > http://vcs.pcre.org/viewvc/code/trunk/pcre_dfa_exec.c?revision=1477 ?
>> The key is that at least some states tell you at compile time what state
>> you'll be in during the next loop iteration.  Thus instead of coming
>> around the loop, evaluating the switch condition, then doing the
>> multi-way branch, we just directly jump to the case for the next iteration.
>>
>> I've never looked at the PCRE code to know if it's got cases like that.
>>
>> jeff
>
> I compiled PCRE but it never triggered this optimization (even if I
> bumped up the parameters for instruction counts and paths).
>
> I understand the desire not to add optimizations just for benchmarks but
> we do know other compilers have added this optimization for coremark
> (See
> http://community.arm.com/groups/embedded/blog/2013/02/21/coremark-and-compiler-performance) and the 13 people on the CC list for this bug certainly shows interest in having it even if it is just for a benchmark.  Does 'competing against other compilers' sound better then 'optimizing for a benchmark'?

Well - as an open-source compiler we have the luxury to not care
about "benchmark compilers" ;)  At least that's what our non-existant
sales-team told me.

There are plenty "real" interpreters around which may have states
that deterministically forward to another state.  If you optimize
those as well, fine.

Btw - the patch doesn't contain a single testcase ....

With coremark being "secret" what's the real-world testcase this
optimizes?  Note that the benchmarks used in SPEC are usually
available and taken from real-world apps.  I don't know coremark
at all, but from its name it sounds like sth like nullstone?

Richard.

> Steve Ellcey
> sellcey@mips.com
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-15 10:13                       ` Richard Biener
@ 2014-08-15 10:44                         ` Richard Biener
  0 siblings, 0 replies; 28+ messages in thread
From: Richard Biener @ 2014-08-15 10:44 UTC (permalink / raw)
  To: Steve Ellcey
  Cc: Jeff Law, David Malcolm, Sebastian Pop, GCC Patches,
	james.greenhalgh, Jakub Jelinek

On Fri, Aug 15, 2014 at 12:13 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Aug 14, 2014 at 8:25 PM, Steve Ellcey <sellcey@mips.com> wrote:
>> On Thu, 2014-08-14 at 10:21 -0600, Jeff Law wrote:
>>> On 08/14/14 10:12, David Malcolm wrote:
>>> > On Thu, 2014-08-14 at 09:56 -0600, Jeff Law wrote:
>>> >> On 08/14/14 04:32, Richard Biener wrote:
>>> >>>> You'll note in a separate thread Steve and I discussed this during Cauldron
>>> >>>> and it was at my recommendation Steve resurrected his proof of concept
>>> >>>> plugin and started beating it into shape.
>>> >>>
>>> >>> But do we really want a pass just to help coremark?
>>> >> And that's the biggest argument against Steve's work.  In theory it
>>> >> should be applicable to other FSMs, but nobody's come forth with
>>> >> additional testcases from real world applications.
>>> >
>>> > Maybe a regex library?  Perhaps:
>>> > http://vcs.pcre.org/viewvc/code/trunk/pcre_dfa_exec.c?revision=1477 ?
>>> The key is that at least some states tell you at compile time what state
>>> you'll be in during the next loop iteration.  Thus instead of coming
>>> around the loop, evaluating the switch condition, then doing the
>>> multi-way branch, we just directly jump to the case for the next iteration.
>>>
>>> I've never looked at the PCRE code to know if it's got cases like that.
>>>
>>> jeff
>>
>> I compiled PCRE but it never triggered this optimization (even if I
>> bumped up the parameters for instruction counts and paths).
>>
>> I understand the desire not to add optimizations just for benchmarks but
>> we do know other compilers have added this optimization for coremark
>> (See
>> http://community.arm.com/groups/embedded/blog/2013/02/21/coremark-and-compiler-performance) and the 13 people on the CC list for this bug certainly shows interest in having it even if it is just for a benchmark.  Does 'competing against other compilers' sound better then 'optimizing for a benchmark'?
>
> Well - as an open-source compiler we have the luxury to not care
> about "benchmark compilers" ;)  At least that's what our non-existant
> sales-team told me.
>
> There are plenty "real" interpreters around which may have states
> that deterministically forward to another state.  If you optimize
> those as well, fine.
>
> Btw - the patch doesn't contain a single testcase ....
>
> With coremark being "secret" what's the real-world testcase this
> optimizes?  Note that the benchmarks used in SPEC are usually
> available and taken from real-world apps.  I don't know coremark
> at all, but from its name it sounds like sth like nullstone?

Ok, seems you can download coremark so I did that.  The benchmark
doesn't resemble any reasonable state machine as the "state"
is simply compiler-visibly looping 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5...
in a real state machine the next state would come from input
(uninteresting) or be determined by a previous state (well - it is in
the artificial case, but for all states which makes the switch moot).
Thus this benchmark is so artificial that it isn't worth a special
pass.

I suppose the "expected" optimizaton is to un-obfuscate this into
the non-state-machine this really is after peeling the first
iterations to make 'seed' compile-time determinable (here I note
the function comment that says it's important that 'seed' is
_not_ compile-time known - hah, peeling is such a nice tool
to "workaround" that idea).

Stupid benchmarks.

Richard.

> Richard.
>
>> Steve Ellcey
>> sellcey@mips.com
>>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-15 10:07                         ` Richard Biener
@ 2014-08-15 13:26                           ` Jeff Law
  0 siblings, 0 replies; 28+ messages in thread
From: Jeff Law @ 2014-08-15 13:26 UTC (permalink / raw)
  To: Richard Biener, Sebastian Pop
  Cc: Steve Ellcey, David Malcolm, GCC Patches, james.greenhalgh,
	Jakub Jelinek

On 08/15/14 04:07, Richard Biener wrote:
> On Thu, Aug 14, 2014 at 8:45 PM, Sebastian Pop <sebpop@gmail.com> wrote:
>> Steve Ellcey wrote:
>>> I understand the desire not to add optimizations just for benchmarks but
>>> we do know other compilers have added this optimization for coremark
>>> (See
>>> http://community.arm.com/groups/embedded/blog/2013/02/21/coremark-and-compiler-performance)
>>> and the 13 people on the CC list for this bug certainly shows interest in
>>> having it even if it is just for a benchmark.  Does 'competing against other
>>> compilers' sound better then 'optimizing for a benchmark'?
>>
>> I definitely would like to see GCC trunk do this transform.  What about we
>> integrate the new pass, and then when jump-threading manages to catch the
>> coremark loop, we remove the pass?
>
> It never worked that way.
>
> A new pass takes compile-time, if we disable it by default it won't help
> coremark (and it will bitrot quickly).
>
> So - please fix DOM instead.
Steve's work is highly likely to be faster than further extending the 
threading code -- that's one of the primary reasons I suggested Steve 
resurrect his work.



Jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-14 18:25                     ` Steve Ellcey
  2014-08-14 18:45                       ` Sebastian Pop
  2014-08-15 10:13                       ` Richard Biener
@ 2014-08-15 15:58                       ` Ramana Radhakrishnan
  2 siblings, 0 replies; 28+ messages in thread
From: Ramana Radhakrishnan @ 2014-08-15 15:58 UTC (permalink / raw)
  To: Steve Ellcey, Jeff Law
  Cc: David Malcolm, Richard Biener, Sebastian Pop, GCC Patches,
	James Greenhalgh, Jakub Jelinek



On 14/08/14 19:25, Steve Ellcey wrote:
> On Thu, 2014-08-14 at 10:21 -0600, Jeff Law wrote:
>> On 08/14/14 10:12, David Malcolm wrote:
>>> On Thu, 2014-08-14 at 09:56 -0600, Jeff Law wrote:
>>>> On 08/14/14 04:32, Richard Biener wrote:
>>>>>> You'll note in a separate thread Steve and I discussed this during Cauldron
>>>>>> and it was at my recommendation Steve resurrected his proof of concept
>>>>>> plugin and started beating it into shape.
>>>>>
>>>>> But do we really want a pass just to help coremark?
>>>> And that's the biggest argument against Steve's work.  In theory it
>>>> should be applicable to other FSMs, but nobody's come forth with
>>>> additional testcases from real world applications.
>>>
>>> Maybe a regex library?  Perhaps:
>>> http://vcs.pcre.org/viewvc/code/trunk/pcre_dfa_exec.c?revision=1477 ?
>> The key is that at least some states tell you at compile time what state
>> you'll be in during the next loop iteration.  Thus instead of coming
>> around the loop, evaluating the switch condition, then doing the
>> multi-way branch, we just directly jump to the case for the next iteration.
>>
>> I've never looked at the PCRE code to know if it's got cases like that.
>>
>> jeff
>
> I compiled PCRE but it never triggered this optimization (even if I
> bumped up the parameters for instruction counts and paths).
>
> I understand the desire not to add optimizations just for benchmarks but
> we do know other compilers have added this optimization for coremark
> (See
> http://community.arm.com/groups/embedded/blog/2013/02/21/coremark-and-compiler-performance) and the 13 people on the CC list for this bug certainly shows interest in having it even if it is just for a benchmark.  Does 'competing against other compilers' sound better then 'optimizing for a benchmark'?
>
> Steve Ellcey
> sellcey@mips.com
>
>

I've been told and have seen empirical evidence of this triggering in 
another well-known popular embedded benchmark suite.

Why not try this on SPEC2k(6) and see if it triggers with bumped up 
parameters to see if it applies elsewhere ?


regards
Ramana

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-08-13  9:44       ` Richard Biener
@ 2014-09-03 21:22         ` Jeff Law
  2014-09-04 12:57           ` Richard Biener
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff Law @ 2014-09-03 21:22 UTC (permalink / raw)
  To: Richard Biener; +Cc: Steve Ellcey, GCC Patches, james.greenhalgh

On 08/13/14 03:44, Richard Biener wrote:
>
> I don't see that this pass should "scrog" a loop beyond repair.  Btw,
> the "proper" way of just fixing loops up (assuming that all loop
> headers are still at their appropriate place) is to _just_ do
> loops_set_state (LOOPS_NEED_FIXUP).
This pass can quite easily create new irreducible loops and non-nested 
loops, the pass may take a previously well defined natural loop and make 
it irreducible.  When I worked on it, I didn't see any reasonable way to 
update the loop structures.

> But still this is in theory very bad as you cause important annotations
> to be lost.  If the loop is truly gone, ok, but if it just re-materializes
> then you've done a bad job here.  Consider the case where a
> loop becomes a loop nest - you'd want to preserve the loop header
> as the header of the outer loop (which you'd have to identify its
> header in some way - dominator checks to the rescue!) and let
> fixup discover the new inner loop.
While I'd love to be able to be able to update and DTRT here, I just 
couldn't see a way to do it.  And while I hate losing the loop structure 
and missed-optimizations that it may lead to later, I judged the benefit 
of removing the multi-way branch to be so beneficial that it outweighed 
the losses elsewhere.


>
> Yes, we may have little utility for dealing with the more complex
> cases and I've been hesitant to enforce not dropping loops on the
> floor an ICE (well, mainly because we can't even bootstrap with
> such check ...), but in the end we should arrive there.
It'd certainly be nice.  I really don't like the idea of dropping loops 
on the floor.  If we can get there, great, but I suspect you'll find its 
harder than expected.

jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-09-03 21:22         ` Jeff Law
@ 2014-09-04 12:57           ` Richard Biener
  2014-09-04 13:07             ` Jakub Jelinek
  0 siblings, 1 reply; 28+ messages in thread
From: Richard Biener @ 2014-09-04 12:57 UTC (permalink / raw)
  To: Jeff Law; +Cc: Steve Ellcey, GCC Patches, james.greenhalgh

On Wed, Sep 3, 2014 at 11:22 PM, Jeff Law <law@redhat.com> wrote:
> On 08/13/14 03:44, Richard Biener wrote:
>>
>>
>> I don't see that this pass should "scrog" a loop beyond repair.  Btw,
>> the "proper" way of just fixing loops up (assuming that all loop
>> headers are still at their appropriate place) is to _just_ do
>> loops_set_state (LOOPS_NEED_FIXUP).
>
> This pass can quite easily create new irreducible loops and non-nested
> loops, the pass may take a previously well defined natural loop and make it
> irreducible.  When I worked on it, I didn't see any reasonable way to update
> the loop structures.

The auto-fixup should end up removing the loop in this case (well,
I think so, and if not I'm happy to improve it).

Note that I think we arrived at the point where the loop structure
has annotations that are required for correctness :/  (simduid
for example - if that goes away we do ...?  ICE?  generate
wrong code?  I don't know - Jakub shoud).

So there might be loops that we want to avoid applying such
disruptive transforms to.

>> But still this is in theory very bad as you cause important annotations
>> to be lost.  If the loop is truly gone, ok, but if it just re-materializes
>> then you've done a bad job here.  Consider the case where a
>> loop becomes a loop nest - you'd want to preserve the loop header
>> as the header of the outer loop (which you'd have to identify its
>> header in some way - dominator checks to the rescue!) and let
>> fixup discover the new inner loop.
>
> While I'd love to be able to be able to update and DTRT here, I just
> couldn't see a way to do it.  And while I hate losing the loop structure and
> missed-optimizations that it may lead to later, I judged the benefit of
> removing the multi-way branch to be so beneficial that it outweighed the
> losses elsewhere.
>
>
>
>>
>> Yes, we may have little utility for dealing with the more complex
>> cases and I've been hesitant to enforce not dropping loops on the
>> floor an ICE (well, mainly because we can't even bootstrap with
>> such check ...), but in the end we should arrive there.
>
> It'd certainly be nice.  I really don't like the idea of dropping loops on
> the floor.  If we can get there, great, but I suspect you'll find its harder
> than expected.

Sure.  But as said above explicitely dropping (by setting ->header
to NULL) is discouraged - it's better to get it done automatically
by only setting LOOPS_NEED_FIXUP.  In some cases that's not
possible - like when we _remove_ the basic-block that ->header
or ->latch points to we obviously can't leave them point to that
block.  So we either have to know better (or do a good guess
that we are sure doesn't point to a wrong "real" header/latch)
or punt and set to NULL.

Richard.

> jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-09-04 12:57           ` Richard Biener
@ 2014-09-04 13:07             ` Jakub Jelinek
  2014-09-04 14:05               ` Richard Biener
  0 siblings, 1 reply; 28+ messages in thread
From: Jakub Jelinek @ 2014-09-04 13:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, Steve Ellcey, GCC Patches, james.greenhalgh

On Thu, Sep 04, 2014 at 02:57:47PM +0200, Richard Biener wrote:
> Note that I think we arrived at the point where the loop structure
> has annotations that are required for correctness :/  (simduid
> for example - if that goes away we do ...?  ICE?  generate
> wrong code?  I don't know - Jakub shoud).

For safelen loops it will be just (perhaps serious) missed-optimization,
for simduid I believe it shouldn't ICE either, the IFN_GOMP* builtins would
just fold as if the vectorization factor was 1 if the loop goes away.
But it will be even significantly bigger missed-optimization.

	Jakub

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: RFC: Patch for switch elimination (PR 54742)
  2014-09-04 13:07             ` Jakub Jelinek
@ 2014-09-04 14:05               ` Richard Biener
  0 siblings, 0 replies; 28+ messages in thread
From: Richard Biener @ 2014-09-04 14:05 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jeff Law, Steve Ellcey, GCC Patches, james.greenhalgh

[-- Attachment #1: Type: text/plain, Size: 1193 bytes --]

On Thu, Sep 4, 2014 at 3:06 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Sep 04, 2014 at 02:57:47PM +0200, Richard Biener wrote:
>> Note that I think we arrived at the point where the loop structure
>> has annotations that are required for correctness :/  (simduid
>> for example - if that goes away we do ...?  ICE?  generate
>> wrong code?  I don't know - Jakub shoud).
>
> For safelen loops it will be just (perhaps serious) missed-optimization,
> for simduid I believe it shouldn't ICE either, the IFN_GOMP* builtins would
> just fold as if the vectorization factor was 1 if the loop goes away.
> But it will be even significantly bigger missed-optimization.

The following is a patch abstracting away the ->header = NULL settings
to a mark_loop_for_removal function and adding checking code
that tries to discover suspicious cases.

I see it fire on fold-const.ii compile in VRP for example:

fold-const.ii.067t.vrp1:fix_loop_structure: re-discovered removed loop
6 with pointer-equal header 459

(only testcase I tried).  As noted in the patch the pointer-equal header
case should probably be an ICE (although it's possible we just
re-allocated that basic-block ...).

Richard.

[-- Attachment #2: loop-removal --]
[-- Type: application/octet-stream, Size: 8789 bytes --]

Index: trunk/gcc/cfghooks.c
===================================================================
*** trunk.orig/gcc/cfghooks.c	2014-06-05 16:04:44.962551413 +0200
--- trunk/gcc/cfghooks.c	2014-09-04 15:30:39.713375472 +0200
*************** delete_basic_block (basic_block bb)
*** 569,582 ****
        struct loop *loop = bb->loop_father;
  
        /* If we remove the header or the latch of a loop, mark the loop for
! 	 removal by setting its header and latch to NULL.  */
        if (loop->latch == bb
  	  || loop->header == bb)
! 	{
! 	  loop->header = NULL;
! 	  loop->latch = NULL;
! 	  loops_state_set (LOOPS_NEED_FIXUP);
! 	}
  
        remove_bb_from_loops (bb);
      }
--- 569,578 ----
        struct loop *loop = bb->loop_father;
  
        /* If we remove the header or the latch of a loop, mark the loop for
! 	 removal.  */
        if (loop->latch == bb
  	  || loop->header == bb)
! 	mark_loop_for_removal (loop);
  
        remove_bb_from_loops (bb);
      }
*************** merge_blocks (basic_block a, basic_block
*** 760,770 ****
  	  /* ... we merge two loop headers, in which case we kill
  	     the inner loop.  */
  	  if (b->loop_father->header == b)
! 	    {
! 	      b->loop_father->header = NULL;
! 	      b->loop_father->latch = NULL;
! 	      loops_state_set (LOOPS_NEED_FIXUP);
! 	    }
  	}
        /* If we merge a loop header into its predecessor, update the loop
  	 structure.  */
--- 756,762 ----
  	  /* ... we merge two loop headers, in which case we kill
  	     the inner loop.  */
  	  if (b->loop_father->header == b)
! 	    mark_loop_for_removal (b->loop_father);
  	}
        /* If we merge a loop header into its predecessor, update the loop
  	 structure.  */
*************** duplicate_block (basic_block bb, edge e,
*** 1099,1107 ****
  	  && cloop->header == bb)
  	{
  	  add_bb_to_loop (new_bb, loop_outer (cloop));
! 	  cloop->header = NULL;
! 	  cloop->latch = NULL;
! 	  loops_state_set (LOOPS_NEED_FIXUP);
  	}
        else
  	{
--- 1091,1097 ----
  	  && cloop->header == bb)
  	{
  	  add_bb_to_loop (new_bb, loop_outer (cloop));
! 	  mark_loop_for_removal (cloop);
  	}
        else
  	{
Index: trunk/gcc/cfgloop.c
===================================================================
*** trunk.orig/gcc/cfgloop.c	2014-08-26 13:40:18.815368134 +0200
--- trunk/gcc/cfgloop.c	2014-09-04 15:52:10.887286576 +0200
*************** bb_loop_depth (const_basic_block bb)
*** 1931,1933 ****
--- 1931,1947 ----
  {
    return bb->loop_father ? loop_depth (bb->loop_father) : 0;
  }
+ 
+ /* Marks LOOP for removal and sets LOOPS_NEED_FIXUP.  */
+ 
+ void
+ mark_loop_for_removal (loop_p loop)
+ {
+ #ifdef ENABLE_CHECKING
+   loop->former_header = loop->header;
+   loop->former_header_index = loop->header->index;
+ #endif
+   loop->header = NULL;
+   loop->latch = NULL;
+   loops_state_set (LOOPS_NEED_FIXUP);
+ }
Index: trunk/gcc/cfgloop.h
===================================================================
*** trunk.orig/gcc/cfgloop.h	2014-08-26 13:40:18.816368134 +0200
--- trunk/gcc/cfgloop.h	2014-09-04 15:53:08.012282643 +0200
*************** struct GTY ((chain_next ("%h.next"))) lo
*** 193,198 ****
--- 193,203 ----
  
    /* Number of iteration analysis data for RTL.  */
    struct niter_desc *simple_loop_desc;
+ 
+ #ifdef ENABLE_CHECKING
+   basic_block GTY ((skip)) former_header;
+   int former_header_index;
+ #endif
  };
  
  /* Flags for state of loop structure.  */
*************** struct loop * loop_version (struct loop
*** 336,341 ****
--- 341,348 ----
  extern bool remove_path (edge);
  extern void unloop (struct loop *, bool *, bitmap);
  extern void scale_loop_frequencies (struct loop *, int, int);
+ void mark_loop_for_removal (loop_p);
+ 
  
  /* Induction variable analysis.  */
  
*************** loops_state_clear (unsigned flags)
*** 533,538 ****
--- 540,546 ----
    current_loops->state &= ~flags;
  }
  
+ 
  /* Loop iterators.  */
  
  /* Flags for loop iteration.  */
Index: trunk/gcc/except.c
===================================================================
*** trunk.orig/gcc/except.c	2014-09-04 12:52:14.353029904 +0200
--- trunk/gcc/except.c	2014-09-04 15:31:21.631372586 +0200
*************** sjlj_emit_dispatch_table (rtx_code_label
*** 1375,1384 ****
  	      {
  		for (loop = bb->loop_father;
  		     loop_outer (loop); loop = loop_outer (loop))
! 		  {
! 		    loop->header = NULL;
! 		    loop->latch = NULL;
! 		  }
  	      }
  	  }
  
--- 1375,1381 ----
  	      {
  		for (loop = bb->loop_father;
  		     loop_outer (loop); loop = loop_outer (loop))
! 		  mark_loop_for_removal (loop);
  	      }
  	  }
  
Index: trunk/gcc/loop-init.c
===================================================================
*** trunk.orig/gcc/loop-init.c	2014-07-10 14:13:38.069811669 +0200
--- trunk/gcc/loop-init.c	2014-09-04 15:55:32.440272699 +0200
*************** fix_loop_structure (bitmap changed_bbs)
*** 272,277 ****
--- 272,301 ----
    FOR_EACH_VEC_ELT (*get_loops (cfun), i, loop)
      if (loop && loop->header == NULL)
        {
+ #ifdef ENABLE_CHECKING
+ 	if (dump_file)
+ 	  {
+ 	    /* The following should probably be an ICE.  */
+ 	    for (unsigned j = old_nloops; j < number_of_loops (cfun); ++j)
+ 	      if ((*get_loops (cfun))[j]->header == loop->former_header)
+ 		fprintf (dump_file, "fix_loop_structure: re-discovered "
+ 			 "removed loop %d with pointer-equal header %d\n",
+ 			 loop->num, (*get_loops (cfun))[j]->header->index);
+ 	    if (loop->former_header_index <= last_basic_block_for_fn (cfun))
+ 	      {
+ 		basic_block bb
+ 		  = BASIC_BLOCK_FOR_FN (cfun, loop->former_header_index);
+ 		/* ???  This can probably "misfire" if the basic-block
+ 		   was freed and the basic-block index got re-used.  */
+ 		if (bb && bb->loop_father->num >= old_nloops)
+ 		  fprintf (dump_file, "fix_loop_structure: header %d of "
+ 			   "removed loop %d is part of a newly "
+ 			   "discovered loop %d\n",
+ 			   loop->former_header_index, loop->num,
+ 			   bb->loop_father->num);
+ 	      }
+ 	  }
+ #endif
  	(*get_loops (cfun))[i] = NULL;
  	flow_loop_free (loop);
        }
Index: trunk/gcc/tree-eh.c
===================================================================
*** trunk.orig/gcc/tree-eh.c	2014-09-04 12:52:13.565029958 +0200
--- trunk/gcc/tree-eh.c	2014-09-04 15:32:07.022369460 +0200
*************** cleanup_empty_eh_merge_phis (basic_block
*** 4171,4180 ****
  	   and mark the other loop as possibly having multiple latches.  */
  	if (e->dest == e->dest->loop_father->header)
  	  {
! 	    e->dest->loop_father->header = NULL;
! 	    e->dest->loop_father->latch = NULL;
  	    new_bb->loop_father->latch = NULL;
! 	    loops_state_set (LOOPS_NEED_FIXUP|LOOPS_MAY_HAVE_MULTIPLE_LATCHES);
  	  }
  	redirect_eh_edge_1 (e, new_bb, change_region);
  	redirect_edge_succ (e, new_bb);
--- 4171,4179 ----
  	   and mark the other loop as possibly having multiple latches.  */
  	if (e->dest == e->dest->loop_father->header)
  	  {
! 	    mark_loop_for_removal (e->dest->loop_father);
  	    new_bb->loop_father->latch = NULL;
! 	    loops_state_set (LOOPS_MAY_HAVE_MULTIPLE_LATCHES);
  	  }
  	redirect_eh_edge_1 (e, new_bb, change_region);
  	redirect_edge_succ (e, new_bb);
Index: trunk/gcc/tree-ssa-threadupdate.c
===================================================================
*** trunk.orig/gcc/tree-ssa-threadupdate.c	2014-06-25 15:01:02.807843848 +0200
--- trunk/gcc/tree-ssa-threadupdate.c	2014-09-04 15:32:34.211367589 +0200
*************** ssa_redirect_edges (struct redirection_d
*** 766,776 ****
  
  	  /* If we redirect a loop latch edge cancel its loop.  */
  	  if (e->src == e->src->loop_father->latch)
! 	    {
! 	      e->src->loop_father->header = NULL;
! 	      e->src->loop_father->latch = NULL;
! 	      loops_state_set (LOOPS_NEED_FIXUP);
! 	    }
  
  	  /* Redirect the incoming edge (possibly to the joiner block) to the
  	     appropriate duplicate block.  */
--- 766,772 ----
  
  	  /* If we redirect a loop latch edge cancel its loop.  */
  	  if (e->src == e->src->loop_father->latch)
! 	    mark_loop_for_removal (e->src->loop_father);
  
  	  /* Redirect the incoming edge (possibly to the joiner block) to the
  	     appropriate duplicate block.  */
*************** thread_through_loop_header (struct loop
*** 1304,1312 ****
      {
        /* If the loop ceased to exist, mark it as such, and thread through its
  	 original header.  */
!       loop->header = NULL;
!       loop->latch = NULL;
!       loops_state_set (LOOPS_NEED_FIXUP);
        return thread_block (header, false);
      }
  
--- 1300,1306 ----
      {
        /* If the loop ceased to exist, mark it as such, and thread through its
  	 original header.  */
!       mark_loop_for_removal (loop);
        return thread_block (header, false);
      }
  

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2014-09-04 14:05 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-12 17:47 RFC: Patch for switch elimination (PR 54742) Steve Ellcey
2014-08-12 17:56 ` Jakub Jelinek
2014-08-12 18:31 ` Jeff Law
2014-08-12 20:23   ` Richard Biener
2014-08-12 20:40     ` Jeff Law
2014-08-12 22:45       ` Steve Ellcey
2014-08-13 20:55         ` Sebastian Pop
2014-08-13 21:06           ` Jeff Law
2014-08-13 21:33             ` Sebastian Pop
2014-08-14 10:32             ` Richard Biener
2014-08-14 15:56               ` Jeff Law
2014-08-14 16:29                 ` David Malcolm
2014-08-14 16:21                   ` Jeff Law
2014-08-14 18:25                     ` Steve Ellcey
2014-08-14 18:45                       ` Sebastian Pop
2014-08-15 10:07                         ` Richard Biener
2014-08-15 13:26                           ` Jeff Law
2014-08-15 10:13                       ` Richard Biener
2014-08-15 10:44                         ` Richard Biener
2014-08-15 15:58                       ` Ramana Radhakrishnan
2014-08-13  2:54       ` Bin.Cheng
2014-08-13  9:52         ` Richard Biener
2014-08-14 17:45           ` Steve Ellcey
2014-08-13  9:44       ` Richard Biener
2014-09-03 21:22         ` Jeff Law
2014-09-04 12:57           ` Richard Biener
2014-09-04 13:07             ` Jakub Jelinek
2014-09-04 14:05               ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).