public inbox for gcc-patches@gcc.gnu.org
* [PATCH 1/6] Loop flattening on loop-SSA.
  2010-10-29  3:11 [PATCH 0/6] Loop flattening and improved if-conversion Sebastian Pop
@ 2010-10-29  3:11 ` Sebastian Pop
  2010-10-29  3:46 ` [PATCH 4/6] if-convert even when the data dependences cannot be computed Sebastian Pop
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Sebastian Pop @ 2010-10-29  3:11 UTC
  To: gcc-patches; +Cc: rguenther, Sebastian Pop

2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>

	* Makefile.in (OBJS-common): Add tree-loop-flattening.o.
	(tree-loop-flattening.o): New.
	* common.opt (ftree-loop-flatten): New.
	* dbgcnt.def (lflat): New.
	* params.def (PARAM_LFLAT_MAX_NB_BBS): New.
	* passes.c (init_optimization_passes): Add new passes
	pass_flatten_loops and pass_if_conversion after loop vectorization
	and before pass_slp_vectorize.
	* timevar.def (TV_TREE_LOOP_FLATTENING): New.
	* tree-loop-flattening.c: New.
	* tree-pass.h (pass_flatten_loops): Declare.

	* gcc.dg/tree-ssa/flat-loop-1.c: New.
	* gcc.dg/tree-ssa/flat-loop-2.c: New.
	* gcc.dg/tree-ssa/flat-loop-3.c: New.
	* gcc.dg/tree-ssa/flat-loop-4.c: New.
---
 gcc/ChangeLog                               |   14 +
 gcc/Makefile.in                             |    4 +
 gcc/common.opt                              |    4 +
 gcc/dbgcnt.def                              |    1 +
 gcc/params.def                              |    7 +
 gcc/passes.c                                |    2 +
 gcc/testsuite/ChangeLog                     |    7 +
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c |   28 ++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c |   39 ++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c |   19 +
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c |   23 +
 gcc/timevar.def                             |    1 +
 gcc/tree-loop-flattening.c                  |  625 +++++++++++++++++++++++++++
 gcc/tree-pass.h                             |    1 +
 14 files changed, 775 insertions(+), 0 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c
 create mode 100644 gcc/tree-loop-flattening.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index beed454..a0148d2 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,17 @@
+2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
+
+	* Makefile.in (OBJS-common): Add tree-loop-flattening.o.
+	(tree-loop-flattening.o): New.
+	* common.opt (ftree-loop-flatten): New.
+	* dbgcnt.def (lflat): New.
+	* params.def (PARAM_LFLAT_MAX_NB_BBS): New.
+	* passes.c (init_optimization_passes): Add new passes
+	pass_flatten_loops and pass_if_conversion after loop vectorization
+	and before pass_slp_vectorize.
+	* timevar.def (TV_TREE_LOOP_FLATTENING): New.
+	* tree-loop-flattening.c: New.
+	* tree-pass.h (pass_flatten_loops): Declare.
+
 2010-10-20  Nathan Froyd  <froydnj@codesourcery.com>
 
 	* ifcvt.c (noce_emit_cmove): If both of the values are SUBREGs, try
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 898e962..55b67f4 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1368,6 +1368,7 @@ OBJS-common = \
 	tree-into-ssa.o \
 	tree-iterator.o \
 	tree-loop-distribution.o \
+	tree-loop-flattening.o \
 	tree-loop-linear.o \
 	tree-nested.o \
 	tree-nrv.o \
@@ -2773,6 +2774,9 @@ tree-loop-distribution.o: tree-loop-distribution.c $(CONFIG_H) $(SYSTEM_H) coret
    $(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) \
    $(TREE_PASS_H) $(TREE_DATA_REF_H) $(EXPR_H) \
    langhooks.h $(TREE_VECTORIZER_H)
+tree-loop-flattening.o: tree-loop-flattening.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
+   $(TM_H) $(OPTABS_H) $(TREE_H) $(BASIC_BLOCK_H) $(DIAGNOSTIC_H) $(TREE_FLOW_H) \
+   $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) $(TREE_PASS_H) $(DBGCNT_H)
 tree-parloops.o: tree-parloops.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
    $(TREE_FLOW_H) $(TREE_H) $(CFGLOOP_H) $(TREE_DATA_REF_H) \
    $(DIAGNOSTIC_H) $(TREE_PASS_H) langhooks.h gt-tree-parloops.h \
diff --git a/gcc/common.opt b/gcc/common.opt
index 8fe796f..c969979 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1632,6 +1632,10 @@ ftree-loop-distribute-patterns
 Common Report Var(flag_tree_loop_distribute_patterns) Optimization
 Enable loop distribution for patterns transformed into a library call
 
+ftree-loop-flatten
+Common Report Var(flag_tree_loop_flattening) Optimization
+Enable loop flattening on trees
+
 ftree-loop-im
 Common Report Var(flag_tree_loop_im) Init(1) Optimization
 Enable loop invariant motion on trees
diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index 0492d66..0ef9a72 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -166,6 +166,7 @@ DEBUG_COUNTER (if_conversion_tree)
 DEBUG_COUNTER (if_after_combine)
 DEBUG_COUNTER (if_after_reload)
 DEBUG_COUNTER (local_alloc_for_sched)
+DEBUG_COUNTER (lflat)
 DEBUG_COUNTER (postreload_cse)
 DEBUG_COUNTER (pre)
 DEBUG_COUNTER (pre_insn)
diff --git a/gcc/params.def b/gcc/params.def
index 49a6185..3fffc35 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -788,6 +788,13 @@ DEFPARAM (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION,
 	  "maximum number of basic blocks per function to be analyzed by Graphite",
 	  100, 0, 0)
 
+/* Maximal number of basic blocks in a loop to be flattened.  */
+
+DEFPARAM (PARAM_LFLAT_MAX_NB_BBS,
+	  "lflat-max-nb-bbs",
+	  "maximum number of basic blocks in a loop to be flattened",
+	  100, 0, 0)
+
 /* Avoid doing loop invariant motion on very large loops.  */
 
 DEFPARAM (PARAM_LOOP_INVARIANT_MAX_BBS_IN_LOOP,
diff --git a/gcc/passes.c b/gcc/passes.c
index 1308ce9..4b778bc 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -913,6 +913,8 @@ init_optimization_passes (void)
 	    }
           NEXT_PASS (pass_predcom);
 	  NEXT_PASS (pass_complete_unroll);
+	  NEXT_PASS (pass_flatten_loops);
+	  NEXT_PASS (pass_if_conversion);
 	  NEXT_PASS (pass_slp_vectorize);
 	  NEXT_PASS (pass_parallelize_loops);
 	  NEXT_PASS (pass_loop_prefetch);
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 9d9c543..8ab520e 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
+
+	* gcc.dg/tree-ssa/flat-loop-1.c: New.
+	* gcc.dg/tree-ssa/flat-loop-2.c: New.
+	* gcc.dg/tree-ssa/flat-loop-3.c: New.
+	* gcc.dg/tree-ssa/flat-loop-4.c: New.
+
 2010-10-20  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>
 
 	PR c++/46024
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c
new file mode 100644
index 0000000..bee8a2b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-flatten" } */
+
+struct stack_segment
+{
+  struct dynamic_allocation_blocks *dynamic_allocation;
+};
+struct dynamic_allocation_blocks
+{
+  struct dynamic_allocation_blocks *next;
+};
+static struct dynamic_allocation_blocks *
+merge_dynamic_blocks (struct dynamic_allocation_blocks *a,
+		      struct dynamic_allocation_blocks *b)
+{
+  struct dynamic_allocation_blocks **pp;
+  for (pp = &a->next; *pp != ((void *)0); pp = &(*pp)->next)
+    *pp = b;
+  return a;
+}
+__morestack_release_segments (struct stack_segment **pp, int free_dynamic)
+{
+  struct dynamic_allocation_blocks *ret;
+  struct stack_segment *pss;
+  pss = *pp;
+  while (pss != ((void *)0))
+    ret = merge_dynamic_blocks (pss->dynamic_allocation, ret);
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
new file mode 100644
index 0000000..a7287fb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-flatten" } */
+
+struct stack_segment
+{
+  struct stack_segment *next;
+  struct dynamic_allocation_blocks *dynamic_allocation;
+};
+struct dynamic_allocation_blocks
+{
+  struct dynamic_allocation_blocks *next;
+};
+static struct dynamic_allocation_blocks *
+merge_dynamic_blocks (struct dynamic_allocation_blocks *a,
+        struct dynamic_allocation_blocks *b)
+{
+  struct dynamic_allocation_blocks **pp;
+  if (b == ((void *)0))
+  for (pp = &a->next; *pp != ((void *)0); pp = &(*pp)->next)
+    ;
+  return a;
+}
+__morestack_release_segments (struct stack_segment **pp, int free_dynamic)
+{
+  struct dynamic_allocation_blocks *ret;
+  struct stack_segment *pss;
+  while (pss != ((void *)0))
+    {
+      struct stack_segment *next;
+      next = pss->next;
+ {
+   if (free_dynamic)
+     {
+       ret = merge_dynamic_blocks (pss->dynamic_allocation, ret);
+     }
+ }
+      pss = next;
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
new file mode 100644
index 0000000..d3d66ab
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-flatten" } */
+
+
+int
+split_directories (const char *name, int *ptr_num_dirs)
+{
+  int num_dirs = 0;
+  char **dirs;
+  const char *p, *q;
+  int ch;
+  while ((ch = *p++) != '\0')
+    {
+   num_dirs++;
+   while (((*p) == '/'))
+     p++;
+    }
+  return (dirs[num_dirs - 1] == ((void *)0));
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c
new file mode 100644
index 0000000..8e551ac
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-flatten" } */
+
+void
+formatted_backspace (int common, char *s)
+{
+  int base;
+  int n;
+  do
+    {
+      if (sseek (s, base, 0) < 0)
+	goto io_error;
+
+      while (n > 0)
+	{
+          n--;
+	  base += n + 1;
+	}
+    }
+  while (base != 0);
+ io_error:
+  generate_error (common, 0, ((void *)0));
+}
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 86e2999..89ff8e8 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -152,6 +152,7 @@ DEFTIMEVAR (TV_GRAPHITE_DATA_DEPS    , "Graphite data dep analysis")
 DEFTIMEVAR (TV_GRAPHITE_CODE_GEN     , "Graphite code generation")
 DEFTIMEVAR (TV_TREE_LINEAR_TRANSFORM , "tree loop linear")
 DEFTIMEVAR (TV_TREE_LOOP_DISTRIBUTION, "tree loop distribution")
+DEFTIMEVAR (TV_TREE_LOOP_FLATTENING  , "tree loop flattening")
 DEFTIMEVAR (TV_CHECK_DATA_DEPS       , "tree check data dependences")
 DEFTIMEVAR (TV_TREE_PREFETCH	     , "tree prefetching")
 DEFTIMEVAR (TV_TREE_LOOP_IVOPTS	     , "tree iv optimization")
diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
new file mode 100644
index 0000000..826e7e8
--- /dev/null
+++ b/gcc/tree-loop-flattening.c
@@ -0,0 +1,625 @@
+/* Loop flattening.
+   Copyright (C) 2010 Free Software Foundation, Inc.
+   Contributed by Sebastian Pop <sebastian.pop@amd.com>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "ggc.h"
+#include "tree.h"
+#include "rtl.h"
+#include "output.h"
+#include "basic-block.h"
+#include "diagnostic.h"
+#include "tree-flow.h"
+#include "toplev.h"
+#include "tree-dump.h"
+#include "timevar.h"
+#include "cfgloop.h"
+#include "tree-pass.h"
+#include "gimple.h"
+#include "params.h"
+#include "dbgcnt.h"
+
+/* This loop flattening pass transforms backward pointing edges into
+   forward pointing edges.
+
+   The back-edge removal transformation was described in the 1983
+   paper by J. R. Allen, Ken Kennedy, Carrie Porterfield, and Joe
+   Warren: "Conversion of control dependence to data dependence"
+   available from http://doi.acm.org/10.1145/567067.567085
+
+   The back-edge removal algorithm was presented in that paper as part
+   of the if-conversion algorithm for backward pointing edges.  In
+   this section we will first provide a description of this technique
+   adapted for the Gimple-SSA form, followed by an example, and a
+   discussion of the differences with the higher level loop flattening
+   transformation.
+
+   The back-edge removal algorithm transforms control dependences into
+   data dependences by using a boolean variable.  The value of the
+   boolean variable selects which of the new forward edges is taken,
+   so that only the back-edge of the outer loop remains.
+
+   The first step of the algorithm detects a surrounding loop and all
+   the back-edges of the loop body: these back-edges can be inner
+   loops or strongly connected components of the CFG that cannot be
+   reduced to natural loops.
+
+   Each back-edge is removed by redirecting the target of the
+   back-edge to the latch basic block of the surrounding loop.  A
+   boolean variable is created in the latch.  It is cleared when the
+   redirected back-edge is taken and it is set to true for any other
+   paths leading to the latch.
+
+   The header basic block of the surrounding loop is split before its
+   statements and a new condition is added based on the control
+   variable: when the control variable is set to true, the execution
+   proceeds as normal to the basic block that contains the statements
+   of the header; when the control variable is cleared, meaning that
+   the back-edge has been taken, the execution proceeds to the point
+   where the redirected back-edge was pointing.
+
+   The last step updates the SSA form after all the back-edges have
+   been redirected to the latch, and the new edges from the header to
+   the destination of back-edges have been created.
+
+   Another description of loop flattening in a very Fortran specific
+   way is in the 1992 paper by Reinhard von Hanxleden and Ken Kennedy:
+   "Relaxing SIMD Control Flow Constraints using Loop Transformations"
+   available from
+   http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.5033 */
+
+/* Keep the loop structure for LOOP and remove all the loop structures
+   under LOOP.  */
+
+static void
+cancel_subloops (loop_p loop)
+{
+  int i;
+  loop_p li;
+  VEC (loop_p, heap) *lv = VEC_alloc (loop_p, heap, 3);
+
+  for (li = loop->inner; li; li = li->next)
+    VEC_safe_push (loop_p, heap, lv, li);
+
+  FOR_EACH_VEC_ELT (loop_p, lv, i, li)
+    cancel_loop_tree (li);
+
+  VEC_free (loop_p, heap, lv);
+}
+
+/* Before creating other phi nodes in LOOP->header for the control
+   flags, update the phi nodes of LOOP->header and add the necessary
+   phi nodes in the LOOP->latch that now contains several paths on
+   which the values are not updated.  PRED_E is the single edge that
+   was pointing to the LOOP->latch basic block before inner back-edges
+   were redirected to the LOOP->latch.  */
+
+static void
+update_loop_phi_nodes (loop_p loop, edge pred_e)
+{
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_phis (loop->header); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      edge e;
+      edge_iterator ei;
+      gimple phi = gsi_stmt (gsi);
+      tree back_arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
+      tree res = gimple_phi_result (phi);
+      tree var = SSA_NAME_VAR (res);
+
+      phi = create_phi_node (var, loop->latch);
+      create_new_def_for (gimple_phi_result (phi), phi,
+			  gimple_phi_result_ptr (phi));
+
+      FOR_EACH_EDGE (e, ei, loop->latch->preds)
+	add_phi_arg (phi, (e == pred_e ? back_arg : res),
+		     e, UNKNOWN_LOCATION);
+
+      res = gimple_phi_result (phi);
+      add_phi_arg (gsi_stmt (gsi), res, loop_latch_edge (loop),
+		   UNKNOWN_LOCATION);
+    }
+}
+
+/* Creates a control flag for the FORWARDED_EDGE that represents the
+   back-edge that has been forwarded to the latch basic block of LOOP.
+   INNER_BODY is the basic block to which the back-edge was pointing
+   before redirection.  This function creates a boolean control flag
+   that is cleared when the FORWARDED_EDGE is taken and set for all
+   the other paths.  This function adds the corresponding phi nodes in
+   LOOP->latch and LOOP->header, and finally adds an edge from
+   LOOP->header to the INNER_BODY guarded by the control flag.  */
+
+static void
+create_control_flag (edge forwarded_edge, loop_p loop, basic_block inner_body)
+{
+  edge e, preheader;
+  edge outer_latch_e = loop_latch_edge (loop);
+  const char *name = "_flat_";
+  tree var = create_tmp_var (boolean_type_node, name);
+  tree res;
+  gimple phi, cond_stmt;
+  gimple_stmt_iterator gsi;
+  edge_iterator ei;
+
+  /* Adds a control variable for the redirected FORWARDED_EDGE.  */
+  add_referenced_var (var);
+  phi = create_phi_node (var, forwarded_edge->dest);
+  create_new_def_for (gimple_phi_result (phi), phi,
+		      gimple_phi_result_ptr (phi));
+
+  FOR_EACH_EDGE (e, ei, outer_latch_e->src->preds)
+    add_phi_arg (phi, (e == forwarded_edge
+		       ? boolean_false_node
+		       : boolean_true_node),
+		 e, UNKNOWN_LOCATION);
+  res = gimple_phi_result (phi);
+
+  /* Add a phi node in LOOP->header for the control variable.  */
+  phi = create_phi_node (var, loop->header);
+  create_new_def_for (gimple_phi_result (phi), phi,
+		      gimple_phi_result_ptr (phi));
+
+  preheader = loop_preheader_edge (loop);
+  FOR_EACH_EDGE (e, ei, loop->header->preds)
+    add_phi_arg (phi, (e == preheader
+		       ? boolean_true_node
+		       : res),
+		 e, UNKNOWN_LOCATION);
+  res = gimple_phi_result (phi);
+
+  /* Split LOOP->header to insert the control variable condition.  */
+  e = split_block_after_labels (loop->header);
+  e->flags = EDGE_TRUE_VALUE;
+  e = make_edge (loop->header, inner_body, EDGE_FALSE_VALUE);
+  cond_stmt = gimple_build_cond (EQ_EXPR, res, boolean_true_node,
+				 NULL_TREE, NULL_TREE);
+  gsi = gsi_last_bb (loop->header);
+  gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
+}
+
+/* Adds phi nodes to the LOOP->header and LOOP->latch for the ssa_name
+   NAME.  ARG is the argument of the latch phi node set for the
+   FORWARDED_EDGE, and all the other edges merged by the latch phi
+   node are set to the result of the LOOP->header phi node.  The latch
+   edge of the LOOP->header phi node is set to the result of the
+   LOOP->latch phi node, and the other argument is set to an arbitrary
+   valid value defined before the loop (note that this initial value
+   is never used in the loop).  Returns the LOOP->header phi result.  */
+
+static tree
+add_header_and_latch_phis (loop_p loop, tree name, edge forwarded_edge,
+			   tree arg)
+{
+  edge e;
+  edge_iterator ei;
+  tree res, zero, var = SSA_NAME_VAR (name);
+  gimple loop_phi = create_phi_node (var, loop->header);
+  gimple latch_phi = create_phi_node (var, loop->latch);
+
+  create_new_def_for (gimple_phi_result (loop_phi), loop_phi,
+		      gimple_phi_result_ptr (loop_phi));
+  create_new_def_for (gimple_phi_result (latch_phi), latch_phi,
+		      gimple_phi_result_ptr (latch_phi));
+
+  /* The value set to ZERO will never be used in the loop, however we
+     have to construct something meaningful for virtual SSA_NAMEs.  */
+  if (TREE_CODE (arg) != SSA_NAME)
+    zero = arg;
+  else if (is_gimple_reg (arg))
+    zero = fold_convert (TREE_TYPE (arg), integer_zero_node);
+  else
+    zero = gimple_default_def (cfun, SSA_NAME_VAR (arg));
+
+  res = gimple_phi_result (latch_phi);
+  FOR_EACH_EDGE (e, ei, loop->header->preds)
+    add_phi_arg (loop_phi, (e == loop_latch_edge (loop) ? res : zero),
+		 e, UNKNOWN_LOCATION);
+
+  res = gimple_phi_result (loop_phi);
+  FOR_EACH_EDGE (e, ei, loop->latch->preds)
+    add_phi_arg (latch_phi, (e == forwarded_edge ? arg : res),
+		 e, UNKNOWN_LOCATION);
+
+  return res;
+}
+
+/* Creates phi nodes for each inductive definition, i.e., loop phi
+   nodes.  For each induction phi node in the old loop header, i.e.,
+   in the single_succ (INNER_BODY), insert a phi node in the
+   LOOP->latch that takes the updated value of the induction on the
+   FORWARDED_EDGE, and maintains the same value as in the phi node of
+   the LOOP->header for all the other possible paths reaching
+   LOOP->latch.  This function has to be called after all the
+   back-edges have been redirected.  */
+
+static void
+update_inner_induction_phi_nodes (edge forwarded_edge, loop_p loop,
+				  basic_block inner_body)
+{
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_phis (single_succ (inner_body));
+       !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple old_loop_phi = gsi_stmt (gsi);
+      tree back_arg = PHI_ARG_DEF_FROM_EDGE (old_loop_phi,
+					     single_succ_edge (inner_body));
+      tree res = gimple_phi_result (old_loop_phi);
+
+      res = add_header_and_latch_phis (loop, res, forwarded_edge, back_arg);
+      add_phi_arg (old_loop_phi, res, single_succ_edge (inner_body),
+		   UNKNOWN_LOCATION);
+    }
+}
+
+/* Renames all the uses of OLD_NAME with NEW_NAME (except the phi
+   nodes of DEF_BB) in all the basic blocks dominated by DEF_BB and in
+   the arguments of all the phi nodes originating in a basic block
+   that is dominated by DEF_BB.  */
+
+static void
+rename_dominated_uses (loop_p loop, tree old_name, tree new_name,
+		       basic_block def_bb)
+{
+  imm_use_iterator uit;
+  gimple stmt;
+  use_operand_p use_p;
+  ssa_op_iter op_iter;
+
+  FOR_EACH_IMM_USE_STMT (stmt, uit, old_name)
+    {
+      enum gimple_code code = gimple_code (stmt);
+      basic_block use_bb = gimple_bb (stmt);
+      edge_iterator ei;
+      edge e;
+
+      if (code == GIMPLE_PHI)
+	{
+	  FOR_EACH_EDGE (e, ei, use_bb->preds)
+	    if (PHI_ARG_DEF_FROM_EDGE (stmt, e) == old_name
+		&& dominated_by_p (CDI_DOMINATORS, e->src, def_bb)
+		&& use_bb != def_bb)
+	      replace_exp (gimple_phi_arg_imm_use_ptr (stmt, e->dest_idx),
+			   new_name);
+	}
+      else
+	{
+	  if (!dominated_by_p (CDI_DOMINATORS, use_bb, def_bb))
+	    continue;
+
+	  if (use_bb->loop_father == loop)
+	    {
+	      FOR_EACH_SSA_USE_OPERAND (use_p, stmt, op_iter, SSA_OP_ALL_USES)
+		if (USE_FROM_PTR (use_p) == old_name)
+		  replace_exp (use_p, new_name);
+	    }
+	  else
+	    /* Virtual operands are not translated into loop closed
+	       SSA form, and thus they may occur in the rest of
+	       the program without a loop close vphi node.  */
+	    FOR_EACH_SSA_USE_OPERAND (use_p, stmt, op_iter, SSA_OP_ALL_USES)
+	      if (USE_FROM_PTR (use_p) == old_name)
+		replace_exp (use_p, new_name);
+	}
+    }
+}
+
+/* Helper function for add_missing_phi_nodes_1.  Adds to LOOP all the
+   missing phi nodes for NAME and records them in PHIS, indexed by
+   basic block index.  Uses of OLD_NAME dominated by a new phi node
+   are renamed to the result of that phi node.  */
+
+static void
+add_missing_phi_nodes_2 (loop_p loop, tree name, tree old_name,
+			 VEC (gimple, heap) *phis)
+{
+  unsigned i;
+  basic_block bb, dom_bb = gimple_bb (SSA_NAME_DEF_STMT (name));
+  VEC (basic_block, heap) *dom_bbs = get_all_dominated_blocks (CDI_DOMINATORS,
+							       dom_bb);
+
+  FOR_EACH_VEC_ELT (basic_block, dom_bbs, i, bb)
+    {
+      edge e;
+      edge_iterator ei;
+
+      if (bb == loop->latch
+	  || bb->loop_father != loop)
+	continue;
+
+      FOR_EACH_EDGE (e, ei, bb->succs)
+	{
+	  gimple phi = VEC_index (gimple, phis, e->dest->index);
+
+	  if (phi)
+	    add_phi_arg (phi, name, e, UNKNOWN_LOCATION);
+
+	  else if (!single_pred_p (e->dest)
+		   && !dominated_by_p (CDI_DOMINATORS, e->dest, dom_bb)
+		   && e->dest->loop_father == loop)
+	  {
+	    tree var = SSA_NAME_VAR (name);
+
+	    phi = create_phi_node (var, e->dest);
+	    create_new_def_for (gimple_phi_result (phi), phi,
+				gimple_phi_result_ptr (phi));
+	    VEC_replace (gimple, phis, e->dest->index, phi);
+	    add_phi_arg (phi, name, e, UNKNOWN_LOCATION);
+	    rename_dominated_uses (loop, old_name, gimple_phi_result (phi),
+				   e->dest);
+	    add_missing_phi_nodes_2 (loop, gimple_phi_result (phi), old_name,
+				     phis);
+	  }
+	}
+    }
+}
+
+/* Helper function for add_missing_phi_nodes.  For all the definitions
+   of DEF_STMT add the missing phi nodes in LOOP.  */
+
+static void
+add_missing_phi_nodes_1 (loop_p loop, gimple def_stmt)
+{
+  def_operand_p def_p;
+  ssa_op_iter op_iter;
+  basic_block bb = gimple_bb (def_stmt);
+
+  FOR_EACH_PHI_OR_STMT_DEF (def_p, def_stmt, op_iter, SSA_OP_DEF|SSA_OP_VDEF)
+    {
+      edge e;
+      edge_iterator ei;
+      tree res, zero, var;
+      gimple loop_phi, latch_phi, use_stmt;
+      imm_use_iterator uit;
+      tree name = DEF_FROM_PTR (def_p);
+      bool needs_update = false;
+      VEC (gimple, heap) *phis;
+      int i;
+
+      FOR_EACH_IMM_USE_STMT (use_stmt, uit, name)
+	{
+	  basic_block use_bb = gimple_bb (use_stmt);
+
+	  if (!dominated_by_p (CDI_DOMINATORS, bb, use_bb))
+	    {
+	      needs_update = true;
+	      BREAK_FROM_IMM_USE_STMT (uit);
+	    }
+	}
+
+      if (!needs_update)
+	continue;
+
+      var = SSA_NAME_VAR (name);
+      loop_phi = create_phi_node (var, loop->header);
+      latch_phi = create_phi_node (var, loop->latch);
+
+      create_new_def_for (gimple_phi_result (loop_phi), loop_phi,
+			  gimple_phi_result_ptr (loop_phi));
+      create_new_def_for (gimple_phi_result (latch_phi), latch_phi,
+			  gimple_phi_result_ptr (latch_phi));
+
+      /* The value set to ZERO will never be used in the loop, however we
+	 have to construct something meaningful for virtual SSA_NAMEs.  */
+      if (is_gimple_reg (name))
+	zero = fold_convert (TREE_TYPE (name), integer_zero_node);
+      else
+	zero = gimple_default_def (cfun, SSA_NAME_VAR (name));
+
+      res = gimple_phi_result (latch_phi);
+      FOR_EACH_EDGE (e, ei, loop->header->preds)
+	add_phi_arg (loop_phi, (e == loop_latch_edge (loop) ? res : zero),
+		     e, UNKNOWN_LOCATION);
+
+      res = gimple_phi_result (loop_phi);
+      FOR_EACH_EDGE (e, ei, loop->latch->preds)
+	add_phi_arg (latch_phi, res, e, UNKNOWN_LOCATION);
+
+      phis = VEC_alloc (gimple, heap, n_basic_blocks);
+      for (i = 0; i < n_basic_blocks; i++)
+	VEC_quick_push (gimple, phis, NULL);
+
+      VEC_replace (gimple, phis, loop->latch->index, latch_phi);
+      VEC_replace (gimple, phis, loop->header->index, loop_phi);
+      add_missing_phi_nodes_2 (loop, name, name, phis);
+
+      for (i = 0; i < n_basic_blocks; i++)
+	{
+	  gimple phi = VEC_index (gimple, phis, i);
+
+	  if (!phi)
+	    continue;
+
+	  FOR_EACH_EDGE (e, ei, BASIC_BLOCK (i)->preds)
+	    if (!PHI_ARG_DEF_FROM_EDGE (phi, e))
+	      add_phi_arg (phi, res, e, UNKNOWN_LOCATION);
+	}
+
+      VEC_free (gimple, heap, phis);
+    }
+}
+
+/* Walks over the code of LOOP and adds the missing phi nodes at
+   control flow junctions.  When a variable is defined in an outer
+   loop and used in an inner loop, the definition dominates the use.
+   After the loop flattening, the inner loop body is directly
+   reachable from the LOOP->header by using the added edge guarded by
+   the boolean flag that controls the execution of the back-edge that
+   was eliminated.  In this case, the use is not dominated by the
+   definition, and this function adds the missing phi nodes.  */
+
+static void
+add_missing_phi_nodes (loop_p loop)
+{
+  gimple_stmt_iterator gsi;
+  int i, n = loop->num_nodes;
+  basic_block *bbs = get_loop_body (loop);
+
+  for (i = 0; i < n; i++)
+    {
+      basic_block bb = bbs[i];
+
+      /* LOOP->header dominates all the blocks of the loop body, and
+	 so we don't have to look at the missing phi nodes for the
+	 definitions of LOOP->header.  */
+      if (bb == loop->header)
+	continue;
+
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	if (!gimple_nop_p (gsi_stmt (gsi)))
+	  add_missing_phi_nodes_1 (loop, gsi_stmt (gsi));
+
+      for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	add_missing_phi_nodes_1 (loop, gsi_stmt (gsi));
+    }
+
+  free (bbs);
+}
+
+/* Removes all the back-edges of LOOP except its own back-edge.  */
+
+static unsigned
+flatten_loop (loop_p loop)
+{
+  int i, n = loop->num_nodes;
+  basic_block *bbs;
+  VEC (edge, heap) *back_edges;
+  VEC (basic_block, heap) *loop_body;
+  edge_iterator ei;
+  edge e, pred_e;
+  unsigned max_nb_basic_blocks = PARAM_VALUE (PARAM_LFLAT_MAX_NB_BBS);
+
+  if (loop->num_nodes > max_nb_basic_blocks
+      || !single_exit (loop)
+      || !dbg_cnt (lflat))
+    return 0;
+
+  mark_dfs_back_edges ();
+  bbs = get_loop_body (loop);
+
+  back_edges = VEC_alloc (edge, heap, 3);
+  loop_body = VEC_alloc (basic_block, heap, 3);
+
+  for (i = 0; i < n; i++)
+    FOR_EACH_EDGE (e, ei, bbs[i]->succs)
+      if (e->flags & EDGE_DFS_BACK
+	  && e->src != loop->latch)
+	VEC_safe_push (edge, heap, back_edges, e);
+
+  free (bbs);
+
+  /* Early return and do not modify the code when there are no back
+     edges.  */
+  if (VEC_empty (edge, back_edges))
+    return 0;
+
+  cancel_subloops (loop);
+
+  /* Split the latch edge to make sure that the latch basic block does
+     not contain code.  */
+  loop->latch = split_edge (loop_latch_edge (loop));
+  pred_e = single_pred_edge (loop->latch);
+
+  FOR_EACH_VEC_ELT (edge, back_edges, i, e)
+    {
+      basic_block dest = split_edge (e);
+
+      /* Redirect BACK_EDGE to LOOP->latch.  */
+      redirect_edge_and_branch_force (e, loop->latch);
+
+      /* Save the basic block where it was pointing.  */
+      VEC_safe_push (basic_block, heap, loop_body, dest);
+    }
+
+  update_loop_phi_nodes (loop, pred_e);
+
+  FOR_EACH_VEC_ELT (edge, back_edges, i, e)
+    create_control_flag (e, loop, VEC_index (basic_block, loop_body, i));
+
+  FOR_EACH_VEC_ELT (edge, back_edges, i, e)
+    update_inner_induction_phi_nodes (e, loop, VEC_index (basic_block,
+							  loop_body, i));
+
+  free_dominance_info (CDI_DOMINATORS);
+  calculate_dominance_info (CDI_DOMINATORS);
+  add_missing_phi_nodes (loop);
+
+  /* If we redirected some back-edges, split the latch edge to create
+     an empty LOOP->latch.  */
+  if (!single_pred_p (loop->latch))
+    loop->latch = split_edge (loop_latch_edge (loop));
+
+  return TODO_update_ssa | TODO_verify_ssa;
+}
+
+/* Flattens all the loops of the current function.  */
+
+static unsigned int
+tree_loop_flattening (void)
+{
+  unsigned todo = 0;
+  loop_p loop;
+  loop_iterator li;
+
+  if (number_of_loops () <= 1)
+    return 0;
+
+  FOR_EACH_LOOP (li, loop, 0)
+    todo |= flatten_loop (loop);
+
+#ifdef ENABLE_CHECKING
+  verify_dominators (CDI_DOMINATORS);
+  verify_flow_info ();
+#endif
+
+  cleanup_tree_cfg ();
+  return todo;
+}
+
+static bool
+gate_tree_loop_flattening (void)
+{
+  return flag_tree_loop_flattening != 0;
+}
+
+struct gimple_opt_pass pass_flatten_loops =
+{
+ {
+  GIMPLE_PASS,
+  "lflat",				/* name */
+  gate_tree_loop_flattening,		/* gate */
+  tree_loop_flattening,       		/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_TREE_LOOP_FLATTENING,  		/* tv_id */
+  PROP_cfg | PROP_ssa,			/* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_dump_func
+    | TODO_update_ssa
+    | TODO_ggc_collect			/* todo_flags_finish */
+ }
+};
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index a87a770..e2f257f 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -374,6 +374,7 @@ extern struct gimple_opt_pass pass_graphite;
 extern struct gimple_opt_pass pass_graphite_transforms;
 extern struct gimple_opt_pass pass_if_conversion;
 extern struct gimple_opt_pass pass_loop_distribution;
+extern struct gimple_opt_pass pass_flatten_loops;
 extern struct gimple_opt_pass pass_vectorize;
 extern struct gimple_opt_pass pass_slp_vectorize;
 extern struct gimple_opt_pass pass_complete_unroll;
-- 
1.7.0.4


* [PATCH 0/6] Loop flattening and improved if-conversion
@ 2010-10-29  3:11 Sebastian Pop
  2010-10-29  3:11 ` [PATCH 1/6] Loop flattening on loop-SSA Sebastian Pop
                   ` (6 more replies)
  0 siblings, 7 replies; 41+ messages in thread
From: Sebastian Pop @ 2010-10-29  3:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, Sebastian Pop

Hi,

As explained in the GCC Summit paper "Improving GCC's
auto-vectorization with if-conversion and loop flattening for AMD's
Bulldozer processors", this patch set implements a loop flattening
pass on tree-ssa, and improves the if-conversion, removing the now
unnecessary ifcvt_memrefs_wont_trap analysis: this fixes PR46029.
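
For readers unfamiliar with the term, the effect of loop flattening can be
sketched at the source level.  This is only an illustration under stated
assumptions: the pass works on GIMPLE in loop-SSA form, not on C source, and
the names fill_nested, fill_flat, same_result and inner_p are hypothetical
and do not appear in the patch.  The nested version has two back-edges; the
flattened version keeps a single back-edge and uses a flag to decide whether
an iteration runs the inner body or advances the outer induction variable.

```c
#include <assert.h>

/* Reference version: a two-level loop nest, hence two back-edges.  */
static void
fill_nested (int *a, int n, int m)
{
  int i, j;

  for (i = 0; i < n; i++)
    for (j = 0; j < m; j++)
      a[i * m + j] = i + j;
}

/* Flattened version: one loop, one back-edge.  INNER_P is a
   hypothetical control flag that selects between executing the inner
   body and stepping the outer induction variable.  */
static void
fill_flat (int *a, int n, int m)
{
  int i = 0, j = 0;

  while (i < n)
    {
      int inner_p = j < m;

      if (inner_p)
        {
          a[i * m + j] = i + j;
          j++;
        }
      else
        {
          /* The inner loop is done: advance I and reset J.  */
          i++;
          j = 0;
        }
    }
}

/* Returns 1 when both versions write exactly the same values.  */
static int
same_result (int n, int m)
{
  int x[64], y[64], k;

  assert (n >= 0 && m >= 0 && n * m <= 64);

  for (k = 0; k < 64; k++)
    x[k] = y[k] = -1;

  fill_nested (x, n, m);
  fill_flat (y, n, m);

  for (k = 0; k < 64; k++)
    if (x[k] != y[k])
      return 0;

  return 1;
}
```

The body of the flattened loop contains only an if/else, which is the kind
of control flow that if-conversion can then turn into straight-line code.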

The patch set passed bootstrap with BOOT_CFLAGS="-O2 -ftree-loop-flatten
-ftree-loop-if-convert-stores" and regression testing on amd64-linux.
Ok for trunk?

Thanks,
Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools

Sebastian Pop (6):
  Loop flattening on loop-SSA.
  Remove ifcvt_memrefs_wont_trap analysis.
  Fix PR46029: reimplement if-convert stores.
  if-convert even when the data dependences cannot be computed.
  Call if-conversion from loop flattening.
  Move loop flattening and SLP vectorization at the end of loop
    transforms.

 gcc/ChangeLog                               |   68 +++
 gcc/Makefile.in                             |    4 +
 gcc/common.opt                              |    4 +
 gcc/dbgcnt.def                              |    1 +
 gcc/doc/invoke.texi                         |   18 +-
 gcc/params.def                              |    7 +
 gcc/passes.c                                |    3 +-
 gcc/testsuite/ChangeLog                     |   14 +
 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C |   76 ++++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c |   28 ++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c |   39 ++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c |   19 +
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c |   23 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c       |   17 +-
 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c       |   29 ++
 gcc/timevar.def                             |    1 +
 gcc/tree-flow.h                             |    4 +
 gcc/tree-if-conv.c                          |  407 ++++++++----------
 gcc/tree-loop-flattening.c                  |  630 +++++++++++++++++++++++++++
 gcc/tree-pass.h                             |    1 +
 20 files changed, 1151 insertions(+), 242 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
 create mode 100644 gcc/tree-loop-flattening.c


* [PATCH 4/6] if-convert even when the data dependences cannot be computed.
  2010-10-29  3:11 [PATCH 0/6] Loop flattening and improved if-conversion Sebastian Pop
  2010-10-29  3:11 ` [PATCH 1/6] Loop flattening on loop-SSA Sebastian Pop
@ 2010-10-29  3:46 ` Sebastian Pop
  2010-10-29  3:57 ` [PATCH 5/6] Call if-conversion from loop flattening Sebastian Pop
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Sebastian Pop @ 2010-10-29  3:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, Sebastian Pop

2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>

	* tree-if-conv.c (if_convertible_loop_p_1): Do not call
	compute_data_dependences_for_loop.
	(if_convertible_loop_p): Do not free refs and ddrs.
---
 gcc/ChangeLog      |    6 ++++++
 gcc/tree-if-conv.c |   24 +++---------------------
 2 files changed, 9 insertions(+), 21 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f14d9b1..4439226 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,11 @@
 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
 
+	* tree-if-conv.c (if_convertible_loop_p_1): Do not call
+	compute_data_dependences_for_loop.
+	(if_convertible_loop_p): Do not free refs and ddrs.
+
+2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
+
 	PR tree-optimization/46029
 	* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
 	* tree-if-conv.c (has_unaligned_memory_refs): New.
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index cb4828a..f05213e 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -855,24 +855,15 @@ predicate_bbs (loop_p loop)
 }
 
 /* Return true when LOOP is if-convertible.  This is a helper function
-   for if_convertible_loop_p.  REFS and DDRS are initialized and freed
-   in if_convertible_loop_p.  */
+   for if_convertible_loop_p.  */
 
 static bool
-if_convertible_loop_p_1 (struct loop *loop,
-			 VEC (data_reference_p, heap) **refs,
-			 VEC (ddr_p, heap) **ddrs)
+if_convertible_loop_p_1 (struct loop *loop)
 {
   bool res;
   unsigned int i;
   basic_block exit_bb = NULL;
 
-  /* Don't if-convert the loop when the data dependences cannot be
-     computed: the loop won't be vectorized in that case.  */
-  res = compute_data_dependences_for_loop (loop, true, refs, ddrs);
-  if (!res)
-    return false;
-
   calculate_dominance_info (CDI_DOMINATORS);
 
   /* Allow statements that can be handled during if-conversion.  */
@@ -934,9 +925,6 @@ if_convertible_loop_p (struct loop *loop)
 {
   edge e;
   edge_iterator ei;
-  bool res = false;
-  VEC (data_reference_p, heap) *refs;
-  VEC (ddr_p, heap) *ddrs;
 
   /* Handle only innermost loop.  */
   if (!loop || loop->inner)
@@ -968,13 +956,7 @@ if_convertible_loop_p (struct loop *loop)
     if (loop_exit_edge_p (loop, e))
       return false;
 
-  refs = VEC_alloc (data_reference_p, heap, 5);
-  ddrs = VEC_alloc (ddr_p, heap, 25);
-  res = if_convertible_loop_p_1 (loop, &refs, &ddrs);
-
-  free_data_refs (refs);
-  free_dependence_relations (ddrs);
-  return res;
+  return if_convertible_loop_p_1 (loop);
 }
 
 /* Basic block BB has two predecessors.  Using predecessor's bb
-- 
1.7.0.4


* [PATCH 5/6] Call if-conversion from loop flattening.
  2010-10-29  3:11 [PATCH 0/6] Loop flattening and improved if-conversion Sebastian Pop
  2010-10-29  3:11 ` [PATCH 1/6] Loop flattening on loop-SSA Sebastian Pop
  2010-10-29  3:46 ` [PATCH 4/6] if-convert even when the data dependences cannot be computed Sebastian Pop
@ 2010-10-29  3:57 ` Sebastian Pop
  2010-10-29  4:07 ` [PATCH 3/6] Fix PR46029: reimplement if-convert stores Sebastian Pop
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Sebastian Pop @ 2010-10-29  3:57 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, Sebastian Pop

2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>

	* passes.c (init_optimization_passes): Do not call pass_if_conversion
	after pass_flatten_loops.
	* tree-flow.h (gate_tree_if_conversion): Declared.
	(tree_if_conversion): Declared.
	* tree-if-conv.c (tree_if_conversion): Not static anymore.
	(gate_tree_if_conversion): Same.
	* tree-loop-flattening.c (flatten_loop): Extra param.
	Call gate_tree_if_conversion and tree_if_conversion.
	(tree_loop_flattening): Pass to flatten_loop an extra param.
---
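
The extra parameter threads one scratch pad through every loop of the
function, so that the pad is created on first use and reused afterwards.
A minimal sketch of that pattern in plain C, assuming hypothetical names
(create_pad, convert_one_loop, convert_all_loops, n_allocations) and using
malloc only as a stand-in for the alloca call that the compiler emits:

```c
#include <assert.h>
#include <stdlib.h>

/* Counts how many times the pad was created (for illustration).  */
static int n_allocations;

/* Stand-in for the scratch pad creation; hypothetical helper.  */
static void *
create_pad (void)
{
  n_allocations++;
  return malloc (64);
}

/* Processes one loop.  The pad is created lazily, and only once,
   through the pointer shared by all the loops of the function.  */
static void
convert_one_loop (void **scratch_pad, int needs_pad)
{
  if (needs_pad && !*scratch_pad)
    *scratch_pad = create_pad ();
}

/* Processes N_LOOPS loops and returns the number of pads created.  */
static int
convert_all_loops (const int *needs_pad, int n_loops)
{
  void *scratch_pad = NULL;
  int i;

  assert (n_loops >= 0);
  n_allocations = 0;

  for (i = 0; i < n_loops; i++)
    convert_one_loop (&scratch_pad, needs_pad[i]);

  free (scratch_pad);
  return n_allocations;
}
```

With this shape, a function in which no loop needs the pad pays nothing,
and a function with several converted loops still gets a single pad.
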
 gcc/ChangeLog              |   12 ++++++++++++
 gcc/passes.c               |    1 -
 gcc/tree-flow.h            |    4 ++++
 gcc/tree-if-conv.c         |    4 ++--
 gcc/tree-loop-flattening.c |   11 ++++++++---
 5 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 4439226..8907244 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,17 @@
 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
 
+	* passes.c (init_optimization_passes): Do not call pass_if_conversion
+	after pass_flatten_loops.
+	* tree-flow.h (gate_tree_if_conversion): Declared.
+	(tree_if_conversion): Declared.
+	* tree-if-conv.c (tree_if_conversion): Not static anymore.
+	(gate_tree_if_conversion): Same.
+	* tree-loop-flattening.c (flatten_loop): Extra param.
+	Call gate_tree_if_conversion and tree_if_conversion.
+	(tree_loop_flattening): Pass to flatten_loop an extra param.
+
+2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
+
 	* tree-if-conv.c (if_convertible_loop_p_1): Do not call
 	compute_data_dependences_for_loop.
 	(if_convertible_loop_p): Do not free refs and ddrs.
diff --git a/gcc/passes.c b/gcc/passes.c
index 4b778bc..ed81018 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -914,7 +914,6 @@ init_optimization_passes (void)
           NEXT_PASS (pass_predcom);
 	  NEXT_PASS (pass_complete_unroll);
 	  NEXT_PASS (pass_flatten_loops);
-	  NEXT_PASS (pass_if_conversion);
 	  NEXT_PASS (pass_slp_vectorize);
 	  NEXT_PASS (pass_parallelize_loops);
 	  NEXT_PASS (pass_loop_prefetch);
diff --git a/gcc/tree-flow.h b/gcc/tree-flow.h
index c2702dc..e1ee69f 100644
--- a/gcc/tree-flow.h
+++ b/gcc/tree-flow.h
@@ -730,6 +730,10 @@ bool contains_abnormal_ssa_name_p (tree);
 bool stmt_dominates_stmt_p (gimple, gimple);
 void mark_virtual_ops_for_renaming (gimple);
 
+/* In tree-if-conv.c */
+bool gate_tree_if_conversion (void);
+bool tree_if_conversion (struct loop *, tree *);
+
 /* In tree-ssa-dce.c */
 void mark_virtual_phi_result_for_renaming (gimple);
 
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index f05213e..5ee4599 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -1599,7 +1599,7 @@ combine_blocks (struct loop *loop, tree *scratch_pad)
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns true when something changed.  */
 
-static bool
+bool
 tree_if_conversion (struct loop *loop, tree *scratch_pad)
 {
   bool changed = false;
@@ -1662,7 +1662,7 @@ main_tree_if_conversion (void)
 
 /* Returns true when the if-conversion pass is enabled.  */
 
-static bool
+bool
 gate_tree_if_conversion (void)
 {
   return ((flag_tree_vectorize && flag_tree_loop_if_convert != 0)
diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
index 826e7e8..4bc8768 100644
--- a/gcc/tree-loop-flattening.c
+++ b/gcc/tree-loop-flattening.c
@@ -497,10 +497,11 @@ add_missing_phi_nodes (loop_p loop)
   free (bbs);
 }
 
-/* Removes all the back-edges of LOOP except its own back-edge.  */
+/* Removes all the back-edges of LOOP except its own back-edge.
+   SCRATCH_PAD is used in if-conversion.  */
 
 static unsigned
-flatten_loop (loop_p loop)
+flatten_loop (loop_p loop, tree *scratch_pad)
 {
   int i, n = loop->num_nodes;
   basic_block *bbs;
@@ -570,6 +571,9 @@ flatten_loop (loop_p loop)
   if (!single_pred_p (loop->latch))
     loop->latch = split_edge (loop_latch_edge (loop));
 
+  if (gate_tree_if_conversion ())
+    tree_if_conversion (loop, scratch_pad);
+
   return TODO_update_ssa | TODO_verify_ssa;
 }
 
@@ -581,12 +585,13 @@ tree_loop_flattening (void)
   unsigned todo = 0;
   loop_p loop;
   loop_iterator li;
+  tree scratch_pad = NULL_TREE;
 
   if (number_of_loops () <= 1)
     return 0;
 
   FOR_EACH_LOOP (li, loop, 0)
-    todo |= flatten_loop (loop);
+    todo |= flatten_loop (loop, &scratch_pad);
 
 #ifdef ENABLE_CHECKING
   verify_dominators (CDI_DOMINATORS);
-- 
1.7.0.4


* [PATCH 3/6] Fix PR46029: reimplement if-convert stores.
  2010-10-29  3:11 [PATCH 0/6] Loop flattening and improved if-conversion Sebastian Pop
                   ` (2 preceding siblings ...)
  2010-10-29  3:57 ` [PATCH 5/6] Call if-conversion from loop flattening Sebastian Pop
@ 2010-10-29  4:07 ` Sebastian Pop
  2010-10-29  4:13 ` [PATCH 2/6] Remove ifcvt_memrefs_wont_trap analysis Sebastian Pop
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Sebastian Pop @ 2010-10-29  4:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, Sebastian Pop

2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>

	PR tree-optimization/46029
	* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
	* tree-if-conv.c (has_unaligned_memory_refs): New.
	(if_convertible_gimple_assign_stmt_p): Call has_unaligned_memory_refs.
	(create_scratchpad): New.
	(create_indirect_cond_expr): New.
	(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
	parameter for scratch_pad.
	(combine_blocks): Same.
	(tree_if_conversion): Same.
	(main_tree_if_conversion): Pass to tree_if_conversion a pointer to
	scratch_pad.

testsuite/
	* g++.dg/tree-ssa/ifc-pr46029.C: New.
	* gcc.dg/tree-ssa/ifc-8.c: New.
	* gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
---
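
The transformation documented in the invoke.texi hunk below can be tried
out in plain C.  This is a hedged sketch, not the code the compiler
generates: store_cond, store_scratchpad and same_stores are hypothetical
names, and a local int stands in for the 64-byte alloca'd scratchpad.  The
point it shows is that when the condition is false nothing is written to
A[i]; the discarded computation reads and writes the scratchpad instead, so
no store is added to memory that the original loop left untouched.

```c
#include <assert.h>

/* Original loop: the store to A[i] is guarded by a branch.  */
static void
store_cond (int *a, const int *b, const int *cond, int n)
{
  int i;

  assert (n >= 0);
  for (i = 0; i < n; i++)
    if (cond[i])
      a[i] = b[i] + 2;
}

/* If-converted loop: no branch around the store.  The addresses are
   selected with conditional expressions, and the scratchpad absorbs
   the accesses of the iterations whose condition is false.  */
static void
store_scratchpad (int *a, const int *b, const int *cond, int n)
{
  int scratch_pad = 0;	/* Stand-in for alloca (64).  */
  int i;

  assert (n >= 0);
  for (i = 0; i < n; i++)
    {
      int *pa = cond[i] ? &a[i] : &scratch_pad;
      const int *pb = cond[i] ? &b[i] : &scratch_pad;

      *pa = *pb + 2;
    }
}

/* Returns 1 when both loops leave the destination arrays identical.  */
static int
same_stores (void)
{
  int b[6] = { 1, 2, 3, 4, 5, 6 };
  int cond[6] = { 1, 0, 0, 1, 1, 0 };
  int x[6] = { 0 }, y[6] = { 0 };
  int i;

  store_cond (x, b, cond, 6);
  store_scratchpad (y, b, cond, 6);

  for (i = 0; i < 6; i++)
    if (x[i] != y[i])
      return 0;

  return 1;
}
```

The previous form, A[i] = cond ? expr : A[i], stores to A[i] on every
iteration, which is why the old documentation warned about data races in
multi-threaded programs.
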
 gcc/ChangeLog                               |   15 ++
 gcc/doc/invoke.texi                         |   18 ++-
 gcc/testsuite/ChangeLog                     |    7 +
 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C |   76 +++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c       |   17 ++-
 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c       |   29 ++++
 gcc/tree-if-conv.c                          |  193 +++++++++++++++++++++++----
 7 files changed, 318 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 4a51a4d..f14d9b1 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,20 @@
 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
 
+	PR tree-optimization/46029
+	* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
+	* tree-if-conv.c (has_unaligned_memory_refs): New.
+	(if_convertible_gimple_assign_stmt_p): Call has_unaligned_memory_refs.
+	(create_scratchpad): New.
+	(create_indirect_cond_expr): New.
+	(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
+	parameter for scratch_pad.
+	(combine_blocks): Same.
+	(tree_if_conversion): Same.
+	(main_tree_if_conversion): Pass to tree_if_conversion a pointer to
+	scratch_pad.
+
+2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
+
 	* tree-if-conv.c (struct ifc_dr): Removed.
 	(IFC_DR): Removed.
 	(DR_WRITTEN_AT_LEAST_ONCE): Removed.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ee68454..28b0cbb 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6935,20 +6935,26 @@ if vectorization is enabled.
 
 @item -ftree-loop-if-convert-stores
 Attempt to also if-convert conditional jumps containing memory writes.
-This transformation can be unsafe for multi-threaded programs as it
-transforms conditional memory writes into unconditional memory writes.
 For example,
 @smallexample
 for (i = 0; i < N; i++)
   if (cond)
-    A[i] = expr;
+    A[i] = B[i] + 2;
 @end smallexample
 would be transformed to
 @smallexample
-for (i = 0; i < N; i++)
-  A[i] = cond ? expr : A[i];
+void *scratchpad = alloca (64);
+for (i = 0; i < N; i++) @{
+  a = cond ? &A[i] : scratchpad;
+  b = cond ? &B[i] : scratchpad;
+  *a = *b + 2;
+@}
 @end smallexample
-potentially producing data races.
+The compiler allocates a scratchpad memory on the stack for each
+function in which the if-conversion of memory stores or reads
+happened.  This scratchpad memory is used during the part of the
+computation that is discarded, i.e., when the condition is evaluated
+to false.
 
 @item -ftree-loop-distribution
 Perform loop distribution.  This flag can improve cache performance on
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 8ab520e..bf73b91 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,12 @@
 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
 
+	PR tree-optimization/46029
+	* g++.dg/tree-ssa/ifc-pr46029.C: New.
+	* gcc.dg/tree-ssa/ifc-8.c: New.
+	* gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
+
+2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
+
 	* gcc.dg/tree-ssa/flat-loop-1.c: New.
 	* gcc.dg/tree-ssa/flat-loop-2.c: New.
 	* gcc.dg/tree-ssa/flat-loop-3.c: New.
diff --git a/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C b/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
new file mode 100644
index 0000000..2a54bdb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
@@ -0,0 +1,76 @@
+// { dg-do run }
+/* { dg-options "-O -ftree-loop-if-convert-stores" } */
+
+namespace
+{
+  struct rb_tree_node_
+  {
+    rb_tree_node_ ():m_p_left (0), m_p_parent (0), m_metadata (0)
+    {
+    }
+    unsigned &get_metadata ()
+    {
+      return m_metadata;
+    }
+    rb_tree_node_ *m_p_left;
+    rb_tree_node_ *m_p_parent;
+    unsigned m_metadata;
+  };
+
+  struct bin_search_tree_const_node_it_
+  {
+    bin_search_tree_const_node_it_ (rb_tree_node_ * p_nd):m_p_nd (p_nd)
+    {
+    }
+    unsigned &get_metadata ()
+    {
+      return m_p_nd->get_metadata ();
+    }
+    bin_search_tree_const_node_it_ get_l_child ()
+    {
+      return bin_search_tree_const_node_it_ (m_p_nd->m_p_left);
+    }
+
+    rb_tree_node_ *m_p_nd;
+  };
+
+  struct bin_search_tree_no_data_
+  {
+    typedef rb_tree_node_ *node_pointer;
+      bin_search_tree_no_data_ ():m_p_head (new rb_tree_node_ ())
+    {
+    }
+    void insert_imp_empty (int r_value)
+    {
+      rb_tree_node_ *p_new_node = new rb_tree_node_ ();
+      m_p_head->m_p_parent = p_new_node;
+      p_new_node->m_p_parent = m_p_head;
+      update_to_top (m_p_head->m_p_parent);
+    }
+    void apply_update (bin_search_tree_const_node_it_ nd_it)
+    {
+      unsigned
+	l_max_endpoint
+	=
+	(nd_it.get_l_child ().m_p_nd ==
+	 0) ? 0 : nd_it.get_l_child ().get_metadata ();
+      nd_it.get_metadata () = l_max_endpoint;
+    }
+    void update_to_top (node_pointer p_nd)
+    {
+      while (p_nd != m_p_head)
+	{
+	  apply_update (p_nd);
+	  p_nd = p_nd->m_p_parent;
+	}
+    }
+
+    rb_tree_node_ * m_p_head;
+  };
+}
+
+int main ()
+{
+  bin_search_tree_no_data_ ().insert_imp_empty (0);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
index a9cc816..d88c4a2 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
@@ -12,11 +12,18 @@ dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
   for (i = 0; i <= nCoeffs; i++)
     {
       level = block[i];
-      if (level < 0)
-	level = level * qmul - qadd;
-      else
-	level = level * qmul + qadd;
-      block[i] = level;
+      if (level)
+        {
+          if (level < 0)
+            {
+              level = level * qmul - qadd;
+            }
+          else
+            {
+              level = level * qmul + qadd;
+            }
+          block[i] = level;
+        }
     }
 }
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
new file mode 100644
index 0000000..d7cf279
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-c -O2 -ftree-vectorize" { target *-*-* } } */
+
+typedef union tree_node *tree;
+struct tree_common
+{
+  unsigned volatile_flag : 1;
+  unsigned unsigned_flag : 1;
+};
+struct tree_type
+{
+  tree next_variant;
+  tree main_variant;
+};
+union tree_node
+{
+  struct tree_common common;
+  struct tree_type type;
+};
+void finish_enum (tree enumtype)
+{
+  tree tem;
+  for (tem = ((enumtype)->type.main_variant); tem; tem = ((tem)->type.next_variant))
+    {
+      if (tem == enumtype)
+	continue;
+      ((tem)->common.unsigned_flag) = ((enumtype)->common.unsigned_flag);
+    }
+}
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index ec03bf6..cb4828a 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -459,6 +459,36 @@ ifcvt_could_trap_p (gimple stmt)
   return gimple_could_trap_p (stmt);
 }
 
+/* Returns true when STMT has a memory reference not aligned on a byte.  */
+
+static bool
+has_unaligned_memory_refs (gimple stmt)
+{
+  int unsignedp, volatilep;
+  HOST_WIDE_INT bitsize, bitpos;
+  tree toffset;
+  enum machine_mode mode;
+  VEC (data_ref_loc, heap) *refs = VEC_alloc (data_ref_loc, heap, 3);
+  bool res = get_references_in_stmt (stmt, &refs);
+  unsigned i;
+  data_ref_loc *ref;
+
+  FOR_EACH_VEC_ELT (data_ref_loc, refs, i, ref)
+    {
+      get_inner_reference (*ref->pos, &bitsize, &bitpos, &toffset,
+			   &mode, &unsignedp, &volatilep, true);
+
+      if ((bitpos % BITS_PER_UNIT) != 0)
+	{
+	  res = true;
+	  break;
+	}
+    }
+
+  VEC_free (data_ref_loc, heap, refs);
+  return res;
+}
+
 /* Return true when STMT is if-convertible.
 
    GIMPLE_ASSIGN statement is not if-convertible if,
@@ -501,6 +531,14 @@ if_convertible_gimple_assign_stmt_p (gimple stmt)
 	    fprintf (dump_file, "tree could trap...\n");
 	  return false;
 	}
+
+      if (has_unaligned_memory_refs (stmt))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, "uses misaligned memory...\n");
+	  return false;
+	}
+
       return true;
     }
 
@@ -1190,6 +1228,78 @@ insert_gimplified_predicates (loop_p loop)
     }
 }
 
+/* Insert at the beginning of the first basic block of the current
+   function the allocation on the stack of 64 bytes of memory and
+   return a pointer to this scratchpad memory.  */
+
+static tree
+create_scratchpad (void)
+{
+  basic_block bb = single_succ (ENTRY_BLOCK_PTR);
+  gimple_stmt_iterator gsi = gsi_after_labels (bb);
+
+  /* void *scratch_pad = __builtin_alloca (64);  */
+  const char *name = "scratch_pad";
+  tree x = build_int_cst (integer_type_node, 64);
+  gimple stmt = gimple_build_call (built_in_decls[BUILT_IN_ALLOCA], 1, x);
+  tree var = create_tmp_var (ptr_type_node, name);
+  tree tmp = make_ssa_name (var, stmt);
+
+  add_referenced_var (var);
+  gimple_call_set_lhs (stmt, tmp);
+  SSA_NAME_DEF_STMT (tmp) = stmt;
+  update_stmt (stmt);
+
+  gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+  return tmp;
+}
+
+/* Returns a memory reference to the pointer defined by the
+   conditional expression: pointer = cond ? &A[i] : scratch_pad; and
+   inserts this code at GSI.  */
+
+static tree
+create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
+			   gimple_stmt_iterator *gsi)
+{
+  tree type = TREE_TYPE (ai);
+
+  tree pointer_to_type, address_of_ai, addr_expr, cond_expr;
+  tree pointer, star_pointer;
+  gimple addr_stmt, pointer_stmt;
+
+  /* address_of_ai = &A[i];  */
+  pointer_to_type = build_pointer_type (type);
+  address_of_ai = create_tmp_var (pointer_to_type, "_ifc_");
+  add_referenced_var (address_of_ai);
+  addr_expr = build_fold_addr_expr (ai);
+  addr_stmt = gimple_build_assign (address_of_ai, addr_expr);
+  address_of_ai = make_ssa_name (address_of_ai, addr_stmt);
+  gimple_assign_set_lhs (addr_stmt, address_of_ai);
+  SSA_NAME_DEF_STMT (address_of_ai) = addr_stmt;
+  update_stmt (addr_stmt);
+  gsi_insert_before (gsi, addr_stmt, GSI_SAME_STMT);
+
+  /* Allocate the scratch pad only once per function.  */
+  if (!*scratch_pad)
+    *scratch_pad = create_scratchpad ();
+
+  /* pointer = cond ? address_of_ai : scratch_pad;  */
+  pointer = create_tmp_var (pointer_to_type, "_ifc_");
+  add_referenced_var (pointer);
+  cond_expr = build3 (COND_EXPR, pointer_to_type, unshare_expr (cond),
+		      address_of_ai, *scratch_pad);
+  pointer_stmt = gimple_build_assign (pointer, cond_expr);
+  pointer = make_ssa_name (pointer, pointer_stmt);
+  gimple_assign_set_lhs (pointer_stmt, pointer);
+  SSA_NAME_DEF_STMT (pointer) = pointer_stmt;
+  update_stmt (pointer_stmt);
+  gsi_insert_before (gsi, pointer_stmt, GSI_SAME_STMT);
+
+  star_pointer = build_simple_mem_ref (pointer);
+  return star_pointer;
+}
+
 /* Predicate each write to memory in LOOP.
 
    This function transforms control flow constructs containing memory
@@ -1201,10 +1311,19 @@ insert_gimplified_predicates (loop_p loop)
 
    into the following form that does not contain control flow:
 
-   | for (i = 0; i < N; i++)
-   |   A[i] = cond ? expr : A[i];
+   | void *scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
+   |
+   | for (i = 0; i < N; i++) {
+   |   p = cond ? &A[i] : scratch_pad;
+   |   *p = expr;
+   | }
+
+   SCRATCH_PAD is allocated on the stack for each function once and it is
+   large enough to contain any kind of scalar assignment or read.  All
+   values read or written to SCRATCH_PAD are not used in the computation.
 
-   The original CFG looks like this:
+   In a more detailed way, the if-conversion of memory writes works
+   like this, supposing that the original CFG looks like this:
 
    | bb_0
    |   i = 0
@@ -1254,10 +1373,12 @@ insert_gimplified_predicates (loop_p loop)
    |   goto bb_1
    | end_bb_4
 
-   predicate_mem_writes is then predicating the memory write as follows:
+   predicate_mem_writes then allocates SCRATCH_PAD in the basic block
+   preceding the loop header, and predicates the memory write:
 
    | bb_0
    |   i = 0
+   |   scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
    | end_bb_0
    |
    | bb_1
@@ -1265,12 +1386,14 @@ insert_gimplified_predicates (loop_p loop)
    | end_bb_1
    |
    | bb_2
+   |   cond = some_computation;
    |   if (cond) goto bb_3 else goto bb_4
    | end_bb_2
    |
    | bb_3
    |   cond = some_computation;
-   |   A[i] = cond ? expr : A[i];
+   |   p = cond ? &A[i] : scratch_pad;
+   |   *p = expr;
    |   goto bb_4
    | end_bb_3
    |
@@ -1283,12 +1406,14 @@ insert_gimplified_predicates (loop_p loop)
 
    | bb_0
    |   i = 0
+   |   scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
    |   if (i < N) goto bb_5 else goto bb_1
    | end_bb_0
    |
    | bb_1
    |   cond = some_computation;
-   |   A[i] = cond ? expr : A[i];
+   |   p = cond ? &A[i] : scratch_pad;
+   |   *p = expr;
    |   if (i < N) goto bb_5 else goto bb_4
    | end_bb_1
    |
@@ -1298,7 +1423,7 @@ insert_gimplified_predicates (loop_p loop)
 */
 
 static void
-predicate_mem_writes (loop_p loop)
+predicate_mem_writes (loop_p loop, tree *scratch_pad)
 {
   unsigned int i, orig_loop_num_nodes = loop->num_nodes;
 
@@ -1313,20 +1438,35 @@ predicate_mem_writes (loop_p loop)
 	continue;
 
       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-	if ((stmt = gsi_stmt (gsi))
-	    && gimple_assign_single_p (stmt)
-	    && gimple_vdef (stmt))
-	  {
-	    tree lhs = gimple_assign_lhs (stmt);
-	    tree rhs = gimple_assign_rhs1 (stmt);
-	    tree type = TREE_TYPE (lhs);
-
-	    lhs = ifc_temp_var (type, unshare_expr (lhs), &gsi);
-	    rhs = ifc_temp_var (type, unshare_expr (rhs), &gsi);
-	    rhs = build3 (COND_EXPR, type, unshare_expr (cond), rhs, lhs);
-	    gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
-	    update_stmt (stmt);
-	  }
+	{
+	  stmt = gsi_stmt (gsi);
+	  if (gimple_assign_single_p (stmt)
+	      && gimple_vdef (stmt))
+	    {
+	      /* A[i] = x;  */
+	      tree ai = gimple_assign_lhs (stmt);
+
+	      /* pointer = cond ? &A[i] : scratch_pad;  */
+	      tree star_pointer = create_indirect_cond_expr (ai, cond,
+							     scratch_pad, &gsi);
+	      /* *pointer = x;  */
+	      gimple_assign_set_lhs (stmt, star_pointer);
+	      update_stmt (stmt);
+	    }
+	  else if (gimple_assign_single_p (stmt)
+		   && gimple_vuse (stmt))
+	    {
+	      /* x = A[i];  */
+	      tree ai = gimple_assign_rhs1 (stmt);
+
+	      /* pointer = cond ? &A[i] : scratch_pad;  */
+	      tree star_pointer = create_indirect_cond_expr (ai, cond,
+							     scratch_pad, &gsi);
+	      /* x = *pointer;  */
+	      gimple_assign_set_rhs1 (stmt, star_pointer);
+	      update_stmt (stmt);
+	    }
+	}
     }
 }
 
@@ -1376,7 +1516,7 @@ remove_conditions_and_labels (loop_p loop)
    blocks.  Replace PHI nodes with conditional modify expressions.  */
 
 static void
-combine_blocks (struct loop *loop)
+combine_blocks (struct loop *loop, tree *scratch_pad)
 {
   basic_block bb, exit_bb, merge_target_bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
@@ -1389,7 +1529,7 @@ combine_blocks (struct loop *loop)
   predicate_all_scalar_phis (loop);
 
   if (flag_tree_loop_if_convert_stores)
-    predicate_mem_writes (loop);
+    predicate_mem_writes (loop, scratch_pad);
 
   /* Merge basic blocks: first remove all the edges in the loop,
      except for those from the exit block.  */
@@ -1478,7 +1618,7 @@ combine_blocks (struct loop *loop)
    profitability analysis.  Returns true when something changed.  */
 
 static bool
-tree_if_conversion (struct loop *loop)
+tree_if_conversion (struct loop *loop, tree *scratch_pad)
 {
   bool changed = false;
   ifc_bbs = NULL;
@@ -1490,7 +1630,7 @@ tree_if_conversion (struct loop *loop)
   /* Now all statements are if-convertible.  Combine all the basic
      blocks into one huge basic block doing the if-conversion
      on-the-fly.  */
-  combine_blocks (loop);
+  combine_blocks (loop, scratch_pad);
 
   if (flag_tree_loop_if_convert_stores)
     mark_sym_for_renaming (gimple_vop (cfun));
@@ -1521,12 +1661,13 @@ main_tree_if_conversion (void)
   struct loop *loop;
   bool changed = false;
   unsigned todo = 0;
+  tree scratch_pad = NULL_TREE;
 
   if (number_of_loops () <= 1)
     return 0;
 
   FOR_EACH_LOOP (li, loop, 0)
-    changed |= tree_if_conversion (loop);
+    changed |= tree_if_conversion (loop, &scratch_pad);
 
   if (changed)
     todo |= TODO_cleanup_cfg;
-- 
1.7.0.4


* [PATCH 2/6] Remove ifcvt_memrefs_wont_trap analysis.
  2010-10-29  3:11 [PATCH 0/6] Loop flattening and improved if-conversion Sebastian Pop
                   ` (3 preceding siblings ...)
  2010-10-29  4:07 ` [PATCH 3/6] Fix PR46029: reimplement if-convert stores Sebastian Pop
@ 2010-10-29  4:13 ` Sebastian Pop
  2010-10-29  5:58 ` [PATCH 6/6] Move loop flattening and SLP vectorization at the end of loop transforms Sebastian Pop
  2010-11-03 15:18 ` [PATCH 0/6] Loop flattening and improved if-conversion Richard Guenther
  6 siblings, 0 replies; 41+ messages in thread
From: Sebastian Pop @ 2010-10-29  4:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, Sebastian Pop

2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>

	* tree-if-conv.c (struct ifc_dr): Removed.
	(IFC_DR): Removed.
	(DR_WRITTEN_AT_LEAST_ONCE): Removed.
	(DR_RW_UNCONDITIONALLY): Removed.
	(memrefs_read_or_written_unconditionally): Removed.
	(write_memrefs_written_at_least_once): Removed.
	(ifcvt_memrefs_wont_trap): Removed.
	(ifcvt_could_trap_p): Does not take refs parameter anymore.
	(if_convertible_gimple_assign_stmt_p): Same.
	(if_convertible_stmt_p): Same.
	(if_convertible_loop_p_1): Remove initialization of dr->aux,
	DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
	(if_convertible_loop_p): Remove deallocation of the same.
---
 gcc/ChangeLog      |   16 +++++
 gcc/tree-if-conv.c |  192 ++-------------------------------------------------
 2 files changed, 24 insertions(+), 184 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a0148d2..4a51a4d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,21 @@
 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
 
+	* tree-if-conv.c (struct ifc_dr): Removed.
+	(IFC_DR): Removed.
+	(DR_WRITTEN_AT_LEAST_ONCE): Removed.
+	(DR_RW_UNCONDITIONALLY): Removed.
+	(memrefs_read_or_written_unconditionally): Removed.
+	(write_memrefs_written_at_least_once): Removed.
+	(ifcvt_memrefs_wont_trap): Removed.
+	(ifcvt_could_trap_p): Does not take refs parameter anymore.
+	(if_convertible_gimple_assign_stmt_p): Same.
+	(if_convertible_stmt_p): Same.
+	(if_convertible_loop_p_1): Remove initialization of dr->aux,
+	DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
+	(if_convertible_loop_p): Remove deallocation of the same.
+
+2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
+
 	* Makefile.in (OBJS-common): Add tree-loop-flattening.o.
 	(tree-loop-flattening.o): New.
 	* common.opt (ftree-loop-flatten): New.
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 642dbda..ec03bf6 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -446,168 +446,14 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
   return true;
 }
 
-/* Records the status of a data reference.  This struct is attached to
-   each DR->aux field.  */
-
-struct ifc_dr {
-  /* -1 when not initialized, 0 when false, 1 when true.  */
-  int written_at_least_once;
-
-  /* -1 when not initialized, 0 when false, 1 when true.  */
-  int rw_unconditionally;
-};
-
-#define IFC_DR(DR) ((struct ifc_dr *) (DR)->aux)
-#define DR_WRITTEN_AT_LEAST_ONCE(DR) (IFC_DR (DR)->written_at_least_once)
-#define DR_RW_UNCONDITIONALLY(DR) (IFC_DR (DR)->rw_unconditionally)
-
-/* Returns true when the memory references of STMT are read or written
-   unconditionally.  In other words, this function returns true when
-   for every data reference A in STMT there exist other accesses to
-   the same data reference with predicates that add up (OR-up) to the
-   true predicate: this ensures that the data reference A is touched
-   (read or written) on every iteration of the if-converted loop.  */
-
-static bool
-memrefs_read_or_written_unconditionally (gimple stmt,
-					 VEC (data_reference_p, heap) *drs)
-{
-  int i, j;
-  data_reference_p a, b;
-  tree ca = bb_predicate (gimple_bb (stmt));
-
-  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
-    if (DR_STMT (a) == stmt)
-      {
-	bool found = false;
-	int x = DR_RW_UNCONDITIONALLY (a);
-
-	if (x == 0)
-	  return false;
-
-	if (x == 1)
-	  continue;
-
-	for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
-	  if (DR_STMT (b) != stmt
-	      && same_data_refs (a, b))
-	    {
-	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
-
-	      if (DR_RW_UNCONDITIONALLY (b) == 1
-		  || is_true_predicate (cb)
-		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
-								 ca, cb)))
-		{
-		  DR_RW_UNCONDITIONALLY (a) = 1;
-		  DR_RW_UNCONDITIONALLY (b) = 1;
-		  found = true;
-		  break;
-		}
-	    }
-
-	if (!found)
-	  {
-	    DR_RW_UNCONDITIONALLY (a) = 0;
-	    return false;
-	  }
-      }
-
-  return true;
-}
-
-/* Returns true when the memory references of STMT are unconditionally
-   written.  In other words, this function returns true when for every
-   data reference A written in STMT, there exist other writes to the
-   same data reference with predicates that add up (OR-up) to the true
-   predicate: this ensures that the data reference A is written on
-   every iteration of the if-converted loop.  */
-
-static bool
-write_memrefs_written_at_least_once (gimple stmt,
-				     VEC (data_reference_p, heap) *drs)
-{
-  int i, j;
-  data_reference_p a, b;
-  tree ca = bb_predicate (gimple_bb (stmt));
-
-  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
-    if (DR_STMT (a) == stmt
-	&& DR_IS_WRITE (a))
-      {
-	bool found = false;
-	int x = DR_WRITTEN_AT_LEAST_ONCE (a);
-
-	if (x == 0)
-	  return false;
-
-	if (x == 1)
-	  continue;
-
-	for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
-	  if (DR_STMT (b) != stmt
-	      && DR_IS_WRITE (b)
-	      && same_data_refs_base_objects (a, b))
-	    {
-	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
-
-	      if (DR_WRITTEN_AT_LEAST_ONCE (b) == 1
-		  || is_true_predicate (cb)
-		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
-								 ca, cb)))
-		{
-		  DR_WRITTEN_AT_LEAST_ONCE (a) = 1;
-		  DR_WRITTEN_AT_LEAST_ONCE (b) = 1;
-		  found = true;
-		  break;
-		}
-	    }
-
-	if (!found)
-	  {
-	    DR_WRITTEN_AT_LEAST_ONCE (a) = 0;
-	    return false;
-	  }
-      }
-
-  return true;
-}
-
-/* Return true when the memory references of STMT won't trap in the
-   if-converted code.  There are two things that we have to check for:
-
-   - writes to memory occur to writable memory: if-conversion of
-   memory writes transforms the conditional memory writes into
-   unconditional writes, i.e. "if (cond) A[i] = foo" is transformed
-   into "A[i] = cond ? foo : A[i]", and as the write to memory may not
-   be executed at all in the original code, it may be a readonly
-   memory.  To check that A is not const-qualified, we check that
-   there exists at least an unconditional write to A in the current
-   function.
-
-   - reads or writes to memory are valid memory accesses for every
-   iteration.  To check that the memory accesses are correctly formed
-   and that we are allowed to read and write in these locations, we
-   check that the memory accesses to be if-converted occur at every
-   iteration unconditionally.  */
-
-static bool
-ifcvt_memrefs_wont_trap (gimple stmt, VEC (data_reference_p, heap) *refs)
-{
-  return write_memrefs_written_at_least_once (stmt, refs)
-    && memrefs_read_or_written_unconditionally (stmt, refs);
-}
-
 /* Wrapper around gimple_could_trap_p refined for the needs of the
-   if-conversion.  Try to prove that the memory accesses of STMT could
-   not trap in the innermost loop containing STMT.  */
+   if-conversion.  */
 
 static bool
-ifcvt_could_trap_p (gimple stmt, VEC (data_reference_p, heap) *refs)
+ifcvt_could_trap_p (gimple stmt)
 {
   if (gimple_vuse (stmt)
-      && !gimple_could_trap_p_1 (stmt, false, false)
-      && ifcvt_memrefs_wont_trap (stmt, refs))
+      && !gimple_could_trap_p_1 (stmt, false, false))
     return false;
 
   return gimple_could_trap_p (stmt);
@@ -621,8 +467,7 @@ ifcvt_could_trap_p (gimple stmt, VEC (data_reference_p, heap) *refs)
    - LHS is not var decl.  */
 
 static bool
-if_convertible_gimple_assign_stmt_p (gimple stmt,
-				     VEC (data_reference_p, heap) *refs)
+if_convertible_gimple_assign_stmt_p (gimple stmt)
 {
   tree lhs = gimple_assign_lhs (stmt);
   basic_block bb;
@@ -650,7 +495,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
 
   if (flag_tree_loop_if_convert_stores)
     {
-      if (ifcvt_could_trap_p (stmt, refs))
+      if (ifcvt_could_trap_p (stmt))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    fprintf (dump_file, "tree could trap...\n");
@@ -690,7 +535,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
    - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
 
 static bool
-if_convertible_stmt_p (gimple stmt, VEC (data_reference_p, heap) *refs)
+if_convertible_stmt_p (gimple stmt)
 {
   switch (gimple_code (stmt))
     {
@@ -700,7 +545,7 @@ if_convertible_stmt_p (gimple stmt, VEC (data_reference_p, heap) *refs)
       return true;
 
     case GIMPLE_ASSIGN:
-      return if_convertible_gimple_assign_stmt_p (stmt, refs);
+      return if_convertible_gimple_assign_stmt_p (stmt);
 
     default:
       /* Don't know what to do with 'em so don't do anything.  */
@@ -1016,18 +861,6 @@ if_convertible_loop_p_1 (struct loop *loop,
   if (!res)
     return false;
 
-  if (flag_tree_loop_if_convert_stores)
-    {
-      data_reference_p dr;
-
-      for (i = 0; VEC_iterate (data_reference_p, *refs, i, dr); i++)
-	{
-	  dr->aux = XNEW (struct ifc_dr);
-	  DR_WRITTEN_AT_LEAST_ONCE (dr) = -1;
-	  DR_RW_UNCONDITIONALLY (dr) = -1;
-	}
-    }
-
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
@@ -1040,7 +873,7 @@ if_convertible_loop_p_1 (struct loop *loop,
       /* Check the if-convertibility of statements in predicated BBs.  */
       if (is_predicated (bb))
 	for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
-	  if (!if_convertible_stmt_p (gsi_stmt (itr), *refs))
+	  if (!if_convertible_stmt_p (gsi_stmt (itr)))
 	    return false;
     }
 
@@ -1101,15 +934,6 @@ if_convertible_loop_p (struct loop *loop)
   ddrs = VEC_alloc (ddr_p, heap, 25);
   res = if_convertible_loop_p_1 (loop, &refs, &ddrs);
 
-  if (flag_tree_loop_if_convert_stores)
-    {
-      data_reference_p dr;
-      unsigned int i;
-
-      for (i = 0; VEC_iterate (data_reference_p, refs, i, dr); i++)
-	free (dr->aux);
-    }
-
   free_data_refs (refs);
   free_dependence_relations (ddrs);
   return res;
-- 
1.7.0.4


* [PATCH 6/6] Move loop flattening and SLP vectorization at the end of loop transforms.
  2010-10-29  3:11 [PATCH 0/6] Loop flattening and improved if-conversion Sebastian Pop
                   ` (4 preceding siblings ...)
  2010-10-29  4:13 ` [PATCH 2/6] Remove ifcvt_memrefs_wont_trap analysis Sebastian Pop
@ 2010-10-29  5:58 ` Sebastian Pop
  2010-10-29 13:44   ` Richard Guenther
  2010-11-03 15:18 ` [PATCH 0/6] Loop flattening and improved if-conversion Richard Guenther
  6 siblings, 1 reply; 41+ messages in thread
From: Sebastian Pop @ 2010-10-29  5:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, Sebastian Pop

2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>

	* passes.c (init_optimization_passes): Move pass_flatten_loops and
	pass_slp_vectorize at the end of the loop transforms.
---
 gcc/ChangeLog |    5 +++++
 gcc/passes.c  |    4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8907244..d1215cb 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,10 @@
 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
 
+	* passes.c (init_optimization_passes): Move pass_flatten_loops and
+	pass_slp_vectorize at the end of the loop transforms.
+
+2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
+
 	* passes.c (init_optimization_passes): Do not call pass_if_conversion
 	after pass_flatten_loops.
 	* tree-flow.h (gate_tree_if_conversion): Declared.
diff --git a/gcc/passes.c b/gcc/passes.c
index ed81018..82d5c74 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -913,11 +913,11 @@ init_optimization_passes (void)
 	    }
           NEXT_PASS (pass_predcom);
 	  NEXT_PASS (pass_complete_unroll);
-	  NEXT_PASS (pass_flatten_loops);
-	  NEXT_PASS (pass_slp_vectorize);
 	  NEXT_PASS (pass_parallelize_loops);
 	  NEXT_PASS (pass_loop_prefetch);
 	  NEXT_PASS (pass_iv_optimize);
+	  NEXT_PASS (pass_flatten_loops);
+	  NEXT_PASS (pass_slp_vectorize);
 	  NEXT_PASS (pass_tree_loop_done);
 	}
       NEXT_PASS (pass_cse_reciprocals);
-- 
1.7.0.4


* Re: [PATCH 6/6] Move loop flattening and SLP vectorization at the end of loop transforms.
  2010-10-29  5:58 ` [PATCH 6/6] Move loop flattening and SLP vectorization at the end of loop transforms Sebastian Pop
@ 2010-10-29 13:44   ` Richard Guenther
  2010-10-30  0:23     ` Sebastian Pop
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Guenther @ 2010-10-29 13:44 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches

On Thu, 28 Oct 2010, Sebastian Pop wrote:

> 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
> 
> 	* passes.c (init_optimization_passes): Move pass_flatten_loops and
> 	pass_slp_vectorize at the end of the loop transforms.
> ---
>  gcc/ChangeLog |    5 +++++
>  gcc/passes.c  |    4 ++--
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 8907244..d1215cb 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,5 +1,10 @@
>  2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
>  
> +	* passes.c (init_optimization_passes): Move pass_flatten_loops and
> +	pass_slp_vectorize at the end of the loop transforms.
> +
> +2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
> +
>  	* passes.c (init_optimization_passes): Do not call pass_if_conversion
>  	after pass_flatten_loops.
>  	* tree-flow.h (gate_tree_if_conversion): Declared.
> diff --git a/gcc/passes.c b/gcc/passes.c
> index ed81018..82d5c74 100644
> --- a/gcc/passes.c
> +++ b/gcc/passes.c
> @@ -913,11 +913,11 @@ init_optimization_passes (void)
>  	    }
>            NEXT_PASS (pass_predcom);
>  	  NEXT_PASS (pass_complete_unroll);
> -	  NEXT_PASS (pass_flatten_loops);
> -	  NEXT_PASS (pass_slp_vectorize);
>  	  NEXT_PASS (pass_parallelize_loops);
>  	  NEXT_PASS (pass_loop_prefetch);
>  	  NEXT_PASS (pass_iv_optimize);
> +	  NEXT_PASS (pass_flatten_loops);
> +	  NEXT_PASS (pass_slp_vectorize);
>  	  NEXT_PASS (pass_tree_loop_done);

IVOPTs should certainly be after SLP.  I also don't expect loop
flattening to introduce SLP opportunities (I'd be curious for
a testcase where it does so).  Which means simply moving flattening
after IVOPTs should be all.

Richard.


* Re: [PATCH 6/6] Move loop flattening and SLP vectorization at the end of loop transforms.
  2010-10-29 13:44   ` Richard Guenther
@ 2010-10-30  0:23     ` Sebastian Pop
  2010-10-30  8:01       ` Richard Guenther
  0 siblings, 1 reply; 41+ messages in thread
From: Sebastian Pop @ 2010-10-30  0:23 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

On Fri, Oct 29, 2010 at 08:18, Richard Guenther <rguenther@suse.de> wrote:
> IVOPTs should certainly be after SLP.

Ok.

> I also don't expect loop
> flattening to introduce SLP opportunities (I'd be curious for
> a testcase where it does so).

Whenever the loop nests are not perfectly nested, there is an
opportunity to SLP after loop flattening and if-conversion.
See the last two slides of the presentation that Reza gave at the
summit:

http://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=get&target=pop-slides.pdf

for (i = 0; i < 1000; i++) {
   if (i & 1) a[i] = b[i] + 1;
   for (j = 0; j < 50; j++) {
     if (j & 1) c[i,j] = d[j] + 1;
   }
}
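
For reference, a compilable C rendering of the nest above, as a sketch
only.  It assumes that c[i,j] denotes the two-dimensional element
c[i][j].  The names nest, nest_flat, NI and NJ are made up for the
illustration, and the hand-flattened variant is one possible shape of
a flattened loop, not the output of pass_flatten_loops.

```c
#define NI 1000
#define NJ 50

/* The non-perfect nest: one statement sits between the two loop
   headers.  */
void nest (int *a, const int *b, int c[NI][NJ], const int *d)
{
  for (int i = 0; i < NI; i++)
    {
      if (i & 1)
        a[i] = b[i] + 1;
      for (int j = 0; j < NJ; j++)
        if (j & 1)
          c[i][j] = d[j] + 1;
    }
}

/* Hand-flattened variant (an assumption, see above): a single loop
   over NI * NJ iterations, with the outer-loop statement guarded so
   that it runs once per value of i, before the first inner
   iteration.  */
void nest_flat (int *a, const int *b, int c[NI][NJ], const int *d)
{
  for (int k = 0; k < NI * NJ; k++)
    {
      int i = k / NJ, j = k % NJ;
      if (j == 0 && (i & 1))
        a[i] = b[i] + 1;
      if (j & 1)
        c[i][j] = d[j] + 1;
    }
}
```

Both functions perform the same stores in the same order; in the
flattened form every statement of the body is a guarded assignment,
which is the shape that if-conversion works on.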

Sebastian


* Re: [PATCH 6/6] Move loop flattening and SLP vectorization at the end of loop transforms.
  2010-10-30  0:23     ` Sebastian Pop
@ 2010-10-30  8:01       ` Richard Guenther
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Guenther @ 2010-10-30  8:01 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Guenther, gcc-patches

On Fri, Oct 29, 2010 at 10:52 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> On Fri, Oct 29, 2010 at 08:18, Richard Guenther <rguenther@suse.de> wrote:
>> IVOPTs should certainly be after SLP.
>
> Ok.
>
>> I also don't expect loop
>> flattening to introduce SLP opportunities (I'd be curious for
>> a testcase where it does so).
>
> Whenever the loop nests are not perfectly nested, there is an
> opportunity to SLP after loop flattening and if-conversion.
> See the last two slides of the presentation that Reza gave at the
> summit:
>
> http://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=get&target=pop-slides.pdf
>
> for (i = 0; i < 1000; i++) {
>   if (i & 1) a[i] = b[i] + 1;
>   for (j = 0; j < 50; j++) {
>     if (j & 1) c[i,j] = d[j] + 1;
>   }
> }

That's not SLP I think (the slides are somewhat odd - well, the pseudocode is,
I can't see that the loops are equivalent).

Note that SLP is straight-line code vectorization.
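
To make the term concrete, here is a minimal sketch of the kind of
code SLP targets: a group of isomorphic, independent statements on
adjacent elements inside one basic block, with no loop involved.  The
function name add4 and its parameters are hypothetical, and whether a
given compiler actually packs these statements into a vector add
depends on the target and the options used.

```c
/* Four statements of the same shape on adjacent elements.  The
   restrict qualifiers promise that out does not overlap x or y, so
   the four additions are independent of each other.  */
void add4 (int *restrict out, const int *restrict x, const int *restrict y)
{
  out[0] = x[0] + y[0];
  out[1] = x[1] + y[1];
  out[2] = x[2] + y[2];
  out[3] = x[3] + y[3];
}
```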

Richard.

> Sebastian
>


* Re: [PATCH 0/6] Loop flattening and improved if-conversion
  2010-10-29  3:11 [PATCH 0/6] Loop flattening and improved if-conversion Sebastian Pop
                   ` (5 preceding siblings ...)
  2010-10-29  5:58 ` [PATCH 6/6] Move loop flattening and SLP vectorization at the end of loop transforms Sebastian Pop
@ 2010-11-03 15:18 ` Richard Guenther
  2010-11-03 15:53   ` [PATCH 2/3] if-convert even when the data dependences cannot be computed Sebastian Pop
                     ` (2 more replies)
  6 siblings, 3 replies; 41+ messages in thread
From: Richard Guenther @ 2010-11-03 15:18 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches

On Thu, 28 Oct 2010, Sebastian Pop wrote:

> Hi,
> 
> As explained in the GCC Summit paper "Improving GCC's
> auto-vectorization with if-conversion and loop flattening for AMD's
> Bulldozer processors", this patch set implements a loop flattening
> pass on tree-ssa, and improves the if-conversion, removing the now
> unnecessary ifcvt_memrefs_wont_trap analysis: this fixes PR46029.
> 
> The patch-set passed bootstrap with BOOT_CFLAGS="-O2 -ftree-loop-flatten
> -ftree-loop-if-convert-stores" and testing on amd64-linux.  Ok for trunk?

Can you please post a complete patch instead?  A patch series which
reverts changes done by earlier patches is not reviewable (it should
also not be committed that way, btw).

Thanks,
Richard.

> Thanks,
> Sebastian Pop
> --
> AMD / Open Source Compiler Engineering / GNU Tools
> 
> Sebastian Pop (6):
>   Loop flattening on loop-SSA.
>   Remove ifcvt_memrefs_wont_trap analysis.
>   Fix PR46029: reimplement if-convert stores.
>   if-convert even when the data dependences cannot be computed.
>   Call if-conversion from loop flattening.
>   Move loop flattening and SLP vectorization at the end of loop
>     transforms.
> 
>  gcc/ChangeLog                               |   68 +++
>  gcc/Makefile.in                             |    4 +
>  gcc/common.opt                              |    4 +
>  gcc/dbgcnt.def                              |    1 +
>  gcc/doc/invoke.texi                         |   18 +-
>  gcc/params.def                              |    7 +
>  gcc/passes.c                                |    3 +-
>  gcc/testsuite/ChangeLog                     |   14 +
>  gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C |   76 ++++
>  gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c |   28 ++
>  gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c |   39 ++
>  gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c |   19 +
>  gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c |   23 +
>  gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c       |   17 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c       |   29 ++
>  gcc/timevar.def                             |    1 +
>  gcc/tree-flow.h                             |    4 +
>  gcc/tree-if-conv.c                          |  407 ++++++++----------
>  gcc/tree-loop-flattening.c                  |  630 +++++++++++++++++++++++++++
>  gcc/tree-pass.h                             |    1 +
>  20 files changed, 1151 insertions(+), 242 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
>  create mode 100644 gcc/tree-loop-flattening.c
> 
> 

-- 
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex


* [PATCH 1/3] Fix PR46029: reimplement if-convert stores.
  2010-11-03 15:18 ` [PATCH 0/6] Loop flattening and improved if-conversion Richard Guenther
  2010-11-03 15:53   ` [PATCH 2/3] if-convert even when the data dependences cannot be computed Sebastian Pop
@ 2010-11-03 15:53   ` Sebastian Pop
  2010-11-05 12:08     ` Richard Guenther
  2010-11-03 15:54   ` [PATCH 3/3] Loop flattening on loop-SSA Sebastian Pop
  2 siblings, 1 reply; 41+ messages in thread
From: Sebastian Pop @ 2010-11-03 15:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, Sebastian Pop

2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>

	PR tree-optimization/46029
	* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
	* tree-if-conv.c (has_unaligned_memory_refs): New.
	(if_convertible_gimple_assign_stmt_p): Call has_unaligned_memory_refs.
	(create_scratchpad): New.
	(create_indirect_cond_expr): New.
	(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
	parameter for scratch_pad.
	(combine_blocks): Same.
	(tree_if_conversion): Same.
	(main_tree_if_conversion): Pass to tree_if_conversion a pointer to
	scratch_pad.
	(struct ifc_dr): Removed.
	(IFC_DR): Removed.
	(DR_WRITTEN_AT_LEAST_ONCE): Removed.
	(DR_RW_UNCONDITIONALLY): Removed.
	(memrefs_read_or_written_unconditionally): Removed.
	(write_memrefs_written_at_least_once): Removed.
	(ifcvt_memrefs_wont_trap): Removed.
	(ifcvt_could_trap_p): Does not take refs parameter anymore.
	(if_convertible_gimple_assign_stmt_p): Same.
	(if_convertible_stmt_p): Same.
	(if_convertible_loop_p_1): Remove initialization of dr->aux,
	DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
	(if_convertible_loop_p): Remove deallocation of the same.

testsuite/
	* g++.dg/tree-ssa/ifc-pr46029.C: New.
	* gcc.dg/tree-ssa/ifc-8.c: New.
	* gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
---
 gcc/ChangeLog                               |   28 ++
 gcc/doc/invoke.texi                         |   18 +-
 gcc/testsuite/ChangeLog                     |    7 +
 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C |   76 ++++++
 gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c       |   17 +-
 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c       |   29 ++
 gcc/tree-if-conv.c                          |  379 ++++++++++++---------------
 7 files changed, 336 insertions(+), 218 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
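
A source-level sketch of the scratchpad scheme that the updated
-ftree-loop-if-convert-stores documentation in this patch describes.
This is an illustration under assumptions, not what the pass emits:
the names store_if, store_scratchpad and N are hypothetical, a local
zero-initialised array stands in for alloca (64), and cond is taken
to be a per-iteration flag array.

```c
#define N 8

/* Original form: both the load of B[i] and the store to A[i] are
   conditional.  */
void store_if (int *A, const int *B, const int *cond)
{
  for (int i = 0; i < N; i++)
    if (cond[i])
      A[i] = B[i] + 2;
}

/* If-converted form: the load and the store always execute, but when
   the condition is false both addresses are redirected to the
   scratchpad, so neither A[i] nor B[i] is touched.  */
void store_scratchpad (int *A, const int *B, const int *cond)
{
  int scratchpad[16] = { 0 };  /* stand-in for alloca (64) */

  for (int i = 0; i < N; i++)
    {
      int *a = cond[i] ? &A[i] : scratchpad;
      const int *b = cond[i] ? &B[i] : scratchpad;
      *a = *b + 2;
    }
}
```

Unlike the "A[i] = cond ? expr : A[i]" form, this version never
accesses A[i] on iterations where the condition is false, so it does
not add stores to A that the original program did not perform.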

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index beed454..0f58882 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,31 @@
+2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
+
+	PR tree-optimization/46029
+	* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
+	* tree-if-conv.c (has_unaligned_memory_refs): New.
+	(if_convertible_gimple_assign_stmt_p): Call has_unaligned_memory_refs.
+	(create_scratchpad): New.
+	(create_indirect_cond_expr): New.
+	(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
+	parameter for scratch_pad.
+	(combine_blocks): Same.
+	(tree_if_conversion): Same.
+	(main_tree_if_conversion): Pass to tree_if_conversion a pointer to
+	scratch_pad.
+	(struct ifc_dr): Removed.
+	(IFC_DR): Removed.
+	(DR_WRITTEN_AT_LEAST_ONCE): Removed.
+	(DR_RW_UNCONDITIONALLY): Removed.
+	(memrefs_read_or_written_unconditionally): Removed.
+	(write_memrefs_written_at_least_once): Removed.
+	(ifcvt_memrefs_wont_trap): Removed.
+	(ifcvt_could_trap_p): Does not take refs parameter anymore.
+	(if_convertible_gimple_assign_stmt_p): Same.
+	(if_convertible_stmt_p): Same.
+	(if_convertible_loop_p_1): Remove initialization of dr->aux,
+	DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
+	(if_convertible_loop_p): Remove deallocation of the same.
+
 2010-10-20  Nathan Froyd  <froydnj@codesourcery.com>
 
 	* ifcvt.c (noce_emit_cmove): If both of the values are SUBREGs, try
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ee68454..28b0cbb 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6935,20 +6935,26 @@ if vectorization is enabled.
 
 @item -ftree-loop-if-convert-stores
 Attempt to also if-convert conditional jumps containing memory writes.
-This transformation can be unsafe for multi-threaded programs as it
-transforms conditional memory writes into unconditional memory writes.
 For example,
 @smallexample
 for (i = 0; i < N; i++)
   if (cond)
-    A[i] = expr;
+    A[i] = B[i] + 2;
 @end smallexample
 would be transformed to
 @smallexample
-for (i = 0; i < N; i++)
-  A[i] = cond ? expr : A[i];
+void *scratchpad = alloca (64);
+for (i = 0; i < N; i++) @{
+  a = cond ? &A[i] : scratchpad;
+  b = cond ? &B[i] : scratchpad;
+  *a = *b + 2;
+@}
 @end smallexample
-potentially producing data races.
+The compiler allocates a scratchpad memory on the stack for each
+function in which the if-conversion of memory stores or reads
+happened.  This scratchpad memory is used during the part of the
+computation that is discarded, i.e., when the condition is evaluated
+to false.
 
 @item -ftree-loop-distribution
 Perform loop distribution.  This flag can improve cache performance on
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 9d9c543..4233f86 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
+
+	PR tree-optimization/46029
+	* g++.dg/tree-ssa/ifc-pr46029.C: New.
+	* gcc.dg/tree-ssa/ifc-8.c: New.
+	* gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
+
 2010-10-20  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>
 
 	PR c++/46024
diff --git a/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C b/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
new file mode 100644
index 0000000..2a54bdb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
@@ -0,0 +1,76 @@
+// { dg-do run }
+/* { dg-options "-O -ftree-loop-if-convert-stores" } */
+
+namespace
+{
+  struct rb_tree_node_
+  {
+    rb_tree_node_ ():m_p_left (0), m_p_parent (0), m_metadata (0)
+    {
+    }
+    unsigned &get_metadata ()
+    {
+      return m_metadata;
+    }
+    rb_tree_node_ *m_p_left;
+    rb_tree_node_ *m_p_parent;
+    unsigned m_metadata;
+  };
+
+  struct bin_search_tree_const_node_it_
+  {
+    bin_search_tree_const_node_it_ (rb_tree_node_ * p_nd):m_p_nd (p_nd)
+    {
+    }
+    unsigned &get_metadata ()
+    {
+      return m_p_nd->get_metadata ();
+    }
+    bin_search_tree_const_node_it_ get_l_child ()
+    {
+      return bin_search_tree_const_node_it_ (m_p_nd->m_p_left);
+    }
+
+    rb_tree_node_ *m_p_nd;
+  };
+
+  struct bin_search_tree_no_data_
+  {
+    typedef rb_tree_node_ *node_pointer;
+      bin_search_tree_no_data_ ():m_p_head (new rb_tree_node_ ())
+    {
+    }
+    void insert_imp_empty (int r_value)
+    {
+      rb_tree_node_ *p_new_node = new rb_tree_node_ ();
+      m_p_head->m_p_parent = p_new_node;
+      p_new_node->m_p_parent = m_p_head;
+      update_to_top (m_p_head->m_p_parent);
+    }
+    void apply_update (bin_search_tree_const_node_it_ nd_it)
+    {
+      unsigned
+	l_max_endpoint
+	=
+	(nd_it.get_l_child ().m_p_nd ==
+	 0) ? 0 : nd_it.get_l_child ().get_metadata ();
+      nd_it.get_metadata () = l_max_endpoint;
+    }
+    void update_to_top (node_pointer p_nd)
+    {
+      while (p_nd != m_p_head)
+	{
+	  apply_update (p_nd);
+	  p_nd = p_nd->m_p_parent;
+	}
+    }
+
+    rb_tree_node_ * m_p_head;
+  };
+}
+
+int main ()
+{
+  bin_search_tree_no_data_ ().insert_imp_empty (0);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
index a9cc816..d88c4a2 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
@@ -12,11 +12,18 @@ dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
   for (i = 0; i <= nCoeffs; i++)
     {
       level = block[i];
-      if (level < 0)
-	level = level * qmul - qadd;
-      else
-	level = level * qmul + qadd;
-      block[i] = level;
+      if (level)
+        {
+          if (level < 0)
+            {
+              level = level * qmul - qadd;
+            }
+          else
+            {
+              level = level * qmul + qadd;
+            }
+          block[i] = level;
+        }
     }
 }
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
new file mode 100644
index 0000000..d7cf279
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-c -O2 -ftree-vectorize" { target *-*-* } } */
+
+typedef union tree_node *tree;
+struct tree_common
+{
+  unsigned volatile_flag : 1;
+  unsigned unsigned_flag : 1;
+};
+struct tree_type
+{
+  tree next_variant;
+  tree main_variant;
+};
+union tree_node
+{
+  struct tree_common common;
+  struct tree_type type;
+};
+void finish_enum (tree enumtype)
+{
+  tree tem;
+  for (tem = ((enumtype)->type.main_variant); tem; tem = ((tem)->type.next_variant))
+    {
+      if (tem == enumtype)
+	continue;
+      ((tem)->common.unsigned_flag) = ((enumtype)->common.unsigned_flag);
+    }
+}
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 642dbda..9fc6190 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -446,171 +446,47 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
   return true;
 }
 
-/* Records the status of a data reference.  This struct is attached to
-   each DR->aux field.  */
-
-struct ifc_dr {
-  /* -1 when not initialized, 0 when false, 1 when true.  */
-  int written_at_least_once;
-
-  /* -1 when not initialized, 0 when false, 1 when true.  */
-  int rw_unconditionally;
-};
-
-#define IFC_DR(DR) ((struct ifc_dr *) (DR)->aux)
-#define DR_WRITTEN_AT_LEAST_ONCE(DR) (IFC_DR (DR)->written_at_least_once)
-#define DR_RW_UNCONDITIONALLY(DR) (IFC_DR (DR)->rw_unconditionally)
-
-/* Returns true when the memory references of STMT are read or written
-   unconditionally.  In other words, this function returns true when
-   for every data reference A in STMT there exist other accesses to
-   the same data reference with predicates that add up (OR-up) to the
-   true predicate: this ensures that the data reference A is touched
-   (read or written) on every iteration of the if-converted loop.  */
-
-static bool
-memrefs_read_or_written_unconditionally (gimple stmt,
-					 VEC (data_reference_p, heap) *drs)
-{
-  int i, j;
-  data_reference_p a, b;
-  tree ca = bb_predicate (gimple_bb (stmt));
-
-  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
-    if (DR_STMT (a) == stmt)
-      {
-	bool found = false;
-	int x = DR_RW_UNCONDITIONALLY (a);
-
-	if (x == 0)
-	  return false;
-
-	if (x == 1)
-	  continue;
-
-	for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
-	  if (DR_STMT (b) != stmt
-	      && same_data_refs (a, b))
-	    {
-	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
-
-	      if (DR_RW_UNCONDITIONALLY (b) == 1
-		  || is_true_predicate (cb)
-		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
-								 ca, cb)))
-		{
-		  DR_RW_UNCONDITIONALLY (a) = 1;
-		  DR_RW_UNCONDITIONALLY (b) = 1;
-		  found = true;
-		  break;
-		}
-	    }
-
-	if (!found)
-	  {
-	    DR_RW_UNCONDITIONALLY (a) = 0;
-	    return false;
-	  }
-      }
-
-  return true;
-}
-
-/* Returns true when the memory references of STMT are unconditionally
-   written.  In other words, this function returns true when for every
-   data reference A written in STMT, there exist other writes to the
-   same data reference with predicates that add up (OR-up) to the true
-   predicate: this ensures that the data reference A is written on
-   every iteration of the if-converted loop.  */
+/* Wrapper around gimple_could_trap_p refined for the needs of the
+   if-conversion.  */
 
 static bool
-write_memrefs_written_at_least_once (gimple stmt,
-				     VEC (data_reference_p, heap) *drs)
+ifcvt_could_trap_p (gimple stmt)
 {
-  int i, j;
-  data_reference_p a, b;
-  tree ca = bb_predicate (gimple_bb (stmt));
-
-  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
-    if (DR_STMT (a) == stmt
-	&& DR_IS_WRITE (a))
-      {
-	bool found = false;
-	int x = DR_WRITTEN_AT_LEAST_ONCE (a);
-
-	if (x == 0)
-	  return false;
-
-	if (x == 1)
-	  continue;
-
-	for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
-	  if (DR_STMT (b) != stmt
-	      && DR_IS_WRITE (b)
-	      && same_data_refs_base_objects (a, b))
-	    {
-	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
-
-	      if (DR_WRITTEN_AT_LEAST_ONCE (b) == 1
-		  || is_true_predicate (cb)
-		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
-								 ca, cb)))
-		{
-		  DR_WRITTEN_AT_LEAST_ONCE (a) = 1;
-		  DR_WRITTEN_AT_LEAST_ONCE (b) = 1;
-		  found = true;
-		  break;
-		}
-	    }
-
-	if (!found)
-	  {
-	    DR_WRITTEN_AT_LEAST_ONCE (a) = 0;
-	    return false;
-	  }
-      }
+  if (gimple_vuse (stmt)
+      && !gimple_could_trap_p_1 (stmt, false, false))
+    return false;
 
-  return true;
+  return gimple_could_trap_p (stmt);
 }
 
-/* Return true when the memory references of STMT won't trap in the
-   if-converted code.  There are two things that we have to check for:
-
-   - writes to memory occur to writable memory: if-conversion of
-   memory writes transforms the conditional memory writes into
-   unconditional writes, i.e. "if (cond) A[i] = foo" is transformed
-   into "A[i] = cond ? foo : A[i]", and as the write to memory may not
-   be executed at all in the original code, it may be a readonly
-   memory.  To check that A is not const-qualified, we check that
-   there exists at least an unconditional write to A in the current
-   function.
-
-   - reads or writes to memory are valid memory accesses for every
-   iteration.  To check that the memory accesses are correctly formed
-   and that we are allowed to read and write in these locations, we
-   check that the memory accesses to be if-converted occur at every
-   iteration unconditionally.  */
+/* Returns true when STMT has a memory reference not starting on a byte.  */
 
 static bool
-ifcvt_memrefs_wont_trap (gimple stmt, VEC (data_reference_p, heap) *refs)
+has_unaligned_memory_refs (gimple stmt)
 {
-  return write_memrefs_written_at_least_once (stmt, refs)
-    && memrefs_read_or_written_unconditionally (stmt, refs);
-}
-
-/* Wrapper around gimple_could_trap_p refined for the needs of the
-   if-conversion.  Try to prove that the memory accesses of STMT could
-   not trap in the innermost loop containing STMT.  */
+  int unsignedp, volatilep;
+  HOST_WIDE_INT bitsize, bitpos;
+  tree toffset;
+  enum machine_mode mode;
+  VEC (data_ref_loc, heap) *refs = VEC_alloc (data_ref_loc, heap, 3);
+  bool res = get_references_in_stmt (stmt, &refs);
+  unsigned i;
+  data_ref_loc *ref;
+
+  FOR_EACH_VEC_ELT (data_ref_loc, refs, i, ref)
+    {
+      get_inner_reference (*ref->pos, &bitsize, &bitpos, &toffset,
+			   &mode, &unsignedp, &volatilep, true);
 
-static bool
-ifcvt_could_trap_p (gimple stmt, VEC (data_reference_p, heap) *refs)
-{
-  if (gimple_vuse (stmt)
-      && !gimple_could_trap_p_1 (stmt, false, false)
-      && ifcvt_memrefs_wont_trap (stmt, refs))
-    return false;
+      if ((bitpos % BITS_PER_UNIT) != 0)
+	{
+	  res = true;
+	  break;
+	}
+    }
 
-  return gimple_could_trap_p (stmt);
+  VEC_free (data_ref_loc, heap, refs);
+  return res;
 }
 
 /* Return true when STMT is if-convertible.
@@ -621,8 +497,7 @@ ifcvt_could_trap_p (gimple stmt, VEC (data_reference_p, heap) *refs)
    - LHS is not var decl.  */
 
 static bool
-if_convertible_gimple_assign_stmt_p (gimple stmt,
-				     VEC (data_reference_p, heap) *refs)
+if_convertible_gimple_assign_stmt_p (gimple stmt)
 {
   tree lhs = gimple_assign_lhs (stmt);
   basic_block bb;
@@ -650,12 +525,20 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
 
   if (flag_tree_loop_if_convert_stores)
     {
-      if (ifcvt_could_trap_p (stmt, refs))
+      if (ifcvt_could_trap_p (stmt))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    fprintf (dump_file, "tree could trap...\n");
 	  return false;
 	}
+
+      if (has_unaligned_memory_refs (stmt))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, "uses misaligned memory...\n");
+	  return false;
+	}
+
       return true;
     }
 
@@ -690,7 +573,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
    - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
 
 static bool
-if_convertible_stmt_p (gimple stmt, VEC (data_reference_p, heap) *refs)
+if_convertible_stmt_p (gimple stmt)
 {
   switch (gimple_code (stmt))
     {
@@ -700,7 +583,7 @@ if_convertible_stmt_p (gimple stmt, VEC (data_reference_p, heap) *refs)
       return true;
 
     case GIMPLE_ASSIGN:
-      return if_convertible_gimple_assign_stmt_p (stmt, refs);
+      return if_convertible_gimple_assign_stmt_p (stmt);
 
     default:
       /* Don't know what to do with 'em so don't do anything.  */
@@ -1016,18 +899,6 @@ if_convertible_loop_p_1 (struct loop *loop,
   if (!res)
     return false;
 
-  if (flag_tree_loop_if_convert_stores)
-    {
-      data_reference_p dr;
-
-      for (i = 0; VEC_iterate (data_reference_p, *refs, i, dr); i++)
-	{
-	  dr->aux = XNEW (struct ifc_dr);
-	  DR_WRITTEN_AT_LEAST_ONCE (dr) = -1;
-	  DR_RW_UNCONDITIONALLY (dr) = -1;
-	}
-    }
-
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
@@ -1040,7 +911,7 @@ if_convertible_loop_p_1 (struct loop *loop,
       /* Check the if-convertibility of statements in predicated BBs.  */
       if (is_predicated (bb))
 	for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
-	  if (!if_convertible_stmt_p (gsi_stmt (itr), *refs))
+	  if (!if_convertible_stmt_p (gsi_stmt (itr)))
 	    return false;
     }
 
@@ -1101,15 +972,6 @@ if_convertible_loop_p (struct loop *loop)
   ddrs = VEC_alloc (ddr_p, heap, 25);
   res = if_convertible_loop_p_1 (loop, &refs, &ddrs);
 
-  if (flag_tree_loop_if_convert_stores)
-    {
-      data_reference_p dr;
-      unsigned int i;
-
-      for (i = 0; VEC_iterate (data_reference_p, refs, i, dr); i++)
-	free (dr->aux);
-    }
-
   free_data_refs (refs);
   free_dependence_relations (ddrs);
   return res;
@@ -1366,6 +1228,78 @@ insert_gimplified_predicates (loop_p loop)
     }
 }
 
+/* Insert at the beginning of the first basic block of the current
+   function a stack allocation of 64 bytes of memory and return a
+   pointer to this scratchpad memory.  */
+
+static tree
+create_scratchpad (void)
+{
+  basic_block bb = single_succ (ENTRY_BLOCK_PTR);
+  gimple_stmt_iterator gsi = gsi_after_labels (bb);
+
+  /* void *tmp = __builtin_alloca (64);  */
+  const char *name = "scratch_pad";
+  tree x = build_int_cst (integer_type_node, 64);
+  gimple stmt = gimple_build_call (built_in_decls[BUILT_IN_ALLOCA], 1, x);
+  tree var = create_tmp_var (ptr_type_node, name);
+  tree tmp = make_ssa_name (var, stmt);
+
+  add_referenced_var (var);
+  gimple_call_set_lhs (stmt, tmp);
+  SSA_NAME_DEF_STMT (tmp) = stmt;
+  update_stmt (stmt);
+
+  gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+  return tmp;
+}
+
+/* Returns a memory reference to the pointer defined by the
+   conditional expression: pointer = cond ? &A[i] : scratch_pad; and
+   inserts this code at GSI.  */
+
+static tree
+create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
+			   gimple_stmt_iterator *gsi)
+{
+  tree type = TREE_TYPE (ai);
+
+  tree pointer_to_type, address_of_ai, addr_expr, cond_expr;
+  tree pointer, star_pointer;
+  gimple addr_stmt, pointer_stmt;
+
+  /* address_of_ai = &A[i];  */
+  pointer_to_type = build_pointer_type (type);
+  address_of_ai = create_tmp_var (pointer_to_type, "_ifc_");
+  add_referenced_var (address_of_ai);
+  addr_expr = build_fold_addr_expr (ai);
+  addr_stmt = gimple_build_assign (address_of_ai, addr_expr);
+  address_of_ai = make_ssa_name (address_of_ai, addr_stmt);
+  gimple_assign_set_lhs (addr_stmt, address_of_ai);
+  SSA_NAME_DEF_STMT (address_of_ai) = addr_stmt;
+  update_stmt (addr_stmt);
+  gsi_insert_before (gsi, addr_stmt, GSI_SAME_STMT);
+
+  /* Allocate the scratch pad only once per function.  */
+  if (!*scratch_pad)
+    *scratch_pad = create_scratchpad ();
+
+  /* pointer = cond ? address_of_ai : scratch_pad;  */
+  pointer = create_tmp_var (pointer_to_type, "_ifc_");
+  add_referenced_var (pointer);
+  cond_expr = build3 (COND_EXPR, pointer_to_type, unshare_expr (cond),
+		      address_of_ai, *scratch_pad);
+  pointer_stmt = gimple_build_assign (pointer, cond_expr);
+  pointer = make_ssa_name (pointer, pointer_stmt);
+  gimple_assign_set_lhs (pointer_stmt, pointer);
+  SSA_NAME_DEF_STMT (pointer) = pointer_stmt;
+  update_stmt (pointer_stmt);
+  gsi_insert_before (gsi, pointer_stmt, GSI_SAME_STMT);
+
+  star_pointer = build_simple_mem_ref (pointer);
+  return star_pointer;
+}
+
 /* Predicate each write to memory in LOOP.
 
    This function transforms control flow constructs containing memory
@@ -1377,10 +1311,19 @@ insert_gimplified_predicates (loop_p loop)
 
    into the following form that does not contain control flow:
 
-   | for (i = 0; i < N; i++)
-   |   A[i] = cond ? expr : A[i];
+   | void *scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
+   |
+   | for (i = 0; i < N; i++) {
+   |   p = cond ? &A[i] : scratch_pad;
+   |   *p = expr;
+   | }
+
+   SCRATCH_PAD is allocated on the stack once per function and it is
+   large enough to hold any scalar that is read or written.  Values
+   read from or written to SCRATCH_PAD are never used in the computation.
 
-   The original CFG looks like this:
+   In a more detailed way, the if-conversion of memory writes works
+   like this, supposing that the original CFG looks like this:
 
    | bb_0
    |   i = 0
@@ -1430,10 +1373,12 @@ insert_gimplified_predicates (loop_p loop)
    |   goto bb_1
    | end_bb_4
 
-   predicate_mem_writes is then predicating the memory write as follows:
+   predicate_mem_writes then allocates SCRATCH_PAD in the basic block
+   preceding the loop header, and predicates the memory write:
 
    | bb_0
    |   i = 0
+   |   scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
    | end_bb_0
    |
    | bb_1
@@ -1441,12 +1386,14 @@ insert_gimplified_predicates (loop_p loop)
    | end_bb_1
    |
    | bb_2
+   |   cond = some_computation;
    |   if (cond) goto bb_3 else goto bb_4
    | end_bb_2
    |
    | bb_3
    |   cond = some_computation;
-   |   A[i] = cond ? expr : A[i];
+   |   p = cond ? &A[i] : scratch_pad;
+   |   *p = expr;
    |   goto bb_4
    | end_bb_3
    |
@@ -1459,12 +1406,14 @@ insert_gimplified_predicates (loop_p loop)
 
    | bb_0
    |   i = 0
+   |   scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
    |   if (i < N) goto bb_5 else goto bb_1
    | end_bb_0
    |
    | bb_1
    |   cond = some_computation;
-   |   A[i] = cond ? expr : A[i];
+   |   p = cond ? &A[i] : scratch_pad;
+   |   *p = expr;
    |   if (i < N) goto bb_5 else goto bb_4
    | end_bb_1
    |
@@ -1474,7 +1423,7 @@ insert_gimplified_predicates (loop_p loop)
 */
 
 static void
-predicate_mem_writes (loop_p loop)
+predicate_mem_writes (loop_p loop, tree *scratch_pad)
 {
   unsigned int i, orig_loop_num_nodes = loop->num_nodes;
 
@@ -1489,20 +1438,35 @@ predicate_mem_writes (loop_p loop)
 	continue;
 
       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-	if ((stmt = gsi_stmt (gsi))
-	    && gimple_assign_single_p (stmt)
-	    && gimple_vdef (stmt))
-	  {
-	    tree lhs = gimple_assign_lhs (stmt);
-	    tree rhs = gimple_assign_rhs1 (stmt);
-	    tree type = TREE_TYPE (lhs);
-
-	    lhs = ifc_temp_var (type, unshare_expr (lhs), &gsi);
-	    rhs = ifc_temp_var (type, unshare_expr (rhs), &gsi);
-	    rhs = build3 (COND_EXPR, type, unshare_expr (cond), rhs, lhs);
-	    gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
-	    update_stmt (stmt);
-	  }
+	{
+	  stmt = gsi_stmt (gsi);
+	  if (gimple_assign_single_p (stmt)
+	      && gimple_vdef (stmt))
+	    {
+	      /* A[i] = x;  */
+	      tree ai = gimple_assign_lhs (stmt);
+
+	      /* pointer = cond ? &A[i] : scratch_pad;  */
+	      tree star_pointer = create_indirect_cond_expr (ai, cond,
+							     scratch_pad, &gsi);
+	      /* *pointer = x;  */
+	      gimple_assign_set_lhs (stmt, star_pointer);
+	      update_stmt (stmt);
+	    }
+	  else if (gimple_assign_single_p (stmt)
+		   && gimple_vuse (stmt))
+	    {
+	      /* x = A[i];  */
+	      tree ai = gimple_assign_rhs1 (stmt);
+
+	      /* pointer = cond ? &A[i] : scratch_pad;  */
+	      tree star_pointer = create_indirect_cond_expr (ai, cond,
+							     scratch_pad, &gsi);
+	      /* x = *pointer;  */
+	      gimple_assign_set_rhs1 (stmt, star_pointer);
+	      update_stmt (stmt);
+	    }
+	}
     }
 }
 
@@ -1552,7 +1516,7 @@ remove_conditions_and_labels (loop_p loop)
    blocks.  Replace PHI nodes with conditional modify expressions.  */
 
 static void
-combine_blocks (struct loop *loop)
+combine_blocks (struct loop *loop, tree *scratch_pad)
 {
   basic_block bb, exit_bb, merge_target_bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
@@ -1565,7 +1529,7 @@ combine_blocks (struct loop *loop)
   predicate_all_scalar_phis (loop);
 
   if (flag_tree_loop_if_convert_stores)
-    predicate_mem_writes (loop);
+    predicate_mem_writes (loop, scratch_pad);
 
   /* Merge basic blocks: first remove all the edges in the loop,
      except for those from the exit block.  */
@@ -1654,7 +1618,7 @@ combine_blocks (struct loop *loop)
    profitability analysis.  Returns true when something changed.  */
 
 static bool
-tree_if_conversion (struct loop *loop)
+tree_if_conversion (struct loop *loop, tree *scratch_pad)
 {
   bool changed = false;
   ifc_bbs = NULL;
@@ -1666,7 +1630,7 @@ tree_if_conversion (struct loop *loop)
   /* Now all statements are if-convertible.  Combine all the basic
      blocks into one huge basic block doing the if-conversion
      on-the-fly.  */
-  combine_blocks (loop);
+  combine_blocks (loop, scratch_pad);
 
   if (flag_tree_loop_if_convert_stores)
     mark_sym_for_renaming (gimple_vop (cfun));
@@ -1697,12 +1661,13 @@ main_tree_if_conversion (void)
   struct loop *loop;
   bool changed = false;
   unsigned todo = 0;
+  tree scratch_pad = NULL_TREE;
 
   if (number_of_loops () <= 1)
     return 0;
 
   FOR_EACH_LOOP (li, loop, 0)
-    changed |= tree_if_conversion (loop);
+    changed |= tree_if_conversion (loop, &scratch_pad);
 
   if (changed)
     todo |= TODO_cleanup_cfg;
-- 
1.7.0.4
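
A note on the store predication in this patch: here is a minimal
source-level sketch, assuming an int array and a one-element local
that stands in for the alloca'd scratch pad.  The function names are
hypothetical, and the real transformation is done on GIMPLE by
predicate_mem_writes, not on C source.

```c
#include <string.h>

#define N 8

/* Original form: the store is guarded by control flow.  */
void
store_with_branch (int *a, const int *cond, int expr)
{
  int i;

  for (i = 0; i < N; i++)
    if (cond[i])
      a[i] = expr;
}

/* If-converted form: the store is unconditional, but its address is
   selected by COND.  SCRATCH_PAD is a hypothetical stand-in for the
   stack memory that the patch obtains from alloca.  */
void
store_with_scratch_pad (int *a, const int *cond, int expr)
{
  int scratch_pad = 0;
  int i;

  for (i = 0; i < N; i++)
    {
      int *p = cond[i] ? &a[i] : &scratch_pad;
      *p = expr;
    }
}

/* Return 1 when both forms leave the array in the same state.  */
int
both_forms_agree (void)
{
  const int cond[N] = { 1, 0, 1, 0, 0, 1, 1, 0 };
  int a[N] = { 0 };
  int b[N] = { 0 };

  store_with_branch (a, cond, 42);
  store_with_scratch_pad (b, cond, 42);
  return memcmp (a, b, sizeof a) == 0;
}
```

Both functions leave the array in the same state.  The second one has
no branch around the store and never touches A[i] when the condition
is false, unlike the "A[i] = cond ? expr : A[i]" form it replaces.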


* [PATCH 2/3] if-convert even when the data dependences cannot be computed.
  2010-11-03 15:18 ` [PATCH 0/6] Loop flattening and improved if-conversion Richard Guenther
@ 2010-11-03 15:53   ` Sebastian Pop
  2010-11-03 20:47     ` Richard Guenther
  2010-11-03 15:53   ` [PATCH 1/3] Fix PR46029: reimplement if-convert stores Sebastian Pop
  2010-11-03 15:54   ` [PATCH 3/3] Loop flattening on loop-SSA Sebastian Pop
  2 siblings, 1 reply; 41+ messages in thread
From: Sebastian Pop @ 2010-11-03 15:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, Sebastian Pop

2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>

	* tree-if-conv.c (if_convertible_loop_p_1): Do not call
	compute_data_dependences_for_loop.
	(if_convertible_loop_p): Do not free refs and ddrs.
---
 gcc/ChangeLog      |    6 ++++++
 gcc/tree-if-conv.c |   24 +++---------------------
 2 files changed, 9 insertions(+), 21 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 0f58882..3ceb7b6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,11 @@
 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
 
+	* tree-if-conv.c (if_convertible_loop_p_1): Do not call
+	compute_data_dependences_for_loop.
+	(if_convertible_loop_p): Do not free refs and ddrs.
+
+2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
+
 	PR tree-optimization/46029
 	* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
 	* tree-if-conv.c (has_unaligned_memory_refs): New.
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 9fc6190..5b941af 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -855,24 +855,15 @@ predicate_bbs (loop_p loop)
 }
 
 /* Return true when LOOP is if-convertible.  This is a helper function
-   for if_convertible_loop_p.  REFS and DDRS are initialized and freed
-   in if_convertible_loop_p.  */
+   for if_convertible_loop_p.  */
 
 static bool
-if_convertible_loop_p_1 (struct loop *loop,
-			 VEC (data_reference_p, heap) **refs,
-			 VEC (ddr_p, heap) **ddrs)
+if_convertible_loop_p_1 (struct loop *loop)
 {
   bool res;
   unsigned int i;
   basic_block exit_bb = NULL;
 
-  /* Don't if-convert the loop when the data dependences cannot be
-     computed: the loop won't be vectorized in that case.  */
-  res = compute_data_dependences_for_loop (loop, true, refs, ddrs);
-  if (!res)
-    return false;
-
   calculate_dominance_info (CDI_DOMINATORS);
 
   /* Allow statements that can be handled during if-conversion.  */
@@ -934,9 +925,6 @@ if_convertible_loop_p (struct loop *loop)
 {
   edge e;
   edge_iterator ei;
-  bool res = false;
-  VEC (data_reference_p, heap) *refs;
-  VEC (ddr_p, heap) *ddrs;
 
   /* Handle only innermost loop.  */
   if (!loop || loop->inner)
@@ -968,13 +956,7 @@ if_convertible_loop_p (struct loop *loop)
     if (loop_exit_edge_p (loop, e))
       return false;
 
-  refs = VEC_alloc (data_reference_p, heap, 5);
-  ddrs = VEC_alloc (ddr_p, heap, 25);
-  res = if_convertible_loop_p_1 (loop, &refs, &ddrs);
-
-  free_data_refs (refs);
-  free_dependence_relations (ddrs);
-  return res;
+  return if_convertible_loop_p_1 (loop);
 }
 
 /* Basic block BB has two predecessors.  Using predecessor's bb
-- 
1.7.0.4


* [PATCH 3/3] Loop flattening on loop-SSA.
  2010-11-03 15:18 ` [PATCH 0/6] Loop flattening and improved if-conversion Richard Guenther
  2010-11-03 15:53   ` [PATCH 2/3] if-convert even when the data dependences cannot be computed Sebastian Pop
  2010-11-03 15:53   ` [PATCH 1/3] Fix PR46029: reimplement if-convert stores Sebastian Pop
@ 2010-11-03 15:54   ` Sebastian Pop
  2010-11-03 16:57     ` Nathan Froyd
  2010-11-05 13:05     ` Richard Guenther
  2 siblings, 2 replies; 41+ messages in thread
From: Sebastian Pop @ 2010-11-03 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, Sebastian Pop

2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>

	* Makefile.in (OBJS-common): Add tree-loop-flattening.o.
	(tree-loop-flattening.o): New.
	* common.opt (ftree-loop-flatten): New.
	* dbgcnt.def (lflat): New.
	* params.def (PARAM_LFLAT_MAX_NB_BBS): New.
	* passes.c (init_optimization_passes): Add new pass
	pass_flatten_loops after pass_iv_optimize and before
	pass_tree_loop_done.
	* timevar.def (TV_TREE_LOOP_FLATTENING): New.
	* tree-loop-flattening.c: New.
	* tree-pass.h (pass_flatten_loops): Declared.
	* tree-flow.h (gate_tree_if_conversion): Declared.
	(tree_if_conversion): Declared.
	* tree-if-conv.c (tree_if_conversion): Not static anymore.
	(gate_tree_if_conversion): Same.

	* gcc.dg/tree-ssa/flat-loop-1.c: New.
	* gcc.dg/tree-ssa/flat-loop-2.c: New.
	* gcc.dg/tree-ssa/flat-loop-3.c: New.
	* gcc.dg/tree-ssa/flat-loop-4.c: New.
---
 gcc/ChangeLog                               |   18 +
 gcc/Makefile.in                             |    4 +
 gcc/common.opt                              |    4 +
 gcc/dbgcnt.def                              |    1 +
 gcc/params.def                              |    7 +
 gcc/passes.c                                |    1 +
 gcc/testsuite/ChangeLog                     |    7 +
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c |   28 ++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c |   39 ++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c |   19 +
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c |   23 +
 gcc/timevar.def                             |    1 +
 gcc/tree-flow.h                             |    4 +
 gcc/tree-if-conv.c                          |    4 +-
 gcc/tree-loop-flattening.c                  |  630 +++++++++++++++++++++++++++
 gcc/tree-pass.h                             |    1 +
 16 files changed, 789 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c
 create mode 100644 gcc/tree-loop-flattening.c
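
To illustrate the back-edge removal described in the header comment of
tree-loop-flattening.c, here is a hand-written C analogue, loosely
modeled on flat-loop-3.c.  It is only a sketch: the pass works on the
GIMPLE CFG, and the function names and the FLAG variable are
hypothetical.

```c
/* Original form: an inner loop nested in an outer loop.  */
int
count_nested (const char *p)
{
  int num = 0;

  while (*p != '\0')
    {
      p++;
      num++;
      while (*p == '/')
	p++;
    }
  return num;
}

/* Flattened form: the inner back-edge is redirected to the latch of
   the outer loop.  FLAG is cleared when the redirected back-edge is
   taken and set on every other path to the latch.  */
int
count_flat (const char *p)
{
  int num = 0;
  int flag = 1;

  for (;;)
    {
      /* Split header: run the outer header statements only when the
	 latch was not reached through the redirected back-edge.  */
      if (flag)
	{
	  if (*p == '\0')
	    break;
	  p++;
	  num++;
	}

      /* Former inner loop header, destination of the back-edge.  */
      if (*p == '/')
	{
	  p++;
	  flag = 0;
	}
      else
	flag = 1;
    }
  return num;
}
```

Both functions return the same count for any string; the flattened
version has a single back-edge, that of the outer loop.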

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3ceb7b6..f312b27 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,23 @@
 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
 
+	* Makefile.in (OBJS-common): Add tree-loop-flattening.o.
+	(tree-loop-flattening.o): New.
+	* common.opt (ftree-loop-flatten): New.
+	* dbgcnt.def (lflat): New.
+	* params.def (PARAM_LFLAT_MAX_NB_BBS): New.
+	* passes.c (init_optimization_passes): Add new pass
+	pass_flatten_loops after pass_iv_optimize and before
+	pass_tree_loop_done.
+	* timevar.def (TV_TREE_LOOP_FLATTENING): New.
+	* tree-loop-flattening.c: New.
+	* tree-pass.h (pass_flatten_loops): Declared.
+	* tree-flow.h (gate_tree_if_conversion): Declared.
+	(tree_if_conversion): Declared.
+	* tree-if-conv.c (tree_if_conversion): Not static anymore.
+	(gate_tree_if_conversion): Same.
+
+2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
+
 	* tree-if-conv.c (if_convertible_loop_p_1): Do not call
 	compute_data_dependences_for_loop.
 	(if_convertible_loop_p): Do not free refs and ddrs.
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 898e962..55b67f4 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1368,6 +1368,7 @@ OBJS-common = \
 	tree-into-ssa.o \
 	tree-iterator.o \
 	tree-loop-distribution.o \
+	tree-loop-flattening.o \
 	tree-loop-linear.o \
 	tree-nested.o \
 	tree-nrv.o \
@@ -2773,6 +2774,9 @@ tree-loop-distribution.o: tree-loop-distribution.c $(CONFIG_H) $(SYSTEM_H) coret
    $(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) \
    $(TREE_PASS_H) $(TREE_DATA_REF_H) $(EXPR_H) \
    langhooks.h $(TREE_VECTORIZER_H)
+tree-loop-flattening.o: tree-loop-flattening.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
+   $(TM_H) $(OPTABS_H) $(TREE_H) $(BASIC_BLOCK_H) $(DIAGNOSTIC_H) $(TREE_FLOW_H) \
+   $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) $(TREE_PASS_H) $(DBGCNT_H)
 tree-parloops.o: tree-parloops.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
    $(TREE_FLOW_H) $(TREE_H) $(CFGLOOP_H) $(TREE_DATA_REF_H) \
    $(DIAGNOSTIC_H) $(TREE_PASS_H) langhooks.h gt-tree-parloops.h \
diff --git a/gcc/common.opt b/gcc/common.opt
index 8fe796f..c969979 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1632,6 +1632,10 @@ ftree-loop-distribute-patterns
 Common Report Var(flag_tree_loop_distribute_patterns) Optimization
 Enable loop distribution for patterns transformed into a library call
 
+ftree-loop-flatten
+Common Report Var(flag_tree_loop_flattening) Optimization
+Enable loop flattening on trees
+
 ftree-loop-im
 Common Report Var(flag_tree_loop_im) Init(1) Optimization
 Enable loop invariant motion on trees
diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index 0492d66..0ef9a72 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -166,6 +166,7 @@ DEBUG_COUNTER (if_conversion_tree)
 DEBUG_COUNTER (if_after_combine)
 DEBUG_COUNTER (if_after_reload)
 DEBUG_COUNTER (local_alloc_for_sched)
+DEBUG_COUNTER (lflat)
 DEBUG_COUNTER (postreload_cse)
 DEBUG_COUNTER (pre)
 DEBUG_COUNTER (pre_insn)
diff --git a/gcc/params.def b/gcc/params.def
index 49a6185..3fffc35 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -788,6 +788,13 @@ DEFPARAM (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION,
 	  "maximum number of basic blocks per function to be analyzed by Graphite",
 	  100, 0, 0)
 
+/* Maximal number of basic blocks in a loop to be flattened.  */
+
+DEFPARAM (PARAM_LFLAT_MAX_NB_BBS,
+	  "lflat-max-nb-bbs",
+	  "maximum number of basic blocks in a loop to be flattened",
+	  100, 0, 0)
+
 /* Avoid doing loop invariant motion on very large loops.  */
 
 DEFPARAM (PARAM_LOOP_INVARIANT_MAX_BBS_IN_LOOP,
diff --git a/gcc/passes.c b/gcc/passes.c
index 1308ce9..22110a4 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -917,6 +917,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_parallelize_loops);
 	  NEXT_PASS (pass_loop_prefetch);
 	  NEXT_PASS (pass_iv_optimize);
+	  NEXT_PASS (pass_flatten_loops);
 	  NEXT_PASS (pass_tree_loop_done);
 	}
       NEXT_PASS (pass_cse_reciprocals);
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 4233f86..2b3b93e 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,12 @@
 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
 
+	* gcc.dg/tree-ssa/flat-loop-1.c: New.
+	* gcc.dg/tree-ssa/flat-loop-2.c: New.
+	* gcc.dg/tree-ssa/flat-loop-3.c: New.
+	* gcc.dg/tree-ssa/flat-loop-4.c: New.
+
+2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
+
 	PR tree-optimization/46029
 	* g++.dg/tree-ssa/ifc-pr46029.C: New.
 	* gcc.dg/tree-ssa/ifc-8.c: New.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c
new file mode 100644
index 0000000..bee8a2b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-flatten" } */
+
+struct stack_segment
+{
+  struct dynamic_allocation_blocks *dynamic_allocation;
+};
+struct dynamic_allocation_blocks
+{
+  struct dynamic_allocation_blocks *next;
+};
+static struct dynamic_allocation_blocks *
+merge_dynamic_blocks (struct dynamic_allocation_blocks *a,
+		      struct dynamic_allocation_blocks *b)
+{
+  struct dynamic_allocation_blocks **pp;
+  for (pp = &a->next; *pp != ((void *)0); pp = &(*pp)->next)
+    *pp = b;
+  return a;
+}
+__morestack_release_segments (struct stack_segment **pp, int free_dynamic)
+{
+  struct dynamic_allocation_blocks *ret;
+  struct stack_segment *pss;
+  pss = *pp;
+  while (pss != ((void *)0))
+    ret = merge_dynamic_blocks (pss->dynamic_allocation, ret);
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
new file mode 100644
index 0000000..a7287fb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-flatten" } */
+
+struct stack_segment
+{
+  struct stack_segment *next;
+  struct dynamic_allocation_blocks *dynamic_allocation;
+};
+struct dynamic_allocation_blocks
+{
+  struct dynamic_allocation_blocks *next;
+};
+static struct dynamic_allocation_blocks *
+merge_dynamic_blocks (struct dynamic_allocation_blocks *a,
+        struct dynamic_allocation_blocks *b)
+{
+  struct dynamic_allocation_blocks **pp;
+  if (b == ((void *)0))
+  for (pp = &a->next; *pp != ((void *)0); pp = &(*pp)->next)
+    ;
+  return a;
+}
+__morestack_release_segments (struct stack_segment **pp, int free_dynamic)
+{
+  struct dynamic_allocation_blocks *ret;
+  struct stack_segment *pss;
+  while (pss != ((void *)0))
+    {
+      struct stack_segment *next;
+      next = pss->next;
+ {
+   if (free_dynamic)
+     {
+       ret = merge_dynamic_blocks (pss->dynamic_allocation, ret);
+     }
+ }
+      pss = next;
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
new file mode 100644
index 0000000..d3d66ab
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-flatten" } */
+
+
+int
+split_directories (const char *name, int *ptr_num_dirs)
+{
+  int num_dirs = 0;
+  char **dirs;
+  const char *p, *q;
+  int ch;
+  while ((ch = *p++) != '\0')
+    {
+   num_dirs++;
+   while (((*p) == '/'))
+     p++;
+    }
+  return (dirs[num_dirs - 1] == ((void *)0));
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c
new file mode 100644
index 0000000..8e551ac
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-flatten" } */
+
+void
+formatted_backspace (int common, char *s)
+{
+  int base;
+  int n;
+  do
+    {
+      if (sseek (s, base, 0) < 0)
+	goto io_error;
+
+      while (n > 0)
+	{
+          n--;
+	  base += n + 1;
+	}
+    }
+  while (base != 0);
+ io_error:
+  generate_error (common, 0, ((void *)0));
+}
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 86e2999..89ff8e8 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -152,6 +152,7 @@ DEFTIMEVAR (TV_GRAPHITE_DATA_DEPS    , "Graphite data dep analysis")
 DEFTIMEVAR (TV_GRAPHITE_CODE_GEN     , "Graphite code generation")
 DEFTIMEVAR (TV_TREE_LINEAR_TRANSFORM , "tree loop linear")
 DEFTIMEVAR (TV_TREE_LOOP_DISTRIBUTION, "tree loop distribution")
+DEFTIMEVAR (TV_TREE_LOOP_FLATTENING  , "tree loop flattening")
 DEFTIMEVAR (TV_CHECK_DATA_DEPS       , "tree check data dependences")
 DEFTIMEVAR (TV_TREE_PREFETCH	     , "tree prefetching")
 DEFTIMEVAR (TV_TREE_LOOP_IVOPTS	     , "tree iv optimization")
diff --git a/gcc/tree-flow.h b/gcc/tree-flow.h
index c2702dc..e1ee69f 100644
--- a/gcc/tree-flow.h
+++ b/gcc/tree-flow.h
@@ -730,6 +730,10 @@ bool contains_abnormal_ssa_name_p (tree);
 bool stmt_dominates_stmt_p (gimple, gimple);
 void mark_virtual_ops_for_renaming (gimple);
 
+/* In tree-if-conv.c */
+bool gate_tree_if_conversion (void);
+bool tree_if_conversion (struct loop *, tree *);
+
 /* In tree-ssa-dce.c */
 void mark_virtual_phi_result_for_renaming (gimple);
 
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 5b941af..3c30abb 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -1599,7 +1599,7 @@ combine_blocks (struct loop *loop, tree *scratch_pad)
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns true when something changed.  */
 
-static bool
+bool
 tree_if_conversion (struct loop *loop, tree *scratch_pad)
 {
   bool changed = false;
@@ -1662,7 +1662,7 @@ main_tree_if_conversion (void)
 
 /* Returns true when the if-conversion pass is enabled.  */
 
-static bool
+bool
 gate_tree_if_conversion (void)
 {
   return ((flag_tree_vectorize && flag_tree_loop_if_convert != 0)
diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
new file mode 100644
index 0000000..4bc8768
--- /dev/null
+++ b/gcc/tree-loop-flattening.c
@@ -0,0 +1,630 @@
+/* Loop flattening.
+   Copyright (C) 2010 Free Software Foundation, Inc.
+   Contributed by Sebastian Pop <sebastian.pop@amd.com>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "ggc.h"
+#include "tree.h"
+#include "rtl.h"
+#include "output.h"
+#include "basic-block.h"
+#include "diagnostic.h"
+#include "tree-flow.h"
+#include "toplev.h"
+#include "tree-dump.h"
+#include "timevar.h"
+#include "cfgloop.h"
+#include "tree-pass.h"
+#include "gimple.h"
+#include "params.h"
+#include "dbgcnt.h"
+
+/* This loop flattening pass transforms backward pointing edges into
+   forward pointing edges.
+
+   The back-edge removal transformation was described in the 1983
+   paper by J. R. Allen, Ken Kennedy, Carrie Porterfield, and Joe
+   Warren: "Conversion of control dependence to data dependence"
+   available from http://doi.acm.org/10.1145/567067.567085
+
+   The back-edge removal algorithm was presented in that paper as part
+   of the if-conversion algorithm for backward pointing edges.  In
+   this section we will first provide a description of this technique
+   adapted for the Gimple-SSA form, followed by an example, and a
+   discussion of the differences with the higher level loop flattening
+   transformation.
+
+   The back-edge removal algorithm transforms control dependences into
+   data dependences by using a boolean variable.  The values taken by
+   the boolean variable control the execution path of the forward
+   edges created in order to use the back-edge of an outer loop.
+
+   The first step of the algorithm detects a surrounding loop and all
+   the back-edges of the loop body: these back-edges can be inner
+   loops or strongly connected components of the CFG that cannot be
+   reduced to natural loops.
+
+   Each back-edge is removed by redirecting the target of the
+   back-edge to the latch basic block of the surrounding loop.  A
+   boolean variable is created in the latch.  It is cleared when the
+   redirected back-edge is taken and it is set to true for any other
+   paths leading to the latch.
+
+   The header basic block of the surrounding loop is split before its
+   statements and a new condition is added based on the control
+   variable: when the control variable is set to true, the execution
+   proceeds as normal to the basic block that contains the statements
+   of the header; when the control variable is cleared, meaning that
+   the back-edge has been taken, the execution proceeds to the point
+   where the redirected back-edge was pointing.
+
+   The last step updates the SSA form after all the back-edges have
+   been redirected to the latch, and the new edges from the header to
+   the destination of back-edges have been created.
+
+   Another, Fortran-specific description of loop flattening is in the
+   1992 paper by Reinhard von Hanxleden and Ken Kennedy:
+   "Relaxing SIMD Control Flow Constraints using Loop Transformations"
+   available from
+   http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.5033 */
+
+/* Keep the loop structure for LOOP and remove all the loop structures
+   under LOOP.  */
+
+static void
+cancel_subloops (loop_p loop)
+{
+  int i;
+  loop_p li;
+  VEC (loop_p, heap) *lv = VEC_alloc (loop_p, heap, 3);
+
+  for (li = loop->inner; li; li = li->next)
+    VEC_safe_push (loop_p, heap, lv, li);
+
+  FOR_EACH_VEC_ELT (loop_p, lv, i, li)
+    cancel_loop_tree (li);
+
+  VEC_free (loop_p, heap, lv);
+}
+
+/* Before creating other phi nodes in LOOP->header for the control
+   flags, update the phi nodes of LOOP->header and add the necessary
+   phi nodes in the LOOP->latch that now contains several paths on
+   which the values are not updated.  PRED_E is the single edge that
+   was pointing to the LOOP->latch basic block before inner back-edges
+   were redirected to the LOOP->latch.  */
+
+static void
+update_loop_phi_nodes (loop_p loop, edge pred_e)
+{
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_phis (loop->header); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      edge e;
+      edge_iterator ei;
+      gimple phi = gsi_stmt (gsi);
+      tree back_arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
+      tree res = gimple_phi_result (phi);
+      tree var = SSA_NAME_VAR (res);
+
+      phi = create_phi_node (var, loop->latch);
+      create_new_def_for (gimple_phi_result (phi), phi,
+			  gimple_phi_result_ptr (phi));
+
+      FOR_EACH_EDGE (e, ei, loop->latch->preds)
+	add_phi_arg (phi, (e == pred_e ? back_arg : res),
+		     e, UNKNOWN_LOCATION);
+
+      res = gimple_phi_result (phi);
+      add_phi_arg (gsi_stmt (gsi), res, loop_latch_edge (loop),
+		   UNKNOWN_LOCATION);
+    }
+}
+
+/* Creates a control flag for the FORWARDED_EDGE that represents the
+   back-edge that has been forwarded to the latch basic block of LOOP.
+   INNER_BODY is the basic block to which the back-edge was pointing
+   before redirection.  This function creates a boolean control flag
+   that is cleared when the FORWARDED_EDGE is taken and set for all
+   the other paths.  This function adds the corresponding phi nodes in
+   LOOP->latch and LOOP->header, and finally adds an edge from
+   LOOP->header to the INNER_BODY guarded by the control flag.  */
+
+static void
+create_control_flag (edge forwarded_edge, loop_p loop, basic_block inner_body)
+{
+  edge e, preheader;
+  edge outer_latch_e = loop_latch_edge (loop);
+  const char *name = "_flat_";
+  tree var = create_tmp_var (boolean_type_node, name);
+  tree res;
+  gimple phi, cond_stmt;
+  gimple_stmt_iterator gsi;
+  edge_iterator ei;
+
+  /* Adds a control variable for the redirected FORWARDED_EDGE.  */
+  add_referenced_var (var);
+  phi = create_phi_node (var, forwarded_edge->dest);
+  create_new_def_for (gimple_phi_result (phi), phi,
+		      gimple_phi_result_ptr (phi));
+
+  FOR_EACH_EDGE (e, ei, outer_latch_e->src->preds)
+    add_phi_arg (phi, (e == forwarded_edge
+		       ? boolean_false_node
+		       : boolean_true_node),
+		 e, UNKNOWN_LOCATION);
+  res = gimple_phi_result (phi);
+
+  /* Add a phi node in LOOP->header for the control variable.  */
+  phi = create_phi_node (var, loop->header);
+  create_new_def_for (gimple_phi_result (phi), phi,
+		      gimple_phi_result_ptr (phi));
+
+  preheader = loop_preheader_edge (loop);
+  FOR_EACH_EDGE (e, ei, loop->header->preds)
+    add_phi_arg (phi, (e == preheader
+		       ? boolean_true_node
+		       : res),
+		 e, UNKNOWN_LOCATION);
+  res = gimple_phi_result (phi);
+
+  /* Split LOOP->header to insert the control variable condition.  */
+  e = split_block_after_labels (loop->header);
+  e->flags = EDGE_TRUE_VALUE;
+  e = make_edge (loop->header, inner_body, EDGE_FALSE_VALUE);
+  cond_stmt = gimple_build_cond (EQ_EXPR, res, boolean_true_node,
+				 NULL_TREE, NULL_TREE);
+  gsi = gsi_last_bb (loop->header);
+  gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
+}
+
+/* Adds phi nodes to the LOOP->header and LOOP->latch for the ssa_name
+   NAME.  ARG is the argument of the latch phi node set for the
+   FORWARDED_EDGE, and all the other edges merged by the latch phi
+   node are set to the result of the LOOP->header phi node.  The latch
+   edge of the LOOP->header phi node is set to the result of the
+   LOOP->latch phi node, and the other argument is set to an arbitrary
+   valid value defined before the loop (note that this initial value
+   is never used in the loop).  Returns the LOOP->header phi result.  */
+
+static tree
+add_header_and_latch_phis (loop_p loop, tree name, edge forwarded_edge,
+			   tree arg)
+{
+  edge e;
+  edge_iterator ei;
+  tree res, zero, var = SSA_NAME_VAR (name);
+  gimple loop_phi = create_phi_node (var, loop->header);
+  gimple latch_phi = create_phi_node (var, loop->latch);
+
+  create_new_def_for (gimple_phi_result (loop_phi), loop_phi,
+		      gimple_phi_result_ptr (loop_phi));
+  create_new_def_for (gimple_phi_result (latch_phi), latch_phi,
+		      gimple_phi_result_ptr (latch_phi));
+
+  /* The value set to ZERO will never be used in the loop, however we
+     have to construct something meaningful for virtual SSA_NAMEs.  */
+  if (TREE_CODE (arg) != SSA_NAME)
+    zero = arg;
+  else if (is_gimple_reg (arg))
+    zero = fold_convert (TREE_TYPE (arg), integer_zero_node);
+  else
+    zero = gimple_default_def (cfun, SSA_NAME_VAR (arg));
+
+  res = gimple_phi_result (latch_phi);
+  FOR_EACH_EDGE (e, ei, loop->header->preds)
+    add_phi_arg (loop_phi, (e == loop_latch_edge (loop) ? res : zero),
+		 e, UNKNOWN_LOCATION);
+
+  res = gimple_phi_result (loop_phi);
+  FOR_EACH_EDGE (e, ei, loop->latch->preds)
+    add_phi_arg (latch_phi, (e == forwarded_edge ? arg : res),
+		 e, UNKNOWN_LOCATION);
+
+  return res;
+}
+
+/* Creates phi nodes for each inductive definition, i.e., loop phi
+   nodes.  For each induction phi node in the old loop header, i.e.,
+   in the single_succ (INNER_BODY), insert a phi node in the
+   LOOP->latch that takes the updated value of the induction on the
+   FORWARDED_EDGE, and maintains the same value as in the phi node of
+   the LOOP->header for all the other possible paths reaching
+   LOOP->latch.  This function has to be called after all the
+   back-edges have been redirected.  */
+
+static void
+update_inner_induction_phi_nodes (edge forwarded_edge, loop_p loop,
+				  basic_block inner_body)
+{
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_phis (single_succ (inner_body));
+       !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple old_loop_phi = gsi_stmt (gsi);
+      tree back_arg = PHI_ARG_DEF_FROM_EDGE (old_loop_phi,
+					     single_succ_edge (inner_body));
+      tree res = gimple_phi_result (old_loop_phi);
+
+      res = add_header_and_latch_phis (loop, res, forwarded_edge, back_arg);
+      add_phi_arg (old_loop_phi, res, single_succ_edge (inner_body),
+		   UNKNOWN_LOCATION);
+    }
+}
+
+/* Renames all the uses of OLD_NAME with NEW_NAME (except the phi
+   nodes of DEF_BB) in all the basic blocks dominated by DEF_BB and in
+   the arguments of all the phi nodes originating in a basic block
+   that is dominated by DEF_BB.  */
+
+static void
+rename_dominated_uses (loop_p loop, tree old_name, tree new_name,
+		       basic_block def_bb)
+{
+  imm_use_iterator uit;
+  gimple stmt;
+  use_operand_p use_p;
+  ssa_op_iter op_iter;
+
+  FOR_EACH_IMM_USE_STMT (stmt, uit, old_name)
+    {
+      enum gimple_code code = gimple_code (stmt);
+      basic_block use_bb = gimple_bb (stmt);
+      edge_iterator ei;
+      edge e;
+
+      if (code == GIMPLE_PHI)
+	{
+	  FOR_EACH_EDGE (e, ei, use_bb->preds)
+	    if (PHI_ARG_DEF_FROM_EDGE (stmt, e) == old_name
+		&& dominated_by_p (CDI_DOMINATORS, e->src, def_bb)
+		&& use_bb != def_bb)
+	      replace_exp (gimple_phi_arg_imm_use_ptr (stmt, e->dest_idx),
+			   new_name);
+	}
+      else
+	{
+	  if (!dominated_by_p (CDI_DOMINATORS, use_bb, def_bb))
+	    continue;
+
+	  if (use_bb->loop_father == loop)
+	    {
+	      FOR_EACH_SSA_USE_OPERAND (use_p, stmt, op_iter, SSA_OP_ALL_USES)
+		if (USE_FROM_PTR (use_p) == old_name)
+		  replace_exp (use_p, new_name);
+	    }
+	  else
+	    /* Virtual operands are not translated into loop closed
+	       SSA form, and thus they may occur in the rest of
+	       the program without a loop close vphi node.  */
+	    FOR_EACH_SSA_USE_OPERAND (use_p, stmt, op_iter, SSA_OP_ALL_USES)
+	      if (USE_FROM_PTR (use_p) == old_name)
+		replace_exp (use_p, new_name);
+	}
+    }
+}
+
+/* Helper function for add_missing_phi_nodes_1.  Adds to LOOP all the
+   missing phi nodes for NAME, renaming the dominated uses of OLD_NAME.
+   PHIS maps a basic block index to the phi node created for NAME in
+   that block, or NULL when the block has no such phi node yet.  */
+
+static void
+add_missing_phi_nodes_2 (loop_p loop, tree name, tree old_name,
+			 VEC (gimple, heap) *phis)
+{
+  unsigned i;
+  basic_block bb, dom_bb = gimple_bb (SSA_NAME_DEF_STMT (name));
+  VEC (basic_block, heap) *dom_bbs = get_all_dominated_blocks (CDI_DOMINATORS,
+							       dom_bb);
+
+  FOR_EACH_VEC_ELT (basic_block, dom_bbs, i, bb)
+    {
+      edge e;
+      edge_iterator ei;
+
+      if (bb == loop->latch
+	  || bb->loop_father != loop)
+	continue;
+
+      FOR_EACH_EDGE (e, ei, bb->succs)
+	{
+	  gimple phi = VEC_index (gimple, phis, e->dest->index);
+
+	  if (phi)
+	    add_phi_arg (phi, name, e, UNKNOWN_LOCATION);
+
+	  else if (!single_pred_p (e->dest)
+		   && !dominated_by_p (CDI_DOMINATORS, e->dest, dom_bb)
+		   && e->dest->loop_father == loop)
+	  {
+	    tree var = SSA_NAME_VAR (name);
+
+	    phi = create_phi_node (var, e->dest);
+	    create_new_def_for (gimple_phi_result (phi), phi,
+				gimple_phi_result_ptr (phi));
+	    VEC_replace (gimple, phis, e->dest->index, phi);
+	    add_phi_arg (phi, name, e, UNKNOWN_LOCATION);
+	    rename_dominated_uses (loop, old_name, gimple_phi_result (phi),
+				   e->dest);
+	    add_missing_phi_nodes_2 (loop, gimple_phi_result (phi), old_name,
+				     phis);
+	  }
+	}
+    }
+}
+
+/* Helper function for add_missing_phi_nodes.  For all the definitions
+   of DEF_STMT add the missing phi nodes in LOOP.  */
+
+static void
+add_missing_phi_nodes_1 (loop_p loop, gimple def_stmt)
+{
+  def_operand_p def_p;
+  ssa_op_iter op_iter;
+  basic_block bb = gimple_bb (def_stmt);
+
+  FOR_EACH_PHI_OR_STMT_DEF (def_p, def_stmt, op_iter, SSA_OP_DEF|SSA_OP_VDEF)
+    {
+      edge e;
+      edge_iterator ei;
+      tree res, zero, var;
+      gimple loop_phi, latch_phi, use_stmt;
+      imm_use_iterator uit;
+      tree name = DEF_FROM_PTR (def_p);
+      bool needs_update = false;
+      VEC (gimple, heap) *phis;
+      int i;
+
+      FOR_EACH_IMM_USE_STMT (use_stmt, uit, name)
+	{
+	  basic_block use_bb = gimple_bb (use_stmt);
+
+	  if (!dominated_by_p (CDI_DOMINATORS, bb, use_bb))
+	    {
+	      needs_update = true;
+	      BREAK_FROM_IMM_USE_STMT (uit);
+	    }
+	}
+
+      if (!needs_update)
+	continue;
+
+      var = SSA_NAME_VAR (name);
+      loop_phi = create_phi_node (var, loop->header);
+      latch_phi = create_phi_node (var, loop->latch);
+
+      create_new_def_for (gimple_phi_result (loop_phi), loop_phi,
+			  gimple_phi_result_ptr (loop_phi));
+      create_new_def_for (gimple_phi_result (latch_phi), latch_phi,
+			  gimple_phi_result_ptr (latch_phi));
+
+      /* The value set to ZERO will never be used in the loop, however we
+	 have to construct something meaningful for virtual SSA_NAMEs.  */
+      if (is_gimple_reg (name))
+	zero = fold_convert (TREE_TYPE (name), integer_zero_node);
+      else
+	zero = gimple_default_def (cfun, SSA_NAME_VAR (name));
+
+      res = gimple_phi_result (latch_phi);
+      FOR_EACH_EDGE (e, ei, loop->header->preds)
+	add_phi_arg (loop_phi, (e == loop_latch_edge (loop) ? res : zero),
+		     e, UNKNOWN_LOCATION);
+
+      res = gimple_phi_result (loop_phi);
+      FOR_EACH_EDGE (e, ei, loop->latch->preds)
+	add_phi_arg (latch_phi, res, e, UNKNOWN_LOCATION);
+
+      phis = VEC_alloc (gimple, heap, n_basic_blocks);
+      for (i = 0; i < n_basic_blocks; i++)
+	VEC_quick_push (gimple, phis, NULL);
+
+      VEC_replace (gimple, phis, loop->latch->index, latch_phi);
+      VEC_replace (gimple, phis, loop->header->index, loop_phi);
+      add_missing_phi_nodes_2 (loop, name, name, phis);
+
+      for (i = 0; i < n_basic_blocks; i++)
+	{
+	  gimple phi = VEC_index (gimple, phis, i);
+
+	  if (!phi)
+	    continue;
+
+	  FOR_EACH_EDGE (e, ei, BASIC_BLOCK (i)->preds)
+	    if (!PHI_ARG_DEF_FROM_EDGE (phi, e))
+	      add_phi_arg (phi, res, e, UNKNOWN_LOCATION);
+	}
+
+      VEC_free (gimple, heap, phis);
+    }
+}
+
+/* Walks over the code of LOOP and adds the missing phi nodes at
+   control flow junctions.  When a variable is defined in an outer
+   loop and used in an inner loop, the definition dominates the use.
+   After the loop flattening, the inner loop body is directly
+   reachable from the LOOP->header by using the added edge guarded by
+   the boolean flag that controls the execution of the back-edge that
+   was eliminated.  In this case, the use is not dominated by the
+   definition, and this function adds the missing phi nodes.  */
+
+static void
+add_missing_phi_nodes (loop_p loop)
+{
+  gimple_stmt_iterator gsi;
+  int i, n = loop->num_nodes;
+  basic_block *bbs = get_loop_body (loop);
+
+  for (i = 0; i < n; i++)
+    {
+      basic_block bb = bbs[i];
+
+      /* LOOP->header dominates all the blocks of the loop body, and
+	 so we don't have to look at the missing phi nodes for the
+	 definitions of LOOP->header.  */
+      if (bb == loop->header)
+	continue;
+
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	if (!gimple_nop_p (gsi_stmt (gsi)))
+	  add_missing_phi_nodes_1 (loop, gsi_stmt (gsi));
+
+      for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	add_missing_phi_nodes_1 (loop, gsi_stmt (gsi));
+    }
+
+  free (bbs);
+}
+
+/* Removes all the back-edges of LOOP except its own back-edge.
+   SCRATCH_PAD is used in if-conversion.  Returns the TODO flags.  */
+
+static unsigned
+flatten_loop (loop_p loop, tree *scratch_pad)
+{
+  int i, n = loop->num_nodes;
+  basic_block *bbs;
+  VEC (edge, heap) *back_edges;
+  VEC (basic_block, heap) *loop_body;
+  edge_iterator ei;
+  edge e, pred_e;
+  unsigned max_nb_basic_blocks = PARAM_VALUE (PARAM_LFLAT_MAX_NB_BBS);
+
+  if (loop->num_nodes > max_nb_basic_blocks
+      || !single_exit (loop)
+      || !dbg_cnt (lflat))
+    return 0;
+
+  mark_dfs_back_edges ();
+  bbs = get_loop_body (loop);
+
+  back_edges = VEC_alloc (edge, heap, 3);
+  loop_body = VEC_alloc (basic_block, heap, 3);
+
+  for (i = 0; i < n; i++)
+    FOR_EACH_EDGE (e, ei, bbs[i]->succs)
+      if (e->flags & EDGE_DFS_BACK
+	  && e->src != loop->latch)
+	VEC_safe_push (edge, heap, back_edges, e);
+
+  free (bbs);
+
+  /* Early return and do not modify the code when there are no back
+     edges.  */
+  if (VEC_empty (edge, back_edges))
+    return 0;
+
+  cancel_subloops (loop);
+
+  /* Split the latch edge to make sure that the latch basic block does
+     not contain code.  */
+  loop->latch = split_edge (loop_latch_edge (loop));
+  pred_e = single_pred_edge (loop->latch);
+
+  FOR_EACH_VEC_ELT (edge, back_edges, i, e)
+    {
+      basic_block dest = split_edge (e);
+
+      /* Redirect BACK_EDGE to LOOP->latch.  */
+      redirect_edge_and_branch_force (e, loop->latch);
+
+      /* Save the basic block where it was pointing.  */
+      VEC_safe_push (basic_block, heap, loop_body, dest);
+    }
+
+  update_loop_phi_nodes (loop, pred_e);
+
+  FOR_EACH_VEC_ELT (edge, back_edges, i, e)
+    create_control_flag (e, loop, VEC_index (basic_block, loop_body, i));
+
+  FOR_EACH_VEC_ELT (edge, back_edges, i, e)
+    update_inner_induction_phi_nodes (e, loop, VEC_index (basic_block,
+							  loop_body, i));
+
+  free_dominance_info (CDI_DOMINATORS);
+  calculate_dominance_info (CDI_DOMINATORS);
+  add_missing_phi_nodes (loop);
+
+  /* If we redirected some back-edges, split the latch edge to create
+     an empty LOOP->latch.  */
+  if (!single_pred_p (loop->latch))
+    loop->latch = split_edge (loop_latch_edge (loop));
+
+  if (gate_tree_if_conversion ())
+    tree_if_conversion (loop, scratch_pad);
+
+  return TODO_update_ssa | TODO_verify_ssa;
+}
+
+/* Flattens all the loops of the current function.  */
+
+static unsigned int
+tree_loop_flattening (void)
+{
+  unsigned todo = 0;
+  loop_p loop;
+  loop_iterator li;
+  tree scratch_pad = NULL_TREE;
+
+  if (number_of_loops () <= 1)
+    return 0;
+
+  FOR_EACH_LOOP (li, loop, 0)
+    todo |= flatten_loop (loop, &scratch_pad);
+
+#ifdef ENABLE_CHECKING
+  verify_dominators (CDI_DOMINATORS);
+  verify_flow_info ();
+#endif
+
+  cleanup_tree_cfg ();
+  return todo;
+}
+
+static bool
+gate_tree_loop_flattening (void)
+{
+  return flag_tree_loop_flattening != 0;
+}
+
+struct gimple_opt_pass pass_flatten_loops =
+{
+ {
+  GIMPLE_PASS,
+  "lflat",				/* name */
+  gate_tree_loop_flattening,		/* gate */
+  tree_loop_flattening,       		/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_TREE_LOOP_FLATTENING,  		/* tv_id */
+  PROP_cfg | PROP_ssa,			/* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_dump_func
+    | TODO_update_ssa
+    | TODO_ggc_collect			/* todo_flags_finish */
+ }
+};
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index a87a770..e2f257f 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -374,6 +374,7 @@ extern struct gimple_opt_pass pass_graphite;
 extern struct gimple_opt_pass pass_graphite_transforms;
 extern struct gimple_opt_pass pass_if_conversion;
 extern struct gimple_opt_pass pass_loop_distribution;
+extern struct gimple_opt_pass pass_flatten_loops;
 extern struct gimple_opt_pass pass_vectorize;
 extern struct gimple_opt_pass pass_slp_vectorize;
 extern struct gimple_opt_pass pass_complete_unroll;
-- 
1.7.0.4
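
The back-edge removal described in the header comment of
tree-loop-flattening.c can be pictured at the source level.  What follows
is only a sketch under stated assumptions, not what the pass emits on
GIMPLE: the names nested_sum, flattened_sum, check_flattening and the
variable flat are made up for illustration, and the inner loop is written
as a do/while on the assumption that it runs at least once per outer
iteration.

```c
#include <assert.h>
#include <stdbool.h>

/* Reference version: an outer loop containing an inner loop.  The
   inner loop is a do/while, so its body runs at least once.  */
static long
nested_sum (int n, int m)
{
  long acc = 0;

  for (int i = 0; i < n; i++)
    {
      int j = 0;
      acc += 100L * i;          /* Statements of the outer header.  */

      do
        {
          acc += (long) i * j;  /* Inner loop body.  */
          j++;
        }
      while (j < m);            /* Inner back-edge.  */

      acc -= 1;                 /* Code after the inner loop.  */
    }

  return acc;
}

/* Flattened version: the inner back-edge is replaced by a jump to the
   single remaining latch, and a boolean flag (hypothetical name FLAT)
   tells the header where to resume.  */
static long
flattened_sum (int n, int m)
{
  long acc = 0;
  int i = 0, j = 0;
  bool flat = true;             /* True on entry from the preheader.  */

  while (i < n)                 /* The only back-edge left.  */
    {
      if (flat)
        {
          /* Flag set: run the statements of the outer header.  */
          j = 0;
          acc += 100L * i;
        }

      /* Flag cleared: execution resumes here, where the inner
         back-edge used to point.  */
      acc += (long) i * j;
      j++;

      if (j < m)
        {
          /* Former inner back-edge: clear the flag, go to the latch.  */
          flat = false;
          continue;
        }

      acc -= 1;
      i++;
      flat = true;              /* Any other path to the latch sets it.  */
    }

  return acc;
}

/* Checks that both versions agree on a few small trip counts.  */
static void
check_flattening (void)
{
  for (int n = 0; n < 5; n++)
    for (int m = 0; m < 5; m++)
      assert (nested_sum (n, m) == flattened_sum (n, m));
}
```

Calling check_flattening () from a test driver compares the two shapes.
Variables such as j, which were only live inside the inner loop, now have
to survive across the single latch; in the sketch that role is played by
hoisting i and j out of the loop, and in the patch it appears to be the
job of the extra phi nodes added in LOOP->header and LOOP->latch.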


* Re: [PATCH 3/3] Loop flattening on loop-SSA.
  2010-11-03 15:54   ` [PATCH 3/3] Loop flattening on loop-SSA Sebastian Pop
@ 2010-11-03 16:57     ` Nathan Froyd
  2010-11-03 17:29       ` Sebastian Pop
  2010-11-05 13:05     ` Richard Guenther
  1 sibling, 1 reply; 41+ messages in thread
From: Nathan Froyd @ 2010-11-03 16:57 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, rguenther

On Wed, Nov 03, 2010 at 10:52:26AM -0500, Sebastian Pop wrote:
> --- /dev/null
> +++ b/gcc/tree-loop-flattening.c
> +static void
> +add_missing_phi_nodes_1 (loop_p loop, gimple def_stmt)
> +      phis = VEC_alloc (gimple, heap, n_basic_blocks);
> +      for (i = 0; i < n_basic_blocks; i++)
> +	VEC_quick_push (gimple, phis, NULL);

Why not just use VEC_safe_grow_cleared here?

> +      for (i = 0; i < n_basic_blocks; i++)
> +	{
> +	  gimple phi = VEC_index (gimple, phis, i);

I think you could use FOR_EACH_VEC_ELT.

-Nathan
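
The two suggestions above can be illustrated outside of GCC.  The sketch
below does not use GCC's VEC API: struct ptr_vec, ptr_vec_grow_cleared,
PTR_VEC_FOR_EACH and the other names are hypothetical stand-ins, written
only to show the shape of "one call that yields N cleared slots" and "an
iteration macro that hands out each element".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vector of pointers.  */
struct ptr_vec
{
  void **data;
  size_t len;
};

/* Gives V exactly N slots, all set to NULL, in a single call.  The
   caller no longer writes the "push NULL N times" loop itself.
   Returns 0 on success and -1 when the allocation fails.  */
static int
ptr_vec_grow_cleared (struct ptr_vec *v, size_t n)
{
  v->data = NULL;
  v->len = 0;

  if (n == 0)
    return 0;

  v->data = malloc (n * sizeof *v->data);
  if (v->data == NULL)
    return -1;

  for (size_t i = 0; i < n; i++)
    v->data[i] = NULL;

  v->len = n;
  return 0;
}

/* Iterates over V, storing the index in I and the element in ELT.  */
#define PTR_VEC_FOR_EACH(v, i, elt) \
  for ((i) = 0; (i) < (v)->len && ((elt) = (v)->data[(i)], 1); (i)++)

/* Counts the non-NULL slots of V, skipping the empty ones.  */
static size_t
ptr_vec_count_set (const struct ptr_vec *v)
{
  size_t i, count = 0;
  void *elt;

  PTR_VEC_FOR_EACH (v, i, elt)
    {
      if (!elt)
        continue;

      count++;
    }

  return count;
}

/* Small self-check: a fresh vector is all NULL; filling two slots is
   visible through the iteration macro.  */
static void
ptr_vec_demo (void)
{
  struct ptr_vec v;
  int x = 1, y = 2;

  assert (ptr_vec_grow_cleared (&v, 8) == 0);
  assert (v.len == 8);
  assert (ptr_vec_count_set (&v) == 0);

  v.data[2] = &x;
  v.data[5] = &y;
  assert (ptr_vec_count_set (&v) == 2);

  free (v.data);
}
```

The loop body keeps its "skip empty slots" test, but the index
bookkeeping and the element fetch move into the macro.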


* Re: [PATCH 3/3] Loop flattening on loop-SSA.
  2010-11-03 16:57     ` Nathan Froyd
@ 2010-11-03 17:29       ` Sebastian Pop
  0 siblings, 0 replies; 41+ messages in thread
From: Sebastian Pop @ 2010-11-03 17:29 UTC (permalink / raw)
  To: Nathan Froyd; +Cc: gcc-patches, rguenther

On Wed, Nov 3, 2010 at 11:26, Nathan Froyd <froydnj@codesourcery.com> wrote:
> Why not just use VEC_safe_grow_cleared here?
> I think you could use FOR_EACH_VEC_ELT.

Here is the change on top of the previous patch.
Thanks for the suggestions,
Sebastian

diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
index 4bc8768..56211b4 100644
--- a/gcc/tree-loop-flattening.c
+++ b/gcc/tree-loop-flattening.c
@@ -390,7 +390,7 @@ add_missing_phi_nodes_1 (loop_p loop, gimple def_stmt)
       edge e;
       edge_iterator ei;
       tree res, zero, var;
-      gimple loop_phi, latch_phi, use_stmt;
+      gimple loop_phi, latch_phi, use_stmt, phi;
       imm_use_iterator uit;
       tree name = DEF_FROM_PTR (def_p);
       bool needs_update = false;
@@ -437,17 +437,14 @@ add_missing_phi_nodes_1 (loop_p loop, gimple def_stmt)
 	add_phi_arg (latch_phi, res, e, UNKNOWN_LOCATION);

       phis = VEC_alloc (gimple, heap, n_basic_blocks);
-      for (i = 0; i < n_basic_blocks; i++)
-	VEC_quick_push (gimple, phis, NULL);
+      VEC_safe_grow_cleared (gimple, heap, phis, n_basic_blocks);

       VEC_replace (gimple, phis, loop->latch->index, latch_phi);
       VEC_replace (gimple, phis, loop->header->index, loop_phi);
       add_missing_phi_nodes_2 (loop, name, name, phis);

-      for (i = 0; i < n_basic_blocks; i++)
+      FOR_EACH_VEC_ELT (gimple, phis, i, phi)
 	{
-	  gimple phi = VEC_index (gimple, phis, i);
-
 	  if (!phi)
 	    continue;


* Re: [PATCH 2/3] if-convert even when the data dependences cannot be computed.
  2010-11-03 15:53   ` [PATCH 2/3] if-convert even when the data dependences cannot be computed Sebastian Pop
@ 2010-11-03 20:47     ` Richard Guenther
  2010-11-03 20:52       ` Sebastian Pop
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Guenther @ 2010-11-03 20:47 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches

On Wed, 3 Nov 2010, Sebastian Pop wrote:

> 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
> 
> 	* tree-if-conv.c (if_convertible_loop_p_1): Do not call
> 	compute_data_dependences_for_loop.
> 	(if_convertible_loop_p): Do not free refs and ddrs.

This is ok.  Btw, there are some pending bugs in if-conversion,
one which ICEs on SPEC 2k6 tonto with LTO.  It would be nice
if you can address them.

Thanks,
Richard.

> ---
>  gcc/ChangeLog      |    6 ++++++
>  gcc/tree-if-conv.c |   24 +++---------------------
>  2 files changed, 9 insertions(+), 21 deletions(-)
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 0f58882..3ceb7b6 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,5 +1,11 @@
>  2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
>  
> +	* tree-if-conv.c (if_convertible_loop_p_1): Do not call
> +	compute_data_dependences_for_loop.
> +	(if_convertible_loop_p): Do not free refs and ddrs.
> +
> +2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
> +
>  	PR tree-optimization/46029
>  	* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
>  	* tree-if-conv.c (has_unaligned_memory_refs): New.
> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> index 9fc6190..5b941af 100644
> --- a/gcc/tree-if-conv.c
> +++ b/gcc/tree-if-conv.c
> @@ -855,24 +855,15 @@ predicate_bbs (loop_p loop)
>  }
>  
>  /* Return true when LOOP is if-convertible.  This is a helper function
> -   for if_convertible_loop_p.  REFS and DDRS are initialized and freed
> -   in if_convertible_loop_p.  */
> +   for if_convertible_loop_p.  */
>  
>  static bool
> -if_convertible_loop_p_1 (struct loop *loop,
> -			 VEC (data_reference_p, heap) **refs,
> -			 VEC (ddr_p, heap) **ddrs)
> +if_convertible_loop_p_1 (struct loop *loop)
>  {
>    bool res;
>    unsigned int i;
>    basic_block exit_bb = NULL;
>  
> -  /* Don't if-convert the loop when the data dependences cannot be
> -     computed: the loop won't be vectorized in that case.  */
> -  res = compute_data_dependences_for_loop (loop, true, refs, ddrs);
> -  if (!res)
> -    return false;
> -
>    calculate_dominance_info (CDI_DOMINATORS);
>  
>    /* Allow statements that can be handled during if-conversion.  */
> @@ -934,9 +925,6 @@ if_convertible_loop_p (struct loop *loop)
>  {
>    edge e;
>    edge_iterator ei;
> -  bool res = false;
> -  VEC (data_reference_p, heap) *refs;
> -  VEC (ddr_p, heap) *ddrs;
>  
>    /* Handle only innermost loop.  */
>    if (!loop || loop->inner)
> @@ -968,13 +956,7 @@ if_convertible_loop_p (struct loop *loop)
>      if (loop_exit_edge_p (loop, e))
>        return false;
>  
> -  refs = VEC_alloc (data_reference_p, heap, 5);
> -  ddrs = VEC_alloc (ddr_p, heap, 25);
> -  res = if_convertible_loop_p_1 (loop, &refs, &ddrs);
> -
> -  free_data_refs (refs);
> -  free_dependence_relations (ddrs);
> -  return res;
> +  return if_convertible_loop_p_1 (loop);
>  }
>  
>  /* Basic block BB has two predecessors.  Using predecessor's bb
> 

-- 
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex


* Re: [PATCH 2/3] if-convert even when the data dependences cannot be computed.
  2010-11-03 20:47     ` Richard Guenther
@ 2010-11-03 20:52       ` Sebastian Pop
  0 siblings, 0 replies; 41+ messages in thread
From: Sebastian Pop @ 2010-11-03 20:52 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

On Wed, Nov 3, 2010 at 15:37, Richard Guenther <rguenther@suse.de> wrote:
> On Wed, 3 Nov 2010, Sebastian Pop wrote:
>
>> 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
>>
>>       * tree-if-conv.c (if_convertible_loop_p_1): Do not call
>>       compute_data_dependences_for_loop.
>>       (if_convertible_loop_p): Do not free refs and ddrs.
>
> This is ok.

Thanks.

> Btw, there are some pending bugs in if-conversion,
> one which ICEs on SPEC 2k6 tonto with LTO.  It would be nice
> if you can address them.

I will.
I am looking through the bugs that have me on CC.
If you think I can help with a bug, please add me to the CC list.

Thanks,
Sebastian


* Re: [PATCH 1/3] Fix PR46029: reimplement if-convert stores.
  2010-11-03 15:53   ` [PATCH 1/3] Fix PR46029: reimplement if-convert stores Sebastian Pop
@ 2010-11-05 12:08     ` Richard Guenther
  2010-11-05 16:13       ` Sebastian Pop
  2010-11-15 22:39       ` Sebastian Pop
  0 siblings, 2 replies; 41+ messages in thread
From: Richard Guenther @ 2010-11-05 12:08 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches

On Wed, 3 Nov 2010, Sebastian Pop wrote:

> 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
> 
> 	PR tree-optimization/46029
> 	* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
> 	* tree-if-conv.c (has_unaligned_memory_refs): New.
> 	(if_convertible_gimple_assign_stmt_p): Call has_unaligned_memory_refs.
> 	(create_scratchpad): New.
> 	(create_indirect_cond_expr): New.
> 	(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
> 	parameter for scratch_pad.
> 	(combine_blocks): Same.
> 	(tree_if_conversion): Same.
> 	(main_tree_if_conversion): Pass to tree_if_conversion a pointer to
> 	scratch_pad.
> 	(struct ifc_dr): Removed.
> 	(IFC_DR): Removed.
> 	(DR_WRITTEN_AT_LEAST_ONCE): Removed.
> 	(DR_RW_UNCONDITIONALLY): Removed.
> 	(memrefs_read_or_written_unconditionally): Removed.
> 	(write_memrefs_written_at_least_once): Removed.
> 	(ifcvt_memrefs_wont_trap): Removed.
> 	(ifcvt_could_trap_p): Does not take refs parameter anymore.
> 	(if_convertible_gimple_assign_stmt_p): Same.
> 	(if_convertible_stmt_p): Same.
> 	(if_convertible_loop_p_1): Remove initialization of dr->aux,
> 	DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
> 	(if_convertible_loop_p): Remove deallocation of the same.

Comments in-line
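
For readers following the review, the scratchpad scheme from the
invoke.texi hunk quoted further down can be tried as plain C.  This is a
minimal sketch under assumptions, not the code the pass generates: the
names store_branchy, store_scratchpad, check_scratchpad and N are made up
here, and a single stack int stands in for the alloca'd scratchpad region.

```c
#include <assert.h>
#include <string.h>

enum { N = 8 };

/* Original form: the store to A[i] is guarded by a branch.  */
static void
store_branchy (int *a, const int *b, const int *cond)
{
  for (int i = 0; i < N; i++)
    if (cond[i])
      a[i] = b[i] + 2;
}

/* If-converted form: the load and the store are unconditional, but
   when COND[i] is false both addresses are redirected to a scratchpad,
   so A[i] is never written and B[i] is never read on that path.  */
static void
store_scratchpad (int *a, const int *b, const int *cond)
{
  int scratchpad = 0;           /* Stand-in for the scratchpad memory.  */

  for (int i = 0; i < N; i++)
    {
      int *dst = cond[i] ? &a[i] : &scratchpad;
      const int *src = cond[i] ? &b[i] : &scratchpad;

      *dst = *src + 2;          /* Result is discarded when !cond[i].  */
    }
}

/* Checks that both forms leave the same contents in A.  */
static void
check_scratchpad (void)
{
  const int cond[N] = { 1, 0, 1, 1, 0, 0, 1, 0 };
  const int b[N] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  int a1[N], a2[N];

  for (int i = 0; i < N; i++)
    a1[i] = a2[i] = -1;

  store_branchy (a1, b, cond);
  store_scratchpad (a2, b, cond);

  assert (memcmp (a1, a2, sizeof a1) == 0);
  assert (a2[0] == 2 && a2[1] == -1);
}
```

Unlike the older "A[i] = cond ? expr : A[i]" form, the elements of A
whose condition is false are not touched at all, which is the property
the updated documentation relies on.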

> testsuite/
> 	* g++.dg/tree-ssa/ifc-pr46029.C: New.
> 	* gcc.dg/tree-ssa/ifc-8.c: New.
> 	* gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
> ---
>  gcc/ChangeLog                               |   28 ++
>  gcc/doc/invoke.texi                         |   18 +-
>  gcc/testsuite/ChangeLog                     |    7 +
>  gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C |   76 ++++++
>  gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c       |   17 +-
>  gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c       |   29 ++
>  gcc/tree-if-conv.c                          |  379 ++++++++++++---------------
>  7 files changed, 336 insertions(+), 218 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index beed454..0f58882 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,31 @@
> +2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
> +
> +	PR tree-optimization/46029
> +	* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
> +	* tree-if-conv.c (has_unaligned_memory_refs): New.
> +	(if_convertible_gimple_assign_stmt_p): Call has_unaligned_memory_refs.
> +	(create_scratchpad): New.
> +	(create_indirect_cond_expr): New.
> +	(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
> +	parameter for scratch_pad.
> +	(combine_blocks): Same.
> +	(tree_if_conversion): Same.
> +	(main_tree_if_conversion): Pass to tree_if_conversion a pointer to
> +	scratch_pad.
> +	(struct ifc_dr): Removed.
> +	(IFC_DR): Removed.
> +	(DR_WRITTEN_AT_LEAST_ONCE): Removed.
> +	(DR_RW_UNCONDITIONALLY): Removed.
> +	(memrefs_read_or_written_unconditionally): Removed.
> +	(write_memrefs_written_at_least_once): Removed.
> +	(ifcvt_memrefs_wont_trap): Removed.
> +	(ifcvt_could_trap_p): Does not take refs parameter anymore.
> +	(if_convertible_gimple_assign_stmt_p): Same.
> +	(if_convertible_stmt_p): Same.
> +	(if_convertible_loop_p_1): Remove initialization of dr->aux,
> +	DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
> +	(if_convertible_loop_p): Remove deallocation of the same.
> +
>  2010-10-20  Nathan Froyd  <froydnj@codesourcery.com>
>  
>  	* ifcvt.c (noce_emit_cmove): If both of the values are SUBREGs, try
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index ee68454..28b0cbb 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -6935,20 +6935,26 @@ if vectorization is enabled.
>  
>  @item -ftree-loop-if-convert-stores
>  Attempt to also if-convert conditional jumps containing memory writes.
> -This transformation can be unsafe for multi-threaded programs as it
> -transforms conditional memory writes into unconditional memory writes.
>  For example,
>  @smallexample
>  for (i = 0; i < N; i++)
>    if (cond)
> -    A[i] = expr;
> +    A[i] = B[i] + 2;
>  @end smallexample
>  would be transformed to
>  @smallexample
> -for (i = 0; i < N; i++)
> -  A[i] = cond ? expr : A[i];
> +void *scratchpad = alloca (64);
> +for (i = 0; i < N; i++) @{
> +  a = cond ? &A[i] : scratchpad;
> +  b = cond ? &B[i] : scratchpad;
> +  *a = *b + 2;
> +@}
>  @end smallexample
> -potentially producing data races.
> +The compiler allocates a scratchpad memory on the stack for each
> +function in which the if-conversion of memory stores or reads
> +happened.  This scratchpad memory is used during the part of the
> +computation that is discarded, i.e., when the condition is evaluated
> +to false.
>  
>  @item -ftree-loop-distribution
>  Perform loop distribution.  This flag can improve cache performance on
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 9d9c543..4233f86 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,10 @@
> +2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
> +
> +	PR tree-optimization/46029
> +	* g++.dg/tree-ssa/ifc-pr46029.C: New.
> +	* gcc.dg/tree-ssa/ifc-8.c: New.
> +	* gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
> +
>  2010-10-20  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>
>  
>  	PR c++/46024
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C b/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
> new file mode 100644
> index 0000000..2a54bdb
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
> @@ -0,0 +1,76 @@
> +// { dg-do run }
> +/* { dg-options "-O -ftree-loop-if-convert-stores" } */
> +
> +namespace
> +{
> +  struct rb_tree_node_
> +  {
> +    rb_tree_node_ ():m_p_left (0), m_p_parent (0), m_metadata (0)
> +    {
> +    }
> +    unsigned &get_metadata ()
> +    {
> +      return m_metadata;
> +    }
> +    rb_tree_node_ *m_p_left;
> +    rb_tree_node_ *m_p_parent;
> +    unsigned m_metadata;
> +  };
> +
> +  struct bin_search_tree_const_node_it_
> +  {
> +    bin_search_tree_const_node_it_ (rb_tree_node_ * p_nd):m_p_nd (p_nd)
> +    {
> +    }
> +    unsigned &get_metadata ()
> +    {
> +      return m_p_nd->get_metadata ();
> +    }
> +    bin_search_tree_const_node_it_ get_l_child ()
> +    {
> +      return bin_search_tree_const_node_it_ (m_p_nd->m_p_left);
> +    }
> +
> +    rb_tree_node_ *m_p_nd;
> +  };
> +
> +  struct bin_search_tree_no_data_
> +  {
> +    typedef rb_tree_node_ *node_pointer;
> +      bin_search_tree_no_data_ ():m_p_head (new rb_tree_node_ ())
> +    {
> +    }
> +    void insert_imp_empty (int r_value)
> +    {
> +      rb_tree_node_ *p_new_node = new rb_tree_node_ ();
> +      m_p_head->m_p_parent = p_new_node;
> +      p_new_node->m_p_parent = m_p_head;
> +      update_to_top (m_p_head->m_p_parent);
> +    }
> +    void apply_update (bin_search_tree_const_node_it_ nd_it)
> +    {
> +      unsigned
> +	l_max_endpoint
> +	=
> +	(nd_it.get_l_child ().m_p_nd ==
> +	 0) ? 0 : nd_it.get_l_child ().get_metadata ();
> +      nd_it.get_metadata () = l_max_endpoint;
> +    }
> +    void update_to_top (node_pointer p_nd)
> +    {
> +      while (p_nd != m_p_head)
> +	{
> +	  apply_update (p_nd);
> +	  p_nd = p_nd->m_p_parent;
> +	}
> +    }
> +
> +    rb_tree_node_ * m_p_head;
> +  };
> +}
> +
> +int main ()
> +{
> +  bin_search_tree_no_data_ ().insert_imp_empty (0);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
> index a9cc816..d88c4a2 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
> @@ -12,11 +12,18 @@ dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
>    for (i = 0; i <= nCoeffs; i++)
>      {
>        level = block[i];
> -      if (level < 0)
> -	level = level * qmul - qadd;
> -      else
> -	level = level * qmul + qadd;
> -      block[i] = level;
> +      if (level)
> +        {
> +          if (level < 0)
> +            {
> +              level = level * qmul - qadd;
> +            }
> +          else
> +            {
> +              level = level * qmul + qadd;
> +            }
> +          block[i] = level;
> +        }
>      }
>  }
>  
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
> new file mode 100644
> index 0000000..d7cf279
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-c -O2 -ftree-vectorize" { target *-*-* } } */
> +
> +typedef union tree_node *tree;
> +struct tree_common
> +{
> +  unsigned volatile_flag : 1;
> +  unsigned unsigned_flag : 1;
> +};
> +struct tree_type
> +{
> +  tree next_variant;
> +  tree main_variant;
> +};
> +union tree_node
> +{
> +  struct tree_common common;
> +  struct tree_type type;
> +};
> +void finish_enum (tree enumtype)
> +{
> +  tree tem;
> +  for (tem = ((enumtype)->type.main_variant); tem; tem = ((tem)->type.next_variant))
> +    {
> +      if (tem == enumtype)
> +	continue;
> +      ((tem)->common.unsigned_flag) = ((enumtype)->common.unsigned_flag);
> +    }
> +}
> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> index 642dbda..9fc6190 100644
> --- a/gcc/tree-if-conv.c
> +++ b/gcc/tree-if-conv.c
> @@ -446,171 +446,47 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
>    return true;
>  }
>  
> -/* Records the status of a data reference.  This struct is attached to
> -   each DR->aux field.  */
> -
> -struct ifc_dr {
> -  /* -1 when not initialized, 0 when false, 1 when true.  */
> -  int written_at_least_once;
> -
> -  /* -1 when not initialized, 0 when false, 1 when true.  */
> -  int rw_unconditionally;
> -};
> -
> -#define IFC_DR(DR) ((struct ifc_dr *) (DR)->aux)
> -#define DR_WRITTEN_AT_LEAST_ONCE(DR) (IFC_DR (DR)->written_at_least_once)
> -#define DR_RW_UNCONDITIONALLY(DR) (IFC_DR (DR)->rw_unconditionally)
> -
> -/* Returns true when the memory references of STMT are read or written
> -   unconditionally.  In other words, this function returns true when
> -   for every data reference A in STMT there exist other accesses to
> -   the same data reference with predicates that add up (OR-up) to the
> -   true predicate: this ensures that the data reference A is touched
> -   (read or written) on every iteration of the if-converted loop.  */
> -
> -static bool
> -memrefs_read_or_written_unconditionally (gimple stmt,
> -					 VEC (data_reference_p, heap) *drs)
> -{
> -  int i, j;
> -  data_reference_p a, b;
> -  tree ca = bb_predicate (gimple_bb (stmt));
> -
> -  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
> -    if (DR_STMT (a) == stmt)
> -      {
> -	bool found = false;
> -	int x = DR_RW_UNCONDITIONALLY (a);
> -
> -	if (x == 0)
> -	  return false;
> -
> -	if (x == 1)
> -	  continue;
> -
> -	for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
> -	  if (DR_STMT (b) != stmt
> -	      && same_data_refs (a, b))
> -	    {
> -	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
> -
> -	      if (DR_RW_UNCONDITIONALLY (b) == 1
> -		  || is_true_predicate (cb)
> -		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
> -								 ca, cb)))
> -		{
> -		  DR_RW_UNCONDITIONALLY (a) = 1;
> -		  DR_RW_UNCONDITIONALLY (b) = 1;
> -		  found = true;
> -		  break;
> -		}
> -	    }
> -
> -	if (!found)
> -	  {
> -	    DR_RW_UNCONDITIONALLY (a) = 0;
> -	    return false;
> -	  }
> -      }
> -
> -  return true;
> -}
> -
> -/* Returns true when the memory references of STMT are unconditionally
> -   written.  In other words, this function returns true when for every
> -   data reference A written in STMT, there exist other writes to the
> -   same data reference with predicates that add up (OR-up) to the true
> -   predicate: this ensures that the data reference A is written on
> -   every iteration of the if-converted loop.  */
> +/* Wrapper around gimple_could_trap_p refined for the needs of the
> +   if-conversion.  */
>  
>  static bool
> -write_memrefs_written_at_least_once (gimple stmt,
> -				     VEC (data_reference_p, heap) *drs)
> +ifcvt_could_trap_p (gimple stmt)
>  {
> -  int i, j;
> -  data_reference_p a, b;
> -  tree ca = bb_predicate (gimple_bb (stmt));
> -
> -  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
> -    if (DR_STMT (a) == stmt
> -	&& DR_IS_WRITE (a))
> -      {
> -	bool found = false;
> -	int x = DR_WRITTEN_AT_LEAST_ONCE (a);
> -
> -	if (x == 0)
> -	  return false;
> -
> -	if (x == 1)
> -	  continue;
> -
> -	for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
> -	  if (DR_STMT (b) != stmt
> -	      && DR_IS_WRITE (b)
> -	      && same_data_refs_base_objects (a, b))
> -	    {
> -	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
> -
> -	      if (DR_WRITTEN_AT_LEAST_ONCE (b) == 1
> -		  || is_true_predicate (cb)
> -		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
> -								 ca, cb)))
> -		{
> -		  DR_WRITTEN_AT_LEAST_ONCE (a) = 1;
> -		  DR_WRITTEN_AT_LEAST_ONCE (b) = 1;
> -		  found = true;
> -		  break;
> -		}
> -	    }
> -
> -	if (!found)
> -	  {
> -	    DR_WRITTEN_AT_LEAST_ONCE (a) = 0;
> -	    return false;
> -	  }
> -      }
> +  if (gimple_vuse (stmt)
> +      && !gimple_could_trap_p_1 (stmt, false, false))
> +    return false;
>  
> -  return true;
> +  return gimple_could_trap_p (stmt);
>  }
>  
> -/* Return true when the memory references of STMT won't trap in the
> -   if-converted code.  There are two things that we have to check for:
> -
> -   - writes to memory occur to writable memory: if-conversion of
> -   memory writes transforms the conditional memory writes into
> -   unconditional writes, i.e. "if (cond) A[i] = foo" is transformed
> -   into "A[i] = cond ? foo : A[i]", and as the write to memory may not
> -   be executed at all in the original code, it may be a readonly
> -   memory.  To check that A is not const-qualified, we check that
> -   there exists at least an unconditional write to A in the current
> -   function.
> -
> -   - reads or writes to memory are valid memory accesses for every
> -   iteration.  To check that the memory accesses are correctly formed
> -   and that we are allowed to read and write in these locations, we
> -   check that the memory accesses to be if-converted occur at every
> -   iteration unconditionally.  */
> +/* Returns true when stmt contains a data reference.  */
>  
>  static bool
> -ifcvt_memrefs_wont_trap (gimple stmt, VEC (data_reference_p, heap) *refs)
> +has_unaligned_memory_refs (gimple stmt)

Ick - unified diffs are hard to read (sometimes).  The comment
doesn't match the function name -- does this check for an unaligned
data reference, or for any data reference?

>  {
> -  return write_memrefs_written_at_least_once (stmt, refs)
> -    && memrefs_read_or_written_unconditionally (stmt, refs);
> -}
> -
> -/* Wrapper around gimple_could_trap_p refined for the needs of the
> -   if-conversion.  Try to prove that the memory accesses of STMT could
> -   not trap in the innermost loop containing STMT.  */
> +  int unsignedp, volatilep;
> +  HOST_WIDE_INT bitsize, bitpos;
> +  tree toffset;
> +  enum machine_mode mode;
> +  VEC (data_ref_loc, heap) *refs = VEC_alloc (data_ref_loc, heap, 3);
> +  bool res = get_references_in_stmt (stmt, &refs);
> +  unsigned i;
> +  data_ref_loc *ref;
> +
> +  FOR_EACH_VEC_ELT (data_ref_loc, refs, i, ref)
> +    {
> +      get_inner_reference (*ref->pos, &bitsize, &bitpos, &toffset,
> +			   &mode, &unsignedp, &volatilep, true);
>  
> -static bool
> -ifcvt_could_trap_p (gimple stmt, VEC (data_reference_p, heap) *refs)
> -{
> -  if (gimple_vuse (stmt)
> -      && !gimple_could_trap_p_1 (stmt, false, false)
> -      && ifcvt_memrefs_wont_trap (stmt, refs))
> -    return false;
> +      if ((bitpos % BITS_PER_UNIT) != 0)

Hmm, a nonzero bitpos % BITS_PER_UNIT doesn't actually mean unaligned
but not byte-addressable (e.g. a bit-field), right?  I guess you want
to re-use may_be_nonaddressable_p from ivopts instead.
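
To make the distinction concrete, here is a minimal sketch in plain C
(not GCC internals).  BITS_PER_UNIT is assumed to be 8 and
byte_addressable_p is a hypothetical helper, not an existing function.
On typical ABIs a bit-field like unsigned_flag in ifc-8.c starts at
bit position 1, so its address cannot be taken at all, whatever the
alignment of the containing struct:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 8-bit bytes, as on most GCC targets.  */
#define BITS_PER_UNIT 8

/* Hypothetical helper: a reference that starts BITPOS bits into its
   containing object can only have its address taken when it starts
   on a byte boundary.  */
static bool
byte_addressable_p (long bitpos)
{
  return (bitpos % BITS_PER_UNIT) == 0;
}
```

A reference at bit position 1 fails this check, while one at bit
position 0 or 32 passes it; none of that says anything about whether
the access is aligned for its mode.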

> +	{
> +	  res = true;
> +	  break;
> +	}
> +    }
>  
> -  return gimple_could_trap_p (stmt);
> +  VEC_free (data_ref_loc, heap, refs);
> +  return res;
>  }
>  
>  /* Return true when STMT is if-convertible.
> @@ -621,8 +497,7 @@ ifcvt_could_trap_p (gimple stmt, VEC (data_reference_p, heap) *refs)
>     - LHS is not var decl.  */
>  
>  static bool
> -if_convertible_gimple_assign_stmt_p (gimple stmt,
> -				     VEC (data_reference_p, heap) *refs)
> +if_convertible_gimple_assign_stmt_p (gimple stmt)
>  {
>    tree lhs = gimple_assign_lhs (stmt);
>    basic_block bb;
> @@ -650,12 +525,20 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
>  
>    if (flag_tree_loop_if_convert_stores)
>      {
> -      if (ifcvt_could_trap_p (stmt, refs))
> +      if (ifcvt_could_trap_p (stmt))
>  	{
>  	  if (dump_file && (dump_flags & TDF_DETAILS))
>  	    fprintf (dump_file, "tree could trap...\n");
>  	  return false;
>  	}
> +
> +      if (has_unaligned_memory_refs (stmt))
> +	{
> +	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	    fprintf (dump_file, "uses misaligned memory...\n");

But here the dump message suggests misaligned again (why would we care
about misalignment?)

> +	  return false;
> +	}
> +
>        return true;
>      }
>  
> @@ -690,7 +573,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
>     - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
>  
>  static bool
> -if_convertible_stmt_p (gimple stmt, VEC (data_reference_p, heap) *refs)
> +if_convertible_stmt_p (gimple stmt)
>  {
>    switch (gimple_code (stmt))
>      {
> @@ -700,7 +583,7 @@ if_convertible_stmt_p (gimple stmt, VEC (data_reference_p, heap) *refs)
>        return true;
>  
>      case GIMPLE_ASSIGN:
> -      return if_convertible_gimple_assign_stmt_p (stmt, refs);
> +      return if_convertible_gimple_assign_stmt_p (stmt);
>  
>      default:
>        /* Don't know what to do with 'em so don't do anything.  */
> @@ -1016,18 +899,6 @@ if_convertible_loop_p_1 (struct loop *loop,
>    if (!res)
>      return false;
>  
> -  if (flag_tree_loop_if_convert_stores)
> -    {
> -      data_reference_p dr;
> -
> -      for (i = 0; VEC_iterate (data_reference_p, *refs, i, dr); i++)
> -	{
> -	  dr->aux = XNEW (struct ifc_dr);
> -	  DR_WRITTEN_AT_LEAST_ONCE (dr) = -1;
> -	  DR_RW_UNCONDITIONALLY (dr) = -1;
> -	}
> -    }
> -
>    for (i = 0; i < loop->num_nodes; i++)
>      {
>        basic_block bb = ifc_bbs[i];
> @@ -1040,7 +911,7 @@ if_convertible_loop_p_1 (struct loop *loop,
>        /* Check the if-convertibility of statements in predicated BBs.  */
>        if (is_predicated (bb))
>  	for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
> -	  if (!if_convertible_stmt_p (gsi_stmt (itr), *refs))
> +	  if (!if_convertible_stmt_p (gsi_stmt (itr)))
>  	    return false;
>      }
>  
> @@ -1101,15 +972,6 @@ if_convertible_loop_p (struct loop *loop)
>    ddrs = VEC_alloc (ddr_p, heap, 25);
>    res = if_convertible_loop_p_1 (loop, &refs, &ddrs);
>  
> -  if (flag_tree_loop_if_convert_stores)
> -    {
> -      data_reference_p dr;
> -      unsigned int i;
> -
> -      for (i = 0; VEC_iterate (data_reference_p, refs, i, dr); i++)
> -	free (dr->aux);
> -    }
> -
>    free_data_refs (refs);
>    free_dependence_relations (ddrs);
>    return res;
> @@ -1366,6 +1228,78 @@ insert_gimplified_predicates (loop_p loop)
>      }
>  }
>  
> +/* Insert at the beginning of the first basic block of the current
> +   function the allocation on the stack of N bytes of memory and
> +   return a pointer to this scratchpad memory.  */
> +
> +static tree
> +create_scratchpad (void)
> +{
> +  basic_block bb = single_succ (ENTRY_BLOCK_PTR);
> +  gimple_stmt_iterator gsi = gsi_after_labels (bb);
> +
> +  /* void *tmp = __builtin_alloca */
> +  const char *name = "scratch_pad";
> +  tree x = build_int_cst (integer_type_node, 64);
> +  gimple stmt = gimple_build_call (built_in_decls[BUILT_IN_ALLOCA], 1, x);
> +  tree var = create_tmp_var (ptr_type_node, name);
> +  tree tmp = make_ssa_name (var, stmt);

It would be better to use an automatic variable than alloca, which is
expensive.  Why did you choose alloca?  (Are we ever if-converting
aggregate stores?  I hope not.)

Also, you are unconditionally allocating 64 bytes, while the comment
says N bytes.

Note that if you want to make vectorization happy you would need
to ensure that for

  if (x)
    a[i] = ...;

the scratchpad you'll end up using will have the _same_ alignment
as a[i] (same or larger for all offsets).  Using a local array
of chars should make it possible for the vectorizer to adjust
its alignment if needed.
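
As a rough source-level sketch of what I mean (the pass would of
course build this on GIMPLE, and store_if and the scratch variables
are hypothetical names), the scratchpad becomes an ordinary automatic
object whose size and alignment the compiler knows.  The sketch uses
int scratch variables rather than a char array only to stay within
the C aliasing rules at the source level:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical source-level view of the if-converted loop
     for (i = 0; i < n; i++)
       if (cond[i])
         a[i] = b[i] + 2;
   using automatic scratch variables instead of alloca.  */
static void
store_if (int *a, const int *b, const int *cond, size_t n)
{
  int scratch_a = 0;	/* Sink for the discarded stores.  */
  int scratch_b = 0;	/* Source for the discarded loads.  */

  for (size_t i = 0; i < n; i++)
    {
      int *pa = cond[i] ? &a[i] : &scratch_a;
      const int *pb = cond[i] ? &b[i] : &scratch_b;
      *pa = *pb + 2;
    }
}
```

Elements whose condition is false are left untouched; the discarded
iterations only read and write the scratch variables.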

> +  add_referenced_var (var);
> +  gimple_call_set_lhs (stmt, tmp);
> +  SSA_NAME_DEF_STMT (tmp) = stmt;
> +  update_stmt (stmt);
> +
> +  gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
> +  return tmp;
> +}
> +
> +/* Returns a memory reference to the pointer defined by the
> +   conditional expression: pointer = cond ? &A[i] : scratch_pad; and
> +   inserts this code at GSI.  */
> +
> +static tree
> +create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
> +			   gimple_stmt_iterator *gsi)
> +{
> +  tree type = TREE_TYPE (ai);
> +
> +  tree pointer_to_type, address_of_ai, addr_expr, cond_expr;
> +  tree pointer, star_pointer;
> +  gimple addr_stmt, pointer_stmt;
> +
> +  /* address_of_ai = &A[i];  */
> +  pointer_to_type = build_pointer_type (type);
> +  address_of_ai = create_tmp_var (pointer_to_type, "_ifc_");

Use create_tmp_reg instead of create_tmp_var (everywhere).

> +  add_referenced_var (address_of_ai);
> +  addr_expr = build_fold_addr_expr (ai);

If you build that before create_tmp_reg you can use TREE_TYPE of
it and avoid creating pointer_to_type.

> +  addr_stmt = gimple_build_assign (address_of_ai, addr_expr);
> +  address_of_ai = make_ssa_name (address_of_ai, addr_stmt);
> +  gimple_assign_set_lhs (addr_stmt, address_of_ai);
> +  SSA_NAME_DEF_STMT (address_of_ai) = addr_stmt;
> +  update_stmt (addr_stmt);
> +  gsi_insert_before (gsi, addr_stmt, GSI_SAME_STMT);
> +
> +  /* Allocate the scratch pad only once per function.  */
> +  if (!*scratch_pad)
> +    *scratch_pad = create_scratchpad ();
> +
> +  /* pointer = cond ? address_of_ai : scratch_pad;  */
> +  pointer = create_tmp_var (pointer_to_type, "_ifc_");
> +  add_referenced_var (pointer);
> +  cond_expr = build3 (COND_EXPR, pointer_to_type, unshare_expr (cond),
> +		      address_of_ai, *scratch_pad);
> +  pointer_stmt = gimple_build_assign (pointer, cond_expr);
> +  pointer = make_ssa_name (pointer, pointer_stmt);
> +  gimple_assign_set_lhs (pointer_stmt, pointer);
> +  SSA_NAME_DEF_STMT (pointer) = pointer_stmt;
> +  update_stmt (pointer_stmt);
> +  gsi_insert_before (gsi, pointer_stmt, GSI_SAME_STMT);
> +
> +  star_pointer = build_simple_mem_ref (pointer);

build2 (MEM_REF, TREE_TYPE (ai), pointer,
        build_int_cst (reference_alias_ptr_type (ai), 0));

as you need to preserve the TBAA info of the original reference.

> +  return star_pointer;
> +}
> +
>  /* Predicate each write to memory in LOOP.
>  
>     This function transforms control flow constructs containing memory
> @@ -1377,10 +1311,19 @@ insert_gimplified_predicates (loop_p loop)
>  
>     into the following form that does not contain control flow:
>  
> -   | for (i = 0; i < N; i++)
> -   |   A[i] = cond ? expr : A[i];
> +   | void *scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
> +   |
> +   | for (i = 0; i < N; i++) {
> +   |   p = cond ? &A[i] : scratch_pad;
> +   |   *p = expr;
> +   | }
> +
> +   SCRATCH_PAD is allocated on the stack for each function once and it is
> +   large enough to contain any kind of scalar assignment or read.  All
> +   values read or written to SCRATCH_PAD are not used in the computation.
>  
> -   The original CFG looks like this:
> +   In a more detailed way, the if-conversion of memory writes works
> +   like this, supposing that the original CFG looks like this:
>  
>     | bb_0
>     |   i = 0
> @@ -1430,10 +1373,12 @@ insert_gimplified_predicates (loop_p loop)
>     |   goto bb_1
>     | end_bb_4
>  
> -   predicate_mem_writes is then predicating the memory write as follows:
> +   predicate_mem_writes is then allocating SCRATCH_PAD in the basic block
> +   preceding the loop header, and is predicating the memory write:
>  
>     | bb_0
>     |   i = 0
> +   |   scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
>     | end_bb_0
>     |
>     | bb_1
> @@ -1441,12 +1386,14 @@ insert_gimplified_predicates (loop_p loop)
>     | end_bb_1
>     |
>     | bb_2
> +   |   cond = some_computation;
>     |   if (cond) goto bb_3 else goto bb_4
>     | end_bb_2
>     |
>     | bb_3
>     |   cond = some_computation;
> -   |   A[i] = cond ? expr : A[i];
> +   |   p = cond ? &A[i] : scratch_pad;
> +   |   *p = expr;
>     |   goto bb_4
>     | end_bb_3
>     |
> @@ -1459,12 +1406,14 @@ insert_gimplified_predicates (loop_p loop)
>  
>     | bb_0
>     |   i = 0
> +   |   scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
>     |   if (i < N) goto bb_5 else goto bb_1
>     | end_bb_0
>     |
>     | bb_1
>     |   cond = some_computation;
> -   |   A[i] = cond ? expr : A[i];
> +   |   p = cond ? &A[i] : scratch_pad;
> +   |   *p = expr;
>     |   if (i < N) goto bb_5 else goto bb_4
>     | end_bb_1
>     |
> @@ -1474,7 +1423,7 @@ insert_gimplified_predicates (loop_p loop)
>  */
>  
>  static void
> -predicate_mem_writes (loop_p loop)
> +predicate_mem_writes (loop_p loop, tree *scratch_pad)
>  {
>    unsigned int i, orig_loop_num_nodes = loop->num_nodes;
>  
> @@ -1489,20 +1438,35 @@ predicate_mem_writes (loop_p loop)
>  	continue;
>  
>        for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> -	if ((stmt = gsi_stmt (gsi))
> -	    && gimple_assign_single_p (stmt)
> -	    && gimple_vdef (stmt))
> -	  {
> -	    tree lhs = gimple_assign_lhs (stmt);
> -	    tree rhs = gimple_assign_rhs1 (stmt);
> -	    tree type = TREE_TYPE (lhs);
> -
> -	    lhs = ifc_temp_var (type, unshare_expr (lhs), &gsi);
> -	    rhs = ifc_temp_var (type, unshare_expr (rhs), &gsi);
> -	    rhs = build3 (COND_EXPR, type, unshare_expr (cond), rhs, lhs);
> -	    gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
> -	    update_stmt (stmt);
> -	  }
> +	{
> +	  stmt = gsi_stmt (gsi);
> +	  if (gimple_assign_single_p (stmt)
> +	      && gimple_vdef (stmt))
> +	    {
> +	      /* A[i] = x;  */
> +	      tree ai = gimple_assign_lhs (stmt);
> +
> +	      /* pointer = cond ? &A[i] : scratch_pad;  */
> +	      tree star_pointer = create_indirect_cond_expr (ai, cond,
> +							     scratch_pad, &gsi);
> +	      /* *pointer = x;  */
> +	      gimple_assign_set_lhs (stmt, star_pointer);
> +	      update_stmt (stmt);
> +	    }
> +	  else if (gimple_assign_single_p (stmt)
> +		   && gimple_vuse (stmt))
> +	    {
> +	      /* x = A[i];  */
> +	      tree ai = gimple_assign_rhs1 (stmt);
> +
> +	      /* pointer = cond ? &A[i] : scratch_pad;  */
> +	      tree star_pointer = create_indirect_cond_expr (ai, cond,
> +							     scratch_pad, &gsi);
> +	      /* x = *pointer;  */
> +	      gimple_assign_set_rhs1 (stmt, star_pointer);
> +	      update_stmt (stmt);
> +	    }
> +	}
>      }
>  }
>  
> @@ -1552,7 +1516,7 @@ remove_conditions_and_labels (loop_p loop)
>     blocks.  Replace PHI nodes with conditional modify expressions.  */
>  
>  static void
> -combine_blocks (struct loop *loop)
> +combine_blocks (struct loop *loop, tree *scratch_pad)
>  {
>    basic_block bb, exit_bb, merge_target_bb;
>    unsigned int orig_loop_num_nodes = loop->num_nodes;
> @@ -1565,7 +1529,7 @@ combine_blocks (struct loop *loop)
>    predicate_all_scalar_phis (loop);
>  
>    if (flag_tree_loop_if_convert_stores)
> -    predicate_mem_writes (loop);
> +    predicate_mem_writes (loop, scratch_pad);
>  
>    /* Merge basic blocks: first remove all the edges in the loop,
>       except for those from the exit block.  */
> @@ -1654,7 +1618,7 @@ combine_blocks (struct loop *loop)
>     profitability analysis.  Returns true when something changed.  */
>  
>  static bool
> -tree_if_conversion (struct loop *loop)
> +tree_if_conversion (struct loop *loop, tree *scratch_pad)
>  {
>    bool changed = false;
>    ifc_bbs = NULL;
> @@ -1666,7 +1630,7 @@ tree_if_conversion (struct loop *loop)
>    /* Now all statements are if-convertible.  Combine all the basic
>       blocks into one huge basic block doing the if-conversion
>       on-the-fly.  */
> -  combine_blocks (loop);
> +  combine_blocks (loop, scratch_pad);
>  
>    if (flag_tree_loop_if_convert_stores)
>      mark_sym_for_renaming (gimple_vop (cfun));
> @@ -1697,12 +1661,13 @@ main_tree_if_conversion (void)
>    struct loop *loop;
>    bool changed = false;
>    unsigned todo = 0;
> +  tree scratch_pad = NULL_TREE;
>  
>    if (number_of_loops () <= 1)
>      return 0;
>  
>    FOR_EACH_LOOP (li, loop, 0)
> -    changed |= tree_if_conversion (loop);
> +    changed |= tree_if_conversion (loop, &scratch_pad);
>  
>    if (changed)
>      todo |= TODO_cleanup_cfg;

Overall I like the new way much more.  Please update and repost.

Thanks,
Richard.


* Re: [PATCH 3/3] Loop flattening on loop-SSA.
  2010-11-03 15:54   ` [PATCH 3/3] Loop flattening on loop-SSA Sebastian Pop
  2010-11-03 16:57     ` Nathan Froyd
@ 2010-11-05 13:05     ` Richard Guenther
  2010-11-05 16:57       ` Sebastian Pop
  2010-11-16 22:47       ` Sebastian Pop
  1 sibling, 2 replies; 41+ messages in thread
From: Richard Guenther @ 2010-11-05 13:05 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches

On Wed, 3 Nov 2010, Sebastian Pop wrote:

> 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
> 
> 	* Makefile.in (OBJS-common): Add tree-loop-flattening.o.
> 	(tree-loop-flattening.o): New.
> 	* common.opt (ftree-loop-flatten): New.
> 	* dbgcnt.def (lflat): New.
> 	* params.def (PARAM_LFLAT_MAX_NB_BBS): New.
> 	* passes.c (init_optimization_passes): Add new passes
> 	pass_flatten_loops and pass_if_conversion after loop vectorization
> 	and before pass_slp_vectorize.
> 	* timevar.def (TV_TREE_LOOP_FLATTENING): New.
> 	* tree-loop-flattening.c: New.
> 	* tree-pass.h (pass_flatten_loops): Declared.
> 	* tree-flow.h (gate_tree_if_conversion): Declared.
> 	(tree_if_conversion): Declared.
> 	* tree-if-conv.c (tree_if_conversion): Not static anymore.
> 	(gate_tree_if_conversion): Same.

Comments inline.

What extra testing apart from the 4 testcases did this new pass get?
Do we pass bootstrap with it enabled?  Did you check whether we
regress on SPEC CPU2006 when it is enabled?

> 	* gcc.dg/tree-ssa/flat-loop-1.c: New.
> 	* gcc.dg/tree-ssa/flat-loop-2.c: New.
> 	* gcc.dg/tree-ssa/flat-loop-3.c: New.
> 	* gcc.dg/tree-ssa/flat-loop-4.c: New.
> ---
>  gcc/ChangeLog                               |   18 +
>  gcc/Makefile.in                             |    4 +
>  gcc/common.opt                              |    4 +
>  gcc/dbgcnt.def                              |    1 +
>  gcc/params.def                              |    7 +
>  gcc/passes.c                                |    1 +
>  gcc/testsuite/ChangeLog                     |    7 +
>  gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c |   28 ++
>  gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c |   39 ++
>  gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c |   19 +
>  gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c |   23 +
>  gcc/timevar.def                             |    1 +
>  gcc/tree-flow.h                             |    4 +
>  gcc/tree-if-conv.c                          |    4 +-
>  gcc/tree-loop-flattening.c                  |  630 +++++++++++++++++++++++++++
>  gcc/tree-pass.h                             |    1 +
>  16 files changed, 789 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c
>  create mode 100644 gcc/tree-loop-flattening.c
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 3ceb7b6..f312b27 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,5 +1,23 @@
>  2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
>  
> +	* Makefile.in (OBJS-common): Add tree-loop-flattening.o.
> +	(tree-loop-flattening.o): New.
> +	* common.opt (ftree-loop-flatten): New.
> +	* dbgcnt.def (lflat): New.
> +	* params.def (PARAM_LFLAT_MAX_NB_BBS): New.
> +	* passes.c (init_optimization_passes): Add new passes
> +	pass_flatten_loops and pass_if_conversion after loop vectorization
> +	and before pass_slp_vectorize.
> +	* timevar.def (TV_TREE_LOOP_FLATTENING): New.
> +	* tree-loop-flattening.c: New.
> +	* tree-pass.h (pass_flatten_loops): Declared.
> +	* tree-flow.h (gate_tree_if_conversion): Declared.
> +	(tree_if_conversion): Declared.
> +	* tree-if-conv.c (tree_if_conversion): Not static anymore.
> +	(gate_tree_if_conversion): Same.
> +
> +2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
> +
>  	* tree-if-conv.c (if_convertible_loop_p_1): Do not call
>  	compute_data_dependences_for_loop.
>  	(if_convertible_loop_p): Do not free refs and ddrs.
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 898e962..55b67f4 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1368,6 +1368,7 @@ OBJS-common = \
>  	tree-into-ssa.o \
>  	tree-iterator.o \
>  	tree-loop-distribution.o \
> +	tree-loop-flattening.o \
>  	tree-loop-linear.o \
>  	tree-nested.o \
>  	tree-nrv.o \
> @@ -2773,6 +2774,9 @@ tree-loop-distribution.o: tree-loop-distribution.c $(CONFIG_H) $(SYSTEM_H) coret
>     $(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) \
>     $(TREE_PASS_H) $(TREE_DATA_REF_H) $(EXPR_H) \
>     langhooks.h $(TREE_VECTORIZER_H)
> +tree-loop-flattening.o: tree-loop-flattening.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
> +   $(TM_H) $(OPTABS_H) $(TREE_H) $(BASIC_BLOCK_H) $(DIAGNOSTIC_H) $(TREE_FLOW_H) \
> +   $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) $(TREE_PASS_H) $(DBGCNT_H)
>  tree-parloops.o: tree-parloops.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
>     $(TREE_FLOW_H) $(TREE_H) $(CFGLOOP_H) $(TREE_DATA_REF_H) \
>     $(DIAGNOSTIC_H) $(TREE_PASS_H) langhooks.h gt-tree-parloops.h \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 8fe796f..c969979 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1632,6 +1632,10 @@ ftree-loop-distribute-patterns
>  Common Report Var(flag_tree_loop_distribute_patterns) Optimization
>  Enable loop distribution for patterns transformed into a library call
>  
> +ftree-loop-flatten
> +Common Report Var(flag_tree_loop_flattening) Optimization
> +Enable loop flattening on trees
> +
>  ftree-loop-im
>  Common Report Var(flag_tree_loop_im) Init(1) Optimization
>  Enable loop invariant motion on trees
> diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
> index 0492d66..0ef9a72 100644
> --- a/gcc/dbgcnt.def
> +++ b/gcc/dbgcnt.def
> @@ -166,6 +166,7 @@ DEBUG_COUNTER (if_conversion_tree)
>  DEBUG_COUNTER (if_after_combine)
>  DEBUG_COUNTER (if_after_reload)
>  DEBUG_COUNTER (local_alloc_for_sched)
> +DEBUG_COUNTER (lflat)
>  DEBUG_COUNTER (postreload_cse)
>  DEBUG_COUNTER (pre)
>  DEBUG_COUNTER (pre_insn)
> diff --git a/gcc/params.def b/gcc/params.def
> index 49a6185..3fffc35 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -788,6 +788,13 @@ DEFPARAM (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION,
>  	  "maximum number of basic blocks per function to be analyzed by Graphite",
>  	  100, 0, 0)
>  
> +/* Maximal number of basic blocks in a loop to be flattened.  */
> +
> +DEFPARAM (PARAM_LFLAT_MAX_NB_BBS,
> +	  "lflat-max-nb-bbs",
> +	  "maximum number of basic blocks in a loop to be flattened",
> +	  100, 0, 0)
> +
>  /* Avoid doing loop invariant motion on very large loops.  */
>  
>  DEFPARAM (PARAM_LOOP_INVARIANT_MAX_BBS_IN_LOOP,
> diff --git a/gcc/passes.c b/gcc/passes.c
> index 1308ce9..22110a4 100644
> --- a/gcc/passes.c
> +++ b/gcc/passes.c
> @@ -917,6 +917,7 @@ init_optimization_passes (void)
>  	  NEXT_PASS (pass_parallelize_loops);
>  	  NEXT_PASS (pass_loop_prefetch);
>  	  NEXT_PASS (pass_iv_optimize);
> +	  NEXT_PASS (pass_flatten_loops);
>  	  NEXT_PASS (pass_tree_loop_done);
>  	}
>        NEXT_PASS (pass_cse_reciprocals);
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 4233f86..2b3b93e 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,5 +1,12 @@
>  2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
>  
> +	* gcc.dg/tree-ssa/flat-loop-1.c: New.
> +	* gcc.dg/tree-ssa/flat-loop-2.c: New.
> +	* gcc.dg/tree-ssa/flat-loop-3.c: New.
> +	* gcc.dg/tree-ssa/flat-loop-4.c: New.
> +
> +2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
> +
>  	PR tree-optimization/46029
>  	* g++.dg/tree-ssa/ifc-pr46029.C: New.
>  	* gcc.dg/tree-ssa/ifc-8.c: New.
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c
> new file mode 100644
> index 0000000..bee8a2b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-loop-flatten" } */
> +
> +struct stack_segment
> +{
> +  struct dynamic_allocation_blocks *dynamic_allocation;
> +};
> +struct dynamic_allocation_blocks
> +{
> +  struct dynamic_allocation_blocks *next;
> +};
> +static struct dynamic_allocation_blocks *
> +merge_dynamic_blocks (struct dynamic_allocation_blocks *a,
> +		      struct dynamic_allocation_blocks *b)
> +{
> +  struct dynamic_allocation_blocks **pp;
> +  for (pp = &a->next; *pp != ((void *)0); pp = &(*pp)->next)
> +    *pp = b;
> +  return a;
> +}
> +__morestack_release_segments (struct stack_segment **pp, int free_dynamic)
> +{
> +  struct dynamic_allocation_blocks *ret;
> +  struct stack_segment *pss;
> +  pss = *pp;
> +  while (pss != ((void *)0))
> +    ret = merge_dynamic_blocks (pss->dynamic_allocation, ret);
> +}
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
> new file mode 100644
> index 0000000..a7287fb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
> @@ -0,0 +1,39 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-loop-flatten" } */
> +
> +struct stack_segment
> +{
> +  struct stack_segment *next;
> +  struct dynamic_allocation_blocks *dynamic_allocation;
> +};
> +struct dynamic_allocation_blocks
> +{
> +  struct dynamic_allocation_blocks *next;
> +};
> +static struct dynamic_allocation_blocks *
> +merge_dynamic_blocks (struct dynamic_allocation_blocks *a,
> +        struct dynamic_allocation_blocks *b)
> +{
> +  struct dynamic_allocation_blocks **pp;
> +  if (b == ((void *)0))
> +  for (pp = &a->next; *pp != ((void *)0); pp = &(*pp)->next)
> +    ;
> +  return a;
> +}
> +__morestack_release_segments (struct stack_segment **pp, int free_dynamic)
> +{
> +  struct dynamic_allocation_blocks *ret;
> +  struct stack_segment *pss;
> +  while (pss != ((void *)0))
> +    {
> +      struct stack_segment *next;
> +      next = pss->next;
> + {
> +   if (free_dynamic)
> +     {
> +       ret = merge_dynamic_blocks (pss->dynamic_allocation, ret);
> +     }
> + }
> +      pss = next;
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
> new file mode 100644
> index 0000000..d3d66ab
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-loop-flatten" } */
> +
> +
> +int
> +split_directories (const char *name, int *ptr_num_dirs)
> +{
> +  int num_dirs = 0;
> +  char **dirs;
> +  const char *p, *q;
> +  int ch;
> +  while ((ch = *p++) != '\0')
> +    {
> +   num_dirs++;
> +   while (((*p) == '/'))
> +     p++;
> +    }
> +  return (dirs[num_dirs - 1] == ((void *)0));
> +}
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c
> new file mode 100644
> index 0000000..8e551ac
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-loop-flatten" } */
> +
> +void
> +formatted_backspace (int common, char *s)
> +{
> +  int base;
> +  int n;
> +  do
> +    {
> +      if (sseek (s, base, 0) < 0)
> +	goto io_error;
> +
> +      while (n > 0)
> +	{
> +          n--;
> +	  base += n + 1;
> +	}
> +    }
> +  while (base != 0);
> + io_error:
> +  generate_error (common, 0, ((void *)0));
> +}

The testcases seem to originate from ICEs found during development.
Functional tests are lacking; please consider adding some, possibly
ones that check for the extra optimizations this pass enables.


> diff --git a/gcc/timevar.def b/gcc/timevar.def
> index 86e2999..89ff8e8 100644
> --- a/gcc/timevar.def
> +++ b/gcc/timevar.def
> @@ -152,6 +152,7 @@ DEFTIMEVAR (TV_GRAPHITE_DATA_DEPS    , "Graphite data dep analysis")
>  DEFTIMEVAR (TV_GRAPHITE_CODE_GEN     , "Graphite code generation")
>  DEFTIMEVAR (TV_TREE_LINEAR_TRANSFORM , "tree loop linear")
>  DEFTIMEVAR (TV_TREE_LOOP_DISTRIBUTION, "tree loop distribution")
> +DEFTIMEVAR (TV_TREE_LOOP_FLATTENING  , "tree loop flattening")
>  DEFTIMEVAR (TV_CHECK_DATA_DEPS       , "tree check data dependences")
>  DEFTIMEVAR (TV_TREE_PREFETCH	     , "tree prefetching")
>  DEFTIMEVAR (TV_TREE_LOOP_IVOPTS	     , "tree iv optimization")
> diff --git a/gcc/tree-flow.h b/gcc/tree-flow.h
> index c2702dc..e1ee69f 100644
> --- a/gcc/tree-flow.h
> +++ b/gcc/tree-flow.h
> @@ -730,6 +730,10 @@ bool contains_abnormal_ssa_name_p (tree);
>  bool stmt_dominates_stmt_p (gimple, gimple);
>  void mark_virtual_ops_for_renaming (gimple);
>  
> +/* In tree-if-conv.c */
> +bool gate_tree_if_conversion (void);
> +bool tree_if_conversion (struct loop *, tree *);
> +

Why do you need to export the gate?  I guess if-conversion should
happen unconditionally for loops that are flattened, since as I see
it, it is really part of the flattening transformation?

>  /* In tree-ssa-dce.c */
>  void mark_virtual_phi_result_for_renaming (gimple);
>  
> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> index 5b941af..3c30abb 100644
> --- a/gcc/tree-if-conv.c
> +++ b/gcc/tree-if-conv.c
> @@ -1599,7 +1599,7 @@ combine_blocks (struct loop *loop, tree *scratch_pad)
>  /* If-convert LOOP when it is legal.  For the moment this pass has no
>     profitability analysis.  Returns true when something changed.  */
>  
> -static bool
> +bool
>  tree_if_conversion (struct loop *loop, tree *scratch_pad)
>  {
>    bool changed = false;
> @@ -1662,7 +1662,7 @@ main_tree_if_conversion (void)
>  
>  /* Returns true when the if-conversion pass is enabled.  */
>  
> -static bool
> +bool
>  gate_tree_if_conversion (void)
>  {
>    return ((flag_tree_vectorize && flag_tree_loop_if_convert != 0)
> diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
> new file mode 100644
> index 0000000..4bc8768
> --- /dev/null
> +++ b/gcc/tree-loop-flattening.c
> @@ -0,0 +1,630 @@
> +/* Loop flattening.
> +   Copyright (C) 2010 Free Software Foundation, Inc.
> +   Contributed by Sebastian Pop <sebastian.pop@amd.com>.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "ggc.h"
> +#include "tree.h"
> +#include "rtl.h"
> +#include "output.h"
> +#include "basic-block.h"
> +#include "diagnostic.h"
> +#include "tree-flow.h"
> +#include "toplev.h"
> +#include "tree-dump.h"
> +#include "timevar.h"
> +#include "cfgloop.h"
> +#include "tree-pass.h"
> +#include "gimple.h"
> +#include "params.h"
> +#include "dbgcnt.h"
> +
> +/* This loop flattening pass transforms backward pointing edges into
> +   forward pointing edges.
> +
> +   The back-edge removal transformation was described in the 1983
> +   paper by Allen J. R., Ken Kennedy, Carrie Porterfield, and Joe
> +   Warren: "Conversion of control dependence to data dependence"
> +   available from http://doi.acm.org/10.1145/567067.567085
> +
> +   The back-edge removal algorithm was presented in that paper as part
> +   of the if-conversion algorithm for backward pointing edges.  In
> +   this section we will first provide a description of this technique
> +   adapted for the Gimple-SSA form, followed by an example, and a
> +   discussion of the differences with the higher level loop flattening
> +   transformation.
> +
> +   The back-edge removal algorithm transforms control dependences into
> +   data dependences by using a boolean variable.  The values taken by
> +   the boolean variable control the execution path of the forward
> +   edges created in order to use the back-edge of an outer loop.
> +
> +   The first step of the algorithm detects a surrounding loop and all
> +   the back-edges of the loop body: these back-edges can be inner
> +   loops or strongly connected components of the CFG that cannot be
> +   reduced to natural loops.
> +
> +   Each back-edge is removed by redirecting the target of the
> +   back-edge to the latch basic block of the surrounding loop.  A
> +   boolean variable is created in the latch.  It is cleared when the
> +   redirected back-edge is taken and it is set to true for any other
> +   paths leading to the latch.
> +
> +   The header basic block of the surrounding loop is split before its
> +   statements and a new condition is added based on the control
> +   variable: when the control variable is set to true, the execution
> +   proceeds as normal to the basic block that contains the statements
> +   of the header; when the control variable is cleared, meaning that
> +   the back-edge has been taken, the execution proceeds to the point
> +   where the redirected back-edge was pointing.
> +
> +   The last step updates the SSA form after all the back-edges have
> +   been redirected to the latch, and the new edges from the header to
> +   the destination of back-edges have been created.
> +
> +   Another description of loop flattening in a very Fortran specific
> +   way is in the 1992 paper by Reinhard von Hanxleden and Ken Kennedy:
> +   "Relaxing SIMD Control Flow Constraints using Loop Transformations"
> +   available from
> +   http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.5033 */
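
For readers following the description above, a rough source-level
picture of the transformation may help.  This is only a sketch under
assumptions: the pass works on GIMPLE-SSA, not on C source, and the
function names and the "flat" variable below are made up for
illustration.  The flag is cleared when the redirected inner back-edge
is taken and set on any other path to the latch, as the comment
describes.

```c
#include <stdbool.h>

/* Nested form: the inner loop has its own back-edge.  */
long
sum_nested (const int *a, int n, int m)
{
  long s = 0;
  for (int i = 0; i < n; i++)
    {
      s += 1000;		/* Outer-only statement.  */
      for (int j = 0; j < m; j++)
	s += a[i * m + j];
    }
  return s;
}

/* Flattened form: one loop, one back-edge.  FLAT is true when the
   header is reached on the normal outer path and false when it is
   reached through the redirected inner back-edge.  */
long
sum_flat (const int *a, int n, int m)
{
  long s = 0;
  int i = 0, j = 0;
  bool flat = true;

  if (n <= 0)
    return s;

  for (;;)
    {
      if (flat)
	{
	  s += 1000;		/* Statements of the original header.  */
	  j = 0;
	}
      if (j < m)
	{
	  s += a[i * m + j];
	  j++;
	  flat = false;		/* Redirected inner back-edge taken.  */
	}
      else
	{
	  i++;
	  flat = true;		/* Any other path to the latch.  */
	  if (i >= n)
	    break;
	}
    }
  return s;
}
```

Both functions should compute the same value for any n and m; the
flattened one trades the inner back-edge for a test of the flag at the
top of every iteration.
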
> +
> +/* Keep the loop structure for LOOP and remove all the loop structures
> +   under LOOP.  */
> +
> +static void
> +cancel_subloops (loop_p loop)
> +{
> +  int i;
> +  loop_p li;
> +  VEC (loop_p, heap) *lv = VEC_alloc (loop_p, heap, 3);
> +
> +  for (li = loop->inner; li; li = li->next)
> +    VEC_safe_push (loop_p, heap, lv, li);
> +
> +  FOR_EACH_VEC_ELT (loop_p, lv, i, li)
> +    cancel_loop_tree (li);
> +
> +  VEC_free (loop_p, heap, lv);
> +}

This function should be in cfgloop.c and implemented in simpler
form, like

void
cancel_subloops (struct loop *loop)
{
  while (loop->inner)
    cancel_loop_tree (loop->inner);
}

simply following the cancel_loop_tree example.
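
The while loop terminates because cancelling a loop unlinks it from
its parent's list of children, so loop->inner advances on every
iteration.  A self-contained toy model of that behaviour, assuming a
first-child/next-sibling tree (the toy_* names are hypothetical and
are not the cfgloop.c API):

```c
#include <stddef.h>

struct toy_loop
{
  struct toy_loop *outer;	/* Parent loop.  */
  struct toy_loop *inner;	/* First child.  */
  struct toy_loop *next;	/* Next sibling.  */
  int cancelled;
};

/* Unlink L from the child list of its parent, if it has one.  */
static void
toy_unlink (struct toy_loop *l)
{
  struct toy_loop **p;

  if (!l->outer)
    return;
  for (p = &l->outer->inner; *p != l; p = &(*p)->next)
    ;
  *p = l->next;
  l->next = NULL;
  l->outer = NULL;
}

/* Cancel L and everything below it, children first.  */
void
toy_cancel_loop_tree (struct toy_loop *l)
{
  while (l->inner)
    toy_cancel_loop_tree (l->inner);
  toy_unlink (l);
  l->cancelled = 1;
}

/* Cancel all loops below L but keep L itself.  */
void
toy_cancel_subloops (struct toy_loop *l)
{
  while (l->inner)
    toy_cancel_loop_tree (l->inner);
}
```

No temporary vector of children is needed, since the list is consumed
from the front as each child is cancelled.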

> +/* Before creating other phi nodes in LOOP->header for the control
> +   flags, update the phi nodes of LOOP->header and add the necessary
> +   phi nodes in the LOOP->latch that now contains several paths on
> +   which the values are not updated.  PRED_E is the single edge that
> +   was pointing to the LOOP->latch basic block before inner back-edges
> +   were redirected to the LOOP->latch.  */
> +
> +static void
> +update_loop_phi_nodes (loop_p loop, edge pred_e)
> +{
> +  gimple_stmt_iterator gsi;
> +
> +  for (gsi = gsi_start_phis (loop->header); !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      edge e;
> +      edge_iterator ei;
> +      gimple phi = gsi_stmt (gsi);
> +      tree back_arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
> +      tree res = gimple_phi_result (phi);
> +      tree var = SSA_NAME_VAR (res);
> +
> +      phi = create_phi_node (var, loop->latch);
> +      create_new_def_for (gimple_phi_result (phi), phi,
> +			  gimple_phi_result_ptr (phi));

Using create_new_def_for looks very suspicious.  create_phi_node
will already create a new SSA name for the result, so it doesn't
make any sense to fiddle with the SSA updater machinery, does it?

> +      FOR_EACH_EDGE (e, ei, loop->latch->preds)
> +	add_phi_arg (phi, (e == pred_e ? back_arg : res),
> +		     e, UNKNOWN_LOCATION);
> +
> +      res = gimple_phi_result (phi);
> +      add_phi_arg (gsi_stmt (gsi), res, loop_latch_edge (loop),
> +		   UNKNOWN_LOCATION);
> +    }
> +}
> +
> +/* Creates a control flag for the FORWARDED_EDGE that represents the
> +   back-edge that has been forwarded to the latch basic block of LOOP.
> +   INNER_BODY is the basic block to which the back-edge was pointing
> +   before redirection.  This function creates a boolean control flag
> +   that is cleared when the FORWARDED_EDGE is taken and set for all
> +   the other paths.  This function adds the corresponding phi nodes in
> +   LOOP->latch and LOOP->header, and finally adds an edge from
> +   LOOP->header to the INNER_BODY guarded by the control flag.  */
> +
> +static void
> +create_control_flag (edge forwarded_edge, loop_p loop, basic_block inner_body)
> +{
> +  edge e, preheader;
> +  edge outer_latch_e = loop_latch_edge (loop);
> +  const char *name = "_flat_";
> +  tree var = create_tmp_var (boolean_type_node, name);

create_tmp_reg

> +  tree res;
> +  gimple phi, cond_stmt;
> +  gimple_stmt_iterator gsi;
> +  edge_iterator ei;
> +
> +  /* Adds a control variable for the redirected FORWARDED_EDGE.  */
> +  add_referenced_var (var);
> +  phi = create_phi_node (var, forwarded_edge->dest);
> +  create_new_def_for (gimple_phi_result (phi), phi,
> +		      gimple_phi_result_ptr (phi));

Likewise.

> +  FOR_EACH_EDGE (e, ei, outer_latch_e->src->preds)
> +    add_phi_arg (phi, (e == forwarded_edge
> +		       ? boolean_false_node
> +		       : boolean_true_node),
> +		 e, UNKNOWN_LOCATION);
> +  res = gimple_phi_result (phi);
> +
> +  /* Add a phi node in LOOP->header for the control variable.  */
> +  phi = create_phi_node (var, loop->header);
> +  create_new_def_for (gimple_phi_result (phi), phi,
> +		      gimple_phi_result_ptr (phi));

Again.

> +  preheader = loop_preheader_edge (loop);
> +  FOR_EACH_EDGE (e, ei, loop->header->preds)
> +    add_phi_arg (phi, (e == preheader
> +		       ? boolean_true_node
> +		       : res),
> +		 e, UNKNOWN_LOCATION);
> +  res = gimple_phi_result (phi);
> +
> +  /* Split LOOP->header to insert the control variable condition.  */
> +  e = split_block_after_labels (loop->header);
> +  e->flags = EDGE_TRUE_VALUE;
> +  e = make_edge (loop->header, inner_body, EDGE_FALSE_VALUE);
> +  cond_stmt = gimple_build_cond (EQ_EXPR, res, boolean_true_node,
> +				 NULL_TREE, NULL_TREE);
> +  gsi = gsi_last_bb (loop->header);
> +  gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
> +}
> +
> +/* Adds phi nodes to the LOOP->header and LOOP->latch for the ssa_name
> +   NAME.  ARG is the argument of the latch phi node set for the
> +   FORWARDED_EDGE, and all the other edges merged by the latch phi
> +   node are set to the result of the LOOP->header phi node.  The latch
> +   edge of the LOOP->header phi node is set to the result of the
> +   LOOP->latch phi node, and the other argument is set to an arbitrary
> +   valid value defined before the loop (note that this initial value
> +   is never used in the loop).  Returns the LOOP->header phi result.  */
> +
> +static tree
> +add_header_and_latch_phis (loop_p loop, tree name, edge forwarded_edge,
> +			   tree arg)
> +{
> +  edge e;
> +  edge_iterator ei;
> +  tree res, zero, var = SSA_NAME_VAR (name);
> +  gimple loop_phi = create_phi_node (var, loop->header);
> +  gimple latch_phi = create_phi_node (var, loop->latch);
> +
> +  create_new_def_for (gimple_phi_result (loop_phi), loop_phi,
> +		      gimple_phi_result_ptr (loop_phi));
> +  create_new_def_for (gimple_phi_result (latch_phi), latch_phi,
> +		      gimple_phi_result_ptr (latch_phi));

Likewise.

> +  /* The value set to ZERO will never be used in the loop, however we
> +     have to construct something meaningful for virtual SSA_NAMEs.  */
> +  if (TREE_CODE (arg) != SSA_NAME)
> +    zero = arg;
> +  else if (is_gimple_reg (arg))
> +    zero = fold_convert (TREE_TYPE (arg), integer_zero_node);
> +  else
> +    zero = gimple_default_def (cfun, SSA_NAME_VAR (arg));

That looks bogus.  It will create overlapping life-ranges
for virtual operands - just make sure you'll rename the VOPs
and use gimple_vop (cfun) for the fallback.  You should also
use build_zero_cst instead of fold_convert.

Thus,

  mark_sym_for_renaming (gimple_vop (cfun));

> +  res = gimple_phi_result (latch_phi);
> +  FOR_EACH_EDGE (e, ei, loop->header->preds)
> +    add_phi_arg (loop_phi, (e == loop_latch_edge (loop) ? res : zero),
> +		 e, UNKNOWN_LOCATION);
> +
> +  res = gimple_phi_result (loop_phi);
> +  FOR_EACH_EDGE (e, ei, loop->latch->preds)
> +    add_phi_arg (latch_phi, (e == forwarded_edge ? arg : res),
> +		 e, UNKNOWN_LOCATION);
> +
> +  return res;
> +}
> +
> +/* Creates phi nodes for each inductive definition, i.e., loop phi
> +   nodes.  For each induction phi node in the old loop header, i.e.,
> +   in the single_succ (INNER_BODY), insert a phi node in the
> +   LOOP->latch that takes the updated value of the induction on the
> +   FORWARDED_EDGE, and maintains the same value as in the phi node of
> +   the LOOP->header for all the other possible paths reaching
> +   LOOP->latch.  This function has to be called after all the
> +   back-edges have been redirected.  */
> +
> +static void
> +update_inner_induction_phi_nodes (edge forwarded_edge, loop_p loop,
> +				  basic_block inner_body)
> +{
> +  gimple_stmt_iterator gsi;
> +
> +  for (gsi = gsi_start_phis (single_succ (inner_body));
> +       !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gimple old_loop_phi = gsi_stmt (gsi);
> +      tree back_arg = PHI_ARG_DEF_FROM_EDGE (old_loop_phi,
> +					     single_succ_edge (inner_body));
> +      tree res = gimple_phi_result (old_loop_phi);
> +
> +      res = add_header_and_latch_phis (loop, res, forwarded_edge, back_arg);
> +      add_phi_arg (old_loop_phi, res, single_succ_edge (inner_body),
> +		   UNKNOWN_LOCATION);
> +    }
> +}
> +
> +/* Renames all the uses of OLD_NAME with NEW_NAME (except the phi
> +   nodes of DEF_BB) in all the basic blocks dominated by DEF_BB and in
> +   the arguments of all the phi nodes originating in a basic block
> +   that is dominated by DEF_BB.  */
> +
> +static void
> +rename_dominated_uses (loop_p loop, tree old_name, tree new_name,
> +		       basic_block def_bb)
> +{
> +  imm_use_iterator uit;
> +  gimple stmt;
> +  use_operand_p use_p;
> +  ssa_op_iter op_iter;
> +
> +  FOR_EACH_IMM_USE_STMT (stmt, uit, old_name)
> +    {
> +      enum gimple_code code = gimple_code (stmt);
> +      basic_block use_bb = gimple_bb (stmt);
> +      edge_iterator ei;
> +      edge e;
> +
> +      if (code == GIMPLE_PHI)
> +	{
> +	  FOR_EACH_EDGE (e, ei, use_bb->preds)
> +	    if (PHI_ARG_DEF_FROM_EDGE (stmt, e) == old_name
> +		&& dominated_by_p (CDI_DOMINATORS, e->src, def_bb)
> +		&& use_bb != def_bb)
> +	      replace_exp (gimple_phi_arg_imm_use_ptr (stmt, e->dest_idx),
> +			   new_name);

  SET_USE (gimple_phi_arg_imm_use_ptr (stmt, e->dest_idx), new_name);

> +	}
> +      else
> +	{
> +	  if (!dominated_by_p (CDI_DOMINATORS, use_bb, def_bb))
> +	    continue;
> +
> +	  if (use_bb->loop_father == loop)
> +	    {
> +	      FOR_EACH_SSA_USE_OPERAND (use_p, stmt, op_iter, SSA_OP_ALL_USES)
> +		if (USE_FROM_PTR (use_p) == old_name)
> +		  replace_exp (use_p, new_name);
> +	    }
> +	  else
> +	    /* Virtual operands are not translated into loop closed
> +	       SSA form, and thus they may occur in the rest of
> +	       the program without a loop close vphi node.  */

But you are updating all uses again.

  You should simply use

        FOR_EACH_IMM_USE_ON_STMT (use_p, uit)
          SET_USE (use_p, new_name);

> +	    FOR_EACH_SSA_USE_OPERAND (use_p, stmt, op_iter, SSA_OP_ALL_USES)
> +	      if (USE_FROM_PTR (use_p) == old_name)
> +		replace_exp (use_p, new_name);
> +	}
> +    }
> +}
> +
> +/* Helper function for add_missing_phi_nodes_1.  Adds to LOOP all the
> +   missing phi nodes for NAME and updates the arguments of the
> +   LATCH_PHI node.  LOOP_PHI node is the inductive definition of NAME
> +   in LOOP->header.  */
> +
> +static void
> +add_missing_phi_nodes_2 (loop_p loop, tree name, tree old_name,
> +			 VEC (gimple, heap) *phis)
> +{
> +  unsigned i;
> +  basic_block bb, dom_bb = gimple_bb (SSA_NAME_DEF_STMT (name));
> +  VEC (basic_block, heap) *dom_bbs = get_all_dominated_blocks (CDI_DOMINATORS,
> +							       dom_bb);
> +
> +  FOR_EACH_VEC_ELT (basic_block, dom_bbs, i, bb)
> +    {
> +      edge e;
> +      edge_iterator ei;
> +
> +      if (bb == loop->latch
> +	  || bb->loop_father != loop)
> +	continue;

dom_bbs may be very large; it's much better to iterate over the
loop bbs and do a dominator check.  Or iterate over dominator sons
with first_dom_son (), next_dom_son () and recurse, bailing out when
you leave the loop.
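
The recursive variant could look roughly like the toy model below.
It is a sketch only: struct toy_bb and its fields are hypothetical
stand-ins for the dominator tree, not the dominance.c interface.  It
assumes a natural loop, where a block outside the loop cannot dominate
a block inside it, so pruning at the first out-of-loop block should
not miss anything.

```c
#include <stddef.h>

struct toy_bb
{
  struct toy_bb *dom_son;	/* First immediately dominated block.  */
  struct toy_bb *dom_next;	/* Next sibling in the dominator tree.  */
  int in_loop;			/* Nonzero when the block is in the loop.  */
  int visited;
};

/* Visit BB and the blocks it dominates, without descending below
   blocks that are outside the loop.  Returns the number of blocks
   visited.  */
int
toy_walk_dominated_in_loop (struct toy_bb *bb)
{
  int n = 1;
  struct toy_bb *son;

  if (!bb->in_loop)
    return 0;

  bb->visited = 1;
  for (son = bb->dom_son; son; son = son->dom_next)
    n += toy_walk_dominated_in_loop (son);
  return n;
}
```

The work is then bounded by the dominated blocks of the loop rather
than by everything the definition dominates in the function.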

> +      FOR_EACH_EDGE (e, ei, bb->succs)
> +	{
> +	  gimple phi = VEC_index (gimple, phis, e->dest->index);
> +
> +	  if (phi)
> +	    add_phi_arg (phi, name, e, UNKNOWN_LOCATION);
> +
> +	  else if (!single_pred_p (e->dest)
> +		   && !dominated_by_p (CDI_DOMINATORS, e->dest, dom_bb)
> +		   && e->dest->loop_father == loop)
> +	  {
> +	    tree var = SSA_NAME_VAR (name);
> +
> +	    phi = create_phi_node (var, e->dest);
> +	    create_new_def_for (gimple_phi_result (phi), phi,
> +				gimple_phi_result_ptr (phi));

Again.

> +	    VEC_replace (gimple, phis, e->dest->index, phi);
> +	    add_phi_arg (phi, name, e, UNKNOWN_LOCATION);
> +	    rename_dominated_uses (loop, old_name, gimple_phi_result (phi),
> +				   e->dest);
> +	    add_missing_phi_nodes_2 (loop, gimple_phi_result (phi), old_name,
> +				     phis);
> +	  }
> +	}
> +    }

You leak dom_bbs.

> +}
> +
> +/* Helper function for add_missing_phi_nodes.  For all the definitions
> +   of DEF_STMT add the missing phi nodes in LOOP.  */
> +
> +static void
> +add_missing_phi_nodes_1 (loop_p loop, gimple def_stmt)
> +{
> +  def_operand_p def_p;
> +  ssa_op_iter op_iter;
> +  basic_block bb = gimple_bb (def_stmt);
> +
> +  FOR_EACH_PHI_OR_STMT_DEF (def_p, def_stmt, op_iter, SSA_OP_DEF|SSA_OP_VDEF)
> +    {
> +      edge e;
> +      edge_iterator ei;
> +      tree res, zero, var;
> +      gimple loop_phi, latch_phi, use_stmt;
> +      imm_use_iterator uit;
> +      tree name = DEF_FROM_PTR (def_p);
> +      bool needs_update = false;
> +      VEC (gimple, heap) *phis;
> +      int i;
> +
> +      FOR_EACH_IMM_USE_STMT (use_stmt, uit, name)
> +	{
> +	  basic_block use_bb = gimple_bb (use_stmt);
> +
> +	  if (!dominated_by_p (CDI_DOMINATORS, bb, use_bb))
> +	    {
> +	      needs_update = true;
> +	      BREAK_FROM_IMM_USE_STMT (uit);
> +	    }
> +	}
> +
> +      if (!needs_update)
> +	continue;
> +
> +      var = SSA_NAME_VAR (name);
> +      loop_phi = create_phi_node (var, loop->header);
> +      latch_phi = create_phi_node (var, loop->latch);
> +
> +      create_new_def_for (gimple_phi_result (loop_phi), loop_phi,
> +			  gimple_phi_result_ptr (loop_phi));
> +      create_new_def_for (gimple_phi_result (latch_phi), latch_phi,
> +			  gimple_phi_result_ptr (latch_phi));

Again.

> +      /* The value set to ZERO will never be used in the loop, however we
> +	 have to construct something meaningful for virtual SSA_NAMEs.  */
> +      if (is_gimple_reg (name))
> +	zero = fold_convert (TREE_TYPE (name), integer_zero_node);
> +      else
> +	zero = gimple_default_def (cfun, SSA_NAME_VAR (name));
> +
> +      res = gimple_phi_result (latch_phi);
> +      FOR_EACH_EDGE (e, ei, loop->header->preds)
> +	add_phi_arg (loop_phi, (e == loop_latch_edge (loop) ? res : zero),
> +		     e, UNKNOWN_LOCATION);
> +
> +      res = gimple_phi_result (loop_phi);
> +      FOR_EACH_EDGE (e, ei, loop->latch->preds)
> +	add_phi_arg (latch_phi, res, e, UNKNOWN_LOCATION);
> +
> +      phis = VEC_alloc (gimple, heap, n_basic_blocks);
> +      for (i = 0; i < n_basic_blocks; i++)
> +	VEC_quick_push (gimple, phis, NULL);
> +
> +      VEC_replace (gimple, phis, loop->latch->index, latch_phi);
> +      VEC_replace (gimple, phis, loop->header->index, loop_phi);
> +      add_missing_phi_nodes_2 (loop, name, name, phis);
> +
> +      for (i = 0; i < n_basic_blocks; i++)
> +	{
> +	  gimple phi = VEC_index (gimple, phis, i);
> +
> +	  if (!phi)
> +	    continue;
> +
> +	  FOR_EACH_EDGE (e, ei, BASIC_BLOCK (i)->preds)
> +	    if (!PHI_ARG_DEF_FROM_EDGE (phi, e))
> +	      add_phi_arg (phi, res, e, UNKNOWN_LOCATION);
> +	}
> +
> +      VEC_free (gimple, heap, phis);
> +    }
> +}
> +
> +/* Walks over the code of LOOP and adds the missing phi nodes at
> +   control flow junctions.  When a variable is defined in an outer
> +   loop and used in an inner loop, the definition dominates the use.
> +   After the loop flattening, the inner loop body is directly
> +   reachable from the LOOP->header by using the added edge guarded by
> +   the boolean flag that controls the execution of the back-edge that
> +   was eliminated.  In this case, the use is not dominated by the
> +   definition, and this function adds the missing phi nodes.  */
> +
> +static void
> +add_missing_phi_nodes (loop_p loop)
> +{
> +  gimple_stmt_iterator gsi;
> +  int i, n = loop->num_nodes;
> +  basic_block *bbs = get_loop_body (loop);

So you can even pass this down to add_missing_phi_nodes_2.  Or
even use get_loop_body_in_dom_order and thus only need to walk
adjacent blocks in that array.

> +  for (i = 0; i < n; i++)
> +    {
> +      basic_block bb = bbs[i];
> +
> +      /* LOOP->header dominates all the blocks of the loop body, and
> +	 so we don't have to look at the missing phi nodes for the
> +	 definitions of LOOP->header.  */
> +      if (bb == loop->header)
> +	continue;
> +
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +	if (!gimple_nop_p (gsi_stmt (gsi)))
> +	  add_missing_phi_nodes_1 (loop, gsi_stmt (gsi));
> +
> +      for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +	add_missing_phi_nodes_1 (loop, gsi_stmt (gsi));
> +    }
> +
> +  free (bbs);
> +}
> +
> +/* Removes all the back-edges of LOOP except its own back-edge.
> +   SCRATCH_PAD is used in if-conversion.  */
> +
> +static unsigned
> +flatten_loop (loop_p loop, tree *scratch_pad)
> +{
> +  int i, n = loop->num_nodes;
> +  basic_block *bbs;
> +  VEC (edge, heap) *back_edges;
> +  VEC (basic_block, heap) *loop_body;
> +  edge_iterator ei;
> +  edge e, pred_e;
> +  unsigned max_nb_basic_blocks = PARAM_VALUE (PARAM_LFLAT_MAX_NB_BBS);;
> +
> +  if (loop->num_nodes > max_nb_basic_blocks
> +      || !single_exit (loop)
> +      || !dbg_cnt (lflat))
> +    return 0;
> +
> +  mark_dfs_back_edges ();
> +  bbs = get_loop_body (loop);
> +
> +  back_edges = VEC_alloc (edge, heap, 3);
> +  loop_body = VEC_alloc (basic_block, heap, 3);
> +
> +  for (i = 0; i < n; i++)
> +    FOR_EACH_EDGE (e, ei, bbs[i]->succs)
> +      if (e->flags & EDGE_DFS_BACK
> +	  && e->src != loop->latch)
> +	VEC_safe_push (edge, heap, back_edges, e);
> +
> +  free (bbs);
> +
> +  /* Early return and do not modify the code when there are no back
> +     edges.  */
> +  if (VEC_empty (edge, back_edges))
> +    return 0;
> +
> +  cancel_subloops (loop);
> +
> +  /* Split the latch edge to make sure that the latch basic block does
> +     not contain code.  */
> +  loop->latch = split_edge (loop_latch_edge (loop));
> +  pred_e = single_pred_edge (loop->latch);
> +
> +  FOR_EACH_VEC_ELT (edge, back_edges, i, e)
> +    {
> +      basic_block dest = split_edge (e);
> +
> +      /* Redirect BACK_EDGE to LOOP->latch.  */
> +      redirect_edge_and_branch_force (e, loop->latch);
> +
> +      /* Save the basic block where it was pointing.  */
> +      VEC_safe_push (basic_block, heap, loop_body, dest);
> +    }
> +
> +  update_loop_phi_nodes (loop, pred_e);
> +
> +  FOR_EACH_VEC_ELT (edge, back_edges, i, e)
> +    create_control_flag (e, loop, VEC_index (basic_block, loop_body, i));
> +
> +  FOR_EACH_VEC_ELT (edge, back_edges, i, e)
> +    update_inner_induction_phi_nodes (e, loop, VEC_index (basic_block,
> +							  loop_body, i));
> +
> +  free_dominance_info (CDI_DOMINATORS);
> +  calculate_dominance_info (CDI_DOMINATORS);
> +  add_missing_phi_nodes (loop);
> +
> +  /* If we redirected some back-edges, split the latch edge to create
> +     an empty LOOP->latch.  */
> +  if (!single_pred_p (loop->latch))
> +    loop->latch = split_edge (loop_latch_edge (loop));
> +
> +  if (gate_tree_if_conversion ())
> +    tree_if_conversion (loop, scratch_pad);

You are leaking VECs.  As mentioned above testing the gate isn't
necessary here.

> +  return TODO_update_ssa | TODO_verify_ssa;

These TODOs belong in the pass structure.

> +}
> +
> +/* Flattens all the loops of the current function.  */
> +
> +static unsigned int
> +tree_loop_flattening (void)
> +{
> +  unsigned todo = 0;
> +  loop_p loop;
> +  loop_iterator li;
> +  tree scratch_pad = NULL_TREE;
> +
> +  if (number_of_loops () <= 1)
> +    return 0;
> +
> +  FOR_EACH_LOOP (li, loop, 0)
> +    todo |= flatten_loop (loop, &scratch_pad);

So we might end up recursively flattening loops (or not, as this
walk is in undefined order).  I'd say you want LI_ONLY_INNERMOST here,
or do you really want to flatten all loop trees up to the number
of basic blocks specified in the param?  I guess not.

I think the pass misses a cost model and I'm still not sure when
or if it will be profitable to do this at all (as said, no
functional testcases).  What's the immediate benefit for GCC 4.6?

> +#ifdef ENABLE_CHECKING
> +  verify_dominators (CDI_DOMINATORS);
> +  verify_flow_info ();
> +#endif
> +
> +  cleanup_tree_cfg ();
> +  return todo;

return TODO_cleanup_cfg, but only if you flattened a loop.  So
return TODO_cleanup_cfg from flatten_loop instead.

Richard.

> +}
> +
> +static bool
> +gate_tree_loop_flattening (void)
> +{
> +  return flag_tree_loop_flattening != 0;
> +}
> +
> +struct gimple_opt_pass pass_flatten_loops =
> +{
> + {
> +  GIMPLE_PASS,
> +  "lflat",				/* name */
> +  gate_tree_loop_flattening,		/* gate */
> +  tree_loop_flattening,       		/* execute */
> +  NULL,					/* sub */
> +  NULL,					/* next */
> +  0,					/* static_pass_number */
> +  TV_TREE_LOOP_FLATTENING,  		/* tv_id */
> +  PROP_cfg | PROP_ssa,			/* properties_required */
> +  0,					/* properties_provided */
> +  0,					/* properties_destroyed */
> +  0,					/* todo_flags_start */
> +  TODO_dump_func
> +    | TODO_update_ssa
> +    | TODO_ggc_collect			/* todo_flags_finish */
> + }
> +};
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index a87a770..e2f257f 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -374,6 +374,7 @@ extern struct gimple_opt_pass pass_graphite;
>  extern struct gimple_opt_pass pass_graphite_transforms;
>  extern struct gimple_opt_pass pass_if_conversion;
>  extern struct gimple_opt_pass pass_loop_distribution;
> +extern struct gimple_opt_pass pass_flatten_loops;
>  extern struct gimple_opt_pass pass_vectorize;
>  extern struct gimple_opt_pass pass_slp_vectorize;
>  extern struct gimple_opt_pass pass_complete_unroll;
> 

-- 
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex

* Re: [PATCH 1/3] Fix PR46029: reimplement if-convert stores.
  2010-11-05 12:08     ` Richard Guenther
@ 2010-11-05 16:13       ` Sebastian Pop
  2010-11-10 23:24         ` Sebastian Pop
  2010-11-15 22:39       ` Sebastian Pop
  1 sibling, 1 reply; 41+ messages in thread
From: Sebastian Pop @ 2010-11-05 16:13 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

On Fri, Nov 5, 2010 at 07:06, Richard Guenther <rguenther@suse.de> wrote:
>> +/* Returns true when stmt contains a data reference.  */
>>
>>  static bool
>> -ifcvt_memrefs_wont_trap (gimple stmt, VEC (data_reference_p, heap) *refs)
>> +has_unaligned_memory_refs (gimple stmt)
>
> Ick - unified diffs are hard to read (sometimes).  The comment
> doesn't match the function name -- unaligned data reference or not?
>

Right, the comment is incomplete.

>> +      if ((bitpos % BITS_PER_UNIT) != 0)
>
> Hmm, that's not actually unaligned but not addressable, right?
> I guess you want to re-use ivopts may_be_nonaddressable_p instead.

Correct.  One of the testcases that I added triggered this error in
expand, and I took part of this code from may_be_nonaddressable_p.  I
don't remember why I was not able to use may_be_nonaddressable_p, but
I will try.

>> +      if (has_unaligned_memory_refs (stmt))
>> +     {
>> +       if (dump_file && (dump_flags & TDF_DETAILS))
>> +         fprintf (dump_file, "uses misaligned memory...\n");
>
> But here it suggests misaligned again (why'd we care for misalignment?)
>

I will change this message to say "nonaddressable memory".

>> +/* Insert at the beginning of the first basic block of the current
>> +   function the allocation on the stack of N bytes of memory and
>> +   return a pointer to this scratchpad memory.  */
>> +
>> +static tree
>> +create_scratchpad (void)
>> +{
>> +  basic_block bb = single_succ (ENTRY_BLOCK_PTR);
>> +  gimple_stmt_iterator gsi = gsi_after_labels (bb);
>> +
>> +  /* void *tmp = __builtin_alloca */
>> +  const char *name = "scratch_pad";
>> +  tree x = build_int_cst (integer_type_node, 64);
>> +  gimple stmt = gimple_build_call (built_in_decls[BUILT_IN_ALLOCA], 1, x);
>> +  tree var = create_tmp_var (ptr_type_node, name);
>> +  tree tmp = make_ssa_name (var, stmt);
>
> It would be better to use an automatic variable than using alloca
> which is expensive.  Why was your choice that way?  (Are we ever
> if-converting aggregate stores?  I hope not.)
>
> Also you are unconditionally allocating 64 bytes instead of N.
>
> Note that if you want to make vectorization happy you would need
> to ensure that for
>
>  if (x)
>    a[i] = ...;
>
> the scratchpad you'll end up using will have the _same_ alignment
> as a[i] (same or larger for all offsets).  Using a local array
> of chars should make it possible for the vectorizer to adjust
> its alignment if needed.
>

Ok, thanks for the recommendation, I will try to use a local array.

Also, for the vectorization to happen, we should teach the vectorizer
and the data dependence analysis that the reads/writes to the
scratchpad have inconsequential effects.  The vectorizer should also
be able to use extra scratchpad memory to transform the scratchpaded
dataref into a vector in the scratchpad.
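
As a hand-written model of the transformation being discussed (not
compiler output), the following sketch contrasts a conditional store
with its scratchpad form, using an automatic array as the scratchpad
as Richard recommends.  The function names, the element type, and the
one-element scratchpad are assumptions made for the example.

```c
#include <stddef.h>

/* Original form: the store to a[i] only happens when cond[i] is set.  */
void
add_two_conditional (unsigned *a, const unsigned *b, const int *cond,
                     size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (cond[i])
      a[i] = b[i] + 2;
}

/* If-converted form: every iteration loads and stores, but when
   cond[i] is false both accesses are redirected to a local scratchpad
   whose contents are never read back by the caller.  */
void
add_two_scratchpad (unsigned *a, const unsigned *b, const int *cond,
                    size_t n)
{
  /* Automatic scratchpad with the same type, and therefore the same
     alignment, as the elements of a[] and b[].  */
  unsigned scratch[1] = { 0 };

  for (size_t i = 0; i < n; i++)
    {
      unsigned *dst = cond[i] ? &a[i] : scratch;
      const unsigned *src = cond[i] ? &b[i] : scratch;
      *dst = *src + 2;
    }
}
```

Both functions leave a[] in the same state; the second never writes to
an element of a[] that the first would not have written.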

>
> Overall I like the new way much more.  Please update and repost.

Thanks for the review.  I will send an updated patch.

Sebastian

* Re: [PATCH 3/3] Loop flattening on loop-SSA.
  2010-11-05 13:05     ` Richard Guenther
@ 2010-11-05 16:57       ` Sebastian Pop
  2010-11-08 16:14         ` Richard Guenther
  2010-11-16 22:47       ` Sebastian Pop
  1 sibling, 1 reply; 41+ messages in thread
From: Sebastian Pop @ 2010-11-05 16:57 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

Hi Richi,

Thanks for reviewing this.

On Fri, Nov 5, 2010 at 07:51, Richard Guenther <rguenther@suse.de> wrote:
> What extra testing apart from the 4 testcases did this new pass get?
> Do we pass bootstrap with it enabled?  Did you check if we regress
> in SPEC 2k6 when it is enabled?

I tested loop flattening alone (before I reworked the if-conversion of
stores).  GCC bootstrapped with the pass enabled, and SPEC2k6 compiled
and ran with no errors.

> I think the pass misses a cost model and I'm still not sure when
> or if it will be profitable to do this at all (as said, no
> functional testcases).  What's the immediate benefit for GCC 4.6?

I have not yet measured the perf gain/loss due to this pass on
SPEC2k6.  I will report the SPEC2k6 percentages with/without loop
flattening.

> The testcases seem to originate from ICEs found during development.  There
> is a lack of functional tests, please consider coming up with some,
> eventually testing for enabled extra optimizations.

Ok.  I could add a matmult test with its runtime test, and grep in the
code generated by the loop flattening pass for only one loop.  I could
also add all the graphite/run-id-* testcases.

>> +/* In tree-if-conv.c */
>> +bool gate_tree_if_conversion (void);
>> +bool tree_if_conversion (struct loop *, tree *);
>> +
>
> Why'd you need to export the gate?  I guess if-conversion should
> happen unconditionally for loops that are flattened as I see it is
> really part of the flattening transformation?

Right.  I will just call tree_if_conversion unconditionally at the end
of loop flattening.

>> +/* Keep the loop structure for LOOP and remove all the loop structures
>> +   under LOOP.  */
>> +
>> +static void
>> +cancel_subloops (loop_p loop)
>> +{
>> +  int i;
>> +  loop_p li;
>> +  VEC (loop_p, heap) *lv = VEC_alloc (loop_p, heap, 3);
>> +
>> +  for (li = loop->inner; li; li = li->next)
>> +    VEC_safe_push (loop_p, heap, lv, li);
>> +
>> +  FOR_EACH_VEC_ELT (loop_p, lv, i, li)
>> +    cancel_loop_tree (li);
>> +
>> +  VEC_free (loop_p, heap, lv);
>> +}
>
> This function should be in cfgloop.c and implemented in simpler
> form, like
>
> void
> cancel_subloops (struct loop *loop)
> {
>  while (loop->inner)
>    cancel_loop_tree (loop->inner);
> }
>

Ok, I will move this function to cfgloop.c.  However, I don't think
we can simplify it further without extra storage.  If I write the
simplified form like this:

void
cancel_subloops (struct loop *loop)
{
  loop_p x = loop->inner;

  while (x)
    {
      cancel_loop_tree (x);
      x = x->next;
    }
}

this won't work, as the loop x gets first canceled and then we try to
access x->next and this will produce a segfault.

>> +      phi = create_phi_node (var, loop->latch);
>> +      create_new_def_for (gimple_phi_result (phi), phi,
>> +                       gimple_phi_result_ptr (phi));
>
> Using create_new_def_for looks very suspicious.  create_phi_node
> will already create a new SSA name for you for the result, so
> it doesn't make any sense to fiddle with the SSA updaters machinery, does
> it?
>

When I wrote this, I was following some other code in if-conversion.
I will remove this pattern from this file and from if-conversion.

>> +/* Flattens all the loops of the current function.  */
>> +
>> +static unsigned int
>> +tree_loop_flattening (void)
>> +{
>> +  unsigned todo = 0;
>> +  loop_p loop;
>> +  loop_iterator li;
>> +  tree scratch_pad = NULL_TREE;
>> +
>> +  if (number_of_loops () <= 1)
>> +    return 0;
>> +
>> +  FOR_EACH_LOOP (li, loop, 0)
>> +    todo |= flatten_loop (loop, &scratch_pad);
>
> So we might end up recursively flattening loops (or not, as this
> walk is in undefined order).

The order in which we flatten loops does not really matter.

>  I'd say you want LI_ONLY_INNERMOST here,
> or do you really want to flatten all loop trees up to the number
> of basic blocks specified in the parm?  I guess not.

For a given number of basic blocks per flat loop, any loop tree
traversal will end up with the same loop tree.  If we start walking
the loop tree from the innermost, we may end up flattening flat loops.

Sebastian

* Re: [PATCH 3/3] Loop flattening on loop-SSA.
  2010-11-05 16:57       ` Sebastian Pop
@ 2010-11-08 16:14         ` Richard Guenther
  2010-11-15 23:05           ` Sebastian Pop
  2010-11-15 23:08           ` Sebastian Pop
  0 siblings, 2 replies; 41+ messages in thread
From: Richard Guenther @ 2010-11-08 16:14 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches

On Fri, 5 Nov 2010, Sebastian Pop wrote:

> Hi Richi,
> 
> Thanks for reviewing this.
> >
> > This function should be in cfgloop.c and implemented in simpler
> > form, like
> >
> > void
> > cancel_subloops (struct loop *loop)
> > {
> >  while (loop->inner)
> >    cancel_loop_tree (loop->inner);
> > }
> >
> 
> Ok, I will move this function to cfgloop.c.  However, I don't think
> we can simplify it further without extra storage.  If I write the
> simplified form like this:
> 
> void
> cancel_subloops (struct loop *loop)
> {
>   loop_p x = loop->inner;
> 
>   while (x)
>     {
>       cancel_loop_tree (x);
>       x = x->next;
>     }
> }
> 
> this won't work, as the loop x gets first canceled and then we try to
> access x->next and this will produce a segfault.

Which is why my suggested variant would work, no?
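
The difference between the two variants can be reproduced on a toy
loop tree.  The sketch below is not GCC code: the struct, the helper
names, and the two-argument cancel function are made up for
illustration.  It assumes, as the suggested variant relies on, that
cancelling a loop also unlinks it from its parent's list of inner
loops, so that re-reading the list head after each cancellation never
touches freed memory.

```c
#include <stdlib.h>

/* Toy stand-in for struct loop: only the tree links are modelled.  */
struct toy_loop
{
  struct toy_loop *inner;   /* first sub-loop */
  struct toy_loop *next;    /* next sibling */
};

/* Allocate a new loop and link it as the first inner loop of PARENT.  */
struct toy_loop *
toy_add_inner (struct toy_loop *parent)
{
  struct toy_loop *l = calloc (1, sizeof *l);
  if (!l)
    abort ();
  l->next = parent->inner;
  parent->inner = l;
  return l;
}

/* Toy stand-in for cancel_loop_tree.  L must be the first inner loop
   of PARENT.  Cancels everything nested in L, unlinks L from PARENT,
   then frees L.  */
void
toy_cancel_loop_tree (struct toy_loop *parent, struct toy_loop *l)
{
  while (l->inner)
    toy_cancel_loop_tree (l, l->inner);
  parent->inner = l->next;
  free (l);
}

/* The suggested shape: always re-read the list head, never keep a
   pointer to a loop that has already been cancelled.  */
void
toy_cancel_subloops (struct toy_loop *loop)
{
  while (loop->inner)
    toy_cancel_loop_tree (loop, loop->inner);
}
```

The rejected variant fails because it reads x->next after x has been
cancelled; the loop above only ever dereferences loop, which stays
alive throughout.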

> >> +      phi = create_phi_node (var, loop->latch);
> >> +      create_new_def_for (gimple_phi_result (phi), phi,
> >> +                       gimple_phi_result_ptr (phi));
> >
> > Using create_new_def_for looks very suspicious.  create_phi_node
> > will already create a new SSA name for you for the result, so
> > it doesn't make any sense to fiddle with the SSA updaters machinery, does
> > it?
> >
> 
> When I wrote this, I was following some other code in if-conversion.
> I will remove this pattern from this file and from if-conversion.

Thanks.

> >> +/* Flattens all the loops of the current function.  */
> >> +
> >> +static unsigned int
> >> +tree_loop_flattening (void)
> >> +{
> >> +  unsigned todo = 0;
> >> +  loop_p loop;
> >> +  loop_iterator li;
> >> +  tree scratch_pad = NULL_TREE;
> >> +
> >> +  if (number_of_loops () <= 1)
> >> +    return 0;
> >> +
> >> +  FOR_EACH_LOOP (li, loop, 0)
> >> +    todo |= flatten_loop (loop, &scratch_pad);
> >
> > So we might end up recursively flattening loops (or not, as this
> > walk is in undefined order).
> 
> The order in which we flatten loops does not really matter.
> 
> >  I'd say you want LI_ONLY_INNERMOST here,
> > or do you really want to flatten all loop trees up to the number
> > of basic blocks specified in the parm?  I guess not.
> 
> For a given number of basic blocks per flat loop, any loop tree
> traversal will end up with the same loop tree.  If we start walking
> the loop tree from the innermost, we may end up flattening flat loops.

Loops with a flattened body, you mean?  What I question is the usefulness
of flattening a complete nest.  What I also question is whether
flattening works for outer loops (well - the default # of blocks to
flatten seems to be so low that we never do that and I guess you'll
simply ICE flattening an outer loop if you increase that limit).

Thus, I think you should restrict yourself to LI_ONLY_INNERMOST.
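
In terms of the loop tree, restricting the walk to innermost loops
means visiting only the leaves.  The following is a toy sketch of such
a walk, not the FOR_EACH_LOOP implementation; the struct and function
names are invented for the example.

```c
#include <stddef.h>

/* Toy loop-tree node: first sub-loop, next sibling, and a number.  */
struct loop_node
{
  const struct loop_node *inner;
  const struct loop_node *next;
  int num;
};

/* Append to OUT, starting at index N, the numbers of all loops nested
   under PARENT that have no inner loop.  PARENT itself is never
   recorded.  Returns the new number of entries in OUT.  */
size_t
collect_innermost (const struct loop_node *parent, int *out, size_t n)
{
  for (const struct loop_node *l = parent->inner; l; l = l->next)
    if (l->inner)
      n = collect_innermost (l, out, n);
    else
      out[n++] = l->num;
  return n;
}
```

With a root containing loop 1 and loop 3 as siblings, and loop 2
nested in loop 1, only loops 2 and 3 are collected; loop 1, an outer
loop, is skipped.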

Richard.

* Re: [PATCH 1/3] Fix PR46029: reimplement if-convert stores.
  2010-11-05 16:13       ` Sebastian Pop
@ 2010-11-10 23:24         ` Sebastian Pop
  2010-11-11 10:04           ` Richard Guenther
  0 siblings, 1 reply; 41+ messages in thread
From: Sebastian Pop @ 2010-11-10 23:24 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

Hi Richi,

On Fri, Nov 5, 2010 at 11:08, Sebastian Pop <sebpop@gmail.com> wrote:
>> Note that if you want to make vectorization happy you would need
>> to ensure that for
>>
>>  if (x)
>>    a[i] = ...;
>>
>> the scratchpad you'll end up using will have the _same_ alignment
>> as a[i] (same or larger for all offsets).  Using a local array
>> of chars should make it possible for the vectorizer to adjust
>> its alignment if needed.
>>
>
> Ok, thanks for the recommendation, I will try to use a local array.

Do you happen to know how to declare an automatic variable?

I was looking at the code in tree-switch-conversion.c:build_one_array.
Does that look like the right approach for declaring the scratchpad
on the stack?

Thanks,
Sebastian

* Re: [PATCH 1/3] Fix PR46029: reimplement if-convert stores.
  2010-11-10 23:24         ` Sebastian Pop
@ 2010-11-11 10:04           ` Richard Guenther
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Guenther @ 2010-11-11 10:04 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches

On Wed, 10 Nov 2010, Sebastian Pop wrote:

> Hi Richi,
> 
> On Fri, Nov 5, 2010 at 11:08, Sebastian Pop <sebpop@gmail.com> wrote:
> >> Note that if you want to make vectorization happy you would need
> >> to ensure that for
> >>
> >>  if (x)
> >>    a[i] = ...;
> >>
> >> the scratchpad you'll end up using will have the _same_ alignment
> >> as a[i] (same or larger for all offsets).  Using a local array
> >> of chars should make it possible for the vectorizer to adjust
> >> its alignment if needed.
> >>
> >
> > Ok, thanks for the recommendation, I will try to use a local array.
> 
> Do you happen to know how to declare an automatic variable?
> 
> I was looking at the code in tree-switch-conversion.c:build_one_array.
> Does that look like the right approach for declaring the scratchpad
> on the stack?

As simple as

var = create_tmp_var (type, NULL_TREE);

switch-conversion builds a local static array, not an automatic var.
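
At the source level, the distinction drawn here is the one between an
automatic array and a function-local static array.  The sketch below
shows it in plain C rather than in GIMPLE; the function names are made
up for the example.  Each live invocation gets its own automatic
array, while all invocations share the single static one.

```c
#include <stddef.h>

/* Returns 1 when the automatic scratchpad of the innermost call is a
   different object from the scratchpad of its caller.  OUTER is the
   caller's scratchpad, or NULL for the first call.  */
int
automatic_scratch_is_distinct (int depth, const char *outer)
{
  char scratch[64];   /* automatic: one per live invocation */

  scratch[0] = (char) depth;
  if (depth == 0)
    return outer != scratch;
  return automatic_scratch_is_distinct (depth - 1, scratch);
}

/* Returns 1 when the static scratchpad seen by the innermost call is
   the same object as the one seen by its caller.  */
int
static_scratch_is_shared (int depth, const char *outer)
{
  static char scratch[64];   /* static: one for the whole program */

  if (depth == 0)
    return outer == scratch;
  return static_scratch_is_shared (depth - 1, scratch);
}
```

Calling each function with depth 1 and a NULL pointer returns 1 in
both cases.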

Richard.

* Re: [PATCH 1/3] Fix PR46029: reimplement if-convert stores.
  2010-11-05 12:08     ` Richard Guenther
  2010-11-05 16:13       ` Sebastian Pop
@ 2010-11-15 22:39       ` Sebastian Pop
  2010-11-16 14:45         ` Richard Guenther
  1 sibling, 1 reply; 41+ messages in thread
From: Sebastian Pop @ 2010-11-15 22:39 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 32702 bytes --]

Hi Richi,

Fixes for your review comments are posted separately; see below for the
details.  See 0001-Fix-PR46029-reimplement-if-convert-stores.patch for the
combined patch.

On Fri, Nov 5, 2010 at 07:06, Richard Guenther <rguenther@suse.de> wrote:
> On Wed, 3 Nov 2010, Sebastian Pop wrote:
>
>> 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
>>
>>       PR tree-optimization/46029
>>       * doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
>>       * tree-if-conv.c (has_unaligned_memory_refs): New.
>>       (if_convertible_gimple_assign_stmt_p): Call has_unaligned_memory_refs.
>>       (create_scratchpad): New.
>>       (create_indirect_cond_expr): New.
>>       (predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
>>       parameter for scratch_pad.
>>       (combine_blocks): Same.
>>       (tree_if_conversion): Same.
>>       (main_tree_if_conversion): Pass to tree_if_conversion a pointer to
>>       scratch_pad.
>>       (struct ifc_dr): Removed.
>>       (IFC_DR): Removed.
>>       (DR_WRITTEN_AT_LEAST_ONCE): Removed.
>>       (DR_RW_UNCONDITIONALLY): Removed.
>>       (memrefs_read_or_written_unconditionally): Removed.
>>       (write_memrefs_written_at_least_once): Removed.
>>       (ifcvt_memrefs_wont_trap): Removed.
>>       (ifcvt_could_trap_p): Does not take refs parameter anymore.
>>       (if_convertible_gimple_assign_stmt_p): Same.
>>       (if_convertible_stmt_p): Same.
>>       (if_convertible_loop_p_1): Remove initialization of dr->aux,
>>       DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
>>       (if_convertible_loop_p): Remove deallocation of the same.
>
> Comments in-line
>
>> testsuite/
>>       * g++.dg/tree-ssa/ifc-pr46029.C: New.
>>       * gcc.dg/tree-ssa/ifc-8.c: New.
>>       * gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
>> ---
>>  gcc/ChangeLog                               |   28 ++
>>  gcc/doc/invoke.texi                         |   18 +-
>>  gcc/testsuite/ChangeLog                     |    7 +
>>  gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C |   76 ++++++
>>  gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c       |   17 +-
>>  gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c       |   29 ++
>>  gcc/tree-if-conv.c                          |  379 ++++++++++++---------------
>>  7 files changed, 336 insertions(+), 218 deletions(-)
>>  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index beed454..0f58882 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,31 @@
>> +2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
>> +
>> +     PR tree-optimization/46029
>> +     * doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
>> +     * tree-if-conv.c (has_unaligned_memory_refs): New.
>> +     (if_convertible_gimple_assign_stmt_p): Call has_unaligned_memory_refs.
>> +     (create_scratchpad): New.
>> +     (create_indirect_cond_expr): New.
>> +     (predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
>> +     parameter for scratch_pad.
>> +     (combine_blocks): Same.
>> +     (tree_if_conversion): Same.
>> +     (main_tree_if_conversion): Pass to tree_if_conversion a pointer to
>> +     scratch_pad.
>> +     (struct ifc_dr): Removed.
>> +     (IFC_DR): Removed.
>> +     (DR_WRITTEN_AT_LEAST_ONCE): Removed.
>> +     (DR_RW_UNCONDITIONALLY): Removed.
>> +     (memrefs_read_or_written_unconditionally): Removed.
>> +     (write_memrefs_written_at_least_once): Removed.
>> +     (ifcvt_memrefs_wont_trap): Removed.
>> +     (ifcvt_could_trap_p): Does not take refs parameter anymore.
>> +     (if_convertible_gimple_assign_stmt_p): Same.
>> +     (if_convertible_stmt_p): Same.
>> +     (if_convertible_loop_p_1): Remove initialization of dr->aux,
>> +     DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
>> +     (if_convertible_loop_p): Remove deallocation of the same.
>> +
>>  2010-10-20  Nathan Froyd  <froydnj@codesourcery.com>
>>
>>       * ifcvt.c (noce_emit_cmove): If both of the values are SUBREGs, try
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index ee68454..28b0cbb 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -6935,20 +6935,26 @@ if vectorization is enabled.
>>
>>  @item -ftree-loop-if-convert-stores
>>  Attempt to also if-convert conditional jumps containing memory writes.
>> -This transformation can be unsafe for multi-threaded programs as it
>> -transforms conditional memory writes into unconditional memory writes.
>>  For example,
>>  @smallexample
>>  for (i = 0; i < N; i++)
>>    if (cond)
>> -    A[i] = expr;
>> +    A[i] = B[i] + 2;
>>  @end smallexample
>>  would be transformed to
>>  @smallexample
>> -for (i = 0; i < N; i++)
>> -  A[i] = cond ? expr : A[i];
>> +void *scratchpad = alloca (64);
>> +for (i = 0; i < N; i++) @{
>> +  a = cond ? &A[i] : scratchpad;
>> +  b = cond ? &B[i] : scratchpad;
>> +  *a = *b + 2;
>> +@}
>>  @end smallexample
>> -potentially producing data races.
>> +The compiler allocates a scratchpad memory on the stack for each
>> +function in which the if-conversion of memory stores or reads
>> +happened.  This scratchpad memory is used during the part of the
>> +computation that is discarded, i.e., when the condition is evaluated
>> +to false.
>>
>>  @item -ftree-loop-distribution
>>  Perform loop distribution.  This flag can improve cache performance on
>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>> index 9d9c543..4233f86 100644
>> --- a/gcc/testsuite/ChangeLog
>> +++ b/gcc/testsuite/ChangeLog
>> @@ -1,3 +1,10 @@
>> +2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
>> +
>> +     PR tree-optimization/46029
>> +     * g++.dg/tree-ssa/ifc-pr46029.C: New.
>> +     * gcc.dg/tree-ssa/ifc-8.c: New.
>> +     * gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
>> +
>>  2010-10-20  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>
>>
>>       PR c++/46024
>> diff --git a/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C b/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
>> new file mode 100644
>> index 0000000..2a54bdb
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
>> @@ -0,0 +1,76 @@
>> +// { dg-do run }
>> +/* { dg-options "-O -ftree-loop-if-convert-stores" } */
>> +
>> +namespace
>> +{
>> +  struct rb_tree_node_
>> +  {
>> +    rb_tree_node_ ():m_p_left (0), m_p_parent (0), m_metadata (0)
>> +    {
>> +    }
>> +    unsigned &get_metadata ()
>> +    {
>> +      return m_metadata;
>> +    }
>> +    rb_tree_node_ *m_p_left;
>> +    rb_tree_node_ *m_p_parent;
>> +    unsigned m_metadata;
>> +  };
>> +
>> +  struct bin_search_tree_const_node_it_
>> +  {
>> +    bin_search_tree_const_node_it_ (rb_tree_node_ * p_nd):m_p_nd (p_nd)
>> +    {
>> +    }
>> +    unsigned &get_metadata ()
>> +    {
>> +      return m_p_nd->get_metadata ();
>> +    }
>> +    bin_search_tree_const_node_it_ get_l_child ()
>> +    {
>> +      return bin_search_tree_const_node_it_ (m_p_nd->m_p_left);
>> +    }
>> +
>> +    rb_tree_node_ *m_p_nd;
>> +  };
>> +
>> +  struct bin_search_tree_no_data_
>> +  {
>> +    typedef rb_tree_node_ *node_pointer;
>> +      bin_search_tree_no_data_ ():m_p_head (new rb_tree_node_ ())
>> +    {
>> +    }
>> +    void insert_imp_empty (int r_value)
>> +    {
>> +      rb_tree_node_ *p_new_node = new rb_tree_node_ ();
>> +      m_p_head->m_p_parent = p_new_node;
>> +      p_new_node->m_p_parent = m_p_head;
>> +      update_to_top (m_p_head->m_p_parent);
>> +    }
>> +    void apply_update (bin_search_tree_const_node_it_ nd_it)
>> +    {
>> +      unsigned
>> +     l_max_endpoint
>> +     =
>> +     (nd_it.get_l_child ().m_p_nd ==
>> +      0) ? 0 : nd_it.get_l_child ().get_metadata ();
>> +      nd_it.get_metadata () = l_max_endpoint;
>> +    }
>> +    void update_to_top (node_pointer p_nd)
>> +    {
>> +      while (p_nd != m_p_head)
>> +     {
>> +       apply_update (p_nd);
>> +       p_nd = p_nd->m_p_parent;
>> +     }
>> +    }
>> +
>> +    rb_tree_node_ * m_p_head;
>> +  };
>> +}
>> +
>> +int main ()
>> +{
>> +  bin_search_tree_no_data_ ().insert_imp_empty (0);
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
>> index a9cc816..d88c4a2 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
>> @@ -12,11 +12,18 @@ dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
>>    for (i = 0; i <= nCoeffs; i++)
>>      {
>>        level = block[i];
>> -      if (level < 0)
>> -     level = level * qmul - qadd;
>> -      else
>> -     level = level * qmul + qadd;
>> -      block[i] = level;
>> +      if (level)
>> +        {
>> +          if (level < 0)
>> +            {
>> +              level = level * qmul - qadd;
>> +            }
>> +          else
>> +            {
>> +              level = level * qmul + qadd;
>> +            }
>> +          block[i] = level;
>> +        }
>>      }
>>  }
>>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
>> new file mode 100644
>> index 0000000..d7cf279
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
>> @@ -0,0 +1,29 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-c -O2 -ftree-vectorize" { target *-*-* } } */
>> +
>> +typedef union tree_node *tree;
>> +struct tree_common
>> +{
>> +  unsigned volatile_flag : 1;
>> +  unsigned unsigned_flag : 1;
>> +};
>> +struct tree_type
>> +{
>> +  tree next_variant;
>> +  tree main_variant;
>> +};
>> +union tree_node
>> +{
>> +  struct tree_common common;
>> +  struct tree_type type;
>> +};
>> +void finish_enum (tree enumtype)
>> +{
>> +  tree tem;
>> +  for (tem = ((enumtype)->type.main_variant); tem; tem = ((tem)->type.next_variant))
>> +    {
>> +      if (tem == enumtype)
>> +     continue;
>> +      ((tem)->common.unsigned_flag) = ((enumtype)->common.unsigned_flag);
>> +    }
>> +}
>> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
>> index 642dbda..9fc6190 100644
>> --- a/gcc/tree-if-conv.c
>> +++ b/gcc/tree-if-conv.c
>> @@ -446,171 +446,47 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
>>    return true;
>>  }
>>
>> -/* Records the status of a data reference.  This struct is attached to
>> -   each DR->aux field.  */
>> -
>> -struct ifc_dr {
>> -  /* -1 when not initialized, 0 when false, 1 when true.  */
>> -  int written_at_least_once;
>> -
>> -  /* -1 when not initialized, 0 when false, 1 when true.  */
>> -  int rw_unconditionally;
>> -};
>> -
>> -#define IFC_DR(DR) ((struct ifc_dr *) (DR)->aux)
>> -#define DR_WRITTEN_AT_LEAST_ONCE(DR) (IFC_DR (DR)->written_at_least_once)
>> -#define DR_RW_UNCONDITIONALLY(DR) (IFC_DR (DR)->rw_unconditionally)
>> -
>> -/* Returns true when the memory references of STMT are read or written
>> -   unconditionally.  In other words, this function returns true when
>> -   for every data reference A in STMT there exist other accesses to
>> -   the same data reference with predicates that add up (OR-up) to the
>> -   true predicate: this ensures that the data reference A is touched
>> -   (read or written) on every iteration of the if-converted loop.  */
>> -
>> -static bool
>> -memrefs_read_or_written_unconditionally (gimple stmt,
>> -                                      VEC (data_reference_p, heap) *drs)
>> -{
>> -  int i, j;
>> -  data_reference_p a, b;
>> -  tree ca = bb_predicate (gimple_bb (stmt));
>> -
>> -  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
>> -    if (DR_STMT (a) == stmt)
>> -      {
>> -     bool found = false;
>> -     int x = DR_RW_UNCONDITIONALLY (a);
>> -
>> -     if (x == 0)
>> -       return false;
>> -
>> -     if (x == 1)
>> -       continue;
>> -
>> -     for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
>> -       if (DR_STMT (b) != stmt
>> -           && same_data_refs (a, b))
>> -         {
>> -           tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
>> -
>> -           if (DR_RW_UNCONDITIONALLY (b) == 1
>> -               || is_true_predicate (cb)
>> -               || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
>> -                                                              ca, cb)))
>> -             {
>> -               DR_RW_UNCONDITIONALLY (a) = 1;
>> -               DR_RW_UNCONDITIONALLY (b) = 1;
>> -               found = true;
>> -               break;
>> -             }
>> -         }
>> -
>> -     if (!found)
>> -       {
>> -         DR_RW_UNCONDITIONALLY (a) = 0;
>> -         return false;
>> -       }
>> -      }
>> -
>> -  return true;
>> -}
>> -
>> -/* Returns true when the memory references of STMT are unconditionally
>> -   written.  In other words, this function returns true when for every
>> -   data reference A written in STMT, there exist other writes to the
>> -   same data reference with predicates that add up (OR-up) to the true
>> -   predicate: this ensures that the data reference A is written on
>> -   every iteration of the if-converted loop.  */
>> +/* Wrapper around gimple_could_trap_p refined for the needs of the
>> +   if-conversion.  */
>>
>>  static bool
>> -write_memrefs_written_at_least_once (gimple stmt,
>> -                                  VEC (data_reference_p, heap) *drs)
>> +ifcvt_could_trap_p (gimple stmt)
>>  {
>> -  int i, j;
>> -  data_reference_p a, b;
>> -  tree ca = bb_predicate (gimple_bb (stmt));
>> -
>> -  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
>> -    if (DR_STMT (a) == stmt
>> -     && DR_IS_WRITE (a))
>> -      {
>> -     bool found = false;
>> -     int x = DR_WRITTEN_AT_LEAST_ONCE (a);
>> -
>> -     if (x == 0)
>> -       return false;
>> -
>> -     if (x == 1)
>> -       continue;
>> -
>> -     for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
>> -       if (DR_STMT (b) != stmt
>> -           && DR_IS_WRITE (b)
>> -           && same_data_refs_base_objects (a, b))
>> -         {
>> -           tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
>> -
>> -           if (DR_WRITTEN_AT_LEAST_ONCE (b) == 1
>> -               || is_true_predicate (cb)
>> -               || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
>> -                                                              ca, cb)))
>> -             {
>> -               DR_WRITTEN_AT_LEAST_ONCE (a) = 1;
>> -               DR_WRITTEN_AT_LEAST_ONCE (b) = 1;
>> -               found = true;
>> -               break;
>> -             }
>> -         }
>> -
>> -     if (!found)
>> -       {
>> -         DR_WRITTEN_AT_LEAST_ONCE (a) = 0;
>> -         return false;
>> -       }
>> -      }
>> +  if (gimple_vuse (stmt)
>> +      && !gimple_could_trap_p_1 (stmt, false, false))
>> +    return false;
>>
>> -  return true;
>> +  return gimple_could_trap_p (stmt);
>>  }
>>
>> -/* Return true when the memory references of STMT won't trap in the
>> -   if-converted code.  There are two things that we have to check for:
>> -
>> -   - writes to memory occur to writable memory: if-conversion of
>> -   memory writes transforms the conditional memory writes into
>> -   unconditional writes, i.e. "if (cond) A[i] = foo" is transformed
>> -   into "A[i] = cond ? foo : A[i]", and as the write to memory may not
>> -   be executed at all in the original code, it may be a readonly
>> -   memory.  To check that A is not const-qualified, we check that
>> -   there exists at least an unconditional write to A in the current
>> -   function.
>> -
>> -   - reads or writes to memory are valid memory accesses for every
>> -   iteration.  To check that the memory accesses are correctly formed
>> -   and that we are allowed to read and write in these locations, we
>> -   check that the memory accesses to be if-converted occur at every
>> -   iteration unconditionally.  */
>> +/* Returns true when stmt contains a data reference.  */
>>
>>  static bool
>> -ifcvt_memrefs_wont_trap (gimple stmt, VEC (data_reference_p, heap) *refs)
>> +has_unaligned_memory_refs (gimple stmt)
>
> Ick - unified diffs are hard to read (sometimes).  The comment
> doesn't match the function name -- unaligned data reference or not?
>
>>  {
>> -  return write_memrefs_written_at_least_once (stmt, refs)
>> -    && memrefs_read_or_written_unconditionally (stmt, refs);
>> -}
>> -
>> -/* Wrapper around gimple_could_trap_p refined for the needs of the
>> -   if-conversion.  Try to prove that the memory accesses of STMT could
>> -   not trap in the innermost loop containing STMT.  */
>> +  int unsignedp, volatilep;
>> +  HOST_WIDE_INT bitsize, bitpos;
>> +  tree toffset;
>> +  enum machine_mode mode;
>> +  VEC (data_ref_loc, heap) *refs = VEC_alloc (data_ref_loc, heap, 3);
>> +  bool res = get_references_in_stmt (stmt, &refs);
>> +  unsigned i;
>> +  data_ref_loc *ref;
>> +
>> +  FOR_EACH_VEC_ELT (data_ref_loc, refs, i, ref)
>> +    {
>> +      get_inner_reference (*ref->pos, &bitsize, &bitpos, &toffset,
>> +                        &mode, &unsignedp, &volatilep, true);
>>
>> -static bool
>> -ifcvt_could_trap_p (gimple stmt, VEC (data_reference_p, heap) *refs)
>> -{
>> -  if (gimple_vuse (stmt)
>> -      && !gimple_could_trap_p_1 (stmt, false, false)
>> -      && ifcvt_memrefs_wont_trap (stmt, refs))
>> -    return false;
>> +      if ((bitpos % BITS_PER_UNIT) != 0)
>
> Hmm, that's not actually unaligned but not addressable, right?
> I guess you want to re-use ivopts may_be_nonaddressable_p instead.
>
>> +     {
>> +       res = true;
>> +       break;
>> +     }
>> +    }
>>
>> -  return gimple_could_trap_p (stmt);
>> +  VEC_free (data_ref_loc, heap, refs);
>> +  return res;
>>  }
>>
>>  /* Return true when STMT is if-convertible.
>> @@ -621,8 +497,7 @@ ifcvt_could_trap_p (gimple stmt, VEC (data_reference_p, heap) *refs)
>>     - LHS is not var decl.  */
>>
>>  static bool
>> -if_convertible_gimple_assign_stmt_p (gimple stmt,
>> -                                  VEC (data_reference_p, heap) *refs)
>> +if_convertible_gimple_assign_stmt_p (gimple stmt)
>>  {
>>    tree lhs = gimple_assign_lhs (stmt);
>>    basic_block bb;
>> @@ -650,12 +525,20 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
>>
>>    if (flag_tree_loop_if_convert_stores)
>>      {
>> -      if (ifcvt_could_trap_p (stmt, refs))
>> +      if (ifcvt_could_trap_p (stmt))
>>       {
>>         if (dump_file && (dump_flags & TDF_DETAILS))
>>           fprintf (dump_file, "tree could trap...\n");
>>         return false;
>>       }
>> +
>> +      if (has_unaligned_memory_refs (stmt))
>> +     {
>> +       if (dump_file && (dump_flags & TDF_DETAILS))
>> +         fprintf (dump_file, "uses misaligned memory...\n");
>
> But here it suggests misaligned again (why'd we care for misalignment?)
>

All the above issues are fixed by
0002-Reuse-ivopts-may_be_nonaddressable_p.patch.

>> +       return false;
>> +     }
>> +
>>        return true;
>>      }
>>
>> @@ -690,7 +573,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
>>     - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
>>
>>  static bool
>> -if_convertible_stmt_p (gimple stmt, VEC (data_reference_p, heap) *refs)
>> +if_convertible_stmt_p (gimple stmt)
>>  {
>>    switch (gimple_code (stmt))
>>      {
>> @@ -700,7 +583,7 @@ if_convertible_stmt_p (gimple stmt, VEC (data_reference_p, heap) *refs)
>>        return true;
>>
>>      case GIMPLE_ASSIGN:
>> -      return if_convertible_gimple_assign_stmt_p (stmt, refs);
>> +      return if_convertible_gimple_assign_stmt_p (stmt);
>>
>>      default:
>>        /* Don't know what to do with 'em so don't do anything.  */
>> @@ -1016,18 +899,6 @@ if_convertible_loop_p_1 (struct loop *loop,
>>    if (!res)
>>      return false;
>>
>> -  if (flag_tree_loop_if_convert_stores)
>> -    {
>> -      data_reference_p dr;
>> -
>> -      for (i = 0; VEC_iterate (data_reference_p, *refs, i, dr); i++)
>> -     {
>> -       dr->aux = XNEW (struct ifc_dr);
>> -       DR_WRITTEN_AT_LEAST_ONCE (dr) = -1;
>> -       DR_RW_UNCONDITIONALLY (dr) = -1;
>> -     }
>> -    }
>> -
>>    for (i = 0; i < loop->num_nodes; i++)
>>      {
>>        basic_block bb = ifc_bbs[i];
>> @@ -1040,7 +911,7 @@ if_convertible_loop_p_1 (struct loop *loop,
>>        /* Check the if-convertibility of statements in predicated BBs.  */
>>        if (is_predicated (bb))
>>       for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
>> -       if (!if_convertible_stmt_p (gsi_stmt (itr), *refs))
>> +       if (!if_convertible_stmt_p (gsi_stmt (itr)))
>>           return false;
>>      }
>>
>> @@ -1101,15 +972,6 @@ if_convertible_loop_p (struct loop *loop)
>>    ddrs = VEC_alloc (ddr_p, heap, 25);
>>    res = if_convertible_loop_p_1 (loop, &refs, &ddrs);
>>
>> -  if (flag_tree_loop_if_convert_stores)
>> -    {
>> -      data_reference_p dr;
>> -      unsigned int i;
>> -
>> -      for (i = 0; VEC_iterate (data_reference_p, refs, i, dr); i++)
>> -     free (dr->aux);
>> -    }
>> -
>>    free_data_refs (refs);
>>    free_dependence_relations (ddrs);
>>    return res;
>> @@ -1366,6 +1228,78 @@ insert_gimplified_predicates (loop_p loop)
>>      }
>>  }
>>
>> +/* Insert at the beginning of the first basic block of the current
>> +   function the allocation on the stack of N bytes of memory and
>> +   return a pointer to this scratchpad memory.  */
>> +
>> +static tree
>> +create_scratchpad (void)
>> +{
>> +  basic_block bb = single_succ (ENTRY_BLOCK_PTR);
>> +  gimple_stmt_iterator gsi = gsi_after_labels (bb);
>> +
>> +  /* void *tmp = __builtin_alloca */
>> +  const char *name = "scratch_pad";
>> +  tree x = build_int_cst (integer_type_node, 64);
>> +  gimple stmt = gimple_build_call (built_in_decls[BUILT_IN_ALLOCA], 1, x);
>> +  tree var = create_tmp_var (ptr_type_node, name);
>> +  tree tmp = make_ssa_name (var, stmt);
>
> It would be better to use an automatic variable than using alloca
> which is expensive.  Why was your choice that way?  (Are we ever
> if-converting aggregate stores?  I hope not.)
>
> Also you are unconditionally allocating 64 bytes instead of N.
>
> Note that if you want to make vectorization happy you would need
> to ensure that for
>
>  if (x)
>    a[i] = ...;
>
> the scratchpad you'll end up using will have the _same_ alignment
> as a[i] (same or larger for all offsets).  Using a local array
> of chars should make it possible for the vectorizer to adjust
> its alignment if needed.
>

This is addressed in 0003-Don-t-use-alloca.patch.
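
As a source-level illustration only, the following sketch shows the shape of
the transformation with a local scratchpad instead of alloca.  The function
names are hypothetical, and the scratchpad is a typed local int so that the
sketch stays valid C; the patch itself works on GIMPLE and uses a local
array of chars.

```c
#include <stddef.h>

/* Original form: the store to a[i] is control dependent on b[i].  */
void
store_if_nonzero_branchy (int *a, const int *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (b[i] != 0)
      a[i] = b[i] + 2;
}

/* Hand-written equivalent of the if-converted loop: the store is
   unconditional, but it is redirected to a local scratchpad when the
   condition is false, so a[i] is never touched in that case.  */
void
store_if_nonzero_flat (int *a, const int *b, size_t n)
{
  int scratch_pad = 0;	/* Local variable, no alloca call.  */

  for (size_t i = 0; i < n; i++)
    {
      int cond = b[i] != 0;
      int *p = cond ? &a[i] : &scratch_pad;
      *p = b[i] + 2;	/* The value stored in scratch_pad is never used.  */
    }
}
```

Both functions leave the same contents in a[] for any input, and the second
one never writes to a[i] when the condition is false.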

>> +  add_referenced_var (var);
>> +  gimple_call_set_lhs (stmt, tmp);
>> +  SSA_NAME_DEF_STMT (tmp) = stmt;
>> +  update_stmt (stmt);
>> +
>> +  gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
>> +  return tmp;
>> +}
>> +
>> +/* Returns a memory reference to the pointer defined by the
>> +   conditional expression: pointer = cond ? &A[i] : scratch_pad; and
>> +   inserts this code at GSI.  */
>> +
>> +static tree
>> +create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
>> +                        gimple_stmt_iterator *gsi)
>> +{
>> +  tree type = TREE_TYPE (ai);
>> +
>> +  tree pointer_to_type, address_of_ai, addr_expr, cond_expr;
>> +  tree pointer, star_pointer;
>> +  gimple addr_stmt, pointer_stmt;
>> +
>> +  /* address_of_ai = &A[i];  */
>> +  pointer_to_type = build_pointer_type (type);
>> +  address_of_ai = create_tmp_var (pointer_to_type, "_ifc_");
>
> Use create_tmp_reg (everywhere)

This is in 0004-Use-create_tmp_reg-instead-of-create_tmp_var.patch.

>
>> +  add_referenced_var (address_of_ai);
>> +  addr_expr = build_fold_addr_expr (ai);
>
> If you build that before create_tmp_reg you can use TREE_TYPE of
> it and avoid creating pointer_to_type.
>

Fixed in 0005-Avoid-call-to-build_pointer_type.patch.

>> +  addr_stmt = gimple_build_assign (address_of_ai, addr_expr);
>> +  address_of_ai = make_ssa_name (address_of_ai, addr_stmt);
>> +  gimple_assign_set_lhs (addr_stmt, address_of_ai);
>> +  SSA_NAME_DEF_STMT (address_of_ai) = addr_stmt;
>> +  update_stmt (addr_stmt);
>> +  gsi_insert_before (gsi, addr_stmt, GSI_SAME_STMT);
>> +
>> +  /* Allocate the scratch pad only once per function.  */
>> +  if (!*scratch_pad)
>> +    *scratch_pad = create_scratchpad ();
>> +
>> +  /* pointer = cond ? address_of_ai : scratch_pad;  */
>> +  pointer = create_tmp_var (pointer_to_type, "_ifc_");
>> +  add_referenced_var (pointer);
>> +  cond_expr = build3 (COND_EXPR, pointer_to_type, unshare_expr (cond),
>> +                   address_of_ai, *scratch_pad);
>> +  pointer_stmt = gimple_build_assign (pointer, cond_expr);
>> +  pointer = make_ssa_name (pointer, pointer_stmt);
>> +  gimple_assign_set_lhs (pointer_stmt, pointer);
>> +  SSA_NAME_DEF_STMT (pointer) = pointer_stmt;
>> +  update_stmt (pointer_stmt);
>> +  gsi_insert_before (gsi, pointer_stmt, GSI_SAME_STMT);
>> +
>> +  star_pointer = build_simple_mem_ref (pointer);
>
> build2 (MEM_REF, TREE_TYPE (ai), pointer,
>        build_int_cst (reference_alias_ptr_type (ai), 0));
>
> as you need to preserve TBAA info.
>

Fixed in 0006-Preserve-TBAA.patch.

>> +  return star_pointer;
>> +}
>> +
>>  /* Predicate each write to memory in LOOP.
>>
>>     This function transforms control flow constructs containing memory
>> @@ -1377,10 +1311,19 @@ insert_gimplified_predicates (loop_p loop)
>>
>>     into the following form that does not contain control flow:
>>
>> -   | for (i = 0; i < N; i++)
>> -   |   A[i] = cond ? expr : A[i];
>> +   | void *scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
>> +   |
>> +   | for (i = 0; i < N; i++) {
>> +   |   p = cond ? &A[i] : scratch_pad;
>> +   |   *p = expr;
>> +   | }
>> +
>> +   SCRATCH_PAD is allocated on the stack for each function once and it is
>> +   large enough to contain any kind of scalar assignment or read.  All
>> +   values read or written to SCRATCH_PAD are not used in the computation.
>>
>> -   The original CFG looks like this:
>> +   In a more detailed way, the if-conversion of memory writes works
>> +   like this, supposing that the original CFG looks like this:
>>
>>     | bb_0
>>     |   i = 0
>> @@ -1430,10 +1373,12 @@ insert_gimplified_predicates (loop_p loop)
>>     |   goto bb_1
>>     | end_bb_4
>>
>> -   predicate_mem_writes is then predicating the memory write as follows:
>> +   predicate_mem_writes is then allocating SCRATCH_PAD in the basic block
>> +   preceding the loop header, and is predicating the memory write:
>>
>>     | bb_0
>>     |   i = 0
>> +   |   scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
>>     | end_bb_0
>>     |
>>     | bb_1
>> @@ -1441,12 +1386,14 @@ insert_gimplified_predicates (loop_p loop)
>>     | end_bb_1
>>     |
>>     | bb_2
>> +   |   cond = some_computation;
>>     |   if (cond) goto bb_3 else goto bb_4
>>     | end_bb_2
>>     |
>>     | bb_3
>>     |   cond = some_computation;
>> -   |   A[i] = cond ? expr : A[i];
>> +   |   p = cond ? &A[i] : scratch_pad;
>> +   |   *p = expr;
>>     |   goto bb_4
>>     | end_bb_3
>>     |
>> @@ -1459,12 +1406,14 @@ insert_gimplified_predicates (loop_p loop)
>>
>>     | bb_0
>>     |   i = 0
>> +   |   scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
>>     |   if (i < N) goto bb_5 else goto bb_1
>>     | end_bb_0
>>     |
>>     | bb_1
>>     |   cond = some_computation;
>> -   |   A[i] = cond ? expr : A[i];
>> +   |   p = cond ? &A[i] : scratch_pad;
>> +   |   *p = expr;
>>     |   if (i < N) goto bb_5 else goto bb_4
>>     | end_bb_1
>>     |
>> @@ -1474,7 +1423,7 @@ insert_gimplified_predicates (loop_p loop)
>>  */
>>
>>  static void
>> -predicate_mem_writes (loop_p loop)
>> +predicate_mem_writes (loop_p loop, tree *scratch_pad)
>>  {
>>    unsigned int i, orig_loop_num_nodes = loop->num_nodes;
>>
>> @@ -1489,20 +1438,35 @@ predicate_mem_writes (loop_p loop)
>>       continue;
>>
>>        for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> -     if ((stmt = gsi_stmt (gsi))
>> -         && gimple_assign_single_p (stmt)
>> -         && gimple_vdef (stmt))
>> -       {
>> -         tree lhs = gimple_assign_lhs (stmt);
>> -         tree rhs = gimple_assign_rhs1 (stmt);
>> -         tree type = TREE_TYPE (lhs);
>> -
>> -         lhs = ifc_temp_var (type, unshare_expr (lhs), &gsi);
>> -         rhs = ifc_temp_var (type, unshare_expr (rhs), &gsi);
>> -         rhs = build3 (COND_EXPR, type, unshare_expr (cond), rhs, lhs);
>> -         gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
>> -         update_stmt (stmt);
>> -       }
>> +     {
>> +       stmt = gsi_stmt (gsi);
>> +       if (gimple_assign_single_p (stmt)
>> +           && gimple_vdef (stmt))
>> +         {
>> +           /* A[i] = x;  */
>> +           tree ai = gimple_assign_lhs (stmt);
>> +
>> +           /* pointer = cond ? &A[i] : scratch_pad;  */
>> +           tree star_pointer = create_indirect_cond_expr (ai, cond,
>> +                                                          scratch_pad, &gsi);
>> +           /* *pointer = x;  */
>> +           gimple_assign_set_lhs (stmt, star_pointer);
>> +           update_stmt (stmt);
>> +         }
>> +       else if (gimple_assign_single_p (stmt)
>> +                && gimple_vuse (stmt))
>> +         {
>> +           /* x = A[i];  */
>> +           tree ai = gimple_assign_rhs1 (stmt);
>> +
>> +           /* pointer = cond ? &A[i] : scratch_pad;  */
>> +           tree star_pointer = create_indirect_cond_expr (ai, cond,
>> +                                                          scratch_pad, &gsi);
>> +           /* x = *pointer;  */
>> +           gimple_assign_set_rhs1 (stmt, star_pointer);
>> +           update_stmt (stmt);
>> +         }
>> +     }
>>      }
>>  }
>>
>> @@ -1552,7 +1516,7 @@ remove_conditions_and_labels (loop_p loop)
>>     blocks.  Replace PHI nodes with conditional modify expressions.  */
>>
>>  static void
>> -combine_blocks (struct loop *loop)
>> +combine_blocks (struct loop *loop, tree *scratch_pad)
>>  {
>>    basic_block bb, exit_bb, merge_target_bb;
>>    unsigned int orig_loop_num_nodes = loop->num_nodes;
>> @@ -1565,7 +1529,7 @@ combine_blocks (struct loop *loop)
>>    predicate_all_scalar_phis (loop);
>>
>>    if (flag_tree_loop_if_convert_stores)
>> -    predicate_mem_writes (loop);
>> +    predicate_mem_writes (loop, scratch_pad);
>>
>>    /* Merge basic blocks: first remove all the edges in the loop,
>>       except for those from the exit block.  */
>> @@ -1654,7 +1618,7 @@ combine_blocks (struct loop *loop)
>>     profitability analysis.  Returns true when something changed.  */
>>
>>  static bool
>> -tree_if_conversion (struct loop *loop)
>> +tree_if_conversion (struct loop *loop, tree *scratch_pad)
>>  {
>>    bool changed = false;
>>    ifc_bbs = NULL;
>> @@ -1666,7 +1630,7 @@ tree_if_conversion (struct loop *loop)
>>    /* Now all statements are if-convertible.  Combine all the basic
>>       blocks into one huge basic block doing the if-conversion
>>       on-the-fly.  */
>> -  combine_blocks (loop);
>> +  combine_blocks (loop, scratch_pad);
>>
>>    if (flag_tree_loop_if_convert_stores)
>>      mark_sym_for_renaming (gimple_vop (cfun));
>> @@ -1697,12 +1661,13 @@ main_tree_if_conversion (void)
>>    struct loop *loop;
>>    bool changed = false;
>>    unsigned todo = 0;
>> +  tree scratch_pad = NULL_TREE;
>>
>>    if (number_of_loops () <= 1)
>>      return 0;
>>
>>    FOR_EACH_LOOP (li, loop, 0)
>> -    changed |= tree_if_conversion (loop);
>> +    changed |= tree_if_conversion (loop, &scratch_pad);
>>
>>    if (changed)
>>      todo |= TODO_cleanup_cfg;
>
> Overall I like the new way much more.  Please update and repost.
>

I have folded all of these patches into the previous patch.  The
updated patch is 0001-Fix-PR46029-reimplement-if-convert-stores.patch.
I am bootstrapping and testing the combined patch on amd64-linux.
Let me know if I can improve it in some other way.

Thanks,
Sebastian

[-- Attachment #2: 0002-Reuse-ivopts-may_be_nonaddressable_p.patch --]
[-- Type: text/x-patch, Size: 2226 bytes --]

From 892a091f4d6138437bd09f605cbc5268ea025dcc Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Wed, 10 Nov 2010 13:43:38 -0600
Subject: [PATCH 2/6] Reuse ivopts may_be_nonaddressable_p.

---
 gcc/tree-if-conv.c |   29 ++++++++++-------------------
 1 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 8b36904..7da102f 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -462,28 +462,19 @@ ifcvt_could_trap_p (gimple stmt)
 /* Returns true when stmt contains a data reference.  */
 
 static bool
-has_unaligned_memory_refs (gimple stmt)
+has_non_addressable_refs (gimple stmt)
 {
-  int unsignedp, volatilep;
-  HOST_WIDE_INT bitsize, bitpos;
-  tree toffset;
-  enum machine_mode mode;
   VEC (data_ref_loc, heap) *refs = VEC_alloc (data_ref_loc, heap, 3);
   bool res = get_references_in_stmt (stmt, &refs);
   unsigned i;
   data_ref_loc *ref;
 
   FOR_EACH_VEC_ELT (data_ref_loc, refs, i, ref)
-    {
-      get_inner_reference (*ref->pos, &bitsize, &bitpos, &toffset,
-			   &mode, &unsignedp, &volatilep, true);
-
-      if ((bitpos % BITS_PER_UNIT) != 0)
-	{
-	  res = true;
-	  break;
-	}
-    }
+    if (may_be_nonaddressable_p (*(ref->pos)))
+      {
+	res = true;
+	break;
+      }
 
   VEC_free (data_ref_loc, heap, refs);
   return res;
@@ -528,14 +519,14 @@ if_convertible_gimple_assign_stmt_p (gimple stmt)
       if (ifcvt_could_trap_p (stmt))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "tree could trap...\n");
+	    fprintf (dump_file, "tree could trap\n");
 	  return false;
 	}
 
-      if (has_unaligned_memory_refs (stmt))
+      if (has_non_addressable_refs (stmt))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "uses misaligned memory...\n");
+	    fprintf (dump_file, "has non-addressable memory references\n");
 	  return false;
 	}
 
@@ -545,7 +536,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt)
   if (gimple_assign_rhs_could_trap_p (stmt))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "tree could trap...\n");
+	fprintf (dump_file, "tree could trap\n");
       return false;
     }
 
-- 
1.7.0.4


[-- Attachment #3: 0003-Don-t-use-alloca.patch --]
[-- Type: text/x-patch, Size: 4491 bytes --]

From 3deeb91c9114bc4577b057854bbdb2c825db61f3 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Mon, 15 Nov 2010 14:21:15 -0600
Subject: [PATCH 3/6] Don't use alloca.

---
 gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c |    2 +-
 gcc/tree-if-conv.c                    |   63 ++++++++++++++++++--------------
 2 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
index d88c4a2..4be2cdb 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-c -O2 -ftree-vectorize -fdump-tree-ifcvt-stats" { target *-*-* } } */
+/* { dg-options "-c -O2 -ftree-vectorize -ftree-loop-if-convert-stores -fdump-tree-ifcvt-stats" { target *-*-* } } */
 
 void
 dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 7da102f..19332a5 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -1223,30 +1223,45 @@ insert_gimplified_predicates (loop_p loop)
     }
 }
 
+/* Inserts at position GSI a statement "ADDRESS_OF_AI = &AI;" and
+   returns the ADDRESS_OF_AI.  */
+
+static tree
+insert_address_of (tree ai, tree pointer_to_type, gimple_stmt_iterator *gsi)
+{
+  tree address_of_ai = create_tmp_var (pointer_to_type, "_ifc_");
+  tree addr_expr = build_fold_addr_expr (ai);
+  gimple addr_stmt = gimple_build_assign (address_of_ai, addr_expr);
+
+  add_referenced_var (address_of_ai);
+  address_of_ai = make_ssa_name (address_of_ai, addr_stmt);
+  gimple_assign_set_lhs (addr_stmt, address_of_ai);
+  SSA_NAME_DEF_STMT (address_of_ai) = addr_stmt;
+  update_stmt (addr_stmt);
+  gsi_insert_before (gsi, addr_stmt, GSI_SAME_STMT);
+
+  return address_of_ai;
+}
+
 /* Insert at the beginning of the first basic block of the current
-   function the allocation on the stack of N bytes of memory and
+   function the allocation on the stack of N_BYTES of memory and
    return a pointer to this scratchpad memory.  */
 
 static tree
-create_scratchpad (void)
+create_scratchpad (int n_bytes)
 {
   basic_block bb = single_succ (ENTRY_BLOCK_PTR);
   gimple_stmt_iterator gsi = gsi_after_labels (bb);
+  tree x = build_int_cst (integer_type_node, n_bytes - 1);
+  tree elt_type = char_type_node;
+  tree array_type = build_array_type (elt_type, build_index_type (x));
+  tree base = create_tmp_var (array_type, "scratch_pad");
+  tree a0 = build4 (ARRAY_REF, elt_type, base, integer_zero_node, NULL_TREE,
+		    NULL_TREE);
 
-  /* void *tmp = __builtin_alloca */
-  const char *name = "scratch_pad";
-  tree x = build_int_cst (integer_type_node, 64);
-  gimple stmt = gimple_build_call (built_in_decls[BUILT_IN_ALLOCA], 1, x);
-  tree var = create_tmp_var (ptr_type_node, name);
-  tree tmp = make_ssa_name (var, stmt);
-
-  add_referenced_var (var);
-  gimple_call_set_lhs (stmt, tmp);
-  SSA_NAME_DEF_STMT (tmp) = stmt;
-  update_stmt (stmt);
+  add_referenced_var (base);
 
-  gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
-  return tmp;
+  return insert_address_of (a0, build_pointer_type (elt_type), &gsi);
 }
 
 /* Returns a memory reference to the pointer defined by the
@@ -1259,25 +1274,17 @@ create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
 {
   tree type = TREE_TYPE (ai);
 
-  tree pointer_to_type, address_of_ai, addr_expr, cond_expr;
+  tree cond_expr;
   tree pointer, star_pointer;
-  gimple addr_stmt, pointer_stmt;
+  gimple pointer_stmt;
 
   /* address_of_ai = &A[i];  */
-  pointer_to_type = build_pointer_type (type);
-  address_of_ai = create_tmp_var (pointer_to_type, "_ifc_");
-  add_referenced_var (address_of_ai);
-  addr_expr = build_fold_addr_expr (ai);
-  addr_stmt = gimple_build_assign (address_of_ai, addr_expr);
-  address_of_ai = make_ssa_name (address_of_ai, addr_stmt);
-  gimple_assign_set_lhs (addr_stmt, address_of_ai);
-  SSA_NAME_DEF_STMT (address_of_ai) = addr_stmt;
-  update_stmt (addr_stmt);
-  gsi_insert_before (gsi, addr_stmt, GSI_SAME_STMT);
+  tree pointer_to_type = build_pointer_type (type);
+  tree address_of_ai = insert_address_of (ai, pointer_to_type, gsi);
 
   /* Allocate the scratch pad only once per function.  */
   if (!*scratch_pad)
-    *scratch_pad = create_scratchpad ();
+    *scratch_pad = create_scratchpad (64);
 
   /* pointer = cond ? address_of_ai : scratch_pad;  */
   pointer = create_tmp_var (pointer_to_type, "_ifc_");
-- 
1.7.0.4


[-- Attachment #4: 0004-Use-create_tmp_reg-instead-of-create_tmp_var.patch --]
[-- Type: text/x-patch, Size: 2060 bytes --]

From a6cc5720d165df8fab4ed5c1874bdf5ba97c481c Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Mon, 15 Nov 2010 15:59:19 -0600
Subject: [PATCH 4/6] Use create_tmp_reg instead of create_tmp_var.

---
 gcc/tree-if-conv.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 19332a5..fe1f08b 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -226,7 +226,7 @@ ifc_temp_var (tree type, tree expr, gimple_stmt_iterator *gsi)
   gimple stmt;
 
   /* Create new temporary variable.  */
-  var = create_tmp_var (type, name);
+  var = create_tmp_reg (type, name);
   add_referenced_var (var);
 
   /* Build new statement to assign EXPR to new variable.  */
@@ -1229,7 +1229,7 @@ insert_gimplified_predicates (loop_p loop)
 static tree
 insert_address_of (tree ai, tree pointer_to_type, gimple_stmt_iterator *gsi)
 {
-  tree address_of_ai = create_tmp_var (pointer_to_type, "_ifc_");
+  tree address_of_ai = create_tmp_reg (pointer_to_type, "_ifc_");
   tree addr_expr = build_fold_addr_expr (ai);
   gimple addr_stmt = gimple_build_assign (address_of_ai, addr_expr);
 
@@ -1255,7 +1255,7 @@ create_scratchpad (int n_bytes)
   tree x = build_int_cst (integer_type_node, n_bytes - 1);
   tree elt_type = char_type_node;
   tree array_type = build_array_type (elt_type, build_index_type (x));
-  tree base = create_tmp_var (array_type, "scratch_pad");
+  tree base = create_tmp_reg (array_type, "scratch_pad");
   tree a0 = build4 (ARRAY_REF, elt_type, base, integer_zero_node, NULL_TREE,
 		    NULL_TREE);
 
@@ -1287,7 +1287,7 @@ create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
     *scratch_pad = create_scratchpad (64);
 
   /* pointer = cond ? address_of_ai : scratch_pad;  */
-  pointer = create_tmp_var (pointer_to_type, "_ifc_");
+  pointer = create_tmp_reg (pointer_to_type, "_ifc_");
   add_referenced_var (pointer);
   cond_expr = build3 (COND_EXPR, pointer_to_type, unshare_expr (cond),
 		      address_of_ai, *scratch_pad);
-- 
1.7.0.4


[-- Attachment #5: 0005-Avoid-call-to-build_pointer_type.patch --]
[-- Type: text/x-patch, Size: 2480 bytes --]

From 67e5a340f25d11e0f8c556eb2fdc83278695ca09 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Mon, 15 Nov 2010 16:04:38 -0600
Subject: [PATCH 5/6] Avoid call to build_pointer_type.

---
 gcc/tree-if-conv.c |   18 +++++++-----------
 1 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index fe1f08b..5912acf 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -1227,10 +1227,10 @@ insert_gimplified_predicates (loop_p loop)
    returns the ADDRESS_OF_AI.  */
 
 static tree
-insert_address_of (tree ai, tree pointer_to_type, gimple_stmt_iterator *gsi)
+insert_address_of (tree ai, gimple_stmt_iterator *gsi)
 {
-  tree address_of_ai = create_tmp_reg (pointer_to_type, "_ifc_");
   tree addr_expr = build_fold_addr_expr (ai);
+  tree address_of_ai = create_tmp_reg (TREE_TYPE (addr_expr), "_ifc_");
   gimple addr_stmt = gimple_build_assign (address_of_ai, addr_expr);
 
   add_referenced_var (address_of_ai);
@@ -1260,8 +1260,7 @@ create_scratchpad (int n_bytes)
 		    NULL_TREE);
 
   add_referenced_var (base);
-
-  return insert_address_of (a0, build_pointer_type (elt_type), &gsi);
+  return insert_address_of (a0, &gsi);
 }
 
 /* Returns a memory reference to the pointer defined by the
@@ -1272,25 +1271,22 @@ static tree
 create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
 			   gimple_stmt_iterator *gsi)
 {
-  tree type = TREE_TYPE (ai);
-
   tree cond_expr;
   tree pointer, star_pointer;
   gimple pointer_stmt;
 
   /* address_of_ai = &A[i];  */
-  tree pointer_to_type = build_pointer_type (type);
-  tree address_of_ai = insert_address_of (ai, pointer_to_type, gsi);
+  tree address_of_ai = insert_address_of (ai, gsi);
 
   /* Allocate the scratch pad only once per function.  */
   if (!*scratch_pad)
     *scratch_pad = create_scratchpad (64);
 
   /* pointer = cond ? address_of_ai : scratch_pad;  */
-  pointer = create_tmp_reg (pointer_to_type, "_ifc_");
+  pointer = create_tmp_reg (TREE_TYPE (address_of_ai), "_ifc_");
   add_referenced_var (pointer);
-  cond_expr = build3 (COND_EXPR, pointer_to_type, unshare_expr (cond),
-		      address_of_ai, *scratch_pad);
+  cond_expr = build3 (COND_EXPR, TREE_TYPE (address_of_ai),
+		      unshare_expr (cond), address_of_ai, *scratch_pad);
   pointer_stmt = gimple_build_assign (pointer, cond_expr);
   pointer = make_ssa_name (pointer, pointer_stmt);
   gimple_assign_set_lhs (pointer_stmt, pointer);
-- 
1.7.0.4


[-- Attachment #6: 0006-Preserve-TBAA.patch --]
[-- Type: text/x-patch, Size: 1079 bytes --]

From 61e8f1f691a8b37bd8e87f2d1fefac881e2cd12b Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Mon, 15 Nov 2010 16:10:15 -0600
Subject: [PATCH 6/6] Preserve TBAA.

---
 gcc/tree-if-conv.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 5912acf..f4732ad 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -1272,7 +1272,7 @@ create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
 			   gimple_stmt_iterator *gsi)
 {
   tree cond_expr;
-  tree pointer, star_pointer;
+  tree pointer;
   gimple pointer_stmt;
 
   /* address_of_ai = &A[i];  */
@@ -1294,8 +1294,8 @@ create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
   update_stmt (pointer_stmt);
   gsi_insert_before (gsi, pointer_stmt, GSI_SAME_STMT);
 
-  star_pointer = build_simple_mem_ref (pointer);
-  return star_pointer;
+  return build2 (MEM_REF, TREE_TYPE (ai), pointer,
+		 build_int_cst (reference_alias_ptr_type (ai), 0));
 }
 
 /* Predicate each write to memory in LOOP.
-- 
1.7.0.4


[-- Attachment #7: 0001-Fix-PR46029-reimplement-if-convert-stores.patch --]
[-- Type: text/x-patch, Size: 26941 bytes --]

From 3dd7298ecc74c49df46e05913efeb8395bb11c62 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 26 Oct 2010 16:34:29 -0500
Subject: [PATCH] Fix PR46029: reimplement if-convert stores.

2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>

	PR tree-optimization/46029
	* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
	* tree-if-conv.c (has_unaligned_memory_refs): New.
	(if_convertible_gimple_assign_stmt_p): Call has_unaligned_memory_refs.
	(create_scratchpad): New.
	(create_indirect_cond_expr): New.
	(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
	parameter for scratch_pad.
	(combine_blocks): Same.
	(tree_if_conversion): Same.
	(main_tree_if_conversion): Pass to tree_if_conversion a pointer to
	scratch_pad.
	(struct ifc_dr): Removed.
	(IFC_DR): Removed.
	(DR_WRITTEN_AT_LEAST_ONCE): Removed.
	(DR_RW_UNCONDITIONALLY): Removed.
	(memrefs_read_or_written_unconditionally): Removed.
	(write_memrefs_written_at_least_once): Removed.
	(ifcvt_memrefs_wont_trap): Removed.
	(ifcvt_could_trap_p): Does not take refs parameter anymore.
	(if_convertible_gimple_assign_stmt_p): Same.
	(if_convertible_stmt_p): Same.
	(if_convertible_loop_p_1): Remove initialization of dr->aux,
	DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
	(if_convertible_loop_p): Remove deallocation of the same.

testsuite/
	* g++.dg/tree-ssa/ifc-pr46029.C: New.
	* gcc.dg/tree-ssa/ifc-8.c: New.
	* gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
---
 gcc/ChangeLog                               |   28 ++
 gcc/doc/invoke.texi                         |   18 +-
 gcc/testsuite/ChangeLog                     |    7 +
 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C |   76 ++++++
 gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c       |   19 +-
 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c       |   29 ++
 gcc/tree-if-conv.c                          |  375 ++++++++++++---------------
 7 files changed, 332 insertions(+), 220 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index df496e3..0623d2a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,31 @@
+2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
+
+	PR tree-optimization/46029
+	* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
+	* tree-if-conv.c (has_unaligned_memory_refs): New.
+	(if_convertible_gimple_assign_stmt_p): Call has_unaligned_memory_refs.
+	(create_scratchpad): New.
+	(create_indirect_cond_expr): New.
+	(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
+	parameter for scratch_pad.
+	(combine_blocks): Same.
+	(tree_if_conversion): Same.
+	(main_tree_if_conversion): Pass to tree_if_conversion a pointer to
+	scratch_pad.
+	(struct ifc_dr): Removed.
+	(IFC_DR): Removed.
+	(DR_WRITTEN_AT_LEAST_ONCE): Removed.
+	(DR_RW_UNCONDITIONALLY): Removed.
+	(memrefs_read_or_written_unconditionally): Removed.
+	(write_memrefs_written_at_least_once): Removed.
+	(ifcvt_memrefs_wont_trap): Removed.
+	(ifcvt_could_trap_p): Does not take refs parameter anymore.
+	(if_convertible_gimple_assign_stmt_p): Same.
+	(if_convertible_stmt_p): Same.
+	(if_convertible_loop_p_1): Remove initialization of dr->aux,
+	DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
+	(if_convertible_loop_p): Remove deallocation of the same.
+
 2010-11-11  Joern Rennecke  <amylaar@spamcop.net>
 
 	PR target/44749
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index f197483..d322123 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6959,20 +6959,26 @@ if vectorization is enabled.
 
 @item -ftree-loop-if-convert-stores
 Attempt to also if-convert conditional jumps containing memory writes.
-This transformation can be unsafe for multi-threaded programs as it
-transforms conditional memory writes into unconditional memory writes.
 For example,
 @smallexample
 for (i = 0; i < N; i++)
   if (cond)
-    A[i] = expr;
+    A[i] = B[i] + 2;
 @end smallexample
 would be transformed to
 @smallexample
-for (i = 0; i < N; i++)
-  A[i] = cond ? expr : A[i];
+void *scratchpad = alloca (64);
+for (i = 0; i < N; i++) @{
+  a = cond ? &A[i] : scratchpad;
+  b = cond ? &B[i] : scratchpad;
+  *a = *b + 2;
+@}
 @end smallexample
-potentially producing data races.
+The compiler allocates a scratchpad memory on the stack for each
+function in which the if-conversion of memory stores or reads
+happened.  This scratchpad memory is used during the part of the
+computation that is discarded, i.e., when the condition is evaluated
+to false.
 
 @item -ftree-loop-distribution
 Perform loop distribution.  This flag can improve cache performance on
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 3156601..c5c2473 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
+
+	PR tree-optimization/46029
+	* g++.dg/tree-ssa/ifc-pr46029.C: New.
+	* gcc.dg/tree-ssa/ifc-8.c: New.
+	* gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
+
 2010-11-11  Nicola Pero  <nicola.pero@meta-innovation.com>
 
 	* objc.dg/property/at-property-20.m: New.
diff --git a/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C b/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
new file mode 100644
index 0000000..2a54bdb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
@@ -0,0 +1,76 @@
+// { dg-do run }
+/* { dg-options "-O -ftree-loop-if-convert-stores" } */
+
+namespace
+{
+  struct rb_tree_node_
+  {
+    rb_tree_node_ ():m_p_left (0), m_p_parent (0), m_metadata (0)
+    {
+    }
+    unsigned &get_metadata ()
+    {
+      return m_metadata;
+    }
+    rb_tree_node_ *m_p_left;
+    rb_tree_node_ *m_p_parent;
+    unsigned m_metadata;
+  };
+
+  struct bin_search_tree_const_node_it_
+  {
+    bin_search_tree_const_node_it_ (rb_tree_node_ * p_nd):m_p_nd (p_nd)
+    {
+    }
+    unsigned &get_metadata ()
+    {
+      return m_p_nd->get_metadata ();
+    }
+    bin_search_tree_const_node_it_ get_l_child ()
+    {
+      return bin_search_tree_const_node_it_ (m_p_nd->m_p_left);
+    }
+
+    rb_tree_node_ *m_p_nd;
+  };
+
+  struct bin_search_tree_no_data_
+  {
+    typedef rb_tree_node_ *node_pointer;
+      bin_search_tree_no_data_ ():m_p_head (new rb_tree_node_ ())
+    {
+    }
+    void insert_imp_empty (int r_value)
+    {
+      rb_tree_node_ *p_new_node = new rb_tree_node_ ();
+      m_p_head->m_p_parent = p_new_node;
+      p_new_node->m_p_parent = m_p_head;
+      update_to_top (m_p_head->m_p_parent);
+    }
+    void apply_update (bin_search_tree_const_node_it_ nd_it)
+    {
+      unsigned
+	l_max_endpoint
+	=
+	(nd_it.get_l_child ().m_p_nd ==
+	 0) ? 0 : nd_it.get_l_child ().get_metadata ();
+      nd_it.get_metadata () = l_max_endpoint;
+    }
+    void update_to_top (node_pointer p_nd)
+    {
+      while (p_nd != m_p_head)
+	{
+	  apply_update (p_nd);
+	  p_nd = p_nd->m_p_parent;
+	}
+    }
+
+    rb_tree_node_ * m_p_head;
+  };
+}
+
+int main ()
+{
+  bin_search_tree_no_data_ ().insert_imp_empty (0);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
index a9cc816..4be2cdb 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-c -O2 -ftree-vectorize -fdump-tree-ifcvt-stats" { target *-*-* } } */
+/* { dg-options "-c -O2 -ftree-vectorize -ftree-loop-if-convert-stores -fdump-tree-ifcvt-stats" { target *-*-* } } */
 
 void
 dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
@@ -12,11 +12,18 @@ dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
   for (i = 0; i <= nCoeffs; i++)
     {
       level = block[i];
-      if (level < 0)
-	level = level * qmul - qadd;
-      else
-	level = level * qmul + qadd;
-      block[i] = level;
+      if (level)
+        {
+          if (level < 0)
+            {
+              level = level * qmul - qadd;
+            }
+          else
+            {
+              level = level * qmul + qadd;
+            }
+          block[i] = level;
+        }
     }
 }
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
new file mode 100644
index 0000000..d7cf279
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-c -O2 -ftree-vectorize" { target *-*-* } } */
+
+typedef union tree_node *tree;
+struct tree_common
+{
+  unsigned volatile_flag : 1;
+  unsigned unsigned_flag : 1;
+};
+struct tree_type
+{
+  tree next_variant;
+  tree main_variant;
+};
+union tree_node
+{
+  struct tree_common common;
+  struct tree_type type;
+};
+void finish_enum (tree enumtype)
+{
+  tree tem;
+  for (tem = ((enumtype)->type.main_variant); tem; tem = ((tem)->type.next_variant))
+    {
+      if (tem == enumtype)
+	continue;
+      ((tem)->common.unsigned_flag) = ((enumtype)->common.unsigned_flag);
+    }
+}
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index fc65845..f4732ad 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -226,7 +226,7 @@ ifc_temp_var (tree type, tree expr, gimple_stmt_iterator *gsi)
   gimple stmt;
 
   /* Create new temporary variable.  */
-  var = create_tmp_var (type, name);
+  var = create_tmp_reg (type, name);
   add_referenced_var (var);
 
   /* Build new statement to assign EXPR to new variable.  */
@@ -446,171 +446,38 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
   return true;
 }
 
-/* Records the status of a data reference.  This struct is attached to
-   each DR->aux field.  */
-
-struct ifc_dr {
-  /* -1 when not initialized, 0 when false, 1 when true.  */
-  int written_at_least_once;
-
-  /* -1 when not initialized, 0 when false, 1 when true.  */
-  int rw_unconditionally;
-};
-
-#define IFC_DR(DR) ((struct ifc_dr *) (DR)->aux)
-#define DR_WRITTEN_AT_LEAST_ONCE(DR) (IFC_DR (DR)->written_at_least_once)
-#define DR_RW_UNCONDITIONALLY(DR) (IFC_DR (DR)->rw_unconditionally)
-
-/* Returns true when the memory references of STMT are read or written
-   unconditionally.  In other words, this function returns true when
-   for every data reference A in STMT there exist other accesses to
-   the same data reference with predicates that add up (OR-up) to the
-   true predicate: this ensures that the data reference A is touched
-   (read or written) on every iteration of the if-converted loop.  */
+/* Wrapper around gimple_could_trap_p refined for the needs of the
+   if-conversion.  */
 
 static bool
-memrefs_read_or_written_unconditionally (gimple stmt,
-					 VEC (data_reference_p, heap) *drs)
+ifcvt_could_trap_p (gimple stmt)
 {
-  int i, j;
-  data_reference_p a, b;
-  tree ca = bb_predicate (gimple_bb (stmt));
-
-  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
-    if (DR_STMT (a) == stmt)
-      {
-	bool found = false;
-	int x = DR_RW_UNCONDITIONALLY (a);
-
-	if (x == 0)
-	  return false;
-
-	if (x == 1)
-	  continue;
-
-	for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
-	  if (DR_STMT (b) != stmt
-	      && same_data_refs (a, b))
-	    {
-	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
-
-	      if (DR_RW_UNCONDITIONALLY (b) == 1
-		  || is_true_predicate (cb)
-		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
-								 ca, cb)))
-		{
-		  DR_RW_UNCONDITIONALLY (a) = 1;
-		  DR_RW_UNCONDITIONALLY (b) = 1;
-		  found = true;
-		  break;
-		}
-	    }
-
-	if (!found)
-	  {
-	    DR_RW_UNCONDITIONALLY (a) = 0;
-	    return false;
-	  }
-      }
+  if (gimple_vuse (stmt)
+      && !gimple_could_trap_p_1 (stmt, false, false))
+    return false;
 
-  return true;
+  return gimple_could_trap_p (stmt);
 }
 
-/* Returns true when the memory references of STMT are unconditionally
-   written.  In other words, this function returns true when for every
-   data reference A written in STMT, there exist other writes to the
-   same data reference with predicates that add up (OR-up) to the true
-   predicate: this ensures that the data reference A is written on
-   every iteration of the if-converted loop.  */
+/* Returns true when STMT may contain a non-addressable data reference.  */
 
 static bool
-write_memrefs_written_at_least_once (gimple stmt,
-				     VEC (data_reference_p, heap) *drs)
+has_non_addressable_refs (gimple stmt)
 {
-  int i, j;
-  data_reference_p a, b;
-  tree ca = bb_predicate (gimple_bb (stmt));
+  VEC (data_ref_loc, heap) *refs = VEC_alloc (data_ref_loc, heap, 3);
+  bool res = get_references_in_stmt (stmt, &refs);
+  unsigned i;
+  data_ref_loc *ref;
 
-  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
-    if (DR_STMT (a) == stmt
-	&& DR_IS_WRITE (a))
+  FOR_EACH_VEC_ELT (data_ref_loc, refs, i, ref)
+    if (may_be_nonaddressable_p (*(ref->pos)))
       {
-	bool found = false;
-	int x = DR_WRITTEN_AT_LEAST_ONCE (a);
-
-	if (x == 0)
-	  return false;
-
-	if (x == 1)
-	  continue;
-
-	for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
-	  if (DR_STMT (b) != stmt
-	      && DR_IS_WRITE (b)
-	      && same_data_refs_base_objects (a, b))
-	    {
-	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
-
-	      if (DR_WRITTEN_AT_LEAST_ONCE (b) == 1
-		  || is_true_predicate (cb)
-		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
-								 ca, cb)))
-		{
-		  DR_WRITTEN_AT_LEAST_ONCE (a) = 1;
-		  DR_WRITTEN_AT_LEAST_ONCE (b) = 1;
-		  found = true;
-		  break;
-		}
-	    }
-
-	if (!found)
-	  {
-	    DR_WRITTEN_AT_LEAST_ONCE (a) = 0;
-	    return false;
-	  }
+	res = true;
+	break;
       }
 
-  return true;
-}
-
-/* Return true when the memory references of STMT won't trap in the
-   if-converted code.  There are two things that we have to check for:
-
-   - writes to memory occur to writable memory: if-conversion of
-   memory writes transforms the conditional memory writes into
-   unconditional writes, i.e. "if (cond) A[i] = foo" is transformed
-   into "A[i] = cond ? foo : A[i]", and as the write to memory may not
-   be executed at all in the original code, it may be a readonly
-   memory.  To check that A is not const-qualified, we check that
-   there exists at least an unconditional write to A in the current
-   function.
-
-   - reads or writes to memory are valid memory accesses for every
-   iteration.  To check that the memory accesses are correctly formed
-   and that we are allowed to read and write in these locations, we
-   check that the memory accesses to be if-converted occur at every
-   iteration unconditionally.  */
-
-static bool
-ifcvt_memrefs_wont_trap (gimple stmt, VEC (data_reference_p, heap) *refs)
-{
-  return write_memrefs_written_at_least_once (stmt, refs)
-    && memrefs_read_or_written_unconditionally (stmt, refs);
-}
-
-/* Wrapper around gimple_could_trap_p refined for the needs of the
-   if-conversion.  Try to prove that the memory accesses of STMT could
-   not trap in the innermost loop containing STMT.  */
-
-static bool
-ifcvt_could_trap_p (gimple stmt, VEC (data_reference_p, heap) *refs)
-{
-  if (gimple_vuse (stmt)
-      && !gimple_could_trap_p_1 (stmt, false, false)
-      && ifcvt_memrefs_wont_trap (stmt, refs))
-    return false;
-
-  return gimple_could_trap_p (stmt);
+  VEC_free (data_ref_loc, heap, refs);
+  return res;
 }
 
 /* Return true when STMT is if-convertible.
@@ -621,8 +488,7 @@ ifcvt_could_trap_p (gimple stmt, VEC (data_reference_p, heap) *refs)
    - LHS is not var decl.  */
 
 static bool
-if_convertible_gimple_assign_stmt_p (gimple stmt,
-				     VEC (data_reference_p, heap) *refs)
+if_convertible_gimple_assign_stmt_p (gimple stmt)
 {
   tree lhs = gimple_assign_lhs (stmt);
   basic_block bb;
@@ -650,19 +516,27 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
 
   if (flag_tree_loop_if_convert_stores)
     {
-      if (ifcvt_could_trap_p (stmt, refs))
+      if (ifcvt_could_trap_p (stmt))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "tree could trap...\n");
+	    fprintf (dump_file, "tree could trap\n");
 	  return false;
 	}
+
+      if (has_non_addressable_refs (stmt))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, "has non-addressable memory references\n");
+	  return false;
+	}
+
       return true;
     }
 
   if (gimple_assign_rhs_could_trap_p (stmt))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "tree could trap...\n");
+	fprintf (dump_file, "tree could trap\n");
       return false;
     }
 
@@ -690,7 +564,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
    - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
 
 static bool
-if_convertible_stmt_p (gimple stmt, VEC (data_reference_p, heap) *refs)
+if_convertible_stmt_p (gimple stmt)
 {
   switch (gimple_code (stmt))
     {
@@ -700,7 +574,7 @@ if_convertible_stmt_p (gimple stmt, VEC (data_reference_p, heap) *refs)
       return true;
 
     case GIMPLE_ASSIGN:
-      return if_convertible_gimple_assign_stmt_p (stmt, refs);
+      return if_convertible_gimple_assign_stmt_p (stmt);
 
     default:
       /* Don't know what to do with 'em so don't do anything.  */
@@ -1016,18 +890,6 @@ if_convertible_loop_p_1 (struct loop *loop,
   if (!res)
     return false;
 
-  if (flag_tree_loop_if_convert_stores)
-    {
-      data_reference_p dr;
-
-      for (i = 0; VEC_iterate (data_reference_p, *refs, i, dr); i++)
-	{
-	  dr->aux = XNEW (struct ifc_dr);
-	  DR_WRITTEN_AT_LEAST_ONCE (dr) = -1;
-	  DR_RW_UNCONDITIONALLY (dr) = -1;
-	}
-    }
-
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
@@ -1040,7 +902,7 @@ if_convertible_loop_p_1 (struct loop *loop,
       /* Check the if-convertibility of statements in predicated BBs.  */
       if (is_predicated (bb))
 	for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
-	  if (!if_convertible_stmt_p (gsi_stmt (itr), *refs))
+	  if (!if_convertible_stmt_p (gsi_stmt (itr)))
 	    return false;
     }
 
@@ -1101,15 +963,6 @@ if_convertible_loop_p (struct loop *loop)
   ddrs = VEC_alloc (ddr_p, heap, 25);
   res = if_convertible_loop_p_1 (loop, &refs, &ddrs);
 
-  if (flag_tree_loop_if_convert_stores)
-    {
-      data_reference_p dr;
-      unsigned int i;
-
-      for (i = 0; VEC_iterate (data_reference_p, refs, i, dr); i++)
-	free (dr->aux);
-    }
-
   free_data_refs (refs);
   free_dependence_relations (ddrs);
   return res;
@@ -1370,6 +1223,81 @@ insert_gimplified_predicates (loop_p loop)
     }
 }
 
+/* Inserts at position GSI a statement "ADDRESS_OF_AI = &AI;" and
+   returns the ADDRESS_OF_AI.  */
+
+static tree
+insert_address_of (tree ai, gimple_stmt_iterator *gsi)
+{
+  tree addr_expr = build_fold_addr_expr (ai);
+  tree address_of_ai = create_tmp_reg (TREE_TYPE (addr_expr), "_ifc_");
+  gimple addr_stmt = gimple_build_assign (address_of_ai, addr_expr);
+
+  add_referenced_var (address_of_ai);
+  address_of_ai = make_ssa_name (address_of_ai, addr_stmt);
+  gimple_assign_set_lhs (addr_stmt, address_of_ai);
+  SSA_NAME_DEF_STMT (address_of_ai) = addr_stmt;
+  update_stmt (addr_stmt);
+  gsi_insert_before (gsi, addr_stmt, GSI_SAME_STMT);
+
+  return address_of_ai;
+}
+
+/* Insert at the beginning of the first basic block of the current
+   function the allocation on the stack of N_BYTES of memory and
+   return a pointer to this scratchpad memory.  */
+
+static tree
+create_scratchpad (int n_bytes)
+{
+  basic_block bb = single_succ (ENTRY_BLOCK_PTR);
+  gimple_stmt_iterator gsi = gsi_after_labels (bb);
+  tree x = build_int_cst (integer_type_node, n_bytes - 1);
+  tree elt_type = char_type_node;
+  tree array_type = build_array_type (elt_type, build_index_type (x));
+  tree base = create_tmp_reg (array_type, "scratch_pad");
+  tree a0 = build4 (ARRAY_REF, elt_type, base, integer_zero_node, NULL_TREE,
+		    NULL_TREE);
+
+  add_referenced_var (base);
+  return insert_address_of (a0, &gsi);
+}
+
+/* Returns a memory reference to the pointer defined by the
+   conditional expression: pointer = cond ? &A[i] : scratch_pad; and
+   inserts this code at GSI.  */
+
+static tree
+create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
+			   gimple_stmt_iterator *gsi)
+{
+  tree cond_expr;
+  tree pointer;
+  gimple pointer_stmt;
+
+  /* address_of_ai = &A[i];  */
+  tree address_of_ai = insert_address_of (ai, gsi);
+
+  /* Allocate the scratch pad only once per function.  */
+  if (!*scratch_pad)
+    *scratch_pad = create_scratchpad (64);
+
+  /* pointer = cond ? address_of_ai : scratch_pad;  */
+  pointer = create_tmp_reg (TREE_TYPE (address_of_ai), "_ifc_");
+  add_referenced_var (pointer);
+  cond_expr = build3 (COND_EXPR, TREE_TYPE (address_of_ai),
+		      unshare_expr (cond), address_of_ai, *scratch_pad);
+  pointer_stmt = gimple_build_assign (pointer, cond_expr);
+  pointer = make_ssa_name (pointer, pointer_stmt);
+  gimple_assign_set_lhs (pointer_stmt, pointer);
+  SSA_NAME_DEF_STMT (pointer) = pointer_stmt;
+  update_stmt (pointer_stmt);
+  gsi_insert_before (gsi, pointer_stmt, GSI_SAME_STMT);
+
+  return build2 (MEM_REF, TREE_TYPE (ai), pointer,
+		 build_int_cst (reference_alias_ptr_type (ai), 0));
+}
+
 /* Predicate each write to memory in LOOP.
 
    This function transforms control flow constructs containing memory
@@ -1381,10 +1309,19 @@ insert_gimplified_predicates (loop_p loop)
 
    into the following form that does not contain control flow:
 
-   | for (i = 0; i < N; i++)
-   |   A[i] = cond ? expr : A[i];
+   | void *scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
+   |
+   | for (i = 0; i < N; i++) {
+   |   p = cond ? &A[i] : scratch_pad;
+   |   *p = expr;
+   | }
+
+   SCRATCH_PAD is allocated on the stack for each function once and it is
+   large enough to contain any kind of scalar assignment or read.  All
+   values read or written to SCRATCH_PAD are not used in the computation.
 
-   The original CFG looks like this:
+   In a more detailed way, the if-conversion of memory writes works
+   like this, supposing that the original CFG looks like this:
 
    | bb_0
    |   i = 0
@@ -1434,10 +1371,12 @@ insert_gimplified_predicates (loop_p loop)
    |   goto bb_1
    | end_bb_4
 
-   predicate_mem_writes is then predicating the memory write as follows:
+   predicate_mem_writes is then allocating SCRATCH_PAD in the basic block
+   preceding the loop header, and is predicating the memory write:
 
    | bb_0
    |   i = 0
+   |   scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
    | end_bb_0
    |
    | bb_1
@@ -1445,12 +1384,14 @@ insert_gimplified_predicates (loop_p loop)
    | end_bb_1
    |
    | bb_2
+   |   cond = some_computation;
    |   if (cond) goto bb_3 else goto bb_4
    | end_bb_2
    |
    | bb_3
    |   cond = some_computation;
-   |   A[i] = cond ? expr : A[i];
+   |   p = cond ? &A[i] : scratch_pad;
+   |   *p = expr;
    |   goto bb_4
    | end_bb_3
    |
@@ -1463,12 +1404,14 @@ insert_gimplified_predicates (loop_p loop)
 
    | bb_0
    |   i = 0
+   |   scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
    |   if (i < N) goto bb_5 else goto bb_1
    | end_bb_0
    |
    | bb_1
    |   cond = some_computation;
-   |   A[i] = cond ? expr : A[i];
+   |   p = cond ? &A[i] : scratch_pad;
+   |   *p = expr;
    |   if (i < N) goto bb_5 else goto bb_4
    | end_bb_1
    |
@@ -1478,7 +1421,7 @@ insert_gimplified_predicates (loop_p loop)
 */
 
 static void
-predicate_mem_writes (loop_p loop)
+predicate_mem_writes (loop_p loop, tree *scratch_pad)
 {
   unsigned int i, orig_loop_num_nodes = loop->num_nodes;
 
@@ -1493,20 +1436,35 @@ predicate_mem_writes (loop_p loop)
 	continue;
 
       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-	if ((stmt = gsi_stmt (gsi))
-	    && gimple_assign_single_p (stmt)
-	    && gimple_vdef (stmt))
-	  {
-	    tree lhs = gimple_assign_lhs (stmt);
-	    tree rhs = gimple_assign_rhs1 (stmt);
-	    tree type = TREE_TYPE (lhs);
-
-	    lhs = ifc_temp_var (type, unshare_expr (lhs), &gsi);
-	    rhs = ifc_temp_var (type, unshare_expr (rhs), &gsi);
-	    rhs = build3 (COND_EXPR, type, unshare_expr (cond), rhs, lhs);
-	    gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
-	    update_stmt (stmt);
-	  }
+	{
+	  stmt = gsi_stmt (gsi);
+	  if (gimple_assign_single_p (stmt)
+	      && gimple_vdef (stmt))
+	    {
+	      /* A[i] = x;  */
+	      tree ai = gimple_assign_lhs (stmt);
+
+	      /* pointer = cond ? &A[i] : scratch_pad;  */
+	      tree star_pointer = create_indirect_cond_expr (ai, cond,
+							     scratch_pad, &gsi);
+	      /* *pointer = x;  */
+	      gimple_assign_set_lhs (stmt, star_pointer);
+	      update_stmt (stmt);
+	    }
+	  else if (gimple_assign_single_p (stmt)
+		   && gimple_vuse (stmt))
+	    {
+	      /* x = A[i];  */
+	      tree ai = gimple_assign_rhs1 (stmt);
+
+	      /* pointer = cond ? &A[i] : scratch_pad;  */
+	      tree star_pointer = create_indirect_cond_expr (ai, cond,
+							     scratch_pad, &gsi);
+	      /* x = *pointer;  */
+	      gimple_assign_set_rhs1 (stmt, star_pointer);
+	      update_stmt (stmt);
+	    }
+	}
     }
 }
 
@@ -1556,7 +1514,7 @@ remove_conditions_and_labels (loop_p loop)
    blocks.  Replace PHI nodes with conditional modify expressions.  */
 
 static void
-combine_blocks (struct loop *loop)
+combine_blocks (struct loop *loop, tree *scratch_pad)
 {
   basic_block bb, exit_bb, merge_target_bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
@@ -1569,7 +1527,7 @@ combine_blocks (struct loop *loop)
   predicate_all_scalar_phis (loop);
 
   if (flag_tree_loop_if_convert_stores)
-    predicate_mem_writes (loop);
+    predicate_mem_writes (loop, scratch_pad);
 
   /* Merge basic blocks: first remove all the edges in the loop,
      except for those from the exit block.  */
@@ -1658,7 +1616,7 @@ combine_blocks (struct loop *loop)
    profitability analysis.  Returns true when something changed.  */
 
 static bool
-tree_if_conversion (struct loop *loop)
+tree_if_conversion (struct loop *loop, tree *scratch_pad)
 {
   bool changed = false;
   ifc_bbs = NULL;
@@ -1670,7 +1628,7 @@ tree_if_conversion (struct loop *loop)
   /* Now all statements are if-convertible.  Combine all the basic
      blocks into one huge basic block doing the if-conversion
      on-the-fly.  */
-  combine_blocks (loop);
+  combine_blocks (loop, scratch_pad);
 
   if (flag_tree_loop_if_convert_stores)
     mark_sym_for_renaming (gimple_vop (cfun));
@@ -1701,12 +1659,13 @@ main_tree_if_conversion (void)
   struct loop *loop;
   bool changed = false;
   unsigned todo = 0;
+  tree scratch_pad = NULL_TREE;
 
   if (number_of_loops () <= 1)
     return 0;
 
   FOR_EACH_LOOP (li, loop, 0)
-    changed |= tree_if_conversion (loop);
+    changed |= tree_if_conversion (loop, &scratch_pad);
 
   if (changed)
     todo |= TODO_cleanup_cfg;
-- 
1.7.0.4
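
For readers skimming the diff, the store predication described in the
predicate_mem_writes comment can be sketched in plain C.  This is a
minimal illustration, not code from the patch: the function names, the
element type, and the union used to give the scratch pad a suitable
alignment are assumptions made for the example; only the 64-byte size
mirrors the create_scratchpad (64) call.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: the store to a[i] is guarded by a branch.  */
void
store_with_branch (short *a, const short *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (src[i] != 0)
      a[i] = (short) (src[i] * 2);
}

/* If-converted form: the store always executes, but the condition
   selects its address.  When the condition is false the value lands
   in the scratch pad and is never read back.  */
void
store_with_scratch_pad (short *a, const short *src, size_t n)
{
  /* Illustrative stand-in for the per-function scratch pad.  The
     union is an assumption of this sketch so that the short store is
     correctly aligned and typed.  */
  union { char bytes[64]; short as_short; } scratch_pad;

  /* The pad must be able to hold any scalar that gets redirected.  */
  assert (sizeof (short) <= sizeof scratch_pad.bytes);

  for (size_t i = 0; i < n; i++)
    {
      int cond = src[i] != 0;
      short *p = cond ? &a[i] : &scratch_pad.as_short;
      *p = (short) (src[i] * 2);
    }
}
```

Both functions should leave the array in the same state; the second
one has no branch around the store, which is the property the
vectorizer needs.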



* Re: [PATCH 3/3] Loop flattening on loop-SSA.
  2010-11-08 16:14         ` Richard Guenther
@ 2010-11-15 23:05           ` Sebastian Pop
  2010-11-15 23:17             ` Richard Guenther
  2010-11-15 23:08           ` Sebastian Pop
  1 sibling, 1 reply; 41+ messages in thread
From: Sebastian Pop @ 2010-11-15 23:05 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

On Mon, Nov 8, 2010 at 10:09, Richard Guenther <rguenther@suse.de> wrote:
>> > This function should be in cfgloop.c and implemented in simpler
>> > form, like
>> >
>> > void
>> > cancel_subloops (struct loop *loop)
>> > {
>> >  while (loop->inner)
>> >    cancel_loop_tree (loop->inner);
>> > }
>> >
>>
>> Ok I will move this function to cfgloop.c.  However, I don't think
>> we can simplify it further without extra storage: if I write the
>> simplified form like this:
>>
>> void
>> cancel_subloops (struct loop *loop)
>> {
>>   loop_p x = loop->inner;
>>
>>   while (x)
>>     {
>>       cancel_loop_tree (x);
>>       x = x->next;
>>     }
>> }
>>
>> this won't work, as the loop x gets first canceled and then we try to
>> access x->next and this will produce a segfault.
>
> Which is why my suggested variant would work, no?
>

Your simplified variant does not handle sibling loops: for example, it
wouldn't cancel loop_3 during a cancel_subloops (loop_1) in

loop_1
  loop_2
  end_2

  loop_3
  end_3
end_1

you have to iterate on loop->next to cancel loop_3, and your variant
doesn't do that: cancel_loop_tree "Cancels LOOP and all its subloops."
and that does not include sibling loops.

Sebastian


* Re: [PATCH 3/3] Loop flattening on loop-SSA.
  2010-11-08 16:14         ` Richard Guenther
  2010-11-15 23:05           ` Sebastian Pop
@ 2010-11-15 23:08           ` Sebastian Pop
  2010-11-15 23:10             ` Sebastian Pop
  1 sibling, 1 reply; 41+ messages in thread
From: Sebastian Pop @ 2010-11-15 23:08 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

On Mon, Nov 8, 2010 at 10:09, Richard Guenther <rguenther@suse.de> wrote:
> loops with a flattened body you mean?  What I question is the usefulness
> of flattening a complete nest.  What I also question is whether
> flattening works for outer loops (well - the default # of blocks to
> flatten seems to be so low that we never do that and I guess you'll
> simply ICE flattening an outer loop if you increase that limit).
>

The patch passed bootstrap with the limit set to flatten all loops
with bodies of 1000 basic blocks or less.  I haven't tried more than a
thousand, but I don't see why this transformation wouldn't work for
arbitrarily complex loop nests.

> Thus, I think you should restrict yourself to LI_ONLY_INNERMOST.

That's debatable.

Sebastian


* Re: [PATCH 3/3] Loop flattening on loop-SSA.
  2010-11-15 23:08           ` Sebastian Pop
@ 2010-11-15 23:10             ` Sebastian Pop
  2010-11-15 23:30               ` Richard Guenther
  0 siblings, 1 reply; 41+ messages in thread
From: Sebastian Pop @ 2010-11-15 23:10 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

On Mon, Nov 15, 2010 at 16:49, Sebastian Pop <sebpop@gmail.com> wrote:
>> Thus, I think you should restrict yourself to LI_ONLY_INNERMOST.
>
> That's debatable.

I just happened to look at the definition of

  LI_ONLY_INNERMOST = 4		/* Iterate only over innermost loops.  */

It makes no sense to call loop flattening on the innermost loop:
innermost loops contain zero inner loops, so there is nothing to flatten...

Sebastian


* Re: [PATCH 3/3] Loop flattening on loop-SSA.
  2010-11-15 23:05           ` Sebastian Pop
@ 2010-11-15 23:17             ` Richard Guenther
  2010-11-15 23:35               ` Sebastian Pop
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Guenther @ 2010-11-15 23:17 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Guenther, gcc-patches

On Mon, Nov 15, 2010 at 11:42 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> On Mon, Nov 8, 2010 at 10:09, Richard Guenther <rguenther@suse.de> wrote:
>>> > This function should be in cfgloop.c and implemented in simpler
>>> > form, like
>>> >
>>> > void
>>> > cancel_subloops (struct loop *loop)
>>> > {
>>> >  while (loop->inner)
>>> >    cancel_loop_tree (loop->inner);
>>> > }
>>> >
>>>
>>> Ok I will move this function to cfgloop.c.  However, I don't think
>>> we can simplify it further without extra storage: if I write the
>>> simplified form like this:
>>>
>>> void
>>> cancel_subloops (struct loop *loop)
>>> {
>>>   loop_p x = loop->inner;
>>>
>>>   while (x)
>>>     {
>>>       cancel_loop_tree (x);
>>>       x = x->next;
>>>     }
>>> }
>>>
>>> this won't work, as the loop x gets first canceled and then we try to
>>> access x->next and this will produce a segfault.
>>
>> Which is why my suggested variant would work, no?
>>
>
> Your simplified variant does not handle sibling loops: for example, it
> wouldn't cancel loop_3 during a cancel_subloops (loop_1) in
>
> loop_1
>  loop_2
>  end_2
>
>  loop_3
>  end_3
> end_1
>
> you have to iterate on loop->next to cancel loop_3, and your variant
> doesn't do that: cancel_loop_tree "Cancels LOOP and all its subloops."
> and that does not include sibling loops.

Sure it would - cancel_loop_tree (loop->inner) makes loop->inner->next
the new loop->inner.  This is why cancel_loop_tree works in the first place.

Richard.

> Sebastian
>


* Re: [PATCH 3/3] Loop flattening on loop-SSA.
  2010-11-15 23:10             ` Sebastian Pop
@ 2010-11-15 23:30               ` Richard Guenther
  2010-11-15 23:53                 ` Sebastian Pop
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Guenther @ 2010-11-15 23:30 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Guenther, gcc-patches

On Mon, Nov 15, 2010 at 11:57 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> On Mon, Nov 15, 2010 at 16:49, Sebastian Pop <sebpop@gmail.com> wrote:
>>> Thus, I think you should restrict yourself to LI_ONLY_INNERMOST.
>>
>> That's debatable.
>
> I just happened to look at the definition of
>
>  LI_ONLY_INNERMOST = 4         /* Iterate only over innermost loops.  */
>
> It makes no sense to call loop flattening on the innermost loop:
> innermost loops contain zero inner loops, so there is nothing to flatten...

Oh, so you're not removing the loop you are applying to (flattening it to its
superloop) but flattening all subloops in loop.

Well.  Then there's indeed no suitable order for you (there's no
LI_FROM_OUTERMOST which would then make sense).

Which means that IMHO it makes sense to only apply flattening
to parents of innermost loops (which is what I tried to suggest).
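
One possible reading of "parents of innermost loops", sketched on a
simplified loop tree.  The struct below is a stand-in that keeps only
the inner/next links, and both predicate names are hypothetical, not
GCC functions.  Whether a loop with a mix of innermost and deeper
subloops should qualify is left open here; this sketch assumes it
should not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a loop tree node.  */
struct loop
{
  struct loop *inner;	/* First subloop, or NULL.  */
  struct loop *next;	/* Next sibling, or NULL.  */
};

/* Returns true when LOOP contains no subloops.  */
bool
innermost_p (const struct loop *loop)
{
  assert (loop != NULL);
  return loop->inner == NULL;
}

/* Returns true when LOOP has at least one subloop and every subloop
   is itself innermost, i.e. LOOP is a candidate for flattening under
   the assumption stated above.  */
bool
parent_of_innermost_p (const struct loop *loop)
{
  if (innermost_p (loop))
    return false;

  for (const struct loop *child = loop->inner; child; child = child->next)
    if (!innermost_p (child))
      return false;

  return true;
}
```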

Any benchmark results yet?

Thanks,
Richard.

> Sebastian
>


* Re: [PATCH 3/3] Loop flattening on loop-SSA.
  2010-11-15 23:17             ` Richard Guenther
@ 2010-11-15 23:35               ` Sebastian Pop
  2010-11-16  0:32                 ` Richard Guenther
  0 siblings, 1 reply; 41+ messages in thread
From: Sebastian Pop @ 2010-11-15 23:35 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Richard Guenther, gcc-patches

On Mon, Nov 15, 2010 at 17:01, Richard Guenther
<richard.guenther@gmail.com> wrote:
> Sure it would - cancel_loop_tree (loop->inner) makes loop->inner->next
> the new loop->inner.  This is why cancel_loop_tree works in the first place.

I still cannot see how this code would cancel loop->next...
I see no occurrences of ->next in these functions:

/* Cancels the LOOP; it must be innermost one.  */

static void
cancel_loop (struct loop *loop)
{
  basic_block *bbs;
  unsigned i;
  struct loop *outer = loop_outer (loop);

  gcc_assert (!loop->inner);

  /* Move blocks up one level (they should be removed as soon as possible).  */
  bbs = get_loop_body (loop);
  for (i = 0; i < loop->num_nodes; i++)
    bbs[i]->loop_father = outer;

  delete_loop (loop);
}

/* Cancels LOOP and all its subloops.  */
void
cancel_loop_tree (struct loop *loop)
{
  while (loop->inner)
    cancel_loop_tree (loop->inner);
  cancel_loop (loop);
}


* Re: [PATCH 3/3] Loop flattening on loop-SSA.
  2010-11-15 23:30               ` Richard Guenther
@ 2010-11-15 23:53                 ` Sebastian Pop
  0 siblings, 0 replies; 41+ messages in thread
From: Sebastian Pop @ 2010-11-15 23:53 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Richard Guenther, gcc-patches

On Mon, Nov 15, 2010 at 17:04, Richard Guenther
<richard.guenther@gmail.com> wrote:
> Oh, so you're not removing the loop you are applying to (flattening it to its
> superloop) but flattening all subloops in loop.
>

Right.

> Well.  Then there's indeed no suitable order for you (there's no
> LI_FROM_OUTERMOST which would then make sense).
>
> Which means that IMHO it makes sense to only apply flattening
> to parents of innermost loops (which is what I tried to suggest).
>
> Any benchmark results yet?

Not yet.  I haven't run SPEC with loop flattening enabled.

Sebastian


* Re: [PATCH 3/3] Loop flattening on loop-SSA.
  2010-11-15 23:35               ` Sebastian Pop
@ 2010-11-16  0:32                 ` Richard Guenther
  0 siblings, 0 replies; 41+ messages in thread
From: Richard Guenther @ 2010-11-16  0:32 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Guenther, gcc-patches

On Tue, Nov 16, 2010 at 12:07 AM, Sebastian Pop <sebpop@gmail.com> wrote:
> On Mon, Nov 15, 2010 at 17:01, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> Sure it would - cancel_loop_tree (loop->inner) makes loop->inner->next
>> the new loop->inner.  This is why cancel_loop_tree works in the first place.
>
> I still cannot see how this code would cancel loop->next...
> I see no occurrences of ->next in these functions:
>
> /* Cancels the LOOP; it must be innermost one.  */
>
> static void
> cancel_loop (struct loop *loop)
> {
>  basic_block *bbs;
>  unsigned i;
>  struct loop *outer = loop_outer (loop);
>
>  gcc_assert (!loop->inner);
>
>  /* Move blocks up one level (they should be removed as soon as possible).  */
>  bbs = get_loop_body (loop);
>  for (i = 0; i < loop->num_nodes; i++)
>    bbs[i]->loop_father = outer;
>
>  delete_loop (loop);
> }
>
> /* Cancels LOOP and all its subloops.  */
> void
> cancel_loop_tree (struct loop *loop)
> {
>  while (loop->inner)
>    cancel_loop_tree (loop->inner);
>  cancel_loop (loop);
> }

delete_loop calls flow_loop_tree_node_remove, which updates the loop
father's sibling chain.
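
A minimal sketch of that behaviour on a simplified loop tree, assuming
the only relevant effect of delete_loop here is unlinking the loop from
its father's chain.  The struct, new_loop, unlink_loop and
count_subloops are illustrative stand-ins, not GCC code;
cancel_loop_tree and cancel_subloops follow the forms quoted in this
thread.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for GCC's struct loop: tree links only.  */
struct loop
{
  struct loop *outer;	/* Father loop, or NULL for the root.  */
  struct loop *inner;	/* First subloop.  */
  struct loop *next;	/* Next sibling.  */
};

/* Hypothetical helper: allocate a loop as the first child of OUTER.  */
struct loop *
new_loop (struct loop *outer)
{
  struct loop *loop = calloc (1, sizeof *loop);
  assert (loop != NULL);
  loop->outer = outer;
  if (outer)
    {
      loop->next = outer->inner;
      outer->inner = loop;
    }
  return loop;
}

/* Models the sibling-chain update: remove LOOP from its father's
   list of children, so the next sibling takes its place.  */
static void
unlink_loop (struct loop *loop)
{
  struct loop **p = &loop->outer->inner;
  while (*p != loop)
    p = &(*p)->next;
  *p = loop->next;
}

/* Cancels LOOP; it must be an innermost one.  */
void
cancel_loop (struct loop *loop)
{
  assert (!loop->inner);
  if (loop->outer)
    unlink_loop (loop);
  free (loop);
}

/* Cancels LOOP and all its subloops.  */
void
cancel_loop_tree (struct loop *loop)
{
  while (loop->inner)
    cancel_loop_tree (loop->inner);
  cancel_loop (loop);
}

/* Cancels all subloops of LOOP, siblings included: each call replaces
   loop->inner with the next sibling until none is left.  */
void
cancel_subloops (struct loop *loop)
{
  while (loop->inner)
    cancel_loop_tree (loop->inner);
}

/* Counts the loops nested anywhere below LOOP.  */
int
count_subloops (const struct loop *loop)
{
  int n = 0;
  for (const struct loop *child = loop->inner; child; child = child->next)
    n += 1 + count_subloops (child);
  return n;
}
```

With loop_2 and loop_3 as siblings under loop_1, cancel_subloops
(loop_1) leaves loop_1->inner NULL without ever reading ->next from a
cancelled loop.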

Richard.


* Re: [PATCH 1/3] Fix PR46029: reimplement if-convert stores.
  2010-11-15 22:39       ` Sebastian Pop
@ 2010-11-16 14:45         ` Richard Guenther
  2010-11-16 15:01           ` Richard Guenther
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Guenther @ 2010-11-16 14:45 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches

On Mon, 15 Nov 2010, Sebastian Pop wrote:

> Hi Richi,
> 
> fixes to your review are posted separately, see below for the details.
> See 0001-Fix-PR46029-reimplement-if-convert-stores.patch for the
> combined patch.

+  tree base = create_tmp_var (array_type, "scratch_pad");
+  tree a0 = build4 (ARRAY_REF, elt_type, base, integer_zero_node, NULL_TREE,
+                   NULL_TREE);

you can drop creating the ARRAY_REF and do

+  return insert_address_of (base, build_pointer_type (elt_type), &gsi);


The patches are ok with the above change.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] Fix PR46029: reimplement if-convert stores.
  2010-11-16 14:45         ` Richard Guenther
@ 2010-11-16 15:01           ` Richard Guenther
  2011-01-03 21:39             ` Sebastian Pop
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Guenther @ 2010-11-16 15:01 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches

On Tue, 16 Nov 2010, Richard Guenther wrote:

> On Mon, 15 Nov 2010, Sebastian Pop wrote:
> 
> > Hi Richi,
> > 
> > fixes to your review are posted separately, see below for the details.
> > See 0001-Fix-PR46029-reimplement-if-convert-stores.patch for the
> > combined patch.
> 
> +  tree base = create_tmp_var (array_type, "scratch_pad");
> +  tree a0 = build4 (ARRAY_REF, elt_type, base, integer_zero_node, NULL_TREE,
> +                   NULL_TREE);
> 
> you can drop creating the ARRAY_REF and do
> 
> +  return insert_address_of (base, build_pointer_type (elt_type), &gsi);
> 
> 
> The patches are ok with the above change.

Btw, in insert_address_of you might want to add

  struct ptr_info_def *pi = get_ptr_info (address_of_ai);
  pt_solution_set_var (&pi->pt, SSA_NAME_VAR (address_of_ai));

Richard.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/3] Loop flattening on loop-SSA.
  2010-11-05 13:05     ` Richard Guenther
  2010-11-05 16:57       ` Sebastian Pop
@ 2010-11-16 22:47       ` Sebastian Pop
  2010-11-16 23:56         ` Sebastian Pop
  1 sibling, 1 reply; 41+ messages in thread
From: Sebastian Pop @ 2010-11-16 22:47 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 42502 bytes --]

Hi Richi,

The combined patch is 0001-Loop-flattening-on-loop-SSA.patch;
the other patches are the separate fixes requested in this review.

I am currently testing this on amd64-linux.

Sebastian

On Fri, Nov 5, 2010 at 07:51, Richard Guenther <rguenther@suse.de> wrote:
> On Wed, 3 Nov 2010, Sebastian Pop wrote:
>
>> 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
>>
>>       * Makefile.in (OBJS-common): Add tree-loop-flattening.o.
>>       (tree-loop-flattening.o): New.
>>       * common.opt (ftree-loop-flatten): New.
>>       * dbgcnt.def (lflat): New.
>>       * params.def (PARAM_LFLAT_MAX_NB_BBS): New.
>>       * passes.c (init_optimization_passes): Add new passes
>>       pass_flatten_loops and pass_if_conversion after loop vectorization
>>       and before pass_slp_vectorize.
>>       * timevar.def (TV_TREE_LOOP_FLATTENING): New.
>>       * tree-loop-flattening.c: New.
>>       * tree-pass.h (pass_flatten_loops): Declared.
>>       * tree-flow.h (gate_tree_if_conversion): Declared.
>>       (tree_if_conversion): Declared.
>>       * tree-if-conv.c (tree_if_conversion): Not static anymore.
>>       (gate_tree_if_conversion): Same.
>
> Comments inline.
>
> What extra testing apart from the 4 testcases did this new pass get?
> Do we pass bootstrap with it enabled?  Did you check if we regress
> in SPEC 2k6 when it is enabled?
>
>>       * gcc.dg/tree-ssa/flat-loop-1.c: New.
>>       * gcc.dg/tree-ssa/flat-loop-2.c: New.
>>       * gcc.dg/tree-ssa/flat-loop-3.c: New.
>>       * gcc.dg/tree-ssa/flat-loop-4.c: New.
>> ---
>>  gcc/ChangeLog                               |   18 +
>>  gcc/Makefile.in                             |    4 +
>>  gcc/common.opt                              |    4 +
>>  gcc/dbgcnt.def                              |    1 +
>>  gcc/params.def                              |    7 +
>>  gcc/passes.c                                |    1 +
>>  gcc/testsuite/ChangeLog                     |    7 +
>>  gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c |   28 ++
>>  gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c |   39 ++
>>  gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c |   19 +
>>  gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c |   23 +
>>  gcc/timevar.def                             |    1 +
>>  gcc/tree-flow.h                             |    4 +
>>  gcc/tree-if-conv.c                          |    4 +-
>>  gcc/tree-loop-flattening.c                  |  630 +++++++++++++++++++++++++++
>>  gcc/tree-pass.h                             |    1 +
>>  16 files changed, 789 insertions(+), 2 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c
>>  create mode 100644 gcc/tree-loop-flattening.c
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index 3ceb7b6..f312b27 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,5 +1,23 @@
>>  2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
>>
>> +     * Makefile.in (OBJS-common): Add tree-loop-flattening.o.
>> +     (tree-loop-flattening.o): New.
>> +     * common.opt (ftree-loop-flatten): New.
>> +     * dbgcnt.def (lflat): New.
>> +     * params.def (PARAM_LFLAT_MAX_NB_BBS): New.
>> +     * passes.c (init_optimization_passes): Add new passes
>> +     pass_flatten_loops and pass_if_conversion after loop vectorization
>> +     and before pass_slp_vectorize.
>> +     * timevar.def (TV_TREE_LOOP_FLATTENING): New.
>> +     * tree-loop-flattening.c: New.
>> +     * tree-pass.h (pass_flatten_loops): Declared.
>> +     * tree-flow.h (gate_tree_if_conversion): Declared.
>> +     (tree_if_conversion): Declared.
>> +     * tree-if-conv.c (tree_if_conversion): Not static anymore.
>> +     (gate_tree_if_conversion): Same.
>> +
>> +2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
>> +
>>       * tree-if-conv.c (if_convertible_loop_p_1): Do not call
>>       compute_data_dependences_for_loop.
>>       (if_convertible_loop_p): Do not free refs and ddrs.
>> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
>> index 898e962..55b67f4 100644
>> --- a/gcc/Makefile.in
>> +++ b/gcc/Makefile.in
>> @@ -1368,6 +1368,7 @@ OBJS-common = \
>>       tree-into-ssa.o \
>>       tree-iterator.o \
>>       tree-loop-distribution.o \
>> +     tree-loop-flattening.o \
>>       tree-loop-linear.o \
>>       tree-nested.o \
>>       tree-nrv.o \
>> @@ -2773,6 +2774,9 @@ tree-loop-distribution.o: tree-loop-distribution.c $(CONFIG_H) $(SYSTEM_H) coret
>>     $(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) \
>>     $(TREE_PASS_H) $(TREE_DATA_REF_H) $(EXPR_H) \
>>     langhooks.h $(TREE_VECTORIZER_H)
>> +tree-loop-flattening.o: tree-loop-flattening.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
>> +   $(TM_H) $(OPTABS_H) $(TREE_H) $(BASIC_BLOCK_H) $(DIAGNOSTIC_H) $(TREE_FLOW_H) \
>> +   $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) $(TREE_PASS_H) $(DBGCNT_H)
>>  tree-parloops.o: tree-parloops.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
>>     $(TREE_FLOW_H) $(TREE_H) $(CFGLOOP_H) $(TREE_DATA_REF_H) \
>>     $(DIAGNOSTIC_H) $(TREE_PASS_H) langhooks.h gt-tree-parloops.h \
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index 8fe796f..c969979 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -1632,6 +1632,10 @@ ftree-loop-distribute-patterns
>>  Common Report Var(flag_tree_loop_distribute_patterns) Optimization
>>  Enable loop distribution for patterns transformed into a library call
>>
>> +ftree-loop-flatten
>> +Common Report Var(flag_tree_loop_flattening) Optimization
>> +Enable loop flattening on trees
>> +
>>  ftree-loop-im
>>  Common Report Var(flag_tree_loop_im) Init(1) Optimization
>>  Enable loop invariant motion on trees
>> diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
>> index 0492d66..0ef9a72 100644
>> --- a/gcc/dbgcnt.def
>> +++ b/gcc/dbgcnt.def
>> @@ -166,6 +166,7 @@ DEBUG_COUNTER (if_conversion_tree)
>>  DEBUG_COUNTER (if_after_combine)
>>  DEBUG_COUNTER (if_after_reload)
>>  DEBUG_COUNTER (local_alloc_for_sched)
>> +DEBUG_COUNTER (lflat)
>>  DEBUG_COUNTER (postreload_cse)
>>  DEBUG_COUNTER (pre)
>>  DEBUG_COUNTER (pre_insn)
>> diff --git a/gcc/params.def b/gcc/params.def
>> index 49a6185..3fffc35 100644
>> --- a/gcc/params.def
>> +++ b/gcc/params.def
>> @@ -788,6 +788,13 @@ DEFPARAM (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION,
>>         "maximum number of basic blocks per function to be analyzed by Graphite",
>>         100, 0, 0)
>>
>> +/* Maximal number of basic blocks in a loop to be flattened.  */
>> +
>> +DEFPARAM (PARAM_LFLAT_MAX_NB_BBS,
>> +       "lflat-max-nb-bbs",
>> +       "maximum number of basic blocks in a loop to be flattened",
>> +       100, 0, 0)
>> +
>>  /* Avoid doing loop invariant motion on very large loops.  */
>>
>>  DEFPARAM (PARAM_LOOP_INVARIANT_MAX_BBS_IN_LOOP,
>> diff --git a/gcc/passes.c b/gcc/passes.c
>> index 1308ce9..22110a4 100644
>> --- a/gcc/passes.c
>> +++ b/gcc/passes.c
>> @@ -917,6 +917,7 @@ init_optimization_passes (void)
>>         NEXT_PASS (pass_parallelize_loops);
>>         NEXT_PASS (pass_loop_prefetch);
>>         NEXT_PASS (pass_iv_optimize);
>> +       NEXT_PASS (pass_flatten_loops);
>>         NEXT_PASS (pass_tree_loop_done);
>>       }
>>        NEXT_PASS (pass_cse_reciprocals);
>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>> index 4233f86..2b3b93e 100644
>> --- a/gcc/testsuite/ChangeLog
>> +++ b/gcc/testsuite/ChangeLog
>> @@ -1,5 +1,12 @@
>>  2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
>>
>> +     * gcc.dg/tree-ssa/flat-loop-1.c: New.
>> +     * gcc.dg/tree-ssa/flat-loop-2.c: New.
>> +     * gcc.dg/tree-ssa/flat-loop-3.c: New.
>> +     * gcc.dg/tree-ssa/flat-loop-4.c: New.
>> +
>> +2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
>> +
>>       PR tree-optimization/46029
>>       * g++.dg/tree-ssa/ifc-pr46029.C: New.
>>       * gcc.dg/tree-ssa/ifc-8.c: New.
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c
>> new file mode 100644
>> index 0000000..bee8a2b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c
>> @@ -0,0 +1,28 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -ftree-loop-flatten" } */
>> +
>> +struct stack_segment
>> +{
>> +  struct dynamic_allocation_blocks *dynamic_allocation;
>> +};
>> +struct dynamic_allocation_blocks
>> +{
>> +  struct dynamic_allocation_blocks *next;
>> +};
>> +static struct dynamic_allocation_blocks *
>> +merge_dynamic_blocks (struct dynamic_allocation_blocks *a,
>> +                   struct dynamic_allocation_blocks *b)
>> +{
>> +  struct dynamic_allocation_blocks **pp;
>> +  for (pp = &a->next; *pp != ((void *)0); pp = &(*pp)->next)
>> +    *pp = b;
>> +  return a;
>> +}
>> +__morestack_release_segments (struct stack_segment **pp, int free_dynamic)
>> +{
>> +  struct dynamic_allocation_blocks *ret;
>> +  struct stack_segment *pss;
>> +  pss = *pp;
>> +  while (pss != ((void *)0))
>> +    ret = merge_dynamic_blocks (pss->dynamic_allocation, ret);
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
>> new file mode 100644
>> index 0000000..a7287fb
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
>> @@ -0,0 +1,39 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -ftree-loop-flatten" } */
>> +
>> +struct stack_segment
>> +{
>> +  struct stack_segment *next;
>> +  struct dynamic_allocation_blocks *dynamic_allocation;
>> +};
>> +struct dynamic_allocation_blocks
>> +{
>> +  struct dynamic_allocation_blocks *next;
>> +};
>> +static struct dynamic_allocation_blocks *
>> +merge_dynamic_blocks (struct dynamic_allocation_blocks *a,
>> +        struct dynamic_allocation_blocks *b)
>> +{
>> +  struct dynamic_allocation_blocks **pp;
>> +  if (b == ((void *)0))
>> +  for (pp = &a->next; *pp != ((void *)0); pp = &(*pp)->next)
>> +    ;
>> +  return a;
>> +}
>> +__morestack_release_segments (struct stack_segment **pp, int free_dynamic)
>> +{
>> +  struct dynamic_allocation_blocks *ret;
>> +  struct stack_segment *pss;
>> +  while (pss != ((void *)0))
>> +    {
>> +      struct stack_segment *next;
>> +      next = pss->next;
>> + {
>> +   if (free_dynamic)
>> +     {
>> +       ret = merge_dynamic_blocks (pss->dynamic_allocation, ret);
>> +     }
>> + }
>> +      pss = next;
>> +    }
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
>> new file mode 100644
>> index 0000000..d3d66ab
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -ftree-loop-flatten" } */
>> +
>> +
>> +int
>> +split_directories (const char *name, int *ptr_num_dirs)
>> +{
>> +  int num_dirs = 0;
>> +  char **dirs;
>> +  const char *p, *q;
>> +  int ch;
>> +  while ((ch = *p++) != '\0')
>> +    {
>> +   num_dirs++;
>> +   while (((*p) == '/'))
>> +     p++;
>> +    }
>> +  return (dirs[num_dirs - 1] == ((void *)0));
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c
>> new file mode 100644
>> index 0000000..8e551ac
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c
>> @@ -0,0 +1,23 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -ftree-loop-flatten" } */
>> +
>> +void
>> +formatted_backspace (int common, char *s)
>> +{
>> +  int base;
>> +  int n;
>> +  do
>> +    {
>> +      if (sseek (s, base, 0) < 0)
>> +     goto io_error;
>> +
>> +      while (n > 0)
>> +     {
>> +          n--;
>> +       base += n + 1;
>> +     }
>> +    }
>> +  while (base != 0);
>> + io_error:
>> +  generate_error (common, 0, ((void *)0));
>> +}
>
> The testcases seem to originate from ICEs found during development.  There
> is a lack of functional tests; please consider coming up with some,
> eventually testing for enabled extra optimizations.
>
>
>> diff --git a/gcc/timevar.def b/gcc/timevar.def
>> index 86e2999..89ff8e8 100644
>> --- a/gcc/timevar.def
>> +++ b/gcc/timevar.def
>> @@ -152,6 +152,7 @@ DEFTIMEVAR (TV_GRAPHITE_DATA_DEPS    , "Graphite data dep analysis")
>>  DEFTIMEVAR (TV_GRAPHITE_CODE_GEN     , "Graphite code generation")
>>  DEFTIMEVAR (TV_TREE_LINEAR_TRANSFORM , "tree loop linear")
>>  DEFTIMEVAR (TV_TREE_LOOP_DISTRIBUTION, "tree loop distribution")
>> +DEFTIMEVAR (TV_TREE_LOOP_FLATTENING  , "tree loop flattening")
>>  DEFTIMEVAR (TV_CHECK_DATA_DEPS       , "tree check data dependences")
>>  DEFTIMEVAR (TV_TREE_PREFETCH      , "tree prefetching")
>>  DEFTIMEVAR (TV_TREE_LOOP_IVOPTS           , "tree iv optimization")
>> diff --git a/gcc/tree-flow.h b/gcc/tree-flow.h
>> index c2702dc..e1ee69f 100644
>> --- a/gcc/tree-flow.h
>> +++ b/gcc/tree-flow.h
>> @@ -730,6 +730,10 @@ bool contains_abnormal_ssa_name_p (tree);
>>  bool stmt_dominates_stmt_p (gimple, gimple);
>>  void mark_virtual_ops_for_renaming (gimple);
>>
>> +/* In tree-if-conv.c */
>> +bool gate_tree_if_conversion (void);
>> +bool tree_if_conversion (struct loop *, tree *);
>> +
>
> Why'd you need to export the gate?  I guess if-conversion should
> happen unconditionally for loops that are flattened as I see it is
> really part of the flattening transformation?
>
>>  /* In tree-ssa-dce.c */
>>  void mark_virtual_phi_result_for_renaming (gimple);
>>
>> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
>> index 5b941af..3c30abb 100644
>> --- a/gcc/tree-if-conv.c
>> +++ b/gcc/tree-if-conv.c
>> @@ -1599,7 +1599,7 @@ combine_blocks (struct loop *loop, tree *scratch_pad)
>>  /* If-convert LOOP when it is legal.  For the moment this pass has no
>>     profitability analysis.  Returns true when something changed.  */
>>
>> -static bool
>> +bool
>>  tree_if_conversion (struct loop *loop, tree *scratch_pad)
>>  {
>>    bool changed = false;
>> @@ -1662,7 +1662,7 @@ main_tree_if_conversion (void)
>>
>>  /* Returns true when the if-conversion pass is enabled.  */
>>
>> -static bool
>> +bool
>>  gate_tree_if_conversion (void)
>>  {
>>    return ((flag_tree_vectorize && flag_tree_loop_if_convert != 0)
>> diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
>> new file mode 100644
>> index 0000000..4bc8768
>> --- /dev/null
>> +++ b/gcc/tree-loop-flattening.c
>> @@ -0,0 +1,630 @@
>> +/* Loop flattening.
>> +   Copyright (C) 2010 Free Software Foundation, Inc.
>> +   Contributed by Sebastian Pop <sebastian.pop@amd.com>.
>> +
>> +This file is part of GCC.
>> +
>> +GCC is free software; you can redistribute it and/or modify
>> +it under the terms of the GNU General Public License as published by
>> +the Free Software Foundation; either version 3, or (at your option)
>> +any later version.
>> +
>> +GCC is distributed in the hope that it will be useful,
>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +GNU General Public License for more details.
>> +
>> +You should have received a copy of the GNU General Public License
>> +along with GCC; see the file COPYING3.  If not see
>> +<http://www.gnu.org/licenses/>.  */
>> +
>> +#include "config.h"
>> +#include "system.h"
>> +#include "coretypes.h"
>> +#include "tm.h"
>> +#include "ggc.h"
>> +#include "tree.h"
>> +#include "rtl.h"
>> +#include "output.h"
>> +#include "basic-block.h"
>> +#include "diagnostic.h"
>> +#include "tree-flow.h"
>> +#include "toplev.h"
>> +#include "tree-dump.h"
>> +#include "timevar.h"
>> +#include "cfgloop.h"
>> +#include "tree-pass.h"
>> +#include "gimple.h"
>> +#include "params.h"
>> +#include "dbgcnt.h"
>> +
>> +/* This loop flattening pass transforms backward pointing edges into
>> +   forward pointing edges.
>> +
>> +   The back-edge removal transformation was described in the 1983
>> +   paper by Allen J. R., Ken Kennedy, Carrie Porterfield, and Joe
>> +   Warren: "Conversion of control dependence to data dependence"
>> +   available from http://doi.acm.org/10.1145/567067.567085
>> +
>> +   The back-edge removal algorithm was presented in that paper as part
>> +   of the if-conversion algorithm for backward pointing edges.  In
>> +   this section we will first provide a description of this technique
>> +   adapted for the Gimple-SSA form, followed by an example, and a
>> +   discussion of the differences with the higher level loop flattening
>> +   transformation.
>> +
>> +   The back-edge removal algorithm transforms control dependences into
>> +   data dependences by using a boolean variable.  The values taken by
>> +   the boolean variable control the execution path of the forward
>> +   edges created in order to use the back-edge of an outer loop.
>> +
>> +   The first step of the algorithm detects a surrounding loop and all
>> +   the back-edges of the loop body: these back-edges can be inner
>> +   loops or strongly connected components of the CFG that cannot be
>> +   reduced to natural loops.
>> +
>> +   Each back-edge is removed by redirecting the target of the
>> +   back-edge to the latch basic block of the surrounding loop.  A
>> +   boolean variable is created in the latch.  It is cleared when the
>> +   redirected back-edge is taken and it is set to true for any other
>> +   paths leading to the latch.
>> +
>> +   The header basic block of the surrounding loop is split before its
>> +   statements and a new condition is added based on the control
>> +   variable: when the control variable is set to true, the execution
>> +   proceeds as normal to the basic block that contains the statements
>> +   of the header; when the control variable is cleared, meaning that
>> +   the back-edge has been taken, the execution proceeds to the point
>> +   where the redirected back-edge was pointing.
>> +
>> +   The last step updates the SSA form after all the back-edges have
>> +   been redirected to the latch, and the new edges from the header to
>> +   the destination of back-edges have been created.
>> +
>> +   Another description of loop flattening in a very Fortran specific
>> +   way is in the 1992 paper by Reinhard von Hanxleden and Ken Kennedy:
>> +   "Relaxing SIMD Control Flow Constraints using Loop Transformations"
>> +   available from
>> +   http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.5033 */
>> +
>> +/* Keep the loop structure for LOOP and remove all the loop structures
>> +   under LOOP.  */
>> +
>> +static void
>> +cancel_subloops (loop_p loop)
>> +{
>> +  int i;
>> +  loop_p li;
>> +  VEC (loop_p, heap) *lv = VEC_alloc (loop_p, heap, 3);
>> +
>> +  for (li = loop->inner; li; li = li->next)
>> +    VEC_safe_push (loop_p, heap, lv, li);
>> +
>> +  FOR_EACH_VEC_ELT (loop_p, lv, i, li)
>> +    cancel_loop_tree (li);
>> +
>> +  VEC_free (loop_p, heap, lv);
>> +}
>
> This function should be in cfgloop.c and implemented in simpler
> form, like
>
> void
> cancel_subloops (struct loop *loop)
> {
>  while (loop->inner)
>    cancel_loop_tree (loop->inner);
> }
>
> simply following the cancel_loop_tree example.
>
>> +/* Before creating other phi nodes in LOOP->header for the control
>> +   flags, update the phi nodes of LOOP->header and add the necessary
>> +   phi nodes in the LOOP->latch that now contains several paths on
>> +   which the values are not updated.  PRED_E is the single edge that
>> +   was pointing to the LOOP->latch basic block before inner back-edges
>> +   were redirected to the LOOP->latch.  */
>> +
>> +static void
>> +update_loop_phi_nodes (loop_p loop, edge pred_e)
>> +{
>> +  gimple_stmt_iterator gsi;
>> +
>> +  for (gsi = gsi_start_phis (loop->header); !gsi_end_p (gsi); gsi_next (&gsi))
>> +    {
>> +      edge e;
>> +      edge_iterator ei;
>> +      gimple phi = gsi_stmt (gsi);
>> +      tree back_arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
>> +      tree res = gimple_phi_result (phi);
>> +      tree var = SSA_NAME_VAR (res);
>> +
>> +      phi = create_phi_node (var, loop->latch);
>> +      create_new_def_for (gimple_phi_result (phi), phi,
>> +                       gimple_phi_result_ptr (phi));
>
> Using create_new_def_for looks very suspicious.  create_phi_node
> will already create a new SSA name for you for the result, so
> it doesn't make any sense to fiddle with the SSA updaters machinery, does
> it?
>
>> +      FOR_EACH_EDGE (e, ei, loop->latch->preds)
>> +     add_phi_arg (phi, (e == pred_e ? back_arg : res),
>> +                  e, UNKNOWN_LOCATION);
>> +
>> +      res = gimple_phi_result (phi);
>> +      add_phi_arg (gsi_stmt (gsi), res, loop_latch_edge (loop),
>> +                UNKNOWN_LOCATION);
>> +    }
>> +}
>> +
>> +/* Creates a control flag for the FORWARDED_EDGE that represents the
>> +   back-edge that has been forwarded to the latch basic block of LOOP.
>> +   INNER_BODY is the basic block to which the back-edge was pointing
>> +   before redirection.  This function creates a boolean control flag
>> +   that is cleared when the FORWARDED_EDGE is taken and set for all
>> +   the other paths.  This function adds the corresponding phi nodes in
>> +   LOOP->latch and LOOP->header, and finally adds an edge from
>> +   LOOP->header to the INNER_BODY guarded by the control flag.  */
>> +
>> +static void
>> +create_control_flag (edge forwarded_edge, loop_p loop, basic_block inner_body)
>> +{
>> +  edge e, preheader;
>> +  edge outer_latch_e = loop_latch_edge (loop);
>> +  const char *name = "_flat_";
>> +  tree var = create_tmp_var (boolean_type_node, name);
>
> create_tmp_reg
>
>> +  tree res;
>> +  gimple phi, cond_stmt;
>> +  gimple_stmt_iterator gsi;
>> +  edge_iterator ei;
>> +
>> +  /* Adds a control variable for the redirected FORWARDED_EDGE.  */
>> +  add_referenced_var (var);
>> +  phi = create_phi_node (var, forwarded_edge->dest);
>> +  create_new_def_for (gimple_phi_result (phi), phi,
>> +                   gimple_phi_result_ptr (phi));
>
> Likewise.
>
>> +  FOR_EACH_EDGE (e, ei, outer_latch_e->src->preds)
>> +    add_phi_arg (phi, (e == forwarded_edge
>> +                    ? boolean_false_node
>> +                    : boolean_true_node),
>> +              e, UNKNOWN_LOCATION);
>> +  res = gimple_phi_result (phi);
>> +
>> +  /* Add a phi node in LOOP->header for the control variable.  */
>> +  phi = create_phi_node (var, loop->header);
>> +  create_new_def_for (gimple_phi_result (phi), phi,
>> +                   gimple_phi_result_ptr (phi));
>
> Again.
>
>> +  preheader = loop_preheader_edge (loop);
>> +  FOR_EACH_EDGE (e, ei, loop->header->preds)
>> +    add_phi_arg (phi, (e == preheader
>> +                    ? boolean_true_node
>> +                    : res),
>> +              e, UNKNOWN_LOCATION);
>> +  res = gimple_phi_result (phi);
>> +
>> +  /* Split LOOP->header to insert the control variable condition.  */
>> +  e = split_block_after_labels (loop->header);
>> +  e->flags = EDGE_TRUE_VALUE;
>> +  e = make_edge (loop->header, inner_body, EDGE_FALSE_VALUE);
>> +  cond_stmt = gimple_build_cond (EQ_EXPR, res, boolean_true_node,
>> +                              NULL_TREE, NULL_TREE);
>> +  gsi = gsi_last_bb (loop->header);
>> +  gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
>> +}
>> +
>> +/* Adds phi nodes to the LOOP->header and LOOP->latch for the ssa_name
>> +   NAME.  ARG is the argument of the latch phi node set for the
>> +   FORWARDED_EDGE, and all the other edges merged by the latch phi
>> +   node are set to the result of the LOOP->header phi node.  The latch
>> +   edge of the LOOP->header phi node is set to the result of the
>> +   LOOP->latch phi node, and the other argument is set to an arbitrary
>> +   valid value defined before the loop (note that this initial value
>> +   is never used in the loop).  Returns the LOOP->header phi result.  */
>> +
>> +static tree
>> +add_header_and_latch_phis (loop_p loop, tree name, edge forwarded_edge,
>> +                        tree arg)
>> +{
>> +  edge e;
>> +  edge_iterator ei;
>> +  tree res, zero, var = SSA_NAME_VAR (name);
>> +  gimple loop_phi = create_phi_node (var, loop->header);
>> +  gimple latch_phi = create_phi_node (var, loop->latch);
>> +
>> +  create_new_def_for (gimple_phi_result (loop_phi), loop_phi,
>> +                   gimple_phi_result_ptr (loop_phi));
>> +  create_new_def_for (gimple_phi_result (latch_phi), latch_phi,
>> +                   gimple_phi_result_ptr (latch_phi));
>
> Likewise.
>
>> +  /* The value set to ZERO will never be used in the loop, however we
>> +     have to construct something meaningful for virtual SSA_NAMEs.  */
>> +  if (TREE_CODE (arg) != SSA_NAME)
>> +    zero = arg;
>> +  else if (is_gimple_reg (arg))
>> +    zero = fold_convert (TREE_TYPE (arg), integer_zero_node);
>> +  else
>> +    zero = gimple_default_def (cfun, SSA_NAME_VAR (arg));
>
> That looks bogus.  It will create overlapping life-ranges
> for virtual operands - just make sure you'll rename the VOPs
> and use gimple_vop (cfun) for the fallback.  You should also
> use build_zero_cst instead of fold_convert.
>
> Thus,
>
>  mark_sym_for_renaming (gimple_vop (cfun));
>
>> +  res = gimple_phi_result (latch_phi);
>> +  FOR_EACH_EDGE (e, ei, loop->header->preds)
>> +    add_phi_arg (loop_phi, (e == loop_latch_edge (loop) ? res : zero),
>> +              e, UNKNOWN_LOCATION);
>> +
>> +  res = gimple_phi_result (loop_phi);
>> +  FOR_EACH_EDGE (e, ei, loop->latch->preds)
>> +    add_phi_arg (latch_phi, (e == forwarded_edge ? arg : res),
>> +              e, UNKNOWN_LOCATION);
>> +
>> +  return res;
>> +}
>> +
>> +/* Creates phi nodes for each inductive definition, i.e., loop phi
>> +   nodes.  For each induction phi node in the old loop header, i.e.,
>> +   in the single_succ (INNER_BODY), insert a phi node in the
>> +   LOOP->latch that takes the updated value of the induction on the
>> +   FORWARDED_EDGE, and maintains the same value as in the phi node of
>> +   the LOOP->header for all the other possible paths reaching
>> +   LOOP->latch.  This function has to be called after all the
>> +   back-edges have been redirected.  */
>> +
>> +static void
>> +update_inner_induction_phi_nodes (edge forwarded_edge, loop_p loop,
>> +                               basic_block inner_body)
>> +{
>> +  gimple_stmt_iterator gsi;
>> +
>> +  for (gsi = gsi_start_phis (single_succ (inner_body));
>> +       !gsi_end_p (gsi); gsi_next (&gsi))
>> +    {
>> +      gimple old_loop_phi = gsi_stmt (gsi);
>> +      tree back_arg = PHI_ARG_DEF_FROM_EDGE (old_loop_phi,
>> +                                          single_succ_edge (inner_body));
>> +      tree res = gimple_phi_result (old_loop_phi);
>> +
>> +      res = add_header_and_latch_phis (loop, res, forwarded_edge, back_arg);
>> +      add_phi_arg (old_loop_phi, res, single_succ_edge (inner_body),
>> +                UNKNOWN_LOCATION);
>> +    }
>> +}
>> +
>> +/* Renames all the uses of OLD_NAME with NEW_NAME (except the phi
>> +   nodes of DEF_BB) in all the basic blocks dominated by DEF_BB and in
>> +   the arguments of all the phi nodes originating in a basic block
>> +   that is dominated by DEF_BB.  */
>> +
>> +static void
>> +rename_dominated_uses (loop_p loop, tree old_name, tree new_name,
>> +                    basic_block def_bb)
>> +{
>> +  imm_use_iterator uit;
>> +  gimple stmt;
>> +  use_operand_p use_p;
>> +  ssa_op_iter op_iter;
>> +
>> +  FOR_EACH_IMM_USE_STMT (stmt, uit, old_name)
>> +    {
>> +      enum gimple_code code = gimple_code (stmt);
>> +      basic_block use_bb = gimple_bb (stmt);
>> +      edge_iterator ei;
>> +      edge e;
>> +
>> +      if (code == GIMPLE_PHI)
>> +     {
>> +       FOR_EACH_EDGE (e, ei, use_bb->preds)
>> +         if (PHI_ARG_DEF_FROM_EDGE (stmt, e) == old_name
>> +             && dominated_by_p (CDI_DOMINATORS, e->src, def_bb)
>> +             && use_bb != def_bb)
>> +           replace_exp (gimple_phi_arg_imm_use_ptr (stmt, e->dest_idx),
>> +                        new_name);
>
>  SET_USE (gimple_phi_arg_imm_use_ptr (stmt, e->dest_idx), new_name);
>
>> +     }
>> +      else
>> +     {
>> +       if (!dominated_by_p (CDI_DOMINATORS, use_bb, def_bb))
>> +         continue;
>> +
>> +       if (use_bb->loop_father == loop)
>> +         {
>> +           FOR_EACH_SSA_USE_OPERAND (use_p, stmt, op_iter, SSA_OP_ALL_USES)
>> +             if (USE_FROM_PTR (use_p) == old_name)
>> +               replace_exp (use_p, new_name);
>> +         }
>> +       else
>> +         /* Virtual operands are not translated into loop closed
>> +            SSA form, and thus they may occur in the rest of
>> +            the program without a loop close vphi node.  */
>
> But you are updating all uses again.
>
>  You should simply use
>
>        FOR_EACH_IMM_USE_ON_STMT (use_p, uit)
>          SET_USE (use_p, new_name);
>
>> +         FOR_EACH_SSA_USE_OPERAND (use_p, stmt, op_iter, SSA_OP_ALL_USES)
>> +           if (USE_FROM_PTR (use_p) == old_name)
>> +             replace_exp (use_p, new_name);
>> +     }
>> +    }
>> +}
>> +
>> +/* Helper function for add_missing_phi_nodes_1.  Adds to LOOP all the
>> +   missing phi nodes for NAME and updates the arguments of the
>> +   LATCH_PHI node.  LOOP_PHI node is the inductive definition of NAME
>> +   in LOOP->header.  */
>> +
>> +static void
>> +add_missing_phi_nodes_2 (loop_p loop, tree name, tree old_name,
>> +                      VEC (gimple, heap) *phis)
>> +{
>> +  unsigned i;
>> +  basic_block bb, dom_bb = gimple_bb (SSA_NAME_DEF_STMT (name));
>> +  VEC (basic_block, heap) *dom_bbs = get_all_dominated_blocks (CDI_DOMINATORS,
>> +                                                            dom_bb);
>> +
>> +  FOR_EACH_VEC_ELT (basic_block, dom_bbs, i, bb)
>> +    {
>> +      edge e;
>> +      edge_iterator ei;
>> +
>> +      if (bb == loop->latch
>> +       || bb->loop_father != loop)
>> +     continue;
>
> dom_bbs may be very large, it's much better to iterate over the
> loop bbs and do a dominator check.  Or iterate over dominator sons
> with first_dom_son (), next_dom_son () and recurse, bailing out when
> you're running out of the loop.
>
>> +      FOR_EACH_EDGE (e, ei, bb->succs)
>> +     {
>> +       gimple phi = VEC_index (gimple, phis, e->dest->index);
>> +
>> +       if (phi)
>> +         add_phi_arg (phi, name, e, UNKNOWN_LOCATION);
>> +
>> +       else if (!single_pred_p (e->dest)
>> +                && !dominated_by_p (CDI_DOMINATORS, e->dest, dom_bb)
>> +                && e->dest->loop_father == loop)
>> +       {
>> +         tree var = SSA_NAME_VAR (name);
>> +
>> +         phi = create_phi_node (var, e->dest);
>> +         create_new_def_for (gimple_phi_result (phi), phi,
>> +                             gimple_phi_result_ptr (phi));
>
> Again.
>
>> +         VEC_replace (gimple, phis, e->dest->index, phi);
>> +         add_phi_arg (phi, name, e, UNKNOWN_LOCATION);
>> +         rename_dominated_uses (loop, old_name, gimple_phi_result (phi),
>> +                                e->dest);
>> +         add_missing_phi_nodes_2 (loop, gimple_phi_result (phi), old_name,
>> +                                  phis);
>> +       }
>> +     }
>> +    }
>
> You leak dom_bbs.
>
>> +}
>> +
>> +/* Helper function for add_missing_phi_nodes.  For all the definitions
>> +   of DEF_STMT add the missing phi nodes in LOOP.  */
>> +
>> +static void
>> +add_missing_phi_nodes_1 (loop_p loop, gimple def_stmt)
>> +{
>> +  def_operand_p def_p;
>> +  ssa_op_iter op_iter;
>> +  basic_block bb = gimple_bb (def_stmt);
>> +
>> +  FOR_EACH_PHI_OR_STMT_DEF (def_p, def_stmt, op_iter, SSA_OP_DEF|SSA_OP_VDEF)
>> +    {
>> +      edge e;
>> +      edge_iterator ei;
>> +      tree res, zero, var;
>> +      gimple loop_phi, latch_phi, use_stmt;
>> +      imm_use_iterator uit;
>> +      tree name = DEF_FROM_PTR (def_p);
>> +      bool needs_update = false;
>> +      VEC (gimple, heap) *phis;
>> +      int i;
>> +
>> +      FOR_EACH_IMM_USE_STMT (use_stmt, uit, name)
>> +     {
>> +       basic_block use_bb = gimple_bb (use_stmt);
>> +
>> +       if (!dominated_by_p (CDI_DOMINATORS, bb, use_bb))
>> +         {
>> +           needs_update = true;
>> +           BREAK_FROM_IMM_USE_STMT (uit);
>> +         }
>> +     }
>> +
>> +      if (!needs_update)
>> +     continue;
>> +
>> +      var = SSA_NAME_VAR (name);
>> +      loop_phi = create_phi_node (var, loop->header);
>> +      latch_phi = create_phi_node (var, loop->latch);
>> +
>> +      create_new_def_for (gimple_phi_result (loop_phi), loop_phi,
>> +                       gimple_phi_result_ptr (loop_phi));
>> +      create_new_def_for (gimple_phi_result (latch_phi), latch_phi,
>> +                       gimple_phi_result_ptr (latch_phi));
>
> Again.
>
>> +      /* The value set to ZERO will never be used in the loop, however we
>> +      have to construct something meaningful for virtual SSA_NAMEs.  */
>> +      if (is_gimple_reg (name))
>> +     zero = fold_convert (TREE_TYPE (name), integer_zero_node);
>> +      else
>> +     zero = gimple_default_def (cfun, SSA_NAME_VAR (name));
>> +
>> +      res = gimple_phi_result (latch_phi);
>> +      FOR_EACH_EDGE (e, ei, loop->header->preds)
>> +     add_phi_arg (loop_phi, (e == loop_latch_edge (loop) ? res : zero),
>> +                  e, UNKNOWN_LOCATION);
>> +
>> +      res = gimple_phi_result (loop_phi);
>> +      FOR_EACH_EDGE (e, ei, loop->latch->preds)
>> +     add_phi_arg (latch_phi, res, e, UNKNOWN_LOCATION);
>> +
>> +      phis = VEC_alloc (gimple, heap, n_basic_blocks);
>> +      for (i = 0; i < n_basic_blocks; i++)
>> +     VEC_quick_push (gimple, phis, NULL);
>> +
>> +      VEC_replace (gimple, phis, loop->latch->index, latch_phi);
>> +      VEC_replace (gimple, phis, loop->header->index, loop_phi);
>> +      add_missing_phi_nodes_2 (loop, name, name, phis);
>> +
>> +      for (i = 0; i < n_basic_blocks; i++)
>> +     {
>> +       gimple phi = VEC_index (gimple, phis, i);
>> +
>> +       if (!phi)
>> +         continue;
>> +
>> +       FOR_EACH_EDGE (e, ei, BASIC_BLOCK (i)->preds)
>> +         if (!PHI_ARG_DEF_FROM_EDGE (phi, e))
>> +           add_phi_arg (phi, res, e, UNKNOWN_LOCATION);
>> +     }
>> +
>> +      VEC_free (gimple, heap, phis);
>> +    }
>> +}
>> +
>> +/* Walks over the code of LOOP and adds the missing phi nodes at
>> +   control flow junctions.  When a variable is defined in an outer
>> +   loop and used in an inner loop, the definition dominates the use.
>> +   After the loop flattening, the inner loop body is directly
>> +   reachable from the LOOP->header by using the added edge guarded by
>> +   the boolean flag that controls the execution of the back-edge that
>> +   was eliminated.  In this case, the use is not dominated by the
>> +   definition, and this function adds the missing phi nodes.  */
>> +
>> +static void
>> +add_missing_phi_nodes (loop_p loop)
>> +{
>> +  gimple_stmt_iterator gsi;
>> +  int i, n = loop->num_nodes;
>> +  basic_block *bbs = get_loop_body (loop);
>
> So you can even pass this down to add_missing_phi_nodes_2.  Or
> even use get_loop_body_in_dom_order and thus only need to walk
> adjacent blocks in that array.
>
>> +  for (i = 0; i < n; i++)
>> +    {
>> +      basic_block bb = bbs[i];
>> +
>> +      /* LOOP->header dominates all the blocks of the loop body, and
>> +      so we don't have to look at the missing phi nodes for the
>> +      definitions of LOOP->header.  */
>> +      if (bb == loop->header)
>> +     continue;
>> +
>> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> +     if (!gimple_nop_p (gsi_stmt (gsi)))
>> +       add_missing_phi_nodes_1 (loop, gsi_stmt (gsi));
>> +
>> +      for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> +     add_missing_phi_nodes_1 (loop, gsi_stmt (gsi));
>> +    }
>> +
>> +  free (bbs);
>> +}
>> +
>> +/* Removes all the back-edges of LOOP except its own back-edge.
>> +   SCRATCH_PAD is used in if-conversion.  */
>> +
>> +static unsigned
>> +flatten_loop (loop_p loop, tree *scratch_pad)
>> +{
>> +  int i, n = loop->num_nodes;
>> +  basic_block *bbs;
>> +  VEC (edge, heap) *back_edges;
>> +  VEC (basic_block, heap) *loop_body;
>> +  edge_iterator ei;
>> +  edge e, pred_e;
>> +  unsigned max_nb_basic_blocks = PARAM_VALUE (PARAM_LFLAT_MAX_NB_BBS);
>> +
>> +  if (loop->num_nodes > max_nb_basic_blocks
>> +      || !single_exit (loop)
>> +      || !dbg_cnt (lflat))
>> +    return 0;
>> +
>> +  mark_dfs_back_edges ();
>> +  bbs = get_loop_body (loop);
>> +
>> +  back_edges = VEC_alloc (edge, heap, 3);
>> +  loop_body = VEC_alloc (basic_block, heap, 3);
>> +
>> +  for (i = 0; i < n; i++)
>> +    FOR_EACH_EDGE (e, ei, bbs[i]->succs)
>> +      if (e->flags & EDGE_DFS_BACK
>> +       && e->src != loop->latch)
>> +     VEC_safe_push (edge, heap, back_edges, e);
>> +
>> +  free (bbs);
>> +
>> +  /* Early return and do not modify the code when there are no back
>> +     edges.  */
>> +  if (VEC_empty (edge, back_edges))
>> +    return 0;
>> +
>> +  cancel_subloops (loop);
>> +
>> +  /* Split the latch edge to make sure that the latch basic block does
>> +     not contain code.  */
>> +  loop->latch = split_edge (loop_latch_edge (loop));
>> +  pred_e = single_pred_edge (loop->latch);
>> +
>> +  FOR_EACH_VEC_ELT (edge, back_edges, i, e)
>> +    {
>> +      basic_block dest = split_edge (e);
>> +
>> +      /* Redirect BACK_EDGE to LOOP->latch.  */
>> +      redirect_edge_and_branch_force (e, loop->latch);
>> +
>> +      /* Save the basic block where it was pointing.  */
>> +      VEC_safe_push (basic_block, heap, loop_body, dest);
>> +    }
>> +
>> +  update_loop_phi_nodes (loop, pred_e);
>> +
>> +  FOR_EACH_VEC_ELT (edge, back_edges, i, e)
>> +    create_control_flag (e, loop, VEC_index (basic_block, loop_body, i));
>> +
>> +  FOR_EACH_VEC_ELT (edge, back_edges, i, e)
>> +    update_inner_induction_phi_nodes (e, loop, VEC_index (basic_block,
>> +                                                       loop_body, i));
>> +
>> +  free_dominance_info (CDI_DOMINATORS);
>> +  calculate_dominance_info (CDI_DOMINATORS);
>> +  add_missing_phi_nodes (loop);
>> +
>> +  /* If we redirected some back-edges, split the latch edge to create
>> +     an empty LOOP->latch.  */
>> +  if (!single_pred_p (loop->latch))
>> +    loop->latch = split_edge (loop_latch_edge (loop));
>> +
>> +  if (gate_tree_if_conversion ())
>> +    tree_if_conversion (loop, scratch_pad);
>
> You are leaking VECs.  As mentioned above testing the gate isn't
> necessary here.
>
>> +  return TODO_update_ssa | TODO_verify_ssa;
>
> These TODOs belong in the pass structure.
>
>> +}
>> +
>> +/* Flattens all the loops of the current function.  */
>> +
>> +static unsigned int
>> +tree_loop_flattening (void)
>> +{
>> +  unsigned todo = 0;
>> +  loop_p loop;
>> +  loop_iterator li;
>> +  tree scratch_pad = NULL_TREE;
>> +
>> +  if (number_of_loops () <= 1)
>> +    return 0;
>> +
>> +  FOR_EACH_LOOP (li, loop, 0)
>> +    todo |= flatten_loop (loop, &scratch_pad);
>
> So we might end up recursively flattening loops (or not, as this
> walk is in undefined order).  I'd say you want LI_ONLY_INNERMOST here,
> or do you really want to flatten all loop trees up to the number
> of basic blocks specified in the param?  I guess not.
>
> I think the pass misses a cost model and I'm still not sure when
> or if it will be profitable to do this at all (as said, no
> functional testcases).  What's the immediate benefit for GCC 4.6?
>
>> +#ifdef ENABLE_CHECKING
>> +  verify_dominators (CDI_DOMINATORS);
>> +  verify_flow_info ();
>> +#endif
>> +
>> +  cleanup_tree_cfg ();
>> +  return todo;
>
> return TODO_cleanup_cfg, but only if you flattened a loop.  So
> return TODO_cleanup_cfg from flatten_loop instead.
>
> Richard.
>
>> +}
>> +
>> +static bool
>> +gate_tree_loop_flattening (void)
>> +{
>> +  return flag_tree_loop_flattening != 0;
>> +}
>> +
>> +struct gimple_opt_pass pass_flatten_loops =
>> +{
>> + {
>> +  GIMPLE_PASS,
>> +  "lflat",                           /* name */
>> +  gate_tree_loop_flattening,         /* gate */
>> +  tree_loop_flattening,                      /* execute */
>> +  NULL,                                      /* sub */
>> +  NULL,                                      /* next */
>> +  0,                                 /* static_pass_number */
>> +  TV_TREE_LOOP_FLATTENING,           /* tv_id */
>> +  PROP_cfg | PROP_ssa,                       /* properties_required */
>> +  0,                                 /* properties_provided */
>> +  0,                                 /* properties_destroyed */
>> +  0,                                 /* todo_flags_start */
>> +  TODO_dump_func
>> +    | TODO_update_ssa
>> +    | TODO_ggc_collect                       /* todo_flags_finish */
>> + }
>> +};
>> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
>> index a87a770..e2f257f 100644
>> --- a/gcc/tree-pass.h
>> +++ b/gcc/tree-pass.h
>> @@ -374,6 +374,7 @@ extern struct gimple_opt_pass pass_graphite;
>>  extern struct gimple_opt_pass pass_graphite_transforms;
>>  extern struct gimple_opt_pass pass_if_conversion;
>>  extern struct gimple_opt_pass pass_loop_distribution;
>> +extern struct gimple_opt_pass pass_flatten_loops;
>>  extern struct gimple_opt_pass pass_vectorize;
>>  extern struct gimple_opt_pass pass_slp_vectorize;
>>  extern struct gimple_opt_pass pass_complete_unroll;
>>
>
> --
> Richard Guenther <rguenther@suse.de>
> Novell / SUSE Labs
> SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex
>

[-- Attachment #2: 0001-Loop-flattening-on-loop-SSA.patch --]
[-- Type: text/x-patch, Size: 45885 bytes --]

From 37ae67187f681c9e5635c6c825997dd2a56cd179 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Fri, 17 Sep 2010 14:04:49 -0500
Subject: [PATCH] Loop flattening on loop-SSA.

2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>

	* Makefile.in (OBJS-common): Add tree-loop-flattening.o.
	(tree-loop-flattening.o): New.
	* common.opt (ftree-loop-flatten): New.
	* dbgcnt.def (lflat): New.
	* params.def (PARAM_LFLAT_MAX_NB_BBS): New.
	* passes.c (init_optimization_passes): Add new passes
	pass_flatten_loops and pass_if_conversion after loop vectorization
	and before pass_slp_vectorize.
	* timevar.def (TV_TREE_LOOP_FLATTENING): New.
	* tree-loop-flattening.c: New.
	* tree-pass.h (pass_flatten_loops): Declared.
	* tree-flow.h (gate_tree_if_conversion): Declared.
	(tree_if_conversion): Declared.
	* tree-if-conv.c (tree_if_conversion): Not static anymore.
	(gate_tree_if_conversion): Same.

	* gcc.dg/tree-ssa/flat-loop-1.c: New.
	* gcc.dg/tree-ssa/flat-loop-2.c: New.
	* gcc.dg/tree-ssa/flat-loop-3.c: New.
	* gcc.dg/tree-ssa/flat-loop-4.c: New.
---
 gcc/ChangeLog                                |   18 +
 gcc/Makefile.in                              |    4 +
 gcc/cfgloop.c                                |   10 +
 gcc/cfgloop.h                                |    1 +
 gcc/common.opt                               |    4 +
 gcc/dbgcnt.def                               |    1 +
 gcc/params.def                               |    7 +
 gcc/passes.c                                 |    1 +
 gcc/testsuite/ChangeLog                      |    7 +
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c  |   28 ++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-10.c |   61 +++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-11.c |   34 ++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-12.c |   35 ++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-13.c |   55 +++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-14.c |   55 +++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c  |   35 ++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c  |   19 +
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c  |   23 +
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-5.c  |   50 +++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-6.c  |   50 +++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-7.c  |   66 +++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-8.c  |   61 +++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-9.c  |   56 +++
 gcc/timevar.def                              |    1 +
 gcc/tree-flow.h                              |    3 +
 gcc/tree-if-conv.c                           |    2 +-
 gcc/tree-loop-flattening.c                   |  586 ++++++++++++++++++++++++++
 gcc/tree-pass.h                              |    1 +
 28 files changed, 1273 insertions(+), 1 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-10.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-11.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-12.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-13.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-14.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-5.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-6.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-7.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-8.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-9.c
 create mode 100644 gcc/tree-loop-flattening.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d360463..f912f93 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,23 @@
 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
 
+	* Makefile.in (OBJS-common): Add tree-loop-flattening.o.
+	(tree-loop-flattening.o): New.
+	* common.opt (ftree-loop-flatten): New.
+	* dbgcnt.def (lflat): New.
+	* params.def (PARAM_LFLAT_MAX_NB_BBS): New.
+	* passes.c (init_optimization_passes): Add new passes
+	pass_flatten_loops and pass_if_conversion after loop vectorization
+	and before pass_slp_vectorize.
+	* timevar.def (TV_TREE_LOOP_FLATTENING): New.
+	* tree-loop-flattening.c: New.
+	* tree-pass.h (pass_flatten_loops): Declared.
+	* tree-flow.h (gate_tree_if_conversion): Declared.
+	(tree_if_conversion): Declared.
+	* tree-if-conv.c (tree_if_conversion): Not static anymore.
+	(gate_tree_if_conversion): Same.
+
+2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
+
 	* tree-if-conv.c (if_convertible_loop_p_1): Do not call
 	compute_data_dependences_for_loop.
 	(if_convertible_loop_p): Do not free refs and ddrs.
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index cc58d7f..01fa1e8 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1362,6 +1362,7 @@ OBJS-common = \
 	tree-into-ssa.o \
 	tree-iterator.o \
 	tree-loop-distribution.o \
+	tree-loop-flattening.o \
 	tree-loop-linear.o \
 	tree-nested.o \
 	tree-nrv.o \
@@ -2767,6 +2768,9 @@ tree-loop-distribution.o: tree-loop-distribution.c $(CONFIG_H) $(SYSTEM_H) coret
    $(DIAGNOSTIC_H) $(TREE_FLOW_H) $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) \
    $(TREE_PASS_H) $(TREE_DATA_REF_H) $(EXPR_H) \
    langhooks.h $(TREE_VECTORIZER_H)
+tree-loop-flattening.o: tree-loop-flattening.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
+   $(TM_H) $(OPTABS_H) $(TREE_H) $(BASIC_BLOCK_H) $(DIAGNOSTIC_H) $(TREE_FLOW_H) \
+   $(TREE_DUMP_H) $(TIMEVAR_H) $(CFGLOOP_H) $(TREE_PASS_H) $(DBGCNT_H)
 tree-parloops.o: tree-parloops.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
    $(TREE_FLOW_H) $(TREE_H) $(CFGLOOP_H) $(TREE_DATA_REF_H) \
    $(DIAGNOSTIC_H) $(TREE_PASS_H) langhooks.h gt-tree-parloops.h \
diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
index 08d689d..bfab67b 100644
--- a/gcc/cfgloop.c
+++ b/gcc/cfgloop.c
@@ -1297,6 +1297,16 @@ cancel_loop_tree (struct loop *loop)
   cancel_loop (loop);
 }
 
+/* Keep the loop structure for LOOP and remove all the loop structures
+   under LOOP.  */
+
+void
+cancel_subloops (loop_p loop)
+{
+  while (loop->inner)
+    cancel_loop_tree (loop->inner);
+}
+
 /* Checks that information about loops is correct
      -- sizes of loops are all right
      -- results of get_loop_body really belong to the loop
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index bf2614e..1679019 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -256,6 +256,7 @@ extern void add_bb_to_loop (basic_block, struct loop *);
 extern void remove_bb_from_loops (basic_block);
 
 extern void cancel_loop_tree (struct loop *);
+extern void cancel_subloops (struct loop *);
 extern void delete_loop (struct loop *);
 
 enum
diff --git a/gcc/common.opt b/gcc/common.opt
index 71f4578..49afc96 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1698,6 +1698,10 @@ ftree-loop-distribute-patterns
 Common Report Var(flag_tree_loop_distribute_patterns) Optimization
 Enable loop distribution for patterns transformed into a library call
 
+ftree-loop-flatten
+Common Report Var(flag_tree_loop_flattening) Optimization
+Enable loop flattening on trees
+
 ftree-loop-im
 Common Report Var(flag_tree_loop_im) Init(1) Optimization
 Enable loop invariant motion on trees
diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index 0492d66..0ef9a72 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -166,6 +166,7 @@ DEBUG_COUNTER (if_conversion_tree)
 DEBUG_COUNTER (if_after_combine)
 DEBUG_COUNTER (if_after_reload)
 DEBUG_COUNTER (local_alloc_for_sched)
+DEBUG_COUNTER (lflat)
 DEBUG_COUNTER (postreload_cse)
 DEBUG_COUNTER (pre)
 DEBUG_COUNTER (pre_insn)
diff --git a/gcc/params.def b/gcc/params.def
index 6e55db6..d7b5d16 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -794,6 +794,13 @@ DEFPARAM (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION,
 	  "maximum number of basic blocks per function to be analyzed by Graphite",
 	  100, 0, 0)
 
+/* Maximal number of basic blocks in a loop to be flattened.  */
+
+DEFPARAM (PARAM_LFLAT_MAX_NB_BBS,
+	  "lflat-max-nb-bbs",
+	  "maximum number of basic blocks in a loop to be flattened",
+	  100, 0, 0)
+
 /* Avoid doing loop invariant motion on very large loops.  */
 
 DEFPARAM (PARAM_LOOP_INVARIANT_MAX_BBS_IN_LOOP,
diff --git a/gcc/passes.c b/gcc/passes.c
index da9bb15..d276723 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -917,6 +917,7 @@ init_optimization_passes (void)
 	  NEXT_PASS (pass_parallelize_loops);
 	  NEXT_PASS (pass_loop_prefetch);
 	  NEXT_PASS (pass_iv_optimize);
+	  NEXT_PASS (pass_flatten_loops);
 	  NEXT_PASS (pass_tree_loop_done);
 	}
       NEXT_PASS (pass_cse_reciprocals);
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index c5c2473..b213477 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,12 @@
 2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
 
+	* gcc.dg/tree-ssa/flat-loop-1.c: New.
+	* gcc.dg/tree-ssa/flat-loop-2.c: New.
+	* gcc.dg/tree-ssa/flat-loop-3.c: New.
+	* gcc.dg/tree-ssa/flat-loop-4.c: New.
+
+2010-10-20  Sebastian Pop  <sebastian.pop@amd.com>
+
 	PR tree-optimization/46029
 	* g++.dg/tree-ssa/ifc-pr46029.C: New.
 	* gcc.dg/tree-ssa/ifc-8.c: New.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c
new file mode 100644
index 0000000..bee8a2b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-flatten" } */
+
+struct stack_segment
+{
+  struct dynamic_allocation_blocks *dynamic_allocation;
+};
+struct dynamic_allocation_blocks
+{
+  struct dynamic_allocation_blocks *next;
+};
+static struct dynamic_allocation_blocks *
+merge_dynamic_blocks (struct dynamic_allocation_blocks *a,
+		      struct dynamic_allocation_blocks *b)
+{
+  struct dynamic_allocation_blocks **pp;
+  for (pp = &a->next; *pp != ((void *)0); pp = &(*pp)->next)
+    *pp = b;
+  return a;
+}
+__morestack_release_segments (struct stack_segment **pp, int free_dynamic)
+{
+  struct dynamic_allocation_blocks *ret;
+  struct stack_segment *pss;
+  pss = *pp;
+  while (pss != ((void *)0))
+    ret = merge_dynamic_blocks (pss->dynamic_allocation, ret);
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-10.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-10.c
new file mode 100644
index 0000000..56d14f9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-10.c
@@ -0,0 +1,61 @@
+/* From graphite/block-7.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define N 200
+
+int A[N][N], B[N][N], C[N][N];
+
+static void __attribute__((noinline))
+matmult (void)
+{
+  int i, j, k;
+
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      {
+        A[i][j] = 0;
+        for (k = 0; k < N; k++)
+          A[i][j] += B[i][k] * C[k][j];
+      }
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, j, res = 0;
+
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      {
+	B[i][j] = j;
+	C[i][j] = i;
+      }
+
+  matmult ();
+
+  for (i = 0; i < N; i++)
+    res += A[i][i];
+
+#if DEBUG
+  fprintf (stderr, "res = %d \n", res);
+#endif
+
+  if (res != 529340000)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 2 loops" 1 "lflat" } } */
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 1 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-11.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-11.c
new file mode 100644
index 0000000..68c5d49
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-11.c
@@ -0,0 +1,34 @@
+/* From graphite/run-id-1.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+void abort (void);
+
+void foo (int N)
+{
+  int i, j;
+  int x[1000][1000];
+
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      x[i][j] = i + j + 3;
+
+  /* This loop will not be flattened as the outermost loop has two
+     exit edges.  */
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      if (x[i][j] != i + j + 3)
+	abort ();
+}
+
+int main(void)
+{
+  foo (1000);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 1 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-12.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-12.c
new file mode 100644
index 0000000..eb0fb3d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-12.c
@@ -0,0 +1,35 @@
+/* From graphite/run-id-4.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+extern void abort (void);
+
+__attribute__ ((noinline)) int
+foo (int x, int y, int *z)
+{
+  int a, b, c, d;
+
+  a = b = 0;
+  for (d = 0; d < y; d++)
+    {
+      if (z)
+	b = d * *z;
+      for (c = 0; c < x; c++)
+	a += b;
+    }
+
+  return a;
+}
+
+int
+main (void)
+{
+  if (foo (3, 2, 0) != 0)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 1 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-13.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-13.c
new file mode 100644
index 0000000..1d04dbe
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-13.c
@@ -0,0 +1,55 @@
+/* From graphite/run-id-5.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+#include <stdarg.h>
+
+extern void abort ();
+#define N 40
+
+int a[N];
+
+__attribute__ ((noinline)) int
+foo (int n){
+  int i,j;
+  int sum;
+
+  if (n<=0)
+    return 0;
+
+  for (i = 0; i < N; i++) {
+    sum = 0;
+    for (j = 0; j < n; j+=2) {
+      sum += j;
+    }
+    a[i] = sum + j;
+  }
+}
+
+int main (void)
+{
+  int i,j;
+  int sum;
+
+  for (i=0; i<N; i++)
+    a[i] = i;
+
+  foo (N);
+
+  /* This won't be flattened.  */
+  for (i=0; i<N; i++)
+    {
+      sum = 0;
+      for (j = 0; j < N; j+=2)
+        sum += j;
+      if (a[i] != sum + j)
+        abort();
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 1 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-14.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-14.c
new file mode 100644
index 0000000..c87b893
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-14.c
@@ -0,0 +1,55 @@
+/* From graphite/run-id-6.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+#include <stdarg.h>
+
+extern void abort ();
+#define N 40
+
+int a[N];
+
+__attribute__ ((noinline)) int
+foo (int n){
+  int i,j,k=0;
+  int sum;
+
+  if (n<=0)
+    return 0;
+
+  for (i = 0; i < N; i++) {
+    sum = 0;
+    for (j = 0; j < n; j+=2) {
+      sum += k++;
+    }
+    a[i] = sum + j;
+  }
+}
+
+int main (void)
+{
+  int i,j,k=0;
+  int sum;
+
+  for (i=0; i<N; i++)
+    a[i] = i;
+
+  foo (N);
+
+  /* This is not flattened.  */
+  for (i=0; i<N; i++)
+    {
+      sum = 0;
+      for (j = 0; j < N; j+=2)
+        sum += k++;
+      if (a[i] != sum + j)
+	abort();
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 1 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
new file mode 100644
index 0000000..4573801
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-flatten" } */
+
+struct stack_segment
+{
+  struct stack_segment *next;
+  struct dynamic_allocation_blocks *dynamic_allocation;
+};
+struct dynamic_allocation_blocks
+{
+  struct dynamic_allocation_blocks *next;
+};
+static struct dynamic_allocation_blocks *
+merge_dynamic_blocks (struct dynamic_allocation_blocks *a,
+        struct dynamic_allocation_blocks *b)
+{
+  struct dynamic_allocation_blocks **pp;
+  if (b == ((void *)0))
+  for (pp = &a->next; *pp != ((void *)0); pp = &(*pp)->next)
+    ;
+  return a;
+}
+__morestack_release_segments (struct stack_segment **pp, int free_dynamic)
+{
+  struct dynamic_allocation_blocks *ret;
+  struct stack_segment *pss;
+  while (pss != ((void *)0))
+    {
+      struct stack_segment *next;
+      next = pss->next;
+      if (free_dynamic)
+	ret = merge_dynamic_blocks (pss->dynamic_allocation, ret);
+      pss = next;
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
new file mode 100644
index 0000000..cf01273
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-flatten" } */
+
+
+int
+split_directories (const char *name, int *ptr_num_dirs)
+{
+  int num_dirs = 0;
+  char **dirs;
+  const char *p, *q;
+  int ch;
+  while ((ch = *p++) != '\0')
+    {
+      num_dirs++;
+      while (((*p) == '/'))
+	p++;
+    }
+  return (dirs[num_dirs - 1] == ((void *)0));
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c
new file mode 100644
index 0000000..8e551ac
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-4.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-loop-flatten" } */
+
+void
+formatted_backspace (int common, char *s)
+{
+  int base;
+  int n;
+  do
+    {
+      if (sseek (s, base, 0) < 0)
+	goto io_error;
+
+      while (n > 0)
+	{
+          n--;
+	  base += n + 1;
+	}
+    }
+  while (base != 0);
+ io_error:
+  generate_error (common, 0, ((void *)0));
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-5.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-5.c
new file mode 100644
index 0000000..24704fb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-5.c
@@ -0,0 +1,50 @@
+/* From graphite/block-0.c.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define N 1000
+int a[N];
+
+static int __attribute__((noinline))
+foo (void)
+{
+  int j;
+  int i;
+
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      a[j] = a[i] + 1;
+
+  return a[0];
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, res;
+
+  for (i = 0; i < N; i++)
+    a[i] = i;
+
+  res = foo ();
+
+#if DEBUG
+  fprintf (stderr, "res = %d \n", res);
+#endif
+
+  if (res != 1999)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 1 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-6.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-6.c
new file mode 100644
index 0000000..8a5382f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-6.c
@@ -0,0 +1,50 @@
+/* From graphite/block-1.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define MAX 100
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, j;
+  int sum = 0;
+  int A[MAX * MAX];
+  int B[MAX * MAX];
+
+  for (i = 0; i < MAX; i++)
+    for (j = 0; j < MAX; j++)
+      {
+	A[i*MAX + j] = j;
+	B[i*MAX + j] = j;
+      }
+
+  for (i = 0; i < MAX; i++)
+    for (j = 0; j < MAX; j++)
+      A[i*MAX + j] += B[j*MAX + i];
+
+  for(i = 0; i < MAX; i++)
+    for(j = 0; j < MAX; j++)
+      sum += A[i*MAX + j];
+
+#if DEBUG
+  fprintf (stderr, "sum = %d \n", sum);
+#endif
+
+  if (sum != 990000)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 3 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-7.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-7.c
new file mode 100644
index 0000000..3252545
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-7.c
@@ -0,0 +1,66 @@
+/* From graphite/block-3.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-timeout-factor 4.0 } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define N 24
+#define M 100
+
+int A[M][M][M], B[M][M], C[M][M];
+
+static int __attribute__((noinline))
+foo (void)
+{
+  int i, j, k;
+
+  /* These loops contain too few iterations to be blocked by 64.  */
+  for (i = 0; i < 24; i++)
+    for (j = 0; j < 24; j++)
+      for (k = 0; k < 24; k++)
+        A[i][j][k] = B[i][k] * C[k][j];
+
+  /* These loops should still be loop blocked.  */
+  for (i = 0; i < M; i++)
+    for (j = 0; j < M; j++)
+      for (k = 0; k < M; k++)
+        A[i][j][k] = B[i][k] * C[k][j];
+
+  return A[0][0][0] + A[M-1][M-1][M-1];
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, j, res;
+
+  for (i = 0; i < M; i++)
+    for (j = 0; j < M; j++)
+      {
+	B[i][j] = i;
+	C[i][j] = j;
+      }
+
+  res = foo ();
+
+#if DEBUG
+  fprintf (stderr, "res = %d \n", res);
+#endif
+
+  if (res != 9801)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 2 loops" 2 "lflat" } } */
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 1 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-8.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-8.c
new file mode 100644
index 0000000..388e721
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-8.c
@@ -0,0 +1,61 @@
+/* From graphite/block-5.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define N 200
+
+int a[N][N];
+int b[N][N];
+
+static int __attribute__((noinline))
+foo (void)
+{
+  int i, j;
+  int res = 0;
+
+  /* This loop nest should be blocked.  */
+  for (j = 1; j < N; j++)
+    for (i = 0; i < N; i++)
+      a[i][j] = a[i][j-1] + b[i][j];
+
+  for (i = 0; i < N; i++)
+    res += a[i][i];
+
+  return res;
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, j, res;
+
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      {
+	a[i][j] = i + j;
+	b[i][j] = i - j;
+      }
+
+  res = foo ();
+
+#if DEBUG
+  fprintf (stderr, "res = %d \n", res);
+#endif
+
+  if (res != 1333300)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 2 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-9.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-9.c
new file mode 100644
index 0000000..06a2667
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-9.c
@@ -0,0 +1,56 @@
+/* From graphite/block-6.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define N 200
+int a[N][N];
+
+static int __attribute__((noinline))
+foo (void)
+{
+  int i, j;
+  int res = 0;
+
+  /* Interchange is not legal for loops 0 and 1.  */
+  for (i = 1; i < N; i++)
+    for (j = 1; j < N - 1; j++)
+      a[i][j] = a[i-1][j+1] * a[i-1][j+1] / 2;
+
+  for (i = 0; i < N; i++)
+    res += a[i][i];
+
+  return res;
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, j, res;
+
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      a[i][j] = i + j;
+
+  res = foo ();
+
+#if DEBUG
+  fprintf (stderr, "res = %d \n", res);
+#endif
+
+  if (res != 204007516)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 2 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 86e2999..89ff8e8 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -152,6 +152,7 @@ DEFTIMEVAR (TV_GRAPHITE_DATA_DEPS    , "Graphite data dep analysis")
 DEFTIMEVAR (TV_GRAPHITE_CODE_GEN     , "Graphite code generation")
 DEFTIMEVAR (TV_TREE_LINEAR_TRANSFORM , "tree loop linear")
 DEFTIMEVAR (TV_TREE_LOOP_DISTRIBUTION, "tree loop distribution")
+DEFTIMEVAR (TV_TREE_LOOP_FLATTENING  , "tree loop flattening")
 DEFTIMEVAR (TV_CHECK_DATA_DEPS       , "tree check data dependences")
 DEFTIMEVAR (TV_TREE_PREFETCH	     , "tree prefetching")
 DEFTIMEVAR (TV_TREE_LOOP_IVOPTS	     , "tree iv optimization")
diff --git a/gcc/tree-flow.h b/gcc/tree-flow.h
index c2702dc..a4bf6f8 100644
--- a/gcc/tree-flow.h
+++ b/gcc/tree-flow.h
@@ -730,6 +730,9 @@ bool contains_abnormal_ssa_name_p (tree);
 bool stmt_dominates_stmt_p (gimple, gimple);
 void mark_virtual_ops_for_renaming (gimple);
 
+/* In tree-if-conv.c */
+bool tree_if_conversion (struct loop *, tree *);
+
 /* In tree-ssa-dce.c */
 void mark_virtual_phi_result_for_renaming (gimple);
 
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index e3f5941..794be57 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -1598,7 +1598,7 @@ combine_blocks (struct loop *loop, tree *scratch_pad)
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns true when something changed.  */
 
-static bool
+bool
 tree_if_conversion (struct loop *loop, tree *scratch_pad)
 {
   bool changed = false;
diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
new file mode 100644
index 0000000..cd49be1
--- /dev/null
+++ b/gcc/tree-loop-flattening.c
@@ -0,0 +1,586 @@
+/* Loop flattening.
+   Copyright (C) 2010 Free Software Foundation, Inc.
+   Contributed by Sebastian Pop <sebastian.pop@amd.com>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "ggc.h"
+#include "tree.h"
+#include "rtl.h"
+#include "output.h"
+#include "basic-block.h"
+#include "diagnostic.h"
+#include "tree-flow.h"
+#include "toplev.h"
+#include "tree-dump.h"
+#include "timevar.h"
+#include "cfgloop.h"
+#include "tree-pass.h"
+#include "gimple.h"
+#include "params.h"
+#include "dbgcnt.h"
+
+/* This loop flattening pass transforms backward pointing edges into
+   forward pointing edges.
+
+   The back-edge removal transformation was described in the 1983
+   paper by Allen J. R., Ken Kennedy, Carrie Porterfield, and Joe
+   Warren: "Conversion of control dependence to data dependence"
+   available from http://doi.acm.org/10.1145/567067.567085
+
+   The back-edge removal algorithm was presented in that paper as part
+   of the if-conversion algorithm for backward pointing edges.  In
+   this section we will first provide a description of this technique
+   adapted for the Gimple-SSA form, followed by an example, and a
+   discussion of the differences with the higher level loop flattening
+   transformation.
+
+   The back-edge removal algorithm transforms control dependences into
+   data dependences by using a boolean variable.  The values taken by
+   the boolean variable control the execution path of the forward
+   edges created in order to use the back-edge of an outer loop.
+
+   The first step of the algorithm detects a surrounding loop and all
+   the back-edges of the loop body: these back-edges can be inner
+   loops or strongly connected components of the CFG that cannot be
+   reduced to natural loops.
+
+   Each back-edge is removed by redirecting the target of the
+   back-edge to the latch basic block of the surrounding loop.  A
+   boolean variable is created in the latch.  It is cleared when the
+   redirected back-edge is taken, and it is set to true on every other
+   path leading to the latch.
+
+   The header basic block of the surrounding loop is split before its
+   statements and a new condition is added based on the control
+   variable: when the control variable is set to true, the execution
+   proceeds as normal to the basic block that contains the statements
+   of the header; when the control variable is cleared, meaning that
+   the back-edge has been taken, the execution proceeds to the point
+   where the redirected back-edge was pointing.
+
+   The last step updates the SSA form after all the back-edges have
+   been redirected to the latch, and the new edges from the header to
+   the destination of back-edges have been created.
+
+   Another description of loop flattening in a very Fortran specific
+   way is in the 1992 paper by Reinhard von Hanxleden and Ken Kennedy:
+   "Relaxing SIMD Control Flow Constraints using Loop Transformations"
+   available from
+   http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.5033 */
+
+/* Before creating other phi nodes in LOOP->header for the control
+   flags, update the phi nodes of LOOP->header and add the necessary
+   phi nodes in the LOOP->latch that now contains several paths on
+   which the values are not updated.  PRED_E is the single edge that
+   was pointing to the LOOP->latch basic block before inner back-edges
+   were redirected to the LOOP->latch.  */
+
+static void
+update_loop_phi_nodes (loop_p loop, edge pred_e)
+{
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_phis (loop->header); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      edge e;
+      edge_iterator ei;
+      gimple phi = gsi_stmt (gsi);
+      tree back_arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
+      tree res = gimple_phi_result (phi);
+      tree var = SSA_NAME_VAR (res);
+
+      phi = create_phi_node (var, loop->latch);
+
+      FOR_EACH_EDGE (e, ei, loop->latch->preds)
+	add_phi_arg (phi, (e == pred_e ? back_arg : res),
+		     e, UNKNOWN_LOCATION);
+
+      res = gimple_phi_result (phi);
+      add_phi_arg (gsi_stmt (gsi), res, loop_latch_edge (loop),
+		   UNKNOWN_LOCATION);
+    }
+}
+
+/* Creates a control flag for the FORWARDED_EDGE that represents the
+   back-edge that has been forwarded to the latch basic block of LOOP.
+   INNER_BODY is the basic block to which the back-edge was pointing
+   before redirection.  This function creates a boolean control flag
+   that is cleared when the FORWARDED_EDGE is taken and set for all
+   the other paths.  This function adds the corresponding phi nodes in
+   LOOP->latch and LOOP->header, and finally adds an edge from
+   LOOP->header to the INNER_BODY guarded by the control flag.  */
+
+static void
+create_control_flag (edge forwarded_edge, loop_p loop, basic_block inner_body)
+{
+  edge e, preheader;
+  edge outer_latch_e = loop_latch_edge (loop);
+  const char *name = "_flat_";
+  tree var = create_tmp_reg (boolean_type_node, name);
+  tree res;
+  gimple phi, cond_stmt;
+  gimple_stmt_iterator gsi;
+  edge_iterator ei;
+
+  /* Adds a control variable for the redirected FORWARDED_EDGE.  */
+  add_referenced_var (var);
+  phi = create_phi_node (var, forwarded_edge->dest);
+
+  FOR_EACH_EDGE (e, ei, outer_latch_e->src->preds)
+    add_phi_arg (phi, (e == forwarded_edge
+		       ? boolean_false_node
+		       : boolean_true_node),
+		 e, UNKNOWN_LOCATION);
+  res = gimple_phi_result (phi);
+
+  /* Add a phi node in LOOP->header for the control variable.  */
+  phi = create_phi_node (var, loop->header);
+
+  preheader = loop_preheader_edge (loop);
+  FOR_EACH_EDGE (e, ei, loop->header->preds)
+    add_phi_arg (phi, (e == preheader
+		       ? boolean_true_node
+		       : res),
+		 e, UNKNOWN_LOCATION);
+  res = gimple_phi_result (phi);
+
+  /* Split LOOP->header to insert the control variable condition.  */
+  e = split_block_after_labels (loop->header);
+  e->flags = EDGE_TRUE_VALUE;
+  e = make_edge (loop->header, inner_body, EDGE_FALSE_VALUE);
+  cond_stmt = gimple_build_cond (EQ_EXPR, res, boolean_true_node,
+				 NULL_TREE, NULL_TREE);
+  gsi = gsi_last_bb (loop->header);
+  gsi_insert_after (&gsi, cond_stmt, GSI_NEW_STMT);
+}
+
+/* Adds phi nodes to the LOOP->header and LOOP->latch for the ssa_name
+   NAME.  ARG is the argument of the latch phi node set for the
+   FORWARDED_EDGE, and all the other edges merged by the latch phi
+   node are set to the result of the LOOP->header phi node.  The latch
+   edge of the LOOP->header phi node is set to the result of the
+   LOOP->latch phi node, and the other argument is set to an arbitrary
+   valid value defined before the loop (note that this initial value
+   is never used in the loop).  Returns the LOOP->header phi result.  */
+
+static tree
+add_header_and_latch_phis (loop_p loop, tree name, edge forwarded_edge,
+			   tree arg)
+{
+  edge e;
+  edge_iterator ei;
+  tree res, zero, var = SSA_NAME_VAR (name);
+  gimple loop_phi = create_phi_node (var, loop->header);
+  gimple latch_phi = create_phi_node (var, loop->latch);
+
+  /* The value set to ZERO is never used in the loop; however, we
+     have to construct something meaningful for virtual SSA_NAMEs.  */
+  if (TREE_CODE (arg) != SSA_NAME)
+    zero = arg;
+  else if (is_gimple_reg (arg))
+    zero = build_zero_cst (TREE_TYPE (arg));
+  else
+    {
+      zero = gimple_vop (cfun);
+      mark_sym_for_renaming (gimple_vop (cfun));
+    }
+
+  res = gimple_phi_result (latch_phi);
+  FOR_EACH_EDGE (e, ei, loop->header->preds)
+    add_phi_arg (loop_phi, (e == loop_latch_edge (loop) ? res : zero),
+		 e, UNKNOWN_LOCATION);
+
+  res = gimple_phi_result (loop_phi);
+  FOR_EACH_EDGE (e, ei, loop->latch->preds)
+    add_phi_arg (latch_phi, (e == forwarded_edge ? arg : res),
+		 e, UNKNOWN_LOCATION);
+
+  return res;
+}
+
+/* Creates phi nodes for each inductive definition, i.e., loop phi
+   nodes.  For each induction phi node in the old loop header, i.e.,
+   in the single_succ (INNER_BODY), insert a phi node in the
+   LOOP->latch that takes the updated value of the induction on the
+   FORWARDED_EDGE, and maintains the same value as in the phi node of
+   the LOOP->header for all the other possible paths reaching
+   LOOP->latch.  This function has to be called after all the
+   back-edges have been redirected.  */
+
+static void
+update_inner_induction_phi_nodes (edge forwarded_edge, loop_p loop,
+				  basic_block inner_body)
+{
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_phis (single_succ (inner_body));
+       !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple old_loop_phi = gsi_stmt (gsi);
+      tree back_arg = PHI_ARG_DEF_FROM_EDGE (old_loop_phi,
+					     single_succ_edge (inner_body));
+      tree res = gimple_phi_result (old_loop_phi);
+
+      res = add_header_and_latch_phis (loop, res, forwarded_edge, back_arg);
+      add_phi_arg (old_loop_phi, res, single_succ_edge (inner_body),
+		   UNKNOWN_LOCATION);
+    }
+}
+
+/* Renames all the uses of OLD_NAME with NEW_NAME (except the phi
+   nodes of DEF_BB) in all the basic blocks dominated by DEF_BB and in
+   the arguments of all the phi nodes originating in a basic block
+   that is dominated by DEF_BB.  */
+
+static void
+rename_dominated_uses (tree old_name, tree new_name, basic_block def_bb)
+{
+  imm_use_iterator uit;
+  gimple stmt;
+  use_operand_p use_p;
+
+  FOR_EACH_IMM_USE_STMT (stmt, uit, old_name)
+    {
+      enum gimple_code code = gimple_code (stmt);
+      basic_block use_bb = gimple_bb (stmt);
+      edge_iterator ei;
+      edge e;
+
+      if (code == GIMPLE_PHI)
+	{
+	  FOR_EACH_EDGE (e, ei, use_bb->preds)
+	    if (PHI_ARG_DEF_FROM_EDGE (stmt, e) == old_name
+		&& dominated_by_p (CDI_DOMINATORS, e->src, def_bb)
+		&& use_bb != def_bb)
+	      SET_USE (gimple_phi_arg_imm_use_ptr (stmt, e->dest_idx), new_name);
+	}
+      else
+	{
+	  if (!dominated_by_p (CDI_DOMINATORS, use_bb, def_bb))
+	    continue;
+
+	  FOR_EACH_IMM_USE_ON_STMT (use_p, uit)
+	    SET_USE (use_p, new_name);
+	}
+    }
+}
+
+/* Helper function for add_missing_phi_nodes_1.  Adds to LOOP all the
+   missing phi nodes for NAME, renaming the dominated uses of OLD_NAME.
+   PHIS maps basic block indexes to the phi nodes already created for
+   this name, and BBS is the array of basic blocks of the LOOP body.  */
+
+static void
+add_missing_phi_nodes_2 (loop_p loop, tree name, tree old_name,
+			 VEC (gimple, heap) *phis, basic_block *bbs)
+{
+  basic_block dom_bb = gimple_bb (SSA_NAME_DEF_STMT (name));
+  int i, n = loop->num_nodes;
+
+  for (i = 0; i < n; i++)
+    {
+      edge e;
+      edge_iterator ei;
+      basic_block bb = bbs[i];
+
+      if (bb == loop->latch
+	  || !dominated_by_p (CDI_DOMINATORS, bb, dom_bb))
+	continue;
+
+      FOR_EACH_EDGE (e, ei, bb->succs)
+	{
+	  gimple phi = VEC_index (gimple, phis, e->dest->index);
+
+	  if (phi)
+	    add_phi_arg (phi, name, e, UNKNOWN_LOCATION);
+
+	  else if (!single_pred_p (e->dest)
+		   && !dominated_by_p (CDI_DOMINATORS, e->dest, dom_bb)
+		   && e->dest->loop_father == loop)
+	  {
+	    tree var = SSA_NAME_VAR (name);
+
+	    phi = create_phi_node (var, e->dest);
+	    VEC_replace (gimple, phis, e->dest->index, phi);
+	    add_phi_arg (phi, name, e, UNKNOWN_LOCATION);
+	    rename_dominated_uses (old_name, gimple_phi_result (phi),
+				   e->dest);
+	    add_missing_phi_nodes_2 (loop, gimple_phi_result (phi), old_name,
+				     phis, bbs);
+	  }
+	}
+    }
+}
+
+/* Helper function for add_missing_phi_nodes.  For all the definitions
+   of DEF_STMT add the missing phi nodes in LOOP.  */
+
+static void
+add_missing_phi_nodes_1 (loop_p loop, gimple def_stmt, basic_block *bbs)
+{
+  def_operand_p def_p;
+  ssa_op_iter op_iter;
+  basic_block bb = gimple_bb (def_stmt);
+
+  FOR_EACH_PHI_OR_STMT_DEF (def_p, def_stmt, op_iter, SSA_OP_DEF|SSA_OP_VDEF)
+    {
+      edge e;
+      edge_iterator ei;
+      tree res, zero, var;
+      gimple loop_phi, latch_phi, use_stmt, phi;
+      imm_use_iterator uit;
+      tree name = DEF_FROM_PTR (def_p);
+      bool needs_update = false;
+      VEC (gimple, heap) *phis;
+      int i;
+
+      FOR_EACH_IMM_USE_STMT (use_stmt, uit, name)
+	{
+	  basic_block use_bb = gimple_bb (use_stmt);
+
+	  if (!dominated_by_p (CDI_DOMINATORS, bb, use_bb))
+	    {
+	      needs_update = true;
+	      BREAK_FROM_IMM_USE_STMT (uit);
+	    }
+	}
+
+      if (!needs_update)
+	continue;
+
+      var = SSA_NAME_VAR (name);
+      loop_phi = create_phi_node (var, loop->header);
+      latch_phi = create_phi_node (var, loop->latch);
+
+      /* The value set to ZERO is never used in the loop; however, we
+	 have to construct something meaningful for virtual SSA_NAMEs.  */
+      if (is_gimple_reg (name))
+	zero = build_zero_cst (TREE_TYPE (name));
+      else
+	{
+	  zero = gimple_vop (cfun);
+	  mark_sym_for_renaming (gimple_vop (cfun));
+	}
+
+      res = gimple_phi_result (latch_phi);
+      FOR_EACH_EDGE (e, ei, loop->header->preds)
+	add_phi_arg (loop_phi, (e == loop_latch_edge (loop) ? res : zero),
+		     e, UNKNOWN_LOCATION);
+
+      res = gimple_phi_result (loop_phi);
+      FOR_EACH_EDGE (e, ei, loop->latch->preds)
+	add_phi_arg (latch_phi, res, e, UNKNOWN_LOCATION);
+
+      phis = VEC_alloc (gimple, heap, n_basic_blocks);
+      VEC_safe_grow_cleared (gimple, heap, phis, n_basic_blocks);
+
+      VEC_replace (gimple, phis, loop->latch->index, latch_phi);
+      VEC_replace (gimple, phis, loop->header->index, loop_phi);
+      add_missing_phi_nodes_2 (loop, name, name, phis, bbs);
+
+      FOR_EACH_VEC_ELT (gimple, phis, i, phi)
+	{
+	  if (!phi)
+	    continue;
+
+	  FOR_EACH_EDGE (e, ei, BASIC_BLOCK (i)->preds)
+	    if (!PHI_ARG_DEF_FROM_EDGE (phi, e))
+	      add_phi_arg (phi, res, e, UNKNOWN_LOCATION);
+	}
+
+      VEC_free (gimple, heap, phis);
+    }
+}
+
+/* Walks over the code of LOOP and adds the missing phi nodes at
+   control flow junctions.  When a variable is defined in an outer
+   loop and used in an inner loop, the definition dominates the use.
+   After the loop flattening, the inner loop body is directly
+   reachable from the LOOP->header by using the added edge guarded by
+   the boolean flag that controls the execution of the back-edge that
+   was eliminated.  In this case, the use is not dominated by the
+   definition, and this function adds the missing phi nodes.  */
+
+static void
+add_missing_phi_nodes (loop_p loop)
+{
+  gimple_stmt_iterator gsi;
+  int i, n = loop->num_nodes;
+  basic_block *bbs = get_loop_body (loop);
+
+  for (i = 0; i < n; i++)
+    {
+      basic_block bb = bbs[i];
+
+      /* LOOP->header dominates all the blocks of the loop body, and
+	 so we don't have to look at the missing phi nodes for the
+	 definitions of LOOP->header.  */
+      if (bb == loop->header)
+	continue;
+
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	if (!gimple_nop_p (gsi_stmt (gsi)))
+	  add_missing_phi_nodes_1 (loop, gsi_stmt (gsi), bbs);
+
+      for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	add_missing_phi_nodes_1 (loop, gsi_stmt (gsi), bbs);
+    }
+
+  free (bbs);
+}
+
+/* Removes all the back-edges of LOOP except its own back-edge.  SCRATCH_PAD
+   is used in if-conversion.  Returns TODO flags, 0 when nothing changed.  */
+
+static unsigned
+flatten_loop (loop_p loop, tree *scratch_pad)
+{
+  int i, n = loop->num_nodes;
+  basic_block *bbs;
+  VEC (edge, heap) *back_edges;
+  VEC (basic_block, heap) *loop_body;
+  edge_iterator ei;
+  edge e, pred_e;
+  unsigned max_nb_basic_blocks = PARAM_VALUE (PARAM_LFLAT_MAX_NB_BBS);
+
+  if (loop->num_nodes > max_nb_basic_blocks
+      || !single_exit (loop)
+      || !dbg_cnt (lflat))
+    return 0;
+
+  mark_dfs_back_edges ();
+  bbs = get_loop_body (loop);
+
+  back_edges = VEC_alloc (edge, heap, 3);
+  loop_body = VEC_alloc (basic_block, heap, 3);
+
+  for (i = 0; i < n; i++)
+    FOR_EACH_EDGE (e, ei, bbs[i]->succs)
+      if (e->flags & EDGE_DFS_BACK
+	  && e->src != loop->latch)
+	VEC_safe_push (edge, heap, back_edges, e);
+
+  free (bbs);
+
+  /* Early return and do not modify the code when there are no back
+     edges.  */
+  if (VEC_empty (edge, back_edges))
+    return 0;
+
+  cancel_subloops (loop);
+
+  /* Split the latch edge to make sure that the latch basic block does
+     not contain code.  */
+  loop->latch = split_edge (loop_latch_edge (loop));
+  pred_e = single_pred_edge (loop->latch);
+
+  FOR_EACH_VEC_ELT (edge, back_edges, i, e)
+    {
+      basic_block dest = split_edge (e);
+
+      /* Redirect BACK_EDGE to LOOP->latch.  */
+      redirect_edge_and_branch_force (e, loop->latch);
+
+      /* Save the basic block where it was pointing.  */
+      VEC_safe_push (basic_block, heap, loop_body, dest);
+    }
+
+  update_loop_phi_nodes (loop, pred_e);
+
+  FOR_EACH_VEC_ELT (edge, back_edges, i, e)
+    create_control_flag (e, loop, VEC_index (basic_block, loop_body, i));
+
+  FOR_EACH_VEC_ELT (edge, back_edges, i, e)
+    update_inner_induction_phi_nodes (e, loop, VEC_index (basic_block,
+							  loop_body, i));
+
+  free_dominance_info (CDI_DOMINATORS);
+  calculate_dominance_info (CDI_DOMINATORS);
+  add_missing_phi_nodes (loop);
+
+  /* If we redirected some back-edges, split the latch edge to create
+     an empty LOOP->latch.  */
+  if (!single_pred_p (loop->latch))
+    loop->latch = split_edge (loop_latch_edge (loop));
+
+  tree_if_conversion (loop, scratch_pad);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Flattened %u loops.\n", VEC_length (edge, back_edges));
+
+  VEC_free (edge, heap, back_edges);
+  VEC_free (basic_block, heap, loop_body);
+
+  return TODO_cleanup_cfg;
+}
+
+/* Flattens all the loops of the current function.  */
+
+static unsigned int
+tree_loop_flattening (void)
+{
+  unsigned todo = 0;
+  loop_p loop;
+  loop_iterator li;
+  tree scratch_pad = NULL_TREE;
+
+  if (number_of_loops () <= 1)
+    return 0;
+
+  FOR_EACH_LOOP (li, loop, 0)
+    todo |= flatten_loop (loop, &scratch_pad);
+
+#ifdef ENABLE_CHECKING
+  verify_dominators (CDI_DOMINATORS);
+  verify_flow_info ();
+#endif
+
+  return todo;
+}
+
+static bool
+gate_tree_loop_flattening (void)
+{
+  return flag_tree_loop_flattening != 0;
+}
+
+struct gimple_opt_pass pass_flatten_loops =
+{
+ {
+  GIMPLE_PASS,
+  "lflat",				/* name */
+  gate_tree_loop_flattening,		/* gate */
+  tree_loop_flattening,       		/* execute */
+  NULL,					/* sub */
+  NULL,					/* next */
+  0,					/* static_pass_number */
+  TV_TREE_LOOP_FLATTENING,  		/* tv_id */
+  PROP_cfg | PROP_ssa,			/* properties_required */
+  0,					/* properties_provided */
+  0,					/* properties_destroyed */
+  0,					/* todo_flags_start */
+  TODO_dump_func
+  | TODO_verify_ssa
+  | TODO_update_ssa
+  | TODO_ggc_collect			/* todo_flags_finish */
+ }
+};
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index a87a770..e2f257f 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -374,6 +374,7 @@ extern struct gimple_opt_pass pass_graphite;
 extern struct gimple_opt_pass pass_graphite_transforms;
 extern struct gimple_opt_pass pass_if_conversion;
 extern struct gimple_opt_pass pass_loop_distribution;
+extern struct gimple_opt_pass pass_flatten_loops;
 extern struct gimple_opt_pass pass_vectorize;
 extern struct gimple_opt_pass pass_slp_vectorize;
 extern struct gimple_opt_pass pass_complete_unroll;
-- 
1.7.0.4
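
For readers who want a source-level picture of the back-edge removal
described in the header comment of tree-loop-flattening.c, here is a
minimal sketch.  It is not what the pass emits: the pass works on the
Gimple-SSA CFG and inserts phi nodes, whereas this is hand-written C.
The names sum_nested, sum_flat and flat are hypothetical and chosen
only for illustration.  The boolean flat plays the role of the control
flag: it is cleared when the (former) inner back-edge is taken and set
on the other path reaching the latch, and the top of the single
remaining loop tests it to decide whether the outer header statements
are executed or control goes straight to the inner body.

```c
#include <stdbool.h>

/* Reference version: an ordinary two-deep loop nest.  */
int
sum_nested (int n, int m)
{
  int i, j, sum = 0;

  for (i = 0; i < n; i++)
    {
      sum += 1000;		/* Statement that belongs to the outer loop only.  */
      for (j = 0; j < m; j++)
	sum += i * j;
    }

  return sum;
}

/* Hand-flattened version: a single loop with one back-edge.  */
int
sum_flat (int n, int m)
{
  int i = 0, j = 0, sum = 0;
  bool flat = true;		/* True: run the outer header statements.  */

  if (n <= 0)
    return 0;

  for (;;)
    {
      if (flat)
	{
	  /* Statements of the original outer loop header.  */
	  sum += 1000;
	  j = 0;
	}

      if (j < m)
	{
	  /* Inner loop body; its back-edge now goes to the latch.  */
	  sum += i * j;
	  j++;
	  flat = false;
	}
      else
	{
	  /* Inner loop exit: advance the outer induction variable.  */
	  i++;
	  flat = true;
	  if (i >= n)
	    break;
	}
    }

  return sum;
}
```

Both functions are expected to return the same value for any n and m;
once the nest has this shape, the single remaining loop body is
straight-line code with conditions, which is the form if-conversion
works on.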


[-- Attachment #3: 0002-Add-functional-tests-for-loop-flattening.patch --]
[-- Type: text/x-patch, Size: 15127 bytes --]

From f7bd38c2245b8adf7b704208af990fea52ff66be Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 16 Nov 2010 14:28:36 -0600
Subject: [PATCH 02/12] Add functional tests for loop flattening.

---
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-10.c |   61 ++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-11.c |   34 +++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-12.c |   35 ++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-13.c |   55 +++++++++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-14.c |   55 +++++++++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c  |    8 +--
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c  |    6 +-
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-5.c  |   50 +++++++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-6.c  |   50 +++++++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-7.c  |   66 ++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-8.c  |   61 ++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-9.c  |   56 ++++++++++++++++++++++
 gcc/tree-loop-flattening.c                   |    3 +
 13 files changed, 531 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-10.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-11.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-12.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-13.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-14.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-5.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-6.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-7.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-8.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/flat-loop-9.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-10.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-10.c
new file mode 100644
index 0000000..56d14f9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-10.c
@@ -0,0 +1,61 @@
+/* From graphite/block-7.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define N 200
+
+int A[N][N], B[N][N], C[N][N];
+
+static void __attribute__((noinline))
+matmult (void)
+{
+  int i, j, k;
+
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      {
+        A[i][j] = 0;
+        for (k = 0; k < N; k++)
+          A[i][j] += B[i][k] * C[k][j];
+      }
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, j, res = 0;
+
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      {
+	B[i][j] = j;
+	C[i][j] = i;
+      }
+
+  matmult ();
+
+  for (i = 0; i < N; i++)
+    res += A[i][i];
+
+#if DEBUG
+  fprintf (stderr, "res = %d \n", res);
+#endif
+
+  if (res != 529340000)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 2 loops" 1 "lflat" } } */
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 1 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-11.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-11.c
new file mode 100644
index 0000000..68c5d49
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-11.c
@@ -0,0 +1,34 @@
+/* From graphite/run-id-1.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+void abort (void);
+
+void foo (int N)
+{
+  int i, j;
+  int x[1000][1000];
+
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      x[i][j] = i + j + 3;
+
+  /* This loop will not be flattened as the outermost loop has two
+     exit edges.  */
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      if (x[i][j] != i + j + 3)
+	abort ();
+}
+
+int main(void)
+{
+  foo (1000);
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 1 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-12.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-12.c
new file mode 100644
index 0000000..eb0fb3d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-12.c
@@ -0,0 +1,35 @@
+/* From graphite/run-id-4.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+extern void abort (void);
+
+__attribute__ ((noinline)) int
+foo (int x, int y, int *z)
+{
+  int a, b, c, d;
+
+  a = b = 0;
+  for (d = 0; d < y; d++)
+    {
+      if (z)
+	b = d * *z;
+      for (c = 0; c < x; c++)
+	a += b;
+    }
+
+  return a;
+}
+
+int
+main (void)
+{
+  if (foo (3, 2, 0) != 0)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 1 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-13.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-13.c
new file mode 100644
index 0000000..1d04dbe
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-13.c
@@ -0,0 +1,55 @@
+/* From graphite/run-id-5.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+#include <stdarg.h>
+
+extern void abort ();
+#define N 40
+
+int a[N];
+
+__attribute__ ((noinline)) int
+foo (int n){
+  int i,j;
+  int sum;
+
+  if (n<=0)
+    return 0;
+
+  for (i = 0; i < N; i++) {
+    sum = 0;
+    for (j = 0; j < n; j+=2) {
+      sum += j;
+    }
+    a[i] = sum + j;
+  }
+}
+
+int main (void)
+{
+  int i,j;
+  int sum;
+
+  for (i=0; i<N; i++)
+    a[i] = i;
+
+  foo (N);
+
+  /* This won't be flattened.  */
+  for (i=0; i<N; i++)
+    {
+      sum = 0;
+      for (j = 0; j < N; j+=2)
+        sum += j;
+      if (a[i] != sum + j)
+        abort();
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 1 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-14.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-14.c
new file mode 100644
index 0000000..c87b893
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-14.c
@@ -0,0 +1,55 @@
+/* From graphite/run-id-6.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+#include <stdarg.h>
+
+extern void abort ();
+#define N 40
+
+int a[N];
+
+__attribute__ ((noinline)) int
+foo (int n){
+  int i,j,k=0;
+  int sum;
+
+  if (n<=0)
+    return 0;
+
+  for (i = 0; i < N; i++) {
+    sum = 0;
+    for (j = 0; j < n; j+=2) {
+      sum += k++;
+    }
+    a[i] = sum + j;
+  }
+}
+
+int main (void)
+{
+  int i,j,k=0;
+  int sum;
+
+  for (i=0; i<N; i++)
+    a[i] = i;
+
+  foo (N);
+
+  /* This is not flattened.  */
+  for (i=0; i<N; i++)
+    {
+      sum = 0;
+      for (j = 0; j < N; j+=2)
+        sum += k++;
+      if (a[i] != sum + j)
+	abort();
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 1 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
index a7287fb..4573801 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-2.c
@@ -28,12 +28,8 @@ __morestack_release_segments (struct stack_segment **pp, int free_dynamic)
     {
       struct stack_segment *next;
       next = pss->next;
- {
-   if (free_dynamic)
-     {
-       ret = merge_dynamic_blocks (pss->dynamic_allocation, ret);
-     }
- }
+      if (free_dynamic)
+	ret = merge_dynamic_blocks (pss->dynamic_allocation, ret);
       pss = next;
     }
 }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
index d3d66ab..cf01273 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-3.c
@@ -11,9 +11,9 @@ split_directories (const char *name, int *ptr_num_dirs)
   int ch;
   while ((ch = *p++) != '\0')
     {
-   num_dirs++;
-   while (((*p) == '/'))
-     p++;
+      num_dirs++;
+      while (((*p) == '/'))
+	p++;
     }
   return (dirs[num_dirs - 1] == ((void *)0));
 }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-5.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-5.c
new file mode 100644
index 0000000..24704fb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-5.c
@@ -0,0 +1,50 @@
+/* From graphite/block-0.c.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define N 1000
+int a[N];
+
+static int __attribute__((noinline))
+foo (void)
+{
+  int j;
+  int i;
+
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      a[j] = a[i] + 1;
+
+  return a[0];
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, res;
+
+  for (i = 0; i < N; i++)
+    a[i] = i;
+
+  res = foo ();
+
+#if DEBUG
+  fprintf (stderr, "res = %d \n", res);
+#endif
+
+  if (res != 1999)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 1 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-6.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-6.c
new file mode 100644
index 0000000..8a5382f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-6.c
@@ -0,0 +1,50 @@
+/* From graphite/block-1.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define MAX 100
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, j;
+  int sum = 0;
+  int A[MAX * MAX];
+  int B[MAX * MAX];
+
+  for (i = 0; i < MAX; i++)
+    for (j = 0; j < MAX; j++)
+      {
+	A[i*MAX + j] = j;
+	B[i*MAX + j] = j;
+      }
+
+  for (i = 0; i < MAX; i++)
+    for (j = 0; j < MAX; j++)
+      A[i*MAX + j] += B[j*MAX + i];
+
+  for(i = 0; i < MAX; i++)
+    for(j = 0; j < MAX; j++)
+      sum += A[i*MAX + j];
+
+#if DEBUG
+  fprintf (stderr, "sum = %d \n", sum);
+#endif
+
+  if (sum != 990000)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 3 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-7.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-7.c
new file mode 100644
index 0000000..3252545
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-7.c
@@ -0,0 +1,66 @@
+/* From graphite/block-3.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-timeout-factor 4.0 } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define N 24
+#define M 100
+
+int A[M][M][M], B[M][M], C[M][M];
+
+static int __attribute__((noinline))
+foo (void)
+{
+  int i, j, k;
+
+  /* These loops contain too few iterations to be blocked by 64.  */
+  for (i = 0; i < 24; i++)
+    for (j = 0; j < 24; j++)
+      for (k = 0; k < 24; k++)
+        A[i][j][k] = B[i][k] * C[k][j];
+
+  /* These loops should still be loop blocked.  */
+  for (i = 0; i < M; i++)
+    for (j = 0; j < M; j++)
+      for (k = 0; k < M; k++)
+        A[i][j][k] = B[i][k] * C[k][j];
+
+  return A[0][0][0] + A[M-1][M-1][M-1];
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, j, res;
+
+  for (i = 0; i < M; i++)
+    for (j = 0; j < M; j++)
+      {
+	B[i][j] = i;
+	C[i][j] = j;
+      }
+
+  res = foo ();
+
+#if DEBUG
+  fprintf (stderr, "res = %d \n", res);
+#endif
+
+  if (res != 9801)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 2 loops" 2 "lflat" } } */
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 1 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-8.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-8.c
new file mode 100644
index 0000000..388e721
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-8.c
@@ -0,0 +1,61 @@
+/* From graphite/block-5.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define N 200
+
+int a[N][N];
+int b[N][N];
+
+static int __attribute__((noinline))
+foo (void)
+{
+  int i, j;
+  int res = 0;
+
+  /* This loop nest should be blocked.  */
+  for (j = 1; j < N; j++)
+    for (i = 0; i < N; i++)
+      a[i][j] = a[i][j-1] + b[i][j];
+
+  for (i = 0; i < N; i++)
+    res += a[i][i];
+
+  return res;
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, j, res;
+
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      {
+	a[i][j] = i + j;
+	b[i][j] = i - j;
+      }
+
+  res = foo ();
+
+#if DEBUG
+  fprintf (stderr, "res = %d \n", res);
+#endif
+
+  if (res != 1333300)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 2 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-9.c b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-9.c
new file mode 100644
index 0000000..06a2667
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/flat-loop-9.c
@@ -0,0 +1,56 @@
+/* From graphite/block-6.c.  */
+
+/* { dg-require-effective-target size32plus } */
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-loop-flatten -fdump-tree-lflat-all" } */
+
+#define DEBUG 0
+#if DEBUG
+#include <stdio.h>
+#endif
+
+#define N 200
+int a[N][N];
+
+static int __attribute__((noinline))
+foo (void)
+{
+  int i, j;
+  int res = 0;
+
+  /* Interchange is not legal for loops 0 and 1.  */
+  for (i = 1; i < N; i++)
+    for (j = 1; j < N - 1; j++)
+      a[i][j] = a[i-1][j+1] * a[i-1][j+1] / 2;
+
+  for (i = 0; i < N; i++)
+    res += a[i][i];
+
+  return res;
+}
+
+extern void abort ();
+
+int
+main (void)
+{
+  int i, j, res;
+
+  for (i = 0; i < N; i++)
+    for (j = 0; j < N; j++)
+      a[i][j] = i + j;
+
+  res = foo ();
+
+#if DEBUG
+  fprintf (stderr, "res = %d \n", res);
+#endif
+
+  if (res != 204007516)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Flattened 1 loops" 2 "lflat" } } */
+/* { dg-final { cleanup-tree-dump "lflat" } } */
diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
index 56211b4..1f887a2 100644
--- a/gcc/tree-loop-flattening.c
+++ b/gcc/tree-loop-flattening.c
@@ -571,6 +571,9 @@ flatten_loop (loop_p loop, tree *scratch_pad)
   if (gate_tree_if_conversion ())
     tree_if_conversion (loop, scratch_pad);
 
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Flattened %d loops.\n", VEC_length (edge, back_edges));
+
   return TODO_update_ssa | TODO_verify_ssa;
 }
 
-- 
1.7.0.4


[-- Attachment #4: 0003-Unconditionally-call-tree_if_conversion.patch --]
[-- Type: text/x-patch, Size: 1722 bytes --]

From af6e13e63236bf7a1c3cb4b8cee7839dae758298 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Mon, 15 Nov 2010 16:59:44 -0600
Subject: [PATCH 03/12] Unconditionally call tree_if_conversion.

---
 gcc/tree-flow.h            |    1 -
 gcc/tree-if-conv.c         |    2 +-
 gcc/tree-loop-flattening.c |    3 +--
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-flow.h b/gcc/tree-flow.h
index e1ee69f..a4bf6f8 100644
--- a/gcc/tree-flow.h
+++ b/gcc/tree-flow.h
@@ -731,7 +731,6 @@ bool stmt_dominates_stmt_p (gimple, gimple);
 void mark_virtual_ops_for_renaming (gimple);
 
 /* In tree-if-conv.c */
-bool gate_tree_if_conversion (void);
 bool tree_if_conversion (struct loop *, tree *);
 
 /* In tree-ssa-dce.c */
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index eaef273..794be57 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -1661,7 +1661,7 @@ main_tree_if_conversion (void)
 
 /* Returns true when the if-conversion pass is enabled.  */
 
-bool
+static bool
 gate_tree_if_conversion (void)
 {
   return ((flag_tree_vectorize && flag_tree_loop_if_convert != 0)
diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
index 1f887a2..1d22dc6 100644
--- a/gcc/tree-loop-flattening.c
+++ b/gcc/tree-loop-flattening.c
@@ -568,8 +568,7 @@ flatten_loop (loop_p loop, tree *scratch_pad)
   if (!single_pred_p (loop->latch))
     loop->latch = split_edge (loop_latch_edge (loop));
 
-  if (gate_tree_if_conversion ())
-    tree_if_conversion (loop, scratch_pad);
+  tree_if_conversion (loop, scratch_pad);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "Flattened %d loops.\n", VEC_length (edge, back_edges));
-- 
1.7.0.4


[-- Attachment #5: 0004-Use-simplified-version-of-cancel_subloops.patch --]
[-- Type: text/x-patch, Size: 2310 bytes --]

From be8156967aaee764448fe528b64604df39c4c2de Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 16 Nov 2010 13:24:07 -0600
Subject: [PATCH 04/12] Use simplified version of cancel_subloops.

---
 gcc/cfgloop.c              |   10 ++++++++++
 gcc/cfgloop.h              |    1 +
 gcc/tree-loop-flattening.c |   19 -------------------
 3 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
index 08d689d..bfab67b 100644
--- a/gcc/cfgloop.c
+++ b/gcc/cfgloop.c
@@ -1297,6 +1297,16 @@ cancel_loop_tree (struct loop *loop)
   cancel_loop (loop);
 }
 
+/* Keep the loop structure for LOOP and remove all the loop structures
+   under LOOP.  */
+
+void
+cancel_subloops (loop_p loop)
+{
+  while (loop->inner)
+    cancel_loop_tree (loop->inner);
+}
+
 /* Checks that information about loops is correct
      -- sizes of loops are all right
      -- results of get_loop_body really belong to the loop
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index bf2614e..1679019 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -256,6 +256,7 @@ extern void add_bb_to_loop (basic_block, struct loop *);
 extern void remove_bb_from_loops (basic_block);
 
 extern void cancel_loop_tree (struct loop *);
+extern void cancel_subloops (struct loop *);
 extern void delete_loop (struct loop *);
 
 enum
diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
index 1d22dc6..b36c563 100644
--- a/gcc/tree-loop-flattening.c
+++ b/gcc/tree-loop-flattening.c
@@ -87,25 +87,6 @@ along with GCC; see the file COPYING3.  If not see
    available from
    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.5033 */
 
-/* Keep the loop structure for LOOP and remove all the loop structures
-   under LOOP.  */
-
-static void
-cancel_subloops (loop_p loop)
-{
-  int i;
-  loop_p li;
-  VEC (loop_p, heap) *lv = VEC_alloc (loop_p, heap, 3);
-
-  for (li = loop->inner; li; li = li->next)
-    VEC_safe_push (loop_p, heap, lv, li);
-
-  FOR_EACH_VEC_ELT (loop_p, lv, i, li)
-    cancel_loop_tree (li);
-
-  VEC_free (loop_p, heap, lv);
-}
-
 /* Before creating other phi nodes in LOOP->header for the control
    flags, update the phi nodes of LOOP->header and add the necessary
    phi nodes in the LOOP->latch that now contains several paths on
-- 
1.7.0.4


[-- Attachment #6: 0005-Do-not-call-create_new_def_for.patch --]
[-- Type: text/x-patch, Size: 3314 bytes --]

From 831866b109d358441069aa8fef34ff2a43f83259 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 16 Nov 2010 12:28:04 -0600
Subject: [PATCH 05/12] Do not call create_new_def_for

---
 gcc/tree-loop-flattening.c |   18 ------------------
 1 files changed, 0 insertions(+), 18 deletions(-)

diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
index b36c563..d28117e 100644
--- a/gcc/tree-loop-flattening.c
+++ b/gcc/tree-loop-flattening.c
@@ -109,8 +109,6 @@ update_loop_phi_nodes (loop_p loop, edge pred_e)
       tree var = SSA_NAME_VAR (res);
 
       phi = create_phi_node (var, loop->latch);
-      create_new_def_for (gimple_phi_result (phi), phi,
-			  gimple_phi_result_ptr (phi));
 
       FOR_EACH_EDGE (e, ei, loop->latch->preds)
 	add_phi_arg (phi, (e == pred_e ? back_arg : res),
@@ -146,8 +144,6 @@ create_control_flag (edge forwarded_edge, loop_p loop, basic_block inner_body)
   /* Adds a control variable for the redirected FORWARDED_EDGE.  */
   add_referenced_var (var);
   phi = create_phi_node (var, forwarded_edge->dest);
-  create_new_def_for (gimple_phi_result (phi), phi,
-		      gimple_phi_result_ptr (phi));
 
   FOR_EACH_EDGE (e, ei, outer_latch_e->src->preds)
     add_phi_arg (phi, (e == forwarded_edge
@@ -158,8 +154,6 @@ create_control_flag (edge forwarded_edge, loop_p loop, basic_block inner_body)
 
   /* Add a phi node in LOOP->header for the control variable.  */
   phi = create_phi_node (var, loop->header);
-  create_new_def_for (gimple_phi_result (phi), phi,
-		      gimple_phi_result_ptr (phi));
 
   preheader = loop_preheader_edge (loop);
   FOR_EACH_EDGE (e, ei, loop->header->preds)
@@ -198,11 +192,6 @@ add_header_and_latch_phis (loop_p loop, tree name, edge forwarded_edge,
   gimple loop_phi = create_phi_node (var, loop->header);
   gimple latch_phi = create_phi_node (var, loop->latch);
 
-  create_new_def_for (gimple_phi_result (loop_phi), loop_phi,
-		      gimple_phi_result_ptr (loop_phi));
-  create_new_def_for (gimple_phi_result (latch_phi), latch_phi,
-		      gimple_phi_result_ptr (latch_phi));
-
   /* The value set to ZERO will never be used in the loop, however we
      have to construct something meaningful for virtual SSA_NAMEs.  */
   if (TREE_CODE (arg) != SSA_NAME)
@@ -343,8 +332,6 @@ add_missing_phi_nodes_2 (loop_p loop, tree name, tree old_name,
 	    tree var = SSA_NAME_VAR (name);
 
 	    phi = create_phi_node (var, e->dest);
-	    create_new_def_for (gimple_phi_result (phi), phi,
-				gimple_phi_result_ptr (phi));
 	    VEC_replace (gimple, phis, e->dest->index, phi);
 	    add_phi_arg (phi, name, e, UNKNOWN_LOCATION);
 	    rename_dominated_uses (loop, old_name, gimple_phi_result (phi),
@@ -396,11 +383,6 @@ add_missing_phi_nodes_1 (loop_p loop, gimple def_stmt)
       loop_phi = create_phi_node (var, loop->header);
       latch_phi = create_phi_node (var, loop->latch);
 
-      create_new_def_for (gimple_phi_result (loop_phi), loop_phi,
-			  gimple_phi_result_ptr (loop_phi));
-      create_new_def_for (gimple_phi_result (latch_phi), latch_phi,
-			  gimple_phi_result_ptr (latch_phi));
-
       /* The value set to ZERO will never be used in the loop, however we
 	 have to construct something meaningful for virtual SSA_NAMEs.  */
       if (is_gimple_reg (name))
-- 
1.7.0.4


[-- Attachment #7: 0006-Use-create_tmp_reg.patch --]
[-- Type: text/x-patch, Size: 844 bytes --]

From 7fa8ed733959fb40fe4105800e752b7929b116cf Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 16 Nov 2010 14:36:08 -0600
Subject: [PATCH 06/12] Use create_tmp_reg.

---
 gcc/tree-loop-flattening.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
index d28117e..8750d80 100644
--- a/gcc/tree-loop-flattening.c
+++ b/gcc/tree-loop-flattening.c
@@ -135,7 +135,7 @@ create_control_flag (edge forwarded_edge, loop_p loop, basic_block inner_body)
   edge e, preheader;
   edge outer_latch_e = loop_latch_edge (loop);
   const char *name = "_flat_";
-  tree var = create_tmp_var (boolean_type_node, name);
+  tree var = create_tmp_reg (boolean_type_node, name);
   tree res;
   gimple phi, cond_stmt;
   gimple_stmt_iterator gsi;
-- 
1.7.0.4


[-- Attachment #8: 0007-Use-build_zero_cst.patch --]
[-- Type: text/x-patch, Size: 1249 bytes --]

From d1dff67ed0880e457df4df1a30711e59f820880d Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 16 Nov 2010 14:40:12 -0600
Subject: [PATCH 07/12] Use build_zero_cst.

---
 gcc/tree-loop-flattening.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
index 8750d80..7d61e79 100644
--- a/gcc/tree-loop-flattening.c
+++ b/gcc/tree-loop-flattening.c
@@ -197,7 +197,7 @@ add_header_and_latch_phis (loop_p loop, tree name, edge forwarded_edge,
   if (TREE_CODE (arg) != SSA_NAME)
     zero = arg;
   else if (is_gimple_reg (arg))
-    zero = fold_convert (TREE_TYPE (arg), integer_zero_node);
+    zero = build_zero_cst (TREE_TYPE (arg));
   else
     zero = gimple_default_def (cfun, SSA_NAME_VAR (arg));
 
@@ -386,7 +386,7 @@ add_missing_phi_nodes_1 (loop_p loop, gimple def_stmt)
       /* The value set to ZERO will never be used in the loop, however we
 	 have to construct something meaningful for virtual SSA_NAMEs.  */
       if (is_gimple_reg (name))
-	zero = fold_convert (TREE_TYPE (name), integer_zero_node);
+	zero = build_zero_cst (TREE_TYPE (name));
       else
 	zero = gimple_default_def (cfun, SSA_NAME_VAR (name));
 
-- 
1.7.0.4


[-- Attachment #9: 0008-Correct-the-use-of-the-default-virtual-operands.patch --]
[-- Type: text/x-patch, Size: 1327 bytes --]

From 9ad088cb779576462c83b5dbed0449b138edc15d Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 16 Nov 2010 14:43:32 -0600
Subject: [PATCH 08/12] Correct the use of the default virtual operands.

---
 gcc/tree-loop-flattening.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
index 7d61e79..5708a87 100644
--- a/gcc/tree-loop-flattening.c
+++ b/gcc/tree-loop-flattening.c
@@ -199,7 +199,10 @@ add_header_and_latch_phis (loop_p loop, tree name, edge forwarded_edge,
   else if (is_gimple_reg (arg))
     zero = build_zero_cst (TREE_TYPE (arg));
   else
-    zero = gimple_default_def (cfun, SSA_NAME_VAR (arg));
+    {
+      zero = gimple_vop (cfun);
+      mark_sym_for_renaming (gimple_vop (cfun));
+    }
 
   res = gimple_phi_result (latch_phi);
   FOR_EACH_EDGE (e, ei, loop->header->preds)
@@ -388,7 +391,10 @@ add_missing_phi_nodes_1 (loop_p loop, gimple def_stmt)
       if (is_gimple_reg (name))
 	zero = build_zero_cst (TREE_TYPE (name));
       else
-	zero = gimple_default_def (cfun, SSA_NAME_VAR (name));
+	{
+	  zero = gimple_vop (cfun);
+	  mark_sym_for_renaming (gimple_vop (cfun));
+	}
 
       res = gimple_phi_result (latch_phi);
       FOR_EACH_EDGE (e, ei, loop->header->preds)
-- 
1.7.0.4


[-- Attachment #10: 0009-Use-SET_USE-instead-of-replace_exp.patch --]
[-- Type: text/x-patch, Size: 2547 bytes --]

From 4f9defed6dab1367dd3acdf29699e8fe0801bdf3 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 16 Nov 2010 15:14:41 -0600
Subject: [PATCH 09/12] Use SET_USE instead of replace_exp.

---
 gcc/tree-loop-flattening.c |   24 +++++-------------------
 1 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
index 5708a87..c3cf9a2 100644
--- a/gcc/tree-loop-flattening.c
+++ b/gcc/tree-loop-flattening.c
@@ -252,13 +252,11 @@ update_inner_induction_phi_nodes (edge forwarded_edge, loop_p loop,
    that is dominated by DEF_BB.  */
 
 static void
-rename_dominated_uses (loop_p loop, tree old_name, tree new_name,
-		       basic_block def_bb)
+rename_dominated_uses (tree old_name, tree new_name, basic_block def_bb)
 {
   imm_use_iterator uit;
   gimple stmt;
   use_operand_p use_p;
-  ssa_op_iter op_iter;
 
   FOR_EACH_IMM_USE_STMT (stmt, uit, old_name)
     {
@@ -273,27 +271,15 @@ rename_dominated_uses (loop_p loop, tree old_name, tree new_name,
 	    if (PHI_ARG_DEF_FROM_EDGE (stmt, e) == old_name
 		&& dominated_by_p (CDI_DOMINATORS, e->src, def_bb)
 		&& use_bb != def_bb)
-	      replace_exp (gimple_phi_arg_imm_use_ptr (stmt, e->dest_idx),
-			   new_name);
+	      SET_USE (gimple_phi_arg_imm_use_ptr (stmt, e->dest_idx), new_name);
 	}
       else
 	{
 	  if (!dominated_by_p (CDI_DOMINATORS, use_bb, def_bb))
 	    continue;
 
-	  if (use_bb->loop_father == loop)
-	    {
-	      FOR_EACH_SSA_USE_OPERAND (use_p, stmt, op_iter, SSA_OP_ALL_USES)
-		if (USE_FROM_PTR (use_p) == old_name)
-		  replace_exp (use_p, new_name);
-	    }
-	  else
-	    /* Virtual operands are not translated into loop closed
-	       SSA form, and thus they may occur in the rest of
-	       the program without a loop close vphi node.  */
-	    FOR_EACH_SSA_USE_OPERAND (use_p, stmt, op_iter, SSA_OP_ALL_USES)
-	      if (USE_FROM_PTR (use_p) == old_name)
-		replace_exp (use_p, new_name);
+	  FOR_EACH_IMM_USE_ON_STMT (use_p, uit)
+	    SET_USE (use_p, new_name);
 	}
     }
 }
@@ -337,7 +323,7 @@ add_missing_phi_nodes_2 (loop_p loop, tree name, tree old_name,
 	    phi = create_phi_node (var, e->dest);
 	    VEC_replace (gimple, phis, e->dest->index, phi);
 	    add_phi_arg (phi, name, e, UNKNOWN_LOCATION);
-	    rename_dominated_uses (loop, old_name, gimple_phi_result (phi),
+	    rename_dominated_uses (old_name, gimple_phi_result (phi),
 				   e->dest);
 	    add_missing_phi_nodes_2 (loop, gimple_phi_result (phi), old_name,
 				     phis);
-- 
1.7.0.4


[-- Attachment #11: 0010-Iterate-over-the-loop-body.patch --]
[-- Type: text/x-patch, Size: 2852 bytes --]

From 25475ef2e20eb242ea1f523d6276add4447a5aa4 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 16 Nov 2010 15:32:30 -0600
Subject: [PATCH 10/12] Iterate over the loop body.

---
 gcc/tree-loop-flattening.c |   23 +++++++++++------------
 1 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
index c3cf9a2..9ddc52a 100644
--- a/gcc/tree-loop-flattening.c
+++ b/gcc/tree-loop-flattening.c
@@ -291,20 +291,19 @@ rename_dominated_uses (tree old_name, tree new_name, basic_block def_bb)
 
 static void
 add_missing_phi_nodes_2 (loop_p loop, tree name, tree old_name,
-			 VEC (gimple, heap) *phis)
+			 VEC (gimple, heap) *phis, basic_block *bbs)
 {
-  unsigned i;
-  basic_block bb, dom_bb = gimple_bb (SSA_NAME_DEF_STMT (name));
-  VEC (basic_block, heap) *dom_bbs = get_all_dominated_blocks (CDI_DOMINATORS,
-							       dom_bb);
+  basic_block dom_bb = gimple_bb (SSA_NAME_DEF_STMT (name));
+  int i, n = loop->num_nodes;
 
-  FOR_EACH_VEC_ELT (basic_block, dom_bbs, i, bb)
+  for (i = 0; i < n; i++)
     {
       edge e;
       edge_iterator ei;
+      basic_block bb = bbs[i];
 
       if (bb == loop->latch
-	  || bb->loop_father != loop)
+	  || !dominated_by_p (CDI_DOMINATORS, bb, dom_bb))
 	continue;
 
       FOR_EACH_EDGE (e, ei, bb->succs)
@@ -326,7 +325,7 @@ add_missing_phi_nodes_2 (loop_p loop, tree name, tree old_name,
 	    rename_dominated_uses (old_name, gimple_phi_result (phi),
 				   e->dest);
 	    add_missing_phi_nodes_2 (loop, gimple_phi_result (phi), old_name,
-				     phis);
+				     phis, bbs);
 	  }
 	}
     }
@@ -336,7 +335,7 @@ add_missing_phi_nodes_2 (loop_p loop, tree name, tree old_name,
    of DEF_STMT add the missing phi nodes in LOOP.  */
 
 static void
-add_missing_phi_nodes_1 (loop_p loop, gimple def_stmt)
+add_missing_phi_nodes_1 (loop_p loop, gimple def_stmt, basic_block *bbs)
 {
   def_operand_p def_p;
   ssa_op_iter op_iter;
@@ -396,7 +395,7 @@ add_missing_phi_nodes_1 (loop_p loop, gimple def_stmt)
 
       VEC_replace (gimple, phis, loop->latch->index, latch_phi);
       VEC_replace (gimple, phis, loop->header->index, loop_phi);
-      add_missing_phi_nodes_2 (loop, name, name, phis);
+      add_missing_phi_nodes_2 (loop, name, name, phis, bbs);
 
       FOR_EACH_VEC_ELT (gimple, phis, i, phi)
 	{
@@ -440,10 +439,10 @@ add_missing_phi_nodes (loop_p loop)
 
       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 	if (!gimple_nop_p (gsi_stmt (gsi)))
-	  add_missing_phi_nodes_1 (loop, gsi_stmt (gsi));
+	  add_missing_phi_nodes_1 (loop, gsi_stmt (gsi), bbs);
 
       for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-	add_missing_phi_nodes_1 (loop, gsi_stmt (gsi));
+	add_missing_phi_nodes_1 (loop, gsi_stmt (gsi), bbs);
     }
 
   free (bbs);
-- 
1.7.0.4


[-- Attachment #12: 0011-Fix-memory-leak.patch --]
[-- Type: text/x-patch, Size: 793 bytes --]

From 64c0173d8cf8b433d46941e95fe98979afeba773 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 16 Nov 2010 15:37:46 -0600
Subject: [PATCH 11/12] Fix memory leak.

---
 gcc/tree-loop-flattening.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
index 9ddc52a..3613dc9 100644
--- a/gcc/tree-loop-flattening.c
+++ b/gcc/tree-loop-flattening.c
@@ -527,6 +527,9 @@ flatten_loop (loop_p loop, tree *scratch_pad)
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "Flattened %d loops.\n", VEC_length (edge, back_edges));
 
+  VEC_free (edge, heap, back_edges);
+  VEC_free (basic_block, heap, loop_body);
+
   return TODO_update_ssa | TODO_verify_ssa;
 }
 
-- 
1.7.0.4


[-- Attachment #13: 0012-Fix-pass-TODOs.patch --]
[-- Type: text/x-patch, Size: 1199 bytes --]

From 0e4f1d41f599fbbb5d338945f1b30c6ad8c37618 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 16 Nov 2010 15:40:28 -0600
Subject: [PATCH 12/12] Fix pass TODOs.

---
 gcc/tree-loop-flattening.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
index 3613dc9..cd49be1 100644
--- a/gcc/tree-loop-flattening.c
+++ b/gcc/tree-loop-flattening.c
@@ -530,7 +530,7 @@ flatten_loop (loop_p loop, tree *scratch_pad)
   VEC_free (edge, heap, back_edges);
   VEC_free (basic_block, heap, loop_body);
 
-  return TODO_update_ssa | TODO_verify_ssa;
+  return TODO_cleanup_cfg;
 }
 
 /* Flattens all the loops of the current function.  */
@@ -554,7 +554,6 @@ tree_loop_flattening (void)
   verify_flow_info ();
 #endif
 
-  cleanup_tree_cfg ();
   return todo;
 }
 
@@ -580,7 +579,8 @@ struct gimple_opt_pass pass_flatten_loops =
   0,					/* properties_destroyed */
   0,					/* todo_flags_start */
   TODO_dump_func
-    | TODO_update_ssa
-    | TODO_ggc_collect			/* todo_flags_finish */
+  | TODO_verify_ssa
+  | TODO_update_ssa
+  | TODO_ggc_collect			/* todo_flags_finish */
  }
 };
-- 
1.7.0.4



* Re: [PATCH 3/3] Loop flattening on loop-SSA.
  2010-11-16 22:47       ` Sebastian Pop
@ 2010-11-16 23:56         ` Sebastian Pop
  0 siblings, 0 replies; 41+ messages in thread
From: Sebastian Pop @ 2010-11-16 23:56 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 247 bytes --]

Hi,

I ran valgrind on cc1 and found several remaining memory leaks in
the code exercised by loop flattening.  Attached is a patch on top
of the previous changes.  I will test this on amd64-linux as well.
Ok for trunk?

Thanks,
Sebastian
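
The kind of leak addressed here can be sketched in plain C.  This is
only an illustration under assumptions: it uses malloc/free instead of
GCC's VEC API, and struct toy_edge and count_back_edges are
hypothetical names, not code from tree-loop-flattening.c.  The point
is that a scratch array allocated before an early return has to be
released on that path too.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a CFG edge; not GCC's edge type.  */
struct toy_edge
{
  int src;
  int dest;
};

/* Count the edges in EDGES[0..N-1] that jump backwards (dest <= src),
   collecting them in a heap-allocated scratch array first.  Every exit
   path releases the scratch array, including the early return taken
   when no back edge is found.  Returns -1 on allocation failure.  */

int
count_back_edges (const struct toy_edge *edges, int n)
{
  int i, count = 0;
  struct toy_edge *back_edges;

  if (n <= 0)
    return 0;

  back_edges = malloc ((size_t) n * sizeof *back_edges);
  if (!back_edges)
    return -1;

  for (i = 0; i < n; i++)
    if (edges[i].dest <= edges[i].src)
      back_edges[count++] = edges[i];

  /* Early return: without this free the scratch array would leak.  */
  if (count == 0)
    {
      free (back_edges);
      return 0;
    }

  /* A real pass would process the collected edges here.  */

  free (back_edges);
  return count;
}
```

Running a caller of such a function under valgrind with and without
the free on the early-return path shows the difference between a clean
run and a reported leak.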

[-- Attachment #2: 0001-Fix-memory-leaks.patch --]
[-- Type: text/x-patch, Size: 1909 bytes --]

From ca452b744393b1cc78a123e6a4c27498b60429bf Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 16 Nov 2010 17:01:14 -0600
Subject: [PATCH] Fix memory leaks.

---
 gcc/cfgloop.c              |    1 +
 gcc/tree-loop-flattening.c |   10 ++++++----
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
index bfab67b..109dc72 100644
--- a/gcc/cfgloop.c
+++ b/gcc/cfgloop.c
@@ -1285,6 +1285,7 @@ cancel_loop (struct loop *loop)
   for (i = 0; i < loop->num_nodes; i++)
     bbs[i]->loop_father = outer;
 
+  free (bbs);
   delete_loop (loop);
 }
 
diff --git a/gcc/tree-loop-flattening.c b/gcc/tree-loop-flattening.c
index cd49be1..e94ff2a 100644
--- a/gcc/tree-loop-flattening.c
+++ b/gcc/tree-loop-flattening.c
@@ -469,9 +469,7 @@ flatten_loop (loop_p loop, tree *scratch_pad)
 
   mark_dfs_back_edges ();
   bbs = get_loop_body (loop);
-
   back_edges = VEC_alloc (edge, heap, 3);
-  loop_body = VEC_alloc (basic_block, heap, 3);
 
   for (i = 0; i < n; i++)
     FOR_EACH_EDGE (e, ei, bbs[i]->succs)
@@ -484,9 +482,13 @@ flatten_loop (loop_p loop, tree *scratch_pad)
   /* Early return and do not modify the code when there are no back
      edges.  */
   if (VEC_empty (edge, back_edges))
-    return 0;
+    {
+      VEC_free (edge, heap, back_edges);
+      return 0;
+    }
 
   cancel_subloops (loop);
+  loop_body = VEC_alloc (basic_block, heap, VEC_length (edge, back_edges));
 
   /* Split the latch edge to make sure that the latch basic block does
      not contain code.  */
@@ -501,7 +503,7 @@ flatten_loop (loop_p loop, tree *scratch_pad)
       redirect_edge_and_branch_force (e, loop->latch);
 
       /* Save the basic block where it was pointing.  */
-      VEC_safe_push (basic_block, heap, loop_body, dest);
+      VEC_quick_push (basic_block, loop_body, dest);
     }
 
   update_loop_phi_nodes (loop, pred_e);
-- 
1.7.0.4



* Re: [PATCH 1/3] Fix PR46029: reimplement if-convert stores.
  2010-11-16 15:01           ` Richard Guenther
@ 2011-01-03 21:39             ` Sebastian Pop
  2011-01-03 21:52               ` Richard Guenther
  0 siblings, 1 reply; 41+ messages in thread
From: Sebastian Pop @ 2011-01-03 21:39 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches

Hi Richi,

On Tue, Nov 16, 2010 at 08:09, Richard Guenther <rguenther@suse.de> wrote:
> On Tue, 16 Nov 2010, Richard Guenther wrote:
>
>> On Mon, 15 Nov 2010, Sebastian Pop wrote:
>>
>> > Hi Richi,
>> >
>> > fixes to your review are posted separately, see below for the details.
>> > See 0001-Fix-PR46029-reimplement-if-convert-stores.patch for the
>> > combined patch.
>>
>> +  tree base = create_tmp_var (array_type, "scratch_pad");
>> +  tree a0 = build4 (ARRAY_REF, elt_type, base, integer_zero_node,
>> NULL_TREE,
>> +                   NULL_TREE);
>>
>> you can drop creating the ARRAY_REF and do
>>
>> +  return insert_address_of (base, build_pointer_type (elt_type), &gsi);
>>
>>
>> The patches are ok with the above change.
>
> Btw, in insert_address_of you might want to add
>
>  struct ptr_info_def *pi = get_ptr_info (address_of_ai);
>  pt_solution_set_var (&pi->pt, SSA_NAME_VAR (address_of_ai));
>

In four tests of the SPEC benchmarks I am getting errors of this kind:

results.f:19:0: error: address taken, but ADDRESSABLE bit not set
scratch_pad.1438
f951: note: in statement
_ifc_.1454_733 = [cond_expr] D.2525_407 != 0 ? _ifc_.1453_797 :
&scratch_pad.1438;

results.f:19:0: error: address taken, but ADDRESSABLE bit not set
vkl
f951: note: in statement
_ifc_.1450_7837 = [cond_expr] prephitmp.1376_7987 > 1 ? &vkl[11] : &scratch_pad.1438;

If I watch the addressable field of the vkl or scratch_pad decls,
I see that TREE_ADDRESSABLE is cleared in maybe_optimize_var:

  if (TREE_ADDRESSABLE (var)
      /* Do not change TREE_ADDRESSABLE if we need to preserve var as
	 a non-register.  Otherwise we are confused and forget to
	 add virtual operands for it.  */
      && (!is_gimple_reg_type (TREE_TYPE (var))
	  || !bitmap_bit_p (not_reg_needs, DECL_UID (var))))
    {
      TREE_ADDRESSABLE (var) = 0;

This is in pass_fold_builtins, which is executed after the loop optimizer
is done.  If I disable this optimization, the above ICE disappears.
Any idea how to fix this issue?

Thanks,
Sebastian

PS: With the previous code, which built an alloca call to allocate the
scratch_pad, I do not see the ICE on scratch_pad.1438, but I still see the
other error on the vkl array.

* Re: [PATCH 1/3] Fix PR46029: reimplement if-convert stores.
  2011-01-03 21:39             ` Sebastian Pop
@ 2011-01-03 21:52               ` Richard Guenther
  2011-01-03 23:19                 ` Sebastian Pop
  0 siblings, 1 reply; 41+ messages in thread
From: Richard Guenther @ 2011-01-03 21:52 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Guenther, gcc-patches

On Mon, Jan 3, 2011 at 9:26 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> Hi Richi,
>
> On Tue, Nov 16, 2010 at 08:09, Richard Guenther <rguenther@suse.de> wrote:
>> On Tue, 16 Nov 2010, Richard Guenther wrote:
>>
>>> On Mon, 15 Nov 2010, Sebastian Pop wrote:
>>>
>>> > Hi Richi,
>>> >
>>> > fixes to your review are posted separately, see below for the details.
>>> > See 0001-Fix-PR46029-reimplement-if-convert-stores.patch for the
>>> > combined patch.
>>>
>>> +  tree base = create_tmp_var (array_type, "scratch_pad");
>>> +  tree a0 = build4 (ARRAY_REF, elt_type, base, integer_zero_node,
>>> NULL_TREE,
>>> +                   NULL_TREE);
>>>
>>> you can drop creating the ARRAY_REF and do
>>>
>>> +  return insert_address_of (base, build_pointer_type (elt_type), &gsi);
>>>
>>>
>>> The patches are ok with the above change.
>>
>> Btw, in insert_address_of you might want to add
>>
>>  struct ptr_info_def *pi = get_ptr_info (address_of_ai);
>>  pt_solution_set_var (&pi->pt, SSA_NAME_VAR (address_of_ai));
>>
>
> In four tests of the SPEC benchmarks I am getting this kind of errors:
>
> results.f:19:0: error: address taken, but ADDRESSABLE bit not set
> scratch_pad.1438
> f951: note: in statement
> _ifc_.1454_733 = [cond_expr] D.2525_407 != 0 ? _ifc_.1453_797 :
> &scratch_pad.1438;
>
> results.f:19:0: error: address taken, but ADDRESSABLE bit not set
> vkl
> f951: note: in statement
> _ifc_.1450_7837 = [cond_expr] prephitmp.1376_7987 > 1 ? &vkl[11] :
> &scratch_pad.1438;
>
> If I'm watching the addressable field of the vkl or scratch_pad decls,
> I see that TREE_ADDRESSABLE is cleared in maybe_optimize_var
>
>  if (TREE_ADDRESSABLE (var)
>      /* Do not change TREE_ADDRESSABLE if we need to preserve var as
>         a non-register.  Otherwise we are confused and forget to
>         add virtual operands for it.  */
>      && (!is_gimple_reg_type (TREE_TYPE (var))
>          || !bitmap_bit_p (not_reg_needs, DECL_UID (var))))
>    {
>      TREE_ADDRESSABLE (var) = 0;
>
> this is in pass_fold_builtins executed after the loop optimizer is done.
> If I'm disabling this optimization, the above ICE disappears.
> Any idea how to fix this issue?

walk_stmt_load_store_addr_ops needs to handle COND_EXPRs in
the gimple_assign_single_p () visit_addr case.

Of course COND_EXPRs should be moved to non-single RHS at some point.

Richard.

> Thanks,
> Sebastian
>
> PS: With the previous code building an alloca call to allocate the scratch_pad,
> I do not see the ICE on the scratch_pad.1438 but I still see the other error
> on the vkl array.
>

* Re: [PATCH 1/3] Fix PR46029: reimplement if-convert stores.
  2011-01-03 21:52               ` Richard Guenther
@ 2011-01-03 23:19                 ` Sebastian Pop
  0 siblings, 0 replies; 41+ messages in thread
From: Sebastian Pop @ 2011-01-03 23:19 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Richard Guenther, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 437 bytes --]

On Mon, Jan 3, 2011 at 15:39, Richard Guenther
<richard.guenther@gmail.com> wrote:
> walk_stmt_load_store_addr_ops needs to handle COND_EXPRs in
> the gimple_assign_single_p () visit_addr case.
>
> Of course COND_EXPRs should be moved to non-single RHS at some point.

The attached patch fixes the problems I was seeing.
I will retest the fix for PR46029 with this patch on top and repost it.

Thanks for your help,
Sebastian
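
For readers following the thread, here is a toy model of the failure in C.
It is not GCC code: every toy_* type and function below is a hypothetical
stand-in.  It assumes, as the discussion above suggests, that the
addressable bit of a variable is recomputed from the addresses the
statement walker reports, so an address the walker never visits looks as
if it were never taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for trees; none of these names exist in GCC.  */
enum toy_code { TOY_VAR, TOY_ADDR, TOY_COND };

struct toy_expr
{
  enum toy_code code;
  struct toy_expr *op[3];  /* TOY_ADDR uses op[0]; TOY_COND uses op[0..2].  */
  bool addressable;        /* Meaningful for TOY_VAR only.  */
};

typedef void (*toy_visit_fn) (struct toy_expr *var, void *data);

/* Call VISIT on the operand of RHS if RHS is an address-of expression.
   Return true if a visit happened.  */
static bool
toy_visit_if_addr (struct toy_expr *rhs, toy_visit_fn visit, void *data)
{
  if (rhs == NULL || rhs->code != TOY_ADDR)
    return false;
  visit (rhs->op[0], data);
  return true;
}

/* Walk the right-hand side of a toy assignment and report every address
   taken.  With HANDLE_COND false only a top-level address is seen; with
   HANDLE_COND true both arms of a conditional are inspected as well.
   Return the number of addresses visited.  */
static int
toy_walk_rhs_addrs (struct toy_expr *rhs, bool handle_cond,
                    toy_visit_fn visit, void *data)
{
  int n = 0;

  assert (rhs != NULL);
  if (toy_visit_if_addr (rhs, visit, data))
    n++;
  else if (handle_cond && rhs->code == TOY_COND)
    {
      n += toy_visit_if_addr (rhs->op[1], visit, data);
      n += toy_visit_if_addr (rhs->op[2], visit, data);
    }
  return n;
}

/* Visitor: record that the address of VAR is taken.  */
static void
toy_mark_addressable (struct toy_expr *var, void *data)
{
  (void) data;
  var->addressable = true;
}

/* Clear the addressable bit of VAR, then set it again only if the walk
   over STMT_RHS reports an address of VAR.  Return the resulting bit.  */
static bool
toy_recompute_addressable (struct toy_expr *stmt_rhs, struct toy_expr *var,
                           bool handle_cond)
{
  var->addressable = false;
  toy_walk_rhs_addrs (stmt_rhs, handle_cond, toy_mark_addressable, NULL);
  return var->addressable;
}
```

For a right-hand side shaped like "c ? &vkl : &scratch_pad", calling
toy_recompute_addressable with handle_cond set to false leaves vkl marked
as not addressable although its address is still used, which corresponds
to the verifier error above.  With handle_cond set to true the bit
survives.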

[-- Attachment #2: 0001-Handle-COND_EXPR-in-walk_stmt_load_store_addr_ops.patch --]
[-- Type: text/x-diff, Size: 1238 bytes --]

From 3ffab79f5fd56a6578fb2e38035275b759729a3f Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Mon, 3 Jan 2011 17:06:56 -0600
Subject: [PATCH] Handle COND_EXPR in walk_stmt_load_store_addr_ops.

---
 gcc/gimple.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/gcc/gimple.c b/gcc/gimple.c
index e686e63..dab819a 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -4931,6 +4931,20 @@ walk_stmt_load_store_addr_ops (gimple stmt, void *data,
 		   && TREE_CODE (OBJ_TYPE_REF_OBJECT (rhs)) == ADDR_EXPR)
 	    ret |= visit_addr (stmt, TREE_OPERAND (OBJ_TYPE_REF_OBJECT (rhs),
 						   0), data);
+	  /* FIXME: COND_EXPR should be moved to non-single RHS at
+	     some point.  */
+	  else if (TREE_CODE (rhs) == COND_EXPR)
+	    {
+	      tree op1 = TREE_OPERAND (rhs, 1);
+	      tree op2 = TREE_OPERAND (rhs, 2);
+
+	      if (TREE_CODE (op1) == ADDR_EXPR)
+		ret |= visit_addr (stmt, TREE_OPERAND (op1, 0), data);
+
+	      if (TREE_CODE (op2) == ADDR_EXPR)
+		ret |= visit_addr (stmt, TREE_OPERAND (op2, 0), data);
+	    }
+
           lhs = gimple_assign_lhs (stmt);
 	  if (TREE_CODE (lhs) == TARGET_MEM_REF
               && TREE_CODE (TMR_BASE (lhs)) == ADDR_EXPR)
-- 
1.7.1


Thread overview: 41+ messages
2010-10-29  3:11 [PATCH 0/6] Loop flattening and improved if-conversion Sebastian Pop
2010-10-29  3:11 ` [PATCH 1/6] Loop flattening on loop-SSA Sebastian Pop
2010-10-29  3:46 ` [PATCH 4/6] if-convert even when the data dependences cannot be computed Sebastian Pop
2010-10-29  3:57 ` [PATCH 5/6] Call if-conversion from loop flattening Sebastian Pop
2010-10-29  4:07 ` [PATCH 3/6] Fix PR46029: reimplement if-convert stores Sebastian Pop
2010-10-29  4:13 ` [PATCH 2/6] Remove ifcvt_memrefs_wont_trap analysis Sebastian Pop
2010-10-29  5:58 ` [PATCH 6/6] Move loop flattening and SLP vectorization at the end of loop transforms Sebastian Pop
2010-10-29 13:44   ` Richard Guenther
2010-10-30  0:23     ` Sebastian Pop
2010-10-30  8:01       ` Richard Guenther
2010-11-03 15:18 ` [PATCH 0/6] Loop flattening and improved if-conversion Richard Guenther
2010-11-03 15:53   ` [PATCH 2/3] if-convert even when the data dependences cannot be computed Sebastian Pop
2010-11-03 20:47     ` Richard Guenther
2010-11-03 20:52       ` Sebastian Pop
2010-11-03 15:53   ` [PATCH 1/3] Fix PR46029: reimplement if-convert stores Sebastian Pop
2010-11-05 12:08     ` Richard Guenther
2010-11-05 16:13       ` Sebastian Pop
2010-11-10 23:24         ` Sebastian Pop
2010-11-11 10:04           ` Richard Guenther
2010-11-15 22:39       ` Sebastian Pop
2010-11-16 14:45         ` Richard Guenther
2010-11-16 15:01           ` Richard Guenther
2011-01-03 21:39             ` Sebastian Pop
2011-01-03 21:52               ` Richard Guenther
2011-01-03 23:19                 ` Sebastian Pop
2010-11-03 15:54   ` [PATCH 3/3] Loop flattening on loop-SSA Sebastian Pop
2010-11-03 16:57     ` Nathan Froyd
2010-11-03 17:29       ` Sebastian Pop
2010-11-05 13:05     ` Richard Guenther
2010-11-05 16:57       ` Sebastian Pop
2010-11-08 16:14         ` Richard Guenther
2010-11-15 23:05           ` Sebastian Pop
2010-11-15 23:17             ` Richard Guenther
2010-11-15 23:35               ` Sebastian Pop
2010-11-16  0:32                 ` Richard Guenther
2010-11-15 23:08           ` Sebastian Pop
2010-11-15 23:10             ` Sebastian Pop
2010-11-15 23:30               ` Richard Guenther
2010-11-15 23:53                 ` Sebastian Pop
2010-11-16 22:47       ` Sebastian Pop
2010-11-16 23:56         ` Sebastian Pop
