public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 0/4] if-conversion of loops with conditionals containing memory writes
@ 2010-07-08 21:41 Sebastian Pop
  2010-07-08 21:41 ` [PATCH 2/4] Outline fold_or_predicates from add_to_predicate_list Sebastian Pop
                   ` (4 more replies)
  0 siblings, 5 replies; 32+ messages in thread
From: Sebastian Pop @ 2010-07-08 21:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, matz, Sebastian Pop

Hi,

following Michael's review, this patch-set reworks the previous
patches in order to speed up the computation of non-trapping property
of data reference accesses in the statements to be if-converted.

1/4 is identical to the previously submitted patch.
2/4 is a trivial clean up.
3/4 simplifies the control flow in the previously submitted patch and
    now calls the function outlined in 2/4 to OR predicates.
4/4 is the speeding up part that Michael proposed.

This patch-set passed regstrap on amd64-linux.  Ok for trunk?

Thanks,
Sebastian Pop
--
AMD / Open Source Compiler Engineering / GNU Tools

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 2/4] Outline fold_or_predicates from add_to_predicate_list.
  2010-07-08 21:41 [PATCH 0/4] if-conversion of loops with conditionals containing memory writes Sebastian Pop
@ 2010-07-08 21:41 ` Sebastian Pop
  2010-07-09 12:13   ` Richard Guenther
  2010-07-08 21:42 ` [PATCH 1/4] Add flag -ftree-loop-if-convert-memory-writes Sebastian Pop
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 32+ messages in thread
From: Sebastian Pop @ 2010-07-08 21:41 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, matz, Sebastian Pop

	* tree-if-conv.c (fold_or_predicates): New.
	(add_to_predicate_list): Call it.
---
 gcc/tree-if-conv.c |   43 ++++++++++++++++++++++---------------------
 1 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 34b4159..cac5a3b 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -300,6 +300,27 @@ parse_predicate (tree cond, tree *op0, tree *op1)
   return ERROR_MARK;
 }
 
+/* Returns the fold of predicate C1 OR C2.  */
+
+static tree
+fold_or_predicates (tree c1, tree c2)
+{
+  tree op1a, op1b, op2a, op2b;
+  enum tree_code code1 = parse_predicate (c1, &op1a, &op1b);
+  enum tree_code code2 = parse_predicate (c2, &op2a, &op2b);
+
+  if (code1 != ERROR_MARK && code2 != ERROR_MARK)
+    {
+      tree t = maybe_fold_or_comparisons (code1, op1a, op1b,
+					  code2, op2a, op2b);
+      if (t)
+	return t;
+    }
+
+  return fold_build2_loc (UNKNOWN_LOCATION, TRUTH_OR_EXPR,
+			  boolean_type_node, c1, c2);
+}
+
 /* Add condition NC to the predicate list of basic block BB.  */
 
 static inline void
@@ -313,27 +334,7 @@ add_to_predicate_list (basic_block bb, tree nc)
   if (!is_predicated (bb))
     bc = nc;
   else
-    {
-      enum tree_code code1, code2;
-      tree op1a, op1b, op2a, op2b;
-
-      bc = bb_predicate (bb);
-      code1 = parse_predicate (bc, &op1a, &op1b);
-      code2 = parse_predicate (nc, &op2a, &op2b);
-
-      if (code1 != ERROR_MARK && code2 != ERROR_MARK)
-	{
-	  tree t = maybe_fold_or_comparisons (code1, op1a, op1b,
-					      code2, op2a, op2b);
-	  if (!t)
-	    t = fold_build2_loc (EXPR_LOCATION (bc), TRUTH_OR_EXPR,
-				 boolean_type_node, bc, nc);
-	  bc = t;
-	}
-      else
-	bc = fold_build2_loc (EXPR_LOCATION (bc), TRUTH_OR_EXPR,
-			      boolean_type_node, bc, nc);
-    }
+    bc = fold_or_predicates (nc, bb_predicate (bb));
 
   if (!is_gimple_condexpr (bc))
     {
-- 
1.7.0.4

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 3/4] Do not check whether memory references accessed in every iteration trap.
  2010-07-08 21:41 [PATCH 0/4] if-conversion of loops with conditionals containing memory writes Sebastian Pop
  2010-07-08 21:41 ` [PATCH 2/4] Outline fold_or_predicates from add_to_predicate_list Sebastian Pop
  2010-07-08 21:42 ` [PATCH 1/4] Add flag -ftree-loop-if-convert-memory-writes Sebastian Pop
@ 2010-07-08 21:42 ` Sebastian Pop
  2010-08-13  9:41   ` Richard Guenther
  2010-07-08 21:42 ` [PATCH 4/4] Speed-up ifcvt_memrefs_wont_trap caching previous results Sebastian Pop
  2010-08-12 16:19 ` [PATCH 0/4] if-conversion of loops with conditionals containing memory writes Sebastian Pop
  4 siblings, 1 reply; 32+ messages in thread
From: Sebastian Pop @ 2010-07-08 21:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, matz, Sebastian Pop

This patch relaxes the checks from gimple_could_trap_p in order to
allow the flag_loop_if_convert_memory_writes to if-convert more loops
in which it is possible to prove that:

- the accesses to an array in a loop do not trap (more than the
  original non-if-converted loop).  This is true when the memory
  accesses are executed at every iteration of the if-converted loop.

- the writes to memory occur on arrays that are not const qualified.
  This is true when there exists at least one unconditional write to
  the array in the analyzed program.  In this patch this analysis is
  limited to the loop to be if-converted.

	* gimple.c (gimple_op_could_trap_p): Outlined from
	gimple_could_trap_p_1.
	(gimple_could_trap_p_1): Call gimple_op_could_trap_p.
	* gimple.h (gimple_op_could_trap_p): Declared.
	* tree-data-ref.h (same_data_refs_base_objects): New.
	(same_data_refs): New.
	* tree-if-conv.c (memref_read_or_written_unconditionally): New.
	(memrefs_read_or_written_unconditionally): New.
	(write_memref_written_at_least_once): New.
	(write_memrefs_written_at_least_once): New.
	(ifcvt_memrefs_wont_trap): New.
	(operations_could_trap): New.
	(ifcvt_could_trap_p): New.
	(if_convertible_gimple_assign_stmt_p): Call ifcvt_could_trap_p.
	Gets a vector of data refs.
	(if_convertible_stmt_p): Same.
	(if_convertible_loop_p): Before freeing the data refs vector,
	pass it to if_convertible_stmt_p.

	* gcc.dg/tree-ssa/ifc-5.c: New.
	* gcc.dg/vect/vect-cond-1.c: Update pattern.
	* gcc.dg/vect/vect-cond-3.c: Same.
	* gcc.dg/vect/vect-cond-4.c: Same.
	* gcc.dg/vect/vect-cond-6.c: Same.
---
 gcc/gimple.c                            |   38 ++++--
 gcc/gimple.h                            |    1 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c   |   24 ++++
 gcc/testsuite/gcc.dg/vect/vect-cond-1.c |    2 +-
 gcc/testsuite/gcc.dg/vect/vect-cond-3.c |    2 +-
 gcc/testsuite/gcc.dg/vect/vect-cond-4.c |    2 +-
 gcc/testsuite/gcc.dg/vect/vect-cond-6.c |    1 +
 gcc/tree-data-ref.h                     |   34 +++++
 gcc/tree-if-conv.c                      |  220 +++++++++++++++++++++++++++----
 9 files changed, 279 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c

diff --git a/gcc/gimple.c b/gcc/gimple.c
index 707d4e4..f515acc 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2399,24 +2399,14 @@ gimple_rhs_has_side_effects (const_gimple s)
   return false;
 }
 
+/* Helper for gimple_could_trap_p_1.  Return true if the operation of
+   S can trap.  */
 
-/* Helper for gimple_could_trap_p and gimple_assign_rhs_could_trap_p.
-   Return true if S can trap.  If INCLUDE_LHS is true and S is a
-   GIMPLE_ASSIGN, the LHS of the assignment is also checked.
-   Otherwise, only the RHS of the assignment is checked.  */
-
-static bool
-gimple_could_trap_p_1 (gimple s, bool include_lhs)
+bool
+gimple_op_could_trap_p (gimple s)
 {
-  unsigned i, start;
-  tree t, div = NULL_TREE;
   enum tree_code op;
-
-  start = (is_gimple_assign (s) && !include_lhs) ? 1 : 0;
-
-  for (i = start; i < gimple_num_ops (s); i++)
-    if (tree_could_trap_p (gimple_op (s, i)))
-      return true;
+  tree t, div = NULL_TREE;
 
   switch (gimple_code (s))
     {
@@ -2445,7 +2435,25 @@ gimple_could_trap_p_1 (gimple s, bool include_lhs)
     }
 
   return false;
+}
+
+/* Helper for gimple_could_trap_p and gimple_assign_rhs_could_trap_p.
+   Return true if S can trap.  If INCLUDE_LHS is true and S is a
+   GIMPLE_ASSIGN, the LHS of the assignment is also checked.
+   Otherwise, only the RHS of the assignment is checked.  */
+
+static bool
+gimple_could_trap_p_1 (gimple s, bool include_lhs)
+{
+  unsigned i, start;
+
+  start = (is_gimple_assign (s) && !include_lhs) ? 1 : 0;
+
+  for (i = start; i < gimple_num_ops (s); i++)
+    if (tree_could_trap_p (gimple_op (s, i)))
+      return true;
 
+  return gimple_op_could_trap_p (s);
 }
 
 
diff --git a/gcc/gimple.h b/gcc/gimple.h
index ec7fc93..c446cd3 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -885,6 +885,7 @@ gimple gimple_build_cond_from_tree (tree, tree, tree);
 void gimple_cond_set_condition_from_tree (gimple, tree);
 bool gimple_has_side_effects (const_gimple);
 bool gimple_rhs_has_side_effects (const_gimple);
+bool gimple_op_could_trap_p (gimple);
 bool gimple_could_trap_p (gimple);
 bool gimple_assign_rhs_could_trap_p (gimple);
 void gimple_regimplify_operands (gimple, gimple_stmt_iterator *);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
new file mode 100644
index 0000000..a9cc816
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-c -O2 -ftree-vectorize -fdump-tree-ifcvt-stats" { target *-*-* } } */
+
+void
+dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
+{
+  int i, level, qmul, qadd;
+
+  qadd = (qscale - 1) | 1;
+  qmul = qscale << 1;
+
+  for (i = 0; i <= nCoeffs; i++)
+    {
+      level = block[i];
+      if (level < 0)
+	level = level * qmul - qadd;
+      else
+	level = level * qmul + qadd;
+      block[i] = level;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */
+/* { dg-final { cleanup-tree-dump "ifcvt" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-1.c b/gcc/testsuite/gcc.dg/vect/vect-cond-1.c
index 4ee6713..888e5cb 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-1.c
@@ -52,7 +52,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-3.c b/gcc/testsuite/gcc.dg/vect/vect-cond-3.c
index 56cfbb2..42b9f35 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-3.c
@@ -60,7 +60,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-4.c
index c3a1585..1d1d8fc 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-4.c
@@ -57,7 +57,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
 
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-6.c b/gcc/testsuite/gcc.dg/vect/vect-cond-6.c
index e5e9391..7820444 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-cond-6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-6.c
@@ -56,5 +56,6 @@ int main ()
 }
 
 /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
       
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index eff5348..391d6fc 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -417,6 +417,40 @@ extern void create_rdg_vertices (struct graph *, VEC (gimple, heap) *);
 extern bool dr_may_alias_p (const struct data_reference *,
 			    const struct data_reference *);
 
+
+/* Return true when the base objects of data references A and B are
+   the same memory object.  */
+
+static inline bool
+same_data_refs_base_objects (data_reference_p a, data_reference_p b)
+{
+  return DR_NUM_DIMENSIONS (a) == DR_NUM_DIMENSIONS (b)
+    && dr_may_alias_p (a, b)
+    && operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), 0);
+}
+
+/* Return true when the data references A and B are accessing the same
+   memory object with the same access functions.  */
+
+static inline bool
+same_data_refs (data_reference_p a, data_reference_p b)
+{
+  unsigned int i;
+
+  /* The references are exactly the same.  */
+  if (operand_equal_p (DR_REF (a), DR_REF (b), 0))
+    return true;
+
+  if (!same_data_refs_base_objects (a, b))
+    return false;
+
+  for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
+    if (!eq_evolutions_p (DR_ACCESS_FN (a, i), DR_ACCESS_FN (b, i)))
+      return false;
+
+  return true;
+}
+
 /* Return true when the DDR contains two data references that have the
    same access functions.  */
 
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index cac5a3b..f0bc026 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -441,6 +441,166 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
   return true;
 }
 
+/* Returns true when the memory reference A at position P in the data
+   references vector DRS and executed under the condition CA, is read
+   or written unconditionally.  */
+
+static bool
+memref_read_or_written_unconditionally (int pos, data_reference_p a,
+					VEC (data_reference_p, heap) *drs,
+					tree ca)
+{
+  int i;
+  data_reference_p b;
+
+  for (i = 0; VEC_iterate (data_reference_p, drs, i, b); i++)
+    if (i != pos && same_data_refs (a, b))
+      {
+	tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
+
+	if (is_true_predicate (cb)
+	    || is_true_predicate (ca = fold_or_predicates (ca, cb)))
+	  return true;
+      }
+
+  return false;
+}
+
+/* Returns true when the memory references of STMT are read or written
+   unconditionally.  */
+
+static bool
+memrefs_read_or_written_unconditionally (gimple stmt,
+					 VEC (data_reference_p, heap) *drs)
+{
+  int i;
+  data_reference_p a;
+  tree ca = bb_predicate (gimple_bb (stmt));
+
+  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
+    if (DR_STMT (a) == stmt
+	&& !memref_read_or_written_unconditionally (i, a, drs, ca))
+      return false;
+
+  return true;
+}
+
+
+/* Returns true when the memory reference A at position P in the data
+   references vector DRS and executed under the condition CA, is
+   unconditionally written.  */
+
+static bool
+write_memref_written_at_least_once (int pos, data_reference_p a,
+				    VEC (data_reference_p, heap) *drs,
+				    tree ca)
+{
+  int i;
+  data_reference_p b;
+
+  for (i = 0; VEC_iterate (data_reference_p, drs, i, b); i++)
+    if (i != pos
+	&& !DR_IS_READ (b)
+	&& same_data_refs_base_objects (a, b))
+      {
+	tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
+
+	if (is_true_predicate (cb)
+	    || is_true_predicate (ca = fold_or_predicates (ca, cb)))
+	  return true;
+      }
+
+  return false;
+}
+
+/* Returns true when the memory references of STMT are unconditionally
+   written.  */
+
+static bool
+write_memrefs_written_at_least_once (gimple stmt,
+				     VEC (data_reference_p, heap) *drs)
+{
+  int i;
+  data_reference_p a;
+  tree ca = bb_predicate (gimple_bb (stmt));
+
+  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
+    if (DR_STMT (a) == stmt
+	&& !DR_IS_READ (a)
+	&& !write_memref_written_at_least_once (i, a, drs, ca))
+      return false;
+
+  return true;
+}
+
+/* Return true when the memory references of STMT won't trap in the
+   if-converted code.  There are two things that we have to check for:
+
+   - writes to memory occur to writable memory: if-conversion of
+   memory writes transforms the conditional memory writes into
+   unconditional writes, i.e. "if (cond) A[i] = foo" is transformed
+   into "A[i] = cond ? foo : A[i]", and as the write to memory may not
+   be executed at all in the original code, it may be a readonly
+   memory.  To check that A is not const-qualified, we check that
+   there exists at least an unconditional write to A in the current
+   function.
+
+   - reads or writes to memory are valid memory accesses for every
+   iteration.  To check that the memory accesses are correctly formed
+   and that we are allowed to read and write in these locations, we
+   check that the memory accesses to be if-converted occur at every
+   iteration unconditionally.  */
+
+static bool
+ifcvt_memrefs_wont_trap (gimple stmt, VEC (data_reference_p, heap) *refs)
+{
+  return write_memrefs_written_at_least_once (stmt, refs)
+    && memrefs_read_or_written_unconditionally (stmt, refs);
+}
+
+/* Same as gimple_could_trap_p_1, returns true when the statement S
+   could trap, but do not check the memory references.  */
+
+static bool
+operations_could_trap (gimple s)
+{
+  unsigned i;
+
+  for (i = 0; i < gimple_num_ops (s); i++)
+    {
+      tree expr = gimple_op (s, i);
+
+      switch (TREE_CODE (expr))
+	{
+	case ARRAY_REF:
+	case INDIRECT_REF:
+	case MISALIGNED_INDIRECT_REF:
+	  break;
+
+	default:
+	  if (tree_could_trap_p (expr))
+	    return true;
+	}
+    }
+
+  return gimple_op_could_trap_p (s);
+}
+
+/* Wrapper around gimple_could_trap_p refined for the needs of the
+   if-conversion.  Try to prove that the memory accesses of STMT could
+   not trap in the innermost loop containing STMT.  */
+
+static bool
+ifcvt_could_trap_p (gimple stmt, VEC (data_reference_p, heap) *refs)
+{
+  if (gimple_has_mem_ops (stmt)
+      && ifcvt_memrefs_wont_trap (stmt, refs)
+      && !operations_could_trap (stmt))
+    return false;
+
+  return gimple_could_trap_p (stmt);
+}
+
 /* Return true when STMT is if-convertible.
 
    GIMPLE_ASSIGN statement is not if-convertible if,
@@ -449,7 +609,8 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
    - LHS is not var decl.  */
 
 static bool
-if_convertible_gimple_assign_stmt_p (gimple stmt)
+if_convertible_gimple_assign_stmt_p (gimple stmt,
+				     VEC (data_reference_p, heap) *refs)
 {
   tree lhs = gimple_assign_lhs (stmt);
   basic_block bb;
@@ -477,7 +638,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt)
 
   if (flag_tree_loop_if_convert_memory_writes)
     {
-      if (gimple_could_trap_p (stmt))
+      if (ifcvt_could_trap_p (stmt, refs))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    fprintf (dump_file, "tree could trap...\n");
@@ -517,7 +678,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt)
    - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
 
 static bool
-if_convertible_stmt_p (gimple stmt)
+if_convertible_stmt_p (gimple stmt, VEC (data_reference_p, heap) *refs)
 {
   switch (gimple_code (stmt))
     {
@@ -527,7 +688,7 @@ if_convertible_stmt_p (gimple stmt)
       return true;
 
     case GIMPLE_ASSIGN:
-      return if_convertible_gimple_assign_stmt_p (stmt);
+      return if_convertible_gimple_assign_stmt_p (stmt, refs);
 
     default:
       /* Don't know what to do with 'em so don't do anything.  */
@@ -810,6 +971,9 @@ if_convertible_loop_p (struct loop *loop)
   edge e;
   edge_iterator ei;
   basic_block exit_bb = NULL;
+  bool res = false;
+  VEC (data_reference_p, heap) *refs;
+  VEC (ddr_p, heap) *ddrs;
 
   /* Handle only innermost loop.  */
   if (!loop || loop->inner)
@@ -847,18 +1011,14 @@ if_convertible_loop_p (struct loop *loop)
 
   /* Don't if-convert the loop when the data dependences cannot be
      computed: the loop won't be vectorized in that case.  */
-  {
-    VEC (data_reference_p, heap) *refs = VEC_alloc (data_reference_p, heap, 5);
-    VEC (ddr_p, heap) *ddrs = VEC_alloc (ddr_p, heap, 25);
-    bool res = compute_data_dependences_for_loop (loop, true, &refs, &ddrs);
-
-    free_data_refs (refs);
-    free_dependence_relations (ddrs);
+  refs = VEC_alloc (data_reference_p, heap, 5);
+  ddrs = VEC_alloc (ddr_p, heap, 25);
+  res = compute_data_dependences_for_loop (loop, true, &refs, &ddrs);
 
-    if (!res)
-      return false;
-  }
+  if (!res)
+    goto cleanup;
 
+  res = false;
   calculate_dominance_info (CDI_DOMINATORS);
 
   /* Allow statements that can be handled during if-conversion.  */
@@ -867,7 +1027,7 @@ if_convertible_loop_p (struct loop *loop)
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, "Irreducible loop\n");
-      return false;
+      goto cleanup;
     }
 
   for (i = 0; i < loop->num_nodes; i++)
@@ -875,14 +1035,18 @@ if_convertible_loop_p (struct loop *loop)
       basic_block bb = ifc_bbs[i];
 
       if (!if_convertible_bb_p (loop, bb, exit_bb))
-	return false;
+	goto cleanup;
 
       if (bb_with_exit_edge_p (loop, bb))
 	exit_bb = bb;
     }
 
-  if (!predicate_bbs (loop))
-    return false;
+  res = predicate_bbs (loop);
+
+  if (!res)
+    goto cleanup;
+
+  res = false;
 
   for (i = 0; i < loop->num_nodes; i++)
     {
@@ -891,21 +1055,23 @@ if_convertible_loop_p (struct loop *loop)
 
       for (itr = gsi_start_phis (bb); !gsi_end_p (itr); gsi_next (&itr))
 	if (!if_convertible_phi_p (loop, bb, gsi_stmt (itr)))
-	  return false;
-
-      /* For non predicated BBs, don't check their statements.  */
-      if (!is_predicated (bb))
-	continue;
+	  goto cleanup;
 
-      for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
-	if (!if_convertible_stmt_p (gsi_stmt (itr)))
-	  return false;
+      /* Check the if-convertibility of statements in predicated BBs.  */
+      if (is_predicated (bb))
+	for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
+	  if (!if_convertible_stmt_p (gsi_stmt (itr), refs))
+	    goto cleanup;
     }
 
+  res = true;
   if (dump_file)
     fprintf (dump_file, "Applying if-conversion\n");
 
-  return true;
+ cleanup:
+  free_data_refs (refs);
+  free_dependence_relations (ddrs);
+  return res;
 }
 
 /* Basic block BB has two predecessors.  Using predecessor's bb
-- 
1.7.0.4

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 4/4] Speed-up ifcvt_memrefs_wont_trap caching previous results.
  2010-07-08 21:41 [PATCH 0/4] if-conversion of loops with conditionals containing memory writes Sebastian Pop
                   ` (2 preceding siblings ...)
  2010-07-08 21:42 ` [PATCH 3/4] Do not check whether memory references accessed in every iteration trap Sebastian Pop
@ 2010-07-08 21:42 ` Sebastian Pop
  2010-08-13  9:02   ` Richard Guenther
  2010-08-12 16:19 ` [PATCH 0/4] if-conversion of loops with conditionals containing memory writes Sebastian Pop
  4 siblings, 1 reply; 32+ messages in thread
From: Sebastian Pop @ 2010-07-08 21:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, matz, Sebastian Pop

This patch speeds up the ifcvt_memrefs_wont_trap computation by
caching the results of the computations in the data references ->aux
fields.

	* tree-if-conv.c (struct ifc_dr): New.
	(memref_read_or_written_unconditionally): Use the cached values
	when possible.
	(write_memref_written_at_least_once): Same.
	(if_convertible_loop_p): Initialize and free DR->aux fields.
---
 gcc/tree-if-conv.c |   53 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index f0bc026..655c168 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -441,6 +441,17 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
   return true;
 }
 
+/* Records the status of a data reference.  This struct is attached to
+   each DR->aux field.  */
+
+struct ifc_dr {
+  /* -1 when not initialized, 0 when false, 1 when true.  */
+  int dr_is_written_at_least_once;
+
+  /* -1 when not initialized, 0 when false, 1 when true.  */
+  int dr_is_rw_unconditionally;
+};
+
 /* Returns true when the memory reference A at position P in the data
    references vector DRS and executed under the condition CA, is read
    or written unconditionally.  */
@@ -452,17 +463,27 @@ memref_read_or_written_unconditionally (int pos, data_reference_p a,
 {
   int i;
   data_reference_p b;
+  int x = ((struct ifc_dr *) a->aux)->dr_is_rw_unconditionally;
+
+  if (x != -1)
+    return x;
 
   for (i = 0; VEC_iterate (data_reference_p, drs, i, b); i++)
     if (i != pos && same_data_refs (a, b))
       {
 	tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
 
-	if (is_true_predicate (cb)
+	if (((struct ifc_dr *) b->aux)->dr_is_rw_unconditionally == 1
+	    || is_true_predicate (cb)
 	    || is_true_predicate (ca = fold_or_predicates (ca, cb)))
-	  return true;
+	  {
+	    ((struct ifc_dr *) a->aux)->dr_is_rw_unconditionally = 1;
+	    ((struct ifc_dr *) b->aux)->dr_is_rw_unconditionally = 1;
+	    return true;
+	  }
       }
 
+  ((struct ifc_dr *) a->aux)->dr_is_rw_unconditionally = 0;
   return false;
 }
 
@@ -497,6 +518,10 @@ write_memref_written_at_least_once (int pos, data_reference_p a,
 {
   int i;
   data_reference_p b;
+  int x = ((struct ifc_dr *) a->aux)->dr_is_written_at_least_once;
+
+  if (x != -1)
+    return x;
 
   for (i = 0; VEC_iterate (data_reference_p, drs, i, b); i++)
     if (i != pos
@@ -505,11 +530,17 @@ write_memref_written_at_least_once (int pos, data_reference_p a,
       {
 	tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
 
-	if (is_true_predicate (cb)
+	if (((struct ifc_dr *) b->aux)->dr_is_written_at_least_once == 1
+	    || is_true_predicate (cb)
 	    || is_true_predicate (ca = fold_or_predicates (ca, cb)))
-	  return true;
+	  {
+	    ((struct ifc_dr *) a->aux)->dr_is_written_at_least_once = 1;
+	    ((struct ifc_dr *) b->aux)->dr_is_written_at_least_once = 1;
+	    return true;
+	  }
       }
 
+  ((struct ifc_dr *) a->aux)->dr_is_written_at_least_once = 0;
   return false;
 }
 
@@ -972,6 +1003,7 @@ if_convertible_loop_p (struct loop *loop)
   edge_iterator ei;
   basic_block exit_bb = NULL;
   bool res = false;
+  data_reference_p dr;
   VEC (data_reference_p, heap) *refs;
   VEC (ddr_p, heap) *ddrs;
 
@@ -1048,6 +1080,14 @@ if_convertible_loop_p (struct loop *loop)
 
   res = false;
 
+  if (flag_tree_loop_if_convert_memory_writes)
+    for (i = 0; VEC_iterate (data_reference_p, refs, i, dr); i++)
+      {
+	dr->aux = XNEW (struct ifc_dr);
+	((struct ifc_dr *) dr->aux)->dr_is_written_at_least_once = -1;
+	((struct ifc_dr *) dr->aux)->dr_is_rw_unconditionally = -1;
+      }
+
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
@@ -1069,6 +1109,11 @@ if_convertible_loop_p (struct loop *loop)
     fprintf (dump_file, "Applying if-conversion\n");
 
  cleanup:
+
+  if (flag_tree_loop_if_convert_memory_writes)
+    for (i = 0; VEC_iterate (data_reference_p, refs, i, dr); i++)
+      free (dr->aux);
+
   free_data_refs (refs);
   free_dependence_relations (ddrs);
   return res;
-- 
1.7.0.4

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/4] Add flag -ftree-loop-if-convert-memory-writes.
  2010-07-08 21:41 [PATCH 0/4] if-conversion of loops with conditionals containing memory writes Sebastian Pop
  2010-07-08 21:41 ` [PATCH 2/4] Outline fold_or_predicates from add_to_predicate_list Sebastian Pop
@ 2010-07-08 21:42 ` Sebastian Pop
  2010-08-13  8:58   ` Richard Guenther
  2010-07-08 21:42 ` [PATCH 3/4] Do not check whether memory references accessed in every iteration trap Sebastian Pop
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 32+ messages in thread
From: Sebastian Pop @ 2010-07-08 21:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, matz, Sebastian Pop

This patch adds a flag that controls the replacement of the memory
writes that are in predicated basic blocks with a full write:

for (...)
  if (cond)
    A[i] = foo

is replaced with:

for (...)
  A[i] = cond ? foo : A[i]

In order to do this, we have to call gimple_could_trap_p instead of
gimple_assign_rhs_could_trap_p, as we have to also check that the LHS
of assign stmts does not trap.

	* common.opt (ftree-loop-if-convert-memory-writes): New flag.
	* doc/invoke.texi (ftree-loop-if-convert-memory-writes): Documented.
	* tree-if-conv.c (if_convertible_phi_p): Allow virtual phi nodes when
	flag_loop_if_convert_memory_writes is set.
	(if_convertible_gimple_assign_stmt_p): Allow memory reads and writes
	Do not handle types that do not match is_gimple_reg_type.
	Remove loop and bb parameters.  Call gimple_could_trap_p instead of
	when flag_loop_if_convert_memory_writes	is set, as LHS can contain
	memory refs.
	(if_convertible_stmt_p): Remove loop and bb parameters.  Update calls
	to if_convertible_gimple_assign_stmt_p.
	(if_convertible_loop_p): Update call to if_convertible_stmt_p.
	(gimplify_condition): New.
	(replace_phi_with_cond_gimple_assign_stmt): Renamed
	predicate_scalar_phi.  Do not handle virtual phi nodes.
	(ifconvert_phi_nodes): Renamed predicate_all_scalar_phis.
	Call predicate_scalar_phi.
	(insert_gimplified_predicates): Insert the gimplified predicate of a BB
	just after the labels for flag_loop_if_convert_memory_writes, otherwise
	insert the predicate in the end of the BB.
	(predicate_mem_writes): New.
	(combine_blocks): Call predicate_all_scalar_phis.  When
	flag_loop_if_convert_memory_writes is set, call predicate_mem_writes.
	(tree_if_conversion): Call mark_sym_for_renaming when
	flag_loop_if_convert_memory_writes is set.
	(main_tree_if_conversion): Call update_ssa when
	flag_loop_if_convert_memory_writes is set.

	* gcc.dg/tree-ssa/ifc-4.c: New.
	* gcc.dg/tree-ssa/ifc-7.c: New.
---
 gcc/common.opt                        |    4 +
 gcc/doc/invoke.texi                   |   20 ++++-
 gcc/testsuite/gcc.dg/tree-ssa/ifc-4.c |   53 ++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/ifc-7.c |   29 +++++
 gcc/tree-if-conv.c                    |  181 ++++++++++++++++++++++++++-------
 5 files changed, 247 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-4.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-7.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 111d7b7..ceb94f8 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -657,6 +657,10 @@ ftree-loop-if-convert
 Common Report Var(flag_tree_loop_if_convert) Init(-1) Optimization
 Convert conditional jumps in innermost loops to branchless equivalents
 
+ftree-loop-if-convert-memory-writes
+Common Report Var(flag_tree_loop_if_convert_memory_writes) Optimization
+Also if-convert conditional jumps containing memory writes
+
 ; -finhibit-size-directive inhibits output of .size for ELF.
 ; This is used only for compiling crtstuff.c,
 ; and it may be extended to other effects
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4d6a1af..3634069 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -383,7 +383,8 @@ Objective-C and Objective-C++ Dialects}.
 -fstrict-aliasing -fstrict-overflow -fthread-jumps -ftracer @gol
 -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop @gol
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
--ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
+-ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
+-ftree-loop-if-convert-memory-writes -ftree-loop-im @gol
 -ftree-phiprop -ftree-loop-distribution @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
@@ -6890,6 +6891,23 @@ the innermost loops in order to improve the ability of the
 vectorization pass to handle these loops.  This is enabled by default
 if vectorization is enabled.
 
+@item -ftree-loop-if-convert-memory-writes
+Attempt to also if-convert conditional jumps containing memory writes.
+This transformation can be unsafe for multi-threaded programs as it
+transforms conditional memory writes into unconditional memory writes.
+For example,
+@smallexample
+for (i = 0; i < N; i++)
+  if (cond)
+    A[i] = expr;
+@end smallexample
+would be transformed to
+@smallexample
+for (i = 0; i < N; i++)
+  A[i] = cond ? expr : A[i];
+@end smallexample
+potentially producing data races.
+
 @item -ftree-loop-distribution
 Perform loop distribution.  This flag can improve cache performance on
 big loop bodies and allow further loop optimizations, like
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-4.c
new file mode 100644
index 0000000..beb1a0e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-4.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-options "-c -O2 -ftree-vectorize -fdump-tree-ifcvt-stats" { target *-*-* } } */
+
+struct ht
+{
+  void * (*alloc_subobject) (int);
+};
+typedef struct cpp_reader cpp_reader;
+typedef struct cpp_token cpp_token;
+typedef struct cpp_macro cpp_macro;
+enum cpp_ttype
+{
+    CPP_PASTE,
+};
+struct cpp_token {
+  __extension__ enum cpp_ttype type : 8;
+} cpp_comment_table;
+struct cpp_macro {
+  union cpp_macro_u
+  {
+    cpp_token * tokens;
+  } exp;
+  unsigned int count;
+};
+struct cpp_reader
+{
+  struct ht *hash_table;
+};
+create_iso_definition (cpp_reader *pfile, cpp_macro *macro)
+{
+  unsigned int num_extra_tokens = 0;
+  {
+    cpp_token *tokns =
+      (cpp_token *) pfile->hash_table->alloc_subobject (sizeof (cpp_token)
+							* macro->count);
+    {
+      cpp_token *normal_dest = tokns;
+      cpp_token *extra_dest = tokns + macro->count - num_extra_tokens;
+      unsigned int i;
+      for (i = 0; i < macro->count; i++)
+	{
+	  if (macro->exp.tokens[i].type == CPP_PASTE)
+	    *extra_dest++ = macro->exp.tokens[i];
+	  else
+	    *normal_dest++ = macro->exp.tokens[i];
+	}
+    }
+  }
+}
+
+/* This cannot be if-converted because the stores are to aggregate types.  */
+/* { dg-final { scan-tree-dump-times "Applying if-conversion" 0 "ifcvt" } } */
+/* { dg-final { cleanup-tree-dump "ifcvt" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-7.c
new file mode 100644
index 0000000..4d26dc7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-7.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-c -O2 -ftree-vectorize" { target *-*-* } } */
+
+typedef struct eqn_d
+{
+  int *coef;
+} *eqn;
+typedef struct omega_pb_d
+{
+  eqn subs;
+} *omega_pb;
+
+omega_pb omega_solve_problem (omega_pb);
+
+omega_pb
+omega_solve_geq (omega_pb pb, int n)
+{
+  int i, e;
+  int j = 0;
+
+  for (e = n - 1; e >= 0; e--)
+    if (pb->subs[e].coef[i] != pb->subs[e].coef[j])
+      {
+	pb->subs[e].coef[i] = j;
+	pb->subs[e].coef[j] = i;
+      }
+
+  return omega_solve_problem (pb);
+}
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 7058ee4..34b4159 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -385,9 +385,12 @@ bb_with_exit_edge_p (struct loop *loop, basic_block bb)
    and it belongs to basic block BB.
 
    PHI is not if-convertible if:
-   - it has more than 2 arguments,
-   - virtual PHI is immediately used in another PHI node,
-   - virtual PHI on BB other than header.  */
+   - it has more than 2 arguments.
+
+   When the flag_tree_loop_if_convert_memory_writes is not set, PHI is not
+   if-convertible if:
+   - a virtual PHI is immediately used in another PHI node,
+   - there is a virtual PHI in a BB other than the loop->header.  */
 
 static bool
 if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
@@ -405,6 +408,12 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
       return false;
     }
 
+  if (flag_tree_loop_if_convert_memory_writes)
+    return true;
+
+  /* When the flag_tree_loop_if_convert_memory_writes is not set, check
+     that there are no memory writes in the branches of the loop to be
+     if-converted.  */
   if (!is_gimple_reg (SSA_NAME_VAR (gimple_phi_result (phi))))
     {
       imm_use_iterator imm_iter;
@@ -413,9 +422,10 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
       if (bb != loop->header)
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "Virtual phi not on loop header.\n");
+	    fprintf (dump_file, "Virtual phi not on loop->header.\n");
 	  return false;
 	}
+
       FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_phi_result (phi))
 	{
 	  if (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI)
@@ -435,15 +445,13 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
    GIMPLE_ASSIGN statement is not if-convertible if,
    - it is not movable,
    - it could trap,
-   - LHS is not var decl.
-
-   GIMPLE_ASSIGN is part of block BB, which is inside loop LOOP.  */
+   - LHS is not var decl.  */
 
 static bool
-if_convertible_gimple_assign_stmt_p (struct loop *loop, basic_block bb,
-    				     gimple stmt)
+if_convertible_gimple_assign_stmt_p (gimple stmt)
 {
   tree lhs = gimple_assign_lhs (stmt);
+  basic_block bb;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -451,6 +459,9 @@ if_convertible_gimple_assign_stmt_p (struct loop *loop, basic_block bb,
       print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
     }
 
+  if (!is_gimple_reg_type (TREE_TYPE (lhs)))
+    return false;
+
   /* Some of these constrains might be too conservative.  */
   if (stmt_ends_bb_p (stmt)
       || gimple_has_volatile_ops (stmt)
@@ -463,6 +474,17 @@ if_convertible_gimple_assign_stmt_p (struct loop *loop, basic_block bb,
       return false;
     }
 
+  if (flag_tree_loop_if_convert_memory_writes)
+    {
+      if (gimple_could_trap_p (stmt))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, "tree could trap...\n");
+	  return false;
+	}
+      return true;
+    }
+
   if (gimple_assign_rhs_could_trap_p (stmt))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
@@ -470,9 +492,11 @@ if_convertible_gimple_assign_stmt_p (struct loop *loop, basic_block bb,
       return false;
     }
 
+  bb = gimple_bb (stmt);
+
   if (TREE_CODE (lhs) != SSA_NAME
-      && bb != loop->header
-      && !bb_with_exit_edge_p (loop, bb))
+      && bb != bb->loop_father->header
+      && !bb_with_exit_edge_p (bb->loop_father, bb))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
@@ -489,12 +513,10 @@ if_convertible_gimple_assign_stmt_p (struct loop *loop, basic_block bb,
 
    A statement is if-convertible if:
    - it is an if-convertible GIMPLE_ASSGIN,
-   - it is a GIMPLE_LABEL or a GIMPLE_COND.
-
-   STMT is inside BB, which is inside loop LOOP.  */
+   - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
 
 static bool
-if_convertible_stmt_p (struct loop *loop, basic_block bb, gimple stmt)
+if_convertible_stmt_p (gimple stmt)
 {
   switch (gimple_code (stmt))
     {
@@ -504,7 +526,7 @@ if_convertible_stmt_p (struct loop *loop, basic_block bb, gimple stmt)
       return true;
 
     case GIMPLE_ASSIGN:
-      return if_convertible_gimple_assign_stmt_p (loop, bb, stmt);
+      return if_convertible_gimple_assign_stmt_p (stmt);
 
     default:
       /* Don't know what to do with 'em so don't do anything.  */
@@ -875,7 +897,7 @@ if_convertible_loop_p (struct loop *loop)
 	continue;
 
       for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
-	if (!if_convertible_stmt_p (loop, bb, gsi_stmt (itr)))
+	if (!if_convertible_stmt_p (gsi_stmt (itr)))
 	  return false;
     }
 
@@ -895,7 +917,7 @@ if_convertible_loop_p (struct loop *loop)
 static basic_block
 find_phi_replacement_condition (struct loop *loop,
 				basic_block bb, tree *cond,
-                                gimple_stmt_iterator *gsi)
+				gimple_stmt_iterator *gsi)
 {
   edge first_edge, second_edge;
   tree tmp_cond;
@@ -980,8 +1002,9 @@ find_phi_replacement_condition (struct loop *loop,
   return first_edge->src;
 }
 
-/* Replace PHI node with conditional modify expr using COND.  This
-   routine does not handle PHI nodes with more than two arguments.
+/* Replace a scalar PHI node with a COND_EXPR using COND as condition.
+   This routine does not handle PHI nodes with more than two
+   arguments.
 
    For example,
      S1: A = PHI <x1(1), x2(5)
@@ -993,18 +1016,22 @@ find_phi_replacement_condition (struct loop *loop,
    TRUE_BB is selected.  */
 
 static void
-replace_phi_with_cond_gimple_assign_stmt (gimple phi, tree cond,
-    					  basic_block true_bb,
-                                   	  gimple_stmt_iterator *gsi)
+predicate_scalar_phi (gimple phi, tree cond,
+		      basic_block true_bb,
+		      gimple_stmt_iterator *gsi)
 {
   gimple new_stmt;
   basic_block bb;
-  tree rhs;
-  tree arg;
+  tree rhs, res, arg;
 
   gcc_assert (gimple_code (phi) == GIMPLE_PHI
 	      && gimple_phi_num_args (phi) == 2);
 
+  res = gimple_phi_result (phi);
+  /* Do not handle virtual phi nodes.  */
+  if (!is_gimple_reg (SSA_NAME_VAR (res)))
+    return;
+
   bb = gimple_bb (phi);
 
   arg = degenerate_phi_result (phi);
@@ -1026,11 +1053,11 @@ replace_phi_with_cond_gimple_assign_stmt (gimple phi, tree cond,
 	}
 
       /* Build new RHS using selected condition and arguments.  */
-      rhs = build3 (COND_EXPR, TREE_TYPE (PHI_RESULT (phi)),
+      rhs = build3 (COND_EXPR, TREE_TYPE (res),
 		    unshare_expr (cond), arg_0, arg_1);
     }
 
-  new_stmt = gimple_build_assign (PHI_RESULT (phi), rhs);
+  new_stmt = gimple_build_assign (res, rhs);
   SSA_NAME_DEF_STMT (gimple_phi_result (phi)) = new_stmt;
   gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
   update_stmt (new_stmt);
@@ -1042,11 +1069,11 @@ replace_phi_with_cond_gimple_assign_stmt (gimple phi, tree cond,
     }
 }
 
-/* Replaces in LOOP all the phi nodes other than those in the
+/* Replaces in LOOP all the scalar phi nodes other than those in the
    LOOP->header block with conditional modify expressions.  */
 
 static void
-ifconvert_phi_nodes (struct loop *loop)
+predicate_all_scalar_phis (struct loop *loop)
 {
   basic_block bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
@@ -1075,7 +1102,7 @@ ifconvert_phi_nodes (struct loop *loop)
       while (!gsi_end_p (phi_gsi))
 	{
 	  phi = gsi_stmt (phi_gsi);
-	  replace_phi_with_cond_gimple_assign_stmt (phi, cond, true_bb, &gsi);
+	  predicate_scalar_phi (phi, cond, true_bb, &gsi);
 	  release_phi_node (phi);
 	  gsi_next (&phi_gsi);
 	}
@@ -1095,7 +1122,7 @@ insert_gimplified_predicates (loop_p loop)
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
-      gimple_seq stmts = bb_predicate_gimplified_stmts (bb);
+      gimple_seq stmts;
 
       if (!is_predicated (bb))
 	{
@@ -1106,15 +1133,30 @@ insert_gimplified_predicates (loop_p loop)
 	  continue;
 	}
 
+      stmts = bb_predicate_gimplified_stmts (bb);
       if (stmts)
 	{
-	  gimple_stmt_iterator gsi = gsi_last_bb (bb);
-
-	  if (gsi_end_p (gsi)
-	      || gimple_code (gsi_stmt (gsi)) == GIMPLE_COND)
-	    gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
+	  if (flag_tree_loop_if_convert_memory_writes)
+	    {
+	      /* Insert the predicate of the BB just after the label,
+		 as the if-conversion of memory writes will use this
+		 predicate.  */
+	      gimple_stmt_iterator gsi = gsi_after_labels (bb);
+	      gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
+	    }
 	  else
-	    gsi_insert_seq_after (&gsi, stmts, GSI_SAME_STMT);
+	    {
+	      /* Insert the predicate of the BB at the end of the BB
+		 as this would reduce the register pressure: the only
+		 use of this predicate will be in successor BBs.  */
+	      gimple_stmt_iterator gsi = gsi_last_bb (bb);
+
+	      if (gsi_end_p (gsi)
+		  || gimple_code (gsi_stmt (gsi)) == GIMPLE_COND)
+		gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
+	      else
+		gsi_insert_seq_after (&gsi, stmts, GSI_SAME_STMT);
+	    }
 
 	  /* Once the sequence is code generated, set it to NULL.  */
 	  set_bb_predicate_gimplified_stmts (bb, NULL);
@@ -1122,6 +1164,56 @@ insert_gimplified_predicates (loop_p loop)
     }
 }
 
+/* Predicate each write to memory in LOOP.
+
+   Replace a statement "A[i] = foo" with "A[i] = cond ? foo : A[i]"
+   with the condition COND determined from the predicate of the basic
+   block containing the statement.  */
+
+static void
+predicate_mem_writes (loop_p loop)
+{
+  unsigned int i, orig_loop_num_nodes = loop->num_nodes;
+
+  for (i = 1; i < orig_loop_num_nodes; i++)
+    {
+      gimple_stmt_iterator gsi;
+      basic_block bb = ifc_bbs[i];
+      tree cond = bb_predicate (bb);
+
+      if (is_true_predicate (cond))
+	continue;
+
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	if (is_gimple_assign (gsi_stmt (gsi))
+	    && gimple_vdef (gsi_stmt (gsi)))
+	  {
+	    gimple new_stmt, lhs_stmt, rhs_stmt;
+	    gimple stmt = gsi_stmt (gsi);
+	    tree lhs = gimple_get_lhs (stmt);
+	    tree rhs = gimple_op (stmt, 1);
+
+	    gcc_assert (gimple_num_ops (stmt) == 2);
+
+	    rhs_stmt = ifc_temp_var (TREE_TYPE (rhs), unshare_expr (rhs));
+	    gsi_insert_before (&gsi, rhs_stmt, GSI_SAME_STMT);
+	    rhs = gimple_get_lhs (rhs_stmt);
+
+	    lhs_stmt = ifc_temp_var (TREE_TYPE (lhs), unshare_expr (lhs));
+	    gsi_insert_before (&gsi, lhs_stmt, GSI_SAME_STMT);
+	    lhs = gimple_get_lhs (lhs_stmt);
+
+	    cond = unshare_expr (cond);
+	    rhs = build3 (COND_EXPR, TREE_TYPE (lhs), cond, rhs, lhs);
+
+	    new_stmt = ifc_temp_var (TREE_TYPE (lhs), rhs);
+	    gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+	    gimple_set_op (stmt, 1, gimple_get_lhs (new_stmt));
+	    update_stmt (stmt);
+	  }
+    }
+}
+
 /* Remove all GIMPLE_CONDs and GIMPLE_LABELs of all the basic blocks
    other than the exit and latch of the LOOP.  Also resets the
    GIMPLE_DEBUG information.  */
@@ -1178,7 +1270,10 @@ combine_blocks (struct loop *loop)
 
   remove_conditions_and_labels (loop);
   insert_gimplified_predicates (loop);
-  ifconvert_phi_nodes (loop);
+  predicate_all_scalar_phis (loop);
+
+  if (flag_tree_loop_if_convert_memory_writes)
+    predicate_mem_writes (loop);
 
   /* Merge basic blocks: first remove all the edges in the loop,
      except for those from the exit block.  */
@@ -1280,6 +1375,10 @@ tree_if_conversion (struct loop *loop)
      blocks into one huge basic block doing the if-conversion
      on-the-fly.  */
   combine_blocks (loop);
+
+  if (flag_tree_loop_if_convert_memory_writes)
+    mark_sym_for_renaming (gimple_vop (cfun));
+
   changed = true;
 
  cleanup:
@@ -1312,6 +1411,9 @@ main_tree_if_conversion (void)
   FOR_EACH_LOOP (li, loop, 0)
     changed |= tree_if_conversion (loop);
 
+  if (changed && flag_tree_loop_if_convert_memory_writes)
+    update_ssa (TODO_update_ssa);
+
   return changed ? TODO_cleanup_cfg : 0;
 }
 
@@ -1321,7 +1423,8 @@ static bool
 gate_tree_if_conversion (void)
 {
   return ((flag_tree_vectorize && flag_tree_loop_if_convert != 0)
-	  || flag_tree_loop_if_convert == 1);
+	  || flag_tree_loop_if_convert == 1
+	  || flag_tree_loop_if_convert_memory_writes == 1);
 }
 
 struct gimple_opt_pass pass_if_conversion =
-- 
1.7.0.4

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/4] Outline fold_or_predicates from add_to_predicate_list.
  2010-07-08 21:41 ` [PATCH 2/4] Outline fold_or_predicates from add_to_predicate_list Sebastian Pop
@ 2010-07-09 12:13   ` Richard Guenther
  2010-07-09 16:43     ` Sebastian Pop
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Guenther @ 2010-07-09 12:13 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, matz

On Thu, 8 Jul 2010, Sebastian Pop wrote:

> 	* tree-if-conv.c (fold_or_predicates): New.
> 	(add_to_predicate_list): Call it.

I requested this change before the previous patch was checked in.
Please do not ignore reviews this way.  Thanks.

> ---
>  gcc/tree-if-conv.c |   43 ++++++++++++++++++++++---------------------
>  1 files changed, 22 insertions(+), 21 deletions(-)
> 
> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> index 34b4159..cac5a3b 100644
> --- a/gcc/tree-if-conv.c
> +++ b/gcc/tree-if-conv.c
> @@ -300,6 +300,27 @@ parse_predicate (tree cond, tree *op0, tree *op1)
>    return ERROR_MARK;
>  }
>  
> +/* Returns the fold of predicate C1 OR C2.  */
> +
> +static tree
> +fold_or_predicates (tree c1, tree c2)
> +{
> +  tree op1a, op1b, op2a, op2b;
> +  enum tree_code code1 = parse_predicate (c1, &op1a, &op1b);
> +  enum tree_code code2 = parse_predicate (c2, &op2a, &op2b);
> +
> +  if (code1 != ERROR_MARK && code2 != ERROR_MARK)
> +    {
> +      tree t = maybe_fold_or_comparisons (code1, op1a, op1b,
> +					  code2, op2a, op2b);
> +      if (t)
> +	return t;
> +    }
> +
> +  return fold_build2_loc (UNKNOWN_LOCATION, TRUTH_OR_EXPR,

Why did you change it to UNKNOWN_LOCATION?  Instead pass in
the location from the caller here.

Ok with that change (_not_ as a followup).

Thanks,
Richard.

> +			  boolean_type_node, c1, c2);
> +}
> +
>  /* Add condition NC to the predicate list of basic block BB.  */
>  
>  static inline void
> @@ -313,27 +334,7 @@ add_to_predicate_list (basic_block bb, tree nc)
>    if (!is_predicated (bb))
>      bc = nc;
>    else
> -    {
> -      enum tree_code code1, code2;
> -      tree op1a, op1b, op2a, op2b;
> -
> -      bc = bb_predicate (bb);
> -      code1 = parse_predicate (bc, &op1a, &op1b);
> -      code2 = parse_predicate (nc, &op2a, &op2b);
> -
> -      if (code1 != ERROR_MARK && code2 != ERROR_MARK)
> -	{
> -	  tree t = maybe_fold_or_comparisons (code1, op1a, op1b,
> -					      code2, op2a, op2b);
> -	  if (!t)
> -	    t = fold_build2_loc (EXPR_LOCATION (bc), TRUTH_OR_EXPR,
> -				 boolean_type_node, bc, nc);
> -	  bc = t;
> -	}
> -      else
> -	bc = fold_build2_loc (EXPR_LOCATION (bc), TRUTH_OR_EXPR,
> -			      boolean_type_node, bc, nc);
> -    }
> +    bc = fold_or_predicates (nc, bb_predicate (bb));
>  
>    if (!is_gimple_condexpr (bc))
>      {

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/4] Outline fold_or_predicates from add_to_predicate_list.
  2010-07-09 12:13   ` Richard Guenther
@ 2010-07-09 16:43     ` Sebastian Pop
  2010-07-09 17:04       ` Sebastian Pop
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Pop @ 2010-07-09 16:43 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, matz

On Fri, Jul 9, 2010 at 07:12, Richard Guenther <rguenther@suse.de> wrote:
> On Thu, 8 Jul 2010, Sebastian Pop wrote:
>
>>       * tree-if-conv.c (fold_or_predicates): New.
>>       (add_to_predicate_list): Call it.
>
> I requested this change before the previous patch was checked in.

No, you did not request this change.

> Please do not ignore reviews this way.  Thanks.

I do not ignore reviews.

>
>> ---
>>  gcc/tree-if-conv.c |   43 ++++++++++++++++++++++---------------------
>>  1 files changed, 22 insertions(+), 21 deletions(-)
>>
>> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
>> index 34b4159..cac5a3b 100644
>> --- a/gcc/tree-if-conv.c
>> +++ b/gcc/tree-if-conv.c
>> @@ -300,6 +300,27 @@ parse_predicate (tree cond, tree *op0, tree *op1)
>>    return ERROR_MARK;
>>  }
>>
>> +/* Returns the fold of predicate C1 OR C2.  */
>> +
>> +static tree
>> +fold_or_predicates (tree c1, tree c2)
>> +{
>> +  tree op1a, op1b, op2a, op2b;
>> +  enum tree_code code1 = parse_predicate (c1, &op1a, &op1b);
>> +  enum tree_code code2 = parse_predicate (c2, &op2a, &op2b);
>> +
>> +  if (code1 != ERROR_MARK && code2 != ERROR_MARK)
>> +    {
>> +      tree t = maybe_fold_or_comparisons (code1, op1a, op1b,
>> +                                       code2, op2a, op2b);
>> +      if (t)
>> +     return t;
>> +    }
>> +
>> +  return fold_build2_loc (UNKNOWN_LOCATION, TRUTH_OR_EXPR,
>
> Why did you change it to UNKNOWN_LOCATION?  Instead pass in
> the location from the caller here.
>
> Ok with that change (_not_ as a followup).
>

I will do this and I will post the updated patch.

Thanks,
Sebastian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/4] Outline fold_or_predicates from add_to_predicate_list.
  2010-07-09 16:43     ` Sebastian Pop
@ 2010-07-09 17:04       ` Sebastian Pop
  2010-07-09 18:25         ` Richard Guenther
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Pop @ 2010-07-09 17:04 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, matz

[-- Attachment #1: Type: text/plain, Size: 1622 bytes --]

On Fri, Jul 9, 2010 at 11:42, Sebastian Pop <sebpop@gmail.com> wrote:
> On Fri, Jul 9, 2010 at 07:12, Richard Guenther <rguenther@suse.de> wrote:
>> On Thu, 8 Jul 2010, Sebastian Pop wrote:
>>
>>>       * tree-if-conv.c (fold_or_predicates): New.
>>>       (add_to_predicate_list): Call it.
>>

Please see attached the updated patch.
The changes with respect to the previous patch are:

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index b0ac9b2..0f1caaa 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -300,10 +300,10 @@ parse_predicate (tree cond, tree *op0, tree *op1)
   return ERROR_MARK;
 }

-/* Returns the fold of predicate C1 OR C2.  */
+/* Returns the fold of predicate C1 OR C2 at location LOC.  */

 static tree
-fold_or_predicates (tree c1, tree c2)
+fold_or_predicates (location_t loc, tree c1, tree c2)
 {
   tree op1a, op1b, op2a, op2b;
   enum tree_code code1 = parse_predicate (c1, &op1a, &op1b);
@@ -317,8 +317,7 @@ fold_or_predicates (tree c1, tree c2)
 	return t;
     }

-  return fold_build2_loc (UNKNOWN_LOCATION, TRUTH_OR_EXPR,
-			  boolean_type_node, c1, c2);
+  return fold_build2_loc (loc, TRUTH_OR_EXPR, boolean_type_node, c1, c2);
 }

 /* Add condition NC to the predicate list of basic block BB.  */
@@ -334,7 +333,10 @@ add_to_predicate_list (basic_block bb, tree nc)
   if (!is_predicated (bb))
     bc = nc;
   else
-    bc = fold_or_predicates (nc, bb_predicate (bb));
+    {
+      bc = bb_predicate (bb);
+      bc = fold_or_predicates (EXPR_LOCATION (bc), nc, bc);
+    }

   if (!is_gimple_condexpr (bc))
     {

[-- Attachment #2: 0001-Outline-fold_or_predicates-from-add_to_predicate_lis.patch --]
[-- Type: text/x-patch, Size: 2107 bytes --]

From bf4fc497c5d8c7b02b1b03640db6ddff65a227d0 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Thu, 8 Jul 2010 13:15:14 -0500
Subject: [PATCH] Outline fold_or_predicates from add_to_predicate_list.

	* tree-if-conv.c (fold_or_predicates): New.
	(add_to_predicate_list): Call it.
---
 gcc/tree-if-conv.c |   39 +++++++++++++++++++++------------------
 1 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 7058ee4..0f1caaa 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -300,6 +300,26 @@ parse_predicate (tree cond, tree *op0, tree *op1)
   return ERROR_MARK;
 }
 
+/* Returns the fold of predicate C1 OR C2 at location LOC.  */
+
+static tree
+fold_or_predicates (location_t loc, tree c1, tree c2)
+{
+  tree op1a, op1b, op2a, op2b;
+  enum tree_code code1 = parse_predicate (c1, &op1a, &op1b);
+  enum tree_code code2 = parse_predicate (c2, &op2a, &op2b);
+
+  if (code1 != ERROR_MARK && code2 != ERROR_MARK)
+    {
+      tree t = maybe_fold_or_comparisons (code1, op1a, op1b,
+					  code2, op2a, op2b);
+      if (t)
+	return t;
+    }
+
+  return fold_build2_loc (loc, TRUTH_OR_EXPR, boolean_type_node, c1, c2);
+}
+
 /* Add condition NC to the predicate list of basic block BB.  */
 
 static inline void
@@ -314,25 +334,8 @@ add_to_predicate_list (basic_block bb, tree nc)
     bc = nc;
   else
     {
-      enum tree_code code1, code2;
-      tree op1a, op1b, op2a, op2b;
-
       bc = bb_predicate (bb);
-      code1 = parse_predicate (bc, &op1a, &op1b);
-      code2 = parse_predicate (nc, &op2a, &op2b);
-
-      if (code1 != ERROR_MARK && code2 != ERROR_MARK)
-	{
-	  tree t = maybe_fold_or_comparisons (code1, op1a, op1b,
-					      code2, op2a, op2b);
-	  if (!t)
-	    t = fold_build2_loc (EXPR_LOCATION (bc), TRUTH_OR_EXPR,
-				 boolean_type_node, bc, nc);
-	  bc = t;
-	}
-      else
-	bc = fold_build2_loc (EXPR_LOCATION (bc), TRUTH_OR_EXPR,
-			      boolean_type_node, bc, nc);
+      bc = fold_or_predicates (EXPR_LOCATION (bc), nc, bc);
     }
 
   if (!is_gimple_condexpr (bc))
-- 
1.7.0.4


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/4] Outline fold_or_predicates from add_to_predicate_list.
  2010-07-09 17:04       ` Sebastian Pop
@ 2010-07-09 18:25         ` Richard Guenther
  2010-07-09 18:59           ` Sebastian Pop
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Guenther @ 2010-07-09 18:25 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Guenther, gcc-patches, matz

On Fri, Jul 9, 2010 at 7:03 PM, Sebastian Pop <sebpop@gmail.com> wrote:
> On Fri, Jul 9, 2010 at 11:42, Sebastian Pop <sebpop@gmail.com> wrote:
>> On Fri, Jul 9, 2010 at 07:12, Richard Guenther <rguenther@suse.de> wrote:
>>> On Thu, 8 Jul 2010, Sebastian Pop wrote:
>>>
>>>>       * tree-if-conv.c (fold_or_predicates): New.
>>>>       (add_to_predicate_list): Call it.
>>>
>
> Please see attached the updated patch.
> The changes with respect to the previous patch are:

Ok.

Thanks,
Richard.

> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> index b0ac9b2..0f1caaa 100644
> --- a/gcc/tree-if-conv.c
> +++ b/gcc/tree-if-conv.c
> @@ -300,10 +300,10 @@ parse_predicate (tree cond, tree *op0, tree *op1)
>   return ERROR_MARK;
>  }
>
> -/* Returns the fold of predicate C1 OR C2.  */
> +/* Returns the fold of predicate C1 OR C2 at location LOC.  */
>
>  static tree
> -fold_or_predicates (tree c1, tree c2)
> +fold_or_predicates (location_t loc, tree c1, tree c2)
>  {
>   tree op1a, op1b, op2a, op2b;
>   enum tree_code code1 = parse_predicate (c1, &op1a, &op1b);
> @@ -317,8 +317,7 @@ fold_or_predicates (tree c1, tree c2)
>        return t;
>     }
>
> -  return fold_build2_loc (UNKNOWN_LOCATION, TRUTH_OR_EXPR,
> -                         boolean_type_node, c1, c2);
> +  return fold_build2_loc (loc, TRUTH_OR_EXPR, boolean_type_node, c1, c2);
>  }
>
>  /* Add condition NC to the predicate list of basic block BB.  */
> @@ -334,7 +333,10 @@ add_to_predicate_list (basic_block bb, tree nc)
>   if (!is_predicated (bb))
>     bc = nc;
>   else
> -    bc = fold_or_predicates (nc, bb_predicate (bb));
> +    {
> +      bc = bb_predicate (bb);
> +      bc = fold_or_predicates (EXPR_LOCATION (bc), nc, bc);
> +    }
>
>   if (!is_gimple_condexpr (bc))
>     {
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/4] Outline fold_or_predicates from add_to_predicate_list.
  2010-07-09 18:25         ` Richard Guenther
@ 2010-07-09 18:59           ` Sebastian Pop
  0 siblings, 0 replies; 32+ messages in thread
From: Sebastian Pop @ 2010-07-09 18:59 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Richard Guenther, gcc-patches, matz

On Fri, Jul 9, 2010 at 13:25, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Fri, Jul 9, 2010 at 7:03 PM, Sebastian Pop <sebpop@gmail.com> wrote:
>> On Fri, Jul 9, 2010 at 11:42, Sebastian Pop <sebpop@gmail.com> wrote:
>>> On Fri, Jul 9, 2010 at 07:12, Richard Guenther <rguenther@suse.de> wrote:
>>>> On Thu, 8 Jul 2010, Sebastian Pop wrote:
>>>>
>>>>>       * tree-if-conv.c (fold_or_predicates): New.
>>>>>       (add_to_predicate_list): Call it.
>>>>
>>
>> Please see attached the updated patch.
>> The changes with respect to the previous patch are:
>
> Ok.
>

Committed r162007.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/4] if-conversion of loops with conditionals containing memory writes
  2010-07-08 21:41 [PATCH 0/4] if-conversion of loops with conditionals containing memory writes Sebastian Pop
                   ` (3 preceding siblings ...)
  2010-07-08 21:42 ` [PATCH 4/4] Speed-up ifcvt_memrefs_wont_trap caching previous results Sebastian Pop
@ 2010-08-12 16:19 ` Sebastian Pop
  4 siblings, 0 replies; 32+ messages in thread
From: Sebastian Pop @ 2010-08-12 16:19 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, matz, Sebastian Pop

On Thu, Jul 8, 2010 at 16:41, Sebastian Pop <sebpop@gmail.com> wrote:
> Hi,
>
> following Michael's review, this patch-set reworks the previous
> patches in order to speed up the computation of non-trapping property
> of data reference accesses in the statements to be if-converted.
>
> 1/4 is identical to the previously submitted patch.
> 2/4 is a trivial clean up.
> 3/4 simplifies the control flow in the previously submitted patch and
>    now calls the function outlined in 2/4 to OR predicates.
> 4/4 is the speeding up part that Michael proposed.
>
> This patch-set passed regstrap on amd64-linux.  Ok for trunk?

Ping.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] Add flag -ftree-loop-if-convert-memory-writes.
  2010-07-08 21:42 ` [PATCH 1/4] Add flag -ftree-loop-if-convert-memory-writes Sebastian Pop
@ 2010-08-13  8:58   ` Richard Guenther
  2010-08-13 20:43     ` Sebastian Pop
                       ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Richard Guenther @ 2010-08-13  8:58 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, matz

On Thu, 8 Jul 2010, Sebastian Pop wrote:

> This patch adds a flag that controls the replacement of the memory
> writes that are in predicated basic blocks with a full write:
> 
> for (...)
>   if (cond)
>     A[i] = foo
> 
> is replaced with:
> 
> for (...)
>   A[i] = cond ? foo : A[i]
> 
> In order to do this, we have to call gimple_could_trap_p instead of
> gimple_assign_rhs_could_trap_p, as we have to also check that the LHS
> of assign stmts does not trap.

Comments below.

> 	* common.opt (ftree-loop-if-convert-memory-writes): New flag.
> 	* doc/invoke.texi (ftree-loop-if-convert-memory-writes): Documented.
> 	* tree-if-conv.c (if_convertible_phi_p): Allow virtual phi nodes when
> 	flag_loop_if_convert_memory_writes is set.
> 	(if_convertible_gimple_assign_stmt_p): Allow memory reads and writes
> 	Do not handle types that do not match is_gimple_reg_type.
> 	Remove loop and bb parameters.  Call gimple_could_trap_p instead of
> 	when flag_loop_if_convert_memory_writes	is set, as LHS can contain
> 	memory refs.
> 	(if_convertible_stmt_p): Remove loop and bb parameters.  Update calls
> 	to if_convertible_gimple_assign_stmt_p.
> 	(if_convertible_loop_p): Update call to if_convertible_stmt_p.
> 	(gimplify_condition): New.
> 	(replace_phi_with_cond_gimple_assign_stmt): Renamed
> 	predicate_scalar_phi.  Do not handle virtual phi nodes.
> 	(ifconvert_phi_nodes): Renamed predicate_all_scalar_phis.
> 	Call predicate_scalar_phi.
> 	(insert_gimplified_predicates): Insert the gimplified predicate of a BB
> 	just after the labels for flag_loop_if_convert_memory_writes, otherwise
> 	insert the predicate in the end of the BB.
> 	(predicate_mem_writes): New.
> 	(combine_blocks): Call predicate_all_scalar_phis.  When
> 	flag_loop_if_convert_memory_writes is set, call predicate_mem_writes.
> 	(tree_if_conversion): Call mark_sym_for_renaming when
> 	flag_loop_if_convert_memory_writes is set.
> 	(main_tree_if_conversion): Call update_ssa when
> 	flag_loop_if_convert_memory_writes is set.
> 
> 	* gcc.dg/tree-ssa/ifc-4.c: New.
> 	* gcc.dg/tree-ssa/ifc-7.c: New.
> ---
>  gcc/common.opt                        |    4 +
>  gcc/doc/invoke.texi                   |   20 ++++-
>  gcc/testsuite/gcc.dg/tree-ssa/ifc-4.c |   53 ++++++++++
>  gcc/testsuite/gcc.dg/tree-ssa/ifc-7.c |   29 +++++
>  gcc/tree-if-conv.c                    |  181 ++++++++++++++++++++++++++-------
>  5 files changed, 247 insertions(+), 40 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-7.c
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 111d7b7..ceb94f8 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -657,6 +657,10 @@ ftree-loop-if-convert
>  Common Report Var(flag_tree_loop_if_convert) Init(-1) Optimization
>  Convert conditional jumps in innermost loops to branchless equivalents
>  
> +ftree-loop-if-convert-memory-writes
> +Common Report Var(flag_tree_loop_if_convert_memory_writes) Optimization
> +Also if-convert conditional jumps containing memory writes
> +
>  ; -finhibit-size-directive inhibits output of .size for ELF.
>  ; This is used only for compiling crtstuff.c,
>  ; and it may be extended to other effects
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 4d6a1af..3634069 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -383,7 +383,8 @@ Objective-C and Objective-C++ Dialects}.
>  -fstrict-aliasing -fstrict-overflow -fthread-jumps -ftracer @gol
>  -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop @gol
>  -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
> --ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
> +-ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
> +-ftree-loop-if-convert-memory-writes -ftree-loop-im @gol
>  -ftree-phiprop -ftree-loop-distribution @gol
>  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
>  -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
> @@ -6890,6 +6891,23 @@ the innermost loops in order to improve the ability of the
>  vectorization pass to handle these loops.  This is enabled by default
>  if vectorization is enabled.
>  
> +@item -ftree-loop-if-convert-memory-writes
> +Attempt to also if-convert conditional jumps containing memory writes.
> +This transformation can be unsafe for multi-threaded programs as it
> +transforms conditional memory writes into unconditional memory writes.
> +For example,
> +@smallexample
> +for (i = 0; i < N; i++)
> +  if (cond)
> +    A[i] = expr;
> +@end smallexample
> +would be transformed to
> +@smallexample
> +for (i = 0; i < N; i++)
> +  A[i] = cond ? expr : A[i];
> +@end smallexample
> +potentially producing data races.
> +
>  @item -ftree-loop-distribution
>  Perform loop distribution.  This flag can improve cache performance on
>  big loop bodies and allow further loop optimizations, like
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-4.c
> new file mode 100644
> index 0000000..beb1a0e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-4.c
> @@ -0,0 +1,53 @@
> +/* { dg-do compile } */
> +/* { dg-options "-c -O2 -ftree-vectorize -fdump-tree-ifcvt-stats" { target *-*-* } } */
> +
> +struct ht
> +{
> +  void * (*alloc_subobject) (int);
> +};
> +typedef struct cpp_reader cpp_reader;
> +typedef struct cpp_token cpp_token;
> +typedef struct cpp_macro cpp_macro;
> +enum cpp_ttype
> +{
> +    CPP_PASTE,
> +};
> +struct cpp_token {
> +  __extension__ enum cpp_ttype type : 8;
> +} cpp_comment_table;
> +struct cpp_macro {
> +  union cpp_macro_u
> +  {
> +    cpp_token * tokens;
> +  } exp;
> +  unsigned int count;
> +};
> +struct cpp_reader
> +{
> +  struct ht *hash_table;
> +};
> +create_iso_definition (cpp_reader *pfile, cpp_macro *macro)
> +{
> +  unsigned int num_extra_tokens = 0;
> +  {
> +    cpp_token *tokns =
> +      (cpp_token *) pfile->hash_table->alloc_subobject (sizeof (cpp_token)
> +							* macro->count);
> +    {
> +      cpp_token *normal_dest = tokns;
> +      cpp_token *extra_dest = tokns + macro->count - num_extra_tokens;
> +      unsigned int i;
> +      for (i = 0; i < macro->count; i++)
> +	{
> +	  if (macro->exp.tokens[i].type == CPP_PASTE)
> +	    *extra_dest++ = macro->exp.tokens[i];
> +	  else
> +	    *normal_dest++ = macro->exp.tokens[i];
> +	}
> +    }
> +  }
> +}
> +
> +/* This cannot be if-converted because the stores are to aggregate types.  */
> +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 0 "ifcvt" } } */
> +/* { dg-final { cleanup-tree-dump "ifcvt" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-7.c
> new file mode 100644
> index 0000000..4d26dc7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-7.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-c -O2 -ftree-vectorize" { target *-*-* } } */
> +
> +typedef struct eqn_d
> +{
> +  int *coef;
> +} *eqn;
> +typedef struct omega_pb_d
> +{
> +  eqn subs;
> +} *omega_pb;
> +
> +omega_pb omega_solve_problem (omega_pb);
> +
> +omega_pb
> +omega_solve_geq (omega_pb pb, int n)
> +{
> +  int i, e;
> +  int j = 0;
> +
> +  for (e = n - 1; e >= 0; e--)
> +    if (pb->subs[e].coef[i] != pb->subs[e].coef[j])
> +      {
> +	pb->subs[e].coef[i] = j;
> +	pb->subs[e].coef[j] = i;
> +      }
> +
> +  return omega_solve_problem (pb);
> +}
> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> index 7058ee4..34b4159 100644
> --- a/gcc/tree-if-conv.c
> +++ b/gcc/tree-if-conv.c
> @@ -385,9 +385,12 @@ bb_with_exit_edge_p (struct loop *loop, basic_block bb)
>     and it belongs to basic block BB.
>  
>     PHI is not if-convertible if:
> -   - it has more than 2 arguments,
> -   - virtual PHI is immediately used in another PHI node,
> -   - virtual PHI on BB other than header.  */
> +   - it has more than 2 arguments.
> +
> +   When the flag_tree_loop_if_convert_memory_writes is not set, PHI is not
> +   if-convertible if:
> +   - a virtual PHI is immediately used in another PHI node,
> +   - there is a virtual PHI in a BB other than the loop->header.  */
>  
>  static bool
>  if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
> @@ -405,6 +408,12 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
>        return false;
>      }
>  
> +  if (flag_tree_loop_if_convert_memory_writes)
> +    return true;
> +
> +  /* When the flag_tree_loop_if_convert_memory_writes is not set, check
> +     that there are no memory writes in the branches of the loop to be
> +     if-converted.  */
>    if (!is_gimple_reg (SSA_NAME_VAR (gimple_phi_result (phi))))
>      {
>        imm_use_iterator imm_iter;
> @@ -413,9 +422,10 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
>        if (bb != loop->header)
>  	{
>  	  if (dump_file && (dump_flags & TDF_DETAILS))
> -	    fprintf (dump_file, "Virtual phi not on loop header.\n");
> +	    fprintf (dump_file, "Virtual phi not on loop->header.\n");
>  	  return false;
>  	}
> +
>        FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_phi_result (phi))
>  	{
>  	  if (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI)
> @@ -435,15 +445,13 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
>     GIMPLE_ASSIGN statement is not if-convertible if,
>     - it is not movable,
>     - it could trap,
> -   - LHS is not var decl.
> -
> -   GIMPLE_ASSIGN is part of block BB, which is inside loop LOOP.  */
> +   - LHS is not var decl.  */
>  
>  static bool
> -if_convertible_gimple_assign_stmt_p (struct loop *loop, basic_block bb,
> -    				     gimple stmt)
> +if_convertible_gimple_assign_stmt_p (gimple stmt)
>  {
>    tree lhs = gimple_assign_lhs (stmt);
> +  basic_block bb;
>  
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      {
> @@ -451,6 +459,9 @@ if_convertible_gimple_assign_stmt_p (struct loop *loop, basic_block bb,
>        print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>      }
>  
> +  if (!is_gimple_reg_type (TREE_TYPE (lhs)))
> +    return false;
> +
>    /* Some of these constrains might be too conservative.  */
>    if (stmt_ends_bb_p (stmt)
>        || gimple_has_volatile_ops (stmt)
> @@ -463,6 +474,17 @@ if_convertible_gimple_assign_stmt_p (struct loop *loop, basic_block bb,
>        return false;
>      }
>  
> +  if (flag_tree_loop_if_convert_memory_writes)
> +    {
> +      if (gimple_could_trap_p (stmt))
> +	{
> +	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	    fprintf (dump_file, "tree could trap...\n");
> +	  return false;
> +	}
> +      return true;
> +    }
> +
>    if (gimple_assign_rhs_could_trap_p (stmt))
>      {
>        if (dump_file && (dump_flags & TDF_DETAILS))
> @@ -470,9 +492,11 @@ if_convertible_gimple_assign_stmt_p (struct loop *loop, basic_block bb,
>        return false;
>      }
>  
> +  bb = gimple_bb (stmt);
> +
>    if (TREE_CODE (lhs) != SSA_NAME
> -      && bb != loop->header
> -      && !bb_with_exit_edge_p (loop, bb))
> +      && bb != bb->loop_father->header
> +      && !bb_with_exit_edge_p (bb->loop_father, bb))
>      {
>        if (dump_file && (dump_flags & TDF_DETAILS))
>  	{
> @@ -489,12 +513,10 @@ if_convertible_gimple_assign_stmt_p (struct loop *loop, basic_block bb,
>  
>     A statement is if-convertible if:
>     - it is an if-convertible GIMPLE_ASSGIN,
> -   - it is a GIMPLE_LABEL or a GIMPLE_COND.
> -
> -   STMT is inside BB, which is inside loop LOOP.  */
> +   - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
>  
>  static bool
> -if_convertible_stmt_p (struct loop *loop, basic_block bb, gimple stmt)
> +if_convertible_stmt_p (gimple stmt)
>  {
>    switch (gimple_code (stmt))
>      {
> @@ -504,7 +526,7 @@ if_convertible_stmt_p (struct loop *loop, basic_block bb, gimple stmt)
>        return true;
>  
>      case GIMPLE_ASSIGN:
> -      return if_convertible_gimple_assign_stmt_p (loop, bb, stmt);
> +      return if_convertible_gimple_assign_stmt_p (stmt);
>  
>      default:
>        /* Don't know what to do with 'em so don't do anything.  */
> @@ -875,7 +897,7 @@ if_convertible_loop_p (struct loop *loop)
>  	continue;
>  
>        for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
> -	if (!if_convertible_stmt_p (loop, bb, gsi_stmt (itr)))
> +	if (!if_convertible_stmt_p (gsi_stmt (itr)))
>  	  return false;
>      }
>  
> @@ -895,7 +917,7 @@ if_convertible_loop_p (struct loop *loop)
>  static basic_block
>  find_phi_replacement_condition (struct loop *loop,
>  				basic_block bb, tree *cond,
> -                                gimple_stmt_iterator *gsi)
> +				gimple_stmt_iterator *gsi)
>  {
>    edge first_edge, second_edge;
>    tree tmp_cond;
> @@ -980,8 +1002,9 @@ find_phi_replacement_condition (struct loop *loop,
>    return first_edge->src;
>  }
>  
> -/* Replace PHI node with conditional modify expr using COND.  This
> -   routine does not handle PHI nodes with more than two arguments.
> +/* Replace a scalar PHI node with a COND_EXPR using COND as condition.
> +   This routine does not handle PHI nodes with more than two
> +   arguments.
>  
>     For example,
>       S1: A = PHI <x1(1), x2(5)
> @@ -993,18 +1016,22 @@ find_phi_replacement_condition (struct loop *loop,
>     TRUE_BB is selected.  */
>  
>  static void
> -replace_phi_with_cond_gimple_assign_stmt (gimple phi, tree cond,
> -    					  basic_block true_bb,
> -                                   	  gimple_stmt_iterator *gsi)
> +predicate_scalar_phi (gimple phi, tree cond,
> +		      basic_block true_bb,
> +		      gimple_stmt_iterator *gsi)
>  {
>    gimple new_stmt;
>    basic_block bb;
> -  tree rhs;
> -  tree arg;
> +  tree rhs, res, arg;
>  
>    gcc_assert (gimple_code (phi) == GIMPLE_PHI
>  	      && gimple_phi_num_args (phi) == 2);
>  
> +  res = gimple_phi_result (phi);
> +  /* Do not handle virtual phi nodes.  */
> +  if (!is_gimple_reg (SSA_NAME_VAR (res)))
> +    return;
> +
>    bb = gimple_bb (phi);
>  
>    arg = degenerate_phi_result (phi);
> @@ -1026,11 +1053,11 @@ replace_phi_with_cond_gimple_assign_stmt (gimple phi, tree cond,
>  	}
>  
>        /* Build new RHS using selected condition and arguments.  */
> -      rhs = build3 (COND_EXPR, TREE_TYPE (PHI_RESULT (phi)),
> +      rhs = build3 (COND_EXPR, TREE_TYPE (res),
>  		    unshare_expr (cond), arg_0, arg_1);
>      }
>  
> -  new_stmt = gimple_build_assign (PHI_RESULT (phi), rhs);
> +  new_stmt = gimple_build_assign (res, rhs);
>    SSA_NAME_DEF_STMT (gimple_phi_result (phi)) = new_stmt;
>    gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
>    update_stmt (new_stmt);
> @@ -1042,11 +1069,11 @@ replace_phi_with_cond_gimple_assign_stmt (gimple phi, tree cond,
>      }
>  }
>  
> -/* Replaces in LOOP all the phi nodes other than those in the
> +/* Replaces in LOOP all the scalar phi nodes other than those in the
>     LOOP->header block with conditional modify expressions.  */
>  
>  static void
> -ifconvert_phi_nodes (struct loop *loop)
> +predicate_all_scalar_phis (struct loop *loop)
>  {
>    basic_block bb;
>    unsigned int orig_loop_num_nodes = loop->num_nodes;
> @@ -1075,7 +1102,7 @@ ifconvert_phi_nodes (struct loop *loop)
>        while (!gsi_end_p (phi_gsi))
>  	{
>  	  phi = gsi_stmt (phi_gsi);
> -	  replace_phi_with_cond_gimple_assign_stmt (phi, cond, true_bb, &gsi);
> +	  predicate_scalar_phi (phi, cond, true_bb, &gsi);
>  	  release_phi_node (phi);
>  	  gsi_next (&phi_gsi);
>  	}
> @@ -1095,7 +1122,7 @@ insert_gimplified_predicates (loop_p loop)
>    for (i = 0; i < loop->num_nodes; i++)
>      {
>        basic_block bb = ifc_bbs[i];
> -      gimple_seq stmts = bb_predicate_gimplified_stmts (bb);
> +      gimple_seq stmts;
>  
>        if (!is_predicated (bb))
>  	{
> @@ -1106,15 +1133,30 @@ insert_gimplified_predicates (loop_p loop)
>  	  continue;
>  	}
>  
> +      stmts = bb_predicate_gimplified_stmts (bb);
>        if (stmts)
>  	{
> -	  gimple_stmt_iterator gsi = gsi_last_bb (bb);
> -
> -	  if (gsi_end_p (gsi)
> -	      || gimple_code (gsi_stmt (gsi)) == GIMPLE_COND)
> -	    gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
> +	  if (flag_tree_loop_if_convert_memory_writes)
> +	    {
> +	      /* Insert the predicate of the BB just after the label,
> +		 as the if-conversion of memory writes will use this
> +		 predicate.  */
> +	      gimple_stmt_iterator gsi = gsi_after_labels (bb);
> +	      gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
> +	    }
>  	  else
> -	    gsi_insert_seq_after (&gsi, stmts, GSI_SAME_STMT);
> +	    {
> +	      /* Insert the predicate of the BB at the end of the BB
> +		 as this would reduce the register pressure: the only
> +		 use of this predicate will be in successor BBs.  */
> +	      gimple_stmt_iterator gsi = gsi_last_bb (bb);
> +
> +	      if (gsi_end_p (gsi)
> +		  || gimple_code (gsi_stmt (gsi)) == GIMPLE_COND)

  || stmt_ends_bb_p (gsi_stmt (gsi))

> +		gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
> +	      else
> +		gsi_insert_seq_after (&gsi, stmts, GSI_SAME_STMT);
> +	    }

It's not clear to me why the case for 
flag_tree_loop_if_convert_memory_writes is special.

>  	  /* Once the sequence is code generated, set it to NULL.  */
>  	  set_bb_predicate_gimplified_stmts (bb, NULL);
> @@ -1122,6 +1164,56 @@ insert_gimplified_predicates (loop_p loop)
>      }
>  }
>  
> +/* Predicate each write to memory in LOOP.
> +
> +   Replace a statement "A[i] = foo" with "A[i] = cond ? foo : A[i]"
> +   with the condition COND determined from the predicate of the basic
> +   block containing the statement.  */

Instead of writing A[i] = cond ? foo : A[i] in the comment, can you
explain the CFG transformation here, especially the placement of
the load and the store?  See above - this probably would have
made things clearer to me - in the function below there are no
comments either.

> +static void
> +predicate_mem_writes (loop_p loop)
> +{
> +  unsigned int i, orig_loop_num_nodes = loop->num_nodes;
> +
> +  for (i = 1; i < orig_loop_num_nodes; i++)
> +    {
> +      gimple_stmt_iterator gsi;
> +      basic_block bb = ifc_bbs[i];
> +      tree cond = bb_predicate (bb);
> +
> +      if (is_true_predicate (cond))
> +	continue;
> +
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))

So this is another loop over all loops and statements.  Why not
do this loop once and dispatch to mem/scalar if-conversion there?

> +	if (is_gimple_assign (gsi_stmt (gsi))

Stores always are gimple_assign_single_p, so that's a better predicate...

> +	    && gimple_vdef (gsi_stmt (gsi)))
> +	  {
> +	    gimple new_stmt, lhs_stmt, rhs_stmt;
> +	    gimple stmt = gsi_stmt (gsi);
> +	    tree lhs = gimple_get_lhs (stmt);
> +	    tree rhs = gimple_op (stmt, 1);

Please use gimple_assign_lhs and gimple_assign_rhs1.

> +
> +	    gcc_assert (gimple_num_ops (stmt) == 2);

... and makes this a useless assert (it's too lowlevel anyway).

> +
> +	    rhs_stmt = ifc_temp_var (TREE_TYPE (rhs), unshare_expr (rhs));
> +	    gsi_insert_before (&gsi, rhs_stmt, GSI_SAME_STMT);
> +	    rhs = gimple_get_lhs (rhs_stmt);

gimple_assign_lhs.  The ifc_temp_var helper doesn't look very well
designed - why not pass it a gsi to insert the stmt and return
the SSA name temporary instead?

> +
> +	    lhs_stmt = ifc_temp_var (TREE_TYPE (lhs), unshare_expr (lhs));
> +	    gsi_insert_before (&gsi, lhs_stmt, GSI_SAME_STMT);
> +	    lhs = gimple_get_lhs (lhs_stmt);

You should use the same type for lhs_stmt and rhs_stmt temporary,
in fact the type of the lhs is the correct one to use
(useless_type_conversion (lhs_type, rhs_type) is known to be true).

> +	    cond = unshare_expr (cond);
> +	    rhs = build3 (COND_EXPR, TREE_TYPE (lhs), cond, rhs, lhs);
> +
> +	    new_stmt = ifc_temp_var (TREE_TYPE (lhs), rhs);
> +	    gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
> +	    gimple_set_op (stmt, 1, gimple_get_lhs (new_stmt));

gimple_assign_set_rhs1.

So you create this series of stmts in place of the conditional store,
which means I don't see why materializing the condition before
the cond-expr of the dominator does not work.

> +	    update_stmt (stmt);
> +	  }
> +    }
> +}
> +
>  /* Remove all GIMPLE_CONDs and GIMPLE_LABELs of all the basic blocks
>     other than the exit and latch of the LOOP.  Also resets the
>     GIMPLE_DEBUG information.  */
> @@ -1178,7 +1270,10 @@ combine_blocks (struct loop *loop)
>  
>    remove_conditions_and_labels (loop);
>    insert_gimplified_predicates (loop);
> -  ifconvert_phi_nodes (loop);
> +  predicate_all_scalar_phis (loop);
> +
> +  if (flag_tree_loop_if_convert_memory_writes)
> +    predicate_mem_writes (loop);

As suggested above, predicate_all_scalar_phis and predicate_mem_writes
should loop over all loops/bbs once.

>    /* Merge basic blocks: first remove all the edges in the loop,
>       except for those from the exit block.  */
> @@ -1280,6 +1375,10 @@ tree_if_conversion (struct loop *loop)
>       blocks into one huge basic block doing the if-conversion
>       on-the-fly.  */
>    combine_blocks (loop);
> +
> +  if (flag_tree_loop_if_convert_memory_writes)
> +    mark_sym_for_renaming (gimple_vop (cfun));

That should have happened automatically (and in fact is not necessary
if you update virtual SSA form properly - but you can leave that
for a followup, just remove that stmt).

>    changed = true;
>  
>   cleanup:
> @@ -1312,6 +1411,9 @@ main_tree_if_conversion (void)
>    FOR_EACH_LOOP (li, loop, 0)
>      changed |= tree_if_conversion (loop);
>  
> +  if (changed && flag_tree_loop_if_convert_memory_writes)
> +    update_ssa (TODO_update_ssa);

Please return

TODO_update_ssa_only_virtuals

instead.

Please update the patch with the suggested changes and clarifications.

Thanks,
Richard.

>    return changed ? TODO_cleanup_cfg : 0;
>  }
>  
> @@ -1321,7 +1423,8 @@ static bool
>  gate_tree_if_conversion (void)
>  {
>    return ((flag_tree_vectorize && flag_tree_loop_if_convert != 0)
> -	  || flag_tree_loop_if_convert == 1);
> +	  || flag_tree_loop_if_convert == 1
> +	  || flag_tree_loop_if_convert_memory_writes == 1);
>  }
>  
>  struct gimple_opt_pass pass_if_conversion =

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] Speed-up ifcvt_memrefs_wont_trap caching previous results.
  2010-07-08 21:42 ` [PATCH 4/4] Speed-up ifcvt_memrefs_wont_trap caching previous results Sebastian Pop
@ 2010-08-13  9:02   ` Richard Guenther
  0 siblings, 0 replies; 32+ messages in thread
From: Richard Guenther @ 2010-08-13  9:02 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, matz

On Thu, 8 Jul 2010, Sebastian Pop wrote:

> This patch speeds up the ifcvt_memrefs_wont_trap computation by
> caching the results of the computations in the data references ->aux
> fields.
> 
> 	* tree-if-conv.c (struct ifc_dr): New.
> 	(memref_read_or_written_unconditionally): Use the cached values
> 	when possible.
> 	(write_memref_written_at_least_once): Same.
> 	(if_convertible_loop_p): Initialize and free DR->aux fields.
> ---
>  gcc/tree-if-conv.c |   53 ++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 files changed, 49 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> index f0bc026..655c168 100644
> --- a/gcc/tree-if-conv.c
> +++ b/gcc/tree-if-conv.c
> @@ -441,6 +441,17 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
>    return true;
>  }
>  
> +/* Records the status of a data reference.  This struct is attached to
> +   each DR->aux field.  */
> +
> +struct ifc_dr {
> +  /* -1 when not initialized, 0 when false, 1 when true.  */
> +  int dr_is_written_at_least_once;
> +
> +  /* -1 when not initialized, 0 when false, 1 when true.  */
> +  int dr_is_rw_unconditionally;
> +};
> +
>  /* Returns true when the memory reference A at position P in the data
>     references vector DRS and executed under the condition CA, is read
>     or written unconditionally.  */
> @@ -452,17 +463,27 @@ memref_read_or_written_unconditionally (int pos, data_reference_p a,
>  {
>    int i;
>    data_reference_p b;
> +  int x = ((struct ifc_dr *) a->aux)->dr_is_rw_unconditionally;

These casts are ugly.  Please add a #define like

#define DR_IFCVT(dr) ((struct ifc_dr *) a->aux)

and use that.

Ok with that change.

Richard.

> +
> +  if (x != -1)
> +    return x;
>  
>    for (i = 0; VEC_iterate (data_reference_p, drs, i, b); i++)
>      if (i != pos && same_data_refs (a, b))
>        {
>  	tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
>  
> -	if (is_true_predicate (cb)
> +	if (((struct ifc_dr *) b->aux)->dr_is_rw_unconditionally == 1
> +	    || is_true_predicate (cb)
>  	    || is_true_predicate (ca = fold_or_predicates (ca, cb)))
> -	  return true;
> +	  {
> +	    ((struct ifc_dr *) a->aux)->dr_is_rw_unconditionally = 1;
> +	    ((struct ifc_dr *) b->aux)->dr_is_rw_unconditionally = 1;
> +	    return true;
> +	  }
>        }
>  
> +  ((struct ifc_dr *) a->aux)->dr_is_rw_unconditionally = 0;
>    return false;
>  }
>  
> @@ -497,6 +518,10 @@ write_memref_written_at_least_once (int pos, data_reference_p a,
>  {
>    int i;
>    data_reference_p b;
> +  int x = ((struct ifc_dr *) a->aux)->dr_is_written_at_least_once;
> +
> +  if (x != -1)
> +    return x;
>  
>    for (i = 0; VEC_iterate (data_reference_p, drs, i, b); i++)
>      if (i != pos
> @@ -505,11 +530,17 @@ write_memref_written_at_least_once (int pos, data_reference_p a,
>        {
>  	tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
>  
> -	if (is_true_predicate (cb)
> +	if (((struct ifc_dr *) b->aux)->dr_is_written_at_least_once == 1
> +	    || is_true_predicate (cb)
>  	    || is_true_predicate (ca = fold_or_predicates (ca, cb)))
> -	  return true;
> +	  {
> +	    ((struct ifc_dr *) a->aux)->dr_is_written_at_least_once = 1;
> +	    ((struct ifc_dr *) b->aux)->dr_is_written_at_least_once = 1;
> +	    return true;
> +	  }
>        }
>  
> +  ((struct ifc_dr *) a->aux)->dr_is_written_at_least_once = 0;
>    return false;
>  }
>  
> @@ -972,6 +1003,7 @@ if_convertible_loop_p (struct loop *loop)
>    edge_iterator ei;
>    basic_block exit_bb = NULL;
>    bool res = false;
> +  data_reference_p dr;
>    VEC (data_reference_p, heap) *refs;
>    VEC (ddr_p, heap) *ddrs;
>  
> @@ -1048,6 +1080,14 @@ if_convertible_loop_p (struct loop *loop)
>  
>    res = false;
>  
> +  if (flag_tree_loop_if_convert_memory_writes)
> +    for (i = 0; VEC_iterate (data_reference_p, refs, i, dr); i++)
> +      {
> +	dr->aux = XNEW (struct ifc_dr);
> +	((struct ifc_dr *) dr->aux)->dr_is_written_at_least_once = -1;
> +	((struct ifc_dr *) dr->aux)->dr_is_rw_unconditionally = -1;
> +      }
> +
>    for (i = 0; i < loop->num_nodes; i++)
>      {
>        basic_block bb = ifc_bbs[i];
> @@ -1069,6 +1109,11 @@ if_convertible_loop_p (struct loop *loop)
>      fprintf (dump_file, "Applying if-conversion\n");
>  
>   cleanup:
> +
> +  if (flag_tree_loop_if_convert_memory_writes)
> +    for (i = 0; VEC_iterate (data_reference_p, refs, i, dr); i++)
> +      free (dr->aux);
> +
>    free_data_refs (refs);
>    free_dependence_relations (ddrs);
>    return res;
> 

-- 
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] Do not check whether memory references accessed in every iteration trap.
  2010-07-08 21:42 ` [PATCH 3/4] Do not check whether memory references accessed in every iteration trap Sebastian Pop
@ 2010-08-13  9:41   ` Richard Guenther
  2010-08-17 17:18     ` Sebastian Pop
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Guenther @ 2010-08-13  9:41 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, matz

On Thu, 8 Jul 2010, Sebastian Pop wrote:

> This patch relaxes the checks from gimple_could_trap_p in order to
> allow the flag_loop_if_convert_memory_writes to if-convert more loops
> in which it is possible to prove that:
> 
> - the accesses to an array in a loop do not trap (more than the
>   original non-if-converted loop).  This is true when the memory
>   accesses are executed at every iteration of the if-converted loop.
> 
> - the writes to memory occur on arrays that are not const qualified.
>   This is true when there exists at least one unconditional write to
>   the array in the analyzed program.  In this patch this analysis is
>   limited to the loop to be if-converted.

Comments below.

> 	* gimple.c (gimple_op_could_trap_p): Outlined from
> 	gimple_could_trap_p_1.
> 	(gimple_could_trap_p_1): Call gimple_op_could_trap_p.
> 	* gimple.h (gimple_op_could_trap_p): Declared.
> 	* tree-data-ref.h (same_data_refs_base_objects): New.
> 	(same_data_refs): New.
> 	* tree-if-conv.c (memref_read_or_written_unconditionally): New.
> 	(memrefs_read_or_written_unconditionally): New.
> 	(write_memref_written_at_least_once): New.
> 	(write_memrefs_written_at_least_once): New.
> 	(ifcvt_memrefs_wont_trap): New.
> 	(operations_could_trap): New.
> 	(ifcvt_could_trap_p): New.
> 	(if_convertible_gimple_assign_stmt_p): Call ifcvt_could_trap_p.
> 	Gets a vector of data refs.
> 	(if_convertible_stmt_p): Same.
> 	(if_convertible_loop_p): Before freeing the data refs vector,
> 	pass it to if_convertible_stmt_p.
> 
> 	* gcc.dg/tree-ssa/ifc-5.c: New.
> 	* gcc.dg/vect/vect-cond-1.c: Update pattern.
> 	* gcc.dg/vect/vect-cond-3.c: Same.
> 	* gcc.dg/vect/vect-cond-4.c: Same.
> 	* gcc.dg/vect/vect-cond-6.c: Same.
> ---
>  gcc/gimple.c                            |   38 ++++--
>  gcc/gimple.h                            |    1 +
>  gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c   |   24 ++++
>  gcc/testsuite/gcc.dg/vect/vect-cond-1.c |    2 +-
>  gcc/testsuite/gcc.dg/vect/vect-cond-3.c |    2 +-
>  gcc/testsuite/gcc.dg/vect/vect-cond-4.c |    2 +-
>  gcc/testsuite/gcc.dg/vect/vect-cond-6.c |    1 +
>  gcc/tree-data-ref.h                     |   34 +++++
>  gcc/tree-if-conv.c                      |  220 +++++++++++++++++++++++++++----
>  9 files changed, 279 insertions(+), 45 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
> 
> diff --git a/gcc/gimple.c b/gcc/gimple.c
> index 707d4e4..f515acc 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -2399,24 +2399,14 @@ gimple_rhs_has_side_effects (const_gimple s)
>    return false;
>  }
>  
> +/* Helper for gimple_could_trap_p_1.  Return true if the operation of
> +   S can trap.  */
>  
> -/* Helper for gimple_could_trap_p and gimple_assign_rhs_could_trap_p.
> -   Return true if S can trap.  If INCLUDE_LHS is true and S is a
> -   GIMPLE_ASSIGN, the LHS of the assignment is also checked.
> -   Otherwise, only the RHS of the assignment is checked.  */
> -
> -static bool
> -gimple_could_trap_p_1 (gimple s, bool include_lhs)
> +bool
> +gimple_op_could_trap_p (gimple s)

This is not a very good name.  What is "operation"?  It seems you
exclude loads and stores here, so gimple_nonloadstore_could_trap_p
would be better.  Or to not confuse things too much instead
of splitting the function add another flag bool include_loads and
export the function - that would be my preference here
(include_lhs is effectively include_stores, nothing else on the
lhs can trap).

With that change the gimple.[ch] changes are ok.  More comments
on the rest below.

>  {
> -  unsigned i, start;
> -  tree t, div = NULL_TREE;
>    enum tree_code op;
> -
> -  start = (is_gimple_assign (s) && !include_lhs) ? 1 : 0;
> -
> -  for (i = start; i < gimple_num_ops (s); i++)
> -    if (tree_could_trap_p (gimple_op (s, i)))
> -      return true;
> +  tree t, div = NULL_TREE;
>  
>    switch (gimple_code (s))
>      {
> @@ -2445,7 +2435,25 @@ gimple_could_trap_p_1 (gimple s, bool include_lhs)
>      }
>  
>    return false;
> +}
> +
> +/* Helper for gimple_could_trap_p and gimple_assign_rhs_could_trap_p.
> +   Return true if S can trap.  If INCLUDE_LHS is true and S is a
> +   GIMPLE_ASSIGN, the LHS of the assignment is also checked.
> +   Otherwise, only the RHS of the assignment is checked.  */
> +
> +static bool
> +gimple_could_trap_p_1 (gimple s, bool include_lhs)
> +{
> +  unsigned i, start;
> +
> +  start = (is_gimple_assign (s) && !include_lhs) ? 1 : 0;
> +
> +  for (i = start; i < gimple_num_ops (s); i++)
> +    if (tree_could_trap_p (gimple_op (s, i)))
> +      return true;
>  
> +  return gimple_op_could_trap_p (s);
>  }
>  
>  
> diff --git a/gcc/gimple.h b/gcc/gimple.h
> index ec7fc93..c446cd3 100644
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -885,6 +885,7 @@ gimple gimple_build_cond_from_tree (tree, tree, tree);
>  void gimple_cond_set_condition_from_tree (gimple, tree);
>  bool gimple_has_side_effects (const_gimple);
>  bool gimple_rhs_has_side_effects (const_gimple);
> +bool gimple_op_could_trap_p (gimple);
>  bool gimple_could_trap_p (gimple);
>  bool gimple_assign_rhs_could_trap_p (gimple);
>  void gimple_regimplify_operands (gimple, gimple_stmt_iterator *);
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
> new file mode 100644
> index 0000000..a9cc816
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-c -O2 -ftree-vectorize -fdump-tree-ifcvt-stats" { target *-*-* } } */
> +
> +void
> +dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
> +{
> +  int i, level, qmul, qadd;
> +
> +  qadd = (qscale - 1) | 1;
> +  qmul = qscale << 1;
> +
> +  for (i = 0; i <= nCoeffs; i++)
> +    {
> +      level = block[i];
> +      if (level < 0)
> +	level = level * qmul - qadd;
> +      else
> +	level = level * qmul + qadd;
> +      block[i] = level;
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */
> +/* { dg-final { cleanup-tree-dump "ifcvt" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-1.c b/gcc/testsuite/gcc.dg/vect/vect-cond-1.c
> index 4ee6713..888e5cb 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-1.c
> @@ -52,7 +52,7 @@ int main (void)
>    return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */
>

As this was testing for outer loop vectorization please duplicate the
test and disable if-conversion in one variant.

>  
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-3.c b/gcc/testsuite/gcc.dg/vect/vect-cond-3.c
> index 56cfbb2..42b9f35 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-3.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-3.c
> @@ -60,7 +60,7 @@ int main (void)
>    return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */

Likewise.

>  
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-4.c b/gcc/testsuite/gcc.dg/vect/vect-cond-4.c
> index c3a1585..1d1d8fc 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-4.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-4.c
> @@ -57,7 +57,7 @@ int main (void)
>    return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */

Likewise.

>  
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-6.c b/gcc/testsuite/gcc.dg/vect/vect-cond-6.c
> index e5e9391..7820444 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-cond-6.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-6.c
> @@ -56,5 +56,6 @@ int main ()
>  }
>  
>  /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
>  /* { dg-final { cleanup-tree-dump "vect" } } */

Likewise.

> diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
> index eff5348..391d6fc 100644
> --- a/gcc/tree-data-ref.h
> +++ b/gcc/tree-data-ref.h
> @@ -417,6 +417,40 @@ extern void create_rdg_vertices (struct graph *, VEC (gimple, heap) *);
>  extern bool dr_may_alias_p (const struct data_reference *,
>  			    const struct data_reference *);
>  
> +
> +/* Return true when the base objects of data references A and B are
> +   the same memory object.  */
> +
> +static inline bool
> +same_data_refs_base_objects (data_reference_p a, data_reference_p b)
> +{
> +  return DR_NUM_DIMENSIONS (a) == DR_NUM_DIMENSIONS (b)
> +    && dr_may_alias_p (a, b)
> +    && operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), 0);

I don't see why the dr_may_alias_p check is necessary here.  In fact
it might even pessimize what you want to check.

> +}
> +
> +/* Return true when the data references A and B are accessing the same
> +   memory object with the same access functions.  */

Isn't this too special again?

> +static inline bool
> +same_data_refs (data_reference_p a, data_reference_p b)
> +{
> +  unsigned int i;
> +
> +  /* The references are exactly the same.  */
> +  if (operand_equal_p (DR_REF (a), DR_REF (b), 0))
> +    return true;
> +
> +  if (!same_data_refs_base_objects (a, b))
> +    return false;
> +
> +  for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
> +    if (!eq_evolutions_p (DR_ACCESS_FN (a, i), DR_ACCESS_FN (b, i)))
> +      return false;
> +
> +  return true;
> +}
> +
>  /* Return true when the DDR contains two data references that have the
>     same access functions.  */
>  
> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> index cac5a3b..f0bc026 100644
> --- a/gcc/tree-if-conv.c
> +++ b/gcc/tree-if-conv.c
> @@ -441,6 +441,166 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
>    return true;
>  }
>  
> +/* Returns true when the memory reference A at position P in the data
> +   references vector DRS and executed under the condition CA, is read

"and" executed?  Is A executed under condition CA or is the whole
vector DRS?  Please clarify the comment...

> +   or written unconditionally.  */
> +
> +static bool
> +memref_read_or_written_unconditionally (int pos, data_reference_p a,
> +					VEC (data_reference_p, heap) *drs,
> +					tree ca)
> +{
> +  int i;
> +  data_reference_p b;
> +
> +  for (i = 0; VEC_iterate (data_reference_p, drs, i, b); i++)
> +    if (i != pos && same_data_refs (a, b))
> +      {
> +	tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
> +
> +	if (is_true_predicate (cb)
> +	    || is_true_predicate (ca = fold_or_predicates (ca, cb)))

... else this doesn't make sense?  B is either executed unconditionally
(is_true_predicate (cb)) or at least A or B is executed 
uncontionally.  Why is the latter case worth specializing?  In fact
you are probably doing redundant work in fold_or_predicates all the
time in practice (cb will be the same most of the time).

> +	  return true;
> +      }
> +
> +  return false;
> +}
> +
> +/* Returns true when the memory references of STMT are read or written
> +   unconditionally.  */
> +
> +static bool
> +memrefs_read_or_written_unconditionally (gimple stmt,
> +					 VEC (data_reference_p, heap) *drs)
> +{
> +  int i;
> +  data_reference_p a;
> +  tree ca = bb_predicate (gimple_bb (stmt));
> +
> +  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
> +    if (DR_STMT (a) == stmt

Ick.  Why not check DR_STMT (b) != stmt in 
memref_read_or_written_unconditionally and avoid all this linear
searching mess?

> +	&& !memref_read_or_written_unconditionally (i, a, drs, ca))
> +      return false;
> +
> +  return true;
> +}
> +
> +
> +/* Returns true when the memory reference A at position P in the data
> +   references vector DRS and executed under the condition CA, is
> +   unconditionally written.  */
> +
> +static bool
> +write_memref_written_at_least_once (int pos, data_reference_p a,
> +				    VEC (data_reference_p, heap) *drs,
> +				    tree ca)
> +{
> +  int i;
> +  data_reference_p b;
> +
> +  for (i = 0; VEC_iterate (data_reference_p, drs, i, b); i++)
> +    if (i != pos
> +	&& !DR_IS_READ (b)
> +	&& same_data_refs_base_objects (a, b))
> +      {
> +	tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
> +
> +	if (is_true_predicate (cb)
> +	    || is_true_predicate (ca = fold_or_predicates (ca, cb)))
> +	  return true;

Same comments as above.

> +      }
> +
> +  return false;
> +}
> +
> +/* Returns true when the memory references of STMT are unconditionally
> +   written.  */
> +
> +static bool
> +write_memrefs_written_at_least_once (gimple stmt,
> +				     VEC (data_reference_p, heap) *drs)
> +{
> +  int i;
> +  data_reference_p a;
> +  tree ca = bb_predicate (gimple_bb (stmt));
> +
> +  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
> +    if (DR_STMT (a) == stmt
> +	&& !DR_IS_READ (a)
> +	&& !write_memref_written_at_least_once (i, a, drs, ca))
> +      return false;

Likewise.

> +  return true;
> +}
> +
> +/* Return true when the memory references of STMT won't trap in the
> +   if-converted code.  There are two things that we have to check for:
> +
> +   - writes to memory occur to writable memory: if-conversion of
> +   memory writes transforms the conditional memory writes into
> +   unconditional writes, i.e. "if (cond) A[i] = foo" is transformed
> +   into "A[i] = cond ? foo : A[i]", and as the write to memory may not
> +   be executed at all in the original code, it may be a readonly
> +   memory.  To check that A is not const-qualified, we check that
> +   there exists at least an unconditional write to A in the current
> +   function.
> +
> +   - reads or writes to memory are valid memory accesses for every
> +   iteration.  To check that the memory accesses are correctly formed
> +   and that we are allowed to read and write in these locations, we
> +   check that the memory accesses to be if-converted occur at every
> +   iteration unconditionally.  */
> +
> +static bool
> +ifcvt_memrefs_wont_trap (gimple stmt, VEC (data_reference_p, heap) *refs)
> +{
> +  return write_memrefs_written_at_least_once (stmt, refs)
> +    && memrefs_read_or_written_unconditionally (stmt, refs);
> +}
> +
> +/* Same as gimple_could_trap_p_1, returns true when the statement S
> +   could trap, but do not check the memory references.  */
> +
> +static bool
> +operations_could_trap (gimple s)
> +{
> +  unsigned i;
> +
> +  for (i = 0; i < gimple_num_ops (s); i++)
> +    {
> +      tree expr = gimple_op (s, i);
> +
> +      switch (TREE_CODE (expr))
> +	{
> +	case ARRAY_REF:
> +	case INDIRECT_REF:
> +	case MISALIGNED_INDIRECT_REF:
> +	  break;
> +
> +	default:
> +	  if (tree_could_trap_p (expr))
> +	    return true;

Well - only memory references can trap here anyway, so this whole
function is just gimple_op_could_trap_p (s) (or the changed
gimple_could_trap_p_1 as I requessted).

> +	}
> +    }
> +
> +  return gimple_op_could_trap_p (s);
> +}
> +
> +/* Wrapper around gimple_could_trap_p refined for the needs of the
> +   if-conversion.  Try to prove that the memory accesses of STMT could
> +   not trap in the innermost loop containing STMT.  */
> +
> +static bool
> +ifcvt_could_trap_p (gimple stmt, VEC (data_reference_p, heap) *refs)
> +{
> +  if (gimple_has_mem_ops (stmt)

That's not the correct check.  You probably want gimple_vuse (stmt).

> +      && ifcvt_memrefs_wont_trap (stmt, refs)
> +      && !operations_could_trap (stmt))

And re-order those checks as ifcvt_memrefs_wont_trap is surely more
expensive.

> +    return false;
> +
> +  return gimple_could_trap_p (stmt);
> +}
> +
>  /* Return true when STMT is if-convertible.
>  
>     GIMPLE_ASSIGN statement is not if-convertible if,
> @@ -449,7 +609,8 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
>     - LHS is not var decl.  */
>  
>  static bool
> -if_convertible_gimple_assign_stmt_p (gimple stmt)
> +if_convertible_gimple_assign_stmt_p (gimple stmt,
> +				     VEC (data_reference_p, heap) *refs)
>  {
>    tree lhs = gimple_assign_lhs (stmt);
>    basic_block bb;
> @@ -477,7 +638,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt)
>  
>    if (flag_tree_loop_if_convert_memory_writes)
>      {
> -      if (gimple_could_trap_p (stmt))
> +      if (ifcvt_could_trap_p (stmt, refs))
>  	{
>  	  if (dump_file && (dump_flags & TDF_DETAILS))
>  	    fprintf (dump_file, "tree could trap...\n");
> @@ -517,7 +678,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt)
>     - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
>  
>  static bool
> -if_convertible_stmt_p (gimple stmt)
> +if_convertible_stmt_p (gimple stmt, VEC (data_reference_p, heap) *refs)
>  {
>    switch (gimple_code (stmt))
>      {
> @@ -527,7 +688,7 @@ if_convertible_stmt_p (gimple stmt)
>        return true;
>  
>      case GIMPLE_ASSIGN:
> -      return if_convertible_gimple_assign_stmt_p (stmt);
> +      return if_convertible_gimple_assign_stmt_p (stmt, refs);
>  
>      default:
>        /* Don't know what to do with 'em so don't do anything.  */
> @@ -810,6 +971,9 @@ if_convertible_loop_p (struct loop *loop)
>    edge e;
>    edge_iterator ei;
>    basic_block exit_bb = NULL;
> +  bool res = false;
> +  VEC (data_reference_p, heap) *refs;
> +  VEC (ddr_p, heap) *ddrs;
>  
>    /* Handle only innermost loop.  */
>    if (!loop || loop->inner)
> @@ -847,18 +1011,14 @@ if_convertible_loop_p (struct loop *loop)
>  
>    /* Don't if-convert the loop when the data dependences cannot be
>       computed: the loop won't be vectorized in that case.  */
> -  {
> -    VEC (data_reference_p, heap) *refs = VEC_alloc (data_reference_p, heap, 5);
> -    VEC (ddr_p, heap) *ddrs = VEC_alloc (ddr_p, heap, 25);
> -    bool res = compute_data_dependences_for_loop (loop, true, &refs, &ddrs);
> -
> -    free_data_refs (refs);
> -    free_dependence_relations (ddrs);
> +  refs = VEC_alloc (data_reference_p, heap, 5);
> +  ddrs = VEC_alloc (ddr_p, heap, 25);
> +  res = compute_data_dependences_for_loop (loop, true, &refs, &ddrs);
>  
> -    if (!res)
> -      return false;
> -  }
> +  if (!res)
> +    goto cleanup;

Instead of gotos wrap the function into another doing the allocation
and de-allocation.

Please rework the patch according to the requested changes.

Thanks,
Richard.

> +  res = false;
>    calculate_dominance_info (CDI_DOMINATORS);
>  
>    /* Allow statements that can be handled during if-conversion.  */
> @@ -867,7 +1027,7 @@ if_convertible_loop_p (struct loop *loop)
>      {
>        if (dump_file && (dump_flags & TDF_DETAILS))
>  	fprintf (dump_file, "Irreducible loop\n");
> -      return false;
> +      goto cleanup;
>      }
>  
>    for (i = 0; i < loop->num_nodes; i++)
> @@ -875,14 +1035,18 @@ if_convertible_loop_p (struct loop *loop)
>        basic_block bb = ifc_bbs[i];
>  
>        if (!if_convertible_bb_p (loop, bb, exit_bb))
> -	return false;
> +	goto cleanup;
>  
>        if (bb_with_exit_edge_p (loop, bb))
>  	exit_bb = bb;
>      }
>  
> -  if (!predicate_bbs (loop))
> -    return false;
> +  res = predicate_bbs (loop);
> +
> +  if (!res)
> +    goto cleanup;
> +
> +  res = false;
>  
>    for (i = 0; i < loop->num_nodes; i++)
>      {
> @@ -891,21 +1055,23 @@ if_convertible_loop_p (struct loop *loop)
>  
>        for (itr = gsi_start_phis (bb); !gsi_end_p (itr); gsi_next (&itr))
>  	if (!if_convertible_phi_p (loop, bb, gsi_stmt (itr)))
> -	  return false;
> -
> -      /* For non predicated BBs, don't check their statements.  */
> -      if (!is_predicated (bb))
> -	continue;
> +	  goto cleanup;
>  
> -      for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
> -	if (!if_convertible_stmt_p (gsi_stmt (itr)))
> -	  return false;
> +      /* Check the if-convertibility of statements in predicated BBs.  */
> +      if (is_predicated (bb))
> +	for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
> +	  if (!if_convertible_stmt_p (gsi_stmt (itr), refs))
> +	    goto cleanup;
>      }
>  
> +  res = true;
>    if (dump_file)
>      fprintf (dump_file, "Applying if-conversion\n");
>  
> -  return true;
> + cleanup:
> +  free_data_refs (refs);
> +  free_dependence_relations (ddrs);
> +  return res;
>  }
>  
>  /* Basic block BB has two predecessors.  Using predecessor's bb
> 

-- 
Richard Guenther <rguenther@suse.de>
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 - GF: Markus Rex

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] Add flag -ftree-loop-if-convert-memory-writes.
  2010-08-13  8:58   ` Richard Guenther
@ 2010-08-13 20:43     ` Sebastian Pop
  2010-08-16  9:46       ` Richard Guenther
  2010-08-13 22:12     ` Sebastian Pop
  2010-08-13 22:22     ` Sebastian Pop
  2 siblings, 1 reply; 32+ messages in thread
From: Sebastian Pop @ 2010-08-13 20:43 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, matz

On Fri, Aug 13, 2010 at 03:54, Richard Guenther <rguenther@suse.de> wrote:
>> +       if (flag_tree_loop_if_convert_memory_writes)
>> +         {
>> +           /* Insert the predicate of the BB just after the label,
>> +              as the if-conversion of memory writes will use this
>> +              predicate.  */
>> +           gimple_stmt_iterator gsi = gsi_after_labels (bb);
>> +           gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
>> +         }
>>         else
>> -         gsi_insert_seq_after (&gsi, stmts, GSI_SAME_STMT);
>> +         {
>> +           /* Insert the predicate of the BB at the end of the BB
>> +              as this would reduce the register pressure: the only
>> +              use of this predicate will be in successor BBs.  */
>> +           gimple_stmt_iterator gsi = gsi_last_bb (bb);
>> +
>> +           if (gsi_end_p (gsi)
>> +               || gimple_code (gsi_stmt (gsi)) == GIMPLE_COND)
>
>  || stmt_ends_bb_p (gsi_stmt (gsi))
>
>> +             gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
>> +           else
>> +             gsi_insert_seq_after (&gsi, stmts, GSI_SAME_STMT);
>> +         }
>
> It's not clear to me why the case for
> flag_tree_loop_if_convert_memory_writes is special.
>
>>         /* Once the sequence is code generated, set it to NULL.  */
>>         set_bb_predicate_gimplified_stmts (bb, NULL);
>> @@ -1122,6 +1164,56 @@ insert_gimplified_predicates (loop_p loop)
>>      }
>>  }
>>
>> +/* Predicate each write to memory in LOOP.
>> +
>> +   Replace a statement "A[i] = foo" with "A[i] = cond ? foo : A[i]"
>> +   with the condition COND determined from the predicate of the basic
>> +   block containing the statement.  */
>
> Instead of writing A[i] = cond ? foo : A[i] in the comment, can you
> explain the CFG transformation here, especially the placement of
> the load and the store?  See above - this probably would have
> made things clearer to me - in the function below there are no
> comments either.
>

Let me know if the explanation below can be improved.

This function transforms control flow constructs containing memory
writes of the form

| for (i = 0; i < N; i++)
|   if (cond)
|     A[i] = expr;

into the following form that does not contain control flow:

| for (i = 0; i < N; i++)
|   A[i] = cond ? expr : A[i];

The original CFG looks like this:

| bb_0
|   i = 0
| end_bb_0
|
| bb_1
|   if (i < N) goto bb_5 else goto bb_2
| end_bb_1
|
| bb_2
|   cond = some_computation;
|   if (cond) goto bb_3 else goto bb_4
| end_bb_2
|
| bb_3
|   A[i] = expr;
|   goto bb_4
| end_bb_3
|
| bb_4
|   goto bb_1
| end_bb_4

insert_gimplified_predicates inserts the computation of the COND
expression at the beginning of the destination basic block:

| bb_0
|   i = 0
| end_bb_0
|
| bb_1
|   if (i < N) goto bb_5 else goto bb_2
| end_bb_1
|
| bb_2
|   cond = some_computation;
|   if (cond) goto bb_3 else goto bb_4
| end_bb_2
|
| bb_3
|   cond = some_computation;
|   A[i] = expr;
|   goto bb_4
| end_bb_3
|
| bb_4
|   goto bb_1
| end_bb_4

predicate_mem_writes is then predicating the memory write as follows:

| bb_0
|   i = 0
| end_bb_0
|
| bb_1
|   if (i < N) goto bb_5 else goto bb_2
| end_bb_1
|
| bb_2
|   if (cond) goto bb_3 else goto bb_4
| end_bb_2
|
| bb_3
|   cond = some_computation;
|   A[i] = cond ? expr : A[i];
|   goto bb_4
| end_bb_3
|
| bb_4
|   goto bb_1
| end_bb_4

and finally combine_blocks removes the basic block boundaries making
the loop vectorizable:

| bb_0
|   i = 0
|   if (i < N) goto bb_5 else goto bb_1
| end_bb_0
|
| bb_1
|   cond = some_computation;
|   A[i] = cond ? expr : A[i];
|   if (i < N) goto bb_5 else goto bb_4
| end_bb_1
|
| bb_4
|   goto bb_1
| end_bb_4

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] Add flag -ftree-loop-if-convert-memory-writes.
  2010-08-13  8:58   ` Richard Guenther
  2010-08-13 20:43     ` Sebastian Pop
@ 2010-08-13 22:12     ` Sebastian Pop
  2010-08-16  9:46       ` Richard Guenther
  2010-08-13 22:22     ` Sebastian Pop
  2 siblings, 1 reply; 32+ messages in thread
From: Sebastian Pop @ 2010-08-13 22:12 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, matz

[-- Attachment #1: Type: text/plain, Size: 1873 bytes --]

On Fri, Aug 13, 2010 at 03:54, Richard Guenther <rguenther@suse.de> wrote:
>> +static void
>> +predicate_mem_writes (loop_p loop)
>> +{
>> +  unsigned int i, orig_loop_num_nodes = loop->num_nodes;
>> +
>> +  for (i = 1; i < orig_loop_num_nodes; i++)
>> +    {
>> +      gimple_stmt_iterator gsi;
>> +      basic_block bb = ifc_bbs[i];
>> +      tree cond = bb_predicate (bb);
>> +
>> +      if (is_true_predicate (cond))
>> +     continue;
>> +
>> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>
> So this is another loop over all loops and statements.  Why not
> do this loop once and dispatch to mem/scalar if-conversion there?
>
>> @@ -1178,7 +1270,10 @@ combine_blocks (struct loop *loop)
>>
>>    remove_conditions_and_labels (loop);
>>    insert_gimplified_predicates (loop);
>> -  ifconvert_phi_nodes (loop);
>> +  predicate_all_scalar_phis (loop);
>> +
>> +  if (flag_tree_loop_if_convert_memory_writes)
>> +    predicate_mem_writes (loop);
>
> As suggested above, predicate_all_scalar_phis and predicate_mem_writes
> should loop over all loops/bbs once.
>

The only thing that predicate_all_scalar_phis and predicate_mem_writes
have in common is that they iterate over all the basic blocks of an
innermost loop.

predicate_mem_writes iterates over all the statements of the BB.

predicate_all_scalar_phis iterates over all the phi nodes of the BB.

Should I go ahead and merge these two functions as asked?
I still find the implementation more clear in the separated form.

Please find attached a preliminary patch (passed compile and
make -k check RUNTESTFLAGS=tree-ssa.exp and vect.exp) on top of
the previous one, that shows the corrections for all your comments
(except this last point).  I will submit the combined patch +
corrections once it passes bootstrap and test.

Sebastian

[-- Attachment #2: 1876_ifc_corrections_1.diff --]
[-- Type: text/x-diff, Size: 6758 bytes --]

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 51319d7..017c895 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -213,11 +213,13 @@ reset_bb_predicate (basic_block bb)
   init_bb_predicate (bb);
 }
 
-/* Create a new temp variable of type TYPE.  Add GIMPLE_ASSIGN to assign EXP
-   to the new variable.  */
+/* Returns a new SSA_NAME of type TYPE that is assigned the value of
+   the expression EXPR.  Inserts the statement created for this
+   computation before GSI and leaves the iterator GSI at the same
+   statement.  */
 
-static gimple
-ifc_temp_var (tree type, tree exp)
+static tree
+ifc_temp_var (tree type, tree expr, gimple_stmt_iterator *gsi)
 {
   const char *name = "_ifc_";
   tree var, new_name;
@@ -227,8 +229,8 @@ ifc_temp_var (tree type, tree exp)
   var = create_tmp_var (type, name);
   add_referenced_var (var);
 
-  /* Build new statement to assign EXP to new variable.  */
-  stmt = gimple_build_assign (var, exp);
+  /* Build new statement to assign EXPR to new variable.  */
+  stmt = gimple_build_assign (var, expr);
 
   /* Get SSA name for the new variable and set make new statement
      its definition statement.  */
@@ -237,7 +239,8 @@ ifc_temp_var (tree type, tree exp)
   SSA_NAME_DEF_STMT (new_name) = stmt;
   update_stmt (stmt);
 
-  return stmt;
+  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+  return gimple_assign_lhs (stmt);
 }
 
 /* Return true when COND is a true predicate.  */
@@ -1205,13 +1208,7 @@ find_phi_replacement_condition (struct loop *loop,
 				    false, NULL_TREE,
 				    true, GSI_SAME_STMT);
   if (!is_gimple_reg (*cond) && !is_gimple_condexpr (*cond))
-    {
-      gimple new_stmt;
-
-      new_stmt = ifc_temp_var (TREE_TYPE (*cond), unshare_expr (*cond));
-      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
-      *cond = gimple_assign_lhs (new_stmt);
-    }
+    *cond = ifc_temp_var (TREE_TYPE (*cond), unshare_expr (*cond), gsi);
 
   gcc_assert (*cond);
 
@@ -1368,7 +1365,8 @@ insert_gimplified_predicates (loop_p loop)
 	      gimple_stmt_iterator gsi = gsi_last_bb (bb);
 
 	      if (gsi_end_p (gsi)
-		  || gimple_code (gsi_stmt (gsi)) == GIMPLE_COND)
+		  || gimple_code (gsi_stmt (gsi)) == GIMPLE_COND
+		  || stmt_ends_bb_p (gsi_stmt (gsi)))
 		gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
 	      else
 		gsi_insert_seq_after (&gsi, stmts, GSI_SAME_STMT);
@@ -1382,9 +1380,110 @@ insert_gimplified_predicates (loop_p loop)
 
 /* Predicate each write to memory in LOOP.
 
-   Replace a statement "A[i] = foo" with "A[i] = cond ? foo : A[i]"
-   with the condition COND determined from the predicate of the basic
-   block containing the statement.  */
+   This function transforms control flow constructs containing memory
+   writes of the form:
+
+   | for (i = 0; i < N; i++)
+   |   if (cond)
+   |     A[i] = expr;
+
+   into the following form that does not contain control flow:
+
+   | for (i = 0; i < N; i++)
+   |   A[i] = cond ? expr : A[i];
+
+   The original CFG looks like this:
+
+   | bb_0
+   |   i = 0
+   | end_bb_0
+   |
+   | bb_1
+   |   if (i < N) goto bb_5 else goto bb_2
+   | end_bb_1
+   |
+   | bb_2
+   |   cond = some_computation;
+   |   if (cond) goto bb_3 else goto bb_4
+   | end_bb_2
+   |
+   | bb_3
+   |   A[i] = expr;
+   |   goto bb_4
+   | end_bb_3
+   |
+   | bb_4
+   |   goto bb_1
+   | end_bb_4
+
+   insert_gimplified_predicates inserts the computation of the COND
+   expression at the beginning of the destination basic block:
+
+   | bb_0
+   |   i = 0
+   | end_bb_0
+   |
+   | bb_1
+   |   if (i < N) goto bb_5 else goto bb_2
+   | end_bb_1
+   |
+   | bb_2
+   |   cond = some_computation;
+   |   if (cond) goto bb_3 else goto bb_4
+   | end_bb_2
+   |
+   | bb_3
+   |   cond = some_computation;
+   |   A[i] = expr;
+   |   goto bb_4
+   | end_bb_3
+   |
+   | bb_4
+   |   goto bb_1
+   | end_bb_4
+
+   predicate_mem_writes is then predicating the memory write as follows:
+
+   | bb_0
+   |   i = 0
+   | end_bb_0
+   |
+   | bb_1
+   |   if (i < N) goto bb_5 else goto bb_2
+   | end_bb_1
+   |
+   | bb_2
+   |   if (cond) goto bb_3 else goto bb_4
+   | end_bb_2
+   |
+   | bb_3
+   |   cond = some_computation;
+   |   A[i] = cond ? expr : A[i];
+   |   goto bb_4
+   | end_bb_3
+   |
+   | bb_4
+   |   goto bb_1
+   | end_bb_4
+
+   and finally combine_blocks removes the basic block boundaries making
+   the loop vectorizable:
+
+   | bb_0
+   |   i = 0
+   |   if (i < N) goto bb_5 else goto bb_1
+   | end_bb_0
+   |
+   | bb_1
+   |   cond = some_computation;
+   |   A[i] = cond ? expr : A[i];
+   |   if (i < N) goto bb_5 else goto bb_4
+   | end_bb_1
+   |
+   | bb_4
+   |   goto bb_1
+   | end_bb_4
+*/
 
 static void
 predicate_mem_writes (loop_p loop)
@@ -1396,35 +1495,24 @@ predicate_mem_writes (loop_p loop)
       gimple_stmt_iterator gsi;
       basic_block bb = ifc_bbs[i];
       tree cond = bb_predicate (bb);
+      gimple stmt;
 
       if (is_true_predicate (cond))
 	continue;
 
       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-	if (is_gimple_assign (gsi_stmt (gsi))
-	    && gimple_vdef (gsi_stmt (gsi)))
+	if ((stmt = gsi_stmt (gsi))
+	    && gimple_assign_single_p (stmt)
+	    && gimple_vdef (stmt))
 	  {
-	    gimple new_stmt, lhs_stmt, rhs_stmt;
-	    gimple stmt = gsi_stmt (gsi);
-	    tree lhs = gimple_get_lhs (stmt);
-	    tree rhs = gimple_op (stmt, 1);
-
-	    gcc_assert (gimple_num_ops (stmt) == 2);
-
-	    rhs_stmt = ifc_temp_var (TREE_TYPE (rhs), unshare_expr (rhs));
-	    gsi_insert_before (&gsi, rhs_stmt, GSI_SAME_STMT);
-	    rhs = gimple_get_lhs (rhs_stmt);
-
-	    lhs_stmt = ifc_temp_var (TREE_TYPE (lhs), unshare_expr (lhs));
-	    gsi_insert_before (&gsi, lhs_stmt, GSI_SAME_STMT);
-	    lhs = gimple_get_lhs (lhs_stmt);
-
-	    cond = unshare_expr (cond);
-	    rhs = build3 (COND_EXPR, TREE_TYPE (lhs), cond, rhs, lhs);
-
-	    new_stmt = ifc_temp_var (TREE_TYPE (lhs), rhs);
-	    gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
-	    gimple_set_op (stmt, 1, gimple_get_lhs (new_stmt));
+	    tree lhs = gimple_assign_lhs (stmt);
+	    tree rhs = gimple_assign_rhs1 (stmt);
+	    tree type = TREE_TYPE (lhs);
+
+	    lhs = ifc_temp_var (type, unshare_expr (lhs), &gsi);
+	    rhs = ifc_temp_var (type, unshare_expr (rhs), &gsi);
+	    rhs = build3 (COND_EXPR, type, unshare_expr (cond), rhs, lhs);
+	    gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
 	    update_stmt (stmt);
 	  }
     }
@@ -1628,7 +1716,7 @@ main_tree_if_conversion (void)
     changed |= tree_if_conversion (loop);
 
   if (changed && flag_tree_loop_if_convert_memory_writes)
-    update_ssa (TODO_update_ssa);
+    return TODO_update_ssa_only_virtuals;
 
   return changed ? TODO_cleanup_cfg : 0;
 }

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] Add flag -ftree-loop-if-convert-memory-writes.
  2010-08-13  8:58   ` Richard Guenther
  2010-08-13 20:43     ` Sebastian Pop
  2010-08-13 22:12     ` Sebastian Pop
@ 2010-08-13 22:22     ` Sebastian Pop
  2010-08-16  9:53       ` Richard Guenther
  2 siblings, 1 reply; 32+ messages in thread
From: Sebastian Pop @ 2010-08-13 22:22 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, matz

On Fri, Aug 13, 2010 at 03:54, Richard Guenther <rguenther@suse.de> wrote:
> So you create this series of stmts in place of the conditional store,
> which means I don't see why materializing the condition before
> the cond-expr of the dominator does not work.
>

Yes, this could work.  Inserting the predicate computations anywhere
before the predicated BB is fine.  The reason why we have to insert
the predicates before or at the beginning of the BB in the case of mem
writes if-conversion is that we need the predicates to build the
conditional move statements.

For scalar if-conversion, we need the predicates as close as possible
to the phi nodes that merge the scalar values from different branches
in order to have low register pressure.

Sebastian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] Add flag -ftree-loop-if-convert-memory-writes.
  2010-08-13 22:12     ` Sebastian Pop
@ 2010-08-16  9:46       ` Richard Guenther
  0 siblings, 0 replies; 32+ messages in thread
From: Richard Guenther @ 2010-08-16  9:46 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, matz

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2904 bytes --]

On Fri, 13 Aug 2010, Sebastian Pop wrote:

> On Fri, Aug 13, 2010 at 03:54, Richard Guenther <rguenther@suse.de> wrote:
> >> +static void
> >> +predicate_mem_writes (loop_p loop)
> >> +{
> >> +  unsigned int i, orig_loop_num_nodes = loop->num_nodes;
> >> +
> >> +  for (i = 1; i < orig_loop_num_nodes; i++)
> >> +    {
> >> +      gimple_stmt_iterator gsi;
> >> +      basic_block bb = ifc_bbs[i];
> >> +      tree cond = bb_predicate (bb);
> >> +
> >> +      if (is_true_predicate (cond))
> >> +     continue;
> >> +
> >> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> >
> > So this is another loop over all loops and statements.  Why not
> > do this loop once and dispatch to mem/scalar if-conversion there?
> >
> >> @@ -1178,7 +1270,10 @@ combine_blocks (struct loop *loop)
> >>
> >>    remove_conditions_and_labels (loop);
> >>    insert_gimplified_predicates (loop);
> >> -  ifconvert_phi_nodes (loop);
> >> +  predicate_all_scalar_phis (loop);
> >> +
> >> +  if (flag_tree_loop_if_convert_memory_writes)
> >> +    predicate_mem_writes (loop);
> >
> > As suggested above, predicate_all_scalar_phis and predicate_mem_writes
> > should loop over all loops/bbs once.
> >
> 
> The only thing that predicate_all_scalar_phis and predicate_mem_writes
> have in common is that they iterate over all the basic blocks of an
> innermost loop.
> 
> predicate_mem_writes iterates over all the statements of the BB.
> 
> predicate_all_scalar_phis iterates over all the phi nodes of the BB.
> 
> Should I go ahead and merge these two functions as asked?
> I still find the implementation more clear in the separated form.

Yeah, thinking again it's ok as-is.

> Please find attached a preliminary patch (passed compile and
> make -k check RUNTESTFLAGS=tree-ssa.exp and vect.exp) on top of
> the previous one, that shows the corrections for all your comments
> (except this last point).  I will submit the combined patch +
> corrections once it passes bootstrap and test.

-                 || gimple_code (gsi_stmt (gsi)) == GIMPLE_COND)
+                 || gimple_code (gsi_stmt (gsi)) == GIMPLE_COND
+                 || stmt_ends_bb_p (gsi_stmt (gsi)))

GIMPLE_CONDs already are handled in stmt_ends_bb_p, so the check
for GIMPLE_COND can be removed.

@@ -1628,7 +1716,7 @@ main_tree_if_conversion (void)
     changed |= tree_if_conversion (loop);

   if (changed && flag_tree_loop_if_convert_memory_writes)
-    update_ssa (TODO_update_ssa);
+    return TODO_update_ssa_only_virtuals;

   return changed ? TODO_cleanup_cfg : 0;

So you skip cfg_cleanup in that case.  I'd have done

  unsigned todo = 0;

...

   if (changed)
     todo |= TODO_cleanup_cfg;
   if (changed && flag_tree_loop_if_convert_memory_writes)
     todo |= TODO_update_ssa_only_virtuals;

   return todo;

I'll wait for a new combined patch and have a quick look at it.

Richard.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] Add flag -ftree-loop-if-convert-memory-writes.
  2010-08-13 20:43     ` Sebastian Pop
@ 2010-08-16  9:46       ` Richard Guenther
  0 siblings, 0 replies; 32+ messages in thread
From: Richard Guenther @ 2010-08-16  9:46 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, matz

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4195 bytes --]

On Fri, 13 Aug 2010, Sebastian Pop wrote:

> On Fri, Aug 13, 2010 at 03:54, Richard Guenther <rguenther@suse.de> wrote:
> >> +       if (flag_tree_loop_if_convert_memory_writes)
> >> +         {
> >> +           /* Insert the predicate of the BB just after the label,
> >> +              as the if-conversion of memory writes will use this
> >> +              predicate.  */
> >> +           gimple_stmt_iterator gsi = gsi_after_labels (bb);
> >> +           gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
> >> +         }
> >>         else
> >> -         gsi_insert_seq_after (&gsi, stmts, GSI_SAME_STMT);
> >> +         {
> >> +           /* Insert the predicate of the BB at the end of the BB
> >> +              as this would reduce the register pressure: the only
> >> +              use of this predicate will be in successor BBs.  */
> >> +           gimple_stmt_iterator gsi = gsi_last_bb (bb);
> >> +
> >> +           if (gsi_end_p (gsi)
> >> +               || gimple_code (gsi_stmt (gsi)) == GIMPLE_COND)
> >
> >  || stmt_ends_bb_p (gsi_stmt (gsi))
> >
> >> +             gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
> >> +           else
> >> +             gsi_insert_seq_after (&gsi, stmts, GSI_SAME_STMT);
> >> +         }
> >
> > It's not clear to me why the case for
> > flag_tree_loop_if_convert_memory_writes is special.
> >
> >>         /* Once the sequence is code generated, set it to NULL.  */
> >>         set_bb_predicate_gimplified_stmts (bb, NULL);
> >> @@ -1122,6 +1164,56 @@ insert_gimplified_predicates (loop_p loop)
> >>      }
> >>  }
> >>
> >> +/* Predicate each write to memory in LOOP.
> >> +
> >> +   Replace a statement "A[i] = foo" with "A[i] = cond ? foo : A[i]"
> >> +   with the condition COND determined from the predicate of the basic
> >> +   block containing the statement.  */
> >
> > Instead of writing A[i] = cond ? foo : A[i] in the comment, can you
> > explain the CFG transformation here, especially the placement of
> > the load and the store?  See above - this probably would have
> > made things clearer to me - in the function below there are no
> > comments either.
> >
> 
> Let me know if the explanation below can be improved.

That looks good.

> This function transforms control flow constructs containing memory
> writes of the form
> 
> | for (i = 0; i < N; i++)
> |   if (cond)
> |     A[i] = expr;
> 
> into the following form that does not contain control flow:
> 
> | for (i = 0; i < N; i++)
> |   A[i] = cond ? expr : A[i];
> 
> The original CFG looks like this:
> 
> | bb_0
> |   i = 0
> | end_bb_0
> |
> | bb_1
> |   if (i < N) goto bb_5 else goto bb_2
> | end_bb_1
> |
> | bb_2
> |   cond = some_computation;
> |   if (cond) goto bb_3 else goto bb_4
> | end_bb_2
> |
> | bb_3
> |   A[i] = expr;
> |   goto bb_4
> | end_bb_3
> |
> | bb_4
> |   goto bb_1
> | end_bb_4
> 
> insert_gimplified_predicates inserts the computation of the COND
> expression at the beginning of the destination basic block:
> 
> | bb_0
> |   i = 0
> | end_bb_0
> |
> | bb_1
> |   if (i < N) goto bb_5 else goto bb_2
> | end_bb_1
> |
> | bb_2
> |   cond = some_computation;
> |   if (cond) goto bb_3 else goto bb_4
> | end_bb_2
> |
> | bb_3
> |   cond = some_computation;
> |   A[i] = expr;
> |   goto bb_4
> | end_bb_3
> |
> | bb_4
> |   goto bb_1
> | end_bb_4
> 
> predicate_mem_writes is then predicating the memory write as follows:
> 
> | bb_0
> |   i = 0
> | end_bb_0
> |
> | bb_1
> |   if (i < N) goto bb_5 else goto bb_2
> | end_bb_1
> |
> | bb_2
> |   if (cond) goto bb_3 else goto bb_4
> | end_bb_2
> |
> | bb_3
> |   cond = some_computation;
> |   A[i] = cond ? expr : A[i];
> |   goto bb_4
> | end_bb_3
> |
> | bb_4
> |   goto bb_1
> | end_bb_4
> 
> and finally combine_blocks removes the basic block boundaries making
> the loop vectorizable:
> 
> | bb_0
> |   i = 0
> |   if (i < N) goto bb_5 else goto bb_1
> | end_bb_0
> |
> | bb_1
> |   cond = some_computation;
> |   A[i] = cond ? expr : A[i];
> |   if (i < N) goto bb_5 else goto bb_4
> | end_bb_1
> |
> | bb_4
> |   goto bb_1
> | end_bb_4
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] Add flag -ftree-loop-if-convert-memory-writes.
  2010-08-13 22:22     ` Sebastian Pop
@ 2010-08-16  9:53       ` Richard Guenther
  0 siblings, 0 replies; 32+ messages in thread
From: Richard Guenther @ 2010-08-16  9:53 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, matz

On Fri, 13 Aug 2010, Sebastian Pop wrote:

> On Fri, Aug 13, 2010 at 03:54, Richard Guenther <rguenther@suse.de> wrote:
> > So you create this series of stmts in place of the conditional store,
> > which means I don't see why materializing the condition before
> > the cond-expr of the dominator does not work.
> >
> 
> Yes, this could work.  Inserting the predicate computations anywhere
> before the predicated BB is fine.  The reason why we have to insert
> the predicates before or at the beginning of the BB in the case of mem
> writes if-conversion is that we need the predicates to build the
> conditional move statements.
>
> For scalar if-conversion, we need the predicates as close as possible
> to the phi nodes that merge the scalar values from different branches
> in order to have low register pressure.

Ok, I see.

Richard.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] Do not check whether memory references accessed in every iteration trap.
  2010-08-13  9:41   ` Richard Guenther
@ 2010-08-17 17:18     ` Sebastian Pop
  2010-08-18  9:35       ` Richard Guenther
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Pop @ 2010-08-17 17:18 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, matz

On Fri, Aug 13, 2010 at 04:24, Richard Guenther <rguenther@suse.de> wrote:
>> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
>>  /* { dg-final { cleanup-tree-dump "vect" } } */
>>
>
> As this was testing for outer loop vectorization please duplicate the
> test and disable if-conversion in one variant.

I removed these changes as they are not needed anymore: the
if-conversion with memory accesses is not enabled in -O3.

>> +
>> +/* Return true when the base objects of data references A and B are
>> +   the same memory object.  */
>> +
>> +static inline bool
>> +same_data_refs_base_objects (data_reference_p a, data_reference_p b)
>> +{
>> +  return DR_NUM_DIMENSIONS (a) == DR_NUM_DIMENSIONS (b)
>> +    && dr_may_alias_p (a, b)
>> +    && operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), 0);
>
> I don't see why the dr_may_alias_p check is necessary here.  In fact
> it might even pessimize what you want to check.

I agree, there is no reason to have the call to dr_may_alias_p: I
removed it.

>
>> +}
>> +
>> +/* Return true when the data references A and B are accessing the same
>> +   memory object with the same access functions.  */
>
> Isn't this too special again?
>
>> +static inline bool
>> +same_data_refs (data_reference_p a, data_reference_p b)

I do not understand what you are suggesting here.
Should I make the two functions same_data_refs and
same_data_refs_base_objects static and put them in tree-if-conv.c?

>> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
>> index cac5a3b..f0bc026 100644
>> --- a/gcc/tree-if-conv.c
>> +++ b/gcc/tree-if-conv.c
>> @@ -441,6 +441,166 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
>>    return true;
>>  }
>>
>> +/* Returns true when the memory reference A at position P in the data
>> +   references vector DRS and executed under the condition CA, is read
>
> "and" executed?  Is A executed under condition CA or is the whole
> vector DRS?  Please clarify the comment...
>

I added one more sentence to clarify the comment:

/* Returns true when the memory reference A at position POS in the
   data references vector DRS and executed under the condition CA, is
   read or written unconditionally.  In other words, this function
   returns true when there exist other accesses to the same data
   reference with predicates that add up (OR-up) to the true
   predicate: this ensures that the data reference A is touched (read
   or written) on every iteration of the if-converted loop.  */

>> +   or written unconditionally.  */
>> +
>> +static bool
>> +memref_read_or_written_unconditionally (int pos, data_reference_p a,
>> +                                     VEC (data_reference_p, heap) *drs,
>> +                                     tree ca)
>> +{
>> +  int i;
>> +  data_reference_p b;
>> +
>> +  for (i = 0; VEC_iterate (data_reference_p, drs, i, b); i++)
>> +    if (i != pos && same_data_refs (a, b))
>> +      {
>> +     tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
>> +
>> +     if (is_true_predicate (cb)
>> +         || is_true_predicate (ca = fold_or_predicates (ca, cb)))
>> +       return true;
>
> ... else this doesn't make sense?  B is either executed unconditionally
> (is_true_predicate (cb)) or at least A or B is executed
> uncontionally.  Why is the latter case worth specializing?  In fact
> you are probably doing redundant work in fold_or_predicates all the
> time in practice (cb will be the same most of the time).
>

This condition stands for:

>> +     if (is_true_predicate (cb)

1. A is touched at every iteration when there exist a data ref B with
the same base as A and that is executed unconditionally.

>> +         || is_true_predicate (ca = fold_or_predicates (ca, cb)))

2. A is also touched at every iteration when B is predicated by a
condition that is a complement of CA.  CA is first initialized to the
predicate of A, and then CA is OR-ed to all the predicates under which
the same data reference is touched.  If eventually CA becomes the true
predicate, then each iteration of the loop contains a path touching
the data reference A.

>> +      }
>> +
>> +  return false;
>> +}
>> +
>> +/* Returns true when the memory references of STMT are read or written
>> +   unconditionally.  */
>> +
>> +static bool
>> +memrefs_read_or_written_unconditionally (gimple stmt,
>> +                                      VEC (data_reference_p, heap) *drs)
>> +{
>> +  int i;
>> +  data_reference_p a;
>> +  tree ca = bb_predicate (gimple_bb (stmt));
>> +
>> +  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
>> +    if (DR_STMT (a) == stmt
>
> Ick.  Why not check DR_STMT (b) != stmt in
> memref_read_or_written_unconditionally and avoid all this linear
> searching mess?
>

Could you please be more specific: I do not understand what you
are asking here.

The algorithm is O (n^2) with n the number of data references in the
innermost loops.  This is because for each data reference we try to
prove that it is executed unconditionally, i.e., that there exist
other data references with the same base that are accessed on
different paths at every iteration.

>
> Please rework the patch according to the requested changes.
>

I sent the combined patch separately.

Sebastian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] Do not check whether memory references accessed in every iteration trap.
  2010-08-17 17:18     ` Sebastian Pop
@ 2010-08-18  9:35       ` Richard Guenther
  2010-08-18 17:10         ` Sebastian Pop
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Guenther @ 2010-08-18  9:35 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, matz

[-- Attachment #1: Type: TEXT/PLAIN, Size: 6484 bytes --]

On Tue, 17 Aug 2010, Sebastian Pop wrote:

> On Fri, Aug 13, 2010 at 04:24, Richard Guenther <rguenther@suse.de> wrote:
> >> -/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
> >> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
> >>  /* { dg-final { cleanup-tree-dump "vect" } } */
> >>
> >
> > As this was testing for outer loop vectorization please duplicate the
> > test and disable if-conversion in one variant.
> 
> I removed these changes as they are not needed anymore: the
> if-conversion with memory accesses is not enabled in -O3.

Thanks.

> >> +
> >> +/* Return true when the base objects of data references A and B are
> >> +   the same memory object.  */
> >> +
> >> +static inline bool
> >> +same_data_refs_base_objects (data_reference_p a, data_reference_p b)
> >> +{
> >> +  return DR_NUM_DIMENSIONS (a) == DR_NUM_DIMENSIONS (b)
> >> +    && dr_may_alias_p (a, b)
> >> +    && operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), 0);
> >
> > I don't see why the dr_may_alias_p check is necessary here.  In fact
> > it might even pessimize what you want to check.
> 
> I agree, there is no reason to have the call to dr_may_alias_p: I
> removed it.
> 
> >
> >> +}
> >> +
> >> +/* Return true when the data references A and B are accessing the same
> >> +   memory object with the same access functions.  */
> >
> > Isn't this too special again?
> >
> >> +static inline bool
> >> +same_data_refs (data_reference_p a, data_reference_p b)
> 
> I do not understand what you are suggesting here.
> Should I make the two functions same_data_refs and
> same_data_refs_base_objects static and put them in tree-if-conv.c?

I misunderstood what you are testing here - you are in fact testing
that the data-refs are exactly equal, a[i] vs. a[i+1] would not be
equal.  So just disregard my comment here.

> >> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> >> index cac5a3b..f0bc026 100644
> >> --- a/gcc/tree-if-conv.c
> >> +++ b/gcc/tree-if-conv.c
> >> @@ -441,6 +441,166 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
> >>    return true;
> >>  }
> >>
> >> +/* Returns true when the memory reference A at position P in the data
> >> +   references vector DRS and executed under the condition CA, is read
> >
> > "and" executed?  Is A executed under condition CA or is the whole
> > vector DRS?  Please clarify the comment...
> >
> 
> I added one more sentence to clarify the comment:
> 
> /* Returns true when the memory reference A at position POS in the
>    data references vector DRS and executed under the condition CA, is
>    read or written unconditionally.  In other words, this function
>    returns true when there exist other accesses to the same data
>    reference with predicates that add up (OR-up) to the true
>    predicate: this ensures that the data reference A is touched (read
>    or written) on every iteration of the if-converted loop.  */

Thanks.

> >> +   or written unconditionally.  */
> >> +
> >> +static bool
> >> +memref_read_or_written_unconditionally (int pos, data_reference_p a,
> >> +                                     VEC (data_reference_p, heap) *drs,
> >> +                                     tree ca)
> >> +{
> >> +  int i;
> >> +  data_reference_p b;
> >> +
> >> +  for (i = 0; VEC_iterate (data_reference_p, drs, i, b); i++)
> >> +    if (i != pos && same_data_refs (a, b))
> >> +      {
> >> +     tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
> >> +
> >> +     if (is_true_predicate (cb)
> >> +         || is_true_predicate (ca = fold_or_predicates (ca, cb)))
> >> +       return true;
> >
> > ... else this doesn't make sense?  B is either executed unconditionally
> > (is_true_predicate (cb)) or at least A or B is executed
> > uncontionally.  Why is the latter case worth specializing?  In fact
> > you are probably doing redundant work in fold_or_predicates all the
> > time in practice (cb will be the same most of the time).
> >
> 
> This condition stands for:
> 
> >> +     if (is_true_predicate (cb)
> 
> 1. A is touched at every iteration when there exist a data ref B with
> the same base as A and that is executed unconditionally.
> 
> >> +         || is_true_predicate (ca = fold_or_predicates (ca, cb)))
> 
> 2. A is also touched at every iteration when B is predicated by a
> condition that is a complement of CA.  CA is first initialized to the
> predicate of A, and then CA is OR-ed to all the predicates under which
> the same data reference is touched.  If eventually CA becomes the true
> predicate, then each iteration of the loop contains a path touching
> the data reference A.

Ah, ok.  That makes sense.  (the fold_or_predicates still looks
very expensive - we eventually want to put a limit on the number
of DRs we perform the quadratic check on)

> >> +      }
> >> +
> >> +  return false;
> >> +}
> >> +
> >> +/* Returns true when the memory references of STMT are read or written
> >> +   unconditionally.  */
> >> +
> >> +static bool
> >> +memrefs_read_or_written_unconditionally (gimple stmt,
> >> +                                      VEC (data_reference_p, heap) *drs)
> >> +{
> >> +  int i;
> >> +  data_reference_p a;
> >> +  tree ca = bb_predicate (gimple_bb (stmt));
> >> +
> >> +  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
> >> +    if (DR_STMT (a) == stmt
> >
> > Ick.  Why not check DR_STMT (b) != stmt in
> > memref_read_or_written_unconditionally and avoid all this linear
> > searching mess?
> >
> 
> Could you please be more specific: I do not understand what you
> are asking here.

> >> +  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)    
> >> +    if (DR_STMT (a) == stmt

Here you find the data-ref with DR_STMT == stmt, just to call
memref_read_or_written_unconditionally with it's index in the
DR vector.  Why not simply do

  memref_read_or_written_unconditionally (stmt, drs, ca)

and in that function instead of the i != pos check

> +  for (i = 0; VEC_iterate (data_reference_p, drs, i, b); i++)
> +    if (i != pos && same_data_refs (a, b))

do DR_STMT (b) != stmt?  I guess I am asking to fold the two
functions into one to make the quadraticness obvious and avoid
passing down indices.  I'm not sure if the result is more
readable but certainly the current version looked odd to me.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] Do not check whether memory references accessed in every iteration trap.
  2010-08-18  9:35       ` Richard Guenther
@ 2010-08-18 17:10         ` Sebastian Pop
  2010-08-18 19:05           ` Richard Guenther
  0 siblings, 1 reply; 32+ messages in thread
From: Sebastian Pop @ 2010-08-18 17:10 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, matz

On Wed, Aug 18, 2010 at 04:32, Richard Guenther <rguenther@suse.de> wrote:
>> >> +      }
>> >> +
>> >> +  return false;
>> >> +}
>> >> +
>> >> +/* Returns true when the memory references of STMT are read or written
>> >> +   unconditionally.  */
>> >> +
>> >> +static bool
>> >> +memrefs_read_or_written_unconditionally (gimple stmt,
>> >> +                                      VEC (data_reference_p, heap) *drs)
>> >> +{
>> >> +  int i;
>> >> +  data_reference_p a;
>> >> +  tree ca = bb_predicate (gimple_bb (stmt));
>> >> +
>> >> +  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
>> >> +    if (DR_STMT (a) == stmt
>> >
>> > Ick.  Why not check DR_STMT (b) != stmt in
>> > memref_read_or_written_unconditionally and avoid all this linear
>> > searching mess?
>> >
>>
>> Could you please be more specific: I do not understand what you
>> are asking here.
>
>> >> +  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
>> >> +    if (DR_STMT (a) == stmt
>
> Here you find the data-ref with DR_STMT == stmt, just to call
> memref_read_or_written_unconditionally with it's index in the
> DR vector.  Why not simply do
>
>  memref_read_or_written_unconditionally (stmt, drs, ca)
>
> and in that function instead of the i != pos check
>
>> +  for (i = 0; VEC_iterate (data_reference_p, drs, i, b); i++)
>> +    if (i != pos && same_data_refs (a, b))
>
> do DR_STMT (b) != stmt?  I guess I am asking to fold the two
> functions into one to make the quadraticness obvious and avoid
> passing down indices.  I'm not sure if the result is more
> readable but certainly the current version looked odd to me.
>

I modified both memrefs_read_or_written_unconditionally and
write_memrefs_written_at_least_once.  What do you think about this form?

/* Returns true when the memory references of STMT are read or written
   unconditionally.  In other words, this function returns true when
   for every data reference A in STMT there exist other accesses to
   the same data reference with predicates that add up (OR-up) to the
   true predicate: this ensures that the data reference A is touched
   (read or written) on every iteration of the if-converted loop.  */

static bool
memrefs_read_or_written_unconditionally (gimple stmt,
					 VEC (data_reference_p, heap) *drs)
{
  int i, j;
  data_reference_p a, b;
  tree ca = bb_predicate (gimple_bb (stmt));

  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
    if (DR_STMT (a) == stmt)
      {
	bool found = false;
	int x = DR_RW_UNCONDITIONALLY (a);

	if (x == 0)
	  return false;

	if (x == 1)
	  continue;

	for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
	  if (DR_STMT (b) != stmt
	      && same_data_refs (a, b))
	    {
	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));

	      if (DR_RW_UNCONDITIONALLY (b) == 1
		  || is_true_predicate (cb)
		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
								 ca, cb)))
		{
		  DR_RW_UNCONDITIONALLY (a) = 1;
		  DR_RW_UNCONDITIONALLY (b) = 1;
		  found = true;
		  break;
		}
	    }

	if (!found)
	  {
	    DR_RW_UNCONDITIONALLY (a) = 0;
	    return false;
	  }
      }

  return true;
}

/* Returns true when the memory references of STMT are unconditionally
   written.  In other words, this function returns true when for every
   data reference A written in STMT, there exist other writes to the
   same data reference with predicates that add up (OR-up) to the true
   predicate: this ensures that the data reference A is written on
   every iteration of the if-converted loop.  */

static bool
write_memrefs_written_at_least_once (gimple stmt,
				     VEC (data_reference_p, heap) *drs)
{
  int i, j;
  data_reference_p a, b;
  tree ca = bb_predicate (gimple_bb (stmt));

  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
    if (DR_STMT (a) == stmt
	&& !DR_IS_READ (a))
      {
	bool found = false;
	int x = DR_WRITTEN_AT_LEAST_ONCE (a);

	if (x == 0)
	  return false;

	if (x == 1)
	  continue;

	for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
	  if (DR_STMT (b) != stmt
	      && !DR_IS_READ (b)
	      && same_data_refs_base_objects (a, b))
	    {
	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));

	      if (DR_WRITTEN_AT_LEAST_ONCE (b) == 1
		  || is_true_predicate (cb)
		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
								 ca, cb)))
		{
		  DR_WRITTEN_AT_LEAST_ONCE (a) = 1;
		  DR_WRITTEN_AT_LEAST_ONCE (b) = 1;
		  found = true;
		  break;
		}
	    }

	if (!found)
	  {
	    DR_WRITTEN_AT_LEAST_ONCE (a) = 0;
	    return false;
	  }
      }

  return true;
}

I will prepare the combined patch if you think that this is a better
form.

Thanks,
Sebastian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] Do not check whether memory references accessed in every iteration trap.
  2010-08-18 17:10         ` Sebastian Pop
@ 2010-08-18 19:05           ` Richard Guenther
  2010-08-18 19:29             ` [PATCH 3/3] Speed-up ifcvt_memrefs_wont_trap caching previous results Sebastian Pop
                               ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Richard Guenther @ 2010-08-18 19:05 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, matz

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5133 bytes --]

On Wed, 18 Aug 2010, Sebastian Pop wrote:

> On Wed, Aug 18, 2010 at 04:32, Richard Guenther <rguenther@suse.de> wrote:
> >> >> +      }
> >> >> +
> >> >> +  return false;
> >> >> +}
> >> >> +
> >> >> +/* Returns true when the memory references of STMT are read or written
> >> >> +   unconditionally.  */
> >> >> +
> >> >> +static bool
> >> >> +memrefs_read_or_written_unconditionally (gimple stmt,
> >> >> +                                      VEC (data_reference_p, heap) *drs)
> >> >> +{
> >> >> +  int i;
> >> >> +  data_reference_p a;
> >> >> +  tree ca = bb_predicate (gimple_bb (stmt));
> >> >> +
> >> >> +  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
> >> >> +    if (DR_STMT (a) == stmt
> >> >
> >> > Ick.  Why not check DR_STMT (b) != stmt in
> >> > memref_read_or_written_unconditionally and avoid all this linear
> >> > searching mess?
> >> >
> >>
> >> Could you please be more specific: I do not understand what you
> >> are asking here.
> >
> >> >> +  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
> >> >> +    if (DR_STMT (a) == stmt
> >
> > Here you find the data-ref with DR_STMT == stmt, just to call
> > memref_read_or_written_unconditionally with it's index in the
> > DR vector.  Why not simply do
> >
> >  memref_read_or_written_unconditionally (stmt, drs, ca)
> >
> > and in that function instead of the i != pos check
> >
> >> +  for (i = 0; VEC_iterate (data_reference_p, drs, i, b); i++)
> >> +    if (i != pos && same_data_refs (a, b))
> >
> > do DR_STMT (b) != stmt?  I guess I am asking to fold the two
> > functions into one to make the quadraticness obvious and avoid
> > passing down indices.  I'm not sure if the result is more
> > readable but certainly the current version looked odd to me.
> >
> 
> I modified both memrefs_read_or_written_unconditionally and
> write_memrefs_written_at_least_once.  What do you think about this form?
> 
> /* Returns true when the memory references of STMT are read or written
>    unconditionally.  In other words, this function returns true when
>    for every data reference A in STMT there exist other accesses to
>    the same data reference with predicates that add up (OR-up) to the
>    true predicate: this ensures that the data reference A is touched
>    (read or written) on every iteration of the if-converted loop.  */
> 
> static bool
> memrefs_read_or_written_unconditionally (gimple stmt,
> 					 VEC (data_reference_p, heap) *drs)
> {
>   int i, j;
>   data_reference_p a, b;
>   tree ca = bb_predicate (gimple_bb (stmt));
> 
>   for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
>     if (DR_STMT (a) == stmt)
>       {
> 	bool found = false;
> 	int x = DR_RW_UNCONDITIONALLY (a);
> 
> 	if (x == 0)
> 	  return false;
> 
> 	if (x == 1)
> 	  continue;
> 
> 	for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
> 	  if (DR_STMT (b) != stmt
> 	      && same_data_refs (a, b))
> 	    {
> 	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
> 
> 	      if (DR_RW_UNCONDITIONALLY (b) == 1
> 		  || is_true_predicate (cb)
> 		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
> 								 ca, cb)))
> 		{
> 		  DR_RW_UNCONDITIONALLY (a) = 1;
> 		  DR_RW_UNCONDITIONALLY (b) = 1;
> 		  found = true;
> 		  break;
> 		}
> 	    }
> 
> 	if (!found)
> 	  {
> 	    DR_RW_UNCONDITIONALLY (a) = 0;
> 	    return false;
> 	  }
>       }
> 
>   return true;
> }
> 
> /* Returns true when the memory references of STMT are unconditionally
>    written.  In other words, this function returns true when for every
>    data reference A written in STMT, there exist other writes to the
>    same data reference with predicates that add up (OR-up) to the true
>    predicate: this ensures that the data reference A is written on
>    every iteration of the if-converted loop.  */
> 
> static bool
> write_memrefs_written_at_least_once (gimple stmt,
> 				     VEC (data_reference_p, heap) *drs)
> {
>   int i, j;
>   data_reference_p a, b;
>   tree ca = bb_predicate (gimple_bb (stmt));
> 
>   for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
>     if (DR_STMT (a) == stmt
> 	&& !DR_IS_READ (a))
>       {
> 	bool found = false;
> 	int x = DR_WRITTEN_AT_LEAST_ONCE (a);
> 
> 	if (x == 0)
> 	  return false;
> 
> 	if (x == 1)
> 	  continue;
> 
> 	for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
> 	  if (DR_STMT (b) != stmt
> 	      && !DR_IS_READ (b)
> 	      && same_data_refs_base_objects (a, b))
> 	    {
> 	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
> 
> 	      if (DR_WRITTEN_AT_LEAST_ONCE (b) == 1
> 		  || is_true_predicate (cb)
> 		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
> 								 ca, cb)))
> 		{
> 		  DR_WRITTEN_AT_LEAST_ONCE (a) = 1;
> 		  DR_WRITTEN_AT_LEAST_ONCE (b) = 1;
> 		  found = true;
> 		  break;
> 		}
> 	    }
> 
> 	if (!found)
> 	  {
> 	    DR_WRITTEN_AT_LEAST_ONCE (a) = 0;
> 	    return false;
> 	  }
>       }
> 
>   return true;
> }
> 
> I will prepare the combined patch if you think that this is a better
> form.

Yes, I like that better.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 2/3] Do not check whether memory references accessed in every iteration trap.
  2010-08-18 19:05           ` Richard Guenther
  2010-08-18 19:29             ` [PATCH 3/3] Speed-up ifcvt_memrefs_wont_trap caching previous results Sebastian Pop
@ 2010-08-18 19:29             ` Sebastian Pop
  2010-08-18 19:31             ` [PATCH 1/3] Add flag -ftree-loop-if-convert-memory-writes Sebastian Pop
  2010-08-18 19:32             ` [PATCH 3/4] Do not check whether memory references accessed in every iteration trap Sebastian Pop
  3 siblings, 0 replies; 32+ messages in thread
From: Sebastian Pop @ 2010-08-18 19:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, Sebastian Pop

This patch relaxes the checks from gimple_could_trap_p in order to
allow the flag_loop_if_convert_memory_writes to if-convert more loops
in which it is possible to prove that:

- the accesses to an array in a loop do not trap (more than the
  original non-if-converted loop).  This is true when the memory
  accesses are executed at every iteration of the if-converted loop.

- the writes to memory occur on arrays that are not const qualified.
  This is true when there exists at least one unconditional write to
  the array in the analyzed program.  In this patch this analysis is
  limited to the loop to be if-converted.

	* gimple.c (gimple_could_trap_p_1): Not static anymore.
	Pass an extra bool parameter include_mem.
	(gimple_could_trap_p): Adjust call to gimple_could_trap_p_1.
	(gimple_assign_rhs_could_trap_p): Same.
	* gimple.h (gimple_could_trap_p_1): Declared.
	* tree-data-ref.h (same_data_refs_base_objects): New.
	(same_data_refs): New.
	* tree-if-conv.c (memrefs_read_or_written_unconditionally): New.
	(write_memrefs_written_at_least_once): New.
	(ifcvt_memrefs_wont_trap): New.
	(operations_could_trap): New.
	(ifcvt_could_trap_p): New.
	(if_convertible_gimple_assign_stmt_p): Call ifcvt_could_trap_p.
	Gets a vector of data refs.
	(if_convertible_stmt_p): Same.
	(if_convertible_loop_p_1): New.
	(if_convertible_loop_p): Call if_convertible_loop_p_1.

	* gcc.dg/tree-ssa/ifc-5.c: New.
---
 gcc/gimple.c                          |   30 ++--
 gcc/gimple.h                          |    1 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c |   24 +++
 gcc/tree-data-ref.h                   |   33 ++++
 gcc/tree-if-conv.c                    |  271 +++++++++++++++++++++++++--------
 5 files changed, 276 insertions(+), 83 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c

diff --git a/gcc/gimple.c b/gcc/gimple.c
index e3de834..3498e9c 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2399,24 +2399,25 @@ gimple_rhs_has_side_effects (const_gimple s)
   return false;
 }
 
-
 /* Helper for gimple_could_trap_p and gimple_assign_rhs_could_trap_p.
-   Return true if S can trap.  If INCLUDE_LHS is true and S is a
-   GIMPLE_ASSIGN, the LHS of the assignment is also checked.
-   Otherwise, only the RHS of the assignment is checked.  */
+   Return true if S can trap.  When INCLUDE_MEM is true, check whether
+   the memory operations could trap.  When INCLUDE_STORES is true and
+   S is a GIMPLE_ASSIGN, the LHS of the assignment is also checked.  */
 
-static bool
-gimple_could_trap_p_1 (gimple s, bool include_lhs)
+bool
+gimple_could_trap_p_1 (gimple s, bool include_mem, bool include_stores)
 {
-  unsigned i, start;
   tree t, div = NULL_TREE;
   enum tree_code op;
 
-  start = (is_gimple_assign (s) && !include_lhs) ? 1 : 0;
+  if (include_mem)
+    {
+      unsigned i, start = (is_gimple_assign (s) && !include_stores) ? 1 : 0;
 
-  for (i = start; i < gimple_num_ops (s); i++)
-    if (tree_could_trap_p (gimple_op (s, i)))
-      return true;
+      for (i = start; i < gimple_num_ops (s); i++)
+	if (tree_could_trap_p (gimple_op (s, i)))
+	  return true;
+    }
 
   switch (gimple_code (s))
     {
@@ -2445,26 +2446,23 @@ gimple_could_trap_p_1 (gimple s, bool include_lhs)
     }
 
   return false;
-
 }
 
-
 /* Return true if statement S can trap.  */
 
 bool
 gimple_could_trap_p (gimple s)
 {
-  return gimple_could_trap_p_1 (s, true);
+  return gimple_could_trap_p_1 (s, true, true);
 }
 
-
 /* Return true if RHS of a GIMPLE_ASSIGN S can trap.  */
 
 bool
 gimple_assign_rhs_could_trap_p (gimple s)
 {
   gcc_assert (is_gimple_assign (s));
-  return gimple_could_trap_p_1 (s, false);
+  return gimple_could_trap_p_1 (s, true, false);
 }
 
 
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 7909986..545b271 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -886,6 +886,7 @@ void gimple_cond_set_condition_from_tree (gimple, tree);
 bool gimple_has_side_effects (const_gimple);
 bool gimple_rhs_has_side_effects (const_gimple);
 bool gimple_could_trap_p (gimple);
+bool gimple_could_trap_p_1 (gimple, bool, bool);
 bool gimple_assign_rhs_could_trap_p (gimple);
 void gimple_regimplify_operands (gimple, gimple_stmt_iterator *);
 bool empty_body_p (gimple_seq);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
new file mode 100644
index 0000000..a9cc816
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-c -O2 -ftree-vectorize -fdump-tree-ifcvt-stats" { target *-*-* } } */
+
+void
+dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
+{
+  int i, level, qmul, qadd;
+
+  qadd = (qscale - 1) | 1;
+  qmul = qscale << 1;
+
+  for (i = 0; i <= nCoeffs; i++)
+    {
+      level = block[i];
+      if (level < 0)
+	level = level * qmul - qadd;
+      else
+	level = level * qmul + qadd;
+      block[i] = level;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */
+/* { dg-final { cleanup-tree-dump "ifcvt" } } */
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 9e18e26..757c3c1 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -417,6 +417,39 @@ extern void create_rdg_vertices (struct graph *, VEC (gimple, heap) *);
 extern bool dr_may_alias_p (const struct data_reference *,
 			    const struct data_reference *);
 
+
+/* Return true when the base objects of data references A and B are
+   the same memory object.  */
+
+static inline bool
+same_data_refs_base_objects (data_reference_p a, data_reference_p b)
+{
+  return DR_NUM_DIMENSIONS (a) == DR_NUM_DIMENSIONS (b)
+    && operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), 0);
+}
+
+/* Return true when the data references A and B are accessing the same
+   memory object with the same access functions.  */
+
+static inline bool
+same_data_refs (data_reference_p a, data_reference_p b)
+{
+  unsigned int i;
+
+  /* The references are exactly the same.  */
+  if (operand_equal_p (DR_REF (a), DR_REF (b), 0))
+    return true;
+
+  if (!same_data_refs_base_objects (a, b))
+    return false;
+
+  for (i = 0; i < DR_NUM_DIMENSIONS (a); i++)
+    if (!eq_evolutions_p (DR_ACCESS_FN (a, i), DR_ACCESS_FN (b, i)))
+      return false;
+
+  return true;
+}
+
 /* Return true when the DDR contains two data references that have the
    same access functions.  */
 
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index b465311..396d652 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -446,6 +446,132 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
   return true;
 }
 
+/* Returns true when the memory references of STMT are read or written
+   unconditionally.  In other words, this function returns true when
+   for every data reference A in STMT there exist other accesses to
+   the same data reference with predicates that add up (OR-up) to the
+   true predicate: this ensures that the data reference A is touched
+   (read or written) on every iteration of the if-converted loop.  */
+
+static bool
+memrefs_read_or_written_unconditionally (gimple stmt,
+					 VEC (data_reference_p, heap) *drs)
+{
+  int i, j;
+  data_reference_p a, b;
+  tree ca = bb_predicate (gimple_bb (stmt));
+
+  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
+    if (DR_STMT (a) == stmt)
+      {
+	bool found = false;
+
+	for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
+	  if (DR_STMT (b) != stmt
+	      && same_data_refs (a, b))
+	    {
+	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
+
+	      if (is_true_predicate (cb)
+		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
+								 ca, cb)))
+		{
+		  found = true;
+		  break;
+		}
+	    }
+
+	if (!found)
+	  return false;
+      }
+
+  return true;
+}
+
+/* Returns true when the memory references of STMT are unconditionally
+   written.  In other words, this function returns true when for every
+   data reference A written in STMT, there exist other writes to the
+   same data reference with predicates that add up (OR-up) to the true
+   predicate: this ensures that the data reference A is written on
+   every iteration of the if-converted loop.  */
+
+static bool
+write_memrefs_written_at_least_once (gimple stmt,
+				     VEC (data_reference_p, heap) *drs)
+{
+  int i, j;
+  data_reference_p a, b;
+  tree ca = bb_predicate (gimple_bb (stmt));
+
+  for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++)
+    if (DR_STMT (a) == stmt
+	&& !DR_IS_READ (a))
+      {
+	bool found = false;
+
+	for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
+	  if (DR_STMT (b) != stmt
+	      && !DR_IS_READ (b)
+	      && same_data_refs_base_objects (a, b))
+	    {
+	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
+
+	      if (is_true_predicate (cb)
+		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
+								 ca, cb)))
+		{
+		  found = true;
+		  break;
+		}
+	    }
+
+	if (!found)
+	  return false;
+      }
+
+  return true;
+}
+
+/* Return true when the memory references of STMT won't trap in the
+   if-converted code.  There are two things that we have to check for:
+
+   - writes to memory occur to writable memory: if-conversion of
+   memory writes transforms the conditional memory writes into
+   unconditional writes, i.e. "if (cond) A[i] = foo" is transformed
+   into "A[i] = cond ? foo : A[i]", and as the write to memory may not
+   be executed at all in the original code, it may be a readonly
+   memory.  To check that A is not const-qualified, we check that
+   there exists at least an unconditional write to A in the current
+   function.
+
+   - reads or writes to memory are valid memory accesses for every
+   iteration.  To check that the memory accesses are correctly formed
+   and that we are allowed to read and write in these locations, we
+   check that the memory accesses to be if-converted occur at every
+   iteration unconditionally.  */
+
+static bool
+ifcvt_memrefs_wont_trap (gimple stmt, VEC (data_reference_p, heap) *refs)
+{
+  return write_memrefs_written_at_least_once (stmt, refs)
+    && memrefs_read_or_written_unconditionally (stmt, refs);
+}
+
+/* Wrapper around gimple_could_trap_p refined for the needs of the
+   if-conversion.  Try to prove that the memory accesses of STMT could
+   not trap in the innermost loop containing STMT.  */
+
+static bool
+ifcvt_could_trap_p (gimple stmt, VEC (data_reference_p, heap) *refs)
+{
+  if (gimple_vuse (stmt)
+      && !gimple_could_trap_p_1 (stmt, false, false)
+      && ifcvt_memrefs_wont_trap (stmt, refs))
+    return false;
+
+  return gimple_could_trap_p (stmt);
+}
+
 /* Return true when STMT is if-convertible.
 
    GIMPLE_ASSIGN statement is not if-convertible if,
@@ -454,7 +580,8 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
    - LHS is not var decl.  */
 
 static bool
-if_convertible_gimple_assign_stmt_p (gimple stmt)
+if_convertible_gimple_assign_stmt_p (gimple stmt,
+				     VEC (data_reference_p, heap) *refs)
 {
   tree lhs = gimple_assign_lhs (stmt);
   basic_block bb;
@@ -482,7 +609,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt)
 
   if (flag_tree_loop_if_convert_memory_writes)
     {
-      if (gimple_could_trap_p (stmt))
+      if (ifcvt_could_trap_p (stmt, refs))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    fprintf (dump_file, "tree could trap...\n");
@@ -522,7 +649,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt)
    - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
 
 static bool
-if_convertible_stmt_p (gimple stmt)
+if_convertible_stmt_p (gimple stmt, VEC (data_reference_p, heap) *refs)
 {
   switch (gimple_code (stmt))
     {
@@ -532,7 +659,7 @@ if_convertible_stmt_p (gimple stmt)
       return true;
 
     case GIMPLE_ASSIGN:
-      return if_convertible_gimple_assign_stmt_p (stmt);
+      return if_convertible_gimple_assign_stmt_p (stmt, refs);
 
     default:
       /* Don't know what to do with 'em so don't do anything.  */
@@ -800,69 +927,24 @@ predicate_bbs (loop_p loop)
   return true;
 }
 
-/* Return true when LOOP is if-convertible.
-   LOOP is if-convertible if:
-   - it is innermost,
-   - it has two or more basic blocks,
-   - it has only one exit,
-   - loop header is not the exit edge,
-   - if its basic blocks and phi nodes are if convertible.  */
+/* Return true when LOOP is if-convertible.  This is a helper function
+   for if_convertible_loop_p.  REFS and DDRS are initialized and freed
+   in if_convertible_loop_p.  */
 
 static bool
-if_convertible_loop_p (struct loop *loop)
+if_convertible_loop_p_1 (struct loop *loop,
+			 VEC (data_reference_p, heap) **refs,
+			 VEC (ddr_p, heap) **ddrs)
 {
+  bool res;
   unsigned int i;
-  edge e;
-  edge_iterator ei;
   basic_block exit_bb = NULL;
 
-  /* Handle only innermost loop.  */
-  if (!loop || loop->inner)
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "not innermost loop\n");
-      return false;
-    }
-
-  /* If only one block, no need for if-conversion.  */
-  if (loop->num_nodes <= 2)
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "less than 2 basic blocks\n");
-      return false;
-    }
-
-  /* More than one loop exit is too much to handle.  */
-  if (!single_exit (loop))
-    {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "multiple exits\n");
-      return false;
-    }
-
-  /* ??? Check target's vector conditional operation support for vectorizer.  */
-
-  /* If one of the loop header's edge is exit edge then do not apply
-     if-conversion.  */
-  FOR_EACH_EDGE (e, ei, loop->header->succs)
-    {
-      if (loop_exit_edge_p (loop, e))
-	return false;
-    }
-
   /* Don't if-convert the loop when the data dependences cannot be
      computed: the loop won't be vectorized in that case.  */
-  {
-    VEC (data_reference_p, heap) *refs = VEC_alloc (data_reference_p, heap, 5);
-    VEC (ddr_p, heap) *ddrs = VEC_alloc (ddr_p, heap, 25);
-    bool res = compute_data_dependences_for_loop (loop, true, &refs, &ddrs);
-
-    free_data_refs (refs);
-    free_dependence_relations (ddrs);
-
-    if (!res)
-      return false;
-  }
+  res = compute_data_dependences_for_loop (loop, true, refs, ddrs);
+  if (!res)
+    return false;
 
   calculate_dominance_info (CDI_DOMINATORS);
 
@@ -886,7 +968,8 @@ if_convertible_loop_p (struct loop *loop)
 	exit_bb = bb;
     }
 
-  if (!predicate_bbs (loop))
+  res = predicate_bbs (loop);
+  if (!res)
     return false;
 
   for (i = 0; i < loop->num_nodes; i++)
@@ -898,13 +981,11 @@ if_convertible_loop_p (struct loop *loop)
 	if (!if_convertible_phi_p (loop, bb, gsi_stmt (itr)))
 	  return false;
 
-      /* For non predicated BBs, don't check their statements.  */
-      if (!is_predicated (bb))
-	continue;
-
-      for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
-	if (!if_convertible_stmt_p (gsi_stmt (itr)))
-	  return false;
+      /* Check the if-convertibility of statements in predicated BBs.  */
+      if (is_predicated (bb))
+	for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
+	  if (!if_convertible_stmt_p (gsi_stmt (itr), *refs))
+	    return false;
     }
 
   if (dump_file)
@@ -913,6 +994,62 @@ if_convertible_loop_p (struct loop *loop)
   return true;
 }
 
+/* Return true when LOOP is if-convertible.
+   LOOP is if-convertible if:
+   - it is innermost,
+   - it has two or more basic blocks,
+   - it has only one exit,
+   - loop header is not the exit edge,
+   - if its basic blocks and phi nodes are if convertible.  */
+
+static bool
+if_convertible_loop_p (struct loop *loop)
+{
+  edge e;
+  edge_iterator ei;
+  bool res = false;
+  VEC (data_reference_p, heap) *refs;
+  VEC (ddr_p, heap) *ddrs;
+
+  /* Handle only innermost loop.  */
+  if (!loop || loop->inner)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "not innermost loop\n");
+      return false;
+    }
+
+  /* If only one block, no need for if-conversion.  */
+  if (loop->num_nodes <= 2)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "less than 2 basic blocks\n");
+      return false;
+    }
+
+  /* More than one loop exit is too much to handle.  */
+  if (!single_exit (loop))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "multiple exits\n");
+      return false;
+    }
+
+  /* If one of the loop header's edge is an exit edge then do not
+     apply if-conversion.  */
+  FOR_EACH_EDGE (e, ei, loop->header->succs)
+    if (loop_exit_edge_p (loop, e))
+      return false;
+
+  refs = VEC_alloc (data_reference_p, heap, 5);
+  ddrs = VEC_alloc (ddr_p, heap, 25);
+  res = if_convertible_loop_p_1 (loop, &refs, &ddrs);
+
+  free_data_refs (refs);
+  free_dependence_relations (ddrs);
+  return res;
+}
+
 /* Basic block BB has two predecessors.  Using predecessor's bb
    predicate, set an appropriate condition COND for the PHI node
    replacement.  Return the true block whose phi arguments are
-- 
1.7.0.4

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 3/3] Speed-up ifcvt_memrefs_wont_trap caching previous results.
  2010-08-18 19:05           ` Richard Guenther
@ 2010-08-18 19:29             ` Sebastian Pop
  2010-08-18 19:29             ` [PATCH 2/3] Do not check whether memory references accessed in every iteration trap Sebastian Pop
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 32+ messages in thread
From: Sebastian Pop @ 2010-08-18 19:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, Sebastian Pop

This patch speeds up the ifcvt_memrefs_wont_trap computation by
caching the results of the computations in the data references ->aux
fields.

	* tree-if-conv.c (struct ifc_dr): New.
	(IFC_DR): New.
	(DR_WRITTEN_AT_LEAST_ONCE): New.
	(DR_RW_UNCONDITIONALLY): New.
	(memref_read_or_written_unconditionally): Use the cached values
	when possible.
	(write_memref_written_at_least_once): Same.
	(if_convertible_loop_p): Initialize and free DR->aux fields.
---
 gcc/tree-if-conv.c |   70 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 396d652..cc06522 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -446,6 +446,21 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
   return true;
 }
 
+/* Records the status of a data reference.  This struct is attached to
+   each DR->aux field.  */
+
+struct ifc_dr {
+  /* -1 when not initialized, 0 when false, 1 when true.  */
+  int written_at_least_once;
+
+  /* -1 when not initialized, 0 when false, 1 when true.  */
+  int rw_unconditionally;
+};
+
+#define IFC_DR(DR) ((struct ifc_dr *) (DR)->aux)
+#define DR_WRITTEN_AT_LEAST_ONCE(DR) (IFC_DR (DR)->written_at_least_once)
+#define DR_RW_UNCONDITIONALLY(DR) (IFC_DR (DR)->rw_unconditionally)
+
 /* Returns true when the memory references of STMT are read or written
    unconditionally.  In other words, this function returns true when
    for every data reference A in STMT there exist other accesses to
@@ -465,6 +480,13 @@ memrefs_read_or_written_unconditionally (gimple stmt,
     if (DR_STMT (a) == stmt)
       {
 	bool found = false;
+	int x = DR_RW_UNCONDITIONALLY (a);
+
+	if (x == 0)
+	  return false;
+
+	if (x == 1)
+	  continue;
 
 	for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
 	  if (DR_STMT (b) != stmt
@@ -472,17 +494,23 @@ memrefs_read_or_written_unconditionally (gimple stmt,
 	    {
 	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
 
-	      if (is_true_predicate (cb)
+	      if (DR_RW_UNCONDITIONALLY (b) == 1
+		  || is_true_predicate (cb)
 		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
 								 ca, cb)))
 		{
+		  DR_RW_UNCONDITIONALLY (a) = 1;
+		  DR_RW_UNCONDITIONALLY (b) = 1;
 		  found = true;
 		  break;
 		}
 	    }
 
 	if (!found)
-	  return false;
+	  {
+	    DR_RW_UNCONDITIONALLY (a) = 0;
+	    return false;
+	  }
       }
 
   return true;
@@ -508,6 +536,13 @@ write_memrefs_written_at_least_once (gimple stmt,
 	&& !DR_IS_READ (a))
       {
 	bool found = false;
+	int x = DR_WRITTEN_AT_LEAST_ONCE (a);
+
+	if (x == 0)
+	  return false;
+
+	if (x == 1)
+	  continue;
 
 	for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++)
 	  if (DR_STMT (b) != stmt
@@ -516,17 +551,23 @@ write_memrefs_written_at_least_once (gimple stmt,
 	    {
 	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
 
-	      if (is_true_predicate (cb)
+	      if (DR_WRITTEN_AT_LEAST_ONCE (b) == 1
+		  || is_true_predicate (cb)
 		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
 								 ca, cb)))
 		{
+		  DR_WRITTEN_AT_LEAST_ONCE (a) = 1;
+		  DR_WRITTEN_AT_LEAST_ONCE (b) = 1;
 		  found = true;
 		  break;
 		}
 	    }
 
 	if (!found)
-	  return false;
+	  {
+	    DR_WRITTEN_AT_LEAST_ONCE (a) = 0;
+	    return false;
+	  }
       }
 
   return true;
@@ -972,6 +1013,18 @@ if_convertible_loop_p_1 (struct loop *loop,
   if (!res)
     return false;
 
+  if (flag_tree_loop_if_convert_memory_writes)
+    {
+      data_reference_p dr;
+
+      for (i = 0; VEC_iterate (data_reference_p, *refs, i, dr); i++)
+	{
+	  dr->aux = XNEW (struct ifc_dr);
+	  DR_WRITTEN_AT_LEAST_ONCE (dr) = -1;
+	  DR_RW_UNCONDITIONALLY (dr) = -1;
+	}
+    }
+
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
@@ -1045,6 +1098,15 @@ if_convertible_loop_p (struct loop *loop)
   ddrs = VEC_alloc (ddr_p, heap, 25);
   res = if_convertible_loop_p_1 (loop, &refs, &ddrs);
 
+  if (flag_tree_loop_if_convert_memory_writes)
+    {
+      data_reference_p dr;
+      unsigned int i;
+
+      for (i = 0; VEC_iterate (data_reference_p, refs, i, dr); i++)
+	free (dr->aux);
+    }
+
   free_data_refs (refs);
   free_dependence_relations (ddrs);
   return res;
-- 
1.7.0.4

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/3] Add flag -ftree-loop-if-convert-memory-writes.
  2010-08-18 19:05           ` Richard Guenther
  2010-08-18 19:29             ` [PATCH 3/3] Speed-up ifcvt_memrefs_wont_trap caching previous results Sebastian Pop
  2010-08-18 19:29             ` [PATCH 2/3] Do not check whether memory references accessed in every iteration trap Sebastian Pop
@ 2010-08-18 19:31             ` Sebastian Pop
  2010-08-18 19:32             ` [PATCH 3/4] Do not check whether memory references accessed in every iteration trap Sebastian Pop
  3 siblings, 0 replies; 32+ messages in thread
From: Sebastian Pop @ 2010-08-18 19:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, Sebastian Pop

This patch adds a flag that controls the replacement of the memory
writes that are in predicated basic blocks with a full write:

for (...)
  if (cond)
    A[i] = foo

is replaced with:

for (...)
  A[i] = cond ? foo : A[i]

In order to do this, we have to call gimple_could_trap_p instead of
gimple_assign_rhs_could_trap_p, as we have to also check that the LHS
of assign stmts does not trap.

	* common.opt (ftree-loop-if-convert-memory-writes): New flag.
	* doc/invoke.texi (ftree-loop-if-convert-memory-writes): Documented.
	* tree-if-conv.c (ifc_temp_var): Pass an extra parameter GSI.  Insert
	the created statement before GSI.
	(if_convertible_phi_p): Allow virtual phi nodes when
	flag_loop_if_convert_memory_writes is set.
	(if_convertible_gimple_assign_stmt_p): Allow memory reads and writes
	Do not handle types that do not match is_gimple_reg_type.
	Remove loop and bb parameters.  Call gimple_could_trap_p instead of
	when flag_loop_if_convert_memory_writes	is set, as LHS can contain
	memory refs.
	(if_convertible_stmt_p): Remove loop and bb parameters.  Update calls
	to if_convertible_gimple_assign_stmt_p.
	(if_convertible_loop_p): Update call to if_convertible_stmt_p.
	(replace_phi_with_cond_gimple_assign_stmt): Renamed
	predicate_scalar_phi.  Do not handle virtual phi nodes.
	(ifconvert_phi_nodes): Renamed predicate_all_scalar_phis.
	Call predicate_scalar_phi.
	(insert_gimplified_predicates): Insert the gimplified predicate of a BB
	just after the labels for flag_loop_if_convert_memory_writes, otherwise
	insert the predicate in the end of the BB.
	(predicate_mem_writes): New.
	(combine_blocks): Call predicate_all_scalar_phis.  When
	flag_loop_if_convert_memory_writes is set, call predicate_mem_writes.
	(tree_if_conversion): Call mark_sym_for_renaming when
	flag_loop_if_convert_memory_writes is set.
	(main_tree_if_conversion): Return TODO_update_ssa_only_virtuals when
	flag_loop_if_convert_memory_writes is set.

	* gcc.dg/tree-ssa/ifc-4.c: New.
	* gcc.dg/tree-ssa/ifc-7.c: New.
---
 gcc/common.opt                        |    4 +
 gcc/doc/invoke.texi                   |   20 ++-
 gcc/testsuite/gcc.dg/tree-ssa/ifc-4.c |   53 ++++++
 gcc/testsuite/gcc.dg/tree-ssa/ifc-7.c |   29 +++
 gcc/tree-if-conv.c                    |  302 +++++++++++++++++++++++++++------
 5 files changed, 353 insertions(+), 55 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-4.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-7.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 1285ff0..0e76b72 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -657,6 +657,10 @@ ftree-loop-if-convert
 Common Report Var(flag_tree_loop_if_convert) Init(-1) Optimization
 Convert conditional jumps in innermost loops to branchless equivalents
 
+ftree-loop-if-convert-memory-writes
+Common Report Var(flag_tree_loop_if_convert_memory_writes) Optimization
+Also if-convert conditional jumps containing memory writes
+
 ; -finhibit-size-directive inhibits output of .size for ELF.
 ; This is used only for compiling crtstuff.c,
 ; and it may be extended to other effects
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index edce703..6e7c22c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -383,7 +383,8 @@ Objective-C and Objective-C++ Dialects}.
 -fstrict-aliasing -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
 -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop @gol
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
--ftree-forwprop -ftree-fre -ftree-loop-if-convert -ftree-loop-im @gol
+-ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
+-ftree-loop-if-convert-memory-writes -ftree-loop-im @gol
 -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-pta -ftree-reassoc @gol
@@ -6913,6 +6914,23 @@ the innermost loops in order to improve the ability of the
 vectorization pass to handle these loops.  This is enabled by default
 if vectorization is enabled.
 
+@item -ftree-loop-if-convert-memory-writes
+Attempt to also if-convert conditional jumps containing memory writes.
+This transformation can be unsafe for multi-threaded programs as it
+transforms conditional memory writes into unconditional memory writes.
+For example,
+@smallexample
+for (i = 0; i < N; i++)
+  if (cond)
+    A[i] = expr;
+@end smallexample
+would be transformed to
+@smallexample
+for (i = 0; i < N; i++)
+  A[i] = cond ? expr : A[i];
+@end smallexample
+potentially producing data races.
+
 @item -ftree-loop-distribution
 Perform loop distribution.  This flag can improve cache performance on
 big loop bodies and allow further loop optimizations, like
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-4.c
new file mode 100644
index 0000000..beb1a0e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-4.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-options "-c -O2 -ftree-vectorize -fdump-tree-ifcvt-stats" { target *-*-* } } */
+
+struct ht
+{
+  void * (*alloc_subobject) (int);
+};
+typedef struct cpp_reader cpp_reader;
+typedef struct cpp_token cpp_token;
+typedef struct cpp_macro cpp_macro;
+enum cpp_ttype
+{
+    CPP_PASTE,
+};
+struct cpp_token {
+  __extension__ enum cpp_ttype type : 8;
+} cpp_comment_table;
+struct cpp_macro {
+  union cpp_macro_u
+  {
+    cpp_token * tokens;
+  } exp;
+  unsigned int count;
+};
+struct cpp_reader
+{
+  struct ht *hash_table;
+};
+create_iso_definition (cpp_reader *pfile, cpp_macro *macro)
+{
+  unsigned int num_extra_tokens = 0;
+  {
+    cpp_token *tokns =
+      (cpp_token *) pfile->hash_table->alloc_subobject (sizeof (cpp_token)
+							* macro->count);
+    {
+      cpp_token *normal_dest = tokns;
+      cpp_token *extra_dest = tokns + macro->count - num_extra_tokens;
+      unsigned int i;
+      for (i = 0; i < macro->count; i++)
+	{
+	  if (macro->exp.tokens[i].type == CPP_PASTE)
+	    *extra_dest++ = macro->exp.tokens[i];
+	  else
+	    *normal_dest++ = macro->exp.tokens[i];
+	}
+    }
+  }
+}
+
+/* This cannot be if-converted because the stores are to aggregate types.  */
+/* { dg-final { scan-tree-dump-times "Applying if-conversion" 0 "ifcvt" } } */
+/* { dg-final { cleanup-tree-dump "ifcvt" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-7.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-7.c
new file mode 100644
index 0000000..4d26dc7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-7.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-c -O2 -ftree-vectorize" { target *-*-* } } */
+
+typedef struct eqn_d
+{
+  int *coef;
+} *eqn;
+typedef struct omega_pb_d
+{
+  eqn subs;
+} *omega_pb;
+
+omega_pb omega_solve_problem (omega_pb);
+
+omega_pb
+omega_solve_geq (omega_pb pb, int n)
+{
+  int i, e;
+  int j = 0;
+
+  for (e = n - 1; e >= 0; e--)
+    if (pb->subs[e].coef[i] != pb->subs[e].coef[j])
+      {
+	pb->subs[e].coef[i] = j;
+	pb->subs[e].coef[j] = i;
+      }
+
+  return omega_solve_problem (pb);
+}
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 0f1caaa..b465311 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -213,11 +213,13 @@ reset_bb_predicate (basic_block bb)
   init_bb_predicate (bb);
 }
 
-/* Create a new temp variable of type TYPE.  Add GIMPLE_ASSIGN to assign EXP
-   to the new variable.  */
+/* Returns a new SSA_NAME of type TYPE that is assigned the value of
+   the expression EXPR.  Inserts the statement created for this
+   computation before GSI and leaves the iterator GSI at the same
+   statement.  */
 
-static gimple
-ifc_temp_var (tree type, tree exp)
+static tree
+ifc_temp_var (tree type, tree expr, gimple_stmt_iterator *gsi)
 {
   const char *name = "_ifc_";
   tree var, new_name;
@@ -227,8 +229,8 @@ ifc_temp_var (tree type, tree exp)
   var = create_tmp_var (type, name);
   add_referenced_var (var);
 
-  /* Build new statement to assign EXP to new variable.  */
-  stmt = gimple_build_assign (var, exp);
+  /* Build new statement to assign EXPR to new variable.  */
+  stmt = gimple_build_assign (var, expr);
 
   /* Get SSA name for the new variable and set make new statement
      its definition statement.  */
@@ -237,7 +239,8 @@ ifc_temp_var (tree type, tree exp)
   SSA_NAME_DEF_STMT (new_name) = stmt;
   update_stmt (stmt);
 
-  return stmt;
+  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+  return gimple_assign_lhs (stmt);
 }
 
 /* Return true when COND is a true predicate.  */
@@ -388,9 +391,12 @@ bb_with_exit_edge_p (struct loop *loop, basic_block bb)
    and it belongs to basic block BB.
 
    PHI is not if-convertible if:
-   - it has more than 2 arguments,
-   - virtual PHI is immediately used in another PHI node,
-   - virtual PHI on BB other than header.  */
+   - it has more than 2 arguments.
+
+   When the flag_tree_loop_if_convert_memory_writes is not set, PHI is not
+   if-convertible if:
+   - a virtual PHI is immediately used in another PHI node,
+   - there is a virtual PHI in a BB other than the loop->header.  */
 
 static bool
 if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
@@ -408,6 +414,12 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
       return false;
     }
 
+  if (flag_tree_loop_if_convert_memory_writes)
+    return true;
+
+  /* When the flag_tree_loop_if_convert_memory_writes is not set, check
+     that there are no memory writes in the branches of the loop to be
+     if-converted.  */
   if (!is_gimple_reg (SSA_NAME_VAR (gimple_phi_result (phi))))
     {
       imm_use_iterator imm_iter;
@@ -416,9 +428,10 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
       if (bb != loop->header)
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "Virtual phi not on loop header.\n");
+	    fprintf (dump_file, "Virtual phi not on loop->header.\n");
 	  return false;
 	}
+
       FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_phi_result (phi))
 	{
 	  if (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI)
@@ -438,15 +451,13 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
    GIMPLE_ASSIGN statement is not if-convertible if,
    - it is not movable,
    - it could trap,
-   - LHS is not var decl.
-
-   GIMPLE_ASSIGN is part of block BB, which is inside loop LOOP.  */
+   - LHS is not var decl.  */
 
 static bool
-if_convertible_gimple_assign_stmt_p (struct loop *loop, basic_block bb,
-    				     gimple stmt)
+if_convertible_gimple_assign_stmt_p (gimple stmt)
 {
   tree lhs = gimple_assign_lhs (stmt);
+  basic_block bb;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
@@ -454,6 +465,9 @@ if_convertible_gimple_assign_stmt_p (struct loop *loop, basic_block bb,
       print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
     }
 
+  if (!is_gimple_reg_type (TREE_TYPE (lhs)))
+    return false;
+
   /* Some of these constrains might be too conservative.  */
   if (stmt_ends_bb_p (stmt)
       || gimple_has_volatile_ops (stmt)
@@ -466,6 +480,17 @@ if_convertible_gimple_assign_stmt_p (struct loop *loop, basic_block bb,
       return false;
     }
 
+  if (flag_tree_loop_if_convert_memory_writes)
+    {
+      if (gimple_could_trap_p (stmt))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, "tree could trap...\n");
+	  return false;
+	}
+      return true;
+    }
+
   if (gimple_assign_rhs_could_trap_p (stmt))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
@@ -473,9 +498,11 @@ if_convertible_gimple_assign_stmt_p (struct loop *loop, basic_block bb,
       return false;
     }
 
+  bb = gimple_bb (stmt);
+
   if (TREE_CODE (lhs) != SSA_NAME
-      && bb != loop->header
-      && !bb_with_exit_edge_p (loop, bb))
+      && bb != bb->loop_father->header
+      && !bb_with_exit_edge_p (bb->loop_father, bb))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
@@ -492,12 +519,10 @@ if_convertible_gimple_assign_stmt_p (struct loop *loop, basic_block bb,
 
    A statement is if-convertible if:
    - it is an if-convertible GIMPLE_ASSGIN,
-   - it is a GIMPLE_LABEL or a GIMPLE_COND.
-
-   STMT is inside BB, which is inside loop LOOP.  */
+   - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
 
 static bool
-if_convertible_stmt_p (struct loop *loop, basic_block bb, gimple stmt)
+if_convertible_stmt_p (gimple stmt)
 {
   switch (gimple_code (stmt))
     {
@@ -507,7 +532,7 @@ if_convertible_stmt_p (struct loop *loop, basic_block bb, gimple stmt)
       return true;
 
     case GIMPLE_ASSIGN:
-      return if_convertible_gimple_assign_stmt_p (loop, bb, stmt);
+      return if_convertible_gimple_assign_stmt_p (stmt);
 
     default:
       /* Don't know what to do with 'em so don't do anything.  */
@@ -878,7 +903,7 @@ if_convertible_loop_p (struct loop *loop)
 	continue;
 
       for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
-	if (!if_convertible_stmt_p (loop, bb, gsi_stmt (itr)))
+	if (!if_convertible_stmt_p (gsi_stmt (itr)))
 	  return false;
     }
 
@@ -898,7 +923,7 @@ if_convertible_loop_p (struct loop *loop)
 static basic_block
 find_phi_replacement_condition (struct loop *loop,
 				basic_block bb, tree *cond,
-                                gimple_stmt_iterator *gsi)
+				gimple_stmt_iterator *gsi)
 {
   edge first_edge, second_edge;
   tree tmp_cond;
@@ -970,21 +995,16 @@ find_phi_replacement_condition (struct loop *loop,
 				    false, NULL_TREE,
 				    true, GSI_SAME_STMT);
   if (!is_gimple_reg (*cond) && !is_gimple_condexpr (*cond))
-    {
-      gimple new_stmt;
-
-      new_stmt = ifc_temp_var (TREE_TYPE (*cond), unshare_expr (*cond));
-      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
-      *cond = gimple_assign_lhs (new_stmt);
-    }
+    *cond = ifc_temp_var (TREE_TYPE (*cond), unshare_expr (*cond), gsi);
 
   gcc_assert (*cond);
 
   return first_edge->src;
 }
 
-/* Replace PHI node with conditional modify expr using COND.  This
-   routine does not handle PHI nodes with more than two arguments.
+/* Replace a scalar PHI node with a COND_EXPR using COND as condition.
+   This routine does not handle PHI nodes with more than two
+   arguments.
 
    For example,
      S1: A = PHI <x1(1), x2(5)
@@ -996,18 +1016,22 @@ find_phi_replacement_condition (struct loop *loop,
    TRUE_BB is selected.  */
 
 static void
-replace_phi_with_cond_gimple_assign_stmt (gimple phi, tree cond,
-    					  basic_block true_bb,
-                                   	  gimple_stmt_iterator *gsi)
+predicate_scalar_phi (gimple phi, tree cond,
+		      basic_block true_bb,
+		      gimple_stmt_iterator *gsi)
 {
   gimple new_stmt;
   basic_block bb;
-  tree rhs;
-  tree arg;
+  tree rhs, res, arg;
 
   gcc_assert (gimple_code (phi) == GIMPLE_PHI
 	      && gimple_phi_num_args (phi) == 2);
 
+  res = gimple_phi_result (phi);
+  /* Do not handle virtual phi nodes.  */
+  if (!is_gimple_reg (SSA_NAME_VAR (res)))
+    return;
+
   bb = gimple_bb (phi);
 
   arg = degenerate_phi_result (phi);
@@ -1029,11 +1053,11 @@ replace_phi_with_cond_gimple_assign_stmt (gimple phi, tree cond,
 	}
 
       /* Build new RHS using selected condition and arguments.  */
-      rhs = build3 (COND_EXPR, TREE_TYPE (PHI_RESULT (phi)),
+      rhs = build3 (COND_EXPR, TREE_TYPE (res),
 		    unshare_expr (cond), arg_0, arg_1);
     }
 
-  new_stmt = gimple_build_assign (PHI_RESULT (phi), rhs);
+  new_stmt = gimple_build_assign (res, rhs);
   SSA_NAME_DEF_STMT (gimple_phi_result (phi)) = new_stmt;
   gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
   update_stmt (new_stmt);
@@ -1045,11 +1069,11 @@ replace_phi_with_cond_gimple_assign_stmt (gimple phi, tree cond,
     }
 }
 
-/* Replaces in LOOP all the phi nodes other than those in the
+/* Replaces in LOOP all the scalar phi nodes other than those in the
    LOOP->header block with conditional modify expressions.  */
 
 static void
-ifconvert_phi_nodes (struct loop *loop)
+predicate_all_scalar_phis (struct loop *loop)
 {
   basic_block bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
@@ -1078,7 +1102,7 @@ ifconvert_phi_nodes (struct loop *loop)
       while (!gsi_end_p (phi_gsi))
 	{
 	  phi = gsi_stmt (phi_gsi);
-	  replace_phi_with_cond_gimple_assign_stmt (phi, cond, true_bb, &gsi);
+	  predicate_scalar_phi (phi, cond, true_bb, &gsi);
 	  release_phi_node (phi);
 	  gsi_next (&phi_gsi);
 	}
@@ -1098,7 +1122,7 @@ insert_gimplified_predicates (loop_p loop)
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
-      gimple_seq stmts = bb_predicate_gimplified_stmts (bb);
+      gimple_seq stmts;
 
       if (!is_predicated (bb))
 	{
@@ -1109,15 +1133,30 @@ insert_gimplified_predicates (loop_p loop)
 	  continue;
 	}
 
+      stmts = bb_predicate_gimplified_stmts (bb);
       if (stmts)
 	{
-	  gimple_stmt_iterator gsi = gsi_last_bb (bb);
-
-	  if (gsi_end_p (gsi)
-	      || gimple_code (gsi_stmt (gsi)) == GIMPLE_COND)
-	    gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
+	  if (flag_tree_loop_if_convert_memory_writes)
+	    {
+	      /* Insert the predicate of the BB just after the label,
+		 as the if-conversion of memory writes will use this
+		 predicate.  */
+	      gimple_stmt_iterator gsi = gsi_after_labels (bb);
+	      gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
+	    }
 	  else
-	    gsi_insert_seq_after (&gsi, stmts, GSI_SAME_STMT);
+	    {
+	      /* Insert the predicate of the BB at the end of the BB
+		 as this would reduce the register pressure: the only
+		 use of this predicate will be in successor BBs.  */
+	      gimple_stmt_iterator gsi = gsi_last_bb (bb);
+
+	      if (gsi_end_p (gsi)
+		  || stmt_ends_bb_p (gsi_stmt (gsi)))
+		gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
+	      else
+		gsi_insert_seq_after (&gsi, stmts, GSI_SAME_STMT);
+	    }
 
 	  /* Once the sequence is code generated, set it to NULL.  */
 	  set_bb_predicate_gimplified_stmts (bb, NULL);
@@ -1125,6 +1164,146 @@ insert_gimplified_predicates (loop_p loop)
     }
 }
 
+/* Predicate each write to memory in LOOP.
+
+   This function transforms control flow constructs containing memory
+   writes of the form:
+
+   | for (i = 0; i < N; i++)
+   |   if (cond)
+   |     A[i] = expr;
+
+   into the following form that does not contain control flow:
+
+   | for (i = 0; i < N; i++)
+   |   A[i] = cond ? expr : A[i];
+
+   The original CFG looks like this:
+
+   | bb_0
+   |   i = 0
+   | end_bb_0
+   |
+   | bb_1
+   |   if (i < N) goto bb_5 else goto bb_2
+   | end_bb_1
+   |
+   | bb_2
+   |   cond = some_computation;
+   |   if (cond) goto bb_3 else goto bb_4
+   | end_bb_2
+   |
+   | bb_3
+   |   A[i] = expr;
+   |   goto bb_4
+   | end_bb_3
+   |
+   | bb_4
+   |   goto bb_1
+   | end_bb_4
+
+   insert_gimplified_predicates inserts the computation of the COND
+   expression at the beginning of the destination basic block:
+
+   | bb_0
+   |   i = 0
+   | end_bb_0
+   |
+   | bb_1
+   |   if (i < N) goto bb_5 else goto bb_2
+   | end_bb_1
+   |
+   | bb_2
+   |   cond = some_computation;
+   |   if (cond) goto bb_3 else goto bb_4
+   | end_bb_2
+   |
+   | bb_3
+   |   cond = some_computation;
+   |   A[i] = expr;
+   |   goto bb_4
+   | end_bb_3
+   |
+   | bb_4
+   |   goto bb_1
+   | end_bb_4
+
+   predicate_mem_writes is then predicating the memory write as follows:
+
+   | bb_0
+   |   i = 0
+   | end_bb_0
+   |
+   | bb_1
+   |   if (i < N) goto bb_5 else goto bb_2
+   | end_bb_1
+   |
+   | bb_2
+   |   if (cond) goto bb_3 else goto bb_4
+   | end_bb_2
+   |
+   | bb_3
+   |   cond = some_computation;
+   |   A[i] = cond ? expr : A[i];
+   |   goto bb_4
+   | end_bb_3
+   |
+   | bb_4
+   |   goto bb_1
+   | end_bb_4
+
+   and finally combine_blocks removes the basic block boundaries making
+   the loop vectorizable:
+
+   | bb_0
+   |   i = 0
+   |   if (i < N) goto bb_5 else goto bb_1
+   | end_bb_0
+   |
+   | bb_1
+   |   cond = some_computation;
+   |   A[i] = cond ? expr : A[i];
+   |   if (i < N) goto bb_5 else goto bb_4
+   | end_bb_1
+   |
+   | bb_4
+   |   goto bb_1
+   | end_bb_4
+*/
+
+static void
+predicate_mem_writes (loop_p loop)
+{
+  unsigned int i, orig_loop_num_nodes = loop->num_nodes;
+
+  for (i = 1; i < orig_loop_num_nodes; i++)
+    {
+      gimple_stmt_iterator gsi;
+      basic_block bb = ifc_bbs[i];
+      tree cond = bb_predicate (bb);
+      gimple stmt;
+
+      if (is_true_predicate (cond))
+	continue;
+
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	if ((stmt = gsi_stmt (gsi))
+	    && gimple_assign_single_p (stmt)
+	    && gimple_vdef (stmt))
+	  {
+	    tree lhs = gimple_assign_lhs (stmt);
+	    tree rhs = gimple_assign_rhs1 (stmt);
+	    tree type = TREE_TYPE (lhs);
+
+	    lhs = ifc_temp_var (type, unshare_expr (lhs), &gsi);
+	    rhs = ifc_temp_var (type, unshare_expr (rhs), &gsi);
+	    rhs = build3 (COND_EXPR, type, unshare_expr (cond), rhs, lhs);
+	    gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
+	    update_stmt (stmt);
+	  }
+    }
+}
+
 /* Remove all GIMPLE_CONDs and GIMPLE_LABELs of all the basic blocks
    other than the exit and latch of the LOOP.  Also resets the
    GIMPLE_DEBUG information.  */
@@ -1181,7 +1360,10 @@ combine_blocks (struct loop *loop)
 
   remove_conditions_and_labels (loop);
   insert_gimplified_predicates (loop);
-  ifconvert_phi_nodes (loop);
+  predicate_all_scalar_phis (loop);
+
+  if (flag_tree_loop_if_convert_memory_writes)
+    predicate_mem_writes (loop);
 
   /* Merge basic blocks: first remove all the edges in the loop,
      except for those from the exit block.  */
@@ -1283,6 +1465,10 @@ tree_if_conversion (struct loop *loop)
      blocks into one huge basic block doing the if-conversion
      on-the-fly.  */
   combine_blocks (loop);
+
+  if (flag_tree_loop_if_convert_memory_writes)
+    mark_sym_for_renaming (gimple_vop (cfun));
+
   changed = true;
 
  cleanup:
@@ -1308,6 +1494,7 @@ main_tree_if_conversion (void)
   loop_iterator li;
   struct loop *loop;
   bool changed = false;
+  unsigned todo = 0;
 
   if (number_of_loops () <= 1)
     return 0;
@@ -1315,7 +1502,13 @@ main_tree_if_conversion (void)
   FOR_EACH_LOOP (li, loop, 0)
     changed |= tree_if_conversion (loop);
 
-  return changed ? TODO_cleanup_cfg : 0;
+  if (changed)
+    todo |= TODO_cleanup_cfg;
+
+  if (changed && flag_tree_loop_if_convert_memory_writes)
+    todo |= TODO_update_ssa_only_virtuals;
+
+  return todo;
 }
 
 /* Returns true when the if-conversion pass is enabled.  */
@@ -1324,7 +1517,8 @@ static bool
 gate_tree_if_conversion (void)
 {
   return ((flag_tree_vectorize && flag_tree_loop_if_convert != 0)
-	  || flag_tree_loop_if_convert == 1);
+	  || flag_tree_loop_if_convert == 1
+	  || flag_tree_loop_if_convert_memory_writes == 1);
 }
 
 struct gimple_opt_pass pass_if_conversion =
-- 
1.7.0.4

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] Do not check whether memory references accessed in every iteration trap.
  2010-08-18 19:05           ` Richard Guenther
                               ` (2 preceding siblings ...)
  2010-08-18 19:31             ` [PATCH 1/3] Add flag -ftree-loop-if-convert-memory-writes Sebastian Pop
@ 2010-08-18 19:32             ` Sebastian Pop
  2010-08-24 10:26               ` Richard Guenther
  3 siblings, 1 reply; 32+ messages in thread
From: Sebastian Pop @ 2010-08-18 19:32 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, matz

>> I will prepare the combined patch if you think that this is a better
>> form.
>
> Yes, I like that better.

I sent out the new combined patches.
Ok for trunk after regstrap?

Thanks,
Sebastian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] Do not check whether memory references accessed in every iteration trap.
  2010-08-18 19:32             ` [PATCH 3/4] Do not check whether memory references accessed in every iteration trap Sebastian Pop
@ 2010-08-24 10:26               ` Richard Guenther
  2010-08-24 15:47                 ` Sebastian Pop
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Guenther @ 2010-08-24 10:26 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: gcc-patches, matz

On Wed, 18 Aug 2010, Sebastian Pop wrote:

> >> I will prepare the combined patch if you think that this is a better
> >> form.
> >
> > Yes, I like that better.
> 
> I sent out the new combined patches.
> Ok for trunk after regstrap?

Now I just noticed that -ftree-loop-if-convert-memory-writes is long
for -ftree-loop-if-convert-stores.  But I suppose it doesn't really
matter.  The patches are ok (with or without changing the option
name - whatever you prefer).

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] Do not check whether memory references accessed in every iteration trap.
  2010-08-24 10:26               ` Richard Guenther
@ 2010-08-24 15:47                 ` Sebastian Pop
  2010-08-24 16:31                   ` Sebastian Pop
  2010-09-20  7:44                   ` Gerald Pfeifer
  0 siblings, 2 replies; 32+ messages in thread
From: Sebastian Pop @ 2010-08-24 15:47 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, matz

On Tue, Aug 24, 2010 at 05:00, Richard Guenther <rguenther@suse.de> wrote:
> Now I just noticed that -ftree-loop-if-convert-memory-writes is long
> for -ftree-loop-if-convert-stores.  But I suppose it doesn't really

This sounds much better, thanks for the suggestion.

> matter.  The patches are ok (with or without changing the option
> name - whatever you prefer).

I will change the flag name, and do another round of regstrap
and if everything is alright, I will commit to trunk.

Thanks,
Sebastian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] Do not check whether memory references accessed in every iteration trap.
  2010-08-24 15:47                 ` Sebastian Pop
@ 2010-08-24 16:31                   ` Sebastian Pop
  2010-09-20  7:44                   ` Gerald Pfeifer
  1 sibling, 0 replies; 32+ messages in thread
From: Sebastian Pop @ 2010-08-24 16:31 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, matz

[-- Attachment #1: Type: text/plain, Size: 422 bytes --]

On Tue, Aug 24, 2010 at 10:11, Sebastian Pop <sebpop@gmail.com> wrote:
> On Tue, Aug 24, 2010 at 05:00, Richard Guenther <rguenther@suse.de> wrote:
>> Now I just noticed that -ftree-loop-if-convert-memory-writes is long
>> for -ftree-loop-if-convert-stores.  But I suppose it doesn't really
>
> This sounds much better, thanks for the suggestion.

Here are the changes on top of the previous patches.

Sebastian

[-- Attachment #2: 0001-Rename-flag-ftree-loop-if-convert-stores.patch --]
[-- Type: text/x-diff, Size: 5066 bytes --]

From b72f67fe79aa1eca277c150a99c7d2e8210dd601 Mon Sep 17 00:00:00 2001
From: Sebastian Pop <sebpop@gmail.com>
Date: Tue, 24 Aug 2010 11:12:29 -0500
Subject: [PATCH] Rename flag -ftree-loop-if-convert-stores.

---
 gcc/common.opt      |    4 ++--
 gcc/doc/invoke.texi |    2 +-
 gcc/tree-if-conv.c  |   22 +++++++++++-----------
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index bffc39b..675ca35 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -712,8 +712,8 @@ ftree-loop-if-convert
 Common Report Var(flag_tree_loop_if_convert) Init(-1) Optimization
 Convert conditional jumps in innermost loops to branchless equivalents
 
-ftree-loop-if-convert-memory-writes
-Common Report Var(flag_tree_loop_if_convert_memory_writes) Optimization
+ftree-loop-if-convert-stores
+Common Report Var(flag_tree_loop_if_convert_stores) Optimization
 Also if-convert conditional jumps containing memory writes
 
 ; -finhibit-size-directive inhibits output of .size for ELF.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 23559ba..7e069f0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6932,7 +6932,7 @@ the innermost loops in order to improve the ability of the
 vectorization pass to handle these loops.  This is enabled by default
 if vectorization is enabled.
 
-@item -ftree-loop-if-convert-memory-writes
+@item -ftree-loop-if-convert-stores
 Attempt to also if-convert conditional jumps containing memory writes.
 This transformation can be unsafe for multi-threaded programs as it
 transforms conditional memory writes into unconditional memory writes.
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index cc06522..86b8f26 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -393,7 +393,7 @@ bb_with_exit_edge_p (struct loop *loop, basic_block bb)
    PHI is not if-convertible if:
    - it has more than 2 arguments.
 
-   When the flag_tree_loop_if_convert_memory_writes is not set, PHI is not
+   When the flag_tree_loop_if_convert_stores is not set, PHI is not
    if-convertible if:
    - a virtual PHI is immediately used in another PHI node,
    - there is a virtual PHI in a BB other than the loop->header.  */
@@ -414,10 +414,10 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi)
       return false;
     }
 
-  if (flag_tree_loop_if_convert_memory_writes)
+  if (flag_tree_loop_if_convert_stores)
     return true;
 
-  /* When the flag_tree_loop_if_convert_memory_writes is not set, check
+  /* When the flag_tree_loop_if_convert_stores is not set, check
      that there are no memory writes in the branches of the loop to be
      if-converted.  */
   if (!is_gimple_reg (SSA_NAME_VAR (gimple_phi_result (phi))))
@@ -648,7 +648,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
       return false;
     }
 
-  if (flag_tree_loop_if_convert_memory_writes)
+  if (flag_tree_loop_if_convert_stores)
     {
       if (ifcvt_could_trap_p (stmt, refs))
 	{
@@ -1013,7 +1013,7 @@ if_convertible_loop_p_1 (struct loop *loop,
   if (!res)
     return false;
 
-  if (flag_tree_loop_if_convert_memory_writes)
+  if (flag_tree_loop_if_convert_stores)
     {
       data_reference_p dr;
 
@@ -1098,7 +1098,7 @@ if_convertible_loop_p (struct loop *loop)
   ddrs = VEC_alloc (ddr_p, heap, 25);
   res = if_convertible_loop_p_1 (loop, &refs, &ddrs);
 
-  if (flag_tree_loop_if_convert_memory_writes)
+  if (flag_tree_loop_if_convert_stores)
     {
       data_reference_p dr;
       unsigned int i;
@@ -1335,7 +1335,7 @@ insert_gimplified_predicates (loop_p loop)
       stmts = bb_predicate_gimplified_stmts (bb);
       if (stmts)
 	{
-	  if (flag_tree_loop_if_convert_memory_writes)
+	  if (flag_tree_loop_if_convert_stores)
 	    {
 	      /* Insert the predicate of the BB just after the label,
 		 as the if-conversion of memory writes will use this
@@ -1561,7 +1561,7 @@ combine_blocks (struct loop *loop)
   insert_gimplified_predicates (loop);
   predicate_all_scalar_phis (loop);
 
-  if (flag_tree_loop_if_convert_memory_writes)
+  if (flag_tree_loop_if_convert_stores)
     predicate_mem_writes (loop);
 
   /* Merge basic blocks: first remove all the edges in the loop,
@@ -1665,7 +1665,7 @@ tree_if_conversion (struct loop *loop)
      on-the-fly.  */
   combine_blocks (loop);
 
-  if (flag_tree_loop_if_convert_memory_writes)
+  if (flag_tree_loop_if_convert_stores)
     mark_sym_for_renaming (gimple_vop (cfun));
 
   changed = true;
@@ -1704,7 +1704,7 @@ main_tree_if_conversion (void)
   if (changed)
     todo |= TODO_cleanup_cfg;
 
-  if (changed && flag_tree_loop_if_convert_memory_writes)
+  if (changed && flag_tree_loop_if_convert_stores)
     todo |= TODO_update_ssa_only_virtuals;
 
   return todo;
@@ -1717,7 +1717,7 @@ gate_tree_if_conversion (void)
 {
   return ((flag_tree_vectorize && flag_tree_loop_if_convert != 0)
 	  || flag_tree_loop_if_convert == 1
-	  || flag_tree_loop_if_convert_memory_writes == 1);
+	  || flag_tree_loop_if_convert_stores == 1);
 }
 
 struct gimple_opt_pass pass_if_conversion =
-- 
1.7.0.4


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/4] Do not check whether memory references accessed in every iteration trap.
  2010-08-24 15:47                 ` Sebastian Pop
  2010-08-24 16:31                   ` Sebastian Pop
@ 2010-09-20  7:44                   ` Gerald Pfeifer
  1 sibling, 0 replies; 32+ messages in thread
From: Gerald Pfeifer @ 2010-09-20  7:44 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Guenther, gcc-patches, matz

On Tue, 24 Aug 2010, Sebastian Pop wrote:
> I will change the flag name, and do another round of regstrap
> and if everything is alright, I will commit to trunk.

Mind also adding a note to gcc-4.6/changes.html ?

Thanks,
Gerald

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2010-09-19 22:47 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-08 21:41 [PATCH 0/4] if-conversion of loops with conditionals containing memory writes Sebastian Pop
2010-07-08 21:41 ` [PATCH 2/4] Outline fold_or_predicates from add_to_predicate_list Sebastian Pop
2010-07-09 12:13   ` Richard Guenther
2010-07-09 16:43     ` Sebastian Pop
2010-07-09 17:04       ` Sebastian Pop
2010-07-09 18:25         ` Richard Guenther
2010-07-09 18:59           ` Sebastian Pop
2010-07-08 21:42 ` [PATCH 1/4] Add flag -ftree-loop-if-convert-memory-writes Sebastian Pop
2010-08-13  8:58   ` Richard Guenther
2010-08-13 20:43     ` Sebastian Pop
2010-08-16  9:46       ` Richard Guenther
2010-08-13 22:12     ` Sebastian Pop
2010-08-16  9:46       ` Richard Guenther
2010-08-13 22:22     ` Sebastian Pop
2010-08-16  9:53       ` Richard Guenther
2010-07-08 21:42 ` [PATCH 3/4] Do not check whether memory references accessed in every iteration trap Sebastian Pop
2010-08-13  9:41   ` Richard Guenther
2010-08-17 17:18     ` Sebastian Pop
2010-08-18  9:35       ` Richard Guenther
2010-08-18 17:10         ` Sebastian Pop
2010-08-18 19:05           ` Richard Guenther
2010-08-18 19:29             ` [PATCH 3/3] Speed-up ifcvt_memrefs_wont_trap caching previous results Sebastian Pop
2010-08-18 19:29             ` [PATCH 2/3] Do not check whether memory references accessed in every iteration trap Sebastian Pop
2010-08-18 19:31             ` [PATCH 1/3] Add flag -ftree-loop-if-convert-memory-writes Sebastian Pop
2010-08-18 19:32             ` [PATCH 3/4] Do not check whether memory references accessed in every iteration trap Sebastian Pop
2010-08-24 10:26               ` Richard Guenther
2010-08-24 15:47                 ` Sebastian Pop
2010-08-24 16:31                   ` Sebastian Pop
2010-09-20  7:44                   ` Gerald Pfeifer
2010-07-08 21:42 ` [PATCH 4/4] Speed-up ifcvt_memrefs_wont_trap caching previous results Sebastian Pop
2010-08-13  9:02   ` Richard Guenther
2010-08-12 16:19 ` [PATCH 0/4] if-conversion of loops with conditionals containing memory writes Sebastian Pop

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).