public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Extended if-conversion for loops marked with pragma omp simd.
@ 2014-06-25 14:06 Yuri Rumyantsev
  2014-07-14 10:16 ` Yuri Rumyantsev
  2014-08-01  9:40 ` Richard Biener
  0 siblings, 2 replies; 9+ messages in thread
From: Yuri Rumyantsev @ 2014-06-25 14:06 UTC (permalink / raw)
  To: gcc-patches, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 6678 bytes --]

Hi All,

We implemented additional support for pragma omp simd in part of
extended if-conversion loops with such pragma. These extensions
include:

1. All extensions are performed only if considered loop or its outer
   loop was marked with pragma omp simd (force_vectorize); For ordinary
   loops behavior was not changed.
2. Took off cfg restriction on basic block which can have more than 2
   predecessors.
3. Put additional restriction on phi nodes which was missed in current design:
   all phi nodes must be in non-predicated basic block to conform
   semantic of COND_EXPR which is used for transformation.
4. Extend predication of phi nodes: phi may have more than 2 arguments
with some limitations:
   - for phi nodes which have more than 2 arguments, but only two
   arguments are different and one of them has the only occurence,
transformation to  single COND_EXPR can be done.
   - if phi node has more different arguments and all edge predicates
   correspondent to phi-arguments are disjoint, a chain of COND_EXPR
   will be generated for it. In current design very simple check is used:
   check starting from end that two edges correspondent to neighbor
arguments have common predecessor which is used for further check
with next edge.
 These guarantee that phi predication will produce the correct result.

Here is example of such extended predication (compile with -march=core-avx2):
#pragma omp simd safelen(8)
  for (i=0; i<512; i++)
  {
    float t = a[i];
    if (t > 0 & t < 1.0e+17f)
      if (c[i] != 0)
res += 1;
  }
  <bb 4>:
  # res_15 = PHI <res_1(5), 0(3)>
  # i_16 = PHI <i_11(5), 0(3)>
  # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
  t_5 = a[i_16];
  _6 = t_5 > 0.0;
  _7 = t_5 < 9.9999998430674944e+16;
  _8 = _7 & _6;
  _ifc__28 = (unsigned int) _8;
  _10 = &c[i_16];
  _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
  _9 = MASK_LOAD (_10, 0B, _ifc__36);
  _ifc__29 = _ifc__28 != 0 ? 1 : 0;
  _ifc__30 = (int) _ifc__29;
  _ifc__31 = _9 != 0 ? _ifc__30 : 0;
  _ifc__32 = _ifc__28 != 0 ? 1 : 0;
  _ifc__33 = (int) _ifc__32;
  _ifc__34 = _9 == 0 ? _ifc__33 : 0;
  _ifc__35 = _ifc__31 != 0 ? 1 : 0;
  res_1 = res_15 + _ifc__35;
  i_11 = i_16 + 1;
  ivtmp_14 = ivtmp_17 - 1;
  if (ivtmp_14 != 0)
    goto <bb 4>;

Bootstrap and regression testing did not show any new failures.

gcc/ChageLog

2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>

* tree-if-conv.c (flag_force_vectorize): New variable.
(struct bb_predicate_s): Add negate_predicate field.
(bb_negate_predicate): New function.
(set_bb_negate_predicate): New function.
(bb_copy_predicate): New function.
(add_stmt_to_bb_predicate_gimplified_stmts): New function.
(init_bb_predicate): Add initialization of negate_predicate field.
(reset_bb_predicate): Reset negate_predicate to NULL_TREE.
(convert_name_to_cmp): New function.
(get_type_for_cond): New function.
(convert_bool_predicate): New function.
(predicate_disjunction): New function.
(predicate_conjunction): New function.
(add_to_predicate_list): Add convert_bool argument.
Add call of predicate_disjunction if convert_bool argument is true.
(add_to_dst_predicate_list): Add convert_bool argument.
Add early function exit if edge target block is always executed.
Add call of predicate_conjunction if convert_bool argument is true.
Pass convert_bool argument for add_to_predicate_list.
(equal_phi_args): New function.
(phi_has_two_different_args): New function.
(phi_args_disjoint): New function.
(if_convertible_phi_p): Accept phi nodes with more than two args
for loops marked with pragma omp simd. Add check that phi nodes are
in non-predicated basic blocks.
(ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
(all_edges_are_critical): New function.
(if_convertible_bb_p): Allow bb has more than two predecessors if
flag_force_vectorize was setup. Use call of all_edges_are_critical
to reject block if-conversion with imcoming critical edges only if
flag_force_vectorize was not setup.
(walk_cond_tree): New function.
(vect_bool_pattern_is_applicable): New function.
(predicate_bbs): Add convert_bool argument that is used to transform
comparison expressions of boolean type into conditional expressions
with integral operands. If bool_conv argument is false or both
outgoing edges are not critical old algorithm of predicate assignments
is used, otherwise the following code was added: check on applicable
of vect-bool-pattern recognition and trnasformation of
(bool) x != 0  --> y = (int) x; x != 0;
compute predicates for both outgoing edges one of which is critical
one using 'normal' edge, i.e. compute true and false predicates using
normal outgoing edge only; evaluated predicates are stored in
predicate and negate_predicate fields of struct bb_predicate_s and
negate_predicate of normal edge conatins predicate of critical edge,
but generated gimplified statements are stored in their destination
block fields. Additional argument 'convert_bool" is passed to
add_to_dst_predicate_list and add_to_predicate_list.
(if_convertible_loop_p_1): Call predicate_bbs with additional argument
equal to false.
(find_phi_replacement_condition): Extend function interface:
it returns NULL if given phi node must be handled by means of
extended phi node predication. If number of predecessors of phi-block
is equal 2 and atleast one incoming edge is not critical original
algorithm is used.
(is_cond_scalar_reduction): Add 'extended' argument which signals that
both phi arguments must be evaluated through phi_has_two_different_args.
(predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
(get_predicate_for_edge): New function.
(find_insertion_point): New function.
(predicate_phi_disjoint_args): New function.
(predicate_extended_scalar_phi): New function.
(predicate_all_scalar_phis): Add code to set-up gimple statement
iterator for predication of extended scalar phi's for insertion.
(insert_gimplified_predicates): Add test for non-predicated basic
blocks that there are no gimplified statements to insert. Insert
predicates at the block begining for extended if-conversion.
(predicate_mem_writes): Invoke convert_name_to_cmp for extended
predication to build mask.
(combine_blocks): Pass flag_force_vectorize to predicate_bbs.
(split_crit_edge): New function.
(tree_if_conversion): Initialize flag_force_vectorize from current
loop or outer loop (to support pragma omp declare). Invoke
split_crit_edge for extended predication. Do loop versioning for
innermost loop marked with pragma omp simd.

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 47517 bytes --]

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
old mode 100644
new mode 100755
index 36a879d..cb73468
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -116,10 +116,14 @@ along with GCC; see the file COPYING3.  If not see
 #include "dbgcnt.h"
 #include "expr.h"
 #include "optabs.h"
+#include "cgraph.h"
 
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
 
+/* Copy of 'force_vectorize' field of loop.  */
+static bool flag_force_vectorize;
+
 /* Structure used to predicate basic blocks.  This is attached to the
    ->aux field of the BBs in the loop to be if-converted.  */
 typedef struct bb_predicate_s {
@@ -127,6 +131,11 @@ typedef struct bb_predicate_s {
   /* The condition under which this basic block is executed.  */
   tree predicate;
 
+  /* The condition under which another successor of basic block
+     ending with GIMPLE_COND stmt is executed and it lies on
+     critical edge - is used only for loops marked with simd pragma.  */
+  tree negate_predicate;
+
   /* PREDICATE is gimplified, and the sequence of statements is
      recorded here, in order to avoid the duplication of computations
      that occur in previous conditions.  See PR44483.  */
@@ -149,6 +158,14 @@ bb_predicate (basic_block bb)
   return ((bb_predicate_p) bb->aux)->predicate;
 }
 
+/* Returns the gimplified negate predicate for basic block.  */
+
+static inline tree
+bb_negate_predicate (basic_block bb)
+{
+  return ((bb_predicate_p) bb->aux)->negate_predicate;
+}
+
 /* Sets the gimplified predicate COND for basic block BB.  */
 
 static inline void
@@ -160,6 +177,22 @@ set_bb_predicate (basic_block bb, tree cond)
   ((bb_predicate_p) bb->aux)->predicate = cond;
 }
 
+/* Sets the gimplified negate predicate COND for basic block.  */
+
+static inline void
+set_bb_negate_predicate (basic_block bb, tree cond)
+{
+  ((bb_predicate_p) bb->aux)->negate_predicate = cond;
+}
+
+/* Copy negate predicate after its evaluation.  */
+
+static inline void
+bb_copy_predicate (basic_block bb)
+{
+  ((bb_predicate_p) bb->aux)->negate_predicate = bb_predicate (bb);
+}
+
 /* Returns the sequence of statements of the gimplification of the
    predicate for basic block BB.  */
 
@@ -188,7 +221,18 @@ add_bb_predicate_gimplified_stmts (basic_block bb, gimple_seq stmts)
     (&(((bb_predicate_p) bb->aux)->predicate_gimplified_stmts), stmts);
 }
 
-/* Initializes to TRUE the predicate of basic block BB.  */
+/* Adds statement STMT to the sequence of statements
+   of the predicate for basic block BB.  */
+
+static inline void
+add_stmt_to_bb_predicate_gimplified_stmts (basic_block bb, gimple stmt)
+{
+  gimple_seq_add_stmt
+    (&(((bb_predicate_p) bb->aux)->predicate_gimplified_stmts), stmt);
+}
+
+/* Initializes to TRUE the predicate of basic block BB. Negate predicate
+   is initialized to NULL_TREE.  */
 
 static inline void
 init_bb_predicate (basic_block bb)
@@ -196,6 +240,7 @@ init_bb_predicate (basic_block bb)
   bb->aux = XNEW (struct bb_predicate_s);
   set_bb_predicate_gimplified_stmts (bb, NULL);
   set_bb_predicate (bb, boolean_true_node);
+  set_bb_negate_predicate (bb, NULL_TREE);
 }
 
 /* Release the SSA_NAMEs associated with the predicate of basic block BB,
@@ -239,6 +284,7 @@ reset_bb_predicate (basic_block bb)
     {
       release_bb_predicate (bb);
       set_bb_predicate (bb, boolean_true_node);
+      set_bb_negate_predicate (bb, NULL_TREE);
     }
 }
 
@@ -395,11 +441,195 @@ fold_build_cond_expr (tree type, tree cond, tree rhs, tree lhs)
   return build3 (COND_EXPR, type, cond, rhs, lhs);
 }
 
+/* Build <name> != 0 expression when COND is SSA_NAME of int type.  */
+
+static inline tree
+convert_name_to_cmp (tree cond)
+{
+  if (TREE_CODE (cond) != SSA_NAME)
+    return cond;
+  return build2 (NE_EXPR, boolean_type_node, cond,
+		 build_int_cst (TREE_TYPE (cond), 0));
+}
+
+/* Return integral type correspondent to types of condition COND.  */
+
+static inline tree
+get_type_for_cond (tree cond)
+{
+  tree opnd;
+  enum machine_mode mode;
+
+  gcc_assert (TREE_CODE_CLASS (TREE_CODE (cond)) == tcc_comparison);
+  opnd = TREE_OPERAND (cond, 0);
+  if (TREE_CODE (opnd) != SSA_NAME)
+    opnd = TREE_OPERAND (cond, 1);
+  if (TREE_CODE (TREE_TYPE (opnd)) == INTEGER_TYPE)
+    return TREE_TYPE (opnd);
+  mode = TYPE_MODE (TREE_TYPE (opnd));
+  return build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), 1);
+}
+
+/* Converts bool predicate COND to cond_expr:
+   cond1 = (cond)? 1: 0,  if OP is NULL_TREE, or
+   cond1 = (cond)? op : 0 otherwise.
+   Returns lhs of created assignment.  */
+
+static tree
+convert_bool_predicate (tree cond, basic_block bb, tree op)
+{
+  gimple stmt;
+  tree lhs;
+  tree itype;
+
+  if (TREE_CODE (TREE_TYPE (cond)) != BOOLEAN_TYPE)
+    /* Predicate has been promoted to int.  */
+    return cond;
+  if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+    {
+      tree c1 = TREE_OPERAND (cond, 0);
+
+      if (TREE_CODE (c1) == SSA_NAME)
+	cond = build2 (EQ_EXPR, boolean_type_node, c1,
+		       build_int_cst (boolean_type_node, 0));
+      else
+	{
+	  tree type = TREE_TYPE (TREE_OPERAND (c1, 0));
+          enum tree_code code;
+	  gcc_assert (TREE_CODE_CLASS (TREE_CODE (c1)) == tcc_comparison);
+	  code = invert_tree_comparison (TREE_CODE (c1),
+					 HONOR_NANS (TYPE_MODE (type)));
+	  cond = build2 (code, boolean_type_node, TREE_OPERAND (c1, 0),
+			 TREE_OPERAND (c1, 1));
+	}
+    }
+  else
+    cond = convert_name_to_cmp (cond);
+
+  itype = get_type_for_cond (cond);
+  if (op != NULL_TREE && !types_compatible_p (itype, TREE_TYPE (op)))
+    {
+      tree  new_temp = make_temp_ssa_name (itype, NULL, "_ifc_");
+      stmt = gimple_build_assign_with_ops (NOP_EXPR, new_temp, op, NULL_TREE);
+      add_stmt_to_bb_predicate_gimplified_stmts (bb, stmt);
+      op = new_temp;
+    }
+  stmt = gimple_build_assign_with_ops
+	   (COND_EXPR,
+	    (lhs = make_temp_ssa_name (itype, NULL, "_ifc_")),
+	    cond,
+	    op == NULL_TREE ? build_one_cst (itype) : op,
+	    build_zero_cst (itype));
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Convert bool predicate: new stmt is created\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+    }
+  add_stmt_to_bb_predicate_gimplified_stmts (bb, stmt);
+  return lhs;
+}
+
+/* Creates new BB predicate PRD = PRD1 | PRD2, where PRD1 is old BB predicate
+   converted to int type and PRD2 is NC converted to int.  */
+
+static void
+predicate_disjunction (basic_block bb, tree nc)
+{
+  tree p1, p2;
+  gimple stmt;
+  tree lhs;
+  tree itype;
+
+  gcc_assert (flag_force_vectorize);
+  p1 = convert_bool_predicate (bb_predicate (bb), bb, NULL_TREE);
+  p2 = convert_bool_predicate (nc, bb, NULL_TREE);
+  if (!types_compatible_p (TREE_TYPE (p1), TREE_TYPE (p2)))
+    {
+      if (TYPE_PRECISION (TREE_TYPE (p1)) < TYPE_PRECISION (TREE_TYPE (p2)))
+	{
+	  itype = TREE_TYPE (p1);
+	  tree tmp = make_temp_ssa_name (itype, NULL, "_ifc_");
+	  stmt = gimple_build_assign_with_ops (NOP_EXPR, tmp, p2, NULL_TREE);
+	  p2 = tmp;
+	}
+      else
+	{
+	  itype = TREE_TYPE (p2);
+	  tree tmp = make_temp_ssa_name (itype, NULL, "_ifc_");
+          stmt = gimple_build_assign_with_ops (NOP_EXPR, tmp, p1, NULL_TREE);
+	  p1 = tmp;
+	}
+      add_stmt_to_bb_predicate_gimplified_stmts (bb, stmt);
+    }
+  else
+    itype = TREE_TYPE (p1);
+  lhs = make_temp_ssa_name (itype, NULL, "_ifc_");
+  stmt = gimple_build_assign_with_ops (BIT_IOR_EXPR, lhs, p1, p2);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Create BIT IOR stmt\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+    }
+  add_stmt_to_bb_predicate_gimplified_stmts (bb, stmt);
+  set_bb_predicate (bb, lhs);
+}
+
+/* Returns new predicate PRD = PRD1 & PRD2, which are converted to int.  */
+
+static tree
+predicate_conjunction (basic_block bb, tree prd1, tree prd2)
+{
+  tree p1, p2;
+  gimple stmt;
+  tree itype;
+
+  gcc_assert (flag_force_vectorize);
+  p1 = convert_bool_predicate (prd1, bb, NULL_TREE);
+  /* Optimize p1 & (prd2? 1 : 0) into (prd2)? p1 : 0.  */
+  p2 = convert_bool_predicate (prd2, bb, p1);
+  if (p2 == prd2)
+    {
+      /* Need to create explicit AND stmt.  */
+      itype = TREE_TYPE (p1);
+      if (!types_compatible_p (itype, TREE_TYPE (p2)))
+	{
+	  if (TYPE_PRECISION (itype) < TYPE_PRECISION (TREE_TYPE (p2)))
+	    {
+	      tree tmp = make_temp_ssa_name (itype, NULL, "_ifc_");
+	      stmt = gimple_build_assign_with_ops (NOP_EXPR, tmp,
+						   p2, NULL_TREE);
+	      p2 = tmp;
+	    }
+	  else
+	    {
+	      itype = TREE_TYPE (p2);
+	      tree tmp = make_temp_ssa_name (itype, NULL, "_ifc_");
+              stmt = gimple_build_assign_with_ops (NOP_EXPR, tmp,
+						   p1, NULL_TREE);
+	      p1 = tmp;
+	    }
+	  add_stmt_to_bb_predicate_gimplified_stmts (bb, stmt);
+
+	}
+      tree lhs = make_temp_ssa_name (itype, NULL, "_ifc_");
+      stmt = gimple_build_assign_with_ops (BIT_AND_EXPR, lhs, p1, p2);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Create BIT AND stmt.\n");
+	  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+	}
+      add_stmt_to_bb_predicate_gimplified_stmts (bb, stmt);
+      return lhs;
+    }
+  return p2;
+}
+
 /* Add condition NC to the predicate list of basic block BB.  LOOP is
    the loop to be if-converted.  */
 
 static inline void
-add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
+add_to_predicate_list (struct loop *loop, basic_block bb,
+		       tree nc, bool convert_bool)
 {
   tree bc, *tp;
 
@@ -424,6 +654,13 @@ add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
 	  reset_bb_predicate (bb);
 	  return;
 	}
+      /* If CONVERT_BOOL is true new predicate which is disjunction of
+	 old BB predicate and NC.  */
+      if (convert_bool)
+	{
+	  predicate_disjunction (bb, nc);
+	  return;
+	}
     }
 
   /* Allow a TRUTH_NOT_EXPR around the main predicate.  */
@@ -446,19 +683,30 @@ add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
 
 static void
 add_to_dst_predicate_list (struct loop *loop, edge e,
-			   tree prev_cond, tree cond)
+			   tree prev_cond, tree cond,
+			   bool convert_bool)
 {
   if (!flow_bb_inside_loop_p (loop, e->dest))
     return;
 
+  if (dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
+    return;
   if (!is_true_predicate (prev_cond))
-    cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
-			prev_cond, cond);
+    {
+      /* If CONVERT_BOOL is true new predicate is created
+	 PRD = PRD_1 & PRD_2 where rhs predicates are converted
+	 to conditional expressions.  */
+      if (convert_bool)
+	cond = predicate_conjunction (e->dest, prev_cond, cond);
+      else
+	cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
+			    prev_cond, cond);
+    }
 
-  add_to_predicate_list (loop, e->dest, cond);
+  add_to_predicate_list (loop, e->dest, cond, convert_bool);
 }
 
-/* Return true if one of the successor edges of BB exits LOOP.  */
+/* Returns true if one of the successor edges of BB exits LOOP.  */
 
 static bool
 bb_with_exit_edge_p (struct loop *loop, basic_block bb)
@@ -473,6 +721,106 @@ bb_with_exit_edge_p (struct loop *loop, basic_block bb)
   return false;
 }
 
+/* Returns true if both arguments of phi node are equal.  */
+
+static inline bool
+equal_phi_args (tree c1, tree c2)
+{
+  if (TREE_CODE (c1) != TREE_CODE (c2))
+    return false;
+  if (TREE_CODE (c1) == SSA_NAME)
+    return c1 == c2;
+  return (operand_equal_p (c1, c2, 0) != 0);
+}
+
+/* Returns true if phi arguments are equal except for one; argument values and
+   index of exclusive argument are saved if needed.  */
+
+static bool
+phi_has_two_different_args (gimple phi, tree *arg_0, tree *arg_1,
+			    unsigned int *index)
+{
+  unsigned int i, ind0 = 0, ind1;
+  tree arg0, arg1 = NULL_TREE;
+  bool seen_same = false;
+
+  arg0 = gimple_phi_arg_def (phi, 0);
+  for (i = 1; i < gimple_phi_num_args (phi); i++)
+    {
+      tree tmp;
+      tmp = gimple_phi_arg_def (phi, i);
+      if (arg0 == NULL_TREE
+	  && !equal_phi_args (tmp, arg1))
+	{
+	  arg0 = tmp;
+	  ind0 = i;
+	}
+      else if (seen_same && equal_phi_args (tmp, arg1))
+	continue;
+      else if (!equal_phi_args (tmp, arg0))
+	{
+	  if (arg1 == NULL_TREE)
+	    {
+	      arg1 = tmp;
+	      ind1 = i;
+	    }
+	  else if (!equal_phi_args (tmp, arg1))
+	    return false;
+	  else
+	    seen_same = true;
+	}
+      else if (!seen_same)
+	{
+	  /* Swap arguments.  */
+	  seen_same = true;
+	  arg0 = arg1;
+	  arg1 = tmp;
+	  ind0 = ind1;
+	}
+      else
+	return false;
+    }
+  if (arg0 == NULL_TREE)
+    return false;
+
+  if (arg_0)
+    *arg_0 = arg0;
+  if (arg_1)
+    *arg_1 = arg1;
+  if (index)
+    *index = ind0;
+
+  return true;
+}
+
+/* Returns true when each pair of neighbor PHI arguments starting from the
+   end of list are in basic blocks which have common immediate predecessor,
+   i.e. they do not lie on any acyclic path. This common predecessor is
+   considered for comparison with PHI argument on next iteration.  */
+
+static bool
+phi_args_disjoint (gimple phi)
+{
+  int i;
+  int num_args = gimple_phi_num_args (phi);
+  basic_block common_pred = gimple_phi_arg_edge (phi, num_args - 1)->src;
+  basic_block bb;
+
+  if (EDGE_COUNT (common_pred->preds) > 1)
+    return false;
+
+  for (i = num_args - 2; i >= 0; i--)
+    {
+      bb = gimple_phi_arg_edge (phi, i)->src;
+      if (EDGE_COUNT (bb->preds) > 1)
+	return false;
+      if (EDGE_PRED (common_pred, 0)->src != EDGE_PRED (bb, 0)->src)
+	return false;
+      common_pred = EDGE_PRED (bb, 0)->src;
+    }
+  return true;
+}
+
 /* Return true when PHI is if-convertible.  PHI is part of loop LOOP
    and it belongs to basic block BB.
 
@@ -482,7 +830,18 @@ bb_with_exit_edge_p (struct loop *loop, basic_block bb)
    When the flag_tree_loop_if_convert_stores is not set, PHI is not
    if-convertible if:
    - a virtual PHI is immediately used in another PHI node,
-   - there is a virtual PHI in a BB other than the loop->header.  */
+   - there is a virtual PHI in a BB other than the loop->header.
+   Some extensions for loops marked with simd pragma were implemented:
+   - allow PHI to have more than 2 arguments if (1) all arguments are
+   equal except for one or (2) all arguments are disjoint, i.e. basic blocks
+   computing them have disjoint predicates.
+   One restriction on PHI was added - basic block containing PHI must have
+   TRUE predicate since, e.g.
+     S1: A = PHI <x1(1), x2(5)>
+   is converted into,
+     S2: A = cond ? x1 : x2;
+   which assumes that bb predicates for x1 and x2 are complementary and
+   bb correspondent to S1 must have true predicate.  */
 
 static bool
 if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi,
@@ -494,11 +853,45 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi,
       print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
     }
 
-  if (bb != loop->header && gimple_phi_num_args (phi) != 2)
+  if (bb != loop->header)
     {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "More than two phi node args.\n");
-      return false;
+      if (gimple_phi_num_args (phi) != 2)
+	{
+	  if (!flag_force_vectorize)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "More than two phi node args.\n");
+	      return false;
+	    }
+
+	  if (!virtual_operand_p (gimple_phi_result (phi)))
+	    {
+	      if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "Unable to handle predicated phi.\n");
+		  return false;
+		}
+	      /* Check that phi node can be predicated.  */
+	      if (!phi_has_two_different_args (phi, NULL, NULL, NULL)
+		  && !phi_args_disjoint (phi))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    fprintf (dump_file, "Difficult to handle this phi.\n");
+		  return false;
+		}
+	    }
+        }
+      /* Additional check on loops marked with simd pragma - able to handle
+	 non-predicated phi node only.  */
+      else if (flag_force_vectorize
+	       && !virtual_operand_p (gimple_phi_result (phi))
+	       && !dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, "Unable to handle predicated phi.\n");
+	  return false;
+	}
     }
 
   if (flag_tree_loop_if_convert_stores || any_mask_load_store)
@@ -728,7 +1121,7 @@ ifcvt_can_use_mask_load_store (gimple stmt)
   basic_block bb = gimple_bb (stmt);
   bool is_load;
 
-  if (!(flag_tree_loop_vectorize || bb->loop_father->force_vectorize)
+  if (!(flag_tree_loop_vectorize || flag_force_vectorize)
       || bb->loop_father->dont_vectorize
       || !gimple_assign_single_p (stmt)
       || gimple_has_volatile_ops (stmt))
@@ -865,7 +1258,8 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
 
    A statement is if-convertible if:
    - it is an if-convertible GIMPLE_ASSIGN,
-   - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
+   - it is a GIMPLE_LABEL or a GIMPLE_COND,
+   - it is intrinsic call.  */
 
 static bool
 if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
@@ -912,6 +1306,22 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
   return true;
 }
 
+/* Assumes that BB has more than 2 predecessors.
+   Returns false if at least one successor is not on critical edge
+   and true otherwise.  */
+
+static inline bool
+all_edges_are_critical (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (EDGE_COUNT (e->src->succs) == 1)
+      return false;
+  return true;
+}
+
 /* Return true when BB is if-convertible.  This routine does not check
    basic block's statements and phis.
 
@@ -920,6 +1330,8 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
    - it is after the exit block but before the latch,
    - its edges are not normal.
 
+   Last restriction is not applicable for loops marked with simd pragma.
+
    EXIT_BB is the basic block containing the exit of the LOOP.  BB is
    inside LOOP.  */
 
@@ -932,9 +1344,13 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "----------[%d]-------------\n", bb->index);
 
-  if (EDGE_COUNT (bb->preds) > 2
-      || EDGE_COUNT (bb->succs) > 2)
+  if (EDGE_COUNT (bb->succs) > 2)
     return false;
+  if (EDGE_COUNT (bb->preds) > 2)
+    {
+      if (!flag_force_vectorize)
+	return false;
+    }
 
   if (exit_bb)
     {
@@ -971,18 +1387,17 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
 
   /* At least one incoming edge has to be non-critical as otherwise edge
      predicates are not equal to basic-block predicates of the edge
-     source.  */
+     source. This restriction is not valid for loops marked with
+     simd pragma.  */
   if (EDGE_COUNT (bb->preds) > 1
       && bb != loop->header)
     {
-      bool found = false;
-      FOR_EACH_EDGE (e, ei, bb->preds)
-	if (EDGE_COUNT (e->src->succs) == 1)
-	  found = true;
-      if (!found)
+      if (!flag_force_vectorize && all_edges_are_critical (bb))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "only critical predecessors\n");
+	    fprintf (dump_file, "only critical predecessors in bb#%d\n",
+		      bb->index);
+
 	  return false;
 	}
     }
@@ -1064,6 +1479,88 @@ get_loop_body_in_if_conv_order (const struct loop *loop)
   return blocks;
 }
 
+/* Helper function of vect_bool_pattern_is_applicable. Called recursively.
+   Returns true if given pattern can be applied. Calculate and save min type
+   precision of comparison operands in PREC.  */
+
+static bool
+walk_cond_tree (tree var, int *prec)
+{
+  gimple def_stmt;
+  enum tree_code rhs_code;
+  tree rhs1;
+
+  if (TREE_CODE (var) != SSA_NAME)
+    return false;
+  def_stmt = SSA_NAME_DEF_STMT (var);
+  if (!is_gimple_assign (def_stmt))
+    return false;
+  rhs1 = gimple_assign_rhs1 (def_stmt);
+  rhs_code = gimple_assign_rhs_code (def_stmt);
+  switch (rhs_code)
+    {
+    case SSA_NAME:
+    case BIT_NOT_EXPR:
+      return walk_cond_tree (rhs1, prec);
+
+    CASE_CONVERT:
+      if ((TYPE_PRECISION (TREE_TYPE (rhs1)) != 1
+	   || !TYPE_UNSIGNED (TREE_TYPE (rhs1)))
+	  && TREE_CODE (TREE_TYPE (rhs1)) != BOOLEAN_TYPE)
+	return false;
+      return walk_cond_tree (rhs1, prec);
+
+    case BIT_AND_EXPR:
+    case BIT_IOR_EXPR:
+    case BIT_XOR_EXPR:
+      if (!walk_cond_tree (rhs1, prec))
+	return false;
+      return walk_cond_tree (gimple_assign_rhs2 (def_stmt), prec);
+
+    default:
+      if (TREE_CODE_CLASS (rhs_code) == tcc_comparison)
+	{
+	  if (TREE_CODE (TREE_TYPE (rhs1)) != INTEGER_TYPE)
+	    {
+	      enum machine_mode mode = TYPE_MODE (TREE_TYPE (rhs1));
+	      *prec = MIN (*prec, GET_MODE_BITSIZE (mode));
+	    }
+	  else
+	    *prec = MIN (*prec, TYPE_PRECISION (TREE_TYPE (rhs1)));
+	  return true;
+	}
+      return false;
+    }
+}
+
+/* Returns true if condition in STMT is presented by
+   name != false and name has boolean type.
+   Assumes that STMT is GIMPLE_COND and its condition is presented
+   by conjunction/disjunction of comparisons - walk_cond_tree is called
+   to check it. Later gimple condition OP0 will be promoted into int
+   type with precision 'prec' for vect_bool_pattern recognition.  */
+
+static inline bool
+vect_bool_pattern_is_applicable (gimple stmt, int *prec)
+{
+  tree op0, op1;
+  enum tree_code code;
+
+  op0 = gimple_cond_lhs (stmt);
+  op1 = gimple_cond_rhs (stmt);
+  code = gimple_cond_code (stmt);
+
+  if (TREE_CODE (TREE_TYPE (op0)) != BOOLEAN_TYPE)
+    return false;
+  if (TREE_CODE_CLASS (code) != tcc_comparison)
+    return false;
+  if (!integer_zerop (op1))
+    return false;
+  /* Init prec to max value.  */
+  *prec = 1024;
+  return walk_cond_tree (op0, prec);
+}
+
 /* Returns true when the analysis of the predicates for all the basic
    blocks in LOOP succeeded.
 
@@ -1080,10 +1577,14 @@ get_loop_body_in_if_conv_order (const struct loop *loop)
    |   S2;
 
    S1 will be predicated with "x", and
-   S2 will be predicated with "!x".  */
+   S2 will be predicated with "!x".
+
+   CONVERT_BOOL argument was added to convert bool predicate computations
+   which is not supported by vectorizer to int type through creating of
+   conditional expressions.  */
 
 static void
-predicate_bbs (loop_p loop)
+predicate_bbs (loop_p loop, bool convert_bool)
 {
   unsigned int i;
 
@@ -1096,9 +1597,10 @@ predicate_bbs (loop_p loop)
       tree cond;
       gimple stmt;
 
-      /* The loop latch is always executed and has no extra conditions
-	 to be processed: skip it.  */
-      if (bb == loop->latch)
+      /* The loop latch and loop exit block are always executed and
+	 have no extra conditions to be processed: skip them.  */
+      if (bb == loop->latch
+	  || bb_with_exit_edge_p (loop, bb))
 	{
 	  reset_bb_predicate (loop->latch);
 	  continue;
@@ -1108,27 +1610,144 @@ predicate_bbs (loop_p loop)
       stmt = last_stmt (bb);
       if (stmt && gimple_code (stmt) == GIMPLE_COND)
 	{
-	  tree c2;
+	  tree c, c2;
 	  edge true_edge, false_edge;
 	  location_t loc = gimple_location (stmt);
-	  tree c = fold_build2_loc (loc, gimple_cond_code (stmt),
-				    boolean_type_node,
-				    gimple_cond_lhs (stmt),
-				    gimple_cond_rhs (stmt));
+	  tree lopnd = gimple_cond_lhs (stmt);
+	  enum tree_code code = gimple_cond_code (stmt);
+	  int prec;
 
-	  /* Add new condition into destination's predicate list.  */
-	  extract_true_false_edges_from_block (gimple_bb (stmt),
-					       &true_edge, &false_edge);
-
-	  /* If C is true, then TRUE_EDGE is taken.  */
-	  add_to_dst_predicate_list (loop, true_edge, unshare_expr (cond),
-				     unshare_expr (c));
-
-	  /* If C is false, then FALSE_EDGE is taken.  */
-	  c2 = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node,
-			   unshare_expr (c));
-	  add_to_dst_predicate_list (loop, false_edge,
-				     unshare_expr (cond), c2);
+	  /* Compute predicates for true and false edges.  */
+	  if (!(convert_bool
+		&& vect_bool_pattern_is_applicable (stmt, &prec)))
+	    {
+	      c = fold_build2_loc (loc, code,
+				   boolean_type_node,
+				   lopnd,
+				   gimple_cond_rhs (stmt));
+	      /* Fold_build2 can produce bool conversion which is not
+                 supported by vectorizer, so re-build it without folding.  */
+	      if (convert_bool && CONVERT_EXPR_P (c)
+		  && TREE_CODE_CLASS (code) == tcc_comparison)
+		c = build2_loc (loc, code, boolean_type_node,
+				lopnd, gimple_cond_rhs (stmt));
+	      c2 = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node,
+			       unshare_expr (c));
+	    }
+	  else
+	    {
+	      /* Convert bool predicate to int - to apply vectorization
+		 bool pattern recognition.  */
+	      tree itype = build_nonstandard_integer_type (prec, 1);
+	      tree lhs = make_temp_ssa_name (itype, NULL, "_ifc_");
+	      enum tree_code code = gimple_cond_code (stmt);
+	      enum tree_code inv_code = invert_tree_comparison (code, false);
+	      /* Create convert expression.  */
+	      gimple conv = gimple_build_assign_with_ops
+			      (CONVERT_EXPR,
+			       lhs,
+			       lopnd,
+			       NULL_TREE);
+	      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+	      update_stmt (conv);
+	      /* Insert new convert stmt before last stmt.  */
+	      gsi_insert_before (&gsi, conv, GSI_SAME_STMT);
+	      c = build2_loc (loc, code, boolean_type_node, lhs,
+			      build_zero_cst (itype));
+	      c2 = build2_loc (loc, inv_code, boolean_type_node, lhs,
+			       build_zero_cst (itype));
+	    }
+	  extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
+
+	  /* If CONVERT_BOOL is true we must have exact predicate for
+	     each outgoing edge even if it is critical edge. These predicates
+	     can be used for if-conversion of phi nodes. For critical
+	     edge its predicate is saved in NEGATE_PREDICATE of other
+	     normal edge. For example, we can convert the following phi:
+	      bb_1
+		x_1 = ...;
+		_4 = x1_1 < 100;
+		_1 = x1_1 > 0;
+		_5 = _4 & _1;
+		if (_5 != 0) goto bb_3 else goto bb_2
+	      end_bb_1
+
+	      bb_2
+	      end_bb_2
+
+	      bb_3
+	        x1_2 = PHI <x1_1(1), 100(2), 100(0)>  */
+
+	  if (convert_bool && EDGE_COUNT (false_edge->dest->preds) >= 2
+	      && dominated_by_p (CDI_DOMINATORS, loop->latch,
+				 false_edge->dest))
+	    {
+	      tree prd;
+	      /* Add predicate for true edge.  */
+	      gcc_assert (EDGE_COUNT (true_edge->dest->preds) == 1);
+	      add_to_dst_predicate_list (loop, true_edge,
+					 unshare_expr (cond),
+					 unshare_expr (c),
+					 true);
+	      /* Save computed predicate and reset it to true.  */
+	      prd = bb_predicate (true_edge->dest);
+	      set_bb_predicate (true_edge->dest, boolean_true_node);
+	      /* Add predicate for false edge.  */
+	      add_to_dst_predicate_list (loop, true_edge,
+					 unshare_expr (cond),
+					 unshare_expr (c2), true);
+	      /* Copy computed predicate to negate predicate.  */
+	      bb_copy_predicate (true_edge->dest);
+	      /* Restore predicate for true edge destination.  */
+	      set_bb_predicate (true_edge->dest, prd);
+	    }
+	  else if (convert_bool && EDGE_COUNT (true_edge->dest->preds) >= 2
+		   && dominated_by_p (CDI_DOMINATORS, loop->latch,
+				      true_edge->dest))
+	    {
+	      tree prd;
+	      gimple_seq t_seq, f_seq;
+	      gcc_assert (EDGE_COUNT (false_edge->dest->preds) == 1);
+	      /* Add predicate for false edge.  */
+	      add_to_dst_predicate_list (loop, false_edge,
+					 unshare_expr (cond),
+					 unshare_expr (c2),
+					 true);
+	      /* Save predicate and reset it to true.  */
+	      prd = bb_predicate (false_edge->dest);
+	      set_bb_predicate (false_edge->dest, boolean_true_node);
+	      /* Save predicate gimplified stmt sequence and reset it.  */
+	      f_seq = bb_predicate_gimplified_stmts (false_edge->dest);
+	      set_bb_predicate_gimplified_stmts (false_edge->dest, NULL);
+	      /* Add predicate for true edge.  */
+	      add_to_dst_predicate_list (loop, false_edge,
+					 unshare_expr (cond),
+					 unshare_expr (c),
+					 true);
+	      /* Copy predicate of true-edge destination
+		 to negate predicate.  */
+	      bb_copy_predicate (false_edge->dest);
+	      /* Set up predicate for false-edge destination.  */
+	      set_bb_predicate (false_edge->dest, prd);
+	      /* Copy predicate gimplified stmt sequence to true edge
+		 destination basic block.  */
+	      t_seq = bb_predicate_gimplified_stmts (false_edge->dest);
+	      add_bb_predicate_gimplified_stmts (true_edge->dest, t_seq);
+	      /* Restore predicate gimplified stmt sequence for false
+		 edge destination.  */
+	      set_bb_predicate_gimplified_stmts (false_edge->dest, f_seq);
+	    }
+	  else
+	    {
+	      /* If C is true, then TRUE_EDGE is taken.  */
+	      add_to_dst_predicate_list (loop, true_edge, unshare_expr (cond),
+					 unshare_expr (c), convert_bool);
+
+	      /* If C is false, then FALSE_EDGE is taken.  */
+	      add_to_dst_predicate_list (loop, false_edge,
+					 unshare_expr (cond),
+					 unshare_expr (c2), convert_bool);
+	    }
 
 	  cond = NULL_TREE;
 	}
@@ -1145,7 +1764,7 @@ predicate_bbs (loop_p loop)
 	  if (cond == NULL_TREE)
 	    cond = boolean_true_node;
 
-	  add_to_predicate_list (loop, bb_n, cond);
+	  add_to_predicate_list (loop, bb_n, cond, convert_bool);
 	}
     }
 
@@ -1226,7 +1845,7 @@ if_convertible_loop_p_1 (struct loop *loop,
 	  DR_WRITTEN_AT_LEAST_ONCE (dr) = -1;
 	  DR_RW_UNCONDITIONALLY (dr) = -1;
 	}
-      predicate_bbs (loop);
+      predicate_bbs (loop, false);
     }
 
   for (i = 0; i < loop->num_nodes; i++)
@@ -1337,7 +1956,9 @@ if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
    replacement.  Return the true block whose phi arguments are
    selected when cond is true.  LOOP is the loop containing the
    if-converted region, GSI is the place to insert the code for the
-   if-conversion.  */
+   if-conversion.
+   Returns NULL if given phi node must be handled by means of extended
+   phi node predication.  */
 
 static basic_block
 find_phi_replacement_condition (basic_block bb, tree *cond,
@@ -1346,44 +1967,49 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
   edge first_edge, second_edge;
   tree tmp_cond;
 
-  gcc_assert (EDGE_COUNT (bb->preds) == 2);
-  first_edge = EDGE_PRED (bb, 0);
-  second_edge = EDGE_PRED (bb, 1);
-
-  /* Prefer an edge with a not negated predicate.
-     ???  That's a very weak cost model.  */
-  tmp_cond = bb_predicate (first_edge->src);
-  gcc_assert (tmp_cond);
-  if (TREE_CODE (tmp_cond) == TRUTH_NOT_EXPR)
+  if (EDGE_COUNT (bb->preds) == 2
+      && !all_edges_are_critical (bb))
     {
-      edge tmp_edge;
-
-      tmp_edge = first_edge;
-      first_edge = second_edge;
-      second_edge = tmp_edge;
-    }
+      first_edge = EDGE_PRED (bb, 0);
+      second_edge = EDGE_PRED (bb, 1);
+
+      /* Prefer an edge with a not negated predicate.
+         ???  That's a very weak cost model.  */
+      tmp_cond = bb_predicate (first_edge->src);
+      gcc_assert (tmp_cond);
+      if (TREE_CODE (tmp_cond) == TRUTH_NOT_EXPR)
+	{
+	  edge tmp_edge;
 
-  /* Check if the edge we take the condition from is not critical.
-     We know that at least one non-critical edge exists.  */
-  if (EDGE_COUNT (first_edge->src->succs) > 1)
-    {
-      *cond = bb_predicate (second_edge->src);
+	  tmp_edge = first_edge;
+	  first_edge = second_edge;
+	  second_edge = tmp_edge;
+	}
 
-      if (TREE_CODE (*cond) == TRUTH_NOT_EXPR)
-	*cond = TREE_OPERAND (*cond, 0);
+      /* Check if the edge we take the condition from is not critical.
+         We know that at least one non-critical edge exists.  */
+      if (EDGE_COUNT (first_edge->src->succs) > 1)
+	{
+	  *cond = bb_predicate (second_edge->src);
+	  gcc_assert (EDGE_COUNT (second_edge->src->succs) == 1);
+	  if (TREE_CODE (*cond) == TRUTH_NOT_EXPR)
+	    *cond = TREE_OPERAND (*cond, 0);
+	  else
+	    /* Select non loop header bb.  */
+	    first_edge = second_edge;
+	}
       else
-	/* Select non loop header bb.  */
-	first_edge = second_edge;
-    }
-  else
-    *cond = bb_predicate (first_edge->src);
+	*cond = bb_predicate (first_edge->src);
 
-  /* Gimplify the condition to a valid cond-expr conditonal operand.  */
-  *cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (*cond),
-				      is_gimple_condexpr, NULL_TREE,
-				      true, GSI_SAME_STMT);
+      /* Gimplify the condition to a valid cond-expr conditonal operand.  */
+      *cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (*cond),
+				          is_gimple_condexpr, NULL_TREE,
+				          true, GSI_SAME_STMT);
 
-  return first_edge->src;
+      return first_edge->src;
+    }
+  gcc_assert (flag_force_vectorize);
+  return NULL;
 }
 
 /* Returns true if def-stmt for phi argument ARG is simple increment/decrement
@@ -1400,7 +2026,7 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
 
 static bool
 is_cond_scalar_reduction (gimple phi, gimple *reduc,
-			  tree *op0, tree *op1)
+			  tree *op0, tree *op1, bool extended)
 {
   tree lhs, r_op1, r_op2;
   tree arg_0, arg_1;
@@ -1412,8 +2038,11 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
   imm_use_iterator imm_iter;
   use_operand_p use_p;
 
-  arg_0 = PHI_ARG_DEF (phi, 0);
-  arg_1 = PHI_ARG_DEF (phi, 1);
+  if (extended)
+    phi_has_two_different_args (phi, &arg_0, &arg_1, NULL);
+  else
+    arg_0 = PHI_ARG_DEF (phi, 0);
+    arg_1 = PHI_ARG_DEF (phi, 1);
   if (TREE_CODE (arg_0) != SSA_NAME || TREE_CODE (arg_1) != SSA_NAME)
     return false;
 
@@ -1572,7 +2201,7 @@ predicate_scalar_phi (gimple phi, tree cond,
     return;
 
   bb = gimple_bb (phi);
-
+  cond = convert_name_to_cmp (cond);
   if ((arg = degenerate_phi_result (phi))
       || ((scev = analyze_scalar_evolution (gimple_bb (phi)->loop_father,
 					    res))
@@ -1597,7 +2226,7 @@ predicate_scalar_phi (gimple phi, tree cond,
 	  arg_0 = gimple_phi_arg_def (phi, 0);
 	  arg_1 = gimple_phi_arg_def (phi, 1);
 	}
-      if (is_cond_scalar_reduction (phi, &reduc, &op0, &op1))
+      if (is_cond_scalar_reduction (phi, &reduc, &op0, &op1, false))
 	/* Convert reduction stmt into vectorizable form.  */
 	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
 					     true_bb != gimple_bb (reduc));
@@ -1618,6 +2247,232 @@ predicate_scalar_phi (gimple phi, tree cond,
     }
 }
 
+/* Returns predicate under which edge is taken.  */
+
+static tree
+get_predicate_for_edge (edge e)
+{
+  tree c;
+  basic_block b = e->src;
+
+  if (EDGE_COUNT (b->succs) == 1)
+    /* Use predicate of src basic block if it has the only successor.  */
+    c = bb_predicate (b);
+  else
+    {
+      /* Need to take negate predicate of another outgoing edge.  */
+      edge e1 = EDGE_SUCC (b, 0);
+
+      if (e1->dest == e->dest)
+	e1 = EDGE_SUCC (b, 1);
+      c = bb_negate_predicate (e1->dest);
+      gcc_assert (c != NULL_TREE);
+    }
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "get_predicate_for_edge %d --> %d :\n",
+	       e->src->index, e->dest->index);
+      print_generic_expr (dump_file, c, TDF_SLIM);
+      fputs ("\n", dump_file);
+    }
+
+  return convert_name_to_cmp (c);
+}
+
+/* Returns insertion point for predicated phi node:
+   We distinguish 3 different cases to preserve use-def chains:
+   - bb contains only stmt's computing predicates, returns value is NULL
+     and *BEFORE is false, must insert after last stmt;
+   - bb is empty, returns NULL and *BEFORE is true, must insert before
+     first non-label stmt;
+   - bb contains both predicate computations and original stmts, must
+     insert before first original stmt.  */
+
+static gimple
+find_insertion_point (basic_block bb, bool *before)
+{
+  gimple_stmt_iterator gsi;
+  gimple stmt = NULL;
+  tree lhs;
+  bool seen_temps = false;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      if (gimple_code (stmt) == GIMPLE_LABEL)
+	continue;
+      if (gimple_code (stmt) != GIMPLE_ASSIGN)
+	break;
+      lhs = gimple_assign_lhs (stmt);
+      if (TREE_CODE (lhs) != SSA_NAME)
+	break;
+      if (SSA_NAME_VAR (lhs) != NULL)
+	break;
+      lhs = SSA_NAME_IDENTIFIER (lhs);
+      if (!lhs)
+	break;
+      if (strncmp (IDENTIFIER_POINTER (lhs), "_ifc_", 5) == 0)
+	{
+	  seen_temps = true;
+	  continue;
+	}
+    }
+  if (gsi_end_p (gsi))
+    {
+      if (seen_temps)
+	/* Must insert after last stmt in bb.  */
+	*before = false;
+      else
+	/* BB is empty.  */
+	*before = true;
+      return NULL;
+    }
+
+  return stmt;
+}
+
+/* This is enhancement for predication of a phi node which arguments
+   are correspondent to edges with disjoint predicates, i.e. for
+	x = phi (x_1, x_2, ..., x_k)
+   all edge predicates are disjoint. For such phi node we can
+   produce a chain of cond expressions evaluating final value.
+   For example,
+	bb_0
+	if (_5 != 0) goto bb_1 else goto bb_2
+	end_bb_0
+
+	bb_1
+	res_2 = some computations;
+	goto bb_5
+	end_bb_1
+
+	bb_2
+	if (_9 != 0) goto bb_3 else goto bb_4
+	end_bb_2
+
+	bb_3
+	res_3 = ...;
+	goto bb_5
+	end_bb_3
+
+	bb4
+	res_4 = ...;
+	end_bb_4
+
+	bb_5
+	# res_1 = PHI <res_2(1), res_3(3), res_4(4)>
+
+    will be if-converted into chain of unconditional assignments:
+	_ifc__42 = <PRD_3> ? res_3 : res_4;
+	res_1 = _5 != 0 ? res_2 : _ifc__42;
+
+    where <PRD_3> is predicate of <bb_3>.
+
+    All created intermediate statements are inserted at GSI point.
+    Returns cond expression correspondent to rhs of new phi
+    replacement stmt.  */
+
+static tree
+predicate_phi_disjoint_args (gimple phi, gimple_stmt_iterator *gsi,
+			     bool before)
+{
+  int i;
+  int num = (int) gimple_phi_num_args (phi);
+  tree last = gimple_phi_arg_def (phi, num - 1);
+  tree type = TREE_TYPE (gimple_phi_result (phi));
+  tree curr;
+  gimple stmt;
+  tree lhs;
+  tree cond;
+
+  for (i = num - 2; i > 0; i--)
+    {
+      curr = gimple_phi_arg_def (phi, i);
+      lhs = make_temp_ssa_name (type, NULL, "_ifc_");
+      cond = get_predicate_for_edge (gimple_phi_arg_edge (phi, i));
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  cond = TREE_OPERAND (cond, 0);
+	  stmt = gimple_build_assign_with_ops (COND_EXPR, lhs, cond,
+					       last, curr);
+	}
+      else
+	stmt = gimple_build_assign_with_ops (COND_EXPR, lhs, cond, curr, last);
+
+      if (before)
+	gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+      else
+	gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
+
+      update_stmt (stmt);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Create new assign stmt for phi arg#%d\n", i);
+	  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+	}
+      last = lhs;
+    }
+  curr = gimple_phi_arg_def (phi, 0);
+  cond = get_predicate_for_edge (gimple_phi_arg_edge (phi, 0));
+  if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+    return fold_build_cond_expr (type,
+				 unshare_expr (TREE_OPERAND (cond, 0)),
+				 last,
+				 curr);
+  return fold_build_cond_expr (type, unshare_expr (cond), curr, last);
+}
+
+/* Replace scalar phi node with more than 2 arguments to cond expression.  */
+
+static void
+predicate_extended_scalar_phi (gimple phi, gimple_stmt_iterator *gsi,
+			       bool before)
+{
+  gimple new_stmt, reduc;
+  tree rhs, res, arg0, arg1, op0, op1;
+  tree cond;
+  unsigned int index0;
+  edge e;
+  bool swap = false;
+
+  res = gimple_phi_result (phi);
+  if (virtual_operand_p (res))
+    return;
+
+  if (!phi_has_two_different_args (phi, &arg0, &arg1, &index0))
+      rhs = predicate_phi_disjoint_args (phi, gsi, before);
+  else
+    {
+      e = gimple_phi_arg_edge (phi, index0);
+      cond = get_predicate_for_edge (e);
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  swap = true;
+	  cond = TREE_OPERAND (cond, 0);
+	}
+
+      if (!(is_cond_scalar_reduction (phi, &reduc, &op0, &op1, true)))
+	rhs = fold_build_cond_expr (TREE_TYPE (res), unshare_expr (cond),
+				    swap? arg1 : arg0,
+				    swap? arg0 : arg1);
+      else
+	/* Convert reduction stmt into vectorizable form.  */
+	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1, swap);
+    }
+  new_stmt = gimple_build_assign (res, rhs);
+  if (before)
+    gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  else
+    gsi_insert_after (gsi, new_stmt, GSI_NEW_STMT);
+  update_stmt (new_stmt);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "new ext. phi replacement stmt\n");
+      print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+}
+
 /* Replaces in LOOP all the scalar phi nodes other than those in the
    LOOP->header block with conditional modify expressions.  */
 
@@ -1627,6 +2482,8 @@ predicate_all_scalar_phis (struct loop *loop)
   basic_block bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
   unsigned int i;
+  gimple stmt;
+  bool before = true;
 
   for (i = 1; i < orig_loop_num_nodes; i++)
     {
@@ -1647,11 +2504,25 @@ predicate_all_scalar_phis (struct loop *loop)
 	 appropriate condition for the PHI node replacement.  */
       gsi = gsi_after_labels (bb);
       true_bb = find_phi_replacement_condition (bb, &cond, &gsi);
+      if (!true_bb)
+	{
+	  /* Must use extended PHI predication; find out insertion point
+	     for unconditional PHI node evaluations.  */
+	  before = true;
+	  stmt = find_insertion_point (bb, &before);
+	  if (stmt != NULL)
+	    gsi = gsi_for_stmt (stmt);
+	  else if (!before)
+	    gsi = gsi_last_bb (bb);
+	}
 
       while (!gsi_end_p (phi_gsi))
 	{
 	  phi = gsi_stmt (phi_gsi);
-	  predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  if (true_bb)
+	    predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  else
+	    predicate_extended_scalar_phi (phi, &gsi, before);
 	  release_phi_node (phi);
 	  gsi_next (&phi_gsi);
 	}
@@ -1673,7 +2544,7 @@ insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
       basic_block bb = ifc_bbs[i];
       gimple_seq stmts;
 
-      if (!is_predicated (bb))
+      if (!is_predicated (bb) && bb_predicate_gimplified_stmts (bb) == NULL)
 	{
 	  /* Do not insert statements for a basic block that is not
 	     predicated.  Also make sure that the predicate of the
@@ -1686,7 +2557,8 @@ insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
       if (stmts)
 	{
 	  if (flag_tree_loop_if_convert_stores
-	      || any_mask_load_store)
+	      || any_mask_load_store
+	      || flag_force_vectorize)
 	    {
 	      /* Insert the predicate of the BB just after the label,
 		 as the if-conversion of memory writes will use this
@@ -1863,9 +2735,12 @@ predicate_mem_writes (loop_p loop)
 	    addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (ref),
 					     true, NULL_TREE, true,
 					     GSI_SAME_STMT);
-	    cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
-					       is_gimple_condexpr, NULL_TREE,
-					       true, GSI_SAME_STMT);
+	    if (flag_force_vectorize)
+	      cond = convert_name_to_cmp (cond);
+	    else
+	      cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
+						 is_gimple_condexpr, NULL_TREE,
+						 true, GSI_SAME_STMT);
 	    mask = fold_build_cond_expr (masktype, unshare_expr (cond),
 					 mask_op0, mask_op1);
 	    mask = ifc_temp_var (masktype, mask, &gsi);
@@ -1901,9 +2776,12 @@ predicate_mem_writes (loop_p loop)
 		lhs = rhs;
 		rhs = tem;
 	      }
-	    cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
-					       is_gimple_condexpr, NULL_TREE,
-					       true, GSI_SAME_STMT);
+	    if (flag_force_vectorize)
+	      cond = convert_name_to_cmp (cond);
+	    else
+	      cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
+						 is_gimple_condexpr, NULL_TREE,
+						 true, GSI_SAME_STMT);
 	    rhs = fold_build_cond_expr (type, unshare_expr (cond), rhs, lhs);
 	    gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
 	    update_stmt (stmt);
@@ -1965,7 +2843,7 @@ combine_blocks (struct loop *loop, bool any_mask_load_store)
   edge e;
   edge_iterator ei;
 
-  predicate_bbs (loop);
+  predicate_bbs (loop, flag_force_vectorize);
   remove_conditions_and_labels (loop);
   insert_gimplified_predicates (loop, any_mask_load_store);
   predicate_all_scalar_phis (loop);
@@ -2096,6 +2974,62 @@ version_loop_for_if_conversion (struct loop *loop)
   return true;
 }
 
+/* Split one edge in bb ending with COND_EXPR
+   if both its outgoing edges are critical.
+   It is necessary to keep at least one predicate of successor block which
+   can be used for PHI node predication.
+   Returns false if loop won't be if-converted and true otherwise.  */
+
+static bool
+split_crit_edge (struct loop *loop)
+{
+  basic_block *body;
+  basic_block bb;
+  unsigned int i;
+  unsigned int num = loop->num_nodes;
+  gimple stmt;
+  edge e;
+  edge_iterator ei;
+
+  /* Check if loop can be if-convertible.  */
+  if (!single_exit (loop))
+    return false;
+
+  /* If one of the loop header's edge is an exit edge then do not
+     apply if-conversion.  */
+  FOR_EACH_EDGE (e, ei, loop->header->succs)
+    if (loop_exit_edge_p (loop, e))
+      return false;
+
+  body = get_loop_body (loop);
+
+  for (i = 0; i < num; i++)
+    {
+      bb = body[i];
+      if (bb == loop->latch || bb == loop->header
+	  || bb_with_exit_edge_p (loop, bb))
+	continue;
+      stmt = last_stmt (bb);
+      /* Skip basic blocks not ending with conditional branch.  */
+      if (!(stmt && gimple_code (stmt) == GIMPLE_COND))
+	continue;
+      /* Consider only basic block both successors of which are
+	 on critical edge.  */
+      if (EDGE_COUNT (EDGE_SUCC (bb, 0)->dest->preds) > 1
+	  && EDGE_COUNT (EDGE_SUCC (bb, 1)->dest->preds) > 1)
+	{
+	  edge e = EDGE_SUCC (bb, 0);
+	  /* Can split edge if both src and dest are in the same loop.  */
+	  if (e->dest->loop_father != e->src->loop_father)
+	    e = EDGE_SUCC (bb, 1);
+	  gcc_assert (e->dest->loop_father == e->src->loop_father);
+	  split_edge (e);
+	}
+    }
+  free (body);
+  return true;
+}
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -2107,6 +3041,20 @@ tree_if_conversion (struct loop *loop)
   ifc_bbs = NULL;
   bool any_mask_load_store = false;
 
+  flag_force_vectorize = loop->force_vectorize;
+  /* Check either outer loop was marked with simd pragma.  */
+  if (!flag_force_vectorize)
+    {
+      struct loop *outer_loop = loop_outer (loop);
+      if (outer_loop && outer_loop->force_vectorize)
+	flag_force_vectorize = true;
+    }
+
+  /* Do critical edge splitting only if loop was marked with simd pragma.  */
+  if (flag_force_vectorize)
+    if (!split_crit_edge (loop))
+      goto cleanup;
+
   if (!if_convertible_loop_p (loop, &any_mask_load_store)
       || !dbg_cnt (if_conversion_tree))
     goto cleanup;
@@ -2116,7 +3064,8 @@ tree_if_conversion (struct loop *loop)
 	  || loop->dont_vectorize))
     goto cleanup;
 
-  if (any_mask_load_store && !version_loop_for_if_conversion (loop))
+  if ((any_mask_load_store || loop->force_vectorize)
+      && !version_loop_for_if_conversion (loop))
     goto cleanup;
 
   /* Now all statements are if-convertible.  Combine all the basic

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Extended if-conversion for loops marked with pragma omp simd.
  2014-06-25 14:06 [PATCH] Extended if-conversion for loops marked with pragma omp simd Yuri Rumyantsev
@ 2014-07-14 10:16 ` Yuri Rumyantsev
  2014-07-14 12:16   ` Richard Biener
  2014-08-01  9:40 ` Richard Biener
  1 sibling, 1 reply; 9+ messages in thread
From: Yuri Rumyantsev @ 2014-07-14 10:16 UTC (permalink / raw)
  To: gcc-patches, Igor Zamyatin

Ping!

2014-06-25 18:06 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
> Hi All,
>
> We implemented additional support for pragma omp simd in part of
> extended if-conversion loops with such pragma. These extensions
> include:
>
> 1. All extensions are performed only if considered loop or its outer
>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>    loops behavior was not changed.
> 2. Took off cfg restriction on basic block which can have more than 2
>    predecessors.
> 3. Put additional restriction on phi nodes which was missed in current design:
>    all phi nodes must be in non-predicated basic block to conform
>    semantic of COND_EXPR which is used for transformation.
> 4. Extend predication of phi nodes: phi may have more than 2 arguments
> with some limitations:
>    - for phi nodes which have more than 2 arguments, but only two
>    arguments are different and one of them has the only occurence,
> transformation to  single COND_EXPR can be done.
>    - if phi node has more different arguments and all edge predicates
>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>    will be generated for it. In current design very simple check is used:
>    check starting from end that two edges correspondent to neighbor
> arguments have common predecessor which is used for further check
> with next edge.
>  These guarantee that phi predication will produce the correct result.
>
> Here is example of such extended predication (compile with -march=core-avx2):
> #pragma omp simd safelen(8)
>   for (i=0; i<512; i++)
>   {
>     float t = a[i];
>     if (t > 0 & t < 1.0e+17f)
>       if (c[i] != 0)
> res += 1;
>   }
>   <bb 4>:
>   # res_15 = PHI <res_1(5), 0(3)>
>   # i_16 = PHI <i_11(5), 0(3)>
>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>   t_5 = a[i_16];
>   _6 = t_5 > 0.0;
>   _7 = t_5 < 9.9999998430674944e+16;
>   _8 = _7 & _6;
>   _ifc__28 = (unsigned int) _8;
>   _10 = &c[i_16];
>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>   _ifc__30 = (int) _ifc__29;
>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>   _ifc__33 = (int) _ifc__32;
>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>   res_1 = res_15 + _ifc__35;
>   i_11 = i_16 + 1;
>   ivtmp_14 = ivtmp_17 - 1;
>   if (ivtmp_14 != 0)
>     goto <bb 4>;
>
> Bootstrap and regression testing did not show any new failures.
>
> gcc/ChageLog
>
> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * tree-if-conv.c (flag_force_vectorize): New variable.
> (struct bb_predicate_s): Add negate_predicate field.
> (bb_negate_predicate): New function.
> (set_bb_negate_predicate): New function.
> (bb_copy_predicate): New function.
> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
> (init_bb_predicate): Add initialization of negate_predicate field.
> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
> (convert_name_to_cmp): New function.
> (get_type_for_cond): New function.
> (convert_bool_predicate): New function.
> (predicate_disjunction): New function.
> (predicate_conjunction): New function.
> (add_to_predicate_list): Add convert_bool argument.
> Add call of predicate_disjunction if convert_bool argument is true.
> (add_to_dst_predicate_list): Add convert_bool argument.
> Add early function exit if edge target block is always executed.
> Add call of predicate_conjunction if convert_bool argument is true.
> Pass convert_bool argument for add_to_predicate_list.
> (equal_phi_args): New function.
> (phi_has_two_different_args): New function.
> (phi_args_disjoint): New function.
> (if_convertible_phi_p): Accept phi nodes with more than two args
> for loops marked with pragma omp simd. Add check that phi nodes are
> in non-predicated basic blocks.
> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
> (all_edges_are_critical): New function.
> (if_convertible_bb_p): Allow bb has more than two predecessors if
> flag_force_vectorize was setup. Use call of all_edges_are_critical
> to reject block if-conversion with imcoming critical edges only if
> flag_force_vectorize was not setup.
> (walk_cond_tree): New function.
> (vect_bool_pattern_is_applicable): New function.
> (predicate_bbs): Add convert_bool argument that is used to transform
> comparison expressions of boolean type into conditional expressions
> with integral operands. If bool_conv argument is false or both
> outgoing edges are not critical old algorithm of predicate assignments
> is used, otherwise the following code was added: check on applicable
> of vect-bool-pattern recognition and trnasformation of
> (bool) x != 0  --> y = (int) x; x != 0;
> compute predicates for both outgoing edges one of which is critical
> one using 'normal' edge, i.e. compute true and false predicates using
> normal outgoing edge only; evaluated predicates are stored in
> predicate and negate_predicate fields of struct bb_predicate_s and
> negate_predicate of normal edge conatins predicate of critical edge,
> but generated gimplified statements are stored in their destination
> block fields. Additional argument 'convert_bool" is passed to
> add_to_dst_predicate_list and add_to_predicate_list.
> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
> equal to false.
> (find_phi_replacement_condition): Extend function interface:
> it returns NULL if given phi node must be handled by means of
> extended phi node predication. If number of predecessors of phi-block
> is equal 2 and atleast one incoming edge is not critical original
> algorithm is used.
> (is_cond_scalar_reduction): Add 'extended' argument which signals that
> both phi arguments must be evaluated through phi_has_two_different_args.
> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
> (get_predicate_for_edge): New function.
> (find_insertion_point): New function.
> (predicate_phi_disjoint_args): New function.
> (predicate_extended_scalar_phi): New function.
> (predicate_all_scalar_phis): Add code to set-up gimple statement
> iterator for predication of extended scalar phi's for insertion.
> (insert_gimplified_predicates): Add test for non-predicated basic
> blocks that there are no gimplified statements to insert. Insert
> predicates at the block begining for extended if-conversion.
> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
> predication to build mask.
> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
> (split_crit_edge): New function.
> (tree_if_conversion): Initialize flag_force_vectorize from current
> loop or outer loop (to support pragma omp declare). Invoke
> split_crit_edge for extended predication. Do loop versioning for
> innermost loop marked with pragma omp simd.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Extended if-conversion for loops marked with pragma omp simd.
  2014-07-14 10:16 ` Yuri Rumyantsev
@ 2014-07-14 12:16   ` Richard Biener
  2014-07-28 11:22     ` Yuri Rumyantsev
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2014-07-14 12:16 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Mon, Jul 14, 2014 at 12:16 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Ping!

It's in my queue (pretty large patch for a drive-by review - maybe there is
an opportunity to split the patch up?).

Won't get to it before the Cauldron though.

Richard.

> 2014-06-25 18:06 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>> Hi All,
>>
>> We implemented additional support for pragma omp simd in part of
>> extended if-conversion loops with such pragma. These extensions
>> include:
>>
>> 1. All extensions are performed only if considered loop or its outer
>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>    loops behavior was not changed.
>> 2. Took off cfg restriction on basic block which can have more than 2
>>    predecessors.
>> 3. Put additional restriction on phi nodes which was missed in current design:
>>    all phi nodes must be in non-predicated basic block to conform
>>    semantic of COND_EXPR which is used for transformation.
>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>> with some limitations:
>>    - for phi nodes which have more than 2 arguments, but only two
>>    arguments are different and one of them has the only occurence,
>> transformation to  single COND_EXPR can be done.
>>    - if phi node has more different arguments and all edge predicates
>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>    will be generated for it. In current design very simple check is used:
>>    check starting from end that two edges correspondent to neighbor
>> arguments have common predecessor which is used for further check
>> with next edge.
>>  These guarantee that phi predication will produce the correct result.
>>
>> Here is example of such extended predication (compile with -march=core-avx2):
>> #pragma omp simd safelen(8)
>>   for (i=0; i<512; i++)
>>   {
>>     float t = a[i];
>>     if (t > 0 & t < 1.0e+17f)
>>       if (c[i] != 0)
>> res += 1;
>>   }
>>   <bb 4>:
>>   # res_15 = PHI <res_1(5), 0(3)>
>>   # i_16 = PHI <i_11(5), 0(3)>
>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>   t_5 = a[i_16];
>>   _6 = t_5 > 0.0;
>>   _7 = t_5 < 9.9999998430674944e+16;
>>   _8 = _7 & _6;
>>   _ifc__28 = (unsigned int) _8;
>>   _10 = &c[i_16];
>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>   _ifc__30 = (int) _ifc__29;
>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>   _ifc__33 = (int) _ifc__32;
>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>   res_1 = res_15 + _ifc__35;
>>   i_11 = i_16 + 1;
>>   ivtmp_14 = ivtmp_17 - 1;
>>   if (ivtmp_14 != 0)
>>     goto <bb 4>;
>>
>> Bootstrap and regression testing did not show any new failures.
>>
>> gcc/ChageLog
>>
>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> * tree-if-conv.c (flag_force_vectorize): New variable.
>> (struct bb_predicate_s): Add negate_predicate field.
>> (bb_negate_predicate): New function.
>> (set_bb_negate_predicate): New function.
>> (bb_copy_predicate): New function.
>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>> (init_bb_predicate): Add initialization of negate_predicate field.
>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>> (convert_name_to_cmp): New function.
>> (get_type_for_cond): New function.
>> (convert_bool_predicate): New function.
>> (predicate_disjunction): New function.
>> (predicate_conjunction): New function.
>> (add_to_predicate_list): Add convert_bool argument.
>> Add call of predicate_disjunction if convert_bool argument is true.
>> (add_to_dst_predicate_list): Add convert_bool argument.
>> Add early function exit if edge target block is always executed.
>> Add call of predicate_conjunction if convert_bool argument is true.
>> Pass convert_bool argument for add_to_predicate_list.
>> (equal_phi_args): New function.
>> (phi_has_two_different_args): New function.
>> (phi_args_disjoint): New function.
>> (if_convertible_phi_p): Accept phi nodes with more than two args
>> for loops marked with pragma omp simd. Add check that phi nodes are
>> in non-predicated basic blocks.
>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>> (all_edges_are_critical): New function.
>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>> to reject block if-conversion with imcoming critical edges only if
>> flag_force_vectorize was not setup.
>> (walk_cond_tree): New function.
>> (vect_bool_pattern_is_applicable): New function.
>> (predicate_bbs): Add convert_bool argument that is used to transform
>> comparison expressions of boolean type into conditional expressions
>> with integral operands. If bool_conv argument is false or both
>> outgoing edges are not critical old algorithm of predicate assignments
>> is used, otherwise the following code was added: check on applicable
>> of vect-bool-pattern recognition and trnasformation of
>> (bool) x != 0  --> y = (int) x; x != 0;
>> compute predicates for both outgoing edges one of which is critical
>> one using 'normal' edge, i.e. compute true and false predicates using
>> normal outgoing edge only; evaluated predicates are stored in
>> predicate and negate_predicate fields of struct bb_predicate_s and
>> negate_predicate of normal edge conatins predicate of critical edge,
>> but generated gimplified statements are stored in their destination
>> block fields. Additional argument 'convert_bool" is passed to
>> add_to_dst_predicate_list and add_to_predicate_list.
>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>> equal to false.
>> (find_phi_replacement_condition): Extend function interface:
>> it returns NULL if given phi node must be handled by means of
>> extended phi node predication. If number of predecessors of phi-block
>> is equal 2 and atleast one incoming edge is not critical original
>> algorithm is used.
>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>> both phi arguments must be evaluated through phi_has_two_different_args.
>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>> (get_predicate_for_edge): New function.
>> (find_insertion_point): New function.
>> (predicate_phi_disjoint_args): New function.
>> (predicate_extended_scalar_phi): New function.
>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>> iterator for predication of extended scalar phi's for insertion.
>> (insert_gimplified_predicates): Add test for non-predicated basic
>> blocks that there are no gimplified statements to insert. Insert
>> predicates at the block begining for extended if-conversion.
>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>> predication to build mask.
>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>> (split_crit_edge): New function.
>> (tree_if_conversion): Initialize flag_force_vectorize from current
>> loop or outer loop (to support pragma omp declare). Invoke
>> split_crit_edge for extended predication. Do loop versioning for
>> innermost loop marked with pragma omp simd.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Extended if-conversion for loops marked with pragma omp simd.
  2014-07-14 12:16   ` Richard Biener
@ 2014-07-28 11:22     ` Yuri Rumyantsev
  0 siblings, 0 replies; 9+ messages in thread
From: Yuri Rumyantsev @ 2014-07-28 11:22 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

Ping!

2014-07-14 16:16 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
> On Mon, Jul 14, 2014 at 12:16 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Ping!
>
> It's in my queue (pretty large patch for a drive-by review - maybe there is
> an opportunity to split the patch up?).
>
> Won't get to it before the Cauldron though.
>
> Richard.
>
>> 2014-06-25 18:06 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>> Hi All,
>>>
>>> We implemented additional support for pragma omp simd in part of
>>> extended if-conversion loops with such pragma. These extensions
>>> include:
>>>
>>> 1. All extensions are performed only if considered loop or its outer
>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>    loops behavior was not changed.
>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>    predecessors.
>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>    all phi nodes must be in non-predicated basic block to conform
>>>    semantic of COND_EXPR which is used for transformation.
>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>> with some limitations:
>>>    - for phi nodes which have more than 2 arguments, but only two
>>>    arguments are different and one of them has the only occurence,
>>> transformation to  single COND_EXPR can be done.
>>>    - if phi node has more different arguments and all edge predicates
>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>    will be generated for it. In current design very simple check is used:
>>>    check starting from end that two edges correspondent to neighbor
>>> arguments have common predecessor which is used for further check
>>> with next edge.
>>>  These guarantee that phi predication will produce the correct result.
>>>
>>> Here is example of such extended predication (compile with -march=core-avx2):
>>> #pragma omp simd safelen(8)
>>>   for (i=0; i<512; i++)
>>>   {
>>>     float t = a[i];
>>>     if (t > 0 & t < 1.0e+17f)
>>>       if (c[i] != 0)
>>> res += 1;
>>>   }
>>>   <bb 4>:
>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>   t_5 = a[i_16];
>>>   _6 = t_5 > 0.0;
>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>   _8 = _7 & _6;
>>>   _ifc__28 = (unsigned int) _8;
>>>   _10 = &c[i_16];
>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>   _ifc__30 = (int) _ifc__29;
>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>   _ifc__33 = (int) _ifc__32;
>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>   res_1 = res_15 + _ifc__35;
>>>   i_11 = i_16 + 1;
>>>   ivtmp_14 = ivtmp_17 - 1;
>>>   if (ivtmp_14 != 0)
>>>     goto <bb 4>;
>>>
>>> Bootstrap and regression testing did not show any new failures.
>>>
>>> gcc/ChageLog
>>>
>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>> (struct bb_predicate_s): Add negate_predicate field.
>>> (bb_negate_predicate): New function.
>>> (set_bb_negate_predicate): New function.
>>> (bb_copy_predicate): New function.
>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>> (convert_name_to_cmp): New function.
>>> (get_type_for_cond): New function.
>>> (convert_bool_predicate): New function.
>>> (predicate_disjunction): New function.
>>> (predicate_conjunction): New function.
>>> (add_to_predicate_list): Add convert_bool argument.
>>> Add call of predicate_disjunction if convert_bool argument is true.
>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>> Add early function exit if edge target block is always executed.
>>> Add call of predicate_conjunction if convert_bool argument is true.
>>> Pass convert_bool argument for add_to_predicate_list.
>>> (equal_phi_args): New function.
>>> (phi_has_two_different_args): New function.
>>> (phi_args_disjoint): New function.
>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>> in non-predicated basic blocks.
>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>> (all_edges_are_critical): New function.
>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>> to reject block if-conversion with imcoming critical edges only if
>>> flag_force_vectorize was not setup.
>>> (walk_cond_tree): New function.
>>> (vect_bool_pattern_is_applicable): New function.
>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>> comparison expressions of boolean type into conditional expressions
>>> with integral operands. If bool_conv argument is false or both
>>> outgoing edges are not critical old algorithm of predicate assignments
>>> is used, otherwise the following code was added: check on applicable
>>> of vect-bool-pattern recognition and trnasformation of
>>> (bool) x != 0  --> y = (int) x; x != 0;
>>> compute predicates for both outgoing edges one of which is critical
>>> one using 'normal' edge, i.e. compute true and false predicates using
>>> normal outgoing edge only; evaluated predicates are stored in
>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>> negate_predicate of normal edge conatins predicate of critical edge,
>>> but generated gimplified statements are stored in their destination
>>> block fields. Additional argument 'convert_bool" is passed to
>>> add_to_dst_predicate_list and add_to_predicate_list.
>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>> equal to false.
>>> (find_phi_replacement_condition): Extend function interface:
>>> it returns NULL if given phi node must be handled by means of
>>> extended phi node predication. If number of predecessors of phi-block
>>> is equal 2 and atleast one incoming edge is not critical original
>>> algorithm is used.
>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>> (get_predicate_for_edge): New function.
>>> (find_insertion_point): New function.
>>> (predicate_phi_disjoint_args): New function.
>>> (predicate_extended_scalar_phi): New function.
>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>> iterator for predication of extended scalar phi's for insertion.
>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>> blocks that there are no gimplified statements to insert. Insert
>>> predicates at the block begining for extended if-conversion.
>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>> predication to build mask.
>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>> (split_crit_edge): New function.
>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>> loop or outer loop (to support pragma omp declare). Invoke
>>> split_crit_edge for extended predication. Do loop versioning for
>>> innermost loop marked with pragma omp simd.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Extended if-conversion for loops marked with pragma omp simd.
  2014-06-25 14:06 [PATCH] Extended if-conversion for loops marked with pragma omp simd Yuri Rumyantsev
  2014-07-14 10:16 ` Yuri Rumyantsev
@ 2014-08-01  9:40 ` Richard Biener
  2014-08-15 12:02   ` Yuri Rumyantsev
  1 sibling, 1 reply; 9+ messages in thread
From: Richard Biener @ 2014-08-01  9:40 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Hi All,
>
> We implemented additional support for pragma omp simd in part of
> extended if-conversion loops with such pragma. These extensions
> include:
>
> 1. All extensions are performed only if considered loop or its outer
>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>    loops behavior was not changed.
> 2. Took off cfg restriction on basic block which can have more than 2
>    predecessors.
> 3. Put additional restriction on phi nodes which was missed in current design:
>    all phi nodes must be in non-predicated basic block to conform
>    semantic of COND_EXPR which is used for transformation.

How is that so?  If the PHI is predicated then its result will be used
in a PHI node again and thus we'd create a sequence of COND_EXPRs.

No?

> 4. Extend predication of phi nodes: phi may have more than 2 arguments
> with some limitations:
>    - for phi nodes which have more than 2 arguments, but only two
>    arguments are different and one of them has the only occurence,
> transformation to  single COND_EXPR can be done.
>    - if phi node has more different arguments and all edge predicates
>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>    will be generated for it. In current design very simple check is used:
>    check starting from end that two edges correspondent to neighbor
> arguments have common predecessor which is used for further check
> with next edge.
>  These guarantee that phi predication will produce the correct result.

Btw, you can think of these extensions as unfactoring a PHI node by
inserting forwarder blocks.  Thus

   x = PHI <1(2), 1(3), 2(4)>

becomes

  bb 5: <forwarder-from(2)-and(3)>

  x = PHI <1(5), 2(4)>

and

  x = PHI <1(2), 2(3), 3(4)>

becomes

  bb 5:
  x' = PHI <1(2), 2(3)>

  b = PHI<x'(5), 3(4)>

which means that 3) has to work.  Note that we want this kind of
PHI transforms for out-of-SSA as well to reduce the number of
copies we need to insert on edges.

Thus it would be nice if you implemented 4) in terms of a pre-pass
over the force_vect loops PHI nodes, applying that CFG transform.
And make 3) work properly if it doesn't already.

It looks like you introduce a "negate predicate" to work around the
critical edge limitation?  Please instead change if-conversion to
work with edge predicates (as opposed to BB predicates).

Thanks,
Richard.

>
> Here is example of such extended predication (compile with -march=core-avx2):
> #pragma omp simd safelen(8)
>   for (i=0; i<512; i++)
>   {
>     float t = a[i];
>     if (t > 0 & t < 1.0e+17f)
>       if (c[i] != 0)
> res += 1;
>   }
>   <bb 4>:
>   # res_15 = PHI <res_1(5), 0(3)>
>   # i_16 = PHI <i_11(5), 0(3)>
>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>   t_5 = a[i_16];
>   _6 = t_5 > 0.0;
>   _7 = t_5 < 9.9999998430674944e+16;
>   _8 = _7 & _6;
>   _ifc__28 = (unsigned int) _8;
>   _10 = &c[i_16];
>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>   _ifc__30 = (int) _ifc__29;
>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>   _ifc__33 = (int) _ifc__32;
>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>   res_1 = res_15 + _ifc__35;
>   i_11 = i_16 + 1;
>   ivtmp_14 = ivtmp_17 - 1;
>   if (ivtmp_14 != 0)
>     goto <bb 4>;
>
> Bootstrap and regression testing did not show any new failures.
>
> gcc/ChageLog
>
> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * tree-if-conv.c (flag_force_vectorize): New variable.
> (struct bb_predicate_s): Add negate_predicate field.
> (bb_negate_predicate): New function.
> (set_bb_negate_predicate): New function.
> (bb_copy_predicate): New function.
> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
> (init_bb_predicate): Add initialization of negate_predicate field.
> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
> (convert_name_to_cmp): New function.
> (get_type_for_cond): New function.
> (convert_bool_predicate): New function.
> (predicate_disjunction): New function.
> (predicate_conjunction): New function.
> (add_to_predicate_list): Add convert_bool argument.
> Add call of predicate_disjunction if convert_bool argument is true.
> (add_to_dst_predicate_list): Add convert_bool argument.
> Add early function exit if edge target block is always executed.
> Add call of predicate_conjunction if convert_bool argument is true.
> Pass convert_bool argument for add_to_predicate_list.
> (equal_phi_args): New function.
> (phi_has_two_different_args): New function.
> (phi_args_disjoint): New function.
> (if_convertible_phi_p): Accept phi nodes with more than two args
> for loops marked with pragma omp simd. Add check that phi nodes are
> in non-predicated basic blocks.
> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
> (all_edges_are_critical): New function.
> (if_convertible_bb_p): Allow bb has more than two predecessors if
> flag_force_vectorize was setup. Use call of all_edges_are_critical
> to reject block if-conversion with imcoming critical edges only if
> flag_force_vectorize was not setup.
> (walk_cond_tree): New function.
> (vect_bool_pattern_is_applicable): New function.
> (predicate_bbs): Add convert_bool argument that is used to transform
> comparison expressions of boolean type into conditional expressions
> with integral operands. If bool_conv argument is false or both
> outgoing edges are not critical old algorithm of predicate assignments
> is used, otherwise the following code was added: check on applicable
> of vect-bool-pattern recognition and trnasformation of
> (bool) x != 0  --> y = (int) x; x != 0;
> compute predicates for both outgoing edges one of which is critical
> one using 'normal' edge, i.e. compute true and false predicates using
> normal outgoing edge only; evaluated predicates are stored in
> predicate and negate_predicate fields of struct bb_predicate_s and
> negate_predicate of normal edge conatins predicate of critical edge,
> but generated gimplified statements are stored in their destination
> block fields. Additional argument 'convert_bool" is passed to
> add_to_dst_predicate_list and add_to_predicate_list.
> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
> equal to false.
> (find_phi_replacement_condition): Extend function interface:
> it returns NULL if given phi node must be handled by means of
> extended phi node predication. If number of predecessors of phi-block
> is equal 2 and atleast one incoming edge is not critical original
> algorithm is used.
> (is_cond_scalar_reduction): Add 'extended' argument which signals that
> both phi arguments must be evaluated through phi_has_two_different_args.
> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
> (get_predicate_for_edge): New function.
> (find_insertion_point): New function.
> (predicate_phi_disjoint_args): New function.
> (predicate_extended_scalar_phi): New function.
> (predicate_all_scalar_phis): Add code to set-up gimple statement
> iterator for predication of extended scalar phi's for insertion.
> (insert_gimplified_predicates): Add test for non-predicated basic
> blocks that there are no gimplified statements to insert. Insert
> predicates at the block begining for extended if-conversion.
> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
> predication to build mask.
> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
> (split_crit_edge): New function.
> (tree_if_conversion): Initialize flag_force_vectorize from current
> loop or outer loop (to support pragma omp declare). Invoke
> split_crit_edge for extended predication. Do loop versioning for
> innermost loop marked with pragma omp simd.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Extended if-conversion for loops marked with pragma omp simd.
  2014-08-01  9:40 ` Richard Biener
@ 2014-08-15 12:02   ` Yuri Rumyantsev
  2014-09-08 11:03     ` Yuri Rumyantsev
  2014-09-08 13:10     ` Richard Biener
  0 siblings, 2 replies; 9+ messages in thread
From: Yuri Rumyantsev @ 2014-08-15 12:02 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 13418 bytes --]

Richard!
Here is updated patch with the following changes:

1. Any restrictions on phi-function were eliminated for extended conversion.
2.  Put predicate for critical edges to 'aux' field of edge, i.e.
negate_predicate was deleted.
3. Deleted splitting of critical edges, i.e. both outgoing edges can
be critical.
4. Use notion of cd-equivalence to set-up predicate for join basic
blocks to simplify it.
5. I decided to not design pre-pass since it will lead generating
chain of cond expressions for phi-node if conversion, whereas for phi
of kind
  x = PHI <1(2), 1(3), 2(4)>
only one cond expression is required and this is considered as simple
optimization for arbitrary phi-function. More precise,
if phi-function have only two different arguments and one of them has
single occurrence, if- conversion is performed as if phi have only 2
arguments.
For arbitrary phi function a chain of cond expressions is produced.

Updated patch is attached.

Any comments will be appreciated.

2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>

* tree-if-conv.c (cgraph.h): Add include file to detect function clone.
(flag_force_vectorize): New variable.
(edge_predicate): New function.
(set_edge_predicate): New function.
(add_stmt_to_bb_predicate_gimplified_stmts): New function.
(init_bb_predicate): Add initialization of negate_predicate field.
(reset_bb_predicate): Reset negate_predicate to NULL_TREE.
(convert_name_to_cmp): New function.
(get_type_for_cond): New function.
(convert_bool_predicate): New function.
(predicate_disjunction): New function.
(predicate_conjunction): New function.
(add_to_predicate_list): Add convert_bool argument.
Use predicate of cd-equivalent block if convert_bool is true and
such bb exists; save it in static variable for further possible use.
Add call of predicate_disjunction if convert_bool argument is true.
(add_to_dst_predicate_list): Add convert_bool argument.
Add early function exit if edge target block is always executed.
Add call of predicate_conjunction if convert_bool argument is true.
Pass convert_bool argument for add_to_predicate_list.
Set-up predicate for crritical edge if convert_bool is true.
(equal_phi_args): New function.
(phi_has_two_different_args): New function.
(if_convertible_phi_p): Accept phi nodes with more than two args
if flag_force_vectorize wa set-up.
(ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
(if_convertible_stmt_p): Allow calls of function clones if
flag_force_vectorize was set-up.
(all_edges_are_critical): New function.
(if_convertible_bb_p): Allow bb has more than two predecessors if
flag_force_vectorize was set-up. Use call of all_edges_are_critical
to reject block if-conversion with imcoming critical edges only if
flag_force_vectorize was not set-up.
(walk_cond_tree): New function.
(vect_bool_pattern_is_applicable): New function.
(predicate_bbs): Add convert_bool argument which is used to transform
comparison expressions of boolean type into conditional expressions
with integral operands. If convert_bool argument was set-up and
vect bool pattern can be appied perform the following transformation:
(bool) x != 0  --> y = (int) x; x != 0;
Add check that if fold_build2 produces bool conversion if convert_bool
was set-up, recompute predicate using build2_loc. Additional argument
'convert_bool" is passed to add_to_dst_predicate_list and
add_to_predicate_list.
(if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
flag_force_vectorize was set-up to calculate cd equivalent bb's.
Call predicate_bbs with additional argument equal to false.
(find_phi_replacement_condition): Extend function interface:
it returns NULL if given phi node must be handled by means of
extended phi node predication. If number of predecessors of phi-block
is equal 2 and atleast one incoming edge is not critical original
algorithm is used.
(is_cond_scalar_reduction): Add 'extended' argument which signals that
phi arguments must be evaluated through phi_has_two_different_args.
(predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
(get_predicate_for_edge): New function.
(find_insertion_point): New function.
(predicate_arbitrary_phi): New function.
(predicate_extended_scalar_phi): New function.
(predicate_all_scalar_phis): Add code to set-up gimple statement
iterator for predication of extended scalar phi's for insertion.
(insert_gimplified_predicates): Add test for non-predicated basic
blocks that there are no gimplified statements to insert. Insert
predicates at the block begining for extended if-conversion.
(predicate_mem_writes): Invoke convert_name_to_cmp for extended
predication to build mask.
(combine_blocks): Pass flag_force_vectorize to predicate_bbs.
(tree_if_conversion): Initialize flag_force_vectorize from current
loop or outer loop (to support pragma omp declare).Do loop versioning
for innermost loop marked with pragma omp simd.

2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Hi All,
>>
>> We implemented additional support for pragma omp simd in part of
>> extended if-conversion loops with such pragma. These extensions
>> include:
>>
>> 1. All extensions are performed only if considered loop or its outer
>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>    loops behavior was not changed.
>> 2. Took off cfg restriction on basic block which can have more than 2
>>    predecessors.
>> 3. Put additional restriction on phi nodes which was missed in current design:
>>    all phi nodes must be in non-predicated basic block to conform
>>    semantic of COND_EXPR which is used for transformation.
>
> How is that so?  If the PHI is predicated then its result will be used
> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>
> No?
>
>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>> with some limitations:
>>    - for phi nodes which have more than 2 arguments, but only two
>>    arguments are different and one of them has the only occurence,
>> transformation to  single COND_EXPR can be done.
>>    - if phi node has more different arguments and all edge predicates
>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>    will be generated for it. In current design very simple check is used:
>>    check starting from end that two edges correspondent to neighbor
>> arguments have common predecessor which is used for further check
>> with next edge.
>>  These guarantee that phi predication will produce the correct result.
>
> Btw, you can think of these extensions as unfactoring a PHI node by
> inserting forwarder blocks.  Thus
>
>    x = PHI <1(2), 1(3), 2(4)>
>
> becomes
>
>   bb 5: <forwarder-from(2)-and(3)>
>
>   x = PHI <1(5), 2(4)>
>
> and
>
>   x = PHI <1(2), 2(3), 3(4)>
>
> becomes
>
>   bb 5:
>   x' = PHI <1(2), 2(3)>
>
>   b = PHI<x'(5), 3(4)>
>
> which means that 3) has to work.  Note that we want this kind of
> PHI transforms for out-of-SSA as well to reduce the number of
> copies we need to insert on edges.
>
> Thus it would be nice if you implemented 4) in terms of a pre-pass
> over the force_vect loops PHI nodes, applying that CFG transform.
> And make 3) work properly if it doesn't already.
>
> It looks like you introduce a "negate predicate" to work around the
> critical edge limitation?  Please instead change if-conversion to
> work with edge predicates (as opposed to BB predicates).
>
> Thanks,
> Richard.
>
>>
>> Here is example of such extended predication (compile with -march=core-avx2):
>> #pragma omp simd safelen(8)
>>   for (i=0; i<512; i++)
>>   {
>>     float t = a[i];
>>     if (t > 0 & t < 1.0e+17f)
>>       if (c[i] != 0)
>> res += 1;
>>   }
>>   <bb 4>:
>>   # res_15 = PHI <res_1(5), 0(3)>
>>   # i_16 = PHI <i_11(5), 0(3)>
>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>   t_5 = a[i_16];
>>   _6 = t_5 > 0.0;
>>   _7 = t_5 < 9.9999998430674944e+16;
>>   _8 = _7 & _6;
>>   _ifc__28 = (unsigned int) _8;
>>   _10 = &c[i_16];
>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>   _ifc__30 = (int) _ifc__29;
>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>   _ifc__33 = (int) _ifc__32;
>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>   res_1 = res_15 + _ifc__35;
>>   i_11 = i_16 + 1;
>>   ivtmp_14 = ivtmp_17 - 1;
>>   if (ivtmp_14 != 0)
>>     goto <bb 4>;
>>
>> Bootstrap and regression testing did not show any new failures.
>>
>> gcc/ChageLog
>>
>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> * tree-if-conv.c (flag_force_vectorize): New variable.
>> (struct bb_predicate_s): Add negate_predicate field.
>> (bb_negate_predicate): New function.
>> (set_bb_negate_predicate): New function.
>> (bb_copy_predicate): New function.
>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>> (init_bb_predicate): Add initialization of negate_predicate field.
>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>> (convert_name_to_cmp): New function.
>> (get_type_for_cond): New function.
>> (convert_bool_predicate): New function.
>> (predicate_disjunction): New function.
>> (predicate_conjunction): New function.
>> (add_to_predicate_list): Add convert_bool argument.
>> Add call of predicate_disjunction if convert_bool argument is true.
>> (add_to_dst_predicate_list): Add convert_bool argument.
>> Add early function exit if edge target block is always executed.
>> Add call of predicate_conjunction if convert_bool argument is true.
>> Pass convert_bool argument for add_to_predicate_list.
>> (equal_phi_args): New function.
>> (phi_has_two_different_args): New function.
>> (phi_args_disjoint): New function.
>> (if_convertible_phi_p): Accept phi nodes with more than two args
>> for loops marked with pragma omp simd. Add check that phi nodes are
>> in non-predicated basic blocks.
>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>> (all_edges_are_critical): New function.
>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>> to reject block if-conversion with imcoming critical edges only if
>> flag_force_vectorize was not setup.
>> (walk_cond_tree): New function.
>> (vect_bool_pattern_is_applicable): New function.
>> (predicate_bbs): Add convert_bool argument that is used to transform
>> comparison expressions of boolean type into conditional expressions
>> with integral operands. If bool_conv argument is false or both
>> outgoing edges are not critical old algorithm of predicate assignments
>> is used, otherwise the following code was added: check on applicable
>> of vect-bool-pattern recognition and trnasformation of
>> (bool) x != 0  --> y = (int) x; x != 0;
>> compute predicates for both outgoing edges one of which is critical
>> one using 'normal' edge, i.e. compute true and false predicates using
>> normal outgoing edge only; evaluated predicates are stored in
>> predicate and negate_predicate fields of struct bb_predicate_s and
>> negate_predicate of normal edge conatins predicate of critical edge,
>> but generated gimplified statements are stored in their destination
>> block fields. Additional argument 'convert_bool" is passed to
>> add_to_dst_predicate_list and add_to_predicate_list.
>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>> equal to false.
>> (find_phi_replacement_condition): Extend function interface:
>> it returns NULL if given phi node must be handled by means of
>> extended phi node predication. If number of predecessors of phi-block
>> is equal 2 and atleast one incoming edge is not critical original
>> algorithm is used.
>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>> both phi arguments must be evaluated through phi_has_two_different_args.
>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>> (get_predicate_for_edge): New function.
>> (find_insertion_point): New function.
>> (predicate_phi_disjoint_args): New function.
>> (predicate_extended_scalar_phi): New function.
>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>> iterator for predication of extended scalar phi's for insertion.
>> (insert_gimplified_predicates): Add test for non-predicated basic
>> blocks that there are no gimplified statements to insert. Insert
>> predicates at the block begining for extended if-conversion.
>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>> predication to build mask.
>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>> (split_crit_edge): New function.
>> (tree_if_conversion): Initialize flag_force_vectorize from current
>> loop or outer loop (to support pragma omp declare). Invoke
>> split_crit_edge for extended predication. Do loop versioning for
>> innermost loop marked with pragma omp simd.

[-- Attachment #2: patch.1 --]
[-- Type: application/octet-stream, Size: 40960 bytes --]

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 1f8ef03..b5ab12d
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -116,10 +116,14 @@ along with GCC; see the file COPYING3.  If not see
 #include "dbgcnt.h"
 #include "expr.h"
 #include "optabs.h"
+#include "cgraph.h"
 
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
 
+/* Copy of 'force_vectorize' field of loop.  */
+static bool flag_force_vectorize;
+
 /* Structure used to predicate basic blocks.  This is attached to the
    ->aux field of the BBs in the loop to be if-converted.  */
 typedef struct bb_predicate_s {
@@ -149,6 +153,16 @@ bb_predicate (basic_block bb)
   return ((bb_predicate_p) bb->aux)->predicate;
 }
 
+/* Returns predicate for critical edge E.  */
+
+static inline tree
+edge_predicate (edge e)
+{
+  gcc_assert (EDGE_COUNT (e->dest->preds) >= 2);
+  gcc_assert (e->aux != NULL);
+  return (tree) e->aux;
+}
+
 /* Sets the gimplified predicate COND for basic block BB.  */
 
 static inline void
@@ -160,6 +174,16 @@ set_bb_predicate (basic_block bb, tree cond)
   ((bb_predicate_p) bb->aux)->predicate = cond;
 }
 
+/* Sets predicate COND for critical edge E.  */
+
+static inline void
+set_edge_predicate (edge e, tree cond)
+{
+  gcc_assert (EDGE_COUNT (e->dest->preds) >= 2);
+  gcc_assert (cond != NULL_TREE);
+  e->aux = cond;
+}
+
 /* Returns the sequence of statements of the gimplification of the
    predicate for basic block BB.  */
 
@@ -188,6 +212,16 @@ add_bb_predicate_gimplified_stmts (basic_block bb, gimple_seq stmts)
     (&(((bb_predicate_p) bb->aux)->predicate_gimplified_stmts), stmts);
 }
 
+/* Adds statement STMT to the sequence of statements
+   of the predicate for basic block BB.  */
+
+static inline void
+add_stmt_to_bb_predicate_gimplified_stmts (basic_block bb, gimple stmt)
+{
+  gimple_seq_add_stmt
+    (&(((bb_predicate_p) bb->aux)->predicate_gimplified_stmts), stmt);
+}
+
 /* Initializes to TRUE the predicate of basic block BB.  */
 
 static inline void
@@ -395,26 +429,231 @@ fold_build_cond_expr (tree type, tree cond, tree rhs, tree lhs)
   return build3 (COND_EXPR, type, cond, rhs, lhs);
 }
 
+/* Build <name> != 0 expression when COND is SSA_NAME of int type.  */
+
+static inline tree
+convert_name_to_cmp (tree cond)
+{
+  if (TREE_CODE (cond) != SSA_NAME)
+    return cond;
+  return build2 (NE_EXPR, boolean_type_node, cond,
+		 build_int_cst (TREE_TYPE (cond), 0));
+}
+
+/* Return integral type correspondent to types of condition COND.  */
+
+static inline tree
+get_type_for_cond (tree cond)
+{
+  tree opnd;
+  enum machine_mode mode;
+
+  gcc_assert (TREE_CODE_CLASS (TREE_CODE (cond)) == tcc_comparison);
+  opnd = TREE_OPERAND (cond, 0);
+  if (TREE_CODE (opnd) != SSA_NAME)
+    opnd = TREE_OPERAND (cond, 1);
+  if (TREE_CODE (TREE_TYPE (opnd)) == INTEGER_TYPE)
+    return TREE_TYPE (opnd);
+  mode = TYPE_MODE (TREE_TYPE (opnd));
+  return build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), 1);
+}
+
+/* Converts bool predicate COND to cond_expr:
+   cond1 = (cond)? 1: 0,  if OP is NULL_TREE, or
+   cond1 = (cond)? op : 0 otherwise.
+   Returns lhs of created assignment.  */
+
+static tree
+convert_bool_predicate (tree cond, basic_block bb, tree op)
+{
+  gimple stmt;
+  tree lhs;
+  tree itype;
+
+  if (TREE_CODE (TREE_TYPE (cond)) != BOOLEAN_TYPE)
+    /* Predicate has been promoted to int.  */
+    return cond;
+  if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+    {
+      tree c1 = TREE_OPERAND (cond, 0);
+
+      if (TREE_CODE (c1) == SSA_NAME)
+	cond = build2 (EQ_EXPR, boolean_type_node, c1,
+		       build_int_cst (boolean_type_node, 0));
+      else
+	{
+	  tree type = TREE_TYPE (TREE_OPERAND (c1, 0));
+          enum tree_code code;
+	  gcc_assert (TREE_CODE_CLASS (TREE_CODE (c1)) == tcc_comparison);
+	  code = invert_tree_comparison (TREE_CODE (c1),
+					 HONOR_NANS (TYPE_MODE (type)));
+	  cond = build2 (code, boolean_type_node, TREE_OPERAND (c1, 0),
+			 TREE_OPERAND (c1, 1));
+	}
+    }
+  else
+    cond = convert_name_to_cmp (cond);
+
+  itype = get_type_for_cond (cond);
+  if (op != NULL_TREE && !types_compatible_p (itype, TREE_TYPE (op)))
+    {
+      tree  new_temp = make_temp_ssa_name (itype, NULL, "_ifc_");
+      stmt = gimple_build_assign_with_ops (NOP_EXPR, new_temp, op, NULL_TREE);
+      add_stmt_to_bb_predicate_gimplified_stmts (bb, stmt);
+      op = new_temp;
+    }
+  stmt = gimple_build_assign_with_ops
+	   (COND_EXPR,
+	    (lhs = make_temp_ssa_name (itype, NULL, "_ifc_")),
+	    cond,
+	    op == NULL_TREE ? build_one_cst (itype) : op,
+	    build_zero_cst (itype));
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Convert bool predicate: new stmt is created\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+    }
+  add_stmt_to_bb_predicate_gimplified_stmts (bb, stmt);
+  return lhs;
+}
+
+/* Creates new BB predicate PRD = PRD1 | PRD2, where PRD1 is old BB predicate
+   converted to int type and PRD2 is NC converted to int.  */
+
+static void
+predicate_disjunction (basic_block bb, tree nc)
+{
+  tree p1, p2;
+  gimple stmt;
+  tree lhs;
+  tree itype;
+
+  gcc_assert (flag_force_vectorize);
+  p1 = convert_bool_predicate (bb_predicate (bb), bb, NULL_TREE);
+  p2 = convert_bool_predicate (nc, bb, NULL_TREE);
+  if (!types_compatible_p (TREE_TYPE (p1), TREE_TYPE (p2)))
+    {
+      if (TYPE_PRECISION (TREE_TYPE (p1)) < TYPE_PRECISION (TREE_TYPE (p2)))
+	{
+	  itype = TREE_TYPE (p1);
+	  tree tmp = make_temp_ssa_name (itype, NULL, "_ifc_");
+	  stmt = gimple_build_assign_with_ops (NOP_EXPR, tmp, p2, NULL_TREE);
+	  p2 = tmp;
+	}
+      else
+	{
+	  itype = TREE_TYPE (p2);
+	  tree tmp = make_temp_ssa_name (itype, NULL, "_ifc_");
+          stmt = gimple_build_assign_with_ops (NOP_EXPR, tmp, p1, NULL_TREE);
+	  p1 = tmp;
+	}
+      add_stmt_to_bb_predicate_gimplified_stmts (bb, stmt);
+    }
+  else
+    itype = TREE_TYPE (p1);
+  lhs = make_temp_ssa_name (itype, NULL, "_ifc_");
+  stmt = gimple_build_assign_with_ops (BIT_IOR_EXPR, lhs, p1, p2);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Create BIT IOR stmt\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+    }
+  add_stmt_to_bb_predicate_gimplified_stmts (bb, stmt);
+  set_bb_predicate (bb, lhs);
+}
+
+/* Returns new predicate PRD = PRD1 & PRD2, which are converted to int.  */
+
+static tree
+predicate_conjunction (basic_block bb, tree prd1, tree prd2)
+{
+  tree p1, p2;
+  gimple stmt;
+  tree itype;
+
+  gcc_assert (flag_force_vectorize);
+  p1 = convert_bool_predicate (prd1, bb, NULL_TREE);
+  /* Optimize p1 & (prd2? 1 : 0) into (prd2)? p1 : 0.  */
+  p2 = convert_bool_predicate (prd2, bb, p1);
+  if (p2 == prd2)
+    {
+      /* Need to create explicit AND stmt.  */
+      itype = TREE_TYPE (p1);
+      if (!types_compatible_p (itype, TREE_TYPE (p2)))
+	{
+	  if (TYPE_PRECISION (itype) < TYPE_PRECISION (TREE_TYPE (p2)))
+	    {
+	      tree tmp = make_temp_ssa_name (itype, NULL, "_ifc_");
+	      stmt = gimple_build_assign_with_ops (NOP_EXPR, tmp,
+						   p2, NULL_TREE);
+	      p2 = tmp;
+	    }
+	  else
+	    {
+	      itype = TREE_TYPE (p2);
+	      tree tmp = make_temp_ssa_name (itype, NULL, "_ifc_");
+              stmt = gimple_build_assign_with_ops (NOP_EXPR, tmp,
+						   p1, NULL_TREE);
+	      p1 = tmp;
+	    }
+	  add_stmt_to_bb_predicate_gimplified_stmts (bb, stmt);
+
+	}
+      tree lhs = make_temp_ssa_name (itype, NULL, "_ifc_");
+      stmt = gimple_build_assign_with_ops (BIT_AND_EXPR, lhs, p1, p2);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Create BIT AND stmt.\n");
+	  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+	}
+      add_stmt_to_bb_predicate_gimplified_stmts (bb, stmt);
+      return lhs;
+    }
+  return p2;
+}
+
 /* Add condition NC to the predicate list of basic block BB.  LOOP is
    the loop to be if-converted.  */
 
-static inline void
-add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
+static void
+add_to_predicate_list (struct loop *loop, basic_block bb,
+		       tree nc, bool convert_bool)
 {
   tree bc, *tp;
+  basic_block dom_bb;
+  static basic_block join_bb = NULL;
 
   if (is_true_predicate (nc))
     return;
 
-  if (!is_predicated (bb))
+  /* If dominance tells us this basic block is always executed,
+     don't record any predicates for it.  */
+  if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+    return;
+
+  if (convert_bool)
     {
-      /* If dominance tells us this basic block is always executed, don't
-	 record any predicates for it.  */
-      if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+      /* If predicate has been already set up for it using immediate
+	 dominator simply escape.  */
+      if (join_bb == bb)
 	return;
-
-      bc = nc;
+      dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+      /* We use notion of cd equivalence.  */
+      if (dom_bb != loop->header
+	  && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
+	{
+	  gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
+	  bc = bb_predicate (dom_bb);
+	  gcc_assert (!is_true_predicate (bc));
+	  set_bb_predicate (bb, bc);
+ 
+	  /* Save bb in join_bb to not handle it once more.  */
+	  join_bb = bb;
+	  return;
+	}
     }
+  if (!is_predicated (bb))
+    bc = nc;
   else
     {
       bc = bb_predicate (bb);
@@ -424,6 +663,13 @@ add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
 	  reset_bb_predicate (bb);
 	  return;
 	}
+      /* If CONVERT_BOOL is true create new predicate which is disjunction of
+	 old BB predicate and NC.  */
+      if (convert_bool)
+	{
+	  predicate_disjunction (bb, nc);
+	  return;
+	}
     }
 
   /* Allow a TRUTH_NOT_EXPR around the main predicate.  */
@@ -446,19 +692,31 @@ add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
 
 static void
 add_to_dst_predicate_list (struct loop *loop, edge e,
-			   tree prev_cond, tree cond)
+			   tree prev_cond, tree cond,
+			   bool convert_bool)
 {
   if (!flow_bb_inside_loop_p (loop, e->dest))
     return;
-
   if (!is_true_predicate (prev_cond))
-    cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
-			prev_cond, cond);
+    {
+      /* If CONVERT_BOOL is true new predicate is created
+	 PRD = PRD_1 & PRD_2 where rhs predicates are converted
+	 to conditional expressions.  */
+      if (convert_bool)
+	cond = predicate_conjunction (e->dest, prev_cond, cond);
+      else
+	cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
+			    prev_cond, cond);
+    }
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
+    add_to_predicate_list (loop, e->dest, cond, convert_bool);
 
-  add_to_predicate_list (loop, e->dest, cond);
+  /* If edge E is critical save predicate on it.  */
+  if (convert_bool && EDGE_COUNT (e->dest->preds) >= 2)
+    set_edge_predicate (e, cond);
 }
 
-/* Return true if one of the successor edges of BB exits LOOP.  */
+/* Returns true if one of the successor edges of BB exits LOOP.  */
 
 static bool
 bb_with_exit_edge_p (struct loop *loop, basic_block bb)
@@ -473,6 +731,78 @@ bb_with_exit_edge_p (struct loop *loop, basic_block bb)
   return false;
 }
 
+/* Returns true if both arguments of phi node are equal.  */
+
+static inline bool
+equal_phi_args (tree c1, tree c2)
+{
+  if (TREE_CODE (c1) != TREE_CODE (c2))
+    return false;
+  if (TREE_CODE (c1) == SSA_NAME)
+    return c1 == c2;
+  return (operand_equal_p (c1, c2, 0) != 0);
+}
+
+/* Returns true if phi arguments are equal except for one; argument values and
+   index of exclusive argument are saved if needed.  */
+
+static bool
+phi_has_two_different_args (gimple phi, tree *arg_0, tree *arg_1,
+			    unsigned int *index)
+{
+  unsigned int i, ind0 = 0, ind1;
+  tree arg0, arg1 = NULL_TREE;
+  bool seen_same = false;
+
+  arg0 = gimple_phi_arg_def (phi, 0);
+  for (i = 1; i < gimple_phi_num_args (phi); i++)
+    {
+      tree tmp;
+      tmp = gimple_phi_arg_def (phi, i);
+      if (arg0 == NULL_TREE
+	  && !equal_phi_args (tmp, arg1))
+	{
+	  arg0 = tmp;
+	  ind0 = i;
+	}
+      else if (seen_same && equal_phi_args (tmp, arg1))
+	continue;
+      else if (!equal_phi_args (tmp, arg0))
+	{
+	  if (arg1 == NULL_TREE)
+	    {
+	      arg1 = tmp;
+	      ind1 = i;
+	    }
+	  else if (!equal_phi_args (tmp, arg1))
+	    return false;
+	  else
+	    seen_same = true;
+	}
+      else if (!seen_same)
+	{
+	  /* Swap arguments.  */
+	  seen_same = true;
+	  arg0 = arg1;
+	  arg1 = tmp;
+	  ind0 = ind1;
+	}
+      else
+	return false;
+    }
+  if (arg0 == NULL_TREE)
+    return false;
+
+  if (arg_0)
+    *arg_0 = arg0;
+  if (arg_1)
+    *arg_1 = arg1;
+  if (index)
+    *index = ind0;
+
+  return true;
+}
+
 /* Return true when PHI is if-convertible.  PHI is part of loop LOOP
    and it belongs to basic block BB.
 
@@ -482,7 +812,9 @@ bb_with_exit_edge_p (struct loop *loop, basic_block bb)
    When the flag_tree_loop_if_convert_stores is not set, PHI is not
    if-convertible if:
    - a virtual PHI is immediately used in another PHI node,
-   - there is a virtual PHI in a BB other than the loop->header.  */
+   - there is a virtual PHI in a BB other than the loop->header.
+   When the flag_force_vectorize is set, PHI can have more than
+   two arguments.  */   
 
 static bool
 if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi,
@@ -494,11 +826,18 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi,
       print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
     }
 
-  if (bb != loop->header && gimple_phi_num_args (phi) != 2)
+  if (bb != loop->header)
     {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "More than two phi node args.\n");
-      return false;
+      if (gimple_phi_num_args (phi) != 2)
+	{
+	  if (!flag_force_vectorize)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "More than two phi node args.\n");
+	      return false;
+	    }
+
+        }
     }
 
   if (flag_tree_loop_if_convert_stores || any_mask_load_store)
@@ -728,7 +1067,7 @@ ifcvt_can_use_mask_load_store (gimple stmt)
   basic_block bb = gimple_bb (stmt);
   bool is_load;
 
-  if (!(flag_tree_loop_vectorize || bb->loop_father->force_vectorize)
+  if (!(flag_tree_loop_vectorize || flag_force_vectorize)
       || bb->loop_father->dont_vectorize
       || !gimple_assign_single_p (stmt)
       || gimple_has_volatile_ops (stmt))
@@ -865,7 +1204,9 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
 
    A statement is if-convertible if:
    - it is an if-convertible GIMPLE_ASSIGN,
-   - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
+   - it is a GIMPLE_LABEL or a GIMPLE_COND,
+   - it is intrinsic call or call of function marked with
+     pragma omp declare simd.  */
 
 static bool
 if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
@@ -894,6 +1235,13 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
 		   so restrict if-conversion to those.  */
 		&& DECL_BUILT_IN (fndecl))
 	      return true;
+	    if (flag_force_vectorize)
+	      {
+		struct cgraph_node *node = cgraph_node::get (fndecl);
+		if (node != NULL && node->simd_clones != NULL)
+		/* Accept #pragma omp declare simd functions.  */
+		  return true;
+	      }
 	  }
 	return false;
       }
@@ -912,6 +1260,22 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
   return true;
 }
 
+/* Assumes that BB has more than 2 predecessors.
+   Returns false if at least one successor is not on critical edge
+   and true otherwise.  */
+
+static inline bool
+all_edges_are_critical (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (EDGE_COUNT (e->src->succs) == 1)
+      return false;
+  return true;
+}
+
 /* Return true when BB is if-convertible.  This routine does not check
    basic block's statements and phis.
 
@@ -920,6 +1284,8 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
    - it is after the exit block but before the latch,
    - its edges are not normal.
 
+   Last restriction is not applicable for loops marked with simd pragma.
+
    EXIT_BB is the basic block containing the exit of the LOOP.  BB is
    inside LOOP.  */
 
@@ -930,11 +1296,15 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
   edge_iterator ei;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
-    fprintf (dump_file, "----------[%d]-------------\n", bb->index);
+    fprintf (dump_file, "----------[%d]-----[%d]--------\n", bb->index, EDGE_COUNT (bb->preds));
 
-  if (EDGE_COUNT (bb->preds) > 2
-      || EDGE_COUNT (bb->succs) > 2)
+  if (EDGE_COUNT (bb->succs) > 2)
     return false;
+  if (EDGE_COUNT (bb->preds) > 2)
+    {
+      if (!flag_force_vectorize)
+	return false;
+    }
 
   if (exit_bb)
     {
@@ -971,18 +1341,17 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
 
   /* At least one incoming edge has to be non-critical as otherwise edge
      predicates are not equal to basic-block predicates of the edge
-     source.  */
+     source. This restriction is not valid for loops marked with
+     simd pragma.  */
   if (EDGE_COUNT (bb->preds) > 1
       && bb != loop->header)
     {
-      bool found = false;
-      FOR_EACH_EDGE (e, ei, bb->preds)
-	if (EDGE_COUNT (e->src->succs) == 1)
-	  found = true;
-      if (!found)
+      if (!flag_force_vectorize && all_edges_are_critical (bb))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "only critical predecessors\n");
+	    fprintf (dump_file, "only critical predecessors in bb#%d\n",
+		      bb->index);
+
 	  return false;
 	}
     }
@@ -1064,6 +1433,88 @@ get_loop_body_in_if_conv_order (const struct loop *loop)
   return blocks;
 }
 
+/* Helper function of vect_bool_pattern_is_applicable. Called recursively.
+   Returns true if given pattern can be applied. Calculate and save min type
+   precision of comparison operands in PREC.  */
+
+static bool
+walk_cond_tree (tree var, int *prec)
+{
+  gimple def_stmt;
+  enum tree_code rhs_code;
+  tree rhs1;
+
+  if (TREE_CODE (var) != SSA_NAME)
+    return false;
+  def_stmt = SSA_NAME_DEF_STMT (var);
+  if (!is_gimple_assign (def_stmt))
+    return false;
+  rhs1 = gimple_assign_rhs1 (def_stmt);
+  rhs_code = gimple_assign_rhs_code (def_stmt);
+  switch (rhs_code)
+    {
+    case SSA_NAME:
+    case BIT_NOT_EXPR:
+      return walk_cond_tree (rhs1, prec);
+
+    CASE_CONVERT:
+      if ((TYPE_PRECISION (TREE_TYPE (rhs1)) != 1
+	   || !TYPE_UNSIGNED (TREE_TYPE (rhs1)))
+	  && TREE_CODE (TREE_TYPE (rhs1)) != BOOLEAN_TYPE)
+	return false;
+      return walk_cond_tree (rhs1, prec);
+
+    case BIT_AND_EXPR:
+    case BIT_IOR_EXPR:
+    case BIT_XOR_EXPR:
+      if (!walk_cond_tree (rhs1, prec))
+	return false;
+      return walk_cond_tree (gimple_assign_rhs2 (def_stmt), prec);
+
+    default:
+      if (TREE_CODE_CLASS (rhs_code) == tcc_comparison)
+	{
+	  if (TREE_CODE (TREE_TYPE (rhs1)) != INTEGER_TYPE)
+	    {
+	      enum machine_mode mode = TYPE_MODE (TREE_TYPE (rhs1));
+	      *prec = MIN (*prec, GET_MODE_BITSIZE (mode));
+	    }
+	  else
+	    *prec = MIN (*prec, TYPE_PRECISION (TREE_TYPE (rhs1)));
+	  return true;
+	}
+      return false;
+    }
+}
+
+/* Returns true if condition in STMT is presented by
+   name != false and name has boolean type.
+   Assumes that STMT is GIMPLE_COND and its condition is presented
+   by conjunction/disjunction of comparisons - walk_cond_tree is called
+   to check it. Later gimple condition OP0 will be promoted into int
+   type with precision 'prec' for vect_bool_pattern recognition.  */
+
+static inline bool
+vect_bool_pattern_is_applicable (gimple stmt, int *prec)
+{
+  tree op0, op1;
+  enum tree_code code;
+
+  op0 = gimple_cond_lhs (stmt);
+  op1 = gimple_cond_rhs (stmt);
+  code = gimple_cond_code (stmt);
+
+  if (TREE_CODE (TREE_TYPE (op0)) != BOOLEAN_TYPE)
+    return false;
+  if (TREE_CODE_CLASS (code) != tcc_comparison)
+    return false;
+  if (!integer_zerop (op1))
+    return false;
+  /* Init prec to max value.  */
+  *prec = 1024;
+  return walk_cond_tree (op0, prec);
+}
+
 /* Returns true when the analysis of the predicates for all the basic
    blocks in LOOP succeeded.
 
@@ -1080,10 +1531,14 @@ get_loop_body_in_if_conv_order (const struct loop *loop)
    |   S2;
 
    S1 will be predicated with "x", and
-   S2 will be predicated with "!x".  */
+   S2 will be predicated with "!x".
+
+   CONVERT_BOOL argument was added to convert bool predicate computations
+   which is not supported by vectorizer to int type through creating of
+   conditional expressions.  */
 
 static void
-predicate_bbs (loop_p loop)
+predicate_bbs (loop_p loop, bool convert_bool)
 {
   unsigned int i;
 
@@ -1096,9 +1551,10 @@ predicate_bbs (loop_p loop)
       tree cond;
       gimple stmt;
 
-      /* The loop latch is always executed and has no extra conditions
-	 to be processed: skip it.  */
-      if (bb == loop->latch)
+      /* The loop latch and loop exit block are always executed and
+	 have no extra conditions to be processed: skip them.  */
+      if (bb == loop->latch
+	  || bb_with_exit_edge_p (loop, bb))
 	{
 	  reset_bb_predicate (loop->latch);
 	  continue;
@@ -1108,27 +1564,63 @@ predicate_bbs (loop_p loop)
       stmt = last_stmt (bb);
       if (stmt && gimple_code (stmt) == GIMPLE_COND)
 	{
-	  tree c2;
+	  tree c, c2;
 	  edge true_edge, false_edge;
 	  location_t loc = gimple_location (stmt);
-	  tree c = fold_build2_loc (loc, gimple_cond_code (stmt),
-				    boolean_type_node,
-				    gimple_cond_lhs (stmt),
-				    gimple_cond_rhs (stmt));
-
-	  /* Add new condition into destination's predicate list.  */
-	  extract_true_false_edges_from_block (gimple_bb (stmt),
-					       &true_edge, &false_edge);
+	  tree lopnd = gimple_cond_lhs (stmt);
+	  enum tree_code code = gimple_cond_code (stmt);
+	  int prec;
 
+	  /* Compute predicates for true and false edges.  */
+	  if (!(convert_bool
+		&& vect_bool_pattern_is_applicable (stmt, &prec)))
+	    {
+	      c = fold_build2_loc (loc, code,
+				   boolean_type_node,
+				   lopnd,
+				   gimple_cond_rhs (stmt));
+	      /* Fold_build2 can produce bool conversion which is not
+                 supported by vectorizer, so re-build it without folding.  */
+	      if (convert_bool && CONVERT_EXPR_P (c)
+		  && TREE_CODE_CLASS (code) == tcc_comparison)
+		c = build2_loc (loc, code, boolean_type_node,
+				lopnd, gimple_cond_rhs (stmt));
+	      c2 = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node,
+			       unshare_expr (c));
+	    }
+	  else
+	    {
+	      /* Convert bool predicate to int - to apply vectorization
+		 bool pattern recognition.  */
+	      tree itype = build_nonstandard_integer_type (prec, 1);
+	      tree lhs = make_temp_ssa_name (itype, NULL, "_ifc_");
+	      enum tree_code code = gimple_cond_code (stmt);
+	      enum tree_code inv_code = invert_tree_comparison (code, false);
+	      /* Create convert expression.  */
+	      gimple conv = gimple_build_assign_with_ops
+			      (CONVERT_EXPR,
+			       lhs,
+			       lopnd,
+			       NULL_TREE);
+	      gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+	      update_stmt (conv);
+	      /* Insert new convert stmt before last stmt.  */
+	      gsi_insert_before (&gsi, conv, GSI_SAME_STMT);
+	      c = build2_loc (loc, code, boolean_type_node, lhs,
+			      build_zero_cst (itype));
+	      c2 = build2_loc (loc, inv_code, boolean_type_node, lhs,
+			       build_zero_cst (itype));
+	    }
+	  extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
+	  if (convert_bool)
+	    true_edge->aux = false_edge->aux = NULL;
 	  /* If C is true, then TRUE_EDGE is taken.  */
 	  add_to_dst_predicate_list (loop, true_edge, unshare_expr (cond),
-				     unshare_expr (c));
+				     unshare_expr (c), convert_bool);
 
 	  /* If C is false, then FALSE_EDGE is taken.  */
-	  c2 = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node,
-			   unshare_expr (c));
-	  add_to_dst_predicate_list (loop, false_edge,
-				     unshare_expr (cond), c2);
+	  add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
+				     unshare_expr (c2), convert_bool);
 
 	  cond = NULL_TREE;
 	}
@@ -1145,7 +1637,7 @@ predicate_bbs (loop_p loop)
 	  if (cond == NULL_TREE)
 	    cond = boolean_true_node;
 
-	  add_to_predicate_list (loop, bb_n, cond);
+	  add_to_predicate_list (loop, bb_n, cond, convert_bool);
 	}
     }
 
@@ -1176,6 +1668,8 @@ if_convertible_loop_p_1 (struct loop *loop,
     return false;
 
   calculate_dominance_info (CDI_DOMINATORS);
+  if (flag_force_vectorize)
+    calculate_dominance_info (CDI_POST_DOMINATORS);
 
   /* Allow statements that can be handled during if-conversion.  */
   ifc_bbs = get_loop_body_in_if_conv_order (loop);
@@ -1226,7 +1720,7 @@ if_convertible_loop_p_1 (struct loop *loop,
 	  DR_WRITTEN_AT_LEAST_ONCE (dr) = -1;
 	  DR_RW_UNCONDITIONALLY (dr) = -1;
 	}
-      predicate_bbs (loop);
+      predicate_bbs (loop, false);
     }
 
   for (i = 0; i < loop->num_nodes; i++)
@@ -1337,7 +1831,9 @@ if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
    replacement.  Return the true block whose phi arguments are
    selected when cond is true.  LOOP is the loop containing the
    if-converted region, GSI is the place to insert the code for the
-   if-conversion.  */
+   if-conversion.
+   Returns NULL if given phi node must be handled by means of extended
+   phi node predication.  */
 
 static basic_block
 find_phi_replacement_condition (basic_block bb, tree *cond,
@@ -1346,44 +1842,49 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
   edge first_edge, second_edge;
   tree tmp_cond;
 
-  gcc_assert (EDGE_COUNT (bb->preds) == 2);
-  first_edge = EDGE_PRED (bb, 0);
-  second_edge = EDGE_PRED (bb, 1);
-
-  /* Prefer an edge with a not negated predicate.
-     ???  That's a very weak cost model.  */
-  tmp_cond = bb_predicate (first_edge->src);
-  gcc_assert (tmp_cond);
-  if (TREE_CODE (tmp_cond) == TRUTH_NOT_EXPR)
+  if (EDGE_COUNT (bb->preds) == 2
+      && !all_edges_are_critical (bb))
     {
-      edge tmp_edge;
-
-      tmp_edge = first_edge;
-      first_edge = second_edge;
-      second_edge = tmp_edge;
-    }
+      first_edge = EDGE_PRED (bb, 0);
+      second_edge = EDGE_PRED (bb, 1);
+
+      /* Prefer an edge with a not negated predicate.
+         ???  That's a very weak cost model.  */
+      tmp_cond = bb_predicate (first_edge->src);
+      gcc_assert (tmp_cond);
+      if (TREE_CODE (tmp_cond) == TRUTH_NOT_EXPR)
+	{
+	  edge tmp_edge;
 
-  /* Check if the edge we take the condition from is not critical.
-     We know that at least one non-critical edge exists.  */
-  if (EDGE_COUNT (first_edge->src->succs) > 1)
-    {
-      *cond = bb_predicate (second_edge->src);
+	  tmp_edge = first_edge;
+	  first_edge = second_edge;
+	  second_edge = tmp_edge;
+	}
 
-      if (TREE_CODE (*cond) == TRUTH_NOT_EXPR)
-	*cond = TREE_OPERAND (*cond, 0);
+      /* Check if the edge we take the condition from is not critical.
+         We know that at least one non-critical edge exists.  */
+      if (EDGE_COUNT (first_edge->src->succs) > 1)
+	{
+	  *cond = bb_predicate (second_edge->src);
+	  gcc_assert (EDGE_COUNT (second_edge->src->succs) == 1);
+	  if (TREE_CODE (*cond) == TRUTH_NOT_EXPR)
+	    *cond = TREE_OPERAND (*cond, 0);
+	  else
+	    /* Select non loop header bb.  */
+	    first_edge = second_edge;
+	}
       else
-	/* Select non loop header bb.  */
-	first_edge = second_edge;
-    }
-  else
-    *cond = bb_predicate (first_edge->src);
+	*cond = bb_predicate (first_edge->src);
 
-  /* Gimplify the condition to a valid cond-expr conditonal operand.  */
-  *cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (*cond),
-				      is_gimple_condexpr, NULL_TREE,
-				      true, GSI_SAME_STMT);
+      /* Gimplify the condition to a valid cond-expr conditonal operand.  */
+      *cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (*cond),
+				          is_gimple_condexpr, NULL_TREE,
+				          true, GSI_SAME_STMT);
 
-  return first_edge->src;
+      return first_edge->src;
+    }
+  gcc_assert (flag_force_vectorize);
+  return NULL;
 }
 
 /* Returns true if def-stmt for phi argument ARG is simple increment/decrement
@@ -1400,7 +1901,7 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
 
 static bool
 is_cond_scalar_reduction (gimple phi, gimple *reduc,
-			  tree *op0, tree *op1)
+			  tree *op0, tree *op1, bool extended)
 {
   tree lhs, r_op1, r_op2;
   tree arg_0, arg_1;
@@ -1413,8 +1914,11 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
   imm_use_iterator imm_iter;
   use_operand_p use_p;
 
-  arg_0 = PHI_ARG_DEF (phi, 0);
-  arg_1 = PHI_ARG_DEF (phi, 1);
+  if (extended)
+    phi_has_two_different_args (phi, &arg_0, &arg_1, NULL);
+  else
+    arg_0 = PHI_ARG_DEF (phi, 0);
+    arg_1 = PHI_ARG_DEF (phi, 1);
   if (TREE_CODE (arg_0) != SSA_NAME || TREE_CODE (arg_1) != SSA_NAME)
     return false;
 
@@ -1578,7 +2082,7 @@ predicate_scalar_phi (gimple phi, tree cond,
     return;
 
   bb = gimple_bb (phi);
-
+  cond = convert_name_to_cmp (cond);
   if ((arg = degenerate_phi_result (phi))
       || ((scev = analyze_scalar_evolution (gimple_bb (phi)->loop_father,
 					    res))
@@ -1603,7 +2107,7 @@ predicate_scalar_phi (gimple phi, tree cond,
 	  arg_0 = gimple_phi_arg_def (phi, 0);
 	  arg_1 = gimple_phi_arg_def (phi, 1);
 	}
-      if (is_cond_scalar_reduction (phi, &reduc, &op0, &op1))
+      if (is_cond_scalar_reduction (phi, &reduc, &op0, &op1, false))
 	/* Convert reduction stmt into vectorizable form.  */
 	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
 					     true_bb != gimple_bb (reduc));
@@ -1624,6 +2128,219 @@ predicate_scalar_phi (gimple phi, tree cond,
     }
 }
 
+/* Returns predicate under which edge is taken.  */
+
+static tree
+get_predicate_for_edge (edge e)
+{
+  tree c;
+  basic_block b = e->src;
+
+  if (EDGE_COUNT (b->succs) == 1)
+    /* Use predicate of src basic block if it has the only successor.  */
+    c = bb_predicate (b);
+  else
+    /* Edge E is critical and its aux field contains predicate.  */
+    c = edge_predicate (e);
+
+  return convert_name_to_cmp (c);
+}
+
+/* Returns insertion point for predicated phi node:
+   We distinguish 3 different cases to preserve use-def chains:
+   - bb contains only stmt's computing predicates, returns value is NULL
+     and *BEFORE is false, must insert after last stmt;
+   - bb is empty, returns NULL and *BEFORE is true, must insert before
+     first non-label stmt;
+   - bb contains both predicate computations and original stmts, must
+     insert before first original stmt.  */
+
+static gimple
+find_insertion_point (basic_block bb, bool *before)
+{
+  gimple_stmt_iterator gsi;
+  gimple stmt = NULL;
+  tree lhs;
+  bool seen_temps = false;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      if (gimple_code (stmt) == GIMPLE_LABEL)
+	continue;
+      if (gimple_code (stmt) != GIMPLE_ASSIGN)
+	break;
+      lhs = gimple_assign_lhs (stmt);
+      if (TREE_CODE (lhs) != SSA_NAME)
+	break;
+      if (SSA_NAME_VAR (lhs) != NULL)
+	break;
+      lhs = SSA_NAME_IDENTIFIER (lhs);
+      if (!lhs)
+	break;
+      if (strncmp (IDENTIFIER_POINTER (lhs), "_ifc_", 5) == 0)
+	{
+	  seen_temps = true;
+	  continue;
+	}
+    }
+  if (gsi_end_p (gsi))
+    {
+      if (seen_temps)
+	/* Must insert after last stmt in bb.  */
+	*before = false;
+      else
+	/* BB is empty.  */
+	*before = true;
+      return NULL;
+    }
+
+  return stmt;
+}
+
+/* This is enhancement for predication of a phi node with arbitrary
+   number of arguments, i.e. for
+	x = phi (x_1, x_2, ..., x_k)
+   a chain of recurrent cond expressions will be produced.
+   For example,
+	bb_0
+	if (_5 != 0) goto bb_1 else goto bb_2
+	end_bb_0
+
+	bb_1
+	res_2 = some computations;
+	goto bb_5
+	end_bb_1
+
+	bb_2
+	if (_9 != 0) goto bb_3 else goto bb_4
+	end_bb_2
+
+	bb_3
+	res_3 = ...;
+	goto bb_5
+	end_bb_3
+
+	bb4
+	res_4 = ...;
+	end_bb_4
+
+	bb_5
+	# res_1 = PHI <res_2(1), res_3(3), res_4(4)>
+
+    will be if-converted into chain of unconditional assignments:
+	_ifc__42 = <PRD_3> ? res_3 : res_4;
+	res_1 = _5 != 0 ? res_2 : _ifc__42;
+
+    where <PRD_3> is predicate of <bb_3>.
+
+    All created intermediate statements are inserted at GSI point.
+    Returns cond expression correspondent to rhs of new phi
+    replacement stmt.  */
+
+static tree
+predicate_arbitrary_phi (gimple phi, gimple_stmt_iterator *gsi,
+			     bool before)
+{
+  int i;
+  int num = (int) gimple_phi_num_args (phi);
+  tree last = gimple_phi_arg_def (phi, num - 1);
+  tree type = TREE_TYPE (gimple_phi_result (phi));
+  tree curr;
+  gimple stmt;
+  tree lhs;
+  tree cond;
+
+  for (i = num - 2; i > 0; i--)
+    {
+      curr = gimple_phi_arg_def (phi, i);
+      lhs = make_temp_ssa_name (type, NULL, "_ifc_");
+      cond = get_predicate_for_edge (gimple_phi_arg_edge (phi, i));
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  cond = TREE_OPERAND (cond, 0);
+	  stmt = gimple_build_assign_with_ops (COND_EXPR, lhs,
+					       unshare_expr (cond),
+					       last, curr);
+	}
+      else
+	stmt = gimple_build_assign_with_ops (COND_EXPR, lhs,
+					     unshare_expr (cond), curr, last);
+
+      if (before)
+	gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+      else
+	gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
+
+      update_stmt (stmt);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Create new assign stmt for phi arg#%d\n", i);
+	  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+	}
+      last = lhs;
+    }
+  curr = gimple_phi_arg_def (phi, 0);
+  cond = get_predicate_for_edge (gimple_phi_arg_edge (phi, 0));
+  if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+    return fold_build_cond_expr (type,
+				 unshare_expr (TREE_OPERAND (cond, 0)),
+				 last,
+				 curr);
+  return fold_build_cond_expr (type, unshare_expr (cond), curr, last);
+}
+
+/* Replace scalar phi node with more than 2 arguments to cond expression.  */
+
+static void
+predicate_extended_scalar_phi (gimple phi, gimple_stmt_iterator *gsi,
+			       bool before)
+{
+  gimple new_stmt, reduc;
+  tree rhs, res, arg0, arg1, op0, op1;
+  tree cond;
+  unsigned int index0;
+  edge e;
+  bool swap = false;
+
+  res = gimple_phi_result (phi);
+  if (virtual_operand_p (res))
+    return;
+
+  if (!phi_has_two_different_args (phi, &arg0, &arg1, &index0))
+      rhs = predicate_arbitrary_phi (phi, gsi, before);
+  else
+    {
+      e = gimple_phi_arg_edge (phi, index0);
+      cond = get_predicate_for_edge (e);
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  swap = true;
+	  cond = TREE_OPERAND (cond, 0);
+	}
+
+      if (!(is_cond_scalar_reduction (phi, &reduc, &op0, &op1, true)))
+	rhs = fold_build_cond_expr (TREE_TYPE (res), unshare_expr (cond),
+				    swap? arg1 : arg0,
+				    swap? arg0 : arg1);
+      else
+	/* Convert reduction stmt into vectorizable form.  */
+	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1, swap);
+    }
+  new_stmt = gimple_build_assign (res, rhs);
+  if (before)
+    gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  else
+    gsi_insert_after (gsi, new_stmt, GSI_NEW_STMT);
+  update_stmt (new_stmt);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "new ext. phi replacement stmt\n");
+      print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+}
+
 /* Replaces in LOOP all the scalar phi nodes other than those in the
    LOOP->header block with conditional modify expressions.  */
 
@@ -1633,6 +2350,8 @@ predicate_all_scalar_phis (struct loop *loop)
   basic_block bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
   unsigned int i;
+  gimple stmt;
+  bool before = true;
 
   for (i = 1; i < orig_loop_num_nodes; i++)
     {
@@ -1653,11 +2372,25 @@ predicate_all_scalar_phis (struct loop *loop)
 	 appropriate condition for the PHI node replacement.  */
       gsi = gsi_after_labels (bb);
       true_bb = find_phi_replacement_condition (bb, &cond, &gsi);
+      if (!true_bb)
+	{
+	  /* Must use extended PHI predication; find out insertion point
+	     for unconditional PHI node evaluations.  */
+	  before = true;
+	  stmt = find_insertion_point (bb, &before);
+	  if (stmt != NULL)
+	    gsi = gsi_for_stmt (stmt);
+	  else if (!before)
+	    gsi = gsi_last_bb (bb);
+	}
 
       while (!gsi_end_p (phi_gsi))
 	{
 	  phi = gsi_stmt (phi_gsi);
-	  predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  if (true_bb)
+	    predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  else
+	    predicate_extended_scalar_phi (phi, &gsi, before);
 	  release_phi_node (phi);
 	  gsi_next (&phi_gsi);
 	}
@@ -1679,7 +2412,7 @@ insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
       basic_block bb = ifc_bbs[i];
       gimple_seq stmts;
 
-      if (!is_predicated (bb))
+      if (!is_predicated (bb) && bb_predicate_gimplified_stmts (bb) == NULL)
 	{
 	  /* Do not insert statements for a basic block that is not
 	     predicated.  Also make sure that the predicate of the
@@ -1692,7 +2425,8 @@ insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
       if (stmts)
 	{
 	  if (flag_tree_loop_if_convert_stores
-	      || any_mask_load_store)
+	      || any_mask_load_store
+	      || flag_force_vectorize)
 	    {
 	      /* Insert the predicate of the BB just after the label,
 		 as the if-conversion of memory writes will use this
@@ -1869,9 +2603,12 @@ predicate_mem_writes (loop_p loop)
 	    addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (ref),
 					     true, NULL_TREE, true,
 					     GSI_SAME_STMT);
-	    cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
-					       is_gimple_condexpr, NULL_TREE,
-					       true, GSI_SAME_STMT);
+	    if (flag_force_vectorize)
+	      cond = convert_name_to_cmp (cond);
+	    else
+	      cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
+						 is_gimple_condexpr, NULL_TREE,
+						 true, GSI_SAME_STMT);
 	    mask = fold_build_cond_expr (masktype, unshare_expr (cond),
 					 mask_op0, mask_op1);
 	    mask = ifc_temp_var (masktype, mask, &gsi);
@@ -1907,9 +2644,12 @@ predicate_mem_writes (loop_p loop)
 		lhs = rhs;
 		rhs = tem;
 	      }
-	    cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
-					       is_gimple_condexpr, NULL_TREE,
-					       true, GSI_SAME_STMT);
+	    if (flag_force_vectorize)
+	      cond = convert_name_to_cmp (cond);
+	    else
+	      cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
+						 is_gimple_condexpr, NULL_TREE,
+						 true, GSI_SAME_STMT);
 	    rhs = fold_build_cond_expr (type, unshare_expr (cond), rhs, lhs);
 	    gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
 	    update_stmt (stmt);
@@ -1971,7 +2711,7 @@ combine_blocks (struct loop *loop, bool any_mask_load_store)
   edge e;
   edge_iterator ei;
 
-  predicate_bbs (loop);
+  predicate_bbs (loop, flag_force_vectorize);
   remove_conditions_and_labels (loop);
   insert_gimplified_predicates (loop, any_mask_load_store);
   predicate_all_scalar_phis (loop);
@@ -2102,6 +2842,7 @@ version_loop_for_if_conversion (struct loop *loop)
   return true;
 }
 
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -2113,6 +2854,15 @@ tree_if_conversion (struct loop *loop)
   ifc_bbs = NULL;
   bool any_mask_load_store = false;
 
+  flag_force_vectorize = true /*loop->force_vectorize */;
+  /* Check either outer loop was marked with simd pragma.  */
+  if (!flag_force_vectorize)
+    {
+      struct loop *outer_loop = loop_outer (loop);
+      if (outer_loop && outer_loop->force_vectorize)
+	flag_force_vectorize = true;
+    }
+
   if (!if_convertible_loop_p (loop, &any_mask_load_store)
       || !dbg_cnt (if_conversion_tree))
     goto cleanup;
@@ -2122,7 +2872,8 @@ tree_if_conversion (struct loop *loop)
 	  || loop->dont_vectorize))
     goto cleanup;
 
-  if (any_mask_load_store && !version_loop_for_if_conversion (loop))
+  if ((any_mask_load_store || loop->force_vectorize)
+      && !version_loop_for_if_conversion (loop))
     goto cleanup;
 
   /* Now all statements are if-convertible.  Combine all the basic
@@ -2143,7 +2894,15 @@ tree_if_conversion (struct loop *loop)
       unsigned int i;
 
       for (i = 0; i < loop->num_nodes; i++)
-	free_bb_predicate (ifc_bbs[i]);
+	{
+	  basic_block bb = ifc_bbs[i];
+	  free_bb_predicate (bb);
+	  if (EDGE_COUNT (bb->succs) == 2)
+	    {
+	      EDGE_SUCC (bb, 0)->aux = NULL;
+	      EDGE_SUCC (bb, 1)->aux = NULL;
+	    }
+	}
 
       free (ifc_bbs);
       ifc_bbs = NULL;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Extended if-conversion for loops marked with pragma omp simd.
  2014-08-15 12:02   ` Yuri Rumyantsev
@ 2014-09-08 11:03     ` Yuri Rumyantsev
  2014-09-08 13:10     ` Richard Biener
  1 sibling, 0 replies; 9+ messages in thread
From: Yuri Rumyantsev @ 2014-09-08 11:03 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

Richard,

Did you have a chance to look at this?

Thanks.

2014-08-15 16:02 GMT+04:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
> Richard!
> Here is updated patch with the following changes:
>
> 1. Any restrictions on phi-function were eliminated for extended conversion.
> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
> negate_predicate was deleted.
> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
> be critical.
> 4. Use notion of cd-equivalence to set-up predicate for join basic
> blocks to simplify it.
> 5. I decided to not design pre-pass since it will lead generating
> chain of cond expressions for phi-node if conversion, whereas for phi
> of kind
>   x = PHI <1(2), 1(3), 2(4)>
> only one cond expression is required and this is considered as simple
> optimization for arbitrary phi-function. More precise,
> if phi-function have only two different arguments and one of them has
> single occurrence, if- conversion is performed as if phi have only 2
> arguments.
> For arbitrary phi function a chain of cond expressions is produced.
>
> Updated patch is attached.
>
> Any comments will be appreciated.
>
> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
> (flag_force_vectorize): New variable.
> (edge_predicate): New function.
> (set_edge_predicate): New function.
> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
> (init_bb_predicate): Add initialization of negate_predicate field.
> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
> (convert_name_to_cmp): New function.
> (get_type_for_cond): New function.
> (convert_bool_predicate): New function.
> (predicate_disjunction): New function.
> (predicate_conjunction): New function.
> (add_to_predicate_list): Add convert_bool argument.
> Use predicate of cd-equivalent block if convert_bool is true and
> such bb exists; save it in static variable for further possible use.
> Add call of predicate_disjunction if convert_bool argument is true.
> (add_to_dst_predicate_list): Add convert_bool argument.
> Add early function exit if edge target block is always executed.
> Add call of predicate_conjunction if convert_bool argument is true.
> Pass convert_bool argument for add_to_predicate_list.
> Set-up predicate for crritical edge if convert_bool is true.
> (equal_phi_args): New function.
> (phi_has_two_different_args): New function.
> (if_convertible_phi_p): Accept phi nodes with more than two args
> if flag_force_vectorize wa set-up.
> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
> (if_convertible_stmt_p): Allow calls of function clones if
> flag_force_vectorize was set-up.
> (all_edges_are_critical): New function.
> (if_convertible_bb_p): Allow bb has more than two predecessors if
> flag_force_vectorize was set-up. Use call of all_edges_are_critical
> to reject block if-conversion with imcoming critical edges only if
> flag_force_vectorize was not set-up.
> (walk_cond_tree): New function.
> (vect_bool_pattern_is_applicable): New function.
> (predicate_bbs): Add convert_bool argument which is used to transform
> comparison expressions of boolean type into conditional expressions
> with integral operands. If convert_bool argument was set-up and
> vect bool pattern can be appied perform the following transformation:
> (bool) x != 0  --> y = (int) x; x != 0;
> Add check that if fold_build2 produces bool conversion if convert_bool
> was set-up, recompute predicate using build2_loc. Additional argument
> 'convert_bool" is passed to add_to_dst_predicate_list and
> add_to_predicate_list.
> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
> flag_force_vectorize was set-up to calculate cd equivalent bb's.
> Call predicate_bbs with additional argument equal to false.
> (find_phi_replacement_condition): Extend function interface:
> it returns NULL if given phi node must be handled by means of
> extended phi node predication. If number of predecessors of phi-block
> is equal 2 and atleast one incoming edge is not critical original
> algorithm is used.
> (is_cond_scalar_reduction): Add 'extended' argument which signals that
> phi arguments must be evaluated through phi_has_two_different_args.
> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
> (get_predicate_for_edge): New function.
> (find_insertion_point): New function.
> (predicate_arbitrary_phi): New function.
> (predicate_extended_scalar_phi): New function.
> (predicate_all_scalar_phis): Add code to set-up gimple statement
> iterator for predication of extended scalar phi's for insertion.
> (insert_gimplified_predicates): Add test for non-predicated basic
> blocks that there are no gimplified statements to insert. Insert
> predicates at the block begining for extended if-conversion.
> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
> predication to build mask.
> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
> (tree_if_conversion): Initialize flag_force_vectorize from current
> loop or outer loop (to support pragma omp declare).Do loop versioning
> for innermost loop marked with pragma omp simd.
>
> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Hi All,
>>>
>>> We implemented additional support for pragma omp simd in part of
>>> extended if-conversion loops with such pragma. These extensions
>>> include:
>>>
>>> 1. All extensions are performed only if considered loop or its outer
>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>    loops behavior was not changed.
>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>    predecessors.
>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>    all phi nodes must be in non-predicated basic block to conform
>>>    semantic of COND_EXPR which is used for transformation.
>>
>> How is that so?  If the PHI is predicated then its result will be used
>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>
>> No?
>>
>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>> with some limitations:
>>>    - for phi nodes which have more than 2 arguments, but only two
>>>    arguments are different and one of them has the only occurence,
>>> transformation to  single COND_EXPR can be done.
>>>    - if phi node has more different arguments and all edge predicates
>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>    will be generated for it. In current design very simple check is used:
>>>    check starting from end that two edges correspondent to neighbor
>>> arguments have common predecessor which is used for further check
>>> with next edge.
>>>  These guarantee that phi predication will produce the correct result.
>>
>> Btw, you can think of these extensions as unfactoring a PHI node by
>> inserting forwarder blocks.  Thus
>>
>>    x = PHI <1(2), 1(3), 2(4)>
>>
>> becomes
>>
>>   bb 5: <forwarder-from(2)-and(3)>
>>
>>   x = PHI <1(5), 2(4)>
>>
>> and
>>
>>   x = PHI <1(2), 2(3), 3(4)>
>>
>> becomes
>>
>>   bb 5:
>>   x' = PHI <1(2), 2(3)>
>>
>>   b = PHI<x'(5), 3(4)>
>>
>> which means that 3) has to work.  Note that we want this kind of
>> PHI transforms for out-of-SSA as well to reduce the number of
>> copies we need to insert on edges.
>>
>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>> over the force_vect loops PHI nodes, applying that CFG transform.
>> And make 3) work properly if it doesn't already.
>>
>> It looks like you introduce a "negate predicate" to work around the
>> critical edge limitation?  Please instead change if-conversion to
>> work with edge predicates (as opposed to BB predicates).
>>
>> Thanks,
>> Richard.
>>
>>>
>>> Here is example of such extended predication (compile with -march=core-avx2):
>>> #pragma omp simd safelen(8)
>>>   for (i=0; i<512; i++)
>>>   {
>>>     float t = a[i];
>>>     if (t > 0 & t < 1.0e+17f)
>>>       if (c[i] != 0)
>>> res += 1;
>>>   }
>>>   <bb 4>:
>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>   t_5 = a[i_16];
>>>   _6 = t_5 > 0.0;
>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>   _8 = _7 & _6;
>>>   _ifc__28 = (unsigned int) _8;
>>>   _10 = &c[i_16];
>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>   _ifc__30 = (int) _ifc__29;
>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>   _ifc__33 = (int) _ifc__32;
>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>   res_1 = res_15 + _ifc__35;
>>>   i_11 = i_16 + 1;
>>>   ivtmp_14 = ivtmp_17 - 1;
>>>   if (ivtmp_14 != 0)
>>>     goto <bb 4>;
>>>
>>> Bootstrap and regression testing did not show any new failures.
>>>
>>> gcc/ChageLog
>>>
>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>> (struct bb_predicate_s): Add negate_predicate field.
>>> (bb_negate_predicate): New function.
>>> (set_bb_negate_predicate): New function.
>>> (bb_copy_predicate): New function.
>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>> (convert_name_to_cmp): New function.
>>> (get_type_for_cond): New function.
>>> (convert_bool_predicate): New function.
>>> (predicate_disjunction): New function.
>>> (predicate_conjunction): New function.
>>> (add_to_predicate_list): Add convert_bool argument.
>>> Add call of predicate_disjunction if convert_bool argument is true.
>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>> Add early function exit if edge target block is always executed.
>>> Add call of predicate_conjunction if convert_bool argument is true.
>>> Pass convert_bool argument for add_to_predicate_list.
>>> (equal_phi_args): New function.
>>> (phi_has_two_different_args): New function.
>>> (phi_args_disjoint): New function.
>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>> in non-predicated basic blocks.
>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>> (all_edges_are_critical): New function.
>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>> to reject block if-conversion with imcoming critical edges only if
>>> flag_force_vectorize was not setup.
>>> (walk_cond_tree): New function.
>>> (vect_bool_pattern_is_applicable): New function.
>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>> comparison expressions of boolean type into conditional expressions
>>> with integral operands. If bool_conv argument is false or both
>>> outgoing edges are not critical old algorithm of predicate assignments
>>> is used, otherwise the following code was added: check on applicable
>>> of vect-bool-pattern recognition and trnasformation of
>>> (bool) x != 0  --> y = (int) x; x != 0;
>>> compute predicates for both outgoing edges one of which is critical
>>> one using 'normal' edge, i.e. compute true and false predicates using
>>> normal outgoing edge only; evaluated predicates are stored in
>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>> negate_predicate of normal edge conatins predicate of critical edge,
>>> but generated gimplified statements are stored in their destination
>>> block fields. Additional argument 'convert_bool" is passed to
>>> add_to_dst_predicate_list and add_to_predicate_list.
>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>> equal to false.
>>> (find_phi_replacement_condition): Extend function interface:
>>> it returns NULL if given phi node must be handled by means of
>>> extended phi node predication. If number of predecessors of phi-block
>>> is equal 2 and atleast one incoming edge is not critical original
>>> algorithm is used.
>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>> (get_predicate_for_edge): New function.
>>> (find_insertion_point): New function.
>>> (predicate_phi_disjoint_args): New function.
>>> (predicate_extended_scalar_phi): New function.
>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>> iterator for predication of extended scalar phi's for insertion.
>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>> blocks that there are no gimplified statements to insert. Insert
>>> predicates at the block begining for extended if-conversion.
>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>> predication to build mask.
>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>> (split_crit_edge): New function.
>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>> loop or outer loop (to support pragma omp declare). Invoke
>>> split_crit_edge for extended predication. Do loop versioning for
>>> innermost loop marked with pragma omp simd.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Extended if-conversion for loops marked with pragma omp simd.
  2014-08-15 12:02   ` Yuri Rumyantsev
  2014-09-08 11:03     ` Yuri Rumyantsev
@ 2014-09-08 13:10     ` Richard Biener
  2014-09-22  8:28       ` Yuri Rumyantsev
  1 sibling, 1 reply; 9+ messages in thread
From: Richard Biener @ 2014-09-08 13:10 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard!
> Here is updated patch with the following changes:
>
> 1. Any restrictions on phi-function were eliminated for extended conversion.
> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
> negate_predicate was deleted.
> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
> be critical.
> 4. Use notion of cd-equivalence to set-up predicate for join basic
> blocks to simplify it.
> 5. I decided to not design pre-pass since it will lead generating
> chain of cond expressions for phi-node if conversion, whereas for phi
> of kind
>   x = PHI <1(2), 1(3), 2(4)>
> only one cond expression is required and this is considered as simple
> optimization for arbitrary phi-function. More precise,
> if phi-function have only two different arguments and one of them has
> single occurrence, if- conversion is performed as if phi have only 2
> arguments.
> For arbitrary phi function a chain of cond expressions is produced.
>
> Updated patch is attached.
>
> Any comments will be appreciated.

The patch is still very big and does multiple things at once which makes
it hard to review.

In addition to that it changes function singatures without updating
the function comments.  For example what is the convert_bool
argument doing to add_to_dst_predicate_list?  Why do we need
all this added logic.

You duplicate operand_equal_for_phi_arg_p.

I think the code handling PHIs with more than two operands but
only two unequal operands is useful generally, so that's an obvious
candidate for splitting out into a separate patch.

+   CONVERT_BOOL argument was added to convert bool predicate computations
+   which is not supported by vectorizer to int type through creating of
+   conditional expressions.  */

Example?  The vectorizer has patterns for bool predicate computations.
This seems to be another feature that needs splitting out.

The way you get around the critical edge parts looks awkward to me.
Please either do _all_ predicates as edge predicates or simply
split critical edges (of the respective loop body).

I still think that an utility doing same PHI arg merging by introducing
forwarder blocks would be nicer to have.

I'd restructure the main tree_if_conversion function to apply these
CFG pre-transforms when we are going to version the loop
for if conversion (eventually transitioning to always doing that).

So - please split up the patch.  It's way too big.

Thanks,
Richard.

> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
> (flag_force_vectorize): New variable.
> (edge_predicate): New function.
> (set_edge_predicate): New function.
> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
> (init_bb_predicate): Add initialization of negate_predicate field.
> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
> (convert_name_to_cmp): New function.
> (get_type_for_cond): New function.
> (convert_bool_predicate): New function.
> (predicate_disjunction): New function.
> (predicate_conjunction): New function.
> (add_to_predicate_list): Add convert_bool argument.
> Use predicate of cd-equivalent block if convert_bool is true and
> such bb exists; save it in static variable for further possible use.
> Add call of predicate_disjunction if convert_bool argument is true.
> (add_to_dst_predicate_list): Add convert_bool argument.
> Add early function exit if edge target block is always executed.
> Add call of predicate_conjunction if convert_bool argument is true.
> Pass convert_bool argument for add_to_predicate_list.
> Set-up predicate for crritical edge if convert_bool is true.
> (equal_phi_args): New function.
> (phi_has_two_different_args): New function.
> (if_convertible_phi_p): Accept phi nodes with more than two args
> if flag_force_vectorize wa set-up.
> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
> (if_convertible_stmt_p): Allow calls of function clones if
> flag_force_vectorize was set-up.
> (all_edges_are_critical): New function.
> (if_convertible_bb_p): Allow bb has more than two predecessors if
> flag_force_vectorize was set-up. Use call of all_edges_are_critical
> to reject block if-conversion with imcoming critical edges only if
> flag_force_vectorize was not set-up.
> (walk_cond_tree): New function.
> (vect_bool_pattern_is_applicable): New function.
> (predicate_bbs): Add convert_bool argument which is used to transform
> comparison expressions of boolean type into conditional expressions
> with integral operands. If convert_bool argument was set-up and
> vect bool pattern can be appied perform the following transformation:
> (bool) x != 0  --> y = (int) x; x != 0;
> Add check that if fold_build2 produces bool conversion if convert_bool
> was set-up, recompute predicate using build2_loc. Additional argument
> 'convert_bool" is passed to add_to_dst_predicate_list and
> add_to_predicate_list.
> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
> flag_force_vectorize was set-up to calculate cd equivalent bb's.
> Call predicate_bbs with additional argument equal to false.
> (find_phi_replacement_condition): Extend function interface:
> it returns NULL if given phi node must be handled by means of
> extended phi node predication. If number of predecessors of phi-block
> is equal 2 and atleast one incoming edge is not critical original
> algorithm is used.
> (is_cond_scalar_reduction): Add 'extended' argument which signals that
> phi arguments must be evaluated through phi_has_two_different_args.
> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
> (get_predicate_for_edge): New function.
> (find_insertion_point): New function.
> (predicate_arbitrary_phi): New function.
> (predicate_extended_scalar_phi): New function.
> (predicate_all_scalar_phis): Add code to set-up gimple statement
> iterator for predication of extended scalar phi's for insertion.
> (insert_gimplified_predicates): Add test for non-predicated basic
> blocks that there are no gimplified statements to insert. Insert
> predicates at the block begining for extended if-conversion.
> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
> predication to build mask.
> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
> (tree_if_conversion): Initialize flag_force_vectorize from current
> loop or outer loop (to support pragma omp declare).Do loop versioning
> for innermost loop marked with pragma omp simd.
>
> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Hi All,
>>>
>>> We implemented additional support for pragma omp simd in part of
>>> extended if-conversion loops with such pragma. These extensions
>>> include:
>>>
>>> 1. All extensions are performed only if considered loop or its outer
>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>    loops behavior was not changed.
>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>    predecessors.
>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>    all phi nodes must be in non-predicated basic block to conform
>>>    semantic of COND_EXPR which is used for transformation.
>>
>> How is that so?  If the PHI is predicated then its result will be used
>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>
>> No?
>>
>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>> with some limitations:
>>>    - for phi nodes which have more than 2 arguments, but only two
>>>    arguments are different and one of them has the only occurence,
>>> transformation to  single COND_EXPR can be done.
>>>    - if phi node has more different arguments and all edge predicates
>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>    will be generated for it. In current design very simple check is used:
>>>    check starting from end that two edges correspondent to neighbor
>>> arguments have common predecessor which is used for further check
>>> with next edge.
>>>  These guarantee that phi predication will produce the correct result.
>>
>> Btw, you can think of these extensions as unfactoring a PHI node by
>> inserting forwarder blocks.  Thus
>>
>>    x = PHI <1(2), 1(3), 2(4)>
>>
>> becomes
>>
>>   bb 5: <forwarder-from(2)-and(3)>
>>
>>   x = PHI <1(5), 2(4)>
>>
>> and
>>
>>   x = PHI <1(2), 2(3), 3(4)>
>>
>> becomes
>>
>>   bb 5:
>>   x' = PHI <1(2), 2(3)>
>>
>>   b = PHI<x'(5), 3(4)>
>>
>> which means that 3) has to work.  Note that we want this kind of
>> PHI transforms for out-of-SSA as well to reduce the number of
>> copies we need to insert on edges.
>>
>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>> over the force_vect loops PHI nodes, applying that CFG transform.
>> And make 3) work properly if it doesn't already.
>>
>> It looks like you introduce a "negate predicate" to work around the
>> critical edge limitation?  Please instead change if-conversion to
>> work with edge predicates (as opposed to BB predicates).
>>
>> Thanks,
>> Richard.
>>
>>>
>>> Here is example of such extended predication (compile with -march=core-avx2):
>>> #pragma omp simd safelen(8)
>>>   for (i=0; i<512; i++)
>>>   {
>>>     float t = a[i];
>>>     if (t > 0 & t < 1.0e+17f)
>>>       if (c[i] != 0)
>>> res += 1;
>>>   }
>>>   <bb 4>:
>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>   t_5 = a[i_16];
>>>   _6 = t_5 > 0.0;
>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>   _8 = _7 & _6;
>>>   _ifc__28 = (unsigned int) _8;
>>>   _10 = &c[i_16];
>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>   _ifc__30 = (int) _ifc__29;
>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>   _ifc__33 = (int) _ifc__32;
>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>   res_1 = res_15 + _ifc__35;
>>>   i_11 = i_16 + 1;
>>>   ivtmp_14 = ivtmp_17 - 1;
>>>   if (ivtmp_14 != 0)
>>>     goto <bb 4>;
>>>
>>> Bootstrap and regression testing did not show any new failures.
>>>
>>> gcc/ChageLog
>>>
>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>> (struct bb_predicate_s): Add negate_predicate field.
>>> (bb_negate_predicate): New function.
>>> (set_bb_negate_predicate): New function.
>>> (bb_copy_predicate): New function.
>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>> (convert_name_to_cmp): New function.
>>> (get_type_for_cond): New function.
>>> (convert_bool_predicate): New function.
>>> (predicate_disjunction): New function.
>>> (predicate_conjunction): New function.
>>> (add_to_predicate_list): Add convert_bool argument.
>>> Add call of predicate_disjunction if convert_bool argument is true.
>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>> Add early function exit if edge target block is always executed.
>>> Add call of predicate_conjunction if convert_bool argument is true.
>>> Pass convert_bool argument for add_to_predicate_list.
>>> (equal_phi_args): New function.
>>> (phi_has_two_different_args): New function.
>>> (phi_args_disjoint): New function.
>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>> in non-predicated basic blocks.
>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>> (all_edges_are_critical): New function.
>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>> to reject block if-conversion with imcoming critical edges only if
>>> flag_force_vectorize was not setup.
>>> (walk_cond_tree): New function.
>>> (vect_bool_pattern_is_applicable): New function.
>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>> comparison expressions of boolean type into conditional expressions
>>> with integral operands. If bool_conv argument is false or both
>>> outgoing edges are not critical old algorithm of predicate assignments
>>> is used, otherwise the following code was added: check on applicable
>>> of vect-bool-pattern recognition and trnasformation of
>>> (bool) x != 0  --> y = (int) x; x != 0;
>>> compute predicates for both outgoing edges one of which is critical
>>> one using 'normal' edge, i.e. compute true and false predicates using
>>> normal outgoing edge only; evaluated predicates are stored in
>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>> negate_predicate of normal edge conatins predicate of critical edge,
>>> but generated gimplified statements are stored in their destination
>>> block fields. Additional argument 'convert_bool" is passed to
>>> add_to_dst_predicate_list and add_to_predicate_list.
>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>> equal to false.
>>> (find_phi_replacement_condition): Extend function interface:
>>> it returns NULL if given phi node must be handled by means of
>>> extended phi node predication. If number of predecessors of phi-block
>>> is equal 2 and atleast one incoming edge is not critical original
>>> algorithm is used.
>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>> (get_predicate_for_edge): New function.
>>> (find_insertion_point): New function.
>>> (predicate_phi_disjoint_args): New function.
>>> (predicate_extended_scalar_phi): New function.
>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>> iterator for predication of extended scalar phi's for insertion.
>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>> blocks that there are no gimplified statements to insert. Insert
>>> predicates at the block begining for extended if-conversion.
>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>> predication to build mask.
>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>> (split_crit_edge): New function.
>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>> loop or outer loop (to support pragma omp declare). Invoke
>>> split_crit_edge for extended predication. Do loop versioning for
>>> innermost loop marked with pragma omp simd.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Extended if-conversion for loops marked with pragma omp simd.
  2014-09-08 13:10     ` Richard Biener
@ 2014-09-22  8:28       ` Yuri Rumyantsev
  0 siblings, 0 replies; 9+ messages in thread
From: Yuri Rumyantsev @ 2014-09-22  8:28 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 19381 bytes --]

Richard,

here is reduced patch (part.1) which was reduced almost twice.
Let's me also answer on your comments.

1. I really use edge field 'aux' to keep predicate for critical edges.
My previous code was not correct and now it looks like:

  if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
    /* Edge E is not critical,  use predicate of edge source bb. */
    c = bb_predicate (b);
  else
    /* Edge E is critical and its aux field contains predicate.  */
    c = edge_predicate (e);

2. I completely delete all code related to creation of conditional
expressions and completely rely on bool pattern recognition in
vectorizer. But we need to delete all dead predicate computations
which are not used since they prevent vectorization. I will add this
local-dce function in next patch.
3. I also did not include in this patch recognition of general
phi-nodes with two arguments only for which conversion of conditional
scalar reduction can be applied also.
Note that all these changes are applied for loop marked with pragma
omp simd only.

2014-09-22  Yuri Rumyantsev  <ysrumyan@gmail.com>

* tree-if-conv.c (cgraph.h): Add include file to detect function clone.
(flag_force_vectorize): New variable.
(edge_predicate): New function.
(set_edge_predicate): New function.
(convert_name_to_cmp): New function.
(add_to_predicate_list): Check unconditionally that bb is always
executed to early exit. Use predicate of cd-equivalent block
for join blocks if it exists.
(add_to_dst_predicate_list): Invoke add_to_predicate_list if
destination block of edge is not always executed. Set-up predicate
for critical edge.
(if_convertible_phi_p): Accept phi nodes with more than two args
if FLAG_FORCE_VECTORIZE was set-up.
(ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
(if_convertible_stmt_p): Fix up pre-function comments.
(all_edges_are_critical): New function.
(if_convertible_bb_p): Allow bb has more than two predecessors if
FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
to reject block if-conversion with incoming critical edges only if
FLAG_FORCE_VECTORIZE was not set-up.
(predicate_bbs): Skip loop exit block also. Add check that if
fold_build2 produces bool conversion, recompute predicate using
build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
(if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
(find_phi_replacement_condition): Extend function interface:
it returns NULL if given phi node must be handled by means of
extended phi node predication. If number of predecessors of phi-block
is equal 2 and atleast one incoming edge is not critical original
algorithm is used.
(get_predicate_for_edge): New function.
(find_insertion_point): New function.
(predicate_arbitrary_scalar_phi): New function.
(predicate_all_scalar_phis): Introduce new variable BEFORE.
Invoke find_insertion_point to initialize gsi and
predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
that extended predication must be applied).
(insert_gimplified_predicates): Add test for non-predicated basic
blocks that there are no gimplified statements to insert. Insert
predicates at the block begining for extended if-conversion.
(tree_if_conversion): Initialize flag_force_vectorize from current
loop or outer loop (to support pragma omp declare).Do loop versioning
for innermost loop marked with pragma omp simd and
FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
for blocks with two successors.




2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard!
>> Here is updated patch with the following changes:
>>
>> 1. Any restrictions on phi-function were eliminated for extended conversion.
>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>> negate_predicate was deleted.
>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>> be critical.
>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>> blocks to simplify it.
>> 5. I decided to not design pre-pass since it will lead generating
>> chain of cond expressions for phi-node if conversion, whereas for phi
>> of kind
>>   x = PHI <1(2), 1(3), 2(4)>
>> only one cond expression is required and this is considered as simple
>> optimization for arbitrary phi-function. More precise,
>> if phi-function have only two different arguments and one of them has
>> single occurrence, if- conversion is performed as if phi have only 2
>> arguments.
>> For arbitrary phi function a chain of cond expressions is produced.
>>
>> Updated patch is attached.
>>
>> Any comments will be appreciated.
>
> The patch is still very big and does multiple things at once which makes
> it hard to review.
>
> In addition to that it changes function singatures without updating
> the function comments.  For example what is the convert_bool
> argument doing to add_to_dst_predicate_list?  Why do we need
> all this added logic.
>
> You duplicate operand_equal_for_phi_arg_p.
>
> I think the code handling PHIs with more than two operands but
> only two unequal operands is useful generally, so that's an obvious
> candidate for splitting out into a separate patch.
>
> +   CONVERT_BOOL argument was added to convert bool predicate computations
> +   which is not supported by vectorizer to int type through creating of
> +   conditional expressions.  */
>
> Example?  The vectorizer has patterns for bool predicate computations.
> This seems to be another feature that needs splitting out.
>
> The way you get around the critical edge parts looks awkward to me.
> Please either do _all_ predicates as edge predicates or simply
> split critical edges (of the respective loop body).
>
> I still think that an utility doing same PHI arg merging by introducing
> forwarder blocks would be nicer to have.
>
> I'd restructure the main tree_if_conversion function to apply these
> CFG pre-transforms when we are going to version the loop
> for if conversion (eventually transitioning to always doing that).
>
> So - please split up the patch.  It's way too big.
>
> Thanks,
> Richard.
>
>> 2014-08-15  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>> (flag_force_vectorize): New variable.
>> (edge_predicate): New function.
>> (set_edge_predicate): New function.
>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>> (init_bb_predicate): Add initialization of negate_predicate field.
>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>> (convert_name_to_cmp): New function.
>> (get_type_for_cond): New function.
>> (convert_bool_predicate): New function.
>> (predicate_disjunction): New function.
>> (predicate_conjunction): New function.
>> (add_to_predicate_list): Add convert_bool argument.
>> Use predicate of cd-equivalent block if convert_bool is true and
>> such bb exists; save it in static variable for further possible use.
>> Add call of predicate_disjunction if convert_bool argument is true.
>> (add_to_dst_predicate_list): Add convert_bool argument.
>> Add early function exit if edge target block is always executed.
>> Add call of predicate_conjunction if convert_bool argument is true.
>> Pass convert_bool argument for add_to_predicate_list.
>> Set-up predicate for crritical edge if convert_bool is true.
>> (equal_phi_args): New function.
>> (phi_has_two_different_args): New function.
>> (if_convertible_phi_p): Accept phi nodes with more than two args
>> if flag_force_vectorize wa set-up.
>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>> (if_convertible_stmt_p): Allow calls of function clones if
>> flag_force_vectorize was set-up.
>> (all_edges_are_critical): New function.
>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>> to reject block if-conversion with imcoming critical edges only if
>> flag_force_vectorize was not set-up.
>> (walk_cond_tree): New function.
>> (vect_bool_pattern_is_applicable): New function.
>> (predicate_bbs): Add convert_bool argument which is used to transform
>> comparison expressions of boolean type into conditional expressions
>> with integral operands. If convert_bool argument was set-up and
>> vect bool pattern can be appied perform the following transformation:
>> (bool) x != 0  --> y = (int) x; x != 0;
>> Add check that if fold_build2 produces bool conversion if convert_bool
>> was set-up, recompute predicate using build2_loc. Additional argument
>> 'convert_bool" is passed to add_to_dst_predicate_list and
>> add_to_predicate_list.
>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>> Call predicate_bbs with additional argument equal to false.
>> (find_phi_replacement_condition): Extend function interface:
>> it returns NULL if given phi node must be handled by means of
>> extended phi node predication. If number of predecessors of phi-block
>> is equal 2 and atleast one incoming edge is not critical original
>> algorithm is used.
>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>> phi arguments must be evaluated through phi_has_two_different_args.
>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>> (get_predicate_for_edge): New function.
>> (find_insertion_point): New function.
>> (predicate_arbitrary_phi): New function.
>> (predicate_extended_scalar_phi): New function.
>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>> iterator for predication of extended scalar phi's for insertion.
>> (insert_gimplified_predicates): Add test for non-predicated basic
>> blocks that there are no gimplified statements to insert. Insert
>> predicates at the block begining for extended if-conversion.
>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>> predication to build mask.
>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>> (tree_if_conversion): Initialize flag_force_vectorize from current
>> loop or outer loop (to support pragma omp declare).Do loop versioning
>> for innermost loop marked with pragma omp simd.
>>
>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi All,
>>>>
>>>> We implemented additional support for pragma omp simd in part of
>>>> extended if-conversion loops with such pragma. These extensions
>>>> include:
>>>>
>>>> 1. All extensions are performed only if considered loop or its outer
>>>>    loop was marked with pragma omp simd (force_vectorize); For ordinary
>>>>    loops behavior was not changed.
>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>    predecessors.
>>>> 3. Put additional restriction on phi nodes which was missed in current design:
>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>    semantic of COND_EXPR which is used for transformation.
>>>
>>> How is that so?  If the PHI is predicated then its result will be used
>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>
>>> No?
>>>
>>>> 4. Extend predication of phi nodes: phi may have more than 2 arguments
>>>> with some limitations:
>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>    arguments are different and one of them has the only occurence,
>>>> transformation to  single COND_EXPR can be done.
>>>>    - if phi node has more different arguments and all edge predicates
>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>    will be generated for it. In current design very simple check is used:
>>>>    check starting from end that two edges correspondent to neighbor
>>>> arguments have common predecessor which is used for further check
>>>> with next edge.
>>>>  These guarantee that phi predication will produce the correct result.
>>>
>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>> inserting forwarder blocks.  Thus
>>>
>>>    x = PHI <1(2), 1(3), 2(4)>
>>>
>>> becomes
>>>
>>>   bb 5: <forwarder-from(2)-and(3)>
>>>
>>>   x = PHI <1(5), 2(4)>
>>>
>>> and
>>>
>>>   x = PHI <1(2), 2(3), 3(4)>
>>>
>>> becomes
>>>
>>>   bb 5:
>>>   x' = PHI <1(2), 2(3)>
>>>
>>>   b = PHI<x'(5), 3(4)>
>>>
>>> which means that 3) has to work.  Note that we want this kind of
>>> PHI transforms for out-of-SSA as well to reduce the number of
>>> copies we need to insert on edges.
>>>
>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>> And make 3) work properly if it doesn't already.
>>>
>>> It looks like you introduce a "negate predicate" to work around the
>>> critical edge limitation?  Please instead change if-conversion to
>>> work with edge predicates (as opposed to BB predicates).
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>
>>>> Here is example of such extended predication (compile with -march=core-avx2):
>>>> #pragma omp simd safelen(8)
>>>>   for (i=0; i<512; i++)
>>>>   {
>>>>     float t = a[i];
>>>>     if (t > 0 & t < 1.0e+17f)
>>>>       if (c[i] != 0)
>>>> res += 1;
>>>>   }
>>>>   <bb 4>:
>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>   t_5 = a[i_16];
>>>>   _6 = t_5 > 0.0;
>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>   _8 = _7 & _6;
>>>>   _ifc__28 = (unsigned int) _8;
>>>>   _10 = &c[i_16];
>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>   _ifc__30 = (int) _ifc__29;
>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>   _ifc__33 = (int) _ifc__32;
>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>   res_1 = res_15 + _ifc__35;
>>>>   i_11 = i_16 + 1;
>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>   if (ivtmp_14 != 0)
>>>>     goto <bb 4>;
>>>>
>>>> Bootstrap and regression testing did not show any new failures.
>>>>
>>>> gcc/ChageLog
>>>>
>>>> 2014-06-25  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>> (bb_negate_predicate): New function.
>>>> (set_bb_negate_predicate): New function.
>>>> (bb_copy_predicate): New function.
>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>> (convert_name_to_cmp): New function.
>>>> (get_type_for_cond): New function.
>>>> (convert_bool_predicate): New function.
>>>> (predicate_disjunction): New function.
>>>> (predicate_conjunction): New function.
>>>> (add_to_predicate_list): Add convert_bool argument.
>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>> Add early function exit if edge target block is always executed.
>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>> Pass convert_bool argument for add_to_predicate_list.
>>>> (equal_phi_args): New function.
>>>> (phi_has_two_different_args): New function.
>>>> (phi_args_disjoint): New function.
>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>> in non-predicated basic blocks.
>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>> (all_edges_are_critical): New function.
>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>> to reject block if-conversion with imcoming critical edges only if
>>>> flag_force_vectorize was not setup.
>>>> (walk_cond_tree): New function.
>>>> (vect_bool_pattern_is_applicable): New function.
>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>> comparison expressions of boolean type into conditional expressions
>>>> with integral operands. If bool_conv argument is false or both
>>>> outgoing edges are not critical old algorithm of predicate assignments
>>>> is used, otherwise the following code was added: check on applicable
>>>> of vect-bool-pattern recognition and trnasformation of
>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>> compute predicates for both outgoing edges one of which is critical
>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>> normal outgoing edge only; evaluated predicates are stored in
>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>> negate_predicate of normal edge conatins predicate of critical edge,
>>>> but generated gimplified statements are stored in their destination
>>>> block fields. Additional argument 'convert_bool" is passed to
>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional argument
>>>> equal to false.
>>>> (find_phi_replacement_condition): Extend function interface:
>>>> it returns NULL if given phi node must be handled by means of
>>>> extended phi node predication. If number of predecessors of phi-block
>>>> is equal 2 and atleast one incoming edge is not critical original
>>>> algorithm is used.
>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>> both phi arguments must be evaluated through phi_has_two_different_args.
>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>> (get_predicate_for_edge): New function.
>>>> (find_insertion_point): New function.
>>>> (predicate_phi_disjoint_args): New function.
>>>> (predicate_extended_scalar_phi): New function.
>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>> iterator for predication of extended scalar phi's for insertion.
>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>> blocks that there are no gimplified statements to insert. Insert
>>>> predicates at the block begining for extended if-conversion.
>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>> predication to build mask.
>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>> (split_crit_edge): New function.
>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>> split_crit_edge for extended predication. Do loop versioning for
>>>> innermost loop marked with pragma omp simd.

[-- Attachment #2: patch.part1 --]
[-- Type: application/octet-stream, Size: 22411 bytes --]

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 1f8ef03..aab56ee
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -120,6 +120,9 @@ along with GCC; see the file COPYING3.  If not see
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
 
+/* Copy of 'force_vectorize' field of loop.  */
+static bool flag_force_vectorize;
+
 /* Structure used to predicate basic blocks.  This is attached to the
    ->aux field of the BBs in the loop to be if-converted.  */
 typedef struct bb_predicate_s {
@@ -149,6 +152,16 @@ bb_predicate (basic_block bb)
   return ((bb_predicate_p) bb->aux)->predicate;
 }
 
+/* Returns predicate for critical edge E.  */
+
+static inline tree
+edge_predicate (edge e)
+{
+  gcc_assert (EDGE_COUNT (e->dest->preds) >= 2);
+  gcc_assert (e->aux != NULL);
+  return (tree) e->aux;
+}
+
 /* Sets the gimplified predicate COND for basic block BB.  */
 
 static inline void
@@ -160,6 +173,16 @@ set_bb_predicate (basic_block bb, tree cond)
   ((bb_predicate_p) bb->aux)->predicate = cond;
 }
 
+/* Sets predicate COND for critical edge E.  */
+
+static inline void
+set_edge_predicate (edge e, tree cond)
+{
+  gcc_assert (EDGE_COUNT (e->dest->preds) >= 2);
+  gcc_assert (cond != NULL_TREE);
+  e->aux = cond;
+}
+
 /* Returns the sequence of statements of the gimplification of the
    predicate for basic block BB.  */
 
@@ -395,26 +418,64 @@ fold_build_cond_expr (tree type, tree cond, tree rhs, tree lhs)
   return build3 (COND_EXPR, type, cond, rhs, lhs);
 }
 
+/* Build <name> != 0 expression when COND is SSA_NAME of int type.  */
+
+static inline tree
+convert_name_to_cmp (tree cond)
+{
+  if (TREE_CODE (cond) != SSA_NAME)
+    return cond;
+  return build2 (NE_EXPR, boolean_type_node, cond,
+		 build_int_cst (TREE_TYPE (cond), 0));
+}
+
 /* Add condition NC to the predicate list of basic block BB.  LOOP is
-   the loop to be if-converted.  */
+   the loop to be if-converted. Use predicate of cd-equivalent block
+   if it exists for join bb.  */
 
 static inline void
-add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
+add_to_predicate_list (struct loop *loop, basic_block bb,
+		       tree nc)
 {
   tree bc, *tp;
+  basic_block dom_bb;
+  static basic_block join_bb = NULL;
 
   if (is_true_predicate (nc))
     return;
 
-  if (!is_predicated (bb))
+  /* If dominance tells us this basic block is always executed,
+     don't record any predicates for it.  */
+  if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+    return;
+
+  /* If predicate has been already set up for given bb using cd-equivalent
+     block predicate, simply escape. Post-dominator tree was built under
+     flag_force_vectorize only.  */
+  if (flag_force_vectorize)
     {
-      /* If dominance tells us this basic block is always executed, don't
-	 record any predicates for it.  */
-      if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+      if (join_bb == bb)
 	return;
+      dom_bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+      /* We use notion of cd equivalence to get simplier predicate for
+	 join block, e.g. if join block has 2 predecessors with predicates
+	 p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
+	 p1 & p2 | p1 & !p2.  */
+      if (dom_bb != loop->header
+	  && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
+	{
+	  gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
+	  bc = bb_predicate (dom_bb);
+	  gcc_assert (!is_true_predicate (bc));
+	  set_bb_predicate (bb, bc);
 
-      bc = nc;
+	  /* Save bb in join_bb to not handle it once more.  */
+	  join_bb = bb;
+	  return;
+	}
     }
+  if (!is_predicated (bb))
+    bc = nc;
   else
     {
       bc = bb_predicate (bb);
@@ -455,10 +516,15 @@ add_to_dst_predicate_list (struct loop *loop, edge e,
     cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
 			prev_cond, cond);
 
-  add_to_predicate_list (loop, e->dest, cond);
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
+    add_to_predicate_list (loop, e->dest, cond);
+
+  /* If edge E is critical save predicate on it.  */
+  if (EDGE_COUNT (e->dest->preds) >= 2)
+    set_edge_predicate (e, cond);
 }
 
-/* Return true if one of the successor edges of BB exits LOOP.  */
+/* Returns true if one of the successor edges of BB exits LOOP.  */
 
 static bool
 bb_with_exit_edge_p (struct loop *loop, basic_block bb)
@@ -482,7 +548,9 @@ bb_with_exit_edge_p (struct loop *loop, basic_block bb)
    When the flag_tree_loop_if_convert_stores is not set, PHI is not
    if-convertible if:
    - a virtual PHI is immediately used in another PHI node,
-   - there is a virtual PHI in a BB other than the loop->header.  */
+   - there is a virtual PHI in a BB other than the loop->header.
+   When the flag_force_vectorize is set, PHI can have more than
+   two arguments.  */
 
 static bool
 if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi,
@@ -494,11 +562,18 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gimple phi,
       print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
     }
 
-  if (bb != loop->header && gimple_phi_num_args (phi) != 2)
+  if (bb != loop->header)
     {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "More than two phi node args.\n");
-      return false;
+      if (gimple_phi_num_args (phi) != 2)
+	{
+	  if (!flag_force_vectorize)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "More than two phi node args.\n");
+	      return false;
+	    }
+
+        }
     }
 
   if (flag_tree_loop_if_convert_stores || any_mask_load_store)
@@ -728,7 +803,7 @@ ifcvt_can_use_mask_load_store (gimple stmt)
   basic_block bb = gimple_bb (stmt);
   bool is_load;
 
-  if (!(flag_tree_loop_vectorize || bb->loop_father->force_vectorize)
+  if (!(flag_tree_loop_vectorize || flag_force_vectorize)
       || bb->loop_father->dont_vectorize
       || !gimple_assign_single_p (stmt)
       || gimple_has_volatile_ops (stmt))
@@ -865,7 +940,8 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
 
    A statement is if-convertible if:
    - it is an if-convertible GIMPLE_ASSIGN,
-   - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
+   - it is a GIMPLE_LABEL or a GIMPLE_COND,
+   - it is builtins call.  */
 
 static bool
 if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
@@ -912,6 +988,22 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
   return true;
 }
 
+/* Assumes that BB has more than 2 predecessors.
+   Returns false if at least one successor is not on critical edge
+   and true otherwise.  */
+
+static inline bool
+all_edges_are_critical (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (EDGE_COUNT (e->src->succs) == 1)
+      return false;
+  return true;
+}
+
 /* Return true when BB is if-convertible.  This routine does not check
    basic block's statements and phis.
 
@@ -920,6 +1012,8 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
    - it is after the exit block but before the latch,
    - its edges are not normal.
 
+   Last restriction is not applicable for loops marked with simd pragma.
+
    EXIT_BB is the basic block containing the exit of the LOOP.  BB is
    inside LOOP.  */
 
@@ -932,9 +1026,13 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "----------[%d]-------------\n", bb->index);
 
-  if (EDGE_COUNT (bb->preds) > 2
-      || EDGE_COUNT (bb->succs) > 2)
+  if (EDGE_COUNT (bb->succs) > 2)
     return false;
+  if (EDGE_COUNT (bb->preds) > 2)
+    {
+      if (!flag_force_vectorize)
+	return false;
+    }
 
   if (exit_bb)
     {
@@ -971,18 +1069,17 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
 
   /* At least one incoming edge has to be non-critical as otherwise edge
      predicates are not equal to basic-block predicates of the edge
-     source.  */
+     source. This restriction is not valid for loops marked with
+     simd pragma.  */
   if (EDGE_COUNT (bb->preds) > 1
       && bb != loop->header)
     {
-      bool found = false;
-      FOR_EACH_EDGE (e, ei, bb->preds)
-	if (EDGE_COUNT (e->src->succs) == 1)
-	  found = true;
-      if (!found)
+      if (!flag_force_vectorize && all_edges_are_critical (bb))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "only critical predecessors\n");
+	    fprintf (dump_file, "only critical predecessors in bb#%d\n",
+		      bb->index);
+
 	  return false;
 	}
     }
@@ -1064,6 +1161,7 @@ get_loop_body_in_if_conv_order (const struct loop *loop)
   return blocks;
 }
 
+
 /* Returns true when the analysis of the predicates for all the basic
    blocks in LOOP succeeded.
 
@@ -1096,9 +1194,10 @@ predicate_bbs (loop_p loop)
       tree cond;
       gimple stmt;
 
-      /* The loop latch is always executed and has no extra conditions
-	 to be processed: skip it.  */
-      if (bb == loop->latch)
+      /* The loop latch and loop exit block are always executed and
+	 have no extra conditions to be processed: skip them.  */
+      if (bb == loop->latch
+	  || bb_with_exit_edge_p (loop, bb))
 	{
 	  reset_bb_predicate (loop->latch);
 	  continue;
@@ -1108,27 +1207,41 @@ predicate_bbs (loop_p loop)
       stmt = last_stmt (bb);
       if (stmt && gimple_code (stmt) == GIMPLE_COND)
 	{
-	  tree c2;
+	  tree c, c2;
 	  edge true_edge, false_edge;
 	  location_t loc = gimple_location (stmt);
-	  tree c = fold_build2_loc (loc, gimple_cond_code (stmt),
-				    boolean_type_node,
-				    gimple_cond_lhs (stmt),
-				    gimple_cond_rhs (stmt));
-
-	  /* Add new condition into destination's predicate list.  */
-	  extract_true_false_edges_from_block (gimple_bb (stmt),
-					       &true_edge, &false_edge);
+	  tree lopnd = gimple_cond_lhs (stmt);
+	  enum tree_code code = gimple_cond_code (stmt);
+
+	  /* Compute predicates for true and false edges.  */
+	  c = fold_build2_loc (loc, code,
+			       boolean_type_node,
+			       lopnd,
+			       gimple_cond_rhs (stmt));
+	  /* Fold_build2 can produce bool conversion which is not
+             supported by vectorizer, so re-build it without folding.
+	     For example, such conversion is generated for sequence:
+		_Bool _7, _8, _9;
+		_7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
+		if (_9 != 0)  --> (bool)_9.  */
+
+	  if (CONVERT_EXPR_P (c)
+	      && TREE_CODE_CLASS (code) == tcc_comparison)
+	    c = build2_loc (loc, code, boolean_type_node,
+			    lopnd, gimple_cond_rhs (stmt));
+	  c2 = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node,
+			   unshare_expr (c));
 
+	  extract_true_false_edges_from_block (bb, &true_edge, &false_edge);
+	  if (flag_force_vectorize)
+	    true_edge->aux = false_edge->aux = NULL;
 	  /* If C is true, then TRUE_EDGE is taken.  */
 	  add_to_dst_predicate_list (loop, true_edge, unshare_expr (cond),
 				     unshare_expr (c));
 
 	  /* If C is false, then FALSE_EDGE is taken.  */
-	  c2 = build1_loc (loc, TRUTH_NOT_EXPR, boolean_type_node,
-			   unshare_expr (c));
-	  add_to_dst_predicate_list (loop, false_edge,
-				     unshare_expr (cond), c2);
+	  add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
+				     unshare_expr (c2));
 
 	  cond = NULL_TREE;
 	}
@@ -1176,6 +1289,8 @@ if_convertible_loop_p_1 (struct loop *loop,
     return false;
 
   calculate_dominance_info (CDI_DOMINATORS);
+  if (flag_force_vectorize)
+    calculate_dominance_info (CDI_POST_DOMINATORS);
 
   /* Allow statements that can be handled during if-conversion.  */
   ifc_bbs = get_loop_body_in_if_conv_order (loop);
@@ -1337,7 +1452,9 @@ if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
    replacement.  Return the true block whose phi arguments are
    selected when cond is true.  LOOP is the loop containing the
    if-converted region, GSI is the place to insert the code for the
-   if-conversion.  */
+   if-conversion.
+   Returns NULL if given phi node must be handled by means of extended
+   phi node predication.  */
 
 static basic_block
 find_phi_replacement_condition (basic_block bb, tree *cond,
@@ -1346,7 +1463,13 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
   edge first_edge, second_edge;
   tree tmp_cond;
 
-  gcc_assert (EDGE_COUNT (bb->preds) == 2);
+  if (EDGE_COUNT (bb->preds) != 2
+      || all_edges_are_critical (bb))
+    {
+      gcc_assert (flag_force_vectorize);
+      return NULL;
+    }
+
   first_edge = EDGE_PRED (bb, 0);
   second_edge = EDGE_PRED (bb, 1);
 
@@ -1624,6 +1747,237 @@ predicate_scalar_phi (gimple phi, tree cond,
     }
 }
 
+/* Returns predicate of edge associated with argument of phi node.  */
+
+static tree
+get_predicate_for_edge (edge e)
+{
+  tree c;
+  basic_block b = e->src;
+
+  if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
+    /* Use predicate of src basic block if it has the only successor.  */
+    c = bb_predicate (b);
+  else
+    /* Edge E is critical and its aux field contains predicate.  */
+    c = edge_predicate (e);
+  return convert_name_to_cmp (c);
+}
+
+/* This is enhancement for predication of a phi node with arbitrary
+   number of arguments, i.e. for
+	x = phi (x_1, x_2, ..., x_k)
+   a chain of recurrent cond expressions will be produced.
+   For example,
+	bb_0
+	if (_5 != 0) goto bb_1 else goto bb_2
+	end_bb_0
+
+	bb_1
+	res_2 = some computations;
+	goto bb_5
+	end_bb_1
+
+	bb_2
+	if (_9 != 0) goto bb_3 else goto bb_4
+	end_bb_2
+
+	bb_3
+	res_3 = ...;
+	goto bb_5
+	end_bb_3
+
+	bb4
+	res_4 = ...;
+	end_bb_4
+
+	bb_5
+	# res_1 = PHI <res_2(1), res_3(3), res_4(4)>
+
+    will be if-converted into chain of unconditional assignments:
+	_ifc__42 = <PRD_3> ? res_3 : res_4;
+	res_1 = _5 != 0 ? res_2 : _ifc__42;
+
+    where <PRD_3> is predicate of <bb_3>.
+
+    All created intermediate statements are inserted at GSI point.  */
+
+static void
+predicate_arbitrary_scalar_phi (gimple phi, gimple_stmt_iterator *gsi,
+				bool before)
+{
+  int i;
+  int num = (int) gimple_phi_num_args (phi);
+  tree last = gimple_phi_arg_def (phi, num - 1);
+  tree type = TREE_TYPE (gimple_phi_result (phi));
+  tree curr;
+  gimple stmt;
+  tree lhs;
+  tree rhs;
+  tree res;
+  tree cond;
+  bool swap = false;
+
+  res = gimple_phi_result (phi);
+  if (virtual_operand_p (res))
+    return;
+
+  for (i = num - 2; i > 0; i--)
+    {
+      curr = gimple_phi_arg_def (phi, i);
+      lhs = make_temp_ssa_name (type, NULL, "_ifc_");
+      cond = get_predicate_for_edge (gimple_phi_arg_edge (phi, i));
+      swap = false;
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  cond = TREE_OPERAND (cond, 0);
+	  swap = true;
+	}
+      /* Gimplify the condition to a valid cond-expr conditonal operand.  */
+      if (before)
+	cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					   is_gimple_condexpr, NULL_TREE,
+					   true, GSI_SAME_STMT);
+      else
+	cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					   is_gimple_condexpr, NULL_TREE,
+					   false, GSI_CONTINUE_LINKING);
+
+      stmt = gimple_build_assign_with_ops (COND_EXPR, lhs,
+					   unshare_expr (cond),
+					   swap? last : curr,
+					   swap? curr : last);
+
+      if (before)
+	gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+      else
+	gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
+      update_stmt (stmt);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Create new assign stmt for phi arg#%d\n", i);
+	  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+	}
+      last = lhs;
+    }
+  curr = gimple_phi_arg_def (phi, 0);
+  cond = get_predicate_for_edge (gimple_phi_arg_edge (phi, 0));
+  swap = false;
+  if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+    {
+      cond = TREE_OPERAND (cond, 0);
+      swap = true;
+    }
+  if (before)
+    cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+				       is_gimple_condexpr, NULL_TREE, true,
+				       GSI_SAME_STMT);
+  else
+    cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+				       is_gimple_condexpr, NULL_TREE, false,
+				       GSI_CONTINUE_LINKING);
+  rhs = fold_build_cond_expr (type,
+			      unshare_expr (cond),
+			      swap? last : curr,
+			      swap? curr : last);
+  stmt = gimple_build_assign (res, rhs);
+  if (before)
+    gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+  else
+    gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
+  update_stmt (stmt);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "new phi replacement stmt\n");
+      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+    }
+}
+
+/* Returns gimple statement iterator to insert code for predicated phi.  */
+
+static gimple_stmt_iterator
+find_insertion_point (basic_block bb, bool* before)
+{
+  edge e;
+  edge_iterator ei;
+  tree cond;
+  gimple last = NULL;
+  gimple curr;
+  int num_opnd;
+  tree opnd1, opnd2;
+
+  /* Found last statement in bb after which code for predicated phi can be
+     inserted using edge predicates.  */
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      cond = get_predicate_for_edge (e);
+      if (TREE_CODE (cond) == SSA_NAME)
+	{
+	  opnd1 = cond;
+	  opnd2 = NULL_TREE;
+	}
+      else if (TREE_CONSTANT (cond))
+	continue;
+      else if ((num_opnd = TREE_OPERAND_LENGTH (cond)) == 2)
+	{
+	  opnd1 = TREE_OPERAND (cond, 0);
+	  opnd2 = TREE_OPERAND (cond, 1);
+	}
+      else
+	{
+	  gcc_assert (num_opnd == 1);
+	  opnd1 = TREE_OPERAND (cond, 0);
+	  opnd2 = NULL_TREE;
+	}
+      /* Process each operand of cond to determine the latest defenition.  */
+      while (true)
+	{
+	  if (TREE_CODE (opnd1) == SSA_NAME)
+	    {
+	      curr = SSA_NAME_DEF_STMT (opnd1);
+	      /* Skip defenition in other bb's.  */
+	      if (gimple_bb (curr) == bb)
+		{
+		  if (last == NULL)
+		    last = curr;
+		  else
+		    {
+		      /* Determine what stmt is latest in bb.  */
+		      gimple_stmt_iterator gsi;
+		      gimple stmt;
+		      for (gsi = gsi_last_bb (bb);
+			   !gsi_end_p (gsi);
+			    gsi_prev (&gsi))
+			if ((stmt = gsi_stmt (gsi)) == last)
+			  break;
+			else if (stmt == curr)
+			  {
+			    last = curr;
+			    break;
+			  }
+		    }
+		}
+	    }
+	    if (opnd2 != NULL_TREE)
+	      {
+		opnd1 = opnd2;
+		opnd2 = NULL_TREE;
+	      }
+	    else
+	      break;
+	}
+    }
+
+  if (last == NULL)
+    {
+      *before = true;
+      return gsi_after_labels (bb);
+    }
+  *before = false;
+  return gsi_for_stmt (last);
+}
+
 /* Replaces in LOOP all the scalar phi nodes other than those in the
    LOOP->header block with conditional modify expressions.  */
 
@@ -1633,6 +1987,7 @@ predicate_all_scalar_phis (struct loop *loop)
   basic_block bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
   unsigned int i;
+  bool before = false;
 
   for (i = 1; i < orig_loop_num_nodes; i++)
     {
@@ -1653,11 +2008,17 @@ predicate_all_scalar_phis (struct loop *loop)
 	 appropriate condition for the PHI node replacement.  */
       gsi = gsi_after_labels (bb);
       true_bb = find_phi_replacement_condition (bb, &cond, &gsi);
+      if (!true_bb)
+	/* Will use extended predication, find out insertion point.  */
+	gsi = find_insertion_point (bb, &before);
 
       while (!gsi_end_p (phi_gsi))
 	{
 	  phi = gsi_stmt (phi_gsi);
-	  predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  if (true_bb)
+	    predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  else
+	    predicate_arbitrary_scalar_phi (phi, &gsi, before);
 	  release_phi_node (phi);
 	  gsi_next (&phi_gsi);
 	}
@@ -1673,13 +2034,12 @@ static void
 insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
 {
   unsigned int i;
-
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
       gimple_seq stmts;
 
-      if (!is_predicated (bb))
+      if (!is_predicated (bb) && bb_predicate_gimplified_stmts (bb) == NULL)
 	{
 	  /* Do not insert statements for a basic block that is not
 	     predicated.  Also make sure that the predicate of the
@@ -1692,7 +2052,8 @@ insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
       if (stmts)
 	{
 	  if (flag_tree_loop_if_convert_stores
-	      || any_mask_load_store)
+	      || any_mask_load_store
+	      || flag_force_vectorize)
 	    {
 	      /* Insert the predicate of the BB just after the label,
 		 as the if-conversion of memory writes will use this
@@ -1849,7 +2210,7 @@ predicate_mem_writes (loop_p loop)
 	  swap = true;
 	  cond = TREE_OPERAND (cond, 0);
 	}
-
+      cond = convert_name_to_cmp (cond);
       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 	if (!gimple_assign_single_p (stmt = gsi_stmt (gsi)))
 	  continue;
@@ -2102,6 +2463,7 @@ version_loop_for_if_conversion (struct loop *loop)
   return true;
 }
 
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -2113,6 +2475,15 @@ tree_if_conversion (struct loop *loop)
   ifc_bbs = NULL;
   bool any_mask_load_store = false;
 
+  flag_force_vectorize = loop->force_vectorize;
+  /* Check either outer loop was marked with simd pragma.  */
+  if (!flag_force_vectorize)
+    {
+      struct loop *outer_loop = loop_outer (loop);
+      if (outer_loop && outer_loop->force_vectorize)
+	flag_force_vectorize = true;
+    }
+
   if (!if_convertible_loop_p (loop, &any_mask_load_store)
       || !dbg_cnt (if_conversion_tree))
     goto cleanup;
@@ -2122,7 +2493,9 @@ tree_if_conversion (struct loop *loop)
 	  || loop->dont_vectorize))
     goto cleanup;
 
-  if (any_mask_load_store && !version_loop_for_if_conversion (loop))
+  if ((any_mask_load_store
+       || (loop->force_vectorize && flag_tree_loop_if_convert != 1))
+      && !version_loop_for_if_conversion (loop))
     goto cleanup;
 
   /* Now all statements are if-convertible.  Combine all the basic
@@ -2143,7 +2516,15 @@ tree_if_conversion (struct loop *loop)
       unsigned int i;
 
       for (i = 0; i < loop->num_nodes; i++)
-	free_bb_predicate (ifc_bbs[i]);
+	{
+	  basic_block bb = ifc_bbs[i];
+	  free_bb_predicate (bb);
+	  if (EDGE_COUNT (bb->succs) == 2)
+	    {
+	      EDGE_SUCC (bb, 0)->aux = NULL;
+	      EDGE_SUCC (bb, 1)->aux = NULL;
+	    }
+	}
 
       free (ifc_bbs);
       ifc_bbs = NULL;

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-09-22  8:28 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-25 14:06 [PATCH] Extended if-conversion for loops marked with pragma omp simd Yuri Rumyantsev
2014-07-14 10:16 ` Yuri Rumyantsev
2014-07-14 12:16   ` Richard Biener
2014-07-28 11:22     ` Yuri Rumyantsev
2014-08-01  9:40 ` Richard Biener
2014-08-15 12:02   ` Yuri Rumyantsev
2014-09-08 11:03     ` Yuri Rumyantsev
2014-09-08 13:10     ` Richard Biener
2014-09-22  8:28       ` Yuri Rumyantsev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).