public inbox for gcc-patches@gcc.gnu.org
* [PATCH 2/3] Extended if-conversion
@ 2014-11-12 13:36 Yuri Rumyantsev
  2014-11-28 12:46 ` Richard Biener
  0 siblings, 1 reply; 22+ messages in thread
From: Yuri Rumyantsev @ 2014-11-12 13:36 UTC
  To: gcc-patches, Richard Biener, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 3115 bytes --]

Hi All,

Here is the second patch related to extended predication.
A few comments that explain the main design goals:

1. I don't want to insert any critical edge splitting since it may
lead to less efficient binaries.
2. One special case of extended PHI node predication was introduced:
the PHI has more than 2 arguments, but only two of them are different
and one of those occurs only once. For such a PHI, conditional
scalar reduction is applied.
This corresponds to the following statement:
    if (q1 && q2 && q3) var++
A new function, phi_has_two_different_args, was introduced to detect such a PHI.
3. The original algorithm for PHI predication assumed that at
least one incoming edge of a block containing a PHI is not critical. That
guarantees that all computations related to the predicate of a normal edge
are already inserted above this block, so
code related to PHI predication can be inserted at the beginning of
the block. But this is not true for critical edges, for which the predicate
computations are in the block where the code for PHI predication must be
inserted. So a new function, find_insertion_point, is introduced; it
simply finds the last statement in the block that defines predicates
corresponding to the incoming edges, and the PHI predication code is
inserted after it (with some minor exceptions).
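
The special case in item 2 can be illustrated outside of GCC. The following is only a minimal sketch, assuming PHI arguments are modelled as plain ints in an array; the function name two_distinct_one_unique is hypothetical and this is not the patch's phi_has_two_different_args, which works on GIMPLE trees and operand_equal_p.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PHI node: ARGS holds one value per
   incoming edge.  Returns true when exactly two distinct values occur
   and one of them occurs exactly once; *UNIQUE_IDX then receives the
   index of that single occurrence.  */
static bool
two_distinct_one_unique (const int *args, size_t n, size_t *unique_idx)
{
  assert (args != NULL && unique_idx != NULL && n >= 2);

  int a = args[0];          /* first distinct value */
  int b = 0;                /* second distinct value, valid if b_count > 0 */
  size_t a_count = 1, b_count = 0, b_idx = 0;

  for (size_t i = 1; i < n; i++)
    {
      if (args[i] == a)
        a_count++;
      else if (b_count == 0)
        {
          b = args[i];
          b_idx = i;
          b_count = 1;
        }
      else if (args[i] == b)
        b_count++;
      else
        return false;       /* a third distinct value was found */
    }

  if (b_count == 0)
    return false;           /* all arguments are equal */
  if (a_count == 1)
    *unique_idx = 0;        /* the first argument is the unique one */
  else if (b_count == 1)
    *unique_idx = b_idx;
  else
    return false;           /* both values occur more than once */
  return true;
}
```

With arguments shaped like PHI <x, x, y>, the sketch reports index 2, i.e. the edge whose predicate alone would be enough to select between the two values.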

ChangeLog:

2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>

* tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
FLAG_FORCE_VECTORIZE instead of loop flag.
(if_convertible_bb_p): Allow bb to have more than 2 predecessors if
FLAG_FORCE_VECTORIZE is true.
(if_convertible_bb_p): Delete check that bb has at least one
non-critical incoming edge.
(phi_has_two_different_args): New function.
(is_cond_scalar_reduction): Add argument EXTENDED to choose access
to phi arguments. Invoke phi_has_two_different_args to get phi
arguments if EXTENDED is true. Change check that block
containing reduction statement candidate is predecessor
of phi-block since phi may have more than two arguments.
(convert_scalar_cond_reduction): Add argument BEFORE to insert
statement before/after gsi point.
(predicate_scalar_phi): Add argument false (which means non-extended
predication) to call of is_cond_scalar_reduction. Add argument
true (which corresponds to argument BEFORE) to call of
convert_scalar_cond_reduction.
(get_predicate_for_edge): New function.
(predicate_arbitrary_scalar_phi): New function.
(predicate_extended_scalar_phi): New function.
(find_insertion_point): New function.
(predicate_all_scalar_phis): Add two boolean variables EXTENDED and
BEFORE. Initialize EXTENDED to true if BB containing phi has more
than 2 predecessors or both incoming edges are critical. Invoke
find_phi_replacement_condition and predicate_scalar_phi or
find_insertion_point and predicate_extended_scalar_phi depending on
EXTENDED value.
(insert_gimplified_predicates): Add check that non-predicated block
may have statements to insert. Insert predicate of BB just after label
if FLAG_FORCE_VECTORIZE is true.
(tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
is a copy of inner or outer loop field force_vectorize.

[-- Attachment #2: if-conv.patch2 --]
[-- Type: application/octet-stream, Size: 17365 bytes --]

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index a6cbffc..b107116 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -792,7 +792,7 @@ ifcvt_can_use_mask_load_store (gimple stmt)
   basic_block bb = gimple_bb (stmt);
   bool is_load;
 
-  if (!(flag_tree_loop_vectorize || bb->loop_father->force_vectorize)
+  if (!(flag_tree_loop_vectorize || flag_force_vectorize)
       || bb->loop_father->dont_vectorize
       || !gimple_assign_single_p (stmt)
       || gimple_has_volatile_ops (stmt))
@@ -1016,10 +1016,15 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "----------[%d]-------------\n", bb->index);
 
-  if (EDGE_COUNT (bb->preds) > 2
-      || EDGE_COUNT (bb->succs) > 2)
+  if (EDGE_COUNT (bb->succs) > 2)
     return false;
 
+  if (EDGE_COUNT (bb->preds) > 2)
+    {
+      if (!flag_force_vectorize)
+	return false;
+    }
+
   if (exit_bb)
     {
       if (bb != loop->latch)
@@ -1053,23 +1058,6 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
 	return false;
       }
 
-  /* At least one incoming edge has to be non-critical as otherwise edge
-     predicates are not equal to basic-block predicates of the edge
-     source. This restriction will be removed after adding support for
-     extended predication.  */
-  if (EDGE_COUNT (bb->preds) > 1
-      && bb != loop->header)
-    {
-      if (!flag_force_vectorize && all_preds_critical_p (bb))
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "only critical predecessors in bb#%d\n",
-		      bb->index);
-
-	  return false;
-	}
-    }
-
   return true;
 }
 
@@ -1473,6 +1461,66 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
   return first_edge->src;
 }
 
+/* Returns true if phi arguments are equal except for one; argument values and
+   index of exclusive argument are saved if needed.  */
+
+static bool
+phi_has_two_different_args (gimple phi, tree *arg_0, tree *arg_1,
+			    unsigned int *index)
+{
+  unsigned int i, ind0 = 0, ind1;
+  tree arg0, arg1 = NULL_TREE;
+  bool seen_same = false;
+
+  arg0 = gimple_phi_arg_def (phi, 0);
+  for (i = 1; i < gimple_phi_num_args (phi); i++)
+    {
+      tree tmp;
+      tmp = gimple_phi_arg_def (phi, i);
+      if (arg0 == NULL_TREE
+	  && operand_equal_p (tmp, arg1, 0) == 0)
+	{
+	  arg0 = tmp;
+	  ind0 = i;
+	}
+      else if (seen_same && operand_equal_p (tmp, arg1, 0) != 0)
+	continue;
+      else if (operand_equal_p (tmp, arg0, 0) == 0)
+	{
+	  if (arg1 == NULL_TREE)
+	    {
+	      arg1 = tmp;
+	      ind1 = i;
+	    }
+	  else if (operand_equal_p (tmp, arg1, 0) == 0)
+	    return false;
+	  else
+	    seen_same = true;
+	}
+      else if (!seen_same)
+	{
+	  /* Swap arguments.  */
+	  seen_same = true;
+	  arg0 = arg1;
+	  arg1 = tmp;
+	  ind0 = ind1;
+	}
+      else
+	return false;
+    }
+  if (arg0 == NULL_TREE)
+    return false;
+
+  if (arg_0)
+    *arg_0 = arg0;
+  if (arg_1)
+    *arg_1 = arg1;
+  if (index)
+    *index = ind0;
+
+  return true;
+}
+
 /* Returns true if def-stmt for phi argument ARG is simple increment/decrement
    which is in predicated basic block.
    In fact, the following PHI pattern is searching:
@@ -1483,11 +1531,12 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
 	  reduc_3 = ...
 	reduc_2 = PHI <reduc_1, reduc_3>
 
-   REDUC, OP0 and OP1 contain reduction stmt and its operands.  */
+   REDUC, OP0 and OP1 contain reduction stmt and its operands.
+   EXTENDED is used to get phi arguments through different methods.  */
 
 static bool
 is_cond_scalar_reduction (gimple phi, gimple *reduc,
-			  tree *op0, tree *op1)
+			  tree *op0, tree *op1, bool extended)
 {
   tree lhs, r_op1, r_op2;
   tree arg_0, arg_1;
@@ -1499,9 +1548,18 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
   edge latch_e = loop_latch_edge (loop);
   imm_use_iterator imm_iter;
   use_operand_p use_p;
+  edge e;
+  edge_iterator ei;
+  bool result = false;
 
-  arg_0 = PHI_ARG_DEF (phi, 0);
-  arg_1 = PHI_ARG_DEF (phi, 1);
+  if (!extended)
+    {
+      arg_0 = PHI_ARG_DEF (phi, 0);
+      arg_1 = PHI_ARG_DEF (phi, 1);
+    }
+  else
+    /* Phi may have more than 2 arguments, but only two are different.  */
+    phi_has_two_different_args (phi, &arg_0, &arg_1, NULL);
   if (TREE_CODE (arg_0) != SSA_NAME || TREE_CODE (arg_1) != SSA_NAME)
     return false;
 
@@ -1536,8 +1594,13 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
     return false;
 
   /* Check that stmt-block is predecessor of phi-block.  */
-  if (EDGE_PRED (bb, 0)->src != gimple_bb (stmt)
-      && EDGE_PRED (bb, 1)->src != gimple_bb (stmt))
+  FOR_EACH_EDGE (e, ei, gimple_bb (stmt)->succs)
+    if (e->dest == bb)
+      {
+	result = true;
+	break;
+      }
+  if (!result)
     return false;
 
   if (!has_single_use (lhs))
@@ -1593,11 +1656,14 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
     res_2 = res_13 + _ifc__1;
   Argument SWAP tells that arguments of conditional expression should be
   swapped.
+  Argument BEFORE is used to insert new statement before/after.
+
   Returns rhs of resulting PHI assignment.  */
 
 static tree
 convert_scalar_cond_reduction (gimple reduc, gimple_stmt_iterator *gsi,
-			       tree cond, tree op0, tree op1, bool swap)
+			       tree cond, tree op0, tree op1, bool swap,
+			       bool before)
 {
   gimple_stmt_iterator stmt_it;
   gimple new_assign;
@@ -1622,7 +1688,10 @@ convert_scalar_cond_reduction (gimple reduc, gimple_stmt_iterator *gsi,
 
   /* Create assignment stmt and insert it at GSI.  */
   new_assign = gimple_build_assign (tmp, c);
-  gsi_insert_before (gsi, new_assign, GSI_SAME_STMT);
+  if (before)
+    gsi_insert_before (gsi, new_assign, GSI_SAME_STMT);
+  else
+    gsi_insert_after (gsi, new_assign, GSI_NEW_STMT);
   /* Build rhs for unconditional increment/decrement.  */
   rhs = fold_build2 (gimple_assign_rhs_code (reduc),
 		     TREE_TYPE (rhs1), op0, tmp);
@@ -1690,10 +1759,11 @@ predicate_scalar_phi (gimple phi, tree cond,
 	  arg_0 = gimple_phi_arg_def (phi, 0);
 	  arg_1 = gimple_phi_arg_def (phi, 1);
 	}
-      if (is_cond_scalar_reduction (phi, &reduc, &op0, &op1))
+      if (is_cond_scalar_reduction (phi, &reduc, &op0, &op1, false))
 	/* Convert reduction stmt into vectorizable form.  */
 	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
-					     true_bb != gimple_bb (reduc));
+					     true_bb != gimple_bb (reduc),
+					     true);
       else
 	/* Build new RHS using selected condition and arguments.  */
 	rhs = fold_build_cond_expr (TREE_TYPE (res), unshare_expr (cond),
@@ -1711,6 +1781,291 @@ predicate_scalar_phi (gimple phi, tree cond,
     }
 }
 
+/* Returns predicate of edge associated with argument of phi node.  */
+
+static tree
+get_predicate_for_edge (edge e)
+{
+  tree c;
+  basic_block b = e->src;
+
+  if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
+    /* Edge E is not critical, use predicate of edge source bb.  */
+    c = bb_predicate (b);
+  else
+    /* Edge E is critical and its aux field contains predicate.  */
+    c = edge_predicate (e);
+  return c;
+}
+
+/* This is enhancement for predication of a phi node with arbitrary
+   number of arguments, i.e. for
+	x = phi (x_1, x_2, ..., x_k)
+   a chain of recurrent cond expressions will be produced.
+   For example,
+	bb_0
+	if (_5 != 0) goto bb_1 else goto bb_2
+	end_bb_0
+
+	bb_1
+	res_2 = some computations;
+	goto bb_5
+	end_bb_1
+
+	bb_2
+	if (_9 != 0) goto bb_3 else goto bb_4
+	end_bb_2
+
+	bb_3
+	res_3 = ...;
+	goto bb_5
+	end_bb_3
+
+	bb_4
+	res_4 = ...;
+	end_bb_4
+
+	bb_5
+	# res_1 = PHI <res_2(1), res_3(3), res_4(4)>
+
+    will be if-converted into chain of unconditional assignments:
+	_ifc__42 = <PRD_3> ? res_3 : res_4;
+	res_1 = _5 != 0 ? res_2 : _ifc__42;
+
+    where <PRD_3> is predicate of <bb_3>.
+
+    All created intermediate statements are inserted at GSI point
+    using value of argument BEFORE.
+    Returns cond expression corresponding to rhs of new phi
+    replacement stmt.  */
+
+static tree
+predicate_arbitrary_scalar_phi (gimple phi, gimple_stmt_iterator *gsi,
+				bool before)
+{
+  int i;
+  int num = (int) gimple_phi_num_args (phi);
+  tree last = gimple_phi_arg_def (phi, num - 1);
+  tree type = TREE_TYPE (gimple_phi_result (phi));
+  tree curr;
+  tree c;
+  gimple stmt;
+  tree lhs;
+  tree cond;
+  bool swap = false;
+
+  gcc_assert (flag_force_vectorize);
+  for (i = num - 2; i > 0; i--)
+    {
+      curr = gimple_phi_arg_def (phi, i);
+      lhs = make_temp_ssa_name (type, NULL, "_ifc_");
+      cond = get_predicate_for_edge (gimple_phi_arg_edge (phi, i));
+      swap = false;
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  cond = TREE_OPERAND (cond, 0);
+	  swap = true;
+	}
+      /* Gimplify the condition to a valid cond-expr conditional operand.  */
+      if (before)
+	cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					   is_gimple_condexpr, NULL_TREE,
+					   true, GSI_SAME_STMT);
+      else
+	cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					   is_gimple_condexpr, NULL_TREE,
+					   false, GSI_CONTINUE_LINKING);
+
+      c = fold_build_cond_expr (type, unshare_expr (cond),
+				 swap? last : curr,
+				 swap? curr : last);
+      stmt = gimple_build_assign (lhs, c);
+
+      if (before)
+	gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+      else
+	gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
+      update_stmt (stmt);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Create new assign stmt for phi arg#%d\n", i);
+	  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+	}
+      last = lhs;
+    }
+  curr = gimple_phi_arg_def (phi, 0);
+  cond = get_predicate_for_edge (gimple_phi_arg_edge (phi, 0));
+  swap = false;
+  if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+    {
+      cond = TREE_OPERAND (cond, 0);
+      swap = true;
+    }
+  if (before)
+    cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+				       is_gimple_condexpr, NULL_TREE, true,
+				       GSI_SAME_STMT);
+  else
+    cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+				       is_gimple_condexpr, NULL_TREE, false,
+				       GSI_CONTINUE_LINKING);
+
+  return fold_build_cond_expr (type,
+			       unshare_expr (cond),
+			       swap? last : curr,
+			       swap? curr : last);
+}
+
+/* Replace scalar phi node with more than 2 arguments. Distinguish
+   one important particular case: the phi has only two different
+   arguments and one of them has a single occurrence.  */
+
+static void
+predicate_extended_scalar_phi (gimple phi, gimple_stmt_iterator *gsi,
+			       bool before)
+{
+  gimple new_stmt, reduc;
+  tree rhs, res, arg0, arg1, op0, op1;
+  tree cond;
+  unsigned int index0;
+  edge e;
+  bool swap = false;
+
+  res = gimple_phi_result (phi);
+  if (virtual_operand_p (res))
+    return;
+
+  if (!phi_has_two_different_args (phi, &arg0, &arg1, &index0))
+    rhs = predicate_arbitrary_scalar_phi (phi, gsi, before);
+  else
+    {
+      e = gimple_phi_arg_edge (phi, index0);
+      cond = get_predicate_for_edge (e);
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  swap = true;
+	  cond = TREE_OPERAND (cond, 0);
+	}
+      /* Gimplify the condition to a valid cond-expr conditional operand.  */
+      if (before)
+	cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					   is_gimple_condexpr, NULL_TREE,
+					   true, GSI_SAME_STMT);
+      else
+	cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					   is_gimple_condexpr, NULL_TREE,
+					   false, GSI_CONTINUE_LINKING);
+      if (!(is_cond_scalar_reduction (phi, &reduc, &op0, &op1, true)))
+	rhs = fold_build_cond_expr (TREE_TYPE (res), unshare_expr (cond),
+				    swap? arg1 : arg0,
+				    swap? arg0 : arg1);
+      else
+	/* Convert reduction stmt into vectorizable form.  */
+	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
+					     swap, before);
+    }
+  new_stmt = gimple_build_assign (res, rhs);
+  if (before)
+    gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  else
+    gsi_insert_after (gsi, new_stmt, GSI_NEW_STMT);
+  update_stmt (new_stmt);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "new extended phi replacement stmt\n");
+      print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+}
+
+/* Returns gimple statement iterator to insert code for predicated phi.  */
+
+static gimple_stmt_iterator
+find_insertion_point (basic_block bb, bool* before)
+{
+  edge e;
+  edge_iterator ei;
+  tree cond;
+  gimple last = NULL;
+  gimple curr;
+  int num_opnd;
+  tree opnd1, opnd2;
+
+  /* Find last statement in bb after which code for predicated phi can be
+     inserted using edge predicates.  */
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      cond = get_predicate_for_edge (e);
+      if (TREE_CODE (cond) == SSA_NAME)
+	{
+	  opnd1 = cond;
+	  opnd2 = NULL_TREE;
+	}
+      else if (TREE_CONSTANT (cond))
+	continue;
+      else if ((num_opnd = TREE_OPERAND_LENGTH (cond)) == 2)
+	{
+	  opnd1 = TREE_OPERAND (cond, 0);
+	  opnd2 = TREE_OPERAND (cond, 1);
+	}
+      else
+	{
+	  gcc_assert (num_opnd == 1);
+	  opnd1 = TREE_OPERAND (cond, 0);
+	  opnd2 = NULL_TREE;
+	}
+      /* Process each operand of cond to determine the latest definition.  */
+      while (true)
+	{
+	  if (TREE_CODE (opnd1) == SSA_NAME)
+	    {
+	      curr = SSA_NAME_DEF_STMT (opnd1);
+	      /* Skip definition in other bb's.  */
+	      if (gimple_bb (curr) == bb)
+		{
+		  if (last == NULL)
+		    last = curr;
+		  else
+		    {
+		      /* Determine what stmt is latest in bb.  */
+		      gimple_stmt_iterator gsi;
+		      gimple stmt;
+		      for (gsi = gsi_last_bb (bb);
+			   !gsi_end_p (gsi);
+			    gsi_prev (&gsi))
+			if ((stmt = gsi_stmt (gsi)) == last)
+			  break;
+			else if (stmt == curr)
+			  {
+			    last = curr;
+			    break;
+			  }
+		    }
+		}
+	    }
+	    if (opnd2 != NULL_TREE)
+	      {
+		opnd1 = opnd2;
+		opnd2 = NULL_TREE;
+	      }
+	    else
+	      break;
+	}
+    }
+
+  if (last == NULL)
+    {
+      *before = true;
+      return gsi_after_labels (bb);
+    }
+  *before = false;
+  return gsi_for_stmt (last);
+}
+
 /* Replaces in LOOP all the scalar phi nodes other than those in the
    LOOP->header block with conditional modify expressions.  */
 
@@ -1720,6 +2075,8 @@ predicate_all_scalar_phis (struct loop *loop)
   basic_block bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
   unsigned int i;
+  bool extended;
+  bool before = false;
 
   for (i = 1; i < orig_loop_num_nodes; i++)
     {
@@ -1736,15 +2093,28 @@ predicate_all_scalar_phis (struct loop *loop)
       if (gsi_end_p (phi_gsi))
 	continue;
 
-      /* BB has two predecessors.  Using predecessor's aux field, set
-	 appropriate condition for the PHI node replacement.  */
       gsi = gsi_after_labels (bb);
-      true_bb = find_phi_replacement_condition (bb, &cond, &gsi);
+      /* If BB has more than 2 predecessors or all incoming edges to bb
+	 are critical, must handle PHI through extended predication.  */
+      extended = EDGE_COUNT (bb->preds) != 2 || all_preds_critical_p (bb);
+
+      if (!extended)
+	{
+	  /* BB has two predecessors.  Using predecessor's aux field, set
+	     appropriate condition for the PHI node replacement.  */
+	  gsi = gsi_after_labels (bb);
+	  true_bb = find_phi_replacement_condition (bb, &cond, &gsi);
+	}
+      else
+	gsi = find_insertion_point (bb, &before);
 
       while (!gsi_end_p (phi_gsi))
 	{
 	  phi = gsi_stmt (phi_gsi);
-	  predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  if (!extended)
+	    predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  else
+	    predicate_extended_scalar_phi (phi, &gsi, before);
 	  release_phi_node (phi);
 	  gsi_next (&phi_gsi);
 	}
@@ -1766,7 +2136,8 @@ insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
       basic_block bb = ifc_bbs[i];
       gimple_seq stmts;
 
-      if (!is_predicated (bb))
+      /* Non-predicated join blocks can have the statements to insert.  */
+      if (!is_predicated (bb) && bb_predicate_gimplified_stmts (bb) == NULL)
 	{
 	  /* Do not insert statements for a basic block that is not
 	     predicated.  Also make sure that the predicate of the
@@ -1779,7 +2150,8 @@ insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
       if (stmts)
 	{
 	  if (flag_tree_loop_if_convert_stores
-	      || any_mask_load_store)
+	      || any_mask_load_store
+	      || flag_force_vectorize)
 	    {
 	      /* Insert the predicate of the BB just after the label,
 		 as the if-conversion of memory writes will use this
@@ -2200,8 +2572,14 @@ tree_if_conversion (struct loop *loop)
   ifc_bbs = NULL;
   bool any_mask_load_store = false;
 
-  /* Temporary set up this flag to false.  */
-  flag_force_vectorize = false;
+  flag_force_vectorize = loop->force_vectorize;
+  /* Check whether the outer loop was marked with simd pragma.  */
+  if (!flag_force_vectorize)
+    {
+      struct loop *outer_loop = loop_outer (loop);
+      if (outer_loop && outer_loop->force_vectorize)
+	flag_force_vectorize = true;
+    }
 
   if (!if_convertible_loop_p (loop, &any_mask_load_store)
       || !dbg_cnt (if_conversion_tree))


* Re: [PATCH 2/3] Extended if-conversion
  2014-11-12 13:36 [PATCH 2/3] Extended if-conversion Yuri Rumyantsev
@ 2014-11-28 12:46 ` Richard Biener
  2014-12-01 15:53   ` Yuri Rumyantsev
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Biener @ 2014-11-28 12:46 UTC
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Hi All,
>
> Here is the second patch related to extended predication.
> A few comments that explain the main design goals:
>
> 1. I don't want to insert any critical edge splitting since it may
> lead to less efficient binaries.
> 2. One special case of extended PHI node predication was introduced:
> the PHI has more than 2 arguments, but only two of them are different
> and one of those occurs only once. For such a PHI, conditional
> scalar reduction is applied.
> This corresponds to the following statement:
>     if (q1 && q2 && q3) var++
> A new function, phi_has_two_different_args, was introduced to detect such a PHI.
> 3. The original algorithm for PHI predication assumed that at
> least one incoming edge of a block containing a PHI is not critical. That
> guarantees that all computations related to the predicate of a normal edge
> are already inserted above this block, so
> code related to PHI predication can be inserted at the beginning of
> the block. But this is not true for critical edges, for which the predicate
> computations are in the block where the code for PHI predication must be
> inserted. So a new function, find_insertion_point, is introduced; it
> simply finds the last statement in the block that defines predicates
> corresponding to the incoming edges, and the PHI predication code is
> inserted after it (with some minor exceptions).

Unfortunately the patch doesn't apply for me - I get

patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
predicate_all_scalar_phis (struct loop *loop)

A few remarks nevertheless.  I don't see why we need both
predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
Couldn't we simply sort an array of (edge, value) pairs by value
and handle equal values specially in predicate_extended_scalar_phi?
That would even make PHI <a, a, b, c, c> more optimal.
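
The sorting idea above can be sketched in isolation. This is only an illustration under stated assumptions: PHI arguments are modelled as (edge index, int value) pairs, and struct phi_arg, cmp_by_value and group_phi_args are hypothetical names, not anything in tree-if-conv.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of one PHI argument: the incoming edge index and
   the value arriving over that edge.  */
struct phi_arg
{
  int edge;
  int value;
};

/* Order by value first, then by edge index so the result is
   deterministic even though qsort is not a stable sort.  */
static int
cmp_by_value (const void *pa, const void *pb)
{
  const struct phi_arg *a = pa;
  const struct phi_arg *b = pb;

  if (a->value != b->value)
    return (a->value > b->value) - (a->value < b->value);
  return (a->edge > b->edge) - (a->edge < b->edge);
}

/* Sorts ARGS by value and returns the number of groups of equal
   values.  After the call, equal values are adjacent.  */
static size_t
group_phi_args (struct phi_arg *args, size_t n)
{
  assert (args != NULL && n > 0);

  qsort (args, n, sizeof *args, cmp_by_value);

  size_t groups = 1;
  for (size_t i = 1; i < n; i++)
    if (args[i].value != args[i - 1].value)
      groups++;
  return groups;
}
```

For a PHI shaped like <a, a, b, c, c> this yields three groups, so presumably only one select per group would be needed, with each group's condition combining the predicates of its edges.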

I don't understand the need for find_insertion_point.  All SSA names
required for the predicates are defined upward - and the complex CFG
is squashed to a single basic-block, thus the defs will dominate the
inserted code if you insert after labels just like for the other case.
Or what am I missing?  ("flattening" of the basic-blocks of course needs
to happen in dominator order - but I guess that happens already?)

I'd like the extended PHI handling to be enablable by a flag even
for !force-vectorization - I've seen cases with 3 PHI args multiple
times that would have been nice to vectorize.  I suggest adding
-ftree-loop-if-convert-aggressive for this.  We can do this as a
followup, but please rename the local flag_force_vectorize flag
to something that looks less like a flag, like simply 'aggressive'.

Otherwise patch 2 looks ok to me.

Richard.



* Re: [PATCH 2/3] Extended if-conversion
  2014-11-28 12:46 ` Richard Biener
@ 2014-12-01 15:53   ` Yuri Rumyantsev
  2014-12-02 13:28     ` Richard Biener
  0 siblings, 1 reply; 22+ messages in thread
From: Yuri Rumyantsev @ 2014-12-01 15:53 UTC
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 6882 bytes --]

Hi Richard,

I am resending patch1 and patch2 with minor changes:
1. I renamed flag_force_vectorize to aggressive_if_conv.
2. I use a static cast for the first argument of gimple_phi_arg_edge.
I am also very sorry that I sent you a bad patch.

Now let me answer your questions related to the second patch.
1. Why do we need both predicate_extended_scalar_phi and
predicate_arbitrary_scalar_phi?

Let's consider the following simple test-case:

  #pragma omp simd safelen(8)
  for (i=0; i<512; i++)
  {
    float t = a[i];
    if (t > 0.0f & t < 1.0e+17f)
      if (c[i] != 0)  /* c is integer array. */
        res += 1;
  }

Here we can see the following phi node corresponding to res:

# res_1 = PHI <res_15(3), res_15(4), res_10(5)>

It is clear that we can optimize it to a phi node with only 2 arguments,
so that only one check is needed for phi predication (for reduction in
our case), namely the predicate of bb_5. In the general case we can't do
this even if we sort all phi argument values, since we still have to
produce a chain of cond expressions to perform phi predication (see the
comments for predicate_arbitrary_scalar_phi).
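
As a rough illustration of what the single-predicate reduction means at the source level, here is a sketch that compares the branchy loop from the test-case with a predicated form. It assumes plain arrays passed as parameters; count_branchy and count_predicated are hypothetical names, and the second function only mimics the shape "res = res + _ifc_" rather than reproducing what the pass emits.

```c
#include <assert.h>
#include <stddef.h>

/* Branchy form, mirroring the test-case: three paths reach the join
   point, two of which leave RES unchanged.  */
static int
count_branchy (const float *a, const int *c, size_t n)
{
  assert (a != NULL && c != NULL);

  int res = 0;
  for (size_t i = 0; i < n; i++)
    {
      float t = a[i];
      if ((t > 0.0f) & (t < 1.0e+17f))
        if (c[i] != 0)
          res += 1;
    }
  return res;
}

/* Predicated form: one predicate (the one guarding the increment)
   selects the addend, and the addition itself is unconditional.  */
static int
count_predicated (const float *a, const int *c, size_t n)
{
  assert (a != NULL && c != NULL);

  int res = 0;
  for (size_t i = 0; i < n; i++)
    {
      float t = a[i];
      int pred = (t > 0.0f) & (t < 1.0e+17f) & (c[i] != 0);
      int ifc = pred ? 1 : 0;   /* plays the role of the _ifc_ temporary */
      res = res + ifc;
    }
  return res;
}
```

Both functions return the same count for any input, which is the property the conditional scalar reduction relies on.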
2. Why do we need to introduce find_insertion_point?
Let's consider another test-case extracted from 175.vpr (t5.c is
attached). We can see that bb_7 and bb_9, which contain phi nodes, have
only critical incoming edges, and both contain code computing edge
predicates, e.g.

<bb 7>:
# xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
_46 = xmax_17 == xmax_37;
_47 = xmax_17 == xmax_27;
_48 = _46 & _47;
_53 = xmax_17 == xmax_37;
_54 = ~_53;
_55 = xmax_17 == xmax_27;
_56 = _54 & _55;
_57 = _48 | _56;
xmax_edge_19 = xmax_edge_39 + 1;
goto <bb 11>;

It is evident that we cannot put phi predication at the beginning of
the block but need to put it after the predicate computations.
Note also that if there are no critical edges for the phi arguments, the
insertion point will be "after labels". Finally, the phi result can
be used in this block too, so we can't put the predication code at the
block end.
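
The choice of insertion point can be sketched with a simplified model. This assumes each predicate operand is described only by the position of its defining statement within the block, with -1 meaning "defined in another block"; find_insert_after is a hypothetical name and the real find_insertion_point walks GIMPLE statements instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: DEF_POS[i] is the position inside the block of
   the statement defining predicate operand i, or -1 when the operand
   is defined in another block.  Returns the position after which the
   phi predication code should go, or -1 meaning "insert after
   labels" because no operand is defined in this block.  */
static int
find_insert_after (const int *def_pos, size_t n)
{
  assert (def_pos != NULL || n == 0);

  int last = -1;
  for (size_t i = 0; i < n; i++)
    if (def_pos[i] > last)
      last = def_pos[i];   /* keep the latest definition seen so far */
  return last;
}
```

In this model, code placed right after the returned position follows every predicate definition in the block while staying as early as possible, so later uses of the phi result in the same block remain below it.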

Let me know if you still have any questions.

Best regards.
Yuri.


[-- Attachment #2: t5.c --]
[-- Type: text/x-csrc, Size: 713 bytes --]

#define N 512
#define max(x,y) (x) >= (y)? (x) : (y)
#define min(x,y) (x) <= (y)? (x) : (y)
int c_X[N];
int x_max;
int x_min;
extern int nx;

void foo (int n)
{
  int i, x;
  int xmin, xmax;
  int xmin_edge, xmax_edge;

  x = c_X[0];
  xmin = xmax = x;
  xmin_edge = xmax_edge  = 1;
#pragma omp simd safelen(8)
  for (i = 1; i<n; i++) {
    x = c_X[i];
    x = max(min(x,nx),1);
    if (x == xmin) {  
       xmin_edge++;
    }
    if (x == xmax) {
       xmax_edge++;
    }
    else if (x < xmin) {
       xmin = x;
       xmin_edge = 1;
    }
    else if (x > xmax) {
       xmax = x;
       xmax_edge = 1;
    }

  }
    x_max = xmax_edge;
    x_min = xmin_edge;
}

   

[-- Attachment #3: if-conv.patch1.1 --]
[-- Type: application/octet-stream, Size: 7951 bytes --]

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index f7befac..6c9ad32 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -131,6 +131,9 @@ along with GCC; see the file COPYING3.  If not see
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
 
+/* Apply more aggressive (extended) if-conversion if true.  */
+static bool aggressive_if_conv;
+
 /* Structure used to predicate basic blocks.  This is attached to the
    ->aux field of the BBs in the loop to be if-converted.  */
 typedef struct bb_predicate_s {
@@ -160,6 +163,17 @@ bb_predicate (basic_block bb)
   return ((bb_predicate_p) bb->aux)->predicate;
 }
 
+/* Returns predicate for critical edge E.  */
+
+static inline tree
+edge_predicate (edge e)
+{
+  gcc_assert (EDGE_COUNT (e->src->succs) >= 2);
+  gcc_assert (EDGE_COUNT (e->dest->preds) >= 2);
+  gcc_assert (e->aux != NULL);
+  return (tree) e->aux;
+}
+
 /* Sets the gimplified predicate COND for basic block BB.  */
 
 static inline void
@@ -171,6 +185,16 @@ set_bb_predicate (basic_block bb, tree cond)
   ((bb_predicate_p) bb->aux)->predicate = cond;
 }
 
+/* Sets predicate COND for critical edge E.
+   Assumes that #(E->src->succs) >=2 & #(E->dest->preds) >= 2.  */
+
+static inline void
+set_edge_predicate (edge e, tree cond)
+{
+  gcc_assert (cond != NULL_TREE);
+  e->aux = cond;
+}
+
 /* Returns the sequence of statements of the gimplification of the
    predicate for basic block BB.  */
 
@@ -485,10 +509,16 @@ add_to_dst_predicate_list (struct loop *loop, edge e,
     cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
 			prev_cond, cond);
 
-  add_to_predicate_list (loop, e->dest, cond);
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
+    add_to_predicate_list (loop, e->dest, cond);
+
+  /* If edge E is critical save predicate on it.
+     Assume that #(e->src->succs) >= 2.  */
+  if (EDGE_COUNT (e->dest->preds) >= 2)
+    set_edge_predicate (e, cond);
 }
 
-/* Return true if one of the successor edges of BB exits LOOP.  */
+/* Returns true if one of the successor edges of BB exits LOOP.  */
 
 static bool
 bb_with_exit_edge_p (struct loop *loop, basic_block bb)
@@ -512,7 +542,9 @@ bb_with_exit_edge_p (struct loop *loop, basic_block bb)
    When the flag_tree_loop_if_convert_stores is not set, PHI is not
    if-convertible if:
    - a virtual PHI is immediately used in another PHI node,
-   - there is a virtual PHI in a BB other than the loop->header.  */
+   - there is a virtual PHI in a BB other than the loop->header.
+   When the aggressive_if_conv is set, PHI can have more than
+   two arguments.  */
 
 static bool
 if_convertible_phi_p (struct loop *loop, basic_block bb, gphi *phi,
@@ -524,11 +556,17 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gphi *phi,
       print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
     }
 
-  if (bb != loop->header && gimple_phi_num_args (phi) != 2)
+  if (bb != loop->header)
     {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "More than two phi node args.\n");
-      return false;
+      if (gimple_phi_num_args (phi) != 2)
+	{
+	  if (!aggressive_if_conv)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "More than two phi node args.\n");
+	      return false;
+	    }
+        }
     }
 
   if (flag_tree_loop_if_convert_stores || any_mask_load_store)
@@ -895,7 +933,8 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
 
    A statement is if-convertible if:
    - it is an if-convertible GIMPLE_ASSIGN,
-   - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
+   - it is a GIMPLE_LABEL or a GIMPLE_COND,
+   - it is builtins call.  */
 
 static bool
 if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
@@ -942,6 +981,22 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
   return true;
 }
 
+/* Assumes that BB has more than one predecessor.
+   Returns false if at least one predecessor is not on a critical edge
+   and true otherwise.  */
+
+static inline bool
+all_preds_critical_p (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (EDGE_COUNT (e->src->succs) == 1)
+      return false;
+  return true;
+}
+
 /* Return true when BB is if-convertible.  This routine does not check
    basic block's statements and phis.
 
@@ -950,6 +1005,9 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
    - it is after the exit block but before the latch,
    - its edges are not normal.
 
+   Last restriction will be deleted after adding support for extended
+   predication.
+
    EXIT_BB is the basic block containing the exit of the LOOP.  BB is
    inside LOOP.  */
 
@@ -1001,18 +1059,17 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
 
   /* At least one incoming edge has to be non-critical as otherwise edge
      predicates are not equal to basic-block predicates of the edge
-     source.  */
+     source. This restriction will be removed after adding support for
+     extended predication.  */
   if (EDGE_COUNT (bb->preds) > 1
       && bb != loop->header)
     {
-      bool found = false;
-      FOR_EACH_EDGE (e, ei, bb->preds)
-	if (EDGE_COUNT (e->src->succs) == 1)
-	  found = true;
-      if (!found)
+      if (!aggressive_if_conv && all_preds_critical_p (bb))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "only critical predecessors\n");
+	    fprintf (dump_file, "only critical predecessors in bb#%d\n",
+		      bb->index);
+
 	  return false;
 	}
     }
@@ -1126,11 +1183,12 @@ predicate_bbs (loop_p loop)
       tree cond;
       gimple stmt;
 
-      /* The loop latch is always executed and has no extra conditions
-	 to be processed: skip it.  */
-      if (bb == loop->latch)
+      /* The loop latch and loop exit block are always executed and
+	 have no extra conditions to be processed: skip them.  */
+      if (bb == loop->latch
+	  || bb_with_exit_edge_p (loop, bb))
 	{
-	  reset_bb_predicate (loop->latch);
+	  reset_bb_predicate (bb);
 	  continue;
 	}
 
@@ -1141,7 +1199,7 @@ predicate_bbs (loop_p loop)
 	  tree c2;
 	  edge true_edge, false_edge;
 	  location_t loc = gimple_location (stmt);
-	  tree c = fold_build2_loc (loc, gimple_cond_code (stmt),
+	  tree c = build2_loc (loc, gimple_cond_code (stmt),
 				    boolean_type_node,
 				    gimple_cond_lhs (stmt),
 				    gimple_cond_rhs (stmt));
@@ -1150,6 +1208,8 @@ predicate_bbs (loop_p loop)
 	  extract_true_false_edges_from_block (gimple_bb (stmt),
 					       &true_edge, &false_edge);
 
+          true_edge->aux = false_edge->aux = NULL;
+
 	  /* If C is true, then TRUE_EDGE is taken.  */
 	  add_to_dst_predicate_list (loop, true_edge, unshare_expr (cond),
 				     unshare_expr (c));
@@ -2145,6 +2205,9 @@ tree_if_conversion (struct loop *loop)
   ifc_bbs = NULL;
   bool any_mask_load_store = false;
 
+  /* Temporary set up this flag to false.  */
+  aggressive_if_conv = false;
+
   if (!if_convertible_loop_p (loop, &any_mask_load_store)
       || !dbg_cnt (if_conversion_tree))
     goto cleanup;
@@ -2154,7 +2217,9 @@ tree_if_conversion (struct loop *loop)
 	  || loop->dont_vectorize))
     goto cleanup;
 
-  if (any_mask_load_store && !version_loop_for_if_conversion (loop))
+  if ((any_mask_load_store
+       || (loop->force_vectorize && flag_tree_loop_if_convert != 1))
+      && !version_loop_for_if_conversion (loop))
     goto cleanup;
 
   /* Now all statements are if-convertible.  Combine all the basic
@@ -2175,7 +2240,15 @@ tree_if_conversion (struct loop *loop)
       unsigned int i;
 
       for (i = 0; i < loop->num_nodes; i++)
-	free_bb_predicate (ifc_bbs[i]);
+	{
+	  basic_block bb = ifc_bbs[i];
+	  free_bb_predicate (bb);
+	  if (EDGE_COUNT (bb->succs) == 2)
+	    {
+	      EDGE_SUCC (bb, 0)->aux = NULL;
+	      EDGE_SUCC (bb, 1)->aux = NULL;
+	    }
+	}
 
       free (ifc_bbs);
       ifc_bbs = NULL;

[-- Attachment #4: if-conv.patch2.1 --]
[-- Type: application/octet-stream, Size: 17100 bytes --]

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 6c9ad32..bde2119 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -1020,10 +1020,15 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "----------[%d]-------------\n", bb->index);
 
-  if (EDGE_COUNT (bb->preds) > 2
-      || EDGE_COUNT (bb->succs) > 2)
+  if (EDGE_COUNT (bb->succs) > 2)
     return false;
 
+  if (EDGE_COUNT (bb->preds) > 2)
+    {
+      if (!aggressive_if_conv)
+	return false;
+    }
+
   if (exit_bb)
     {
       if (bb != loop->latch)
@@ -1057,23 +1062,6 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
 	return false;
       }
 
-  /* At least one incoming edge has to be non-critical as otherwise edge
-     predicates are not equal to basic-block predicates of the edge
-     source. This restriction will be removed after adding support for
-     extended predication.  */
-  if (EDGE_COUNT (bb->preds) > 1
-      && bb != loop->header)
-    {
-      if (!aggressive_if_conv && all_preds_critical_p (bb))
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "only critical predecessors in bb#%d\n",
-		      bb->index);
-
-	  return false;
-	}
-    }
-
   return true;
 }
 
@@ -1477,6 +1465,66 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
   return first_edge->src;
 }
 
+/* Returns true if phi arguments are equal except for one; argument values and
+   index of exclusive argument are saved if needed.  */
+
+static bool
+phi_has_two_different_args (gimple phi, tree *arg_0, tree *arg_1,
+			    unsigned int *index)
+{
+  unsigned int i, ind0 = 0, ind1;
+  tree arg0, arg1 = NULL_TREE;
+  bool seen_same = false;
+
+  arg0 = gimple_phi_arg_def (phi, 0);
+  for (i = 1; i < gimple_phi_num_args (phi); i++)
+    {
+      tree tmp;
+      tmp = gimple_phi_arg_def (phi, i);
+      if (arg0 == NULL_TREE
+	  && operand_equal_p (tmp, arg1, 0) == 0)
+	{
+	  arg0 = tmp;
+	  ind0 = i;
+	}
+      else if (seen_same && operand_equal_p (tmp, arg1, 0) != 0)
+	continue;
+      else if (operand_equal_p (tmp, arg0, 0) == 0)
+	{
+	  if (arg1 == NULL_TREE)
+	    {
+	      arg1 = tmp;
+	      ind1 = i;
+	    }
+	  else if (operand_equal_p (tmp, arg1, 0) == 0)
+	    return false;
+	  else
+	    seen_same = true;
+	}
+      else if (!seen_same)
+	{
+	  /* Swap arguments.  */
+	  seen_same = true;
+	  arg0 = arg1;
+	  arg1 = tmp;
+	  ind0 = ind1;
+	}
+      else
+	return false;
+    }
+  if (arg0 == NULL_TREE)
+    return false;
+
+  if (arg_0)
+    *arg_0 = arg0;
+  if (arg_1)
+    *arg_1 = arg1;
+  if (index)
+    *index = ind0;
+
+  return true;
+}
+
 /* Returns true if def-stmt for phi argument ARG is simple increment/decrement
    which is in predicated basic block.
    In fact, the following PHI pattern is searching:
@@ -1487,11 +1535,12 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
 	  reduc_3 = ...
 	reduc_2 = PHI <reduc_1, reduc_3>
 
-   REDUC, OP0 and OP1 contain reduction stmt and its operands.  */
+   REDUC, OP0 and OP1 contain reduction stmt and its operands.
+   EXTENDED is used to get phi arguments through different methods.  */
 
 static bool
 is_cond_scalar_reduction (gimple phi, gimple *reduc,
-			  tree *op0, tree *op1)
+			  tree *op0, tree *op1, bool extended)
 {
   tree lhs, r_op1, r_op2;
   tree arg_0, arg_1;
@@ -1503,9 +1552,19 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
   edge latch_e = loop_latch_edge (loop);
   imm_use_iterator imm_iter;
   use_operand_p use_p;
+  edge e;
+  edge_iterator ei;
+  bool result = false;
+
+  if (!extended)
+    {
+      arg_0 = PHI_ARG_DEF (phi, 0);
+      arg_1 = PHI_ARG_DEF (phi, 1);
+    }
+  else
+    /* Phi may have more than 2 arguments, but only two are different.  */
+    phi_has_two_different_args (phi, &arg_0, &arg_1, NULL);
 
-  arg_0 = PHI_ARG_DEF (phi, 0);
-  arg_1 = PHI_ARG_DEF (phi, 1);
   if (TREE_CODE (arg_0) != SSA_NAME || TREE_CODE (arg_1) != SSA_NAME)
     return false;
 
@@ -1540,8 +1599,13 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
     return false;
 
   /* Check that stmt-block is predecessor of phi-block.  */
-  if (EDGE_PRED (bb, 0)->src != gimple_bb (stmt)
-      && EDGE_PRED (bb, 1)->src != gimple_bb (stmt))
+  FOR_EACH_EDGE (e, ei, gimple_bb (stmt)->succs)
+    if (e->dest == bb)
+      {
+	result = true;
+	break;
+      }
+  if (!result)
     return false;
 
   if (!has_single_use (lhs))
@@ -1597,11 +1661,13 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
     res_2 = res_13 + _ifc__1;
   Argument SWAP tells that arguments of conditional expression should be
   swapped.
+  Argument BEFORE is used to insert new statement before/after.
   Returns rhs of resulting PHI assignment.  */
 
 static tree
 convert_scalar_cond_reduction (gimple reduc, gimple_stmt_iterator *gsi,
-			       tree cond, tree op0, tree op1, bool swap)
+			       tree cond, tree op0, tree op1, bool swap,
+			       bool before)
 {
   gimple_stmt_iterator stmt_it;
   gimple new_assign;
@@ -1626,7 +1692,10 @@ convert_scalar_cond_reduction (gimple reduc, gimple_stmt_iterator *gsi,
 
   /* Create assignment stmt and insert it at GSI.  */
   new_assign = gimple_build_assign (tmp, c);
-  gsi_insert_before (gsi, new_assign, GSI_SAME_STMT);
+  if (before)
+    gsi_insert_before (gsi, new_assign, GSI_SAME_STMT);
+  else
+    gsi_insert_after (gsi, new_assign, GSI_NEW_STMT);
   /* Build rhs for unconditional increment/decrement.  */
   rhs = fold_build2 (gimple_assign_rhs_code (reduc),
 		     TREE_TYPE (rhs1), op0, tmp);
@@ -1694,10 +1763,11 @@ predicate_scalar_phi (gphi *phi, tree cond,
 	  arg_0 = gimple_phi_arg_def (phi, 0);
 	  arg_1 = gimple_phi_arg_def (phi, 1);
 	}
-      if (is_cond_scalar_reduction (phi, &reduc, &op0, &op1))
+      if (is_cond_scalar_reduction (phi, &reduc, &op0, &op1, false))
 	/* Convert reduction stmt into vectorizable form.  */
 	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
-					     true_bb != gimple_bb (reduc));
+					     true_bb != gimple_bb (reduc),
+					     true);
       else
 	/* Build new RHS using selected condition and arguments.  */
 	rhs = fold_build_cond_expr (TREE_TYPE (res), unshare_expr (cond),
@@ -1715,6 +1785,288 @@ predicate_scalar_phi (gphi *phi, tree cond,
     }
 }
 
+/* Returns predicate of edge associated with argument of phi node.  */
+
+static tree
+get_predicate_for_edge (edge e)
+{
+  tree c;
+  basic_block b = e->src;
+
+  if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
+    /* Edge E is not critical, use predicate of edge source bb.  */
+    c = bb_predicate (b);
+  else
+    /* Edge E is critical and its aux field contains predicate.  */
+    c = edge_predicate (e);
+  return c;
+}
+
+/* This is enhancement for predication of a phi node with arbitrary
+   number of arguments, i.e. for
+	x = phi (x_1, x_2, ..., x_k)
+   a chain of recurrent cond expressions will be produced.
+   For example,
+	bb_0
+	if (_5 != 0) goto bb_1 else goto bb_2
+	end_bb_0
+
+	bb_1
+	res_2 = some computations;
+	goto bb_5
+	end_bb_1
+
+	bb_2
+	if (_9 != 0) goto bb_3 else goto bb_4
+	end_bb_2
+
+	bb_3
+	res_3 = ...;
+	goto bb_5
+	end_bb_3
+
+	bb4
+	res_4 = ...;
+	end_bb_4
+
+	bb_5
+	# res_1 = PHI <res_2(1), res_3(3), res_4(4)>
+
+    will be if-converted into chain of unconditional assignments:
+	_ifc__42 = <PRD_3> ? res_3 : res_4;
+	res_1 = _5 != 0 ? res_2 : _ifc__42;
+
+    where <PRD_3> is predicate of <bb_3>.
+
+    All created intermediate statements are inserted at GSI point
+    using value of argument BEFORE.
+    Returns cond expression correspondent to rhs of new phi
+    replacement stmt.  */
+
+static tree
+predicate_arbitrary_scalar_phi (gimple phi, gimple_stmt_iterator *gsi,
+				bool before)
+{
+  int i;
+  int num = (int) gimple_phi_num_args (phi);
+  tree last = gimple_phi_arg_def (phi, num - 1);
+  tree type = TREE_TYPE (gimple_phi_result (phi));
+  tree curr;
+  tree c;
+  gimple stmt;
+  tree lhs;
+  tree cond;
+  bool swap = false;
+
+  gcc_assert (aggressive_if_conv);
+  for (i = num - 2; i > 0; i--)
+    {
+      curr = gimple_phi_arg_def (phi, i);
+      lhs = make_temp_ssa_name (type, NULL, "_ifc_");
+      cond = get_predicate_for_edge (gimple_phi_arg_edge (as_a <gphi *> (phi),
+				     i));
+      swap = false;
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  cond = TREE_OPERAND (cond, 0);
+	  swap = true;
+	}
+      /* Gimplify the condition to a valid cond-expr conditional operand.  */
+      if (before)
+	cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					   is_gimple_condexpr, NULL_TREE,
+					   true, GSI_SAME_STMT);
+      else
+	cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					   is_gimple_condexpr, NULL_TREE,
+					   false, GSI_CONTINUE_LINKING);
+
+      c = fold_build_cond_expr (type, unshare_expr (cond),
+				 swap? last : curr,
+				 swap? curr : last);
+      stmt = gimple_build_assign (lhs, c);
+
+      if (before)
+	gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+      else
+	gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
+      update_stmt (stmt);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Create new assign stmt for phi arg#%d\n", i);
+	  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+	}
+      last = lhs;
+    }
+  curr = gimple_phi_arg_def (phi, 0);
+  cond = get_predicate_for_edge (gimple_phi_arg_edge (as_a <gphi *> (phi), 0));
+  swap = false;
+  if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+    {
+      cond = TREE_OPERAND (cond, 0);
+      swap = true;
+    }
+  if (before)
+    cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+				       is_gimple_condexpr, NULL_TREE, true,
+				       GSI_SAME_STMT);
+  else
+    cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+				       is_gimple_condexpr, NULL_TREE, false,
+				       GSI_CONTINUE_LINKING);
+
+  return fold_build_cond_expr (type,
+			       unshare_expr (cond),
+			       swap? last : curr,
+			       swap? curr : last);
+}
+
+/* Replace scalar phi node with more than 2 arguments. Distinguish
+   one important particular case if phi has only two different
+   arguments and one of them has the only occurrence.  */
+
+static void
+predicate_extended_scalar_phi (gimple phi, gimple_stmt_iterator *gsi,
+			       bool before)
+{
+  gimple new_stmt, reduc;
+  tree rhs, res, arg0, arg1, op0, op1;
+  tree cond;
+  unsigned int index0;
+  edge e;
+  bool swap = false;
+
+  res = gimple_phi_result (phi);
+  if (virtual_operand_p (res))
+    return;
+
+  if (!phi_has_two_different_args (phi, &arg0, &arg1, &index0))
+    rhs = predicate_arbitrary_scalar_phi (phi, gsi, before);
+  else
+    {
+      e = gimple_phi_arg_edge (as_a <gphi *> (phi), index0);
+      cond = get_predicate_for_edge (e);
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  swap = true;
+	  cond = TREE_OPERAND (cond, 0);
+	}
+      /* Gimplify the condition to a valid cond-expr conditional operand.  */
+      if (before)
+	cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					   is_gimple_condexpr, NULL_TREE,
+					   true, GSI_SAME_STMT);
+      else
+	cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					   is_gimple_condexpr, NULL_TREE,
+					   false, GSI_CONTINUE_LINKING);
+      if (!(is_cond_scalar_reduction (phi, &reduc, &op0, &op1, true)))
+	rhs = fold_build_cond_expr (TREE_TYPE (res), unshare_expr (cond),
+				    swap? arg1 : arg0,
+				    swap? arg0 : arg1);
+      else
+	/* Convert reduction stmt into vectorizable form.  */
+	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
+					     swap, before);
+    }
+  new_stmt = gimple_build_assign (res, rhs);
+  if (before)
+    gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  else
+    gsi_insert_after (gsi, new_stmt, GSI_NEW_STMT);
+  update_stmt (new_stmt);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "new extended phi replacement stmt\n");
+      print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+}
+
+/* Returns gimple statement iterator to insert code for predicated phi.  */
+
+static gimple_stmt_iterator
+find_insertion_point (basic_block bb, bool* before)
+{
+  edge e;
+  edge_iterator ei;
+  tree cond;
+  gimple last = NULL;
+  gimple curr;
+  int num_opnd;
+  tree opnd1, opnd2;
+
+  /* Find the last statement in bb after which code for predicated phi can
+     be inserted using edge predicates.  */
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      cond = get_predicate_for_edge (e);
+      if (TREE_CODE (cond) == SSA_NAME)
+	{
+	  opnd1 = cond;
+	  opnd2 = NULL_TREE;
+	}
+      else if (TREE_CONSTANT (cond))
+	continue;
+      else if ((num_opnd = TREE_OPERAND_LENGTH (cond)) == 2)
+	{
+	  opnd1 = TREE_OPERAND (cond, 0);
+	  opnd2 = TREE_OPERAND (cond, 1);
+	}
+      else
+	{
+	  gcc_assert (num_opnd == 1);
+	  opnd1 = TREE_OPERAND (cond, 0);
+	  opnd2 = NULL_TREE;
+	}
+      /* Process each operand of cond to determine the latest definition.  */
+      while (true)
+	{
+	  if (TREE_CODE (opnd1) == SSA_NAME)
+	    {
+	      curr = SSA_NAME_DEF_STMT (opnd1);
+	      /* Skip definitions in other bbs.  */
+	      if (gimple_bb (curr) == bb)
+		{
+		  if (last == NULL)
+		    last = curr;
+		  else
+		    {
+		      /* Determine what stmt is latest in bb.  */
+		      gimple_stmt_iterator gsi;
+		      gimple stmt;
+		      for (gsi = gsi_last_bb (bb);
+			   !gsi_end_p (gsi);
+			    gsi_prev (&gsi))
+			if ((stmt = gsi_stmt (gsi)) == last)
+			  break;
+			else if (stmt == curr)
+			  {
+			    last = curr;
+			    break;
+			  }
+		    }
+		}
+	    }
+	    if (opnd2 != NULL_TREE)
+	      {
+		opnd1 = opnd2;
+		opnd2 = NULL_TREE;
+	      }
+	    else
+	      break;
+	}
+    }
+
+  if (last == NULL)
+    {
+      *before = true;
+      return gsi_after_labels (bb);
+    }
+  *before = false;
+  return gsi_for_stmt (last);
+}
+
 /* Replaces in LOOP all the scalar phi nodes other than those in the
    LOOP->header block with conditional modify expressions.  */
 
@@ -1724,6 +2076,8 @@ predicate_all_scalar_phis (struct loop *loop)
   basic_block bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
   unsigned int i;
+  bool extended;
+  bool before = false;
 
   for (i = 1; i < orig_loop_num_nodes; i++)
     {
@@ -1741,15 +2095,26 @@ predicate_all_scalar_phis (struct loop *loop)
       if (gsi_end_p (phi_gsi))
 	continue;
 
-      /* BB has two predecessors.  Using predecessor's aux field, set
-	 appropriate condition for the PHI node replacement.  */
-      gsi = gsi_after_labels (bb);
-      true_bb = find_phi_replacement_condition (bb, &cond, &gsi);
+      /* If BB has more than 2 predecessors or all incoming edges to bb
+	 are critical, must handle PHI through extended predication.  */
+      extended = EDGE_COUNT (bb->preds) != 2 || all_preds_critical_p (bb);
+      if (!extended)
+	{
+	  /* BB has two predecessors.  Using predecessor's aux field, set
+	     appropriate condition for the PHI node replacement.  */
+	  gsi = gsi_after_labels (bb);
+	  true_bb = find_phi_replacement_condition (bb, &cond, &gsi);
+	}
+      else
+	gsi = find_insertion_point (bb, &before);
 
       while (!gsi_end_p (phi_gsi))
 	{
 	  phi = phi_gsi.phi ();
-	  predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  if (!extended)
+	    predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  else
+	    predicate_extended_scalar_phi (phi, &gsi, before);
 	  release_phi_node (phi);
 	  gsi_next (&phi_gsi);
 	}
@@ -1771,7 +2136,8 @@ insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
       basic_block bb = ifc_bbs[i];
       gimple_seq stmts;
 
-      if (!is_predicated (bb))
+      /* Non-predicated join blocks can have the statements to insert.  */
+      if (!is_predicated (bb) && bb_predicate_gimplified_stmts (bb) == NULL)
 	{
 	  /* Do not insert statements for a basic block that is not
 	     predicated.  Also make sure that the predicate of the
@@ -1784,7 +2150,8 @@ insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
       if (stmts)
 	{
 	  if (flag_tree_loop_if_convert_stores
-	      || any_mask_load_store)
+	      || any_mask_load_store
+	      || aggressive_if_conv)
 	    {
 	      /* Insert the predicate of the BB just after the label,
 		 as the if-conversion of memory writes will use this
@@ -2205,8 +2572,15 @@ tree_if_conversion (struct loop *loop)
   ifc_bbs = NULL;
   bool any_mask_load_store = false;
 
-  /* Temporary set up this flag to false.  */
-  aggressive_if_conv = false;
+  /* Set-up aggressive if-conversion for loops marked with simd pragma.  */
+  aggressive_if_conv = loop->force_vectorize;
+  /* Check whether outer loop was marked with simd pragma.  */
+  if (!aggressive_if_conv)
+    {
+      struct loop *outer_loop = loop_outer (loop);
+      if (outer_loop && outer_loop->force_vectorize)
+	aggressive_if_conv = true;
+    }
 
   if (!if_convertible_loop_p (loop, &any_mask_load_store)
       || !dbg_cnt (if_conversion_tree))

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] Extended if-conversion
  2014-12-01 15:53   ` Yuri Rumyantsev
@ 2014-12-02 13:28     ` Richard Biener
  2014-12-02 15:29       ` Yuri Rumyantsev
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Biener @ 2014-12-02 13:28 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Hi Richard,
>
> I am resending patch1 and patch2 with minor changes:
> 1. I renamed flag_force_vectorize to aggressive_if_conv.
> 2. Use static cast for the first argument of gimple_phi_arg_edge.
> I am also very sorry that I sent you a bad patch.
>
> Now let me answer your questions about the second patch.
> 1. Why do we need both predicate_extended_scalar_phi and
> predicate_arbitrary_scalar_phi?
>
> Let's consider the following simple test-case:
>
>   #pragma omp simd safelen(8)
>   for (i=0; i<512; i++)
>   {
>     float t = a[i];
>     if (t > 0.0f & t < 1.0e+17f)
>       if (c[i] != 0)  /* c is integer array. */
>         res += 1;
>   }
>
> we can see the following phi node corresponding to res:
>
> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>
> It is clear that we can reduce it to a phi node with only 2 distinct
> arguments, so a single check suffices for phi predication (a reduction
> in our case), namely the predicate of bb_5. In the general case we
> can't do that even if we sort all phi argument values, since we still
> have to produce a chain of cond expressions to perform phi predication
> (see the comments for predicate_arbitrary_scalar_phi).

How so?  We can always use !(condition) for the "last" value, thus
treat it as an 'else' case.  That even works for

# res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>

where the condition for edges 5 and 7 can be computed as
! (condition for 3 || condition for 4).

Of course it is worthwhile to also sort single occurrences first
so your case gets just the condition for edge 5 and its inversion
used for edges 3 and 4 combined.
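
To make the idea concrete, here is a minimal standalone sketch in C. It
is a toy model, not GCC code: struct phi_arg and eval_phi_grouped are
made-up names, each edge predicate is modelled as an already evaluated
bool, and equal integer values stand in for identical SSA names. It
assumes the arguments are already sorted so that equal values are
adjacent and that exactly one edge predicate is true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one PHI argument: the predicate of its incoming
   edge (already evaluated for one iteration) and the value it carries.
   Equal values stand for the same SSA name.  */
struct phi_arg
{
  bool edge_pred;
  int value;
};

/* Evaluate PHI <args[0..n-1]> as a chain of selects over groups of
   equal values.  The predicate of a group is the OR of the predicates
   of its edges.  The last group needs no predicate of its own: it is
   the 'else' case, i.e. !(any earlier group predicate).
   Assumes ARGS is sorted so that equal values are adjacent.  */
static int
eval_phi_grouped (const struct phi_arg *args, size_t n)
{
  assert (n > 0);
  size_t i = 0;
  while (i < n)
    {
      /* Walk the run of equal values starting at I, OR-ing predicates.  */
      size_t j = i;
      bool group_pred = false;
      while (j < n && args[j].value == args[i].value)
        group_pred |= args[j++].edge_pred;

      /* Select this group if its predicate holds, or if it is the last
         group and therefore acts as the 'else' case.  */
      if (j == n || group_pred)
        return args[i].value;
      i = j;
    }
  /* Not reached: the loop always returns for the last group.  */
  return args[n - 1].value;
}
```

For PHI <a, a, b, c, c> this model needs two group conditions (one per
group except the last) rather than one condition per argument.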

> 2. Why do we need to introduce find_insertion_point?
> Let's consider another test-case extracted from 175.vpr (t5.c is
> attached). We can see that bb_7 and bb_9, which contain phi nodes, have
> only critical incoming edges and both contain code computing edge
> predicates, e.g.
>
> <bb 7>:
> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
> _46 = xmax_17 == xmax_37;
> _47 = xmax_17 == xmax_27;
> _48 = _46 & _47;
> _53 = xmax_17 == xmax_37;
> _54 = ~_53;
> _55 = xmax_17 == xmax_27;
> _56 = _54 & _55;
> _57 = _48 | _56;
> xmax_edge_19 = xmax_edge_39 + 1;
> goto <bb 11>;
>
> It is evident that we cannot put phi predication at the beginning of
> the block but need to put it after the predicate computations.
> Note also that if there are no critical edges for the phi arguments,
> the insertion point will be "after labels". Note also that the phi
> result can be used in this block too, so we can't put the predication
> code at the end of the block.

So the issue is that predicate insertion for edge predicates does
not happen on the edge but somewhere else (generally impossible
for critical edges unless you split them).

I think I've told you before that I prefer simple solutions to such issues,
like splitting the edge!  Certainly not involving a function walking
GENERIC expressions.
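
For illustration only, a minimal standalone sketch in C of what
splitting means on a toy CFG. This is not GCC's data structures or API:
struct cfg, edge_critical_p and split_edge_model are hypothetical names.
An edge is treated as critical when its source has at least two
successors and its destination has at least two predecessors, which is
the same condition the patch asserts in edge_predicate.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 16
#define MAX_EDGES 32

/* Toy CFG: blocks are plain indices, edges are (src, dest) pairs.  */
struct edge_model
{
  int src;
  int dest;
};

struct cfg
{
  int n_blocks;
  int n_edges;
  struct edge_model edges[MAX_EDGES];
};

/* Number of outgoing edges of block BB.  */
static int
count_succs (const struct cfg *g, int bb)
{
  int n = 0;
  for (int i = 0; i < g->n_edges; i++)
    n += (g->edges[i].src == bb);
  return n;
}

/* Number of incoming edges of block BB.  */
static int
count_preds (const struct cfg *g, int bb)
{
  int n = 0;
  for (int i = 0; i < g->n_edges; i++)
    n += (g->edges[i].dest == bb);
  return n;
}

/* Edge E is critical if its source has several successors and its
   destination has several predecessors.  */
static bool
edge_critical_p (const struct cfg *g, int e)
{
  return count_succs (g, g->edges[e].src) > 1
         && count_preds (g, g->edges[e].dest) > 1;
}

/* Split edge E by routing it through a fresh empty block, which then
   gives the edge predicate a place to be computed.  Returns the index
   of the new block.  */
static int
split_edge_model (struct cfg *g, int e)
{
  assert (g->n_blocks < MAX_BLOCKS && g->n_edges < MAX_EDGES);
  int new_bb = g->n_blocks++;
  int old_dest = g->edges[e].dest;

  g->edges[e].dest = new_bb;            /* src -> new_bb */
  g->edges[g->n_edges].src = new_bb;    /* new_bb -> old dest */
  g->edges[g->n_edges].dest = old_dest;
  g->n_edges++;
  return new_bb;
}
```

In this model the new block has a single predecessor and a single
successor, so neither of the two resulting edges is critical.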

Thanks,
Richard.

> Let me know if you still have any questions.
>
> Best regards.
> Yuri.
>
>
>
>
> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Hi All,
>>>
>>> Here is the second patch related to extended predication.
>>> Few comments which explain a main goal of design.
>>>
>>> 1. I don't want to insert any critical edge splitting since it may
>>> lead to less efficient binaries.
>>> 2. One special case of extended PHI node predication was introduced
>>> when #arguments is more than 2 but only two arguments are different
>>> and one argument has the only occurrence. For such PHI conditional
>>> scalar reduction is applied.
>>> This is correspondent to the following statement:
>>>     if (q1 && q2 && q3) var++
>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>> 3. Original algorithm for PHI predication used assumption that at
>>> least one incoming edge for blocks containing PHI is not critical - it
>>> guarantees that all computations related to predicate of normal edge
>>> are already inserted above this block and
>>> code related to PHI predication can be inserted at the beginning of
>>> block. But this is not true for critical edges for which predicate
>>> computations are  in the block where code for phi predication must be
>>> inserted. So new function find_insertion_point is introduced which is
>>> simply found out the last statement in block defining predicates
>>> correspondent to all incoming edges and insert phi predication code
>>> after it (with some minor exceptions).
>>
>> Unfortunately the patch doesn't apply for me - I get
>>
>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>> predicate_all_scalar_phis (struct loop *loop)
>>
>> a few remarks nevertheless.  I don't see how we need both
>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>> Couldn't we simply sort an array of (edge, value) pairs after value
>> and handle equal values specially in predicate_extended_scalar_phi?
>> That would even make PHI <a, a, b, c, c> more optimal.
>>
>> I don't understand the need for find_insertion_point.  All SSA names
>> required for the predicates are defined upward - and the complex CFG
>> is squashed to a single basic-block, thus the defs will dominate the
>> inserted code if you insert after labels just like for the other case.
>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>> to happen in dominator order - but I guess that happens already?)
>>
>> I'd like the extended PHI handling to be enablable by a flag even
>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>> times that would have been nice to vectorize.  I suggest to
>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>> followup, but please rename the local flag_force_vectorize flag
>> to something less looking like a flag, like simply 'aggressive'.
>>
>> Otherwise patch 2 looks ok to me.
>>
>> Richard.
>>
>>
>>> ChangeLog:
>>>
>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>> FLAG_FORCE_VECTORIZE is true.
>>> (if_convertible_bb_p): Delete check that bb has at least one
>>> non-critical incoming edge.
>>> (phi_has_two_different_args): New function.
>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>> arguments if EXTENDED is true. Change check that block
>>> containing reduction statement candidate is predecessor
>>> of phi-block since phi may have more than two arguments.
>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>> statement before/after gsi point.
>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>> predication) to call of is_cond_scalar_reduction. Add argument
>>> true (which correspondent to argument BEFORE) to call of
>>> convert_scalar_cond_reduction.
>>> (get_predicate_for_edge): New function.
>>> (predicate_arbitrary_scalar_phi): New function.
>>> (predicate_extended_scalar_phi): New function.
>>> (find_insertion_point): New function.
>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>> than 2 predecessors or both incoming edges are critical. Invoke
>>> find_phi_replacement_condition and predicate_scalar_phi or
>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>> EXTENDED value.
>>> (insert_gimplified_predicates): Add check that non-predicated block
>>> may have statements to insert. Insert predicate of BB just after label
>>> if FLAG_FORCE_VECTORIZE is true.
>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>> is copy of inner or outer loop field force_vectorize.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] Extended if-conversion
  2014-12-02 13:28     ` Richard Biener
@ 2014-12-02 15:29       ` Yuri Rumyantsev
  2014-12-04 12:41         ` Richard Biener
  0 siblings, 1 reply; 22+ messages in thread
From: Yuri Rumyantsev @ 2014-12-02 15:29 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

Thanks Richard for your quick reply!

1. I agree that we can combine predicate_extended_scalar_phi and
predicate_arbitrary_scalar_phi into one function, as you proposed.
2. What is your opinion of a simpler rule for choosing the
insertion point: if the bb has a use of the phi result, insert the phi
predication before that use; otherwise insert it at the end of the bb?
I assume that critical edge splitting is not a good decision.

Best regards.
Yuri.
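
The rule proposed in point 2 can be sketched on a toy statement list.
This is only an illustration under stated assumptions: toy_stmt and
insertion_index are hypothetical names, SSA names are modelled as
plain integers, and a real block would also need care around a
trailing control statement, which the sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy statement: one definition and up to two uses.
   A use slot holding -1 is empty.  */
struct toy_stmt {
  int def;
  int uses[2];
};

/* Return the index at which phi predication code would be inserted:
   the index of the first statement that uses PHI_RESULT (insert before
   it), or N (the end of the block) when there is no such use.  */
static size_t insertion_index (const struct toy_stmt *stmts, size_t n,
			       int phi_result)
{
  assert (stmts != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    for (int k = 0; k < 2; k++)
      if (stmts[i].uses[k] == phi_result)
	return i;
  return n;
}
```

For a block whose third statement is the first to read the phi result,
the function returns 2; for a block with no use of it, it returns the
statement count, that is, the end of the block.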

2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Hi Richard,
>>
>> I resend you patch1 and patch2 with minor changes:
>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>> I also very sorry that I sent you bad patch.
>>
>> Now let me answer on your questions related to second patch.
>> 1. Why we need both predicate_extended_scalar_phi and
>> predicate_arbitrary_scalar_phi?
>>
>> Let's consider the following simple test-case:
>>
>>   #pragma omp simd safelen(8)
>>   for (i=0; i<512; i++)
>>   {
>>     float t = a[i];
>>     if (t > 0.0f & t < 1.0e+17f)
>>       if (c[i] != 0)  /* c is integer array. */
>>         res += 1;
>>   }
>>
>> we can see the following phi node correspondent to res:
>>
>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>
>> It is clear that we can optimize it to phi node with 2 arguments only
>> and only one check can be used for phi predication (for reduction in
>> our case), namely predicate of bb_5. In general case we can't do it
>> even if we sort all phi argument values since we still have to produce
>> a chain of cond expressions to perform phi predication (see comments
>> for predicate_arbitrary_scalar_phi).
>
> How so?  We can always use !(condition) for the "last" value, thus
> treat it as an 'else' case.  That even works for
>
> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>
> where the condition for edges 5 and 7 can be computed as
> ! (condition for 3 || condition for 4).
>
> Of course it is worthwhile to also sort single-occurrences first
> so your case gets just the condition for edge 5 and its inversion
> used for edges 3 and 4 combined.
>
>> 2. Why we need to introduce find_insertion_point?
>> Let's consider another test-case extracted from 175.vpr (t5.c is
>> attached) and we can see that bb_7 and bb_9 containing phi nodes have
>> only critical incoming edges and both contain code computing edge
>> predicates, e.g.
>>
>> <bb 7>:
>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>> _46 = xmax_17 == xmax_37;
>> _47 = xmax_17 == xmax_27;
>> _48 = _46 & _47;
>> _53 = xmax_17 == xmax_37;
>> _54 = ~_53;
>> _55 = xmax_17 == xmax_27;
>> _56 = _54 & _55;
>> _57 = _48 | _56;
>> xmax_edge_19 = xmax_edge_39 + 1;
>> goto <bb 11>;
>>
>> It is evident that we can not put phi predication at the block
>> beginning but need to put it after predicate computations.
>> Note also that if there are no critical edges for phi arguments
>> insertion point will be "after labels". Note also that phi result can
>> have use in this block too, so we can't put predication code to the
>> block end.
>
> So the issue is that predicate insertion for edge predicates does
> not happen on the edge but somewhere else (generally impossible
> for critical edges unless you split them).
>
> I think I've told you before that I prefer simple solutions to such issues,
> like splitting the edge!  Certainly not involving a function walking
> GENERIC expressions.
>
> Thanks,
> Richard.
>
>> Let me know if you still have any questions.
>>
>> Best regards.
>> Yuri.
>>
>>
>>
>>
>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi All,
>>>>
>>>> Here is the second patch related to extended predication.
>>>> Few comments which explain a main goal of design.
>>>>
>>>> 1. I don't want to insert any critical edge splitting since it may
>>>> lead to less efficient binaries.
>>>> 2. One special case of extended PHI node predication was introduced
>>>> when #arguments is more than 2 but only two arguments are different
>>>> and one argument has the only occurrence. For such PHI conditional
>>>> scalar reduction is applied.
>>>> This is correspondent to the following statement:
>>>>     if (q1 && q2 && q3) var++
>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>> 3. Original algorithm for PHI predication used assumption that at
>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>> guarantees that all computations related to predicate of normal edge
>>>> are already inserted above this block and
>>>> code related to PHI predication can be inserted at the beginning of
>>>> block. But this is not true for critical edges for which predicate
>>>> computations are  in the block where code for phi predication must be
>>>> inserted. So new function find_insertion_point is introduced which is
>>>> simply found out the last statement in block defining predicates
>>>> correspondent to all incoming edges and insert phi predication code
>>>> after it (with some minor exceptions).
>>>
>>> Unfortunately the patch doesn't apply for me - I get
>>>
>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>> predicate_all_scalar_phis (struct loop *loop)
>>>
>>> a few remarks nevertheless.  I don't see how we need both
>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>> and handle equal values specially in predicate_extended_scalar_phi?
>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>
>>> I don't understand the need for find_insertion_point.  All SSA names
>>> required for the predicates are defined upward - and the complex CFG
>>> is squashed to a single basic-block, thus the defs will dominate the
>>> inserted code if you insert after labels just like for the other case.
>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>> to happen in dominator order - but I guess that happens already?)
>>>
>>> I'd like the extended PHI handling to be enablable by a flag even
>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>> times that would have been nice to vectorize.  I suggest to
>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>> followup, but please rename the local flag_force_vectorize flag
>>> to something less looking like a flag, like simply 'aggressive'.
>>>
>>> Otherwise patch 2 looks ok to me.
>>>
>>> Richard.
>>>
>>>
>>>> ChangeLog:
>>>>
>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>> FLAG_FORCE_VECTORIZE is true.
>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>> non-critical incoming edge.
>>>> (phi_has_two_different_args): New function.
>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>> arguments if EXTENDED is true. Change check that block
>>>> containing reduction statement candidate is predecessor
>>>> of phi-block since phi may have more than two arguments.
>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>> statement before/after gsi point.
>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>> true (which correspondent to argument BEFORE) to call of
>>>> convert_scalar_cond_reduction.
>>>> (get_predicate_for_edge): New function.
>>>> (predicate_arbitrary_scalar_phi): New function.
>>>> (predicate_extended_scalar_phi): New function.
>>>> (find_insertion_point): New function.
>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>> EXTENDED value.
>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>> may have statements to insert. Insert predicate of BB just after label
>>>> if FLAG_FORCE_VECTORIZE is true.
>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>> is copy of inner or outer loop field force_vectorize.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] Extended if-conversion
  2014-12-02 15:29       ` Yuri Rumyantsev
@ 2014-12-04 12:41         ` Richard Biener
  2014-12-04 13:15           ` Yuri Rumyantsev
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Biener @ 2014-12-04 12:41 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Thanks Richard for your quick reply!
>
> 1. I agree that we can combine predicate_extended_ and
> predicate_arbitrary_ to one function as you proposed.
> 2. What is your opinion about using more simple decision about
> insertion point - if bb has use of phi result insert phi predication
> before it and at the bb end otherwise. I assume that critical edge
> splitting is not a good decision.

Why not always insert before the use?  Which would be after labels,
what we do for two-arg PHIs.  That is, how can it be that you predicate
a PHI in BB1 and then for an edge predicate on one of its incoming
edges you get SSA uses with defs that are in BB1 itself?  That
can only happen for backedges but those you can't remove in any case.

Richard.

>
> Best regards.
> Yuri.
>
> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Hi Richard,
>>>
>>> I resend you patch1 and patch2 with minor changes:
>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>> I also very sorry that I sent you bad patch.
>>>
>>> Now let me answer on your questions related to second patch.
>>> 1. Why we need both predicate_extended_scalar_phi and
>>> predicate_arbitrary_scalar_phi?
>>>
>>> Let's consider the following simple test-case:
>>>
>>>   #pragma omp simd safelen(8)
>>>   for (i=0; i<512; i++)
>>>   {
>>>     float t = a[i];
>>>     if (t > 0.0f & t < 1.0e+17f)
>>>       if (c[i] != 0)  /* c is integer array. */
>>>         res += 1;
>>>   }
>>>
>>> we can see the following phi node correspondent to res:
>>>
>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>
>>> It is clear that we can optimize it to phi node with 2 arguments only
>>> and only one check can be used for phi predication (for reduction in
>>> our case), namely predicate of bb_5. In general case we can't do it
>>> even if we sort all phi argument values since we still have to produce
>>> a chain of cond expressions to perform phi predication (see comments
>>> for predicate_arbitrary_scalar_phi).
>>
>> How so?  We can always use !(condition) for the "last" value, thus
>> treat it as an 'else' case.  That even works for
>>
>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>
>> where the condition for edges 5 and 7 can be computed as
>> ! (condition for 3 || condition for 4).
>>
>> Of course it is worthwhile to also sort single-occurrences first
>> so your case gets just the condition for edge 5 and its inversion
>> used for edges 3 and 4 combined.
>>
>>> 2. Why we need to introduce find_insertion_point?
>>> Let's consider another test-case extracted from 175.vpr (t5.c is
>>> attached) and we can see that bb_7 and bb_9 containing phi nodes have
>>> only critical incoming edges and both contain code computing edge
>>> predicates, e.g.
>>>
>>> <bb 7>:
>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>> _46 = xmax_17 == xmax_37;
>>> _47 = xmax_17 == xmax_27;
>>> _48 = _46 & _47;
>>> _53 = xmax_17 == xmax_37;
>>> _54 = ~_53;
>>> _55 = xmax_17 == xmax_27;
>>> _56 = _54 & _55;
>>> _57 = _48 | _56;
>>> xmax_edge_19 = xmax_edge_39 + 1;
>>> goto <bb 11>;
>>>
>>> It is evident that we can not put phi predication at the block
>>> beginning but need to put it after predicate computations.
>>> Note also that if there are no critical edges for phi arguments
>>> insertion point will be "after labels". Note also that phi result can
>>> have use in this block too, so we can't put predication code to the
>>> block end.
>>
>> So the issue is that predicate insertion for edge predicates does
>> not happen on the edge but somewhere else (generally impossible
>> for critical edges unless you split them).
>>
>> I think I've told you before that I prefer simple solutions to such issues,
>> like splitting the edge!  Certainly not involving a function walking
>> GENERIC expressions.
>>
>> Thanks,
>> Richard.
>>
>>> Let me know if you still have any questions.
>>>
>>> Best regards.
>>> Yuri.
>>>
>>>
>>>
>>>
>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Hi All,
>>>>>
>>>>> Here is the second patch related to extended predication.
>>>>> Few comments which explain a main goal of design.
>>>>>
>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>> lead to less efficient binaries.
>>>>> 2. One special case of extended PHI node predication was introduced
>>>>> when #arguments is more than 2 but only two arguments are different
>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>> scalar reduction is applied.
>>>>> This is correspondent to the following statement:
>>>>>     if (q1 && q2 && q3) var++
>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>> guarantees that all computations related to predicate of normal edge
>>>>> are already inserted above this block and
>>>>> code related to PHI predication can be inserted at the beginning of
>>>>> block. But this is not true for critical edges for which predicate
>>>>> computations are  in the block where code for phi predication must be
>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>> simply found out the last statement in block defining predicates
>>>>> correspondent to all incoming edges and insert phi predication code
>>>>> after it (with some minor exceptions).
>>>>
>>>> Unfortunately the patch doesn't apply for me - I get
>>>>
>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>
>>>> a few remarks nevertheless.  I don't see how we need both
>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>
>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>> required for the predicates are defined upward - and the complex CFG
>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>> inserted code if you insert after labels just like for the other case.
>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>> to happen in dominator order - but I guess that happens already?)
>>>>
>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>> times that would have been nice to vectorize.  I suggest to
>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>> followup, but please rename the local flag_force_vectorize flag
>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>
>>>> Otherwise patch 2 looks ok to me.
>>>>
>>>> Richard.
>>>>
>>>>
>>>>> ChangeLog:
>>>>>
>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>
>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>> non-critical incoming edge.
>>>>> (phi_has_two_different_args): New function.
>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>> arguments if EXTENDED is true. Change check that block
>>>>> containing reduction statement candidate is predecessor
>>>>> of phi-block since phi may have more than two arguments.
>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>> statement before/after gsi point.
>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>> true (which correspondent to argument BEFORE) to call of
>>>>> convert_scalar_cond_reduction.
>>>>> (get_predicate_for_edge): New function.
>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>> (predicate_extended_scalar_phi): New function.
>>>>> (find_insertion_point): New function.
>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>> EXTENDED value.
>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>> is copy of inner or outer loop field force_vectorize.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] Extended if-conversion
  2014-12-04 12:41         ` Richard Biener
@ 2014-12-04 13:15           ` Yuri Rumyantsev
  2014-12-04 13:37             ` Richard Biener
  0 siblings, 1 reply; 22+ messages in thread
From: Yuri Rumyantsev @ 2014-12-04 13:15 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

Richard,

I made a simple change: I save a gsi iterator for each bb that has
incoming critical edges, by adding an additional field to bb_predicate_s:

typedef struct bb_predicate_s {

  /* The condition under which this basic block is executed.  */
  tree predicate;

  /* PREDICATE is gimplified, and the sequence of statements is
     recorded here, in order to avoid the duplication of computations
     that occur in previous conditions.  See PR44483.  */
  gimple_seq predicate_gimplified_stmts;

  /* Insertion point for blocks having incoming critical edges.  */
  gimple_stmt_iterator gsi;
} *bb_predicate_p;

This iterator is saved in insert_gimplified_predicates before
inserting the code for predicate computation. I checked that this fix
works.

I am now merging the predicate_extended_scalar_phi and
predicate_arbitrary_scalar_phi functions, as you proposed.

Best regards.
Yuri.
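
The merge discussed in this thread rests on sorting (edge, value)
pairs by value, so that edges carrying the same PHI argument form one
group and the last group serves as the 'else' case. The following is
a minimal sketch of that idea on toy data, not the patch itself:
phi_arg, group_by_value and select_value are hypothetical names, and
edges and SSA values are modelled as integers. It does not implement
the further refinement of ordering single-occurrence values first.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy PHI argument: the incoming edge and the value it carries.  */
struct phi_arg {
  int edge;
  int value;
};

/* qsort comparator ordering arguments by value.  */
static int cmp_value (const void *a, const void *b)
{
  int va = ((const struct phi_arg *) a)->value;
  int vb = ((const struct phi_arg *) b)->value;
  return (va > vb) - (va < vb);
}

/* Sort ARGS by value and return the number of distinct values.
   A PHI with G distinct values needs G - 1 conditions.  */
static size_t group_by_value (struct phi_arg *args, size_t n)
{
  size_t groups = 0;
  qsort (args, n, sizeof *args, cmp_value);
  for (size_t i = 0; i < n; i++)
    if (i == 0 || args[i].value != args[i - 1].value)
      groups++;
  return groups;
}

/* Evaluate the predicated PHI for the edge actually taken.  Each group
   but the last is tested with the OR of its edge conditions; the last
   group is the 'else' case and needs no test.  */
static int select_value (const struct phi_arg *sorted, size_t n,
			 int taken_edge)
{
  assert (n > 0);
  size_t i = 0;
  while (i < n)
    {
      size_t j = i;
      int hit = 0;
      while (j < n && sorted[j].value == sorted[i].value)
	{
	  hit |= (sorted[j].edge == taken_edge);
	  j++;
	}
      if (hit || j == n)
	return sorted[i].value;
      i = j;
    }
  return sorted[n - 1].value;
}
```

For PHI <res_15(3), res_15(4), res_10(5)> there are two groups, so a
single condition is enough: edge 5 selects res_10 and everything else
falls through to res_15.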

2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Thanks Richard for your quick reply!
>>
>> 1. I agree that we can combine predicate_extended_ and
>> predicate_arbitrary_ to one function as you proposed.
>> 2. What is your opinion about using more simple decision about
>> insertion point - if bb has use of phi result insert phi predication
>> before it and at the bb end otherwise. I assume that critical edge
>> splitting is not a good decision.
>
> Why not always insert before the use?  Which would be after labels,
> what we do for two-arg PHIs.  That is, how can it be that you predicate
> a PHI in BB1 and then for an edge predicate on one of its incoming
> edges you get SSA uses with defs that are in BB1 itself?  That
> can only happen for backedges but those you can't remove in any case.
>
> Richard.
>
>>
>> Best regards.
>> Yuri.
>>
>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi Richard,
>>>>
>>>> I resend you patch1 and patch2 with minor changes:
>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>> I also very sorry that I sent you bad patch.
>>>>
>>>> Now let me answer on your questions related to second patch.
>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>> predicate_arbitrary_scalar_phi?
>>>>
>>>> Let's consider the following simple test-case:
>>>>
>>>>   #pragma omp simd safelen(8)
>>>>   for (i=0; i<512; i++)
>>>>   {
>>>>     float t = a[i];
>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>         res += 1;
>>>>   }
>>>>
>>>> we can see the following phi node correspondent to res:
>>>>
>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>
>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>> and only one check can be used for phi predication (for reduction in
>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>> even if we sort all phi argument values since we still have to produce
>>>> a chain of cond expressions to perform phi predication (see comments
>>>> for predicate_arbitrary_scalar_phi).
>>>
>>> How so?  We can always use !(condition) for the "last" value, thus
>>> treat it as an 'else' case.  That even works for
>>>
>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>
>>> where the condition for edges 5 and 7 can be computed as
>>> ! (condition for 3 || condition for 4).
>>>
>>> Of course it is worthwhile to also sort single-occurrences first
>>> so your case gets just the condition for edge 5 and its inversion
>>> used for edges 3 and 4 combined.
>>>
>>>> 2. Why we need to introduce find_insertion_point?
>>>> Let's consider another test-case extracted from 175.vpr (t5.c is
>>>> attached) and we can see that bb_7 and bb_9 containing phi nodes have
>>>> only critical incoming edges and both contain code computing edge
>>>> predicates, e.g.
>>>>
>>>> <bb 7>:
>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>> _46 = xmax_17 == xmax_37;
>>>> _47 = xmax_17 == xmax_27;
>>>> _48 = _46 & _47;
>>>> _53 = xmax_17 == xmax_37;
>>>> _54 = ~_53;
>>>> _55 = xmax_17 == xmax_27;
>>>> _56 = _54 & _55;
>>>> _57 = _48 | _56;
>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>> goto <bb 11>;
>>>>
>>>> It is evident that we can not put phi predication at the block
>>>> beginning but need to put it after predicate computations.
>>>> Note also that if there are no critical edges for phi arguments
>>>> insertion point will be "after labels". Note also that phi result can
>>>> have use in this block too, so we can't put predication code to the
>>>> block end.
>>>
>>> So the issue is that predicate insertion for edge predicates does
>>> not happen on the edge but somewhere else (generally impossible
>>> for critical edges unless you split them).
>>>
>>> I think I've told you before that I prefer simple solutions to such issues,
>>> like splitting the edge!  Certainly not involving a function walking
>>> GENERIC expressions.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Let me know if you still have any questions.
>>>>
>>>> Best regards.
>>>> Yuri.
>>>>
>>>>
>>>>
>>>>
>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> Here is the second patch related to extended predication.
>>>>>> Few comments which explain a main goal of design.
>>>>>>
>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>> lead to less efficient binaries.
>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>> scalar reduction is applied.
>>>>>> This is correspondent to the following statement:
>>>>>>     if (q1 && q2 && q3) var++
>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>> are already inserted above this block and
>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>> block. But this is not true for critical edges for which predicate
>>>>>> computations are  in the block where code for phi predication must be
>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>> simply found out the last statement in block defining predicates
>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>> after it (with some minor exceptions).
>>>>>
>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>
>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>
>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>
>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>> required for the predicates are defined upward - and the complex CFG
>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>> inserted code if you insert after labels just like for the other case.
>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>
>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>> times that would have been nice to vectorize.  I suggest to
>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>
>>>>> Otherwise patch 2 looks ok to me.
>>>>>
>>>>> Richard.
>>>>>
>>>>>
>>>>>> ChangeLog:
>>>>>>
>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>
>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>> non-critical incoming edge.
>>>>>> (phi_has_two_different_args): New function.
>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>> containing reduction statement candidate is predecessor
>>>>>> of phi-block since phi may have more than two arguments.
>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>> statement before/after gsi point.
>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>> convert_scalar_cond_reduction.
>>>>>> (get_predicate_for_edge): New function.
>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>> (find_insertion_point): New function.
>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>> EXTENDED value.
>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>> is copy of inner or outer loop field force_vectorize.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] Extended if-conversion
  2014-12-04 13:15           ` Yuri Rumyantsev
@ 2014-12-04 13:37             ` Richard Biener
  2014-12-09 13:11               ` Yuri Rumyantsev
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Biener @ 2014-12-04 13:37 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> I did simple change by saving gsi iterator for each bb that has
> critical edges by adding additional field to bb_predicate_s:
>
> typedef struct bb_predicate_s {
>
>   /* The condition under which this basic block is executed.  */
>   tree predicate;
>
>   /* PREDICATE is gimplified, and the sequence of statements is
>      recorded here, in order to avoid the duplication of computations
>      that occur in previous conditions.  See PR44483.  */
>   gimple_seq predicate_gimplified_stmts;
>
>   /* Insertion point for blocks having incoming critical edges.  */
>   gimple_stmt_iterator gsi;
> } *bb_predicate_p;
>
> and this iterator is saved in insert_gimplified_predicates before
> the code for predicate computation is inserted. I checked that this
> fix works.

Huh?  I still wonder what the issue is with inserting everything
after the PHI we predicate.

Well, your updated patch will come with testcases for the testsuite
that will hopefully fail if doing that.

Richard.

>
> Now I am implementing merging of predicate_extended.. and
> predicate_arbitrary.. functions as you proposed.
>
> Best regards.
> Yuri.
>
> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Thanks Richard for your quick reply!
>>>
>>> 1. I agree that we can combine predicate_extended_ and
>>> predicate_arbitrary_ to one function as you proposed.
>>> 2. What is your opinion about using more simple decision about
>>> insertion point - if bb has use of phi result insert phi predication
>>> before it and at the bb end otherwise. I assume that critical edge
>>> splitting is not a good decision.
>>
>> Why not always insert before the use?  Which would be after labels,
>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>> a PHI in BB1 and then for an edge predicate on one of its incoming
>> edges you get SSA uses with defs that are in BB1 itself?  That
>> can only happen for backedges but those you can't remove in any case.
>>
>> Richard.
>>
>>>
>>> Best regards.
>>> Yuri.
>>>
>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Hi Richard,
>>>>>
>>>>> I resend you patch1 and patch2 with minor changes:
>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>> I also very sorry that I sent you bad patch.
>>>>>
>>>>> Now let me answer on your questions related to second patch.
>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>> predicate_arbitrary_scalar_phi?
>>>>>
>>>>> Let's consider the following simple test-case:
>>>>>
>>>>>   #pragma omp simd safelen(8)
>>>>>   for (i=0; i<512; i++)
>>>>>   {
>>>>>     float t = a[i];
>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>         res += 1;
>>>>>   }
>>>>>
>>>>> we can see the following phi node correspondent to res:
>>>>>
>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>
>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>> and only one check can be used for phi predication (for reduction in
>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>> even if we sort all phi argument values since we still have to produce
>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>> for predicate_arbitrary_scalar_phi).
>>>>
>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>> treat it as an 'else' case.  That even works for
>>>>
>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>
>>>> where the condition for edges 5 and 7 can be computed as
>>>> ! (condition for 3 || condition for 4).
>>>>
>>>> Of course it is worthwhile to also sort single occurrences first
>>>> so your case gets just the condition for edge 5 and its inversion
>>>> used for edges 3 and 4 combined.
>>>>
>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>  Let's consider another test-case extracted from 175.vpr (t5.c is
>>>>> attached) and we can see that bb_7 and bb_9 containing phi nodes have
>>>>> only critical incoming edges and both contain code computing edge
>>>>> predicates, e.g.
>>>>>
>>>>> <bb 7>:
>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>> _46 = xmax_17 == xmax_37;
>>>>> _47 = xmax_17 == xmax_27;
>>>>> _48 = _46 & _47;
>>>>> _53 = xmax_17 == xmax_37;
>>>>> _54 = ~_53;
>>>>> _55 = xmax_17 == xmax_27;
>>>>> _56 = _54 & _55;
>>>>> _57 = _48 | _56;
>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>> goto <bb 11>;
>>>>>
>>>>> It is evident that we can not put phi predication at the block
>>>>> beginning but need to put it after predicate computations.
>>>>> Note also that if there are no critical edges for phi arguments
>>>>> insertion point will be "after labels" Note also that phi result can
>>>>> have use in this block too, so we can't put predication code to the
>>>>> block end.
>>>>
>>>> So the issue is that predicate insertion for edge predicates does
>>>> not happen on the edge but somewhere else (generally impossible
>>>> for critical edges unless you split them).
>>>>
>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>> like splitting the edge!  Certainly not involving a function walking
>>>> GENERIC expressions.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> Let me know if you still have any questions.
>>>>>
>>>>> Best regards.
>>>>> Yuri.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Here is the second patch related to extended predication.
>>>>>>> Few comments which explain a main goal of design.
>>>>>>>
>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>> lead to less efficient binaries.
>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>> scalar reduction is applied.
>>>>>>> This is correspondent to the following statement:
>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>> are already inserted above this block and
>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>> simply found out the last statement in block defining predicates
>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>> after it (with some minor exceptions).
>>>>>>
>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>
>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>
>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>
>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>
>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>
>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>
>>>>>>> ChangeLog:
>>>>>>>
>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>
>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>> non-critical incoming edge.
>>>>>>> (phi_has_two_different_args): New function.
>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>> containing reduction statement candidate is predecessor
>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>> statement before/after gsi point.
>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>> convert_scalar_cond_reduction.
>>>>>>> (get_predicate_for_edge): New function.
>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>> (find_insertion_point): New function.
>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>> EXTENDED value.
>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>> is copy of inner or outer loop field force_vectorize.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] Extended if-conversion
  2014-12-04 13:37             ` Richard Biener
@ 2014-12-09 13:11               ` Yuri Rumyantsev
  2014-12-09 15:21                 ` Richard Biener
  0 siblings, 1 reply; 22+ messages in thread
From: Yuri Rumyantsev @ 2014-12-09 13:11 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 15260 bytes --]

Richard,

Here is the updated patch2 with the following changes:
1. Delete functions phi_has_two_different_args and find_insertion_point.
2. Use only one function for extended predication -
predicate_extended_scalar_phi.
3. Save gsi before insertion of predicate computations for a basic
block if it has 2 predecessors and both incoming edges are critical,
or if it has more than 2 predecessors and at least one incoming edge
is critical. This saved iterator can be used by extended phi predication.

Here is a motivating test-case which explains this point.
The test-case is attached (t5.c) and it must be compiled with the -O2
-ftree-loop-vectorize -fopenmp options.
The problem phi is in bb_7:

  bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
  {
    <bb 5>:
    xmax_edge_18 = xmax_edge_36 + 1;
    if (xmax_17 == xmax_27)
      goto <bb 7>;
    else
      goto <bb 9>;

  }
  bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
  {
    <bb 6>:
    if (xmax_17 == xmax_27)
      goto <bb 7>;
    else
      goto <bb 8>;

  }
  bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
  {
    <bb 7>:
    # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
    xmax_edge_19 = xmax_edge_39 + 1;
    goto <bb 11>;

  }

Note that both incoming edges to bb_7 are critical. If we comment out
the restoring of gsi in predicate_all_scalar_phis:
#if 0
 if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
     || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
   gsi = bb_insert_point (bb);
 else
#endif
   gsi = gsi_after_labels (bb);

we will get ICE:
t5.c: In function 'foo':
t5.c:9:6: error: definition in block 4 follows the use
 void foo (int n)
      ^
for SSA_NAME: _1 in statement:
_52 = _1 & _3;
t5.c:9:6: internal compiler error: verify_ssa failed

since predicate computations were inserted in bb_7.

ChangeLog is

2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>

* tree-if-conv.c: Include hash-map.h.
(struct bb_predicate_s): Add new field to save copy of gimple
statement iterator.
(bb_insert_point): New function.
(set_bb_insert_point): New function.
(has_pred_critical_p): New function.
(if_convertible_bb_p): Allow bb to have more than 2 predecessors if
AGGRESSIVE_IF_CONV is true.
(if_convertible_bb_p): Delete check that bb has at least one
non-critical incoming edge.
(is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
Allow interchanging PHI arguments if EXTENDED is false.
Change check that block containing reduction statement candidate
is predecessor of phi-block since phi may have more than two arguments.
(predicate_scalar_phi): Add new arguments for call of
is_cond_scalar_reduction.
(get_predicate_for_edge): New function.
(struct phi_args_hash_traits): New type.
(phi_args_hash_traits::hash): New function.
(phi_args_hash_traits::equal_keys): New function.
(gen_phi_arg_condition): New function.
(predicate_extended_scalar_phi): New function.
(predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
to true if BB containing phi has more than 2 predecessors or both
incoming edges are critical. Invoke find_phi_replacement_condition and
predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
has 2 predecessors and both incoming edges are critical or it has more
than 2 predecessors and at least one incoming edge is critical.
Use standard gsi_after_labels otherwise.
Invoke predicate_extended_scalar_phi if EXTENDED is true.
(insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
to save gsi before insertion of predicate computations. Set it to
true for BB with 2 predecessors and critical incoming edges, or when the
number of predecessors is greater than 2 and at least one incoming edge is
critical.
Add check that non-predicated block may have statements to insert.
Insert predicate computation of BB just after label if
EXTENDED_PREDICATION is true.
(tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
is copy of inner or outer loop force_vectorize field.




2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> I did simple change by saving gsi iterator for each bb that has
>> critical edges by adding additional field to bb_predicate_s:
>>
>> typedef struct bb_predicate_s {
>>
>>   /* The condition under which this basic block is executed.  */
>>   tree predicate;
>>
>>   /* PREDICATE is gimplified, and the sequence of statements is
>>      recorded here, in order to avoid the duplication of computations
>>      that occur in previous conditions.  See PR44483.  */
>>   gimple_seq predicate_gimplified_stmts;
>>
>>   /* Insertion point for blocks having incoming critical edges.  */
>>   gimple_stmt_iterator gsi;
>> } *bb_predicate_p;
>>
>> and this iterator is saved in insert_gimplified_predicates before
>> the code for predicate computation is inserted. I checked that this
>> fix works.
>
> Huh?  I still wonder what the issue is with inserting everything
> after the PHI we predicate.
>
> Well, your updated patch will come with testcases for the testsuite
> that will hopefully fail if doing that.
>
> Richard.
>
>>
>> Now I am implementing merging of predicate_extended.. and
>> predicate_arbitrary.. functions as you proposed.
>>
>> Best regards.
>> Yuri.
>>
>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Thanks Richard for your quick reply!
>>>>
>>>> 1. I agree that we can combine predicate_extended_ and
>>>> predicate_arbitrary_ to one function as you proposed.
>>>> 2. What is your opinion about using more simple decision about
>>>> insertion point - if bb has use of phi result insert phi predication
>>>> before it and at the bb end otherwise. I assume that critical edge
>>>> splitting is not a good decision.
>>>
>>> Why not always insert before the use?  Which would be after labels,
>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>> can only happen for backedges but those you can't remove in any case.
>>>
>>> Richard.
>>>
>>>>
>>>> Best regards.
>>>> Yuri.
>>>>
>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Hi Richard,
>>>>>>
>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>> I also very sorry that I sent you bad patch.
>>>>>>
>>>>>> Now let me answer on your questions related to second patch.
>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>
>>>>>> Let's consider the following simple test-case:
>>>>>>
>>>>>>   #pragma omp simd safelen(8)
>>>>>>   for (i=0; i<512; i++)
>>>>>>   {
>>>>>>     float t = a[i];
>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>         res += 1;
>>>>>>   }
>>>>>>
>>>>>> we can see the following phi node correspondent to res:
>>>>>>
>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>
>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>
>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>> treat it as an 'else' case.  That even works for
>>>>>
>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>
>>>>> where the condition for edges 5 and 7 can be computed as
>>>>> ! (condition for 3 || condition for 4).
>>>>>
>>>>> Of course it is worthwhile to also sort single occurrences first
>>>>> so your case gets just the condition for edge 5 and its inversion
>>>>> used for edges 3 and 4 combined.
>>>>>
>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>  Let's consider another test-case extracted from 175.vpr (t5.c is
>>>>>> attached) and we can see that bb_7 and bb_9 containing phi nodes have
>>>>>> only critical incoming edges and both contain code computing edge
>>>>>> predicates, e.g.
>>>>>>
>>>>>> <bb 7>:
>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>> _46 = xmax_17 == xmax_37;
>>>>>> _47 = xmax_17 == xmax_27;
>>>>>> _48 = _46 & _47;
>>>>>> _53 = xmax_17 == xmax_37;
>>>>>> _54 = ~_53;
>>>>>> _55 = xmax_17 == xmax_27;
>>>>>> _56 = _54 & _55;
>>>>>> _57 = _48 | _56;
>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>> goto <bb 11>;
>>>>>>
>>>>>> It is evident that we can not put phi predication at the block
>>>>>> beginning but need to put it after predicate computations.
>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>> have use in this block too, so we can't put predication code to the
>>>>>> block end.
>>>>>
>>>>> So the issue is that predicate insertion for edge predicates does
>>>>> not happen on the edge but somewhere else (generally impossible
>>>>> for critical edges unless you split them).
>>>>>
>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>> GENERIC expressions.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>> Let me know if you still have any questions.
>>>>>>
>>>>>> Best regards.
>>>>>> Yuri.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>
>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>> lead to less efficient binaries.
>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>> scalar reduction is applied.
>>>>>>>> This is correspondent to the following statement:
>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>> are already inserted above this block and
>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>> after it (with some minor exceptions).
>>>>>>>
>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>
>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>
>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>
>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>
>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>
>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>
>>>>>>>> ChangeLog:
>>>>>>>>
>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>
>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>> non-critical incoming edge.
>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>> statement before/after gsi point.
>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>> (find_insertion_point): New function.
>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>> EXTENDED value.
>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>> is copy of inner or outer loop field force_vectorize.

[-- Attachment #2: t5.c --]
[-- Type: text/x-csrc, Size: 713 bytes --]

#define N 512
#define max(x,y) (x) >= (y)? (x) : (y)
#define min(x,y) (x) <= (y)? (x) : (y)
int c_X[N];
int x_max;
int x_min;
extern int nx;

void foo (int n)
{
  int i, x;
  int xmin, xmax;
  int xmin_edge, xmax_edge;

  x = c_X[0];
  xmin = xmax = x;
  xmin_edge = xmax_edge  = 1;
#pragma omp simd safelen(8)
  for (i = 1; i<n; i++) {
    x = c_X[i];
    x = max(min(x,nx),1);
    if (x == xmin) {  
       xmin_edge++;
    }
    if (x == xmax) {
       xmax_edge++;
    }
    else if (x < xmin) {
       xmin = x;
       xmin_edge = 1;
    }
    else if (x > xmax) {
       xmax = x;
       xmax_edge = 1;
    }

  }
    x_max = xmax_edge;
    x_min = xmin_edge;
}

   

[-- Attachment #3: if-conv.patch2.2.1 --]
[-- Type: application/octet-stream, Size: 14984 bytes --]

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 6c9ad32..59ea142 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -127,6 +127,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "expr.h"
 #include "insn-codes.h"
 #include "optabs.h"
+#include "hash-map.h"
 
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
@@ -145,6 +146,9 @@ typedef struct bb_predicate_s {
      recorded here, in order to avoid the duplication of computations
      that occur in previous conditions.  See PR44483.  */
   gimple_seq predicate_gimplified_stmts;
+
+  /* Insertion point for blocks having incoming critical edges.  */
+  gimple_stmt_iterator gsi;
 } *bb_predicate_p;
 
 /* Returns true when the basic block BB has a predicate.  */
@@ -223,6 +227,18 @@ add_bb_predicate_gimplified_stmts (basic_block bb, gimple_seq stmts)
     (&(((bb_predicate_p) bb->aux)->predicate_gimplified_stmts), stmts);
 }
 
+static inline gimple_stmt_iterator
+bb_insert_point (basic_block bb)
+{
+  return ((bb_predicate_p) bb->aux)->gsi;
+}
+
+static inline void
+set_bb_insert_point (basic_block bb, gimple_stmt_iterator gsi)
+{
+  ((bb_predicate_p) bb->aux)->gsi = gsi;
+}
+
 /* Initializes to TRUE the predicate of basic block BB.  */
 
 static inline void
@@ -997,6 +1013,19 @@ all_preds_critical_p (basic_block bb)
   return true;
 }
 
+/* Returns true if at least one predecessor is on a critical edge.  */
+static inline bool
+has_pred_critical_p (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (EDGE_COUNT (e->src->succs) > 1)
+      return true;
+  return false;
+}
+
 /* Return true when BB is if-convertible.  This routine does not check
    basic block's statements and phis.
 
@@ -1020,10 +1049,15 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "----------[%d]-------------\n", bb->index);
 
-  if (EDGE_COUNT (bb->preds) > 2
-      || EDGE_COUNT (bb->succs) > 2)
+  if (EDGE_COUNT (bb->succs) > 2)
     return false;
 
+  if (EDGE_COUNT (bb->preds) > 2)
+    {
+      if (!aggressive_if_conv)
+	return false;
+    }
+
   if (exit_bb)
     {
       if (bb != loop->latch)
@@ -1057,23 +1091,6 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
 	return false;
       }
 
-  /* At least one incoming edge has to be non-critical as otherwise edge
-     predicates are not equal to basic-block predicates of the edge
-     source. This restriction will be removed after adding support for
-     extended predication.  */
-  if (EDGE_COUNT (bb->preds) > 1
-      && bb != loop->header)
-    {
-      if (!aggressive_if_conv && all_preds_critical_p (bb))
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "only critical predecessors in bb#%d\n",
-		      bb->index);
-
-	  return false;
-	}
-    }
-
   return true;
 }
 
@@ -1487,14 +1504,15 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
 	  reduc_3 = ...
 	reduc_2 = PHI <reduc_1, reduc_3>
 
-   REDUC, OP0 and OP1 contain reduction stmt and its operands.  */
+   ARG_0 and ARG_1 are correspondent PHI arguments.
+   REDUC, OP0 and OP1 contain reduction stmt and its operands.
+   EXTENDED is true if PHI has > 2 arguments.  */
 
 static bool
-is_cond_scalar_reduction (gimple phi, gimple *reduc,
-			  tree *op0, tree *op1)
+is_cond_scalar_reduction (gimple phi, gimple *reduc, tree arg_0, tree arg_1,
+			  tree *op0, tree *op1, bool extended)
 {
   tree lhs, r_op1, r_op2;
-  tree arg_0, arg_1;
   gimple stmt;
   gimple header_phi = NULL;
   enum tree_code reduction_op;
@@ -1503,13 +1521,13 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
   edge latch_e = loop_latch_edge (loop);
   imm_use_iterator imm_iter;
   use_operand_p use_p;
-
-  arg_0 = PHI_ARG_DEF (phi, 0);
-  arg_1 = PHI_ARG_DEF (phi, 1);
+  edge e;
+  edge_iterator ei;
+  bool result = false;
   if (TREE_CODE (arg_0) != SSA_NAME || TREE_CODE (arg_1) != SSA_NAME)
     return false;
 
-  if (gimple_code (SSA_NAME_DEF_STMT (arg_0)) == GIMPLE_PHI)
+  if (!extended && gimple_code (SSA_NAME_DEF_STMT (arg_0)) == GIMPLE_PHI)
     {
       lhs = arg_1;
       header_phi = SSA_NAME_DEF_STMT (arg_0);
@@ -1540,8 +1558,13 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
     return false;
 
   /* Check that stmt-block is predecessor of phi-block.  */
-  if (EDGE_PRED (bb, 0)->src != gimple_bb (stmt)
-      && EDGE_PRED (bb, 1)->src != gimple_bb (stmt))
+  FOR_EACH_EDGE (e, ei, gimple_bb (stmt)->succs)
+    if (e->dest == bb)
+      {
+	result = true;
+	break;
+      }
+  if (!result)
     return false;
 
   if (!has_single_use (lhs))
@@ -1694,7 +1717,8 @@ predicate_scalar_phi (gphi *phi, tree cond,
 	  arg_0 = gimple_phi_arg_def (phi, 0);
 	  arg_1 = gimple_phi_arg_def (phi, 1);
 	}
-      if (is_cond_scalar_reduction (phi, &reduc, &op0, &op1))
+      if (is_cond_scalar_reduction (phi, &reduc, arg_0, arg_1,
+				    &op0, &op1, false))
 	/* Convert reduction stmt into vectorizable form.  */
 	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
 					     true_bb != gimple_bb (reduc));
@@ -1715,6 +1739,211 @@ predicate_scalar_phi (gphi *phi, tree cond,
     }
 }
 
+/* Returns predicate of edge associated with argument of phi node.  */
+
+static tree
+get_predicate_for_edge (edge e)
+{
+  tree c;
+  basic_block b = e->src;
+
+  if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
+    /* Edge E is not critical, use predicate of edge source bb.  */
+    c = bb_predicate (b);
+  else
+    /* Edge E is critical and its aux field contains predicate.  */
+    c = edge_predicate (e);
+  return c;
+}
+
+/* Helpers for PHI arguments hashtable map.  */
+
+struct phi_args_hash_traits : default_hashmap_traits
+{
+  static inline hashval_t hash (tree);
+  static inline bool equal_keys (tree, tree);
+};
+
+inline hashval_t
+phi_args_hash_traits::hash (tree value)
+{
+  return iterative_hash_expr (value, 0);
+}
+
+inline bool
+phi_args_hash_traits::equal_keys (tree value1, tree value2)
+{
+  return operand_equal_p (value1, value2, 0);
+}
+
+  /* Produce condition for all occurrences of ARG in PHI node.  */
+
+static tree
+gen_phi_arg_condition (gimple phi, vec<int> *occur,
+		       gimple_stmt_iterator *gsi)
+{
+  int len;
+  int i;
+  tree cond = NULL_TREE;
+  tree c;
+  edge e;
+
+  len = occur->length ();
+  gcc_assert (len > 0);
+  for (i = 0; i < len; i++)
+    {
+      e = gimple_phi_arg_edge (as_a <gphi *> (phi), (*occur)[i]);
+      c = get_predicate_for_edge (e);
+      if (is_true_predicate (c))
+	continue;
+      c = force_gimple_operand_gsi_1 (gsi, unshare_expr (c),
+				      is_gimple_condexpr, NULL_TREE,
+				      true, GSI_SAME_STMT);
+      if (cond != NULL_TREE)
+	{
+	  /* Must build OR expression.  */
+	  cond = fold_or_predicates (EXPR_LOCATION (c), c, cond);
+	  cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					     is_gimple_condexpr, NULL_TREE,
+					     true, GSI_SAME_STMT);
+	}
+      else
+	cond = c;
+    }
+  gcc_assert (cond != NULL_TREE);
+  return cond;
+}
+
+/* Perform predication of PHI node if number of arguments is greater than 2 or
+   both phi arguments are on critical edges.  */
+
+static void
+predicate_extended_scalar_phi (gimple phi, gimple_stmt_iterator *gsi)
+{
+  gimple new_stmt = NULL, reduc;
+  tree rhs, res, arg0, arg1, op0, op1;
+  tree cond;
+  unsigned int index0;
+  unsigned int max_ind, max, args_len;
+  edge e;
+  unsigned int i;
+
+  res = gimple_phi_result (phi);
+  if (virtual_operand_p (res))
+    return;
+
+  /* Create hashmap for PHI node which contains vectors of argument indexes
+     having the same value.  */
+  bool swap = false;
+  hash_map<tree, auto_vec<int>, phi_args_hash_traits> phi_arg_map;
+  unsigned int num_args = gimple_phi_num_args (phi);
+  /* Vector of different PHI argument values.  */
+  auto_vec<tree> args (num_args);
+
+  /* Compute phi_arg_map.  */
+  for (i = 0; i < num_args; i++)
+    {
+      tree arg;
+
+      arg = gimple_phi_arg_def (phi, i);
+      if (!phi_arg_map.get (arg))
+	args.quick_push (arg);
+      phi_arg_map.get_or_insert (arg).safe_push (i);
+    }
+
+  /* Determine element with max number of occurrences.  */
+  max_ind = 0;
+  max = phi_arg_map.get (args[0])->length ();
+  max_ind = 0;
+  args_len = args.length ();
+  for (i = 1; i < args_len; i++)
+    {
+      unsigned int len;
+      if ((len = phi_arg_map.get (args[i])->length ()) > max)
+	{
+	  max_ind = i;
+	  max = len;
+	}
+    }
+
+  /* Put element with max number of occurrences to the end of ARGS.  */
+  if (max_ind + 1 != args_len)
+    {
+      tree tmp = args[args_len - 1];
+      args[args_len - 1] = args[max_ind];
+      args[max_ind] = tmp;
+    }
+
+  /* Handle one special case when number of arguments with different values
+     is equal to 2 and one argument has only one occurrence.  Such PHI can be
+     handled as if it had only 2 arguments.  */
+  if (args_len == 2 && phi_arg_map.get (args[0])->length () == 1)
+    {
+      vec<int> *indexes;
+      indexes = phi_arg_map.get (args[0]);
+      index0 = (*indexes)[0];
+      arg0 = args[0];
+      arg1 = args[1];
+      e = gimple_phi_arg_edge (as_a <gphi *> (phi), index0);
+      cond = get_predicate_for_edge (e);
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  swap = true;
+	  cond = TREE_OPERAND (cond, 0);
+	}
+      /* Gimplify the condition to a valid cond-expr conditional operand.  */
+      cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					 is_gimple_condexpr, NULL_TREE,
+					 true, GSI_SAME_STMT);
+      if (!(is_cond_scalar_reduction (phi, &reduc, arg0 , arg1,
+				      &op0, &op1, true)))
+	rhs = fold_build_cond_expr (TREE_TYPE (res), unshare_expr (cond),
+				    swap? arg1 : arg0,
+				    swap? arg0 : arg1);
+      else
+	/* Convert reduction stmt into vectorizable form.  */
+	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
+					     swap);
+      new_stmt = gimple_build_assign (res, rhs);
+      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+      update_stmt (new_stmt);
+    }
+  else
+    {
+      /* Common case.  */
+      vec<int> *indexes;
+      tree type = TREE_TYPE (gimple_phi_result (phi));
+      tree lhs;
+      fprintf (stderr, "Predicate extended phi:\n");
+      print_gimple_stmt (stderr, phi, 0, 1);
+      arg0 = args[0];
+      for (i = 0; i < args_len - 1; i++)
+	{
+	  arg1 = args[i + 1];
+	  indexes = phi_arg_map.get (args[i]);
+	  if (i != args_len - 2)
+	    lhs = make_temp_ssa_name (type, NULL, "_ifc_");
+	  else
+	    lhs = res;
+	  cond = gen_phi_arg_condition (phi, indexes, gsi);
+	  rhs = fold_build_cond_expr (type, unshare_expr (cond),
+				      arg0, arg1);
+	  new_stmt = gimple_build_assign (lhs, rhs);
+	  fprintf(stderr, "Create new stmt with args_len=%d:\n", args_len);
+	  print_gimple_stmt (stderr, new_stmt, 0, 1);
+	  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+	  update_stmt (new_stmt);
+	  arg0 = lhs;
+	}
+    }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "new extended phi replacement stmt\n");
+      print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+    }
+}
+
 /* Replaces in LOOP all the scalar phi nodes other than those in the
    LOOP->header block with conditional modify expressions.  */
 
@@ -1724,6 +1953,7 @@ predicate_all_scalar_phis (struct loop *loop)
   basic_block bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
   unsigned int i;
+  bool extended;
 
   for (i = 1; i < orig_loop_num_nodes; i++)
     {
@@ -1741,15 +1971,31 @@ predicate_all_scalar_phis (struct loop *loop)
       if (gsi_end_p (phi_gsi))
 	continue;
 
-      /* BB has two predecessors.  Using predecessor's aux field, set
-	 appropriate condition for the PHI node replacement.  */
-      gsi = gsi_after_labels (bb);
-      true_bb = find_phi_replacement_condition (bb, &cond, &gsi);
-
+      /* If BB has more than 2 predecessors or all incoming edges to bb
+	 are critical, must handle PHI through extended predication.  */
+      extended = EDGE_COUNT (bb->preds) != 2 || all_preds_critical_p (bb);
+      if (!extended)
+	{
+	  /* BB has two predecessors.  Using predecessor's aux field, set
+	     appropriate condition for the PHI node replacement.  */
+	  gsi = gsi_after_labels (bb);
+	  true_bb = find_phi_replacement_condition (bb, &cond, &gsi);
+	}
+      else
+	{
+	  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
+	      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
+	    gsi = bb_insert_point (bb);
+	  else
+	    gsi = gsi_after_labels (bb);
+	}
       while (!gsi_end_p (phi_gsi))
 	{
 	  phi = phi_gsi.phi ();
-	  predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  if (!extended)
+	    predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  else
+	    predicate_extended_scalar_phi (phi, &gsi);
 	  release_phi_node (phi);
 	  gsi_next (&phi_gsi);
 	}
@@ -1765,13 +2011,24 @@ static void
 insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
 {
   unsigned int i;
+  bool extended_predication;
 
   for (i = 0; i < loop->num_nodes; i++)
     {
       basic_block bb = ifc_bbs[i];
       gimple_seq stmts;
 
-      if (!is_predicated (bb))
+      extended_predication = false;
+      if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
+	  || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
+	{
+	  /* Save insertion point for extended PHI predication.  */
+	  set_bb_insert_point (bb, gsi_after_labels (bb));
+	  extended_predication = true;
+	}
+
+      /* Non-predicated join blocks can have the statements to insert.  */
+      if (!is_predicated (bb) && bb_predicate_gimplified_stmts (bb) == NULL)
 	{
 	  /* Do not insert statements for a basic block that is not
 	     predicated.  Also make sure that the predicate of the
@@ -1784,7 +2041,8 @@ insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
       if (stmts)
 	{
 	  if (flag_tree_loop_if_convert_stores
-	      || any_mask_load_store)
+	      || any_mask_load_store
+	      || extended_predication)
 	    {
 	      /* Insert the predicate of the BB just after the label,
 		 as the if-conversion of memory writes will use this
@@ -2205,8 +2463,15 @@ tree_if_conversion (struct loop *loop)
   ifc_bbs = NULL;
   bool any_mask_load_store = false;
 
-  /* Temporary set up this flag to false.  */
-  aggressive_if_conv = false;
+  /* Set-up aggressive if-conversion for loops marked with simd pragma.  */
+  aggressive_if_conv = loop->force_vectorize;
+  /* Check whether the outer loop was marked with simd pragma.  */
+  if (!aggressive_if_conv)
+    {
+      struct loop *outer_loop = loop_outer (loop);
+      if (outer_loop && outer_loop->force_vectorize)
+	aggressive_if_conv = true;
+    }
 
   if (!if_convertible_loop_p (loop, &any_mask_load_store)
       || !dbg_cnt (if_conversion_tree))


* Re: [PATCH 2/3] Extended if-conversion
  2014-12-09 13:11               ` Yuri Rumyantsev
@ 2014-12-09 15:21                 ` Richard Biener
  2014-12-10 10:54                   ` Yuri Rumyantsev
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Biener @ 2014-12-09 15:21 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> Here is updated patch2 with the following changes:
> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
> 2. Use only one function for extended predication -
> predicate_extended_scalar_phi.
> 3. Save gsi before insertion of predicate computations for a basic
> block if it has 2 predecessors and both incoming edges are critical,
> or if it has more than 2 predecessors and at least one incoming edge
> is critical. This saved iterator can be used by extended phi predication.
>
> Here is a motivating test case which explains this point.
> The test case is attached (t5.c) and it must be compiled with -O2
> -ftree-loop-vectorize -fopenmp options.
> The problematic phi is in bb-7:
>
>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>   {
>     <bb 5>:
>     xmax_edge_18 = xmax_edge_36 + 1;
>     if (xmax_17 == xmax_27)
>       goto <bb 7>;
>     else
>       goto <bb 9>;
>
>   }
>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>   {
>     <bb 6>:
>     if (xmax_17 == xmax_27)
>       goto <bb 7>;
>     else
>       goto <bb 8>;
>
>   }
>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>   {
>     <bb 7>:
>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>     xmax_edge_19 = xmax_edge_39 + 1;
>     goto <bb 11>;
>
>   }
>
> Note that both incoming edges to bb_7 are critical. If we comment out
> restoring gsi in predicate_all_scalar_phis:
> #if 0
>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>    gsi = bb_insert_point (bb);
>  else
> #endif
>    gsi = gsi_after_labels (bb);
>
> we will get ICE:
> t5.c: In function 'foo':
> t5.c:9:6: error: definition in block 4 follows the use
>  void foo (int n)
>       ^
> for SSA_NAME: _1 in statement:
> _52 = _1 & _3;
> t5.c:9:6: internal compiler error: verify_ssa failed
>
> since predicate computations were inserted in bb_7.

The issue is obviously that the predicates have already been emitted
in the target BB - that's of course the wrong place.  This is done
by insert_gimplified_predicates.

This just shows how edge predicate handling is broken - we don't
seem to have a sequence of gimplified stmts for edge predicates
but push those to e->dest which makes this really messy.

Rather than having a separate phase where we insert all
gimplified bb predicates we should do that on-demand when
predicating a PHI.

Your patch writes to stderr - that's bad - use dump_file and guard
the printfs properly.
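
As a rough illustration of the guarded form (a self-contained sketch, not
GCC code: dump_file, dump_flags and TDF_DETAILS below are local stand-ins
that only mimic the names used in tree-if-conv.c, and dump_new_stmt is a
hypothetical helper):

```cpp
#include <cstdio>

// Stand-ins for GCC's dump globals; the real ones live in the compiler.
static std::FILE *dump_file = nullptr;
static int dump_flags = 0;
static const int TDF_DETAILS = 1 << 0;  // value chosen for this sketch only

// Emit a debug line only when a dump file is open and detailed dumps were
// requested.  Returns true if something was written.
bool dump_new_stmt(const char *stmt_text) {
  if (dump_file && (dump_flags & TDF_DETAILS)) {
    std::fprintf(dump_file, "Create new stmt: %s\n", stmt_text);
    return true;
  }
  return false;
}
```

With no dump file open the helper writes nothing, which is the behaviour
wanted for a normal compile.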

You also still have two functions for PHI predication.  And the
new extended variant doesn't commonize the 2-args and general
paths.

I'm not at all happy with this code.  It may be the existing if-conv code's
fault, but making it even worse is not an option.

Again - what's wrong with simply splitting critical edges if
aggressive_if_conv?  I think that would very much simplify
things here.  Or alternatively use gsi_insert_on_edge and
commit edge insertions before merging the blocks.
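
To make the edge-splitting option concrete, here is a toy model (an
assumption-laden sketch, not GCC's CFG API: ToyCfg, split_critical_edges
and make_t5_like_cfg are hypothetical names, and blocks are plain
integers).  An edge is treated as critical when its source has more than
one successor and its destination has more than one predecessor; splitting
puts a fresh empty block on it, which gives the edge predicate a place to
live.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy CFG: blocks are integer indices, edges are (src, dest) pairs.
struct ToyCfg {
  int num_blocks = 0;
  std::vector<std::pair<int, int>> edges;

  int succ_count(int bb) const {
    int n = 0;
    for (const auto &e : edges)
      if (e.first == bb) ++n;
    return n;
  }
  int pred_count(int bb) const {
    int n = 0;
    for (const auto &e : edges)
      if (e.second == bb) ++n;
    return n;
  }
  // Critical: source has several successors, destination several preds.
  bool is_critical(std::size_t i) const {
    assert(i < edges.size());
    return succ_count(edges[i].first) > 1 && pred_count(edges[i].second) > 1;
  }
};

// Split every critical edge src->dest into src->mid and mid->dest, where
// mid is a fresh empty block.  Returns the number of edges split.
int split_critical_edges(ToyCfg &cfg) {
  // Classify on the original graph first; splitting one edge does not
  // change the successor/predecessor counts seen by the others.
  std::vector<std::size_t> critical;
  for (std::size_t i = 0; i < cfg.edges.size(); ++i)
    if (cfg.is_critical(i)) critical.push_back(i);

  for (std::size_t i : critical) {
    int mid = cfg.num_blocks++;
    int dest = cfg.edges[i].second;
    cfg.edges[i].second = mid;          // src -> mid
    cfg.edges.emplace_back(mid, dest);  // mid -> dest
  }
  return static_cast<int>(critical.size());
}

// Shape taken from the t5.c dump: bb 4 branches to 5 and 6; 5 goes to
// 7 and 9, 6 goes to 7 and 8.  Everything else is left out.
ToyCfg make_t5_like_cfg() {
  ToyCfg cfg;
  cfg.num_blocks = 10;
  cfg.edges = {{4, 5}, {4, 6}, {5, 7}, {5, 9}, {6, 7}, {6, 8}};
  return cfg;
}
```

On this shape only the two edges into bb 7 are split; bb 7 still has two
predecessors afterwards, but each is a single-successor block.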

Thanks,
Richard.

> ChangeLog is
>
> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * tree-if-conv.c : Include hash-map.h.
> (struct bb_predicate_s): Add new field to save copy of gimple
> statement iterator.
> (bb_insert_point): New function.
> (set_bb_insert_point): New function.
> (has_pred_critical_p): New function.
> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
> AGGRESSIVE_IF_CONV is true.
> (if_convertible_bb_p): Delete check that bb has at least one
> non-critical incoming edge.
> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
> Allow interchange PHI arguments if EXTENDED is false.
> Change check that block containing reduction statement candidate
> is predecessor of phi-block since phi may have more than two arguments.
> (predicate_scalar_phi): Add new arguments for call of
> is_cond_scalar_reduction.
> (get_predicate_for_edge): New function.
> (struct phi_args_hash_traits): New type.
> (phi_args_hash_traits::hash): New function.
> (phi_args_hash_traits::equal_keys): New function.
> (gen_phi_arg_condition): New function.
> (predicate_extended_scalar_phi): New function.
> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
> to true if BB containing phi has more than 2 predecessors or both
> incoming edges are critical. Invoke find_phi_replacement_condition and
> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
> has 2 predecessors and both incoming edges are critical or it has more
> than 2 predecessors and at least one incoming edge is critical.
> Use standard gsi_after_labels otherwise.
> Invoke predicate_extended_scalar_phi if EXTENDED is true.
> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
> to save gsi before insertion of predicate computations. Set it to
> true for BB with 2 predecessors and critical incoming edges, or if the
> number of predecessors is greater than 2 and at least one incoming edge
> is critical.
> Add check that non-predicated block may have statements to insert.
> Insert predicate computation of BB just after label if
> EXTENDED_PREDICATION is true.
> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
> is copy of inner or outer loop force_vectorize field.
>
>
>
>
> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> I did simple change by saving gsi iterator for each bb that has
>>> critical edges by adding additional field to bb_predicate_s:
>>>
>>> typedef struct bb_predicate_s {
>>>
>>>   /* The condition under which this basic block is executed.  */
>>>   tree predicate;
>>>
>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>      recorded here, in order to avoid the duplication of computations
>>>      that occur in previous conditions.  See PR44483.  */
>>>   gimple_seq predicate_gimplified_stmts;
>>>
>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>   gimple_stmt_iterator gsi;
>>> } *bb_predicate_p;
>>>
>>> and this iterator is saved in  insert_gimplified_predicates before
>>> insertion code for predicate computation. I checked that this fix
>>> works.
>>
>> Huh?  I still wonder what the issue is with inserting everything
>> after the PHI we predicate.
>>
>> Well, your updated patch will come with testcases for the testsuite
>> that will hopefully fail if doing that.
>>
>> Richard.
>>
>>>
>>> Now I am implementing merging of predicate_extended.. and
>>> predicate_arbitrary.. functions as you proposed.
>>>
>>> Best regards.
>>> Yuri.
>>>
>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Thanks Richard for your quick reply!
>>>>>
>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>> 2. What is your opinion about using more simple decision about
>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>> splitting is not a good decision.
>>>>
>>>> Why not always insert before the use?  Which would be after labels,
>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>> can only happen for backedges but those you can't remove in any case.
>>>>
>>>> Richard.
>>>>
>>>>>
>>>>> Best regards.
>>>>> Yuri.
>>>>>
>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Hi Richard,
>>>>>>>
>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>
>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>
>>>>>>> Let's consider the following simple test-case:
>>>>>>>
>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>   for (i=0; i<512; i++)
>>>>>>>   {
>>>>>>>     float t = a[i];
>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>         res += 1;
>>>>>>>   }
>>>>>>>
>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>
>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>
>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>
>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>> treat it as an 'else' case.  That even works for
>>>>>>
>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>
>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>> ! (condition for 3 || condition for 4).
>>>>>>
>>>>>> Of course it is worthwhile to also sort single occurrences first
>>>>>> so your case gets just the condition for edge 5 and its inversion
>>>>>> used for edges 3 and 4 combined.
>>>>>>
>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>> attached) and we can see that bb_7 and bb_9 containing phi nodes have
>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>> predicates, e.g.
>>>>>>>
>>>>>>> <bb 7>:
>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>> _48 = _46 & _47;
>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>> _54 = ~_53;
>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>> _56 = _54 & _55;
>>>>>>> _57 = _48 | _56;
>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>> goto <bb 11>;
>>>>>>>
>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>> beginning but need to put it after predicate computations.
>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>> block end.
>>>>>>
>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>> for critical edges unless you split them).
>>>>>>
>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>> GENERIC expressions.
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>> Let me know if you still have any questions.
>>>>>>>
>>>>>>> Best regards.
>>>>>>> Yuri.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>
>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>> lead to less efficient binaries.
>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>> scalar reduction is applied.
>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>> are already inserted above this block and
>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>
>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>
>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>
>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>
>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>
>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>
>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>
>>>>>>>>> ChangeLog:
>>>>>>>>>
>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>
>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>> non-critical incoming edge.
>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>> statement before/after gsi point.
>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>> EXTENDED value.
>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>> is copy of inner or outer loop field force_vectorize.


* Re: [PATCH 2/3] Extended if-conversion
  2014-12-09 15:21                 ` Richard Biener
@ 2014-12-10 10:54                   ` Yuri Rumyantsev
  2014-12-10 14:31                     ` Richard Biener
  0 siblings, 1 reply; 22+ messages in thread
From: Yuri Rumyantsev @ 2014-12-10 10:54 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

Richard,

Sorry that I forgot to delete the debug dump from my fix.
I have a few questions about your comments.

1. You wrote:
> You also still have two functions for PHI predication.  And the
> new extended variant doesn't commonize the 2-args and general
> paths.
Did you mean that I must combine predicate_scalar_phi and
predicate_extended_scalar_phi into one function?
Please note that if the additional flag is not set (i.e.
aggressive_if_conv is false), extended predication requires more
compile time since it builds a hash_map.

2. About critical edge splitting.

Did you mean that (1) we should perform it under the aggressive_if_conv
option only, and (2) we should split all critical edges?
Note that this leads to recomputing the topological order.

It is worth noting that in the current implementation, bbs with 2
predecessors that are both on critical edges are accepted without
an additional option.
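
For reference, the grouping work behind the hash_map in point 1 can be
sketched roughly as follows.  This is a simplified model under stated
assumptions, not the patch code: std::map and std::string stand in for
hash_map and tree, GroupedPhi, group_phi_args and eval_select_chain are
hypothetical names, and the chain is evaluated for one execution instead
of being emitted as COND_EXPRs.  The most frequent value becomes the final
"else" arm, so a PHI with N distinct values needs N - 1 selects.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Distinct PHI values plus, for each value, the argument indexes using it.
struct GroupedPhi {
  std::vector<std::string> values;
  std::map<std::string, std::vector<int>> occurrences;
};

GroupedPhi group_phi_args(const std::vector<std::string> &phi_args) {
  GroupedPhi g;
  for (std::size_t i = 0; i < phi_args.size(); ++i) {
    std::vector<int> &occ = g.occurrences[phi_args[i]];
    if (occ.empty()) g.values.push_back(phi_args[i]);  // first time seen
    occ.push_back(static_cast<int>(i));
  }
  // Move the most frequent value to the end: it becomes the "else" arm
  // and needs no condition of its own.
  std::size_t max_ind = 0;
  for (std::size_t i = 1; i < g.values.size(); ++i)
    if (g.occurrences[g.values[i]].size()
        > g.occurrences[g.values[max_ind]].size())
      max_ind = i;
  if (!g.values.empty()) std::swap(g.values[max_ind], g.values.back());
  return g;
}

// Evaluate the select chain for one execution.  edge_taken[i] tells
// whether the predicate of incoming edge i holds; exactly one should.
std::string eval_select_chain(const GroupedPhi &g,
                              const std::vector<bool> &edge_taken) {
  assert(!g.values.empty());
  std::string result = g.values.back();  // "else" arm
  // Walk the remaining groups from last to first.
  for (std::size_t k = g.values.size() - 1; k-- > 0;) {
    bool cond = false;  // OR of the predicates of all edges in this group
    for (int idx : g.occurrences.at(g.values[k]))
      cond = cond || edge_taken[static_cast<std::size_t>(idx)];
    if (cond) result = g.values[k];
  }
  return result;
}
```

For the PHI <res_15(3), res_15(4), res_10(5)> discussed earlier in the
thread this leaves res_10 as the only conditioned value and res_15 as the
else arm, i.e. a single select.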

Thanks in advance.
Yuri.
2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> Here is updated patch2 with the following changes:
>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>> 2. Use only one function for extended predication -
>> predicate_extended_scalar_phi.
>> 3. Save gsi before insertion of predicate computations for a basic
>> block if it has 2 predecessors and both incoming edges are critical,
>> or if it has more than 2 predecessors and at least one incoming edge
>> is critical. This saved iterator can be used by extended phi predication.
>>
>> Here is a motivating test case which explains this point.
>> The test case is attached (t5.c) and it must be compiled with -O2
>> -ftree-loop-vectorize -fopenmp options.
>> The problematic phi is in bb-7:
>>
>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>   {
>>     <bb 5>:
>>     xmax_edge_18 = xmax_edge_36 + 1;
>>     if (xmax_17 == xmax_27)
>>       goto <bb 7>;
>>     else
>>       goto <bb 9>;
>>
>>   }
>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>   {
>>     <bb 6>:
>>     if (xmax_17 == xmax_27)
>>       goto <bb 7>;
>>     else
>>       goto <bb 8>;
>>
>>   }
>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>   {
>>     <bb 7>:
>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>     xmax_edge_19 = xmax_edge_39 + 1;
>>     goto <bb 11>;
>>
>>   }
>>
>> Note that both incoming edges to bb_7 are critical. If we comment out
>> restoring gsi in predicate_all_scalar_phi:
>> #if 0
>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>    gsi = bb_insert_point (bb);
>>  else
>> #endif
>>    gsi = gsi_after_labels (bb);
>>
>> we will get ICE:
>> t5.c: In function 'foo':
>> t5.c:9:6: error: definition in block 4 follows the use
>>  void foo (int n)
>>       ^
>> for SSA_NAME: _1 in statement:
>> _52 = _1 & _3;
>> t5.c:9:6: internal compiler error: verify_ssa failed
>>
>> since predicate computations were inserted in bb_7.
>
> The issue is obviously that the predicates have already been emitted
> in the target BB - that's of course the wrong place.  This is done
> by insert_gimplified_predicates.
>
> This just shows how edge predicate handling is broken - we don't
> seem to have a sequence of gimplified stmts for edge predicates
> but push those to e->dest which makes this really messy.
>
> Rather than having a separate phase where we insert all
> gimplified bb predicates we should do that on-demand when
> predicating a PHI.
>
> Your patch writes to stderr - that's bad - use dump_file and guard
> the printfs properly.
>
> You also still have two functions for PHI predication.  And the
> new extended variant doesn't commonize the 2-args and general
> paths.
>
> I'm not at all happy with this code.  It may be existing if-conv code's
> fault but making it even worse is not an option.
>
> Again - what's wrong with simply splitting critical edges if
> aggressive_if_conv?  I think that would very much simplify
> things here.  Or alternatively use gsi_insert_on_edge and
> commit edge insertions before merging the blocks.
>
> Thanks,
> Richard.
>
>> ChangeLog is
>>
>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> * tree-if-conv.c : Include hash-map.h.
>> (struct bb_predicate_s): Add new field to save copy of gimple
>> statement iterator.
>> (bb_insert_point): New function.
>> (set_bb_insert_point): New function.
>> (has_pred_critical_p): New function.
>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>> AGGRESSIVE_IF_CONV is true.
>> (if_convertible_bb_p): Delete check that bb has at least one
>> non-critical incoming edge.
>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>> Allow interchange PHI arguments if EXTENDED is false.
>> Change check that block containing reduction statement candidate
>> is predecessor of phi-block since phi may have more than two arguments.
>> (predicate_scalar_phi): Add new arguments for call of
>> is_cond_scalar_reduction.
>> (get_predicate_for_edge): New function.
>> (struct phi_args_hash_traits): New type.
>> (phi_args_hash_traits::hash): New function.
>> (phi_args_hash_traits::equal_keys): New function.
>> (gen_phi_arg_condition): New function.
>> (predicate_extended_scalar_phi): New function.
>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>> to true if BB containing phi has more than 2 predecessors or both
>> incoming edges are critical. Invoke find_phi_replacement_condition and
>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>> has 2 predecessors and both incoming edges are critical or it has more
>> than 2 predecessors and at least one incoming edge is critical.
>> Use standard gsi_after_labels otherwise.
>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>> to save gsi before insertion of predicate computations. Set it to
>> true for BB with 2 predecessors and critical incoming edges, or if the
>> number of predecessors is greater than 2 and at least one incoming edge is
>> critical.
>> Add check that non-predicated block may have statements to insert.
>> Insert predicate computation of BB just after label if
>> EXTENDED_PREDICATION is true.
>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>> is copy of inner or outer loop force_vectorize field.
>>
>>
>>
>>
>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> I did simple change by saving gsi iterator for each bb that has
>>>> critical edges by adding additional field to bb_predicate_s:
>>>>
>>>> typedef struct bb_predicate_s {
>>>>
>>>>   /* The condition under which this basic block is executed.  */
>>>>   tree predicate;
>>>>
>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>      recorded here, in order to avoid the duplication of computations
>>>>      that occur in previous conditions.  See PR44483.  */
>>>>   gimple_seq predicate_gimplified_stmts;
>>>>
>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>   gimple_stmt_iterator gsi;
>>>> } *bb_predicate_p;
>>>>
>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>> insertion code for predicate computation. I checked that this fix
>>>> works.
>>>
>>> Huh?  I still wonder what the issue is with inserting everything
>>> after the PHI we predicate.
>>>
>>> Well, your updated patch will come with testcases for the testsuite
>>> that will hopefully fail if doing that.
>>>
>>> Richard.
>>>
>>>>
>>>> Now I am implementing merging of predicate_extended.. and
>>>> predicate_arbitrary.. functions as you proposed.
>>>>
>>>> Best regards.
>>>> Yuri.
>>>>
>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Thanks Richard for your quick reply!
>>>>>>
>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>> 2. What is your opinion about using more simple decision about
>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>> splitting is not a good decision.
>>>>>
>>>>> Why not always insert before the use?  Which would be after labels,
>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> Best regards.
>>>>>> Yuri.
>>>>>>
>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Hi Richard,
>>>>>>>>
>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>
>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>
>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>
>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>   {
>>>>>>>>     float t = a[i];
>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>         res += 1;
>>>>>>>>   }
>>>>>>>>
>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>
>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>
>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>
>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>
>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>
>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>
>>>>>>> Of course it is worthwhile to also sort single occurrences first
>>>>>>> so your case gets just the condition for edge 5 and its inversion
>>>>>>> used for edges 3 and 4 combined.
>>>>>>>
>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>> attached) and we can see that bb_7 and bb_9 containing phi nodes have
>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>> predicates, e.g.
>>>>>>>>
>>>>>>>> <bb 7>:
>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>> _48 = _46 & _47;
>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>> _54 = ~_53;
>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>> _56 = _54 & _55;
>>>>>>>> _57 = _48 | _56;
>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>> goto <bb 11>;
>>>>>>>>
>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>> Note also that if there are no critical edges for phi arguments, the
>>>>>>>> insertion point will be "after labels". The phi result can also
>>>>>>>> have a use in this block, so we can't put predication code at the
>>>>>>>> block end.
>>>>>>>
>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>> for critical edges unless you split them).
>>>>>>>
>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>> GENERIC expressions.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Let me know if you still have any questions.
>>>>>>>>
>>>>>>>> Best regards.
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>
>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>> are already inserted above this block and
>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>
>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>
>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>
>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>
>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>
>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>
>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> ChangeLog:
>>>>>>>>>>
>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>
>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>> EXTENDED value.
>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>> is copy of inner or outer loop field force_vectorize.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] Extended if-conversion
  2014-12-10 10:54                   ` Yuri Rumyantsev
@ 2014-12-10 14:31                     ` Richard Biener
  2014-12-10 15:22                       ` Yuri Rumyantsev
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Biener @ 2014-12-10 14:31 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> Sorry that I forgot to delete debug dump from my fix.
> I have few questions about your comments.
>
> 1. You wrote :
>> You also still have two functions for PHI predication.  And the
>> new extended variant doesn't commonize the 2-args and general
>> path
>  Did you mean that I must combine predicate_scalar_phi and
> predicate_extended_scalar_phi into one function?
> Please note that if the additional flag is not set (i.e.
> aggressive_if_conv is false), extended predication requires more
> compile time since it builds a hash_map.

Its compile-time complexity is reasonable enough even for
non-aggressive if-conversion.
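
As a rough illustration of why grouping PHI arguments by value is cheap,
here is a toy C model.  It is not GCC code: struct phi_arg,
distinct_values and edge_predicates_needed are names invented for this
sketch, and the quadratic scan merely stands in for the patch's hash_map.
Treating the largest value group as the 'else' arm is one possible choice
and an assumption of the sketch:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for one PHI argument (hypothetical, not a GCC type).  */
struct phi_arg
{
  int edge;   /* index of the predecessor block the edge comes from */
  int value;  /* id standing in for the SSA name carried on that edge */
};

/* Number of arguments of the PHI that carry VALUE.  */
size_t
group_size (const struct phi_arg *args, size_t n, int value)
{
  size_t cnt = 0;
  for (size_t i = 0; i < n; i++)
    if (args[i].value == value)
      cnt++;
  return cnt;
}

/* Number of distinct values among the N arguments.  */
size_t
distinct_values (const struct phi_arg *args, size_t n)
{
  size_t nvals = 0;
  for (size_t i = 0; i < n; i++)
    {
      size_t j;
      /* Count ARGS[i] only if no earlier argument has the same value.  */
      for (j = 0; j < i; j++)
        if (args[j].value == args[i].value)
          break;
      if (j == i)
        nvals++;
    }
  return nvals;
}

/* Number of edge predicates that have to be materialised when the
   largest value group is selected by the negation of all the others,
   i.e. is treated as the 'else' arm.  */
size_t
edge_predicates_needed (const struct phi_arg *args, size_t n)
{
  assert (n > 0);
  size_t largest = 0;
  for (size_t i = 0; i < n; i++)
    {
      size_t sz = group_size (args, n, args[i].value);
      if (sz > largest)
        largest = sz;
    }
  return n - largest;
}
```

For PHI <res_15(3), res_15(4), res_10(5)> this model reports two distinct
values and a single edge predicate, the one for edge 5.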

> 2. About critical edge splitting.
>
> Did you mean that we should (1) perform it only under the
> aggressive_if_conv option, or (2) split all critical edges?
> Note that this requires recomputing the topological order.

Well, I don't mind splitting all critical edges unconditionally, thus
do something like

Index: gcc/tree-if-conv.c
===================================================================
--- gcc/tree-if-conv.c  (revision 218515)
+++ gcc/tree-if-conv.c  (working copy)
@@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
   if (number_of_loops (fun) <= 1)
     return 0;

+  bool critical_edges_split_p = false;
   FOR_EACH_LOOP (loop, 0)
     if (flag_tree_loop_if_convert == 1
        || flag_tree_loop_if_convert_stores == 1
        || ((flag_tree_loop_vectorize || loop->force_vectorize)
            && !loop->dont_vectorize))
-      todo |= tree_if_conversion (loop);
+      {
+       if (!critical_edges_split_p)
+         {
+           split_critical_edges ();
+           critical_edges_split_p = true;
+           todo |= TODO_cleanup_cfg;
+         }
+       todo |= tree_if_conversion (loop);
+      }

 #ifdef ENABLE_CHECKING
   {

> It is worth noting that in the current implementation, bbs with 2
> predecessors that are both on critical edges are accepted without
> an additional option.

Yes, I know.

tree-if-conv.c is a mess right now and if we can avoid adding more
to it and even fix the critical edge missed optimization with splitting
critical edges then I am all for that solution.
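
To make the effect of splitting concrete, below is a minimal toy model in
C.  It is only a sketch and not GCC code: struct cfg, edge_critical_p and
split_critical_edges_toy are hypothetical names, and the CFG is a plain
edge list.  An edge is critical when its source has more than one
successor and its destination has more than one predecessor; splitting
routes it through a fresh empty block, which gives the edge predicate a
place to live other than e->dest:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EDGES 64

struct edge { int src; int dest; };

/* Toy CFG: blocks are numbered 0 .. n_blocks - 1.  */
struct cfg
{
  int n_blocks;
  size_t n_edges;
  struct edge edges[MAX_EDGES];
};

size_t
succ_count (const struct cfg *g, int bb)
{
  size_t cnt = 0;
  for (size_t e = 0; e < g->n_edges; e++)
    if (g->edges[e].src == bb)
      cnt++;
  return cnt;
}

size_t
pred_count (const struct cfg *g, int bb)
{
  size_t cnt = 0;
  for (size_t e = 0; e < g->n_edges; e++)
    if (g->edges[e].dest == bb)
      cnt++;
  return cnt;
}

/* An edge is critical if its source has several successors and its
   destination has several predecessors.  */
int
edge_critical_p (const struct cfg *g, size_t e)
{
  assert (e < g->n_edges);
  return succ_count (g, g->edges[e].src) > 1
         && pred_count (g, g->edges[e].dest) > 1;
}

/* Split every critical edge SRC->DEST into SRC->NEW and NEW->DEST,
   where NEW is a fresh empty block.  Returns the number of edges split.
   Splitting one edge changes neither the successor count of SRC nor the
   predecessor count of DEST, so a single pass over the original edges
   is enough.  */
size_t
split_critical_edges_toy (struct cfg *g)
{
  size_t n_split = 0;
  size_t orig_edges = g->n_edges;
  for (size_t e = 0; e < orig_edges; e++)
    if (edge_critical_p (g, e))
      {
        assert (g->n_edges < MAX_EDGES);
        int new_bb = g->n_blocks++;
        int dest = g->edges[e].dest;
        g->edges[e].dest = new_bb;           /* SRC -> NEW */
        g->edges[g->n_edges].src = new_bb;   /* NEW -> DEST */
        g->edges[g->n_edges].dest = dest;
        g->n_edges++;
        n_split++;
      }
  return n_split;
}
```

Fed the edges visible in the t5.c dump (4->5, 4->6, 5->7, 5->9, 6->7,
6->8), the model flags 5->7 and 6->7 as critical and splits exactly those
two, after which bb 7 still has two predecessors but neither incoming
edge is critical.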

Richard.

> Thanks in advance.
> Yuri.
> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> Here is updated patch2 with the following changes:
>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>> 2. Use only one function for extended predication -
>>> predicate_extended_scalar_phi.
>>> 3. Save gsi before insertion of predicate computations for basic
>>> blocks if it has 2 predecessors and
>>> both incoming edges are critical or it has more than 2 predecessors
>>> and at least one incoming edge
>>> is critical. This saved iterator can be used by extended phi predication.
>>>
>>> Here is motivated test-case which explains this point.
>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>> -ftree-loop-vectorize -fopenmp options.
>>> The problem phi is in bb-7:
>>>
>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>   {
>>>     <bb 5>:
>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>     if (xmax_17 == xmax_27)
>>>       goto <bb 7>;
>>>     else
>>>       goto <bb 9>;
>>>
>>>   }
>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>   {
>>>     <bb 6>:
>>>     if (xmax_17 == xmax_27)
>>>       goto <bb 7>;
>>>     else
>>>       goto <bb 8>;
>>>
>>>   }
>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>   {
>>>     <bb 7>:
>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>     goto <bb 11>;
>>>
>>>   }
>>>
>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>> restoring gsi in predicate_all_scalar_phi:
>>> #if 0
>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>    gsi = bb_insert_point (bb);
>>>  else
>>> #endif
>>>    gsi = gsi_after_labels (bb);
>>>
>>> we will get ICE:
>>> t5.c: In function 'foo':
>>> t5.c:9:6: error: definition in block 4 follows the use
>>>  void foo (int n)
>>>       ^
>>> for SSA_NAME: _1 in statement:
>>> _52 = _1 & _3;
>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>
>>> since predicate computations were inserted in bb_7.
>>
>> The issue is obviously that the predicates have already been emitted
>> in the target BB - that's of course the wrong place.  This is done
>> by insert_gimplified_predicates.
>>
>> This just shows how edge predicate handling is broken - we don't
>> seem to have a sequence of gimplified stmts for edge predicates
>> but push those to e->dest which makes this really messy.
>>
>> Rather than having a separate phase where we insert all
>> gimplified bb predicates we should do that on-demand when
>> predicating a PHI.
>>
>> Your patch writes to stderr - that's bad - use dump_file and guard
>> the printfs properly.
>>
>> You also still have two functions for PHI predication.  And the
>> new extended variant doesn't commonize the 2-args and general
>> paths.
>>
>> I'm not at all happy with this code.  It may be existing if-conv code's
>> fault but making it even worse is not an option.
>>
>> Again - what's wrong with simply splitting critical edges if
>> aggressive_if_conv?  I think that would very much simplify
>> things here.  Or alternatively use gsi_insert_on_edge and
>> commit edge insertions before merging the blocks.
>>
>> Thanks,
>> Richard.
>>
>>> ChangeLog is
>>>
>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> * tree-if-conv.c : Include hash-map.h.
>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>> statement iterator.
>>> (bb_insert_point): New function.
>>> (set_bb_insert_point): New function.
>>> (has_pred_critical_p): New function.
>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>> AGGRESSIVE_IF_CONV is true.
>>> (if_convertible_bb_p): Delete check that bb has at least one
>>> non-critical incoming edge.
>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>> Allow interchange PHI arguments if EXTENDED is false.
>>> Change check that block containing reduction statement candidate
>>> is predecessor of phi-block since phi may have more than two arguments.
>>> (predicate_scalar_phi): Add new arguments for call of
>>> is_cond_scalar_reduction.
>>> (get_predicate_for_edge): New function.
>>> (struct phi_args_hash_traits): New type.
>>> (phi_args_hash_traits::hash): New function.
>>> (phi_args_hash_traits::equal_keys): New function.
>>> (gen_phi_arg_condition): New function.
>>> (predicate_extended_scalar_phi): New function.
>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>> to true if BB containing phi has more than 2 predecessors or both
>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>> has 2 predecessors and both incoming edges are critical or it has more
>>> than 2 predecessors and at least one incoming edge is critical.
>>> Use standard gsi_after_labels otherwise.
>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>> to save gsi before insertion of predicate computations. Set it to
>>> true for BB with 2 predecessors and critical incoming edges, or if the
>>> number of predecessors is greater than 2 and at least one incoming edge is
>>> critical.
>>> Add check that non-predicated block may have statements to insert.
>>> Insert predicate computation of BB just after label if
>>> EXTENDED_PREDICATION is true.
>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>> is copy of inner or outer loop force_vectorize field.
>>>
>>>
>>>
>>>
>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>
>>>>> typedef struct bb_predicate_s {
>>>>>
>>>>>   /* The condition under which this basic block is executed.  */
>>>>>   tree predicate;
>>>>>
>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>
>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>   gimple_stmt_iterator gsi;
>>>>> } *bb_predicate_p;
>>>>>
>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>> insertion code for predicate computation. I checked that this fix
>>>>> works.
>>>>
>>>> Huh?  I still wonder what the issue is with inserting everything
>>>> after the PHI we predicate.
>>>>
>>>> Well, your updated patch will come with testcases for the testsuite
>>>> that will hopefully fail if doing that.
>>>>
>>>> Richard.
>>>>
>>>>>
>>>>> Now I am implementing merging of predicate_extended.. and
>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>
>>>>> Best regards.
>>>>> Yuri.
>>>>>
>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Thanks Richard for your quick reply!
>>>>>>>
>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>> splitting is not a good decision.
>>>>>>
>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>>
>>>>>>> Best regards.
>>>>>>> Yuri.
>>>>>>>
>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Hi Richard,
>>>>>>>>>
>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>
>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>
>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>
>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>   {
>>>>>>>>>     float t = a[i];
>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>         res += 1;
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>
>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>
>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>
>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>
>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>
>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>
>>>>>>>> Of course it is worthwhile to also sort single occurrences first
>>>>>>>> so your case gets just the condition for edge 5 and its inversion
>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>
>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>> attached) and we can see that bb_7 and bb_9 containing phi nodes have
>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>> predicates, e.g.
>>>>>>>>>
>>>>>>>>> <bb 7>:
>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>> _48 = _46 & _47;
>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>> _54 = ~_53;
>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>> _56 = _54 & _55;
>>>>>>>>> _57 = _48 | _56;
>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>> goto <bb 11>;
>>>>>>>>>
>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>> Note also that if there are no critical edges for phi arguments, the
>>>>>>>>> insertion point will be "after labels". The phi result can also
>>>>>>>>> have a use in this block, so we can't put predication code at the
>>>>>>>>> block end.
>>>>>>>>
>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>> for critical edges unless you split them).
>>>>>>>>
>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>> GENERIC expressions.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>
>>>>>>>>> Best regards.
>>>>>>>>> Yuri.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>
>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>
>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>
>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>
>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>
>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>
>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>
>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>
>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.


* Re: [PATCH 2/3] Extended if-conversion
  2014-12-10 14:31                     ` Richard Biener
@ 2014-12-10 15:22                       ` Yuri Rumyantsev
  2014-12-11  8:59                         ` Richard Biener
  0 siblings, 1 reply; 22+ messages in thread
From: Yuri Rumyantsev @ 2014-12-10 15:22 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

Richard,

Thanks for your reply!

I didn't understand your point:

Well, I don't mind splitting all critical edges unconditionally

but you do it unconditionally in the proposed patch. I also assume that
a call to split_critical_edges() can break SSA. For example, it can
split loop headers, loop exit blocks, etc. I would prefer something
more loop-specific, e.g. calling edge_split() only for critical edges
leaving a bb that ends with a GIMPLE_COND stmt (assuming that the edge
destination bb belongs to the loop).


2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> Sorry that I forgot to delete debug dump from my fix.
>> I have few questions about your comments.
>>
>> 1. You wrote :
>>> You also still have two functions for PHI predication.  And the
>>> new extended variant doesn't commonize the 2-args and general
>>> path
>>  Did you mean that I must combine predicate_scalar_phi and
>> predicate_extended_scalar_phi into one function?
>> Please note that if the additional flag is not set (i.e.
>> aggressive_if_conv is false), extended predication requires more
>> compile time since it builds a hash_map.
>
> Its compile-time complexity is reasonable enough even for
> non-aggressive if-conversion.
>
>> 2. About critical edge splitting.
>>
>> Did you mean that we should perform it (1) under aggressive_if_conv
>> option only; (2) should we split all critical edges.
>> Note that this leads to recomputing of topological order.
>
> Well, I don't mind splitting all critical edges unconditionally, thus
> do something like
>
> Index: gcc/tree-if-conv.c
> ===================================================================
> --- gcc/tree-if-conv.c  (revision 218515)
> +++ gcc/tree-if-conv.c  (working copy)
> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>    if (number_of_loops (fun) <= 1)
>      return 0;
>
> +  bool critical_edges_split_p = false;
>    FOR_EACH_LOOP (loop, 0)
>      if (flag_tree_loop_if_convert == 1
>         || flag_tree_loop_if_convert_stores == 1
>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>             && !loop->dont_vectorize))
> -      todo |= tree_if_conversion (loop);
> +      {
> +       if (!critical_edges_split_p)
> +         {
> +           split_critical_edges ();
> +           critical_edges_split_p = true;
> +           todo |= TODO_cleanup_cfg;
> +         }
> +       todo |= tree_if_conversion (loop);
> +      }
>
>  #ifdef ENABLE_CHECKING
>    {
>
>> It is worth noting that in current implementation bb's with 2
>> predecessors and both are on critical edges are accepted without
>> additional option.
>
> Yes, I know.
>
> tree-if-conv.c is a mess right now and if we can avoid adding more
> to it and even fix the critical edge missed optimization with splitting
> critical edges then I am all for that solution.
>
> Richard.
>
>> Thanks ahead.
>> Yuri.
>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> Here is updated patch2 with the following changes:
>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>> 2. Use only one function for extended predication -
>>>> predicate_extended_scalar_phi.
>>>> 3. Save gsi before insertion of predicate computations for a basic
>>>> block if it has 2 predecessors and
>>>> both incoming edges are critical, or it has more than 2 predecessors
>>>> and at least one incoming edge
>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>
>>>> Here is a motivating test-case which explains this point.
>>>> The test-case is attached (t5.c) and it must be compiled with -O2
>>>> -ftree-loop-vectorize -fopenmp options.
>>>> The problem phi is in bb_7:
>>>>
>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>   {
>>>>     <bb 5>:
>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>     if (xmax_17 == xmax_27)
>>>>       goto <bb 7>;
>>>>     else
>>>>       goto <bb 9>;
>>>>
>>>>   }
>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>   {
>>>>     <bb 6>:
>>>>     if (xmax_17 == xmax_27)
>>>>       goto <bb 7>;
>>>>     else
>>>>       goto <bb 8>;
>>>>
>>>>   }
>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>   {
>>>>     <bb 7>:
>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>     goto <bb 11>;
>>>>
>>>>   }
>>>>
>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>> restoring gsi in predicate_all_scalar_phis:
>>>> #if 0
>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>    gsi = bb_insert_point (bb);
>>>>  else
>>>> #endif
>>>>    gsi = gsi_after_labels (bb);
>>>>
>>>> we will get ICE:
>>>> t5.c: In function 'foo':
>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>  void foo (int n)
>>>>       ^
>>>> for SSA_NAME: _1 in statement:
>>>> _52 = _1 & _3;
>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>
>>>> since predicate computations were inserted in bb_7.
>>>
>>> The issue is obviously that the predicates have already been emitted
>>> in the target BB - that's of course the wrong place.  This is done
>>> by insert_gimplified_predicates.
>>>
>>> This just shows how edge predicate handling is broken - we don't
>>> seem to have a sequence of gimplified stmts for edge predicates
>>> but push those to e->dest which makes this really messy.
>>>
>>> Rather than having a separate phase where we insert all
>>> gimplified bb predicates we should do that on-demand when
>>> predicating a PHI.
>>>
>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>> the printfs properly.
>>>
>>> You also still have two functions for PHI predication.  And the
>>> new extended variant doesn't commonize the 2-args and general
>>> paths.
>>>
>>> I'm not at all happy with this code.  It may be the existing if-conv code's
>>> fault, but making it even worse is not an option.
>>>
>>> Again - what's wrong with simply splitting critical edges if
>>> aggressive_if_conv?  I think that would very much simplify
>>> things here.  Or alternatively use gsi_insert_on_edge and
>>> commit edge insertions before merging the blocks.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> ChangeLog is
>>>>
>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> * tree-if-conv.c : Include hash-map.h.
>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>> statement iterator.
>>>> (bb_insert_point): New function.
>>>> (set_bb_insert_point): New function.
>>>> (has_pred_critical_p): New function.
>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>> AGGRESSIVE_IF_CONV is true.
>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>> non-critical incoming edge.
>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>> Change check that block containing reduction statement candidate
>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>> (predicate_scalar_phi): Add new arguments for call of
>>>> is_cond_scalar_reduction.
>>>> (get_predicate_for_edge): New function.
>>>> (struct phi_args_hash_traits): New type.
>>>> (phi_args_hash_traits::hash): New function.
>>>> (phi_args_hash_traits::equal_keys): New function.
>>>> (gen_phi_arg_condition): New function.
>>>> (predicate_extended_scalar_phi): New function.
>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>> to true if BB containing phi has more than 2 predecessors or both
>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>> than 2 predecessors and at least one incoming edge is critical.
>>>> Use standard gsi_after_labels otherwise.
>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>> to save gsi before insertion of predicate computations. Set it to
>>>> true for BB with 2 predecessors and critical incoming edges, or if the
>>>> number of predecessors is greater than 2 and at least one incoming edge is
>>>> critical.
>>>> Add check that non-predicated block may have statements to insert.
>>>> Insert predicate computation of BB just after label if
>>>> EXTENDED_PREDICATION is true.
>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>> is copy of inner or outer loop force_vectorize field.
>>>>
>>>>
>>>>
>>>>
>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>
>>>>>> typedef struct bb_predicate_s {
>>>>>>
>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>   tree predicate;
>>>>>>
>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>
>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>   gimple_stmt_iterator gsi;
>>>>>> } *bb_predicate_p;
>>>>>>
>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>> works.
>>>>>
>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>> after the PHI we predicate.
>>>>>
>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>> that will hopefully fail if doing that.
>>>>>
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>
>>>>>> Best regards.
>>>>>> Yuri.
>>>>>>
>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>
>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>> splitting is not a good decision.
>>>>>>>
>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>>
>>>>>>>> Best regards.
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Hi Richard,
>>>>>>>>>>
>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>
>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>
>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>
>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>   {
>>>>>>>>>>     float t = a[i];
>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>         res += 1;
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>
>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>
>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>
>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>
>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>
>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>
>>>>>>>>> Of course it is worthwhile to also sort single occurrences first
>>>>>>>>> so your case gets just the condition for edge 5 and its inversion
>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>
>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr (t5.c is
>>>>>>>>>> attached). We can see that bb_7 and bb_9, which contain phi nodes, have
>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>> predicates, e.g.
>>>>>>>>>>
>>>>>>>>>> <bb 7>:
>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>> _54 = ~_53;
>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>
>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>> Note also that if there are no critical edges for phi arguments,
>>>>>>>>>> the insertion point will be "after labels". Note also that the phi result can
>>>>>>>>>> have a use in this block too, so we can't put predication code at the
>>>>>>>>>> block end.
>>>>>>>>>
>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>
>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>> GENERIC expressions.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>
>>>>>>>>>> Best regards.
>>>>>>>>>> Yuri.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>
>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>
>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>
>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>
>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>
>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>
>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>
>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.


* Re: [PATCH 2/3] Extended if-conversion
  2014-12-10 15:22                       ` Yuri Rumyantsev
@ 2014-12-11  8:59                         ` Richard Biener
  2014-12-16 15:16                           ` Yuri Rumyantsev
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Biener @ 2014-12-11  8:59 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> Thanks for your reply!
>
> I didn't understand your point:
>
> Well, I don't mind splitting all critical edges unconditionally
>
> but you do it unconditionally in the proposed patch.

"I don't mind" means I am fine with it.

> I also assume that
> a call to split_critical_edges() can break SSA. For example, it can
> split loop headers, loop exit blocks, etc.

How does that "break SSA"?  You mean loop-closed SSA?  I'd
be surprised if so but that may be possible.

> I would prefer something
> more loop-specific, e.g. calling edge_split() only for critical edges
> leaving a bb that ends with a GIMPLE_COND stmt (assuming that the edge
> destination bb belongs to the loop).

That works for me as well but it is more complicated to implement.
Ideally you'd only split one edge if you find a block with only critical
predecessors (where we'd currently give up).  But note that this
requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
will change loop->num_nodes so we have to be more careful in
constructing the loop calling if_convertible_bb_p.

Richard.

>
> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> Sorry that I forgot to delete debug dump from my fix.
>>> I have few questions about your comments.
>>>
>>> 1. You wrote :
>>>> You also still have two functions for PHI predication.  And the
>>>> new extended variant doesn't commonize the 2-args and general
>>>> path
>>>  Did you mean that I must combine predicate_scalar_phi and
>>> predicate_extended_scalar_phi into one function?
>>> Please note that if the additional flag is not set (i.e.
>>> aggressive_if_conv is false), extended predication requires more
>>> compile time since it builds a hash_map.
>>
>> Its compile-time complexity is reasonable enough even for
>> non-aggressive if-conversion.
>>
>>> 2. About critical edge splitting.
>>>
>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>> option only; (2) should we split all critical edges.
>>> Note that this leads to recomputing of topological order.
>>
>> Well, I don't mind splitting all critical edges unconditionally, thus
>> do something like
>>
>> Index: gcc/tree-if-conv.c
>> ===================================================================
>> --- gcc/tree-if-conv.c  (revision 218515)
>> +++ gcc/tree-if-conv.c  (working copy)
>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>    if (number_of_loops (fun) <= 1)
>>      return 0;
>>
>> +  bool critical_edges_split_p = false;
>>    FOR_EACH_LOOP (loop, 0)
>>      if (flag_tree_loop_if_convert == 1
>>         || flag_tree_loop_if_convert_stores == 1
>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>             && !loop->dont_vectorize))
>> -      todo |= tree_if_conversion (loop);
>> +      {
>> +       if (!critical_edges_split_p)
>> +         {
>> +           split_critical_edges ();
>> +           critical_edges_split_p = true;
>> +           todo |= TODO_cleanup_cfg;
>> +         }
>> +       todo |= tree_if_conversion (loop);
>> +      }
>>
>>  #ifdef ENABLE_CHECKING
>>    {
>>
>>> It is worth noting that in current implementation bb's with 2
>>> predecessors and both are on critical edges are accepted without
>>> additional option.
>>
>> Yes, I know.
>>
>> tree-if-conv.c is a mess right now and if we can avoid adding more
>> to it and even fix the critical edge missed optimization with splitting
>> critical edges then I am all for that solution.
>>
>> Richard.
>>
>>> Thanks ahead.
>>> Yuri.
>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> Here is updated patch2 with the following changes:
>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>> 2. Use only one function for extended predication -
>>>>> predicate_extended_scalar_phi.
>>>>> 3. Save gsi before insertion of predicate computations for a basic
>>>>> block if it has 2 predecessors and
>>>>> both incoming edges are critical, or it has more than 2 predecessors
>>>>> and at least one incoming edge
>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>
>>>>> Here is a motivating test-case which explains this point.
>>>>> The test-case is attached (t5.c) and it must be compiled with -O2
>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>> The problem phi is in bb_7:
>>>>>
>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>   {
>>>>>     <bb 5>:
>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>     if (xmax_17 == xmax_27)
>>>>>       goto <bb 7>;
>>>>>     else
>>>>>       goto <bb 9>;
>>>>>
>>>>>   }
>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>   {
>>>>>     <bb 6>:
>>>>>     if (xmax_17 == xmax_27)
>>>>>       goto <bb 7>;
>>>>>     else
>>>>>       goto <bb 8>;
>>>>>
>>>>>   }
>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>   {
>>>>>     <bb 7>:
>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>     goto <bb 11>;
>>>>>
>>>>>   }
>>>>>
>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>> restoring gsi in predicate_all_scalar_phis:
>>>>> #if 0
>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>    gsi = bb_insert_point (bb);
>>>>>  else
>>>>> #endif
>>>>>    gsi = gsi_after_labels (bb);
>>>>>
>>>>> we will get ICE:
>>>>> t5.c: In function 'foo':
>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>  void foo (int n)
>>>>>       ^
>>>>> for SSA_NAME: _1 in statement:
>>>>> _52 = _1 & _3;
>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>
>>>>> since predicate computations were inserted in bb_7.
>>>>
>>>> The issue is obviously that the predicates have already been emitted
>>>> in the target BB - that's of course the wrong place.  This is done
>>>> by insert_gimplified_predicates.
>>>>
>>>> This just shows how edge predicate handling is broken - we don't
>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>> but push those to e->dest which makes this really messy.
>>>>
>>>> Rather than having a separate phase where we insert all
>>>> gimplified bb predicates we should do that on-demand when
>>>> predicating a PHI.
>>>>
>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>> the printfs properly.
>>>>
>>>> You also still have two functions for PHI predication.  And the
>>>> new extended variant doesn't commonize the 2-args and general
>>>> paths.
>>>>
>>>> I'm not at all happy with this code.  It may be existing if-conv code's
>>>> fault but making it even worse is not an option.
>>>>
>>>> Again - what's wrong with simply splitting critical edges if
>>>> aggressive_if_conv?  I think that would very much simplify
>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>> commit edge insertions before merging the blocks.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> ChangeLog is
>>>>>
>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>
>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>> statement iterator.
>>>>> (bb_insert_point): New function.
>>>>> (set_bb_insert_point): New function.
>>>>> (has_pred_critical_p): New function.
>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>> AGGRESSIVE_IF_CONV is true.
>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>> non-critical incoming edge.
>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>> Change check that block containing reduction statement candidate
>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>> is_cond_scalar_reduction.
>>>>> (get_predicate_for_edge): New function.
>>>>> (struct phi_args_hash_traits): New type.
>>>>> (phi_args_hash_traits::hash): New function.
>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>> (gen_phi_arg_condition): New function.
>>>>> (predicate_extended_scalar_phi): New function.
>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>> than 2 predecessors and at least one incoming edge is critical.
>>>>> Use standard gsi_after_labels otherwise.
>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>> to save gsi before insertion of predicate computations. Set it to
>>>>> true for BB with 2 predecessors and critical incoming edges, or if the
>>>>> number of predecessors is greater than 2 and at least one incoming edge is
>>>>> critical.
>>>>> Add check that non-predicated block may have statements to insert.
>>>>> Insert predicate computation of BB just after label if
>>>>> EXTENDED_PREDICATION is true.
>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>
>>>>>>> typedef struct bb_predicate_s {
>>>>>>>
>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>   tree predicate;
>>>>>>>
>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>
>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>> } *bb_predicate_p;
>>>>>>>
>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>> works.
>>>>>>
>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>> after the PHI we predicate.
>>>>>>
>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>> that will hopefully fail if doing that.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>>
>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>
>>>>>>> Best regards.
>>>>>>> Yuri.
>>>>>>>
>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>
>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>> splitting is not a good decision.
>>>>>>>>
>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best regards.
>>>>>>>>> Yuri.
>>>>>>>>>
>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>
>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>> I am also very sorry that I sent you a bad patch.
>>>>>>>>>>>
>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>
>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>
>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>   {
>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>         res += 1;
>>>>>>>>>>>   }
>>>>>>>>>>>
>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>
>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>
>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>
>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>
>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>
>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>
>>>>>>>>>> Of course it is worthwhile to also sort single-occurrences first
>>>>>>>>>> so your case gets just the condition for edge 5 and its inversion
>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>
>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containing phi nodes have
>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>
>>>>>>>>>>> <bb 7>:
>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>
>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>> insertion point will be "after labels". Note also that phi result can
>>>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>>>> block end.
>>>>>>>>>>
>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>
>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>
>>>>>>>>>>> Best regards.
>>>>>>>>>>> Yuri.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>
>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>
>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>
>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>
>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>
>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] Extended if-conversion
  2014-12-11  8:59                         ` Richard Biener
@ 2014-12-16 15:16                           ` Yuri Rumyantsev
  2014-12-17 15:45                             ` Richard Biener
  0 siblings, 1 reply; 22+ messages in thread
From: Yuri Rumyantsev @ 2014-12-16 15:16 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 23341 bytes --]

Hi Richard,

Here is the updated patch, which includes the following changes:
(1) Critical edges are split for aggressive if-conversion.
(2) All code supporting critical edge predication is deleted.
(3) Only one function, predicate_scalar_phi, performs predication.
(4) The function find_phi_replacement_condition is deleted, since its
logic is now included in predicate_scalar_phi for phis with two arguments.
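
To make (1) concrete, here is a small standalone sketch in C of what
splitting critical edges means on the CFG fragment from t5.c quoted below.
It is not the patch code and does not use GCC's CFG API: the flat edge
list, the names edge_critical_p and split_critical_edges_toy, and the new
block numbers 12 and 13 are assumptions made only for illustration. An
edge is treated as critical when its source has more than one successor
and its destination has more than one predecessor.

```c
#include <assert.h>

#define MAX_EDGES 32

/* Toy CFG: a flat list of (src, dest) basic-block numbers.  */
struct edge { int src, dest; };

/* Edges taken from the t5.c dump: bb_4 -> {5, 6}, bb_5 -> {7, 9},
   bb_6 -> {7, 8}, bb_7 -> {11}.  */
static struct edge edges[MAX_EDGES] = {
  {4, 5}, {4, 6}, {5, 7}, {5, 9}, {6, 7}, {6, 8}, {7, 11}
};
static int n_edges = 7;

static int
count_succs (int bb)
{
  int n = 0;
  for (int i = 0; i < n_edges; i++)
    if (edges[i].src == bb)
      n++;
  return n;
}

static int
count_preds (int bb)
{
  int n = 0;
  for (int i = 0; i < n_edges; i++)
    if (edges[i].dest == bb)
      n++;
  return n;
}

/* An edge is critical if its source has several successors and its
   destination has several predecessors.  */
static int
edge_critical_p (const struct edge *e)
{
  return count_succs (e->src) > 1 && count_preds (e->dest) > 1;
}

/* Split every critical edge SRC -> DEST into SRC -> NEW -> DEST, where
   NEW is a fresh block numbered from NEXT_BB upward (hypothetical
   numbering).  Returns the number of edges split.  */
static int
split_critical_edges_toy (int next_bb)
{
  int orig_n = n_edges, n_split = 0;
  for (int i = 0; i < orig_n; i++)
    if (edge_critical_p (&edges[i]))
      {
        assert (n_edges < MAX_EDGES);
        int new_bb = next_bb++;
        /* New edge from the forwarder block to the old destination.  */
        edges[n_edges].src = new_bb;
        edges[n_edges].dest = edges[i].dest;
        n_edges++;
        /* Redirect the original edge to the forwarder block.  */
        edges[i].dest = new_bb;
        n_split++;
      }
  return n_split;
}
```

Calling split_critical_edges_toy (12) on this fragment splits 5->7 and
6->7. Each forwarder block has a single predecessor, so computations for
an edge predicate have a place to go that is not the phi block itself.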

I checked that the patch works in stress-testing mode, i.e. with
aggressive if-conversion enabled by default.

What is your opinion?

Thanks.
Yuri.

2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> Thanks for your reply!
>>
>> I didn't understand your point:
>>
>> Well, I don't mind splitting all critical edges unconditionally
>>
>> but you do it unconditionally in proposed patch.
>
> I don't mind means I am fine with it.
>
>> Also I assume that
>> call of split_critical_edges() can break ssa. For example, we can
>> split headers of loops, loop exit blocks etc.
>
> How does that "break SSA"?  You mean loop-closed SSA?  I'd
> be surprised if so but that may be possible.
>
>> I prefer to do something
>> more loop-specialized, e.g. call edge_split() for critical edges
>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>> destination bb belongs to loop).
>
> That works for me as well but it is more complicated to implement.
> Ideally you'd only split one edge if you find a block with only critical
> predecessors (where we'd currently give up).  But note that this
> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
> will change loop->num_nodes so we have to be more careful in
> constructing the loop calling if_convertible_bb_p.
>
> Richard.
>
>>
>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> Sorry that I forgot to delete debug dump from my fix.
>>>> I have a few questions about your comments.
>>>>
>>>> 1. You wrote :
>>>>> You also still have two functions for PHI predication.  And the
>>>>> new extended variant doesn't commonize the 2-args and general
>>>>> path
>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>> predicate_extended_scalar_phi to one function?
>>>> Please note that if additional flag was not set up (i.e.
>>>> aggressive_if_conv is false) extended predication requires more
>>>> compile time since it builds hash_map.
>>>
>>> Its compile-time complexity is reasonable enough even for
>>> non-aggressive if-conversion.
>>>
>>>> 2. About critical edge splitting.
>>>>
>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>> option only; (2) should we split all critical edges.
>>>> Note that this leads to recomputing of topological order.
>>>
>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>> do something like
>>>
>>> Index: gcc/tree-if-conv.c
>>> ===================================================================
>>> --- gcc/tree-if-conv.c  (revision 218515)
>>> +++ gcc/tree-if-conv.c  (working copy)
>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>    if (number_of_loops (fun) <= 1)
>>>      return 0;
>>>
>>> +  bool critical_edges_split_p = false;
>>>    FOR_EACH_LOOP (loop, 0)
>>>      if (flag_tree_loop_if_convert == 1
>>>         || flag_tree_loop_if_convert_stores == 1
>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>             && !loop->dont_vectorize))
>>> -      todo |= tree_if_conversion (loop);
>>> +      {
>>> +       if (!critical_edges_split_p)
>>> +         {
>>> +           split_critical_edges ();
>>> +           critical_edges_split_p = true;
>>> +           todo |= TODO_cleanup_cfg;
>>> +         }
>>> +       todo |= tree_if_conversion (loop);
>>> +      }
>>>
>>>  #ifdef ENABLE_CHECKING
>>>    {
>>>
>>>> It is worth noting that in current implementation bb's with 2
>>>> predecessors and both are on critical edges are accepted without
>>>> additional option.
>>>
>>> Yes, I know.
>>>
>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>> to it and even fix the critical edge missed optimization with splitting
>>> critical edges then I am all for that solution.
>>>
>>> Richard.
>>>
>>>> Thanks ahead.
>>>> Yuri.
>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> Here is updated patch2 with the following changes:
>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>> 2. Use only one function for extended predication -
>>>>>> predicate_extended_scalar_phi.
>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>> blocks if it has 2 predecessors and
>>>>>> both incoming edges are critical or it has more than 2 predecessors
>>>>>> and at least one incoming edge
>>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>>
>>>>>> Here is a motivating test-case which explains this point.
>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>> The problem phi is in bb-7:
>>>>>>
>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>   {
>>>>>>     <bb 5>:
>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>     if (xmax_17 == xmax_27)
>>>>>>       goto <bb 7>;
>>>>>>     else
>>>>>>       goto <bb 9>;
>>>>>>
>>>>>>   }
>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>   {
>>>>>>     <bb 6>:
>>>>>>     if (xmax_17 == xmax_27)
>>>>>>       goto <bb 7>;
>>>>>>     else
>>>>>>       goto <bb 8>;
>>>>>>
>>>>>>   }
>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>   {
>>>>>>     <bb 7>:
>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>     goto <bb 11>;
>>>>>>
>>>>>>   }
>>>>>>
>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>> #if 0
>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>    gsi = bb_insert_point (bb);
>>>>>>  else
>>>>>> #endif
>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>
>>>>>> we will get ICE:
>>>>>> t5.c: In function 'foo':
>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>  void foo (int n)
>>>>>>       ^
>>>>>> for SSA_NAME: _1 in statement:
>>>>>> _52 = _1 & _3;
>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>
>>>>>> since predicate computations were inserted in bb_7.
>>>>>
>>>>> The issue is obviously that the predicates have already been emitted
>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>> by insert_gimplified_predicates.
>>>>>
>>>>> This just shows how edge predicate handling is broken - we don't
>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>> but push those to e->dest which makes this really messy.
>>>>>
>>>>> Rather than having a separate phase where we insert all
>>>>> gimplified bb predicates we should do that on-demand when
>>>>> predicating a PHI.
>>>>>
>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>> the printfs properly.
>>>>>
>>>>> You also still have two functions for PHI predication.  And the
>>>>> new extended variant doesn't commonize the 2-args and general
>>>>> paths.
>>>>>
>>>>> I'm not at all happy with this code.  It may be existing if-conv code's
>>>>> fault but making it even worse is not an option.
>>>>>
>>>>> Again - what's wrong with simply splitting critical edges if
>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>> commit edge insertions before merging the blocks.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>> ChangeLog is
>>>>>>
>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>
>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>> statement iterator.
>>>>>> (bb_insert_point): New function.
>>>>>> (set_bb_insert_point): New function.
>>>>>> (has_pred_critical_p): New function.
>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>> non-critical incoming edge.
>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>> Change check that block containing reduction statement candidate
>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>> is_cond_scalar_reduction.
>>>>>> (get_predicate_for_edge): New function.
>>>>>> (struct phi_args_hash_traits): New type.
>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>> (gen_phi_arg_condition): New function.
>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>> than 2 predecessors and at least one incoming edge is critical.
>>>>>> Use standard gsi_after_labels otherwise.
>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>> to save gsi before insertion of predicate computations. Set it to
>>>>>> true for BB with 2 predecessors and critical incoming edges, or if the
>>>>>> number of predecessors is greater than 2 and at least one incoming edge is
>>>>>> critical.
>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>> Insert predicate computation of BB just after label if
>>>>>> EXTENDED_PREDICATION is true.
>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>
>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>
>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>   tree predicate;
>>>>>>>>
>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>
>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>> } *bb_predicate_p;
>>>>>>>>
>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>> works.
>>>>>>>
>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>> after the PHI we predicate.
>>>>>>>
>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>> that will hopefully fail if doing that.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>>
>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>
>>>>>>>> Best regards.
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>
>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>
>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best regards.
>>>>>>>>>> Yuri.
>>>>>>>>>>
>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>> I am also very sorry that I sent you a bad patch.
>>>>>>>>>>>>
>>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>
>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>
>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>   {
>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>   }
>>>>>>>>>>>>
>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>
>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>
>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>
>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>
>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>
>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>
>>>>>>>>>>> Of course it is worthwhile to also sort single-occurrences first
>>>>>>>>>>> so your case gets just the condition for edge 5 and its inversion
>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>
>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containing phi nodes have
>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>
>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>
>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>>> insertion point will be "after labels". Note also that phi result can
>>>>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>>>>> block end.
>>>>>>>>>>>
>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>
>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards.
>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>>> inserted. So a new function find_insertion_point is introduced which
>>>>>>>>>>>>>> simply finds the last statement in the block defining predicates
>>>>>>>>>>>>>> corresponding to all incoming edges and inserts phi predication code
>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>
>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>
>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.

[-- Attachment #2: if-conv.patch2.3 --]
[-- Type: application/octet-stream, Size: 21396 bytes --]

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 6c9ad32..b6e585b 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -127,6 +127,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "expr.h"
 #include "insn-codes.h"
 #include "optabs.h"
+#include "hash-map.h"
 
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
@@ -163,17 +164,6 @@ bb_predicate (basic_block bb)
   return ((bb_predicate_p) bb->aux)->predicate;
 }
 
-/* Returns predicate for critical edge E.  */
-
-static inline tree
-edge_predicate (edge e)
-{
-  gcc_assert (EDGE_COUNT (e->src->succs) >= 2);
-  gcc_assert (EDGE_COUNT (e->dest->preds) >= 2);
-  gcc_assert (e->aux != NULL);
-  return (tree) e->aux;
-}
-
 /* Sets the gimplified predicate COND for basic block BB.  */
 
 static inline void
@@ -185,16 +175,6 @@ set_bb_predicate (basic_block bb, tree cond)
   ((bb_predicate_p) bb->aux)->predicate = cond;
 }
 
-/* Sets predicate COND for critical edge E.
-   Assumes that #(E->src->succs) >=2 & #(E->dest->preds) >= 2.  */
-
-static inline void
-set_edge_predicate (edge e, tree cond)
-{
-  gcc_assert (cond != NULL_TREE);
-  e->aux = cond;
-}
-
 /* Returns the sequence of statements of the gimplification of the
    predicate for basic block BB.  */
 
@@ -511,11 +491,6 @@ add_to_dst_predicate_list (struct loop *loop, edge e,
 
   if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
     add_to_predicate_list (loop, e->dest, cond);
-
-  /* If edge E is critical save predicate on it.
-     Assume that #(e->src->succs) >= 2.  */
-  if (EDGE_COUNT (e->dest->preds) >= 2)
-    set_edge_predicate (e, cond);
 }
 
 /* Returns true if one of the successor edges of BB exits LOOP.  */
@@ -997,6 +972,19 @@ all_preds_critical_p (basic_block bb)
   return true;
 }
 
+/* Returns true if at least one predecessor is on a critical edge.  */
+static inline bool
+has_pred_critical_p (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (EDGE_COUNT (e->src->succs) > 1)
+      return true;
+  return false;
+}
+
 /* Return true when BB is if-convertible.  This routine does not check
    basic block's statements and phis.
 
@@ -1020,10 +1008,15 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "----------[%d]-------------\n", bb->index);
 
-  if (EDGE_COUNT (bb->preds) > 2
-      || EDGE_COUNT (bb->succs) > 2)
+  if (EDGE_COUNT (bb->succs) > 2)
     return false;
 
+  if (EDGE_COUNT (bb->preds) > 2)
+    {
+      if (!aggressive_if_conv)
+	return false;
+    }
+
   if (exit_bb)
     {
       if (bb != loop->latch)
@@ -1059,19 +1052,15 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
 
   /* At least one incoming edge has to be non-critical as otherwise edge
      predicates are not equal to basic-block predicates of the edge
-     source. This restriction will be removed after adding support for
-     extended predication.  */
-  if (EDGE_COUNT (bb->preds) > 1
-      && bb != loop->header)
+     source. This check is skipped if aggressive_if_conv is true.  */
+  if (!aggressive_if_conv
+      && EDGE_COUNT (bb->preds) > 1
+      && bb != loop->header
+      && all_preds_critical_p (bb))
     {
-      if (!aggressive_if_conv && all_preds_critical_p (bb))
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "only critical predecessors in bb#%d\n",
-		      bb->index);
-
-	  return false;
-	}
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "only critical predecessors\n");
+      return false;
     }
 
   return true;
@@ -1208,8 +1197,6 @@ predicate_bbs (loop_p loop)
 	  extract_true_false_edges_from_block (gimple_bb (stmt),
 					       &true_edge, &false_edge);
 
-          true_edge->aux = false_edge->aux = NULL;
-
 	  /* If C is true, then TRUE_EDGE is taken.  */
 	  add_to_dst_predicate_list (loop, true_edge, unshare_expr (cond),
 				     unshare_expr (c));
@@ -1351,7 +1338,7 @@ if_convertible_loop_p_1 (struct loop *loop,
     }
 
   if (dump_file)
-    fprintf (dump_file, "Applying if-conversion\n");
+    fprintf (dump_file, "Applying if-conversion for loop->header#%d in %s\n", loop->header->index, current_function_name());
 
   return true;
 }
@@ -1423,60 +1410,6 @@ if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
   return res;
 }
 
-/* Basic block BB has two predecessors.  Using predecessor's bb
-   predicate, set an appropriate condition COND for the PHI node
-   replacement.  Return the true block whose phi arguments are
-   selected when cond is true.  LOOP is the loop containing the
-   if-converted region, GSI is the place to insert the code for the
-   if-conversion.  */
-
-static basic_block
-find_phi_replacement_condition (basic_block bb, tree *cond,
-				gimple_stmt_iterator *gsi)
-{
-  edge first_edge, second_edge;
-  tree tmp_cond;
-
-  gcc_assert (EDGE_COUNT (bb->preds) == 2);
-  first_edge = EDGE_PRED (bb, 0);
-  second_edge = EDGE_PRED (bb, 1);
-
-  /* Prefer an edge with a not negated predicate.
-     ???  That's a very weak cost model.  */
-  tmp_cond = bb_predicate (first_edge->src);
-  gcc_assert (tmp_cond);
-  if (TREE_CODE (tmp_cond) == TRUTH_NOT_EXPR)
-    {
-      edge tmp_edge;
-
-      tmp_edge = first_edge;
-      first_edge = second_edge;
-      second_edge = tmp_edge;
-    }
-
-  /* Check if the edge we take the condition from is not critical.
-     We know that at least one non-critical edge exists.  */
-  if (EDGE_COUNT (first_edge->src->succs) > 1)
-    {
-      *cond = bb_predicate (second_edge->src);
-
-      if (TREE_CODE (*cond) == TRUTH_NOT_EXPR)
-	*cond = TREE_OPERAND (*cond, 0);
-      else
-	/* Select non loop header bb.  */
-	first_edge = second_edge;
-    }
-  else
-    *cond = bb_predicate (first_edge->src);
-
-  /* Gimplify the condition to a valid cond-expr conditonal operand.  */
-  *cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (*cond),
-				      is_gimple_condexpr, NULL_TREE,
-				      true, GSI_SAME_STMT);
-
-  return first_edge->src;
-}
-
 /* Returns true if def-stmt for phi argument ARG is simple increment/decrement
    which is in predicated basic block.
    In fact, the following PHI pattern is searching:
@@ -1487,14 +1420,15 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
 	  reduc_3 = ...
 	reduc_2 = PHI <reduc_1, reduc_3>
 
-   REDUC, OP0 and OP1 contain reduction stmt and its operands.  */
+   ARG_0 and ARG_1 are correspondent PHI arguments.
+   REDUC, OP0 and OP1 contain reduction stmt and its operands.
+   EXTENDED is true if PHI has > 2 arguments.  */
 
 static bool
-is_cond_scalar_reduction (gimple phi, gimple *reduc,
-			  tree *op0, tree *op1)
+is_cond_scalar_reduction (gimple phi, gimple *reduc, tree arg_0, tree arg_1,
+			  tree *op0, tree *op1, bool extended)
 {
   tree lhs, r_op1, r_op2;
-  tree arg_0, arg_1;
   gimple stmt;
   gimple header_phi = NULL;
   enum tree_code reduction_op;
@@ -1503,13 +1437,13 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
   edge latch_e = loop_latch_edge (loop);
   imm_use_iterator imm_iter;
   use_operand_p use_p;
-
-  arg_0 = PHI_ARG_DEF (phi, 0);
-  arg_1 = PHI_ARG_DEF (phi, 1);
+  edge e;
+  edge_iterator ei;
+  bool result = false;
   if (TREE_CODE (arg_0) != SSA_NAME || TREE_CODE (arg_1) != SSA_NAME)
     return false;
 
-  if (gimple_code (SSA_NAME_DEF_STMT (arg_0)) == GIMPLE_PHI)
+  if (!extended && gimple_code (SSA_NAME_DEF_STMT (arg_0)) == GIMPLE_PHI)
     {
       lhs = arg_1;
       header_phi = SSA_NAME_DEF_STMT (arg_0);
@@ -1540,8 +1474,13 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
     return false;
 
   /* Check that stmt-block is predecessor of phi-block.  */
-  if (EDGE_PRED (bb, 0)->src != gimple_bb (stmt)
-      && EDGE_PRED (bb, 1)->src != gimple_bb (stmt))
+  FOR_EACH_EDGE (e, ei, gimple_bb (stmt)->succs)
+    if (e->dest == bb)
+      {
+	result = true;
+	break;
+      }
+  if (!result)
     return false;
 
   if (!has_single_use (lhs))
@@ -1638,9 +1577,66 @@ convert_scalar_cond_reduction (gimple reduc, gimple_stmt_iterator *gsi,
   return rhs;
 }
 
+/* Helpers for PHI arguments hashtable map.  */
+
+struct phi_args_hash_traits : default_hashmap_traits
+{
+  static inline hashval_t hash (tree);
+  static inline bool equal_keys (tree, tree);
+};
+
+inline hashval_t
+phi_args_hash_traits::hash (tree value)
+{
+  return iterative_hash_expr (value, 0);
+}
+
+inline bool
+phi_args_hash_traits::equal_keys (tree value1, tree value2)
+{
+  return operand_equal_p (value1, value2, 0);
+}
+
+  /* Produce condition for all occurrences of ARG in PHI node.  */
+
+static tree
+gen_phi_arg_condition (gphi *phi, vec<int> *occur,
+		       gimple_stmt_iterator *gsi)
+{
+  int len;
+  int i;
+  tree cond = NULL_TREE;
+  tree c;
+  edge e;
+
+  len = occur->length ();
+  gcc_assert (len > 0);
+  for (i = 0; i < len; i++)
+    {
+      e = gimple_phi_arg_edge (phi, (*occur)[i]);
+      c = bb_predicate (e->src);
+      if (is_true_predicate (c))
+	continue;
+      c = force_gimple_operand_gsi_1 (gsi, unshare_expr (c),
+				      is_gimple_condexpr, NULL_TREE,
+				      true, GSI_SAME_STMT);
+      if (cond != NULL_TREE)
+	{
+	  /* Must build OR expression.  */
+	  cond = fold_or_predicates (EXPR_LOCATION (c), c, cond);
+	  cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					     is_gimple_condexpr, NULL_TREE,
+					     true, GSI_SAME_STMT);
+	}
+      else
+	cond = c;
+    }
+  gcc_assert (cond != NULL_TREE);
+  return cond;
+}
+
 /* Replace a scalar PHI node with a COND_EXPR using COND as condition.
-   This routine does not handle PHI nodes with more than two
-   arguments.
+   This routine can handle PHI nodes with more than two arguments.
 
    For example,
      S1: A = PHI <x1(1), x2(5)>
@@ -1648,69 +1644,208 @@ convert_scalar_cond_reduction (gimple reduc, gimple_stmt_iterator *gsi,
      S2: A = cond ? x1 : x2;
 
    The generated code is inserted at GSI that points to the top of
-   basic block's statement list.  When COND is true, phi arg from
-   TRUE_BB is selected.  */
+   basic block's statement list.
+   If PHI node has more than two arguments a chain of conditional
+   expression is produced.  */
+
 
 static void
-predicate_scalar_phi (gphi *phi, tree cond,
-		      basic_block true_bb,
-		      gimple_stmt_iterator *gsi)
+predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
 {
-  gimple new_stmt;
+  gimple new_stmt = NULL, reduc;
+  tree rhs, res, arg0, arg1, op0, op1, scev;
+  tree cond;
+  unsigned int index0;
+  unsigned int max_ind, max, args_len;
+  edge e;
   basic_block bb;
-  tree rhs, res, arg, scev;
-
-  gcc_assert (gimple_code (phi) == GIMPLE_PHI
-	      && gimple_phi_num_args (phi) == 2);
+  unsigned int i;
 
   res = gimple_phi_result (phi);
-  /* Do not handle virtual phi nodes.  */
   if (virtual_operand_p (res))
     return;
 
-  bb = gimple_bb (phi);
-
-  if ((arg = degenerate_phi_result (phi))
+  if ((rhs = degenerate_phi_result (phi))
       || ((scev = analyze_scalar_evolution (gimple_bb (phi)->loop_father,
 					    res))
 	  && !chrec_contains_undetermined (scev)
 	  && scev != res
-	  && (arg = gimple_phi_arg_def (phi, 0))))
-    rhs = arg;
-  else
-    {
-      tree arg_0, arg_1;
-      tree op0, op1;
-      gimple reduc;
+	  && (rhs = gimple_phi_arg_def (phi, 0)))) {
+    if (dump_file && (dump_flags & TDF_DETAILS))
+      {
+	fprintf (dump_file, "Degenerate phi!\n");
+	print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
+      }
+    new_stmt = gimple_build_assign (res, rhs);
+    gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+    update_stmt (new_stmt);
+    return;
+  }
 
-      /* Use condition that is not TRUTH_NOT_EXPR in conditional modify expr.  */
+  bb = gimple_bb (phi);
+  if (EDGE_COUNT (bb->preds) == 2)
+    {
+      /* Predicate ordinary PHI node with 2 arguments.  */
+      edge first_edge, second_edge;
+      basic_block true_bb;
+      first_edge = EDGE_PRED (bb, 0);
+      second_edge = EDGE_PRED (bb, 1);
+      cond = bb_predicate (first_edge->src);
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  edge tmp_edge = first_edge;
+	  first_edge = second_edge;
+	  second_edge = tmp_edge;
+	}
+      if (EDGE_COUNT (first_edge->src->succs) > 1)
+	{
+	  cond = bb_predicate (second_edge->src);
+	  if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	    cond = TREE_OPERAND (cond, 0);
+	  else
+	    first_edge = second_edge;
+	}
+      else
+	cond = bb_predicate (first_edge->src);
+      /* Gimplify the condition to a valid cond-expr conditional operand.  */
+      cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					 is_gimple_condexpr, NULL_TREE,
+					 true, GSI_SAME_STMT);
+      true_bb = first_edge->src;
       if (EDGE_PRED (bb, 1)->src == true_bb)
 	{
-	  arg_0 = gimple_phi_arg_def (phi, 1);
-	  arg_1 = gimple_phi_arg_def (phi, 0);
+	  arg0 = gimple_phi_arg_def (phi, 1);
+	  arg1 = gimple_phi_arg_def (phi, 0);
 	}
       else
 	{
-	  arg_0 = gimple_phi_arg_def (phi, 0);
-	  arg_1 = gimple_phi_arg_def (phi, 1);
+	  arg0 = gimple_phi_arg_def (phi, 0);
+	  arg1 = gimple_phi_arg_def (phi, 1);
 	}
-      if (is_cond_scalar_reduction (phi, &reduc, &op0, &op1))
+      if (is_cond_scalar_reduction (phi, &reduc, arg0, arg1,
+				    &op0, &op1, false))
 	/* Convert reduction stmt into vectorizable form.  */
 	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
 					     true_bb != gimple_bb (reduc));
       else
 	/* Build new RHS using selected condition and arguments.  */
 	rhs = fold_build_cond_expr (TREE_TYPE (res), unshare_expr (cond),
-				    arg_0, arg_1);
+				    arg0, arg1);
+      new_stmt = gimple_build_assign (res, rhs);
+      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+      update_stmt (new_stmt);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "new phi replacement stmt\n");
+	  print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+	}
+      return;
+    }
+
+  /* Create hashmap for PHI node which contain vector of argument indexes
+     having the same value.  */
+  bool swap = false;
+  hash_map<tree, auto_vec<int>, phi_args_hash_traits> phi_arg_map;
+  unsigned int num_args = gimple_phi_num_args (phi);
+  /* Vector of different PHI argument values.  */
+  auto_vec<tree> args (num_args);
+
+  /* Compute phi_arg_map.  */
+  for (i = 0; i < num_args; i++)
+    {
+      tree arg;
+
+      arg = gimple_phi_arg_def (phi, i);
+      if (!phi_arg_map.get (arg))
+	args.quick_push (arg);
+      phi_arg_map.get_or_insert (arg).safe_push (i);
+    }
+
+  /* Determine element with max number of occurrences.  */
+  max_ind = 0;
+  max = phi_arg_map.get (args[0])->length ();
+  args_len = args.length ();
+  for (i = 1; i < args_len; i++)
+    {
+      unsigned int len;
+      if ((len = phi_arg_map.get (args[i])->length ()) > max)
+	{
+	  max_ind = i;
+	  max = len;
+	}
+    }
+
+  /* Put element with max number of occurrences to the end of ARGS.  */
+  if (max_ind != 0 && max_ind +1 != args_len)
+    {
+      tree tmp = args[args_len - 1];
+      args[args_len - 1] = args[max_ind];
+      args[max_ind] = tmp;
     }
 
-  new_stmt = gimple_build_assign (res, rhs);
-  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
-  update_stmt (new_stmt);
+  /* Handle one special case when the number of arguments with different
+     values is equal to 2 and one argument has only one occurrence.  Such a
+     PHI can be handled as if it had only 2 arguments.  */
+  if (args_len == 2 && phi_arg_map.get (args[0])->length () == 1)
+    {
+      vec<int> *indexes;
+      indexes = phi_arg_map.get (args[0]);
+      index0 = (*indexes)[0];
+      arg0 = args[0];
+      arg1 = args[1];
+      e = gimple_phi_arg_edge (phi, index0);
+      cond = bb_predicate (e->src);
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  swap = true;
+	  cond = TREE_OPERAND (cond, 0);
+	}
+      /* Gimplify the condition to a valid cond-expr conditional operand.  */
+      cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					 is_gimple_condexpr, NULL_TREE,
+					 true, GSI_SAME_STMT);
+      if (!(is_cond_scalar_reduction (phi, &reduc, arg0 , arg1,
+				      &op0, &op1, true)))
+	rhs = fold_build_cond_expr (TREE_TYPE (res), unshare_expr (cond),
+				    swap? arg1 : arg0,
+				    swap? arg0 : arg1);
+      else
+	/* Convert reduction stmt into vectorizable form.  */
+	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
+					     swap);
+      new_stmt = gimple_build_assign (res, rhs);
+      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+      update_stmt (new_stmt);
+    }
+  else
+    {
+      /* Common case.  */
+      vec<int> *indexes;
+      tree type = TREE_TYPE (gimple_phi_result (phi));
+      tree lhs;
+      arg1 = args[1];
+      for (i = 0; i < args_len; i++)
+	{
+	  arg0 = args[i];
+	  indexes = phi_arg_map.get (args[i]);
+	  if (i != args_len - 1)
+	    lhs = make_temp_ssa_name (type, NULL, "_ifc_");
+	  else
+	    lhs = res;
+	  cond = gen_phi_arg_condition (phi, indexes, gsi);
+	  rhs = fold_build_cond_expr (type, unshare_expr (cond),
+				      arg0, arg1);
+	  new_stmt = gimple_build_assign (lhs, rhs);
+	  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+	  update_stmt (new_stmt);
+	  arg1 = lhs;
+	}
+    }
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
-      fprintf (dump_file, "new phi replacement stmt\n");
+      fprintf (dump_file, "new extended phi replacement stmt\n");
       print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
     }
 }
@@ -1728,28 +1863,25 @@ predicate_all_scalar_phis (struct loop *loop)
   for (i = 1; i < orig_loop_num_nodes; i++)
     {
       gphi *phi;
-      tree cond = NULL_TREE;
       gimple_stmt_iterator gsi;
       gphi_iterator phi_gsi;
-      basic_block true_bb = NULL;
       bb = ifc_bbs[i];
 
       if (bb == loop->header)
 	continue;
 
+      if (EDGE_COUNT (bb->preds) == 1)
+	continue;
+
       phi_gsi = gsi_start_phis (bb);
       if (gsi_end_p (phi_gsi))
 	continue;
 
-      /* BB has two predecessors.  Using predecessor's aux field, set
-	 appropriate condition for the PHI node replacement.  */
       gsi = gsi_after_labels (bb);
-      true_bb = find_phi_replacement_condition (bb, &cond, &gsi);
-
       while (!gsi_end_p (phi_gsi))
 	{
 	  phi = phi_gsi.phi ();
-	  predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  predicate_scalar_phi (phi, &gsi);
 	  release_phi_node (phi);
 	  gsi_next (&phi_gsi);
 	}
@@ -1770,7 +1902,8 @@ insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
     {
       basic_block bb = ifc_bbs[i];
       gimple_seq stmts;
-
+      if (!is_predicated (bb))
+	gcc_assert (bb_predicate_gimplified_stmts (bb) == NULL);
       if (!is_predicated (bb))
 	{
 	  /* Do not insert statements for a basic block that is not
@@ -2194,6 +2327,54 @@ version_loop_for_if_conversion (struct loop *loop)
   return true;
 }
 
+/* Performs splitting of critical edges if aggressive_if_conv is true.
+   Returns false if loop won't be if converted and true otherwise.  */
+
+static bool
+ifcvt_split_critical_edges (struct loop *loop)
+{
+  basic_block *body;
+  basic_block bb;
+  unsigned int num = loop->num_nodes;
+  unsigned int i;
+  gimple stmt;
+  edge e;
+
+  if (num <= 2)
+    return false;
+  if (loop->inner)
+    return false;
+  if (!single_exit (loop))
+    return false;
+
+  body = get_loop_body (loop);
+  for (i = 0; i < num; i++)
+    {
+      bb = body[i];
+      if (bb == loop->latch
+	  || bb_with_exit_edge_p (loop, bb))
+	continue;
+      stmt = last_stmt (bb);
+      /* Skip basic blocks not ending with conditional branch.  */
+      if (!(stmt && gimple_code (stmt) == GIMPLE_COND))
+	continue;
+      if (EDGE_COUNT (EDGE_SUCC (bb, 0)->dest->preds) > 1)
+	{
+	  e = EDGE_SUCC (bb, 0);
+	  if (e->dest->loop_father == e->src->loop_father)
+	    split_edge (e);
+	}
+      if (EDGE_COUNT (EDGE_SUCC (bb, 1)->dest->preds) > 1)
+	{
+	  e = EDGE_SUCC (bb, 1);
+	  if (e->dest->loop_father == e->src->loop_father)
+	    split_edge (e);
+	}
+    }
+  free (body);
+  return true;
+}
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -2205,8 +2386,19 @@ tree_if_conversion (struct loop *loop)
   ifc_bbs = NULL;
   bool any_mask_load_store = false;
 
-  /* Temporary set up this flag to false.  */
-  aggressive_if_conv = false;
+  /* Set-up aggressive if-conversion for loops marked with simd pragma.  */
+  aggressive_if_conv = loop->force_vectorize;
+  /* Check whether the outer loop was marked with simd pragma.  */
+  if (!aggressive_if_conv)
+    {
+      struct loop *outer_loop = loop_outer (loop);
+      if (outer_loop && outer_loop->force_vectorize)
+	aggressive_if_conv = true;
+    }
+
+  if (aggressive_if_conv)
+    if (!ifcvt_split_critical_edges (loop))
+      goto cleanup;
 
   if (!if_convertible_loop_p (loop, &any_mask_load_store)
       || !dbg_cnt (if_conversion_tree))
@@ -2243,11 +2435,6 @@ tree_if_conversion (struct loop *loop)
 	{
 	  basic_block bb = ifc_bbs[i];
 	  free_bb_predicate (bb);
-	  if (EDGE_COUNT (bb->succs) == 2)
-	    {
-	      EDGE_SUCC (bb, 0)->aux = NULL;
-	      EDGE_SUCC (bb, 1)->aux = NULL;
-	    }
 	}
 
       free (ifc_bbs);
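
For readers following the more-than-two-arguments path of
predicate_scalar_phi in the patch above, here is a minimal standalone
sketch of the idea discussed in this thread: group PHI arguments by
value, treat the most frequent value as the "else" case, and select each
other value under the OR of its incoming edge predicates.  It is only an
illustration, not the patch's code: it uses plain ints and bools instead
of GCC trees, the names toy_phi_arg, toy_group_values and
toy_predicate_phi are hypothetical, and it assumes exactly one incoming
edge predicate is true at run time.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ARGS 8

/* One PHI argument: the incoming value and whether the predicate of
   its incoming edge holds.  Hypothetical stand-in for GCC's trees.  */
struct toy_phi_arg { int value; bool pred; };

/* Collect the distinct values of ARGS[0..N-1] into DISTINCT and their
   occurrence counts into COUNT.  Returns the number of distinct values.  */
static int
toy_group_values (const struct toy_phi_arg *args, int n,
                  int *distinct, int *count)
{
  int n_distinct = 0;
  for (int i = 0; i < n; i++)
    {
      int j;
      for (j = 0; j < n_distinct; j++)
        if (distinct[j] == args[i].value)
          break;
      if (j == n_distinct)
        {
          /* First occurrence of this value.  */
          distinct[n_distinct] = args[i].value;
          count[n_distinct] = 0;
          n_distinct++;
        }
      count[j]++;
    }
  return n_distinct;
}

/* Evaluate the PHI as a chain of selects.  The most frequent value is
   the 'else' case, so no condition is computed for its edges.  */
static int
toy_predicate_phi (const struct toy_phi_arg *args, int n)
{
  int distinct[MAX_ARGS], count[MAX_ARGS];
  assert (n > 0 && n <= MAX_ARGS);
  int n_distinct = toy_group_values (args, n, distinct, count);

  int else_ind = 0;
  for (int j = 1; j < n_distinct; j++)
    if (count[j] > count[else_ind])
      else_ind = j;

  int result = distinct[else_ind];
  for (int j = 0; j < n_distinct; j++)
    {
      if (j == else_ind)
        continue;
      /* OR together the predicates of all edges carrying this value.  */
      bool cond = false;
      for (int i = 0; i < n; i++)
        if (args[i].value == distinct[j])
          cond = cond || args[i].pred;
      result = cond ? distinct[j] : result;
    }
  return result;
}
```

For a PHI shaped like res_1 = PHI <res_15(3), res_15(4), res_10(5)>,
only the predicate of the edge from bb_5 is evaluated; the two res_15
edges fall into the 'else' case.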

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] Extended if-conversion
  2014-12-16 15:16                           ` Yuri Rumyantsev
@ 2014-12-17 15:45                             ` Richard Biener
  2014-12-18 13:48                               ` Yuri Rumyantsev
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Biener @ 2014-12-17 15:45 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Hi Richard,
>
> Here is updated patch which includes
> (1) split critical edges for aggressive if conversion.
> (2) delete all stuff related to support of critical edge predication.
> (3) only one function - predicate_scalar_phi performs predication.
> (4) function find_phi_replacement_condition was deleted since it was
> included in predicate_scalar_phi for phi with two arguments.
>
> I checked that patch works in stress testing mode, i.e. with
> aggressive if conversion by default.
>
> What is your opinion?

Looks ok overall, but please simply do

  FOR_EACH_EDGE (e, ei, bb->succs)
    if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
      split_edge (e);

for all blocks apart from the latch.
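
To make the effect of that loop concrete, here is a small standalone
sketch on a toy CFG.  It is an illustration only: toy_edge,
toy_edge_critical_p, toy_split_edge and the other toy_* names are
hypothetical and are not GCC's edge or basic_block types.  The
criticality test mirrors what EDGE_CRITICAL_P checks (source has at
least two successors and destination has at least two predecessors).

```c
#include <assert.h>

#define MAX_EDGES 32

/* Toy CFG: a flat edge list and a block counter.  */
struct toy_edge { int src, dest; };

static struct toy_edge edges[MAX_EDGES];
static int n_edges, n_blocks;

static void
toy_add_edge (int src, int dest)
{
  assert (n_edges < MAX_EDGES);
  edges[n_edges].src = src;
  edges[n_edges].dest = dest;
  n_edges++;
}

static int
toy_count_succs (int bb)
{
  int n = 0;
  for (int i = 0; i < n_edges; i++)
    n += edges[i].src == bb;
  return n;
}

static int
toy_count_preds (int bb)
{
  int n = 0;
  for (int i = 0; i < n_edges; i++)
    n += edges[i].dest == bb;
  return n;
}

/* An edge is critical if its source has several successors and its
   destination has several predecessors.  */
static int
toy_edge_critical_p (const struct toy_edge *e)
{
  return toy_count_succs (e->src) >= 2 && toy_count_preds (e->dest) >= 2;
}

/* Route edge I through a fresh block and return that block.  */
static int
toy_split_edge (int i)
{
  int new_bb = n_blocks++;
  int old_dest = edges[i].dest;
  edges[i].dest = new_bb;
  toy_add_edge (new_bb, old_dest);
  return new_bb;
}

/* Split every critical edge leaving a block other than LATCH.
   Returns the number of edges split.  Splitting keeps the successor
   count of the source and the predecessor count of the destination,
   so earlier splits do not change later criticality tests.  */
static int
toy_split_critical_edges (int latch)
{
  int splits = 0;
  int orig_edges = n_edges;
  for (int i = 0; i < orig_edges; i++)
    if (edges[i].src != latch && toy_edge_critical_p (&edges[i]))
      {
        toy_split_edge (i);
        splits++;
      }
  return splits;
}

/* Example loop body: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 3, 3 -> 0 (latch 3).  */
static void
toy_build_example (void)
{
  n_edges = 0;
  n_blocks = 4;
  toy_add_edge (0, 1);
  toy_add_edge (0, 2);
  toy_add_edge (1, 2);
  toy_add_edge (2, 3);
  toy_add_edge (3, 0);
}
```

In the example, only the edge 0 -> 2 is critical; after the pass it
becomes 0 -> 4 -> 2, and block 2 no longer has a critical incoming edge.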

Can you please send a combined patch up to this one?  Looking at
the incremental diff is somewhat hard.  Thus a patch including all
patches from patch1 to this one.

Thanks,
Richard.

>
> Thanks.
> Yuri.
>
> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> Thanks for your reply!
>>>
>>> I didn't understand your point:
>>>
>>> Well, I don't mind splitting all critical edges unconditionally
>>>
>>> but you do it unconditionally in proposed patch.
>>
>> I don't mind means I am fine with it.
>>
>>> Also I assume that
>>> call of split_critical_edges() can break ssa. For example, we can
>>> split headers of loops, loop exit blocks etc.
>>
>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>> be surprised if so but that may be possible.
>>
>>> I prefer to do something
>>> more loop-specialized, e.g. call edge_split() for critical edges
>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>> destination bb belongs to loop).
>>
>> That works for me as well but it is more complicated to implement.
>> Ideally you'd only split one edge if you find a block with only critical
>> predecessors (where we'd currently give up).  But note that this
>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>> will change loop->num_nodes so we have to be more careful in
>> constructing the loop calling if_convertible_bb_p.
>>
>> Richard.
>>
>>>
>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> Sorry that I forgot to delete debug dump from my fix.
>>>>> I have few questions about your comments.
>>>>>
>>>>> 1. You wrote :
>>>>>> You also still have two functions for PHI predication.  And the
>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>> path
>>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>>> predicate_extended scalar phi to one function?
>>>>> Please note that if additional flag was not set up (i.e.
>>>>> aggressive_if_conv is false) extended predication is required more
>>>>> compile time since it builds hash_map.
>>>>
>>>> Its compile-time complexity is reasonable enough even for
>>>> non-aggressive if-conversion.
>>>>
>>>>> 2. About critical edge splitting.
>>>>>
>>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>>> option only; (2) should we split all critical edges.
>>>>> Note that this leads to recomputing of topological order.
>>>>
>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>> do something like
>>>>
>>>> Index: gcc/tree-if-conv.c
>>>> ===================================================================
>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>    if (number_of_loops (fun) <= 1)
>>>>      return 0;
>>>>
>>>> +  bool critical_edges_split_p = false;
>>>>    FOR_EACH_LOOP (loop, 0)
>>>>      if (flag_tree_loop_if_convert == 1
>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>             && !loop->dont_vectorize))
>>>> -      todo |= tree_if_conversion (loop);
>>>> +      {
>>>> +       if (!critical_edges_split_p)
>>>> +         {
>>>> +           split_critical_edges ();
>>>> +           critical_edges_split_p = true;
>>>> +           todo |= TODO_cleanup_cfg;
>>>> +         }
>>>> +       todo |= tree_if_conversion (loop);
>>>> +      }
>>>>
>>>>  #ifdef ENABLE_CHECKING
>>>>    {
>>>>
>>>>> It is worth noting that in current implementation bb's with 2
>>>>> predecessors and both are on critical edges are accepted without
>>>>> additional option.
>>>>
>>>> Yes, I know.
>>>>
>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>> to it and even fix the critical edge missed optimization with splitting
>>>> critical edges then I am all for that solution.
>>>>
>>>> Richard.
>>>>
>>>>> Thanks ahead.
>>>>> Yuri.
>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> Here is updated patch2 with the following changes:
>>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>>> 2. Use only one function for extended predication -
>>>>>>> predicate_extended_scalar_phi.
>>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>>> blocks if it has 2 predecessors and
>>>>>>> both incoming edges are critical or it has more than 2 predecessors
>>>>>>> and at least one incoming edge
>>>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>>>
>>>>>>> Here is motivated test-case which explains this point.
>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>> The problem phi is in bb-7:
>>>>>>>
>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>   {
>>>>>>>     <bb 5>:
>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>       goto <bb 7>;
>>>>>>>     else
>>>>>>>       goto <bb 9>;
>>>>>>>
>>>>>>>   }
>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>   {
>>>>>>>     <bb 6>:
>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>       goto <bb 7>;
>>>>>>>     else
>>>>>>>       goto <bb 8>;
>>>>>>>
>>>>>>>   }
>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>   {
>>>>>>>     <bb 7>:
>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>     goto <bb 11>;
>>>>>>>
>>>>>>>   }
>>>>>>>
>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>>> #if 0
>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>  else
>>>>>>> #endif
>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>
>>>>>>> we will get ICE:
>>>>>>> t5.c: In function 'foo':
>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>  void foo (int n)
>>>>>>>       ^
>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>> _52 = _1 & _3;
>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>
>>>>>>> since predicate computations were inserted in bb_7.
>>>>>>
>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>> by insert_gimplified_predicates.
>>>>>>
>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>> but push those to e->dest which makes this really messy.
>>>>>>
>>>>>> Rather than having a separate phase where we insert all
>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>> predicating a PHI.
>>>>>>
>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>> the printfs properly.
>>>>>>
>>>>>> You also still have two functions for PHI predication.  And the
>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>> paths.
>>>>>>
>>>>>> I'm not at all happy with this code.  It may be existing if-conv codes
>>>>>> fault but making it even worse is not an option.
>>>>>>
>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>> commit edge insertions before merging the blocks.
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>> ChangeLog is
>>>>>>>
>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>
>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>> statement iterator.
>>>>>>> (bb_insert_point): New function.
>>>>>>> (set_bb_insert_point): New function.
>>>>>>> (has_pred_critical_p): New function.
>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>> non-critical incoming edge.
>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>> Change check that block containing reduction statement candidate
>>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>> is_cond_scalar_reduction.
>>>>>>> (get_predicate_for_edge): New function.
>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>>> than 2 predecessors and at least one incoming edge is critical.
>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>>> to save gsi before insertion of predicate computations. Set it to
>>>>>>> true for BB with 2 predecessors and critical incoming edges, or if the
>>>>>>> number of predecessors is greater than 2 and at least one incoming edge
>>>>>>> is critical.
>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>> Insert predicate computation of BB just after label if
>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>
>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>
>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>   tree predicate;
>>>>>>>>>
>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>
>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>
>>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>>> works.
>>>>>>>>
>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>> after the PHI we predicate.
>>>>>>>>
>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>
>>>>>>>>> Best regards.
>>>>>>>>> Yuri.
>>>>>>>>>
>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>
>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>>
>>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best regards.
>>>>>>>>>>> Yuri.
>>>>>>>>>>>
>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>   {
>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>
>>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>>
>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>
>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>
>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>
>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>
>>>>>>>>>>>> Of course it is worthwhile to also sort single occurrences first
>>>>>>>>>>>> so your case gets just the condition for edge 5 and its inversion
>>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>>
>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>>
>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>>>>>> block end.
>>>>>>>>>>>>
>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>>
>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.


* Re: [PATCH 2/3] Extended if-conversion
  2014-12-17 15:45                             ` Richard Biener
@ 2014-12-18 13:48                               ` Yuri Rumyantsev
  2014-12-19 11:46                                 ` Richard Biener
  0 siblings, 1 reply; 22+ messages in thread
From: Yuri Rumyantsev @ 2014-12-18 13:48 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 27795 bytes --]

Richard,

I am sending you the full patch (~1000 lines); if you need only patch.1
and patch.2, let me know and I'll send you a reduced patch.

Below are a few comments regarding your remarks on patch.3.

1. I deleted the sub-phase ifcvt_local_dce since I did not find a test-case
where dead code elimination is required to vectorize a loop, i.e. where a
dead statement is marked as relevant.
2. You wrote:
> The "retry" code also looks odd - why do you walk the BB multiple
> times instead of just doing sth like
>
>  while (!has_single_use (lhs))
>    {
>      gimple copy = ifcvt_split_def_stmt (def_stmt);
>      ifcvt_walk_pattern_tree (copy);
>    }
>
> thus returning the copy you create and re-process it (the copy should
> now have a single-use).

The problem is that not only the top SSA_NAME (lhs) may have multiple uses
but some intermediate variables may too. For example, consider the following
test-case:

float a[1000];
int c[1000];

int foo()
{
  int i, res = 0;
#pragma omp simd safelen(8)
  for (i=0; i<512; i++)
  {
    float t = a[i];
    if (t > 0.0f & t < 1.0e+17f)
      if (c[i] != 0)
        res += 1;
  }
  return res;
}

After combine_blocks we have the following bb:

<bb 3>:
# res_15 = PHI <res_1(7), 0(15)>
# i_16 = PHI <i_11(7), 0(15)>
# ivtmp_14 = PHI <ivtmp_13(7), 512(15)>
t_5 = a[i_16];
_6 = t_5 > 0.0;
_7 = t_5 < 9.9999998430674944e+16;
_8 = _6 & _7;
_10 = &c[i_16];
_ifc__32 = _8 ? 4294967295 : 0;
_9 = MASK_LOAD (_10, 0B, _ifc__32);
_28 = _8;
_29 = _9 != 0;
_30 = _28 & _29;
_ifc__31 = _30 ? 1 : 0;
res_1 = res_15 + _ifc__31;
i_11 = i_16 + 1;
ivtmp_13 = ivtmp_14 - 1;
if (ivtmp_13 != 0)
  goto <bb 7>;
else
  goto <bb 8>;

and we can see that _8 has multiple uses. Also note that after splitting
_8 = _6 & _7
we also get multiple uses of the definitions of _6 and _7. So I used this
iterative algorithm as the simplest one.
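
To illustrate why looking only at the top-level lhs is not enough, here is a small self-contained toy model in plain C. It is only a sketch under stated assumptions, not the code from the patch: struct def, add_def, unshare_operand and unshare_tree are hypothetical names, and a "definition" here is just a pair of operand indices with a use count.

```c
#include <assert.h>

#define MAX_DEFS 32

/* Toy stand-in for an SSA definition: either a leaf (no operands) or an
   operation over one or two earlier definitions.  */
struct def
{
  int op0, op1;   /* operand indices, -1 when absent */
  int uses;       /* how many other definitions use this one */
};

static struct def defs[MAX_DEFS];
static int n_defs;

/* Create a definition using OP0 and OP1 and bump their use counts.  */
static int
add_def (int op0, int op1)
{
  assert (n_defs < MAX_DEFS);
  defs[n_defs].op0 = op0;
  defs[n_defs].op1 = op1;
  defs[n_defs].uses = 0;
  if (op0 >= 0)
    defs[op0].uses++;
  if (op1 >= 0)
    defs[op1].uses++;
  return n_defs++;
}

/* If the definition in *SLOT is shared, give this use a private copy.
   Copying adds a use to each operand of the copied definition, so the
   walk has to continue into the operands.  Leaves are left shared.  */
static void
unshare_operand (int *slot)
{
  int d = *slot;
  if (d < 0 || defs[d].op0 < 0)
    return;
  if (defs[d].uses > 1)
    {
      defs[d].uses--;
      d = add_def (defs[d].op0, defs[d].op1);
      defs[d].uses = 1;
      *slot = d;
    }
  unshare_operand (&defs[d].op0);
  unshare_operand (&defs[d].op1);
}

/* Make every non-leaf definition reachable from ROOT single-use.  */
static void
unshare_tree (int root)
{
  unshare_operand (&defs[root].op0);
  unshare_operand (&defs[root].op1);
}
```

Modelling the block above with _8 feeding both the mask computation and _30, the copy made for _8 temporarily gives _6 and _7 a second use each, so they must be copied in turn before every definition in the pattern is single-use again.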

I think it would be nice to re-use some utility from tree-vect-patterns.c
for stmt_is_root_of_bool_pattern.

I assume that function stmt_is_root_of_bool_pattern can be simplified
to check for COND_EXPR only, since PHI predication and memory access
predication produce only such statements, i.e. it can look like

static bool
stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
{
  enum tree_code code;
  tree lhs, rhs;

  code = gimple_assign_rhs_code (stmt);
  if (code == COND_EXPR)
    {
      rhs = gimple_assign_rhs1 (stmt);
      if (TREE_CODE (rhs) != SSA_NAME)
        return false;
      *var = rhs;
      return true;
    }
  return false;
}

I also made a few minor changes in patch.2.

3. You may also notice that I inserted code in tree_if_conversion to
do loop versioning if the explicit option "-ftree-loop-if-convert" was not
passed to the compiler, i.e. we perform if-conversion for loop
vectorization only, and if vectorization does not take place we should
delete the if-converted version of the loop.
What is your opinion?

Thanks.
Yuri.

2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Hi Richard,
>>
>> Here is updated patch which includes
>> (1) split critical edges for aggressive if conversion.
>> (2) delete all stuff related to support of critical edge predication.
>> (3) only one function - predicate_scalar_phi performs predication.
>> (4) function find_phi_replacement_condition was deleted since it was
>> included in predicate_scalar_phi for phi with two arguments.
>>
>> I checked that patch works in stress testing mode, i.e. with
>> aggressive if conversion by default.
>>
>> What is your opinion?
>
> Looks ok overall, but please simply do
>
>   FOR_EACH_EDGE (e, ei, bb->succs)
>     if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
>       split_edge (e);
>
> for all blocks apart from the latch.
>
> Can you please send a combined patch up to this one?  Looking at
> the incremental diff is somewhat hard.  Thus a patch including all
> patches from patch1 to this one.
>
> Thanks,
> Richard.
>
>>
>> Thanks.
>> Yuri.
>>
>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> Thanks for your reply!
>>>>
>>>> I didn't understand your point:
>>>>
>>>> Well, I don't mind splitting all critical edges unconditionally
>>>>
>>>> but you do it unconditionally in proposed patch.
>>>
>>> I don't mind means I am fine with it.
>>>
>>>> Also I assume that
>>>> call of split_critical_edges() can break ssa. For example, we can
>>>> split headers of loops, loop exit blocks etc.
>>>
>>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>>> be surprised if so but that may be possible.
>>>
>>>> I prefer to do something
>>>> more loop-specialized, e.g. call edge_split() for critical edges
>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>>> destination bb belongs to loop).
>>>
>>> That works for me as well but it is more complicated to implement.
>>> Ideally you'd only split one edge if you find a block with only critical
>>> predecessors (where we'd currently give up).  But note that this
>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>>> will change loop->num_nodes so we have to be more careful in
>>> constructing the loop calling if_convertible_bb_p.
>>>
>>> Richard.
>>>
>>>>
>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> Sorry that I forgot to delete debug dump from my fix.
>>>>>> I have few questions about your comments.
>>>>>>
>>>>>> 1. You wrote :
>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>> path
>>>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>>>> predicate_extended scalar phi to one function?
>>>>>> Please note that if additional flag was not set up (i.e.
>>>>>> aggressive_if_conv is false) extended predication is required more
>>>>>> compile time since it builds hash_map.
>>>>>
>>>>> Its compile-time complexity is reasonable enough even for
>>>>> non-aggressive if-conversion.
>>>>>
>>>>>> 2. About critical edge splitting.
>>>>>>
>>>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>>>> option only; (2) should we split all critical edges.
>>>>>> Note that this leads to recomputing of topological order.
>>>>>
>>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>>> do something like
>>>>>
>>>>> Index: gcc/tree-if-conv.c
>>>>> ===================================================================
>>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>>    if (number_of_loops (fun) <= 1)
>>>>>      return 0;
>>>>>
>>>>> +  bool critical_edges_split_p = false;
>>>>>    FOR_EACH_LOOP (loop, 0)
>>>>>      if (flag_tree_loop_if_convert == 1
>>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>>             && !loop->dont_vectorize))
>>>>> -      todo |= tree_if_conversion (loop);
>>>>> +      {
>>>>> +       if (!critical_edges_split_p)
>>>>> +         {
>>>>> +           split_critical_edges ();
>>>>> +           critical_edges_split_p = true;
>>>>> +           todo |= TODO_cleanup_cfg;
>>>>> +         }
>>>>> +       todo |= tree_if_conversion (loop);
>>>>> +      }
>>>>>
>>>>>  #ifdef ENABLE_CHECKING
>>>>>    {
>>>>>
>>>>>> It is worth noting that in current implementation bb's with 2
>>>>>> predecessors and both are on critical edges are accepted without
>>>>>> additional option.
>>>>>
>>>>> Yes, I know.
>>>>>
>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>>> to it and even fix the critical edge missed optimization with splitting
>>>>> critical edges then I am all for that solution.
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Thanks ahead.
>>>>>> Yuri.
>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> Here is updated patch2 with the following changes:
>>>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>>>> 2. Use only one function for extended predication -
>>>>>>>> predicate_extended_scalar_phi.
>>>>>>>> 3. Save gsi before insertion of predicate computations for a basic
>>>>>>>> block if it has 2 predecessors and both incoming edges are critical,
>>>>>>>> or if it has more than 2 predecessors and at least one incoming edge
>>>>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>>>>
>>>>>>>> Here is a motivating test-case which explains this point.
>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>>> The problem phi is in bb-7:
>>>>>>>>
>>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>>   {
>>>>>>>>     <bb 5>:
>>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>       goto <bb 7>;
>>>>>>>>     else
>>>>>>>>       goto <bb 9>;
>>>>>>>>
>>>>>>>>   }
>>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>>   {
>>>>>>>>     <bb 6>:
>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>       goto <bb 7>;
>>>>>>>>     else
>>>>>>>>       goto <bb 8>;
>>>>>>>>
>>>>>>>>   }
>>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>>   {
>>>>>>>>     <bb 7>:
>>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>     goto <bb 11>;
>>>>>>>>
>>>>>>>>   }
>>>>>>>>
>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>>>> #if 0
>>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>>  else
>>>>>>>> #endif
>>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>>
>>>>>>>> we will get ICE:
>>>>>>>> t5.c: In function 'foo':
>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>>  void foo (int n)
>>>>>>>>       ^
>>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>>> _52 = _1 & _3;
>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>>
>>>>>>>> since predicate computations were inserted in bb_7.
>>>>>>>
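To make the terminology concrete, here is a minimal standalone C sketch of the critical-edge test that the discussion above relies on. It is not GCC code: the toy_edge type and every toy_* function are hypothetical stand-ins, and the edge list only covers the blocks visible in the quoted dump (bb_4 to bb_9 and bb_11). An edge is treated as critical when its source has more than one successor and its destination has more than one predecessor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy CFG edge: src -> dst, blocks named by their index.  */
struct toy_edge { int src, dst; };

/* Edges read off the quoted dump (only the blocks shown there).  */
const struct toy_edge t5_edges[] = {
  {4, 5}, {4, 6}, {5, 7}, {5, 9}, {6, 7}, {6, 8}, {7, 11}
};
const size_t t5_n_edges = sizeof t5_edges / sizeof t5_edges[0];

static int
toy_count_succs (const struct toy_edge *e, size_t n, int bb)
{
  int count = 0;
  for (size_t i = 0; i < n; i++)
    if (e[i].src == bb)
      count++;
  return count;
}

static int
toy_count_preds (const struct toy_edge *e, size_t n, int bb)
{
  int count = 0;
  for (size_t i = 0; i < n; i++)
    if (e[i].dst == bb)
      count++;
  return count;
}

/* Edge IDX is critical if its source has several successors and its
   destination has several predecessors.  */
bool
toy_edge_critical_p (const struct toy_edge *e, size_t n, size_t idx)
{
  assert (idx < n);
  return toy_count_succs (e, n, e[idx].src) > 1
         && toy_count_preds (e, n, e[idx].dst) > 1;
}

/* True if every incoming edge of BB is critical.  */
bool
toy_all_preds_critical_p (const struct toy_edge *e, size_t n, int bb)
{
  for (size_t i = 0; i < n; i++)
    if (e[i].dst == bb && !toy_edge_critical_p (e, n, i))
      return false;
  return true;
}

/* True if at least one incoming edge of BB is critical.  */
bool
toy_has_pred_critical_p (const struct toy_edge *e, size_t n, int bb)
{
  for (size_t i = 0; i < n; i++)
    if (e[i].dst == bb && toy_edge_critical_p (e, n, i))
      return true;
  return false;
}
```

With this edge list, both edges into block 7 come from blocks with two successors, so toy_all_preds_critical_p (t5_edges, t5_n_edges, 7) holds, which is the situation described for bb_7.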
>>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>>> by insert_gimplified_predicates.
>>>>>>>
>>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>>> but push those to e->dest which makes this really messy.
>>>>>>>
>>>>>>> Rather than having a separate phase where we insert all
>>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>>> predicating a PHI.
>>>>>>>
>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>>> the printfs properly.
>>>>>>>
>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>> paths.
>>>>>>>
>>>>>>> I'm not at all happy with this code.  It may be the existing if-conv
>>>>>>> code's fault but making it even worse is not an option.
>>>>>>>
>>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>>> commit edge insertions before merging the blocks.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
>>>>>>>
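The suggestion to simply split critical edges can be illustrated with a small standalone C sketch. This is a toy model only, not the GCC implementation of split_critical_edges: the toy_cfg type, its fixed-size arrays and the toy_* helpers are all hypothetical. Each critical edge src -> dst is replaced by src -> mid and mid -> dst, where mid is a fresh block, so afterwards every edge has a block of its own where predicate code could be placed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_EDGES 32

/* Hypothetical toy CFG: edge I goes from src[I] to dst[I].  */
struct toy_cfg {
  int src[TOY_MAX_EDGES];
  int dst[TOY_MAX_EDGES];
  size_t n_edges;
  int next_bb;            /* first unused block index */
};

static int
toy_cfg_count (const int *ends, size_t n, int bb)
{
  int count = 0;
  for (size_t i = 0; i < n; i++)
    if (ends[i] == bb)
      count++;
  return count;
}

/* Edge I is critical if its source has several successors and its
   destination has several predecessors.  */
bool
toy_cfg_edge_critical_p (const struct toy_cfg *g, size_t i)
{
  return toy_cfg_count (g->src, g->n_edges, g->src[i]) > 1
         && toy_cfg_count (g->dst, g->n_edges, g->dst[i]) > 1;
}

/* Split every critical edge of G by routing it through a new block.
   Returns the number of edges that were split.  */
size_t
toy_split_critical_edges (struct toy_cfg *g)
{
  size_t n_orig = g->n_edges;
  size_t n_split = 0;
  bool critical[TOY_MAX_EDGES];

  /* Classify all edges on the unmodified graph first.  */
  for (size_t i = 0; i < n_orig; i++)
    critical[i] = toy_cfg_edge_critical_p (g, i);

  for (size_t i = 0; i < n_orig; i++)
    {
      if (!critical[i])
        continue;
      assert (g->n_edges < TOY_MAX_EDGES);
      int mid = g->next_bb++;
      /* New edge mid -> old destination.  */
      g->src[g->n_edges] = mid;
      g->dst[g->n_edges] = g->dst[i];
      g->n_edges++;
      /* Old edge now ends at the new block.  */
      g->dst[i] = mid;
      n_split++;
    }
  return n_split;
}
```

Applied to the blocks from the quoted dump, the two edges into block 7 are the only ones split, and a second call finds nothing left to do.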
>>>>>>>> ChangeLog is
>>>>>>>>
>>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>
>>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>>> statement iterator.
>>>>>>>> (bb_insert_point): New function.
>>>>>>>> (set_bb_insert_point): New function.
>>>>>>>> (has_pred_critical_p): New function.
>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>> non-critical incoming edge.
>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>>> Change check that block containing reduction statement candidate
>>>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>>> is_cond_scalar_reduction.
>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>>>> than 2 predecessors and at least one incoming edge is critical.
>>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>>>> to save gsi before insertion of predicate computations. Set it to
>>>>>>>> true for BB with 2 predecessors and critical incoming edges, or if the
>>>>>>>> number of predecessors is greater than 2 and at least one incoming edge is
>>>>>>>> critical.
>>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>>> Insert predicate computation of BB just after label if
>>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>>
>>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>>
>>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>>   tree predicate;
>>>>>>>>>>
>>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>>
>>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>>
>>>>>>>>>> and this iterator is saved in insert_gimplified_predicates before
>>>>>>>>>> inserting the code for predicate computation. I checked that this fix
>>>>>>>>>> works.
>>>>>>>>>
>>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>>> after the PHI we predicate.
>>>>>>>>>
>>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>>
>>>>>>>>>> Best regards.
>>>>>>>>>> Yuri.
>>>>>>>>>>
>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>>
>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>>> 2. What is your opinion on using a simpler rule for the
>>>>>>>>>>>> insertion point: if the bb has a use of the phi result, insert phi
>>>>>>>>>>>> predication before it, and at the bb end otherwise? I assume that
>>>>>>>>>>>> critical edge splitting is not a good decision.
>>>>>>>>>>>
>>>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards.
>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>>>> I am also very sorry that I sent you a bad patch.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now let me answer your questions related to the second patch.
>>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> we can see the following phi node corresponding to res:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>>
>>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>>
>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>>
>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Of course it is worthwhile to also sort single occurrences first
>>>>>>>>>>>>> so your case gets just the condition for edge 5 and its inversion
>>>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>>>
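The grouping idea above (collect PHI arguments by value, put single occurrences first, and let the last group act as the 'else' case) can be sketched in standalone C. This is an illustration under stated assumptions, not code from the patch: values are plain ints standing in for SSA names (15 for res_15, 10 for res_10), the arg_group type and both functions are hypothetical, and at most MAX_ARGS arguments are supported.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ARGS 8

/* One group of PHI arguments that carry the same value.  */
struct arg_group {
  int value;               /* stand-in for an SSA name */
  int edges[MAX_ARGS];     /* source blocks of the incoming edges */
  size_t n_edges;
};

/* Group the N arguments (VALUES[i] arriving from block SRCS[i]) by
   value, then order the groups by ascending occurrence count so that
   single occurrences come first.  Returns the number of groups.  */
size_t
group_phi_args (const int *values, const int *srcs, size_t n,
                struct arg_group *groups)
{
  size_t n_groups = 0;

  assert (n <= MAX_ARGS);
  for (size_t i = 0; i < n; i++)
    {
      size_t g = 0;
      while (g < n_groups && groups[g].value != values[i])
        g++;
      if (g == n_groups)
        {
          groups[g].value = values[i];
          groups[g].n_edges = 0;
          n_groups++;
        }
      groups[g].edges[groups[g].n_edges++] = srcs[i];
    }

  /* Stable insertion sort by occurrence count.  */
  for (size_t i = 1; i < n_groups; i++)
    {
      struct arg_group tmp = groups[i];
      size_t j = i;
      while (j > 0 && groups[j - 1].n_edges > tmp.n_edges)
        {
          groups[j] = groups[j - 1];
          j--;
        }
      groups[j] = tmp;
    }
  return n_groups;
}

/* Number of edge predicates that must be evaluated when the last
   group is selected as the 'else' case of the condition chain.  */
size_t
predicates_needed (const struct arg_group *groups, size_t n_groups)
{
  size_t total = 0;
  for (size_t g = 0; g + 1 < n_groups; g++)
    total += groups[g].n_edges;
  return total;
}
```

For PHI <res_15(3), res_15(4), res_10(5)> the first group is res_10 from block 5, so only one edge predicate is needed; for the four-argument variant two predicates are needed and the res_10 group falls out as the 'else' case.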
>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>>> Let's consider another test-case extracted from 175.vpr (t5.c is
>>>>>>>>>>>>>> attached). We can see that bb_7 and bb_9, which contain phi nodes, have
>>>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is evident that we cannot put phi predication at the block
>>>>>>>>>>>>>> beginning but need to put it after the predicate computations.
>>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments the
>>>>>>>>>>>>>> insertion point will be "after labels". The phi result can also
>>>>>>>>>>>>>> have a use in this block, so we can't put predication code at the
>>>>>>>>>>>>>> block end.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>>>>> inserted. So a new function find_insertion_point is introduced which
>>>>>>>>>>>>>>>> simply finds the last statement in the block defining predicates
>>>>>>>>>>>>>>>> corresponding to all incoming edges and inserts phi predication code
>>>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.

[-- Attachment #2: patch.20141218 --]
[-- Type: application/octet-stream, Size: 31513 bytes --]

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index f7befac..14b1cb9 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -127,10 +127,14 @@ along with GCC; see the file COPYING3.  If not see
 #include "expr.h"
 #include "insn-codes.h"
 #include "optabs.h"
+#include "hash-map.h"
 
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
 
+/* Apply more aggressive (extended) if-conversion if true.  */
+static bool aggressive_if_conv;
+
 /* Structure used to predicate basic blocks.  This is attached to the
    ->aux field of the BBs in the loop to be if-converted.  */
 typedef struct bb_predicate_s {
@@ -373,6 +377,22 @@ static tree
 fold_build_cond_expr (tree type, tree cond, tree rhs, tree lhs)
 {
   tree rhs1, lhs1, cond_expr;
+
+  /* If COND is comparison r != 0 and r has boolean type, convert COND
+     to SSA_NAME so that it is accepted by the vect bool pattern.  */
+  if (TREE_CODE (cond) == NE_EXPR)
+    {
+      tree op0 = TREE_OPERAND (cond, 0);
+      tree op1 = TREE_OPERAND (cond, 1);
+      if (TREE_CODE (op0) == SSA_NAME
+	  && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE
+	  && (integer_zerop (op1)))
+	cond = op0;
+      else if (TREE_CODE (op1) == SSA_NAME
+	       && TREE_CODE (TREE_TYPE (op1)) == BOOLEAN_TYPE
+	       && (integer_zerop (op0)))
+	cond = op1;
+    }
   cond_expr = fold_ternary (COND_EXPR, type, cond,
 			    rhs, lhs);
 
@@ -485,10 +505,11 @@ add_to_dst_predicate_list (struct loop *loop, edge e,
     cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
 			prev_cond, cond);
 
-  add_to_predicate_list (loop, e->dest, cond);
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
+    add_to_predicate_list (loop, e->dest, cond);
 }
 
-/* Return true if one of the successor edges of BB exits LOOP.  */
+/* Returns true if one of the successor edges of BB exits LOOP.  */
 
 static bool
 bb_with_exit_edge_p (struct loop *loop, basic_block bb)
@@ -512,7 +533,9 @@ bb_with_exit_edge_p (struct loop *loop, basic_block bb)
    When the flag_tree_loop_if_convert_stores is not set, PHI is not
    if-convertible if:
    - a virtual PHI is immediately used in another PHI node,
-   - there is a virtual PHI in a BB other than the loop->header.  */
+   - there is a virtual PHI in a BB other than the loop->header.
+   When the aggressive_if_conv is set, PHI can have more than
+   two arguments.  */
 
 static bool
 if_convertible_phi_p (struct loop *loop, basic_block bb, gphi *phi,
@@ -524,11 +547,17 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gphi *phi,
       print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
     }
 
-  if (bb != loop->header && gimple_phi_num_args (phi) != 2)
+  if (bb != loop->header)
     {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "More than two phi node args.\n");
-      return false;
+      if (gimple_phi_num_args (phi) != 2)
+	{
+	  if (!aggressive_if_conv)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "More than two phi node args.\n");
+	      return false;
+	    }
+	}
     }
 
   if (flag_tree_loop_if_convert_stores || any_mask_load_store)
@@ -895,7 +924,8 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
 
    A statement is if-convertible if:
    - it is an if-convertible GIMPLE_ASSIGN,
-   - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
+   - it is a GIMPLE_LABEL or a GIMPLE_COND,
+   - it is a call to a builtin.  */
 
 static bool
 if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
@@ -942,6 +972,35 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
   return true;
 }
 
+/* Assumes that BB has more than 1 predecessor.
+   Returns false if at least one predecessor is not on a critical edge
+   and true otherwise.  */
+
+static inline bool
+all_preds_critical_p (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (EDGE_COUNT (e->src->succs) == 1)
+      return false;
+  return true;
+}
+
+/* Returns true if at least one predecessor is on a critical edge.  */
+static inline bool
+has_pred_critical_p (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (EDGE_COUNT (e->src->succs) > 1)
+      return true;
+  return false;
+}
+
 /* Return true when BB is if-convertible.  This routine does not check
    basic block's statements and phis.
 
@@ -950,6 +1009,9 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
    - it is after the exit block but before the latch,
    - its edges are not normal.
 
+   Last restriction will be deleted after adding support for extended
+   predication.
+
    EXIT_BB is the basic block containing the exit of the LOOP.  BB is
    inside LOOP.  */
 
@@ -962,10 +1024,15 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "----------[%d]-------------\n", bb->index);
 
-  if (EDGE_COUNT (bb->preds) > 2
-      || EDGE_COUNT (bb->succs) > 2)
+  if (EDGE_COUNT (bb->succs) > 2)
     return false;
 
+  if (EDGE_COUNT (bb->preds) > 2)
+    {
+      if (!aggressive_if_conv)
+	return false;
+    }
+
   if (exit_bb)
     {
       if (bb != loop->latch)
@@ -1001,20 +1068,15 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
 
   /* At least one incoming edge has to be non-critical as otherwise edge
      predicates are not equal to basic-block predicates of the edge
-     source.  */
-  if (EDGE_COUNT (bb->preds) > 1
-      && bb != loop->header)
-    {
-      bool found = false;
-      FOR_EACH_EDGE (e, ei, bb->preds)
-	if (EDGE_COUNT (e->src->succs) == 1)
-	  found = true;
-      if (!found)
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "only critical predecessors\n");
-	  return false;
-	}
+     source.  This check is skipped if aggressive_if_conv is true.  */
+  if (!aggressive_if_conv
+      && EDGE_COUNT (bb->preds) > 1
+      && bb != loop->header
+      && all_preds_critical_p (bb))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "only critical predecessors\n");
+      return false;
     }
 
   return true;
@@ -1126,11 +1188,12 @@ predicate_bbs (loop_p loop)
       tree cond;
       gimple stmt;
 
-      /* The loop latch is always executed and has no extra conditions
-	 to be processed: skip it.  */
-      if (bb == loop->latch)
+      /* The loop latch and loop exit block are always executed and
+	 have no extra conditions to be processed: skip them.  */
+      if (bb == loop->latch
+	  || bb_with_exit_edge_p (loop, bb))
 	{
-	  reset_bb_predicate (loop->latch);
+	  reset_bb_predicate (bb);
 	  continue;
 	}
 
@@ -1141,7 +1204,7 @@ predicate_bbs (loop_p loop)
 	  tree c2;
 	  edge true_edge, false_edge;
 	  location_t loc = gimple_location (stmt);
-	  tree c = fold_build2_loc (loc, gimple_cond_code (stmt),
+	  tree c = build2_loc (loc, gimple_cond_code (stmt),
 				    boolean_type_node,
 				    gimple_cond_lhs (stmt),
 				    gimple_cond_rhs (stmt));
@@ -1291,7 +1354,7 @@ if_convertible_loop_p_1 (struct loop *loop,
     }
 
   if (dump_file)
-    fprintf (dump_file, "Applying if-conversion\n");
+    fprintf (dump_file, "Applying if-conversion for loop->header#%d in %s\n", loop->header->index, current_function_name());
 
   return true;
 }
@@ -1363,60 +1426,6 @@ if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
   return res;
 }
 
-/* Basic block BB has two predecessors.  Using predecessor's bb
-   predicate, set an appropriate condition COND for the PHI node
-   replacement.  Return the true block whose phi arguments are
-   selected when cond is true.  LOOP is the loop containing the
-   if-converted region, GSI is the place to insert the code for the
-   if-conversion.  */
-
-static basic_block
-find_phi_replacement_condition (basic_block bb, tree *cond,
-				gimple_stmt_iterator *gsi)
-{
-  edge first_edge, second_edge;
-  tree tmp_cond;
-
-  gcc_assert (EDGE_COUNT (bb->preds) == 2);
-  first_edge = EDGE_PRED (bb, 0);
-  second_edge = EDGE_PRED (bb, 1);
-
-  /* Prefer an edge with a not negated predicate.
-     ???  That's a very weak cost model.  */
-  tmp_cond = bb_predicate (first_edge->src);
-  gcc_assert (tmp_cond);
-  if (TREE_CODE (tmp_cond) == TRUTH_NOT_EXPR)
-    {
-      edge tmp_edge;
-
-      tmp_edge = first_edge;
-      first_edge = second_edge;
-      second_edge = tmp_edge;
-    }
-
-  /* Check if the edge we take the condition from is not critical.
-     We know that at least one non-critical edge exists.  */
-  if (EDGE_COUNT (first_edge->src->succs) > 1)
-    {
-      *cond = bb_predicate (second_edge->src);
-
-      if (TREE_CODE (*cond) == TRUTH_NOT_EXPR)
-	*cond = TREE_OPERAND (*cond, 0);
-      else
-	/* Select non loop header bb.  */
-	first_edge = second_edge;
-    }
-  else
-    *cond = bb_predicate (first_edge->src);
-
-  /* Gimplify the condition to a valid cond-expr conditonal operand.  */
-  *cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (*cond),
-				      is_gimple_condexpr, NULL_TREE,
-				      true, GSI_SAME_STMT);
-
-  return first_edge->src;
-}
-
 /* Returns true if def-stmt for phi argument ARG is simple increment/decrement
    which is in predicated basic block.
    In fact, the following PHI pattern is searching:
@@ -1427,14 +1436,15 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
 	  reduc_3 = ...
 	reduc_2 = PHI <reduc_1, reduc_3>
 
-   REDUC, OP0 and OP1 contain reduction stmt and its operands.  */
+   ARG_0 and ARG_1 are correspondent PHI arguments.
+   REDUC, OP0 and OP1 contain reduction stmt and its operands.
+   EXTENDED is true if PHI has > 2 arguments.  */
 
 static bool
-is_cond_scalar_reduction (gimple phi, gimple *reduc,
-			  tree *op0, tree *op1)
+is_cond_scalar_reduction (gimple phi, gimple *reduc, tree arg_0, tree arg_1,
+			  tree *op0, tree *op1, bool extended)
 {
   tree lhs, r_op1, r_op2;
-  tree arg_0, arg_1;
   gimple stmt;
   gimple header_phi = NULL;
   enum tree_code reduction_op;
@@ -1443,13 +1453,13 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
   edge latch_e = loop_latch_edge (loop);
   imm_use_iterator imm_iter;
   use_operand_p use_p;
-
-  arg_0 = PHI_ARG_DEF (phi, 0);
-  arg_1 = PHI_ARG_DEF (phi, 1);
+  edge e;
+  edge_iterator ei;
+  bool result = false;
   if (TREE_CODE (arg_0) != SSA_NAME || TREE_CODE (arg_1) != SSA_NAME)
     return false;
 
-  if (gimple_code (SSA_NAME_DEF_STMT (arg_0)) == GIMPLE_PHI)
+  if (!extended && gimple_code (SSA_NAME_DEF_STMT (arg_0)) == GIMPLE_PHI)
     {
       lhs = arg_1;
       header_phi = SSA_NAME_DEF_STMT (arg_0);
@@ -1480,8 +1490,13 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
     return false;
 
   /* Check that stmt-block is predecessor of phi-block.  */
-  if (EDGE_PRED (bb, 0)->src != gimple_bb (stmt)
-      && EDGE_PRED (bb, 1)->src != gimple_bb (stmt))
+  FOR_EACH_EDGE (e, ei, gimple_bb (stmt)->succs)
+    if (e->dest == bb)
+      {
+	result = true;
+	break;
+      }
+  if (!result)
     return false;
 
   if (!has_single_use (lhs))
@@ -1578,9 +1593,66 @@ convert_scalar_cond_reduction (gimple reduc, gimple_stmt_iterator *gsi,
   return rhs;
 }
 
+/* Helpers for PHI arguments hashtable map.  */
+
+struct phi_args_hash_traits : default_hashmap_traits
+{
+  static inline hashval_t hash (tree);
+  static inline bool equal_keys (tree, tree);
+};
+
+inline hashval_t
+phi_args_hash_traits::hash (tree value)
+{
+  return iterative_hash_expr (value, 0);
+}
+
+inline bool
+phi_args_hash_traits::equal_keys (tree value1, tree value2)
+{
+  return operand_equal_p (value1, value2, 0);
+}
+
+/* Produce condition for all occurrences of ARG in PHI node.  */
+
+static tree
+gen_phi_arg_condition (gphi *phi, vec<int> *occur,
+		       gimple_stmt_iterator *gsi)
+{
+  int len;
+  int i;
+  tree cond = NULL_TREE;
+  tree c;
+  edge e;
+
+  len = occur->length ();
+  gcc_assert (len > 0);
+  for (i = 0; i < len; i++)
+    {
+      e = gimple_phi_arg_edge (phi, (*occur)[i]);
+      c = bb_predicate (e->src);
+      if (is_true_predicate (c))
+	continue;
+      c = force_gimple_operand_gsi_1 (gsi, unshare_expr (c),
+				      is_gimple_condexpr, NULL_TREE,
+				      true, GSI_SAME_STMT);
+      if (cond != NULL_TREE)
+	{
+	  /* Must build OR expression.  */
+	  cond = fold_or_predicates (EXPR_LOCATION (c), c, cond);
+	  cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					     is_gimple_condexpr, NULL_TREE,
+					     true, GSI_SAME_STMT);
+	}
+      else
+	cond = c;
+    }
+  gcc_assert (cond != NULL_TREE);
+  return cond;
+}
+
 /* Replace a scalar PHI node with a COND_EXPR using COND as condition.
-   This routine does not handle PHI nodes with more than two
-   arguments.
+   This routine can handle PHI nodes with more than two arguments.
 
    For example,
      S1: A = PHI <x1(1), x2(5)>
@@ -1588,69 +1660,209 @@ convert_scalar_cond_reduction (gimple reduc, gimple_stmt_iterator *gsi,
      S2: A = cond ? x1 : x2;
 
    The generated code is inserted at GSI that points to the top of
-   basic block's statement list.  When COND is true, phi arg from
-   TRUE_BB is selected.  */
+   basic block's statement list.
+   If PHI node has more than two arguments a chain of conditional
+   expression is produced.  */
+
 
 static void
-predicate_scalar_phi (gphi *phi, tree cond,
-		      basic_block true_bb,
-		      gimple_stmt_iterator *gsi)
+predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
 {
-  gimple new_stmt;
+  gimple new_stmt = NULL, reduc;
+  tree rhs, res, arg0, arg1, op0, op1, scev;
+  tree cond;
+  unsigned int index0;
+  unsigned int max, args_len;
+  edge e;
   basic_block bb;
-  tree rhs, res, arg, scev;
-
-  gcc_assert (gimple_code (phi) == GIMPLE_PHI
-	      && gimple_phi_num_args (phi) == 2);
+  unsigned int i;
 
   res = gimple_phi_result (phi);
-  /* Do not handle virtual phi nodes.  */
   if (virtual_operand_p (res))
     return;
 
-  bb = gimple_bb (phi);
-
-  if ((arg = degenerate_phi_result (phi))
+  if ((rhs = degenerate_phi_result (phi))
       || ((scev = analyze_scalar_evolution (gimple_bb (phi)->loop_father,
 					    res))
 	  && !chrec_contains_undetermined (scev)
 	  && scev != res
-	  && (arg = gimple_phi_arg_def (phi, 0))))
-    rhs = arg;
-  else
-    {
-      tree arg_0, arg_1;
-      tree op0, op1;
-      gimple reduc;
+	  && (rhs = gimple_phi_arg_def (phi, 0)))) {
+    if (dump_file && (dump_flags & TDF_DETAILS))
+      {
+	fprintf (dump_file, "Degenerate phi!\n");
+	print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
+      }
+    new_stmt = gimple_build_assign (res, rhs);
+    gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+    update_stmt (new_stmt);
+    return;
+  }
 
-      /* Use condition that is not TRUTH_NOT_EXPR in conditional modify expr.  */
+  bb = gimple_bb (phi);
+  if (EDGE_COUNT (bb->preds) == 2)
+    {
+      /* Predicate ordinary PHI node with 2 arguments.  */
+      edge first_edge, second_edge;
+      basic_block true_bb;
+      first_edge = EDGE_PRED (bb, 0);
+      second_edge = EDGE_PRED (bb, 1);
+      cond = bb_predicate (first_edge->src);
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  edge tmp_edge = first_edge;
+	  first_edge = second_edge;
+	  second_edge = tmp_edge;
+	}
+      if (EDGE_COUNT (first_edge->src->succs) > 1)
+	{
+	  cond = bb_predicate (second_edge->src);
+	  if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	    cond = TREE_OPERAND (cond, 0);
+	  else
+	    first_edge = second_edge;
+	}
+      else
+	cond = bb_predicate (first_edge->src);
+      /* Gimplify the condition to a valid cond-expr conditional operand.  */
+      cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					 is_gimple_condexpr, NULL_TREE,
+					 true, GSI_SAME_STMT);
+      true_bb = first_edge->src;
       if (EDGE_PRED (bb, 1)->src == true_bb)
 	{
-	  arg_0 = gimple_phi_arg_def (phi, 1);
-	  arg_1 = gimple_phi_arg_def (phi, 0);
+	  arg0 = gimple_phi_arg_def (phi, 1);
+	  arg1 = gimple_phi_arg_def (phi, 0);
 	}
       else
 	{
-	  arg_0 = gimple_phi_arg_def (phi, 0);
-	  arg_1 = gimple_phi_arg_def (phi, 1);
+	  arg0 = gimple_phi_arg_def (phi, 0);
+	  arg1 = gimple_phi_arg_def (phi, 1);
 	}
-      if (is_cond_scalar_reduction (phi, &reduc, &op0, &op1))
+      if (is_cond_scalar_reduction (phi, &reduc, arg0, arg1,
+				    &op0, &op1, false))
 	/* Convert reduction stmt into vectorizable form.  */
 	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
 					     true_bb != gimple_bb (reduc));
       else
 	/* Build new RHS using selected condition and arguments.  */
 	rhs = fold_build_cond_expr (TREE_TYPE (res), unshare_expr (cond),
-				    arg_0, arg_1);
+				    arg0, arg1);
+      new_stmt = gimple_build_assign (res, rhs);
+      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+      update_stmt (new_stmt);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "new phi replacement stmt\n");
+	  print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+	}
+      return;
+    }
+
+  /* Create hashmap for PHI node which contain vector of argument indexes
+     having the same value.  */
+  bool swap = false;
+  hash_map<tree, auto_vec<int>, phi_args_hash_traits> phi_arg_map;
+  unsigned int num_args = gimple_phi_num_args (phi);
+  int max_ind = -1;
+  /* Vector of different PHI argument values.  */
+  auto_vec<tree> args (num_args);
+
+  /* Compute phi_arg_map.  */
+  for (i = 0; i < num_args; i++)
+    {
+      tree arg;
+
+      arg = gimple_phi_arg_def (phi, i);
+      if (!phi_arg_map.get (arg))
+	args.quick_push (arg);
+      phi_arg_map.get_or_insert (arg).safe_push (i);
+    }
+
+  /* Determine element with max number of occurrences.  */
+  max_ind = -1;
+  max = 1;
+  args_len = args.length ();
+  for (i = 0; i < args_len; i++)
+    {
+      unsigned int len;
+      if ((len = phi_arg_map.get (args[i])->length ()) > max)
+	{
+	  max_ind = (int) i;
+	  max = len;
+	}
+    }
+
+  /* Put element with max number of occurrences to the end of ARGS.  */
+  if (max_ind != -1 && max_ind +1 != (int) args_len)
+    {
+      tree tmp = args[args_len - 1];
+      args[args_len - 1] = args[max_ind];
+      args[max_ind] = tmp;
     }
 
-  new_stmt = gimple_build_assign (res, rhs);
-  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
-  update_stmt (new_stmt);
+  /* Handle one special case when number of arguments with different values
+     is equal to 2 and one argument has only one occurrence.  Such a PHI can
+     be handled as if it had only 2 arguments.  */
+  if (args_len == 2 && phi_arg_map.get (args[0])->length () == 1)
+    {
+      vec<int> *indexes;
+      indexes = phi_arg_map.get (args[0]);
+      index0 = (*indexes)[0];
+      arg0 = args[0];
+      arg1 = args[1];
+      e = gimple_phi_arg_edge (phi, index0);
+      cond = bb_predicate (e->src);
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  swap = true;
+	  cond = TREE_OPERAND (cond, 0);
+	}
+      /* Gimplify the condition to a valid cond-expr conditional operand.  */
+      cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					 is_gimple_condexpr, NULL_TREE,
+					 true, GSI_SAME_STMT);
+      if (!(is_cond_scalar_reduction (phi, &reduc, arg0 , arg1,
+				      &op0, &op1, true)))
+	rhs = fold_build_cond_expr (TREE_TYPE (res), unshare_expr (cond),
+				    swap? arg1 : arg0,
+				    swap? arg0 : arg1);
+      else
+	/* Convert reduction stmt into vectorizable form.  */
+	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
+					     swap);
+      new_stmt = gimple_build_assign (res, rhs);
+      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+      update_stmt (new_stmt);
+    }
+  else
+    {
+      /* Common case.  */
+      vec<int> *indexes;
+      tree type = TREE_TYPE (gimple_phi_result (phi));
+      tree lhs;
+      arg1 = args[1];
+      for (i = 0; i < args_len; i++)
+	{
+	  arg0 = args[i];
+	  indexes = phi_arg_map.get (args[i]);
+	  if (i != args_len - 1)
+	    lhs = make_temp_ssa_name (type, NULL, "_ifc_");
+	  else
+	    lhs = res;
+	  cond = gen_phi_arg_condition (phi, indexes, gsi);
+	  rhs = fold_build_cond_expr (type, unshare_expr (cond),
+				      arg0, arg1);
+	  new_stmt = gimple_build_assign (lhs, rhs);
+	  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+	  update_stmt (new_stmt);
+	  arg1 = lhs;
+	}
+    }
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
-      fprintf (dump_file, "new phi replacement stmt\n");
+      fprintf (dump_file, "new extended phi replacement stmt\n");
       print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
     }
 }
@@ -1668,28 +1880,25 @@ predicate_all_scalar_phis (struct loop *loop)
   for (i = 1; i < orig_loop_num_nodes; i++)
     {
       gphi *phi;
-      tree cond = NULL_TREE;
       gimple_stmt_iterator gsi;
       gphi_iterator phi_gsi;
-      basic_block true_bb = NULL;
       bb = ifc_bbs[i];
 
       if (bb == loop->header)
 	continue;
 
+      if (EDGE_COUNT (bb->preds) == 1)
+	continue;
+
       phi_gsi = gsi_start_phis (bb);
       if (gsi_end_p (phi_gsi))
 	continue;
 
-      /* BB has two predecessors.  Using predecessor's aux field, set
-	 appropriate condition for the PHI node replacement.  */
       gsi = gsi_after_labels (bb);
-      true_bb = find_phi_replacement_condition (bb, &cond, &gsi);
-
       while (!gsi_end_p (phi_gsi))
 	{
 	  phi = phi_gsi.phi ();
-	  predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  predicate_scalar_phi (phi, &gsi);
 	  release_phi_node (phi);
 	  gsi_next (&phi_gsi);
 	}
@@ -1710,7 +1919,8 @@ insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
     {
       basic_block bb = ifc_bbs[i];
       gimple_seq stmts;
-
+      if (!is_predicated (bb))
+	gcc_assert (bb_predicate_gimplified_stmts (bb) == NULL);
       if (!is_predicated (bb))
 	{
 	  /* Do not insert statements for a basic block that is not
@@ -1862,7 +2072,8 @@ insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
 static void
 predicate_mem_writes (loop_p loop)
 {
-  unsigned int i, orig_loop_num_nodes = loop->num_nodes;
+  unsigned int i, j, orig_loop_num_nodes = loop->num_nodes;
+  tree mask_vec[10];
 
   for (i = 1; i < orig_loop_num_nodes; i++)
     {
@@ -1882,6 +2093,9 @@ predicate_mem_writes (loop_p loop)
 	  cond = TREE_OPERAND (cond, 0);
 	}
 
+      for (j=0; j<10; j++)
+	mask_vec[j] = NULL_TREE;
+
       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 	if (!gimple_assign_single_p (stmt = gsi_stmt (gsi)))
 	  continue;
@@ -1892,21 +2106,26 @@ predicate_mem_writes (loop_p loop)
 	    tree ref, addr, ptr, masktype, mask_op0, mask_op1, mask;
 	    gimple new_stmt;
 	    int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs)));
-
-	    masktype = build_nonstandard_integer_type (bitsize, 1);
-	    mask_op0 = build_int_cst (masktype, swap ? 0 : -1);
-	    mask_op1 = build_int_cst (masktype, swap ? -1 : 0);
 	    ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs;
 	    mark_addressable (ref);
 	    addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (ref),
 					     true, NULL_TREE, true,
 					     GSI_SAME_STMT);
-	    cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
-					       is_gimple_condexpr, NULL_TREE,
-					       true, GSI_SAME_STMT);
-	    mask = fold_build_cond_expr (masktype, unshare_expr (cond),
-					 mask_op0, mask_op1);
-	    mask = ifc_temp_var (masktype, mask, &gsi);
+	    gcc_assert (exact_log2 (bitsize) != -1);
+	    if ((mask = mask_vec[exact_log2 (bitsize)]) == NULL_TREE)
+	      {
+		masktype = build_nonstandard_integer_type (bitsize, 1);
+		mask_op0 = build_int_cst (masktype, swap ? 0 : -1);
+		mask_op1 = build_int_cst (masktype, swap ? -1 : 0);
+		cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
+					           is_gimple_condexpr,
+						   NULL_TREE,
+					           true, GSI_SAME_STMT);
+		mask = fold_build_cond_expr (masktype, unshare_expr (cond),
+					     mask_op0, mask_op1);
+		mask = ifc_temp_var (masktype, mask, &gsi);
+		mask_vec[exact_log2 (bitsize)] = mask;
+	      }
 	    ptr = build_int_cst (reference_alias_ptr_type (ref), 0);
 	    /* Copy points-to info if possible.  */
 	    if (TREE_CODE (addr) == SSA_NAME && !SSA_NAME_PTR_INFO (addr))
@@ -2134,6 +2353,197 @@ version_loop_for_if_conversion (struct loop *loop)
   return true;
 }
 
+/* Performs splitting of critical edges if aggressive_if_conv is true.
+   Returns false if loop won't be if converted and true otherwise.  */
+
+static bool
+ifcvt_split_critical_edges (struct loop *loop)
+{
+  basic_block *body;
+  basic_block bb;
+  unsigned int num = loop->num_nodes;
+  unsigned int i;
+  gimple stmt;
+  edge e;
+  edge_iterator ei;
+
+  if (num <= 2)
+    return false;
+  if (loop->inner)
+    return false;
+  if (!single_exit (loop))
+    return false;
+
+  body = get_loop_body (loop);
+  for (i = 0; i < num; i++)
+    {
+      bb = body[i];
+      if (bb == loop->latch
+	  || bb_with_exit_edge_p (loop, bb))
+	continue;
+      stmt = last_stmt (bb);
+      /* Skip basic blocks not ending with conditional branch.  */
+      if (!(stmt && gimple_code (stmt) == GIMPLE_COND))
+	continue;
+      FOR_EACH_EDGE (e, ei, bb->succs)
+	if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
+	  split_edge (e);
+    }
+  free (body);
+  return true;
+}
+
+/* Assumes that lhs of DEF_STMT have multiple uses.
+   Delete one use by (1) creation of copy DEF_STMT with
+   unique lhs; (2) change original use of lhs in one
+   use statement with newly created lhs.  */
+
+static void
+ifcvt_split_def_stmt (gimple def_stmt)
+{
+  tree var;
+  tree lhs;
+  gimple copy_stmt;
+  gimple_stmt_iterator gsi;
+  use_operand_p use_p;
+  imm_use_iterator imm_iter;
+
+  var = gimple_assign_lhs (def_stmt);
+  copy_stmt = gimple_copy (def_stmt);
+  lhs = make_temp_ssa_name (TREE_TYPE (var), NULL, "_ifc_");
+  gimple_assign_set_lhs (copy_stmt, lhs);
+  SSA_NAME_DEF_STMT (lhs) = copy_stmt;
+  /* Insert copy of DEF_STMT.  */
+  gsi = gsi_for_stmt (def_stmt);
+  gsi_insert_after (&gsi, copy_stmt, GSI_SAME_STMT);
+  /* Change one use of var to lhs.  */
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, var)
+    {
+      SET_USE (use_p, lhs);
+      break;
+    }
+}
+
+/* Traverse bool pattern recursively starting from var.
+   Returns true if tree can be considered as bool pattern.
+   Retry is true if additional traversal is required.  */
+
+static bool
+ifcvt_walk_pattern_tree (tree var, bool *retry)
+{
+  tree rhs1;
+  enum tree_code code;
+  gimple def_stmt;
+  tree lhs;
+
+  def_stmt = SSA_NAME_DEF_STMT (var);
+  if (gimple_code (def_stmt) != GIMPLE_ASSIGN)
+    return false;
+  lhs = gimple_assign_lhs (def_stmt);
+  if (!has_single_use (lhs))
+    {
+      *retry = true;
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Multiple uses in stmt: ");
+	  print_generic_expr (dump_file, lhs, 0);
+	  fprintf (dump_file, "\n");
+	  print_gimple_stmt (dump_file, def_stmt, 0, TDF_SLIM);
+	}
+      ifcvt_split_def_stmt (def_stmt);
+      return true;
+    }
+  rhs1 = gimple_assign_rhs1 (def_stmt);
+  code = gimple_assign_rhs_code (def_stmt);
+  switch (code)
+    {
+    case SSA_NAME:
+      return ifcvt_walk_pattern_tree (rhs1, retry);
+    CASE_CONVERT:
+      if ((TYPE_PRECISION (TREE_TYPE (rhs1)) != 1
+	   || !TYPE_UNSIGNED (TREE_TYPE (rhs1)))
+	  && TREE_CODE (TREE_TYPE (rhs1)) != BOOLEAN_TYPE)
+	return false;
+      return ifcvt_walk_pattern_tree (rhs1, retry);
+    case BIT_NOT_EXPR:
+      return ifcvt_walk_pattern_tree (rhs1, retry);
+    case BIT_AND_EXPR:
+    case BIT_IOR_EXPR:
+    case BIT_XOR_EXPR:
+      if (!ifcvt_walk_pattern_tree (rhs1, retry))
+	return false;
+      return ifcvt_walk_pattern_tree (gimple_assign_rhs2 (def_stmt), retry);
+    default:
+      if (TREE_CODE_CLASS (code) == tcc_comparison)
+	return true;
+    }
+  return false;
+}
+
+/* Returns true if STMT can be a root of bool pattern applied
+   by vectorizer. VAR contains SSA_NAME which starts pattern.  */
+
+static bool
+stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
+{
+  enum tree_code code;
+  tree lhs, rhs;
+
+  code = gimple_assign_rhs_code (stmt);
+  if (CONVERT_EXPR_CODE_P (code))
+    {
+      lhs = gimple_assign_lhs (stmt);
+      rhs = gimple_assign_rhs1 (stmt);
+      if (TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE)
+	return false;
+      if (TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE)
+	return false;
+      *var = rhs;
+      return true;
+    }
+  else if (code == COND_EXPR)
+    {
+      rhs = gimple_assign_rhs1 (stmt);
+      if (TREE_CODE (rhs) != SSA_NAME)
+	return false;
+      *var = rhs;
+      return true;
+    }
+  return false;
+}
+
+/*  Traverse all statements in BB which corresponds to loop header to
+    find out all statements which can start bool pattern applied by
+    vectorizer and convert multiple uses in it to conform pattern
+    restrictions. Such case can occur if the same predicate is used both
+    for phi node conversion and load/store mask.  */
+
+static void
+ifcvt_repair_bool_pattern (basic_block bb)
+{
+  tree rhs;
+  bool retry;
+  gimple stmt;
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      if ( gimple_code (stmt) != GIMPLE_ASSIGN)
+	continue;
+      if (!stmt_is_root_of_bool_pattern (stmt, &rhs))
+	continue;
+      while (true)
+	{
+	  retry = false;
+	  if (!ifcvt_walk_pattern_tree (rhs, &retry))
+	    return;
+	  if (!retry)
+	    break;
+	}
+    }
+}
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -2145,6 +2555,20 @@ tree_if_conversion (struct loop *loop)
   ifc_bbs = NULL;
   bool any_mask_load_store = false;
 
+  /* Set-up aggressive if-conversion for loops marked with simd pragma.  */
+  aggressive_if_conv = loop->force_vectorize;
+  /* Check either outer loop was marked with simd pragma.  */
+  if (!aggressive_if_conv)
+    {
+      struct loop *outer_loop = loop_outer (loop);
+      if (outer_loop && outer_loop->force_vectorize)
+	aggressive_if_conv = true;
+    }
+
+  if (aggressive_if_conv)
+    if (!ifcvt_split_critical_edges (loop))
+      goto cleanup;
+
   if (!if_convertible_loop_p (loop, &any_mask_load_store)
       || !dbg_cnt (if_conversion_tree))
     goto cleanup;
@@ -2154,7 +2578,9 @@ tree_if_conversion (struct loop *loop)
 	  || loop->dont_vectorize))
     goto cleanup;
 
-  if (any_mask_load_store && !version_loop_for_if_conversion (loop))
+  if ((any_mask_load_store
+       || flag_tree_loop_if_convert != 1)
+      && !version_loop_for_if_conversion (loop))
     goto cleanup;
 
   /* Now all statements are if-convertible.  Combine all the basic
@@ -2162,6 +2588,11 @@ tree_if_conversion (struct loop *loop)
      on-the-fly.  */
   combine_blocks (loop, any_mask_load_store);
 
+  /* Repair tree corresponding to bool pattern to delete multiple uses of
+     predicates.  */
+  if (aggressive_if_conv)
+    ifcvt_repair_bool_pattern (loop->header);
+
   todo |= TODO_cleanup_cfg;
   if (flag_tree_loop_if_convert_stores || any_mask_load_store)
     {
@@ -2175,7 +2606,10 @@ tree_if_conversion (struct loop *loop)
       unsigned int i;
 
       for (i = 0; i < loop->num_nodes; i++)
-	free_bb_predicate (ifc_bbs[i]);
+	{
+	  basic_block bb = ifc_bbs[i];
+	  free_bb_predicate (bb);
+	}
 
       free (ifc_bbs);
       ifc_bbs = NULL;


* Re: [PATCH 2/3] Extended if-conversion
  2014-12-18 13:48                               ` Yuri Rumyantsev
@ 2014-12-19 11:46                                 ` Richard Biener
  2014-12-22 14:49                                   ` Yuri Rumyantsev
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Biener @ 2014-12-19 11:46 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Thu, Dec 18, 2014 at 2:45 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> I am sending you the full patch (~1000 lines), but if you need only patch.1
> and patch.2, let me know and I'll send you a reduced patch.
>
> Below are a few comments regarding your remarks for patch.3.
>
> 1. I deleted the sub-phase ifcvt_local_dce since I did not find a test-case
> where dead code elimination is required to vectorize a loop, i.e. where a
> dead statement is marked as relevant.
> 2. You wrote:
>> The "retry" code also looks odd - why do you walk the BB multiple
>> times instead of just doing sth like
>>
>>  while (!has_single_use (lhs))
>>    {
>>      gimple copy = ifcvt_split_def_stmt (def_stmt);
>>      ifcvt_walk_pattern_tree (copy);
>>    }
>>
>> thus returning the copy you create and re-process it (the copy should
>> now have a single-use).
>
> The problem is that not only top SSA_NAME (lhs) may have multiple uses
> but some intermediate variables too. For example, for the following
> test-case
>
> float a[1000];
> int c[1000];
>
> int foo()
> {
>   int i, res = 0;
> #pragma omp simd safelen(8)
>   for (i=0; i<512; i++)
>   {
>     float t = a[i];
>     if (t > 0.0f & t < 1.0e+17f)
>       if (c[i] != 0)
>         res += 1;
>   }
>   return res;
> }
>
> After combine_blocks we have the following bb:
>
> <bb 3>:
> # res_15 = PHI <res_1(7), 0(15)>
> # i_16 = PHI <i_11(7), 0(15)>
> # ivtmp_14 = PHI <ivtmp_13(7), 512(15)>
> t_5 = a[i_16];
> _6 = t_5 > 0.0;
> _7 = t_5 < 9.9999998430674944e+16;
> _8 = _6 & _7;
> _10 = &c[i_16];
> _ifc__32 = _8 ? 4294967295 : 0;
> _9 = MASK_LOAD (_10, 0B, _ifc__32);
> _28 = _8;
> _29 = _9 != 0;
> _30 = _28 & _29;
> _ifc__31 = _30 ? 1 : 0;
> res_1 = res_15 + _ifc__31;
> i_11 = i_16 + 1;
> ivtmp_13 = ivtmp_14 - 1;
> if (ivtmp_13 != 0)
>   goto <bb 7>;
> else
>   goto <bb 8>;
>
> and we can see that _8 has multiple uses. Also note that after splitting of
> _8 = _6 & _7
> we also get multiple uses for the definitions of _6 and _7. So I used this
> iterative algorithm as the simplest one.

But it walks the entire pattern again and again, while you only need to
walk the pattern tree of the DEF you have just made single-use
(in fact, rather than replacing a random USE in ifcvt_split_def_stmt
you should pass down the use_operand_p that you need to make
single-use).
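
To illustrate the idea, here is a minimal standalone sketch.  It is plain
C++ on a toy statement list, not GCC code: Stmt, Func, count_uses,
make_single_use and build_example are hypothetical names made up for this
example, and the real thing would of course work on use_operand_p and
SSA immediate uses.  The point is only that if the walker is told which
use has to become single-use, it can copy the DEF for exactly that use
and continue into the copy's operands, so one walk per root is enough and
no retry loop is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a GIMPLE assignment: NAME = f (OPS...).  Each operand is
// the index of its defining statement; comparisons are modelled as leaves.
struct Stmt
{
  std::string name;
  std::vector<int> ops;
};

struct Func
{
  std::vector<Stmt> stmts;
};

// Number of operand slots in F that refer to definition DEF.
static int
count_uses (const Func &f, int def)
{
  int n = 0;
  for (const Stmt &s : f.stmts)
    for (int op : s.ops)
      if (op == def)
        ++n;
  return n;
}

// Make operand SLOT of statement USER the only use of its definition,
// copying the definition when it has other uses, then walk the operands
// of that (possibly new) definition.  COPIES counts the created copies.
static void
make_single_use (Func &f, int user, std::size_t slot, int &copies)
{
  int def = f.stmts[user].ops[slot];
  if (count_uses (f, def) > 1)
    {
      Stmt copy = f.stmts[def];
      copy.name += "'";
      f.stmts.push_back (copy);
      def = (int) f.stmts.size () - 1;
      f.stmts[user].ops[slot] = def;
      ++copies;
    }
  // Use indices instead of references: push_back may reallocate.
  for (std::size_t i = 0; i < f.stmts[def].ops.size (); ++i)
    make_single_use (f, def, i, copies);
}

// The statements from the dump above that take part in bool patterns.
static Func
build_example ()
{
  Func f;
  f.stmts = {
    {"_6", {}},          // 0: t_5 > 0.0
    {"_7", {}},          // 1: t_5 < 9.9999998430674944e+16
    {"_8", {0, 1}},      // 2: _6 & _7
    {"_ifc__32", {2}},   // 3: _8 ? 4294967295 : 0  (mask root)
    {"_29", {}},         // 4: _9 != 0
    {"_28", {2}},        // 5: _8
    {"_30", {5, 4}},     // 6: _28 & _29
    {"_ifc__31", {6}},   // 7: _30 ? 1 : 0  (PHI replacement root)
  };
  return f;
}
```

Calling make_single_use on operand 0 of the two roots (statements 3 and 7
in this toy numbering) copies _8, _6 and _7 once each for the mask root
and leaves the second root untouched, after which every definition has at
most one use.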

> I think it would be nice to re-use some utility from tree-vect-patterns.c
> for stmt_is_root_of_bool_pattern.
>
> I assume that function stmt_is_root_of_bool_pattern can be simplified
> to check for COND_EXPR only, since PHI predication and memory access
> predication produce only such statements, i.e. it can look like
>
> static bool
> stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
> {
>   enum tree_code code;
>   tree lhs, rhs;
>
>   code = gimple_assign_rhs_code (stmt);
>   if (code == COND_EXPR)
>     {
>       rhs = gimple_assign_rhs1 (stmt);
>       if (TREE_CODE (rhs) != SSA_NAME)
>         return false;
>       *var = rhs;
>       return true;
>     }
>   return false;
> }
>
> I also did few minor changes in patch.2.
>
> 3. You can also notice that I inserted code in tree_if_conversion to
> do loop versioning if the explicit option "-ftree-loop-if-convert" was
> not passed to the compiler, i.e. we perform if-conversion for loop
> vectorization only, and if vectorization does not take place, we should
> delete the if-converted version of the loop.
> What is your opinion?

Overall part 1 and part 2 look good to me, but predicate_scalar_phi
looks in need of some refactoring to avoid duplicate code.  We can
do that as a followup.
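
As a rough illustration of what a single common path could look like,
here is a standalone sketch.  It works on strings rather than trees and
is not GCC code: PhiArg and lower_phi are made-up names, and edge
predicates are assumed to be given as ready-made strings.  Note that it
is a variation, not the loop from the patch: the most frequent value is
used as the final fallback and no condition is built for it, so the
"two distinct values" special case and the two-argument case both fall
out as a chain of length one.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One PHI argument: the incoming value and the predicate of its edge.
struct PhiArg
{
  std::string value;
  std::string edge_pred;
};

// Lower RES = PHI <ARGS...> into a chain of conditional assignments,
// returned as text.  ARGS must not be empty.
static std::vector<std::string>
lower_phi (const std::string &res, const std::vector<PhiArg> &args)
{
  assert (!args.empty ());

  // Group edge predicates by distinct value, keeping first-seen order.
  std::vector<std::string> values;
  std::map<std::string, std::vector<std::string>> preds;
  for (const PhiArg &a : args)
    {
      if (!preds.count (a.value))
        values.push_back (a.value);
      preds[a.value].push_back (a.edge_pred);
    }

  // Move the value with the most occurrences to the end; it becomes the
  // fallback and needs no condition of its own.
  std::size_t best = 0;
  for (std::size_t i = 1; i < values.size (); ++i)
    if (preds[values[i]].size () > preds[values[best]].size ())
      best = i;
  std::swap (values[best], values.back ());

  std::vector<std::string> out;
  std::string fallback = values.back ();
  for (std::size_t i = values.size () - 1; i-- > 0; )
    {
      // OR together the predicates of all edges carrying this value.
      std::string cond;
      for (const std::string &p : preds[values[i]])
        cond = cond.empty () ? p : cond + " | " + p;
      if (preds[values[i]].size () > 1)
        cond = "(" + cond + ")";
      std::string lhs = i == 0 ? res : "_ifc_" + std::to_string (i);
      out.push_back (lhs + " = " + cond + " ? " + values[i]
                     + " : " + fallback + ";");
      fallback = lhs;
    }
  // Degenerate PHI: all arguments are the same value.
  if (values.size () == 1)
    out.push_back (res + " = " + values[0] + ";");
  return out;
}
```

For a PHI with arguments x, x, y on edges with predicates p1, p2, p3 this
yields the single statement "res = p3 ? y : x;", which is the shape the
conditional scalar reduction check wants to see.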

Part 3 still needs the iteration resolved: make the use we actually
care about single-use, not a random one, so we can avoid iterating
completely.

Richard.

> Thanks.
> Yuri.
>
> 2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Hi Richard,
>>>
>>> Here is updated patch which includes
>>> (1) split critical edges for aggressive if conversion.
>>> (2) delete all stuff related to support of critical edge predication.
>>> (3) only one function - predicate_scalar_phi performs predication.
>>> (4) function find_phi_replacement_condition was deleted since it was
>>> included in predicate_scalar_phi for phi with two arguments.
>>>
>>> I checked that patch works in stress testing mode, i.e. with
>>> aggressive if conversion by default.
>>>
>>> What is your opinion?
>>
>> Looks ok overall, but please simply do
>>
>>   FOR_EACH_EDGE (e, ei, bb->succs)
>>     if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
>>       split_edge (e);
>>
>> for all blocks apart from the latch.
>>
>> Can you please send a combined patch up to this one?  Looking at
>> the incremental diff is somewhat hard.  Thus a patch including all
>> patches from patch1 to this one.
>>
>> Thanks,
>> Richard.
>>
>>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> Thanks for your reply!
>>>>>
>>>>> I didn't understand your point:
>>>>>
>>>>> Well, I don't mind splitting all critical edges unconditionally
>>>>>
>>>>> but you do it unconditionally in proposed patch.
>>>>
>>>> I don't mind means I am fine with it.
>>>>
>>>>> Also I assume that
>>>>> call of split_critical_edges() can break ssa. For example, we can
>>>>> split headers of loops, loop exit blocks etc.
>>>>
>>>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>>>> be surprised if so but that may be possible.
>>>>
>>>>> I prefer to do something
>>>>> more loop-specialized, e.g. call edge_split() for critical edges
>>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>>>> destination bb belongs to loop).
>>>>
>>>> That works for me as well but it is more complicated to implement.
>>>> Ideally you'd only split one edge if you find a block with only critical
>>>> predecessors (where we'd currently give up).  But note that this
>>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>>>> will change loop->num_nodes so we have to be more careful in
>>>> constructing the loop calling if_convertible_bb_p.
>>>>
>>>> Richard.
>>>>
>>>>>
>>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> Sorry that I forgot to delete debug dump from my fix.
>>>>>>> I have few questions about your comments.
>>>>>>>
>>>>>>> 1. You wrote :
>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>> path
>>>>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>>>>> predicate_extended_scalar_phi into one function?
>>>>>>> Please note that if the additional flag is not set (i.e.
>>>>>>> aggressive_if_conv is false), extended predication requires more
>>>>>>> compile time since it builds a hash_map.
>>>>>>
>>>>>> Its compile-time complexity is reasonable enough even for
>>>>>> non-aggressive if-conversion.
>>>>>>
>>>>>>> 2. About critical edge splitting.
>>>>>>>
>>>>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>>>>> option only; (2) should we split all critical edges.
>>>>>>> Note that this leads to recomputing of topological order.
>>>>>>
>>>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>>>> do something like
>>>>>>
>>>>>> Index: gcc/tree-if-conv.c
>>>>>> ===================================================================
>>>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>>>    if (number_of_loops (fun) <= 1)
>>>>>>      return 0;
>>>>>>
>>>>>> +  bool critical_edges_split_p = false;
>>>>>>    FOR_EACH_LOOP (loop, 0)
>>>>>>      if (flag_tree_loop_if_convert == 1
>>>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>>>             && !loop->dont_vectorize))
>>>>>> -      todo |= tree_if_conversion (loop);
>>>>>> +      {
>>>>>> +       if (!critical_edges_split_p)
>>>>>> +         {
>>>>>> +           split_critical_edges ();
>>>>>> +           critical_edges_split_p = true;
>>>>>> +           todo |= TODO_cleanup_cfg;
>>>>>> +         }
>>>>>> +       todo |= tree_if_conversion (loop);
>>>>>> +      }
>>>>>>
>>>>>>  #ifdef ENABLE_CHECKING
>>>>>>    {
>>>>>>
>>>>>>> It is worth noting that in current implementation bb's with 2
>>>>>>> predecessors and both are on critical edges are accepted without
>>>>>>> additional option.
>>>>>>
>>>>>> Yes, I know.
>>>>>>
>>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>>>> to it and even fix the critical edge missed optimization with splitting
>>>>>> critical edges then I am all for that solution.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Thanks ahead.
>>>>>>> Yuri.
>>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> Here is updated patch2 with the following changes:
>>>>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>>>>> 2. Use only one function for extended predication -
>>>>>>>>> predicate_extended_scalar_phi.
>>>>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>>>>> blocks if it has 2 predecessors and
>>>>>>>>> both incoming edges are critical or it has more than 2 predecessors
>>>>>>>>> and at least one incoming edge
>>>>>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>>>>>
>>>>>>>>> Here is a motivating test-case which explains this point.
>>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>>>> The problem phi is in bb-7:
>>>>>>>>>
>>>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>>>   {
>>>>>>>>>     <bb 5>:
>>>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>       goto <bb 7>;
>>>>>>>>>     else
>>>>>>>>>       goto <bb 9>;
>>>>>>>>>
>>>>>>>>>   }
>>>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>>>   {
>>>>>>>>>     <bb 6>:
>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>       goto <bb 7>;
>>>>>>>>>     else
>>>>>>>>>       goto <bb 8>;
>>>>>>>>>
>>>>>>>>>   }
>>>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>>>   {
>>>>>>>>>     <bb 7>:
>>>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>     goto <bb 11>;
>>>>>>>>>
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>>>> restoring gsi in predicate_all_scalar_phis:
>>>>>>>>> #if 0
>>>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>>>  else
>>>>>>>>> #endif
>>>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>>>
>>>>>>>>> we will get ICE:
>>>>>>>>> t5.c: In function 'foo':
>>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>>>  void foo (int n)
>>>>>>>>>       ^
>>>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>>>> _52 = _1 & _3;
>>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>>>
>>>>>>>>> since predicate computations were inserted in bb_7.
>>>>>>>>
>>>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>>>> by insert_gimplified_predicates.
>>>>>>>>
>>>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>>>> but push those to e->dest which makes this really messy.
>>>>>>>>
>>>>>>>> Rather than having a separate phase where we insert all
>>>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>>>> predicating a PHI.
>>>>>>>>
>>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>>>> the printfs properly.
>>>>>>>>
>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>> paths.
>>>>>>>>
>>>>>>>> I'm not at all happy with this code.  It may be the existing if-conv
>>>>>>>> code's fault, but making it even worse is not an option.
>>>>>>>>
>>>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>>>> commit edge insertions before merging the blocks.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> ChangeLog is
>>>>>>>>>
>>>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>
>>>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>>>> statement iterator.
>>>>>>>>> (bb_insert_point): New function.
>>>>>>>>> (set_bb_insert_point): New function.
>>>>>>>>> (has_pred_critical_p): New function.
>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>> non-critical incoming edge.
>>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>>>> Change check that block containing reduction statement candidate
>>>>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>>>> is_cond_scalar_reduction.
>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>>>>> than 2 predecessors and at least one incoming edge is critical.
>>>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>>>>> to save gsi before insertion of predicate computations. Set it to
>>>>>>>>> true for BB with 2 predecessors and critical incoming edges, or if the
>>>>>>>>> number of predecessors is greater than 2 and at least one incoming edge is
>>>>>>>>> critical.
>>>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>>>> Insert predicate computation of BB just after label if
>>>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>>>
>>>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>>>
>>>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>>>   tree predicate;
>>>>>>>>>>>
>>>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>>>
>>>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>>>
>>>>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>>>>> works.
>>>>>>>>>>
>>>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>>>> after the PHI we predicate.
>>>>>>>>>>
>>>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>>>
>>>>>>>>>>> Best regards.
>>>>>>>>>>> Yuri.
>>>>>>>>>>>
>>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>>>>
>>>>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>>>>
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Of course it is worthwhile to also sort single occurrences first
>>>>>>>>>>>>>> so your case gets just the condition for edge 5 and its inversion
>>>>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containing phi nodes have
>>>>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments, the
>>>>>>>>>>>>>>> insertion point will be "after labels".  The phi result can also
>>>>>>>>>>>>>>> have a use in this block, so we can't put predication code at the
>>>>>>>>>>>>>>> block end.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>>>>>> inserted. So a new function find_insertion_point is introduced which
>>>>>>>>>>>>>>>>> simply finds the last statement in the block defining predicates
>>>>>>>>>>>>>>>>> corresponding to all incoming edges and inserts phi predication code
>>>>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.

* Re: [PATCH 2/3] Extended if-conversion
  2014-12-19 11:46                                 ` Richard Biener
@ 2014-12-22 14:49                                   ` Yuri Rumyantsev
  2015-01-09 12:31                                     ` Richard Biener
  0 siblings, 1 reply; 22+ messages in thread
From: Yuri Rumyantsev @ 2014-12-22 14:49 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 31381 bytes --]

Richard,

I changed the algorithm for bool pattern repair.
It turned out that the ifcvt_local_dce phase is required, since for the
test-case I sent you in the previous mail vectorization is not performed
without dead code elimination:

For the loop
#pragma omp simd safelen(8)
  for (i=0; i<512; i++)
  {
    float t = a[i];
    if (t > 0.0f & t < 1.0e+17f)
      if (c[i] != 0)
        res += 1;
  }

I got the following message from the vectorizer:

t3.c:10:11: note: ==> examining statement: _ifc__39 = t_5 > 0.0;

t3.c:10:11: note: bit-precision arithmetic not supported.
t3.c:10:11: note: not vectorized: relevant stmt not supported:
_ifc__39 = t_5 > 0.0;

It is caused by the following dead predicate computations after
critical edge splitting:

(after combine blocks):

<bb 3>:
# res_15 = PHI <res_1(7), 0(19)>
# i_16 = PHI <i_11(7), 0(19)>
# ivtmp_14 = PHI <ivtmp_13(7), 512(19)>
t_5 = a[i_16];
_6 = t_5 > 0.0;
_7 = t_5 < 9.9999998430674944e+16;
_8 = _6 & _7;
_10 = &c[i_16];
_ifc__36 = _8 ? 4294967295 : 0;
_9 = MASK_LOAD (_10, 0B, _ifc__36);
_28 = _8;
_29 = _9 != 0;
_30 = _28 & _29;
// Statements below are dead!!
_31 = _8;
_32 = _9 != 0;
_33 = ~_32;
_34 = _31 & _33;
// End of dead statements.
_ifc__35 = _30 ? 1 : 0;
res_1 = res_15 + _ifc__35;
i_11 = i_16 + 1;
ivtmp_13 = ivtmp_14 - 1;
if (ivtmp_13 != 0)
  goto <bb 7>;
else
  goto <bb 8>;

But if we delete these statements, the loop is vectorized.
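
To make the point concrete, here is a minimal, self-contained sketch of
the kind of block-local dead code elimination meant above. It is only a
toy model and not the code from the patch: struct toy_stmt, toy_local_dce
and toy_example_dead_count are hypothetical names, each statement is
reduced to "defined name plus used names", and the only root is the
reduction update res_1, which is assumed to be live because it feeds the
loop PHI. The name numbers are taken from the dump above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in GCC.  */
enum { TOY_MAX_OPS = 3, TOY_MAX_NAMES = 64, TOY_NO_NAME = -1 };

struct toy_stmt
{
  int def;               /* Number of the name defined, or TOY_NO_NAME.  */
  int ops[TOY_MAX_OPS];  /* Numbers of the names used, TOY_NO_NAME padded.  */
  bool is_root;          /* Statement must be kept regardless of uses.  */
};

/* Walk the straight-line block backwards.  A statement is live if it is
   a root or if its result is used by a live statement that follows it.
   Sets LIVE[i] for every statement and returns the number of dead ones.  */
size_t
toy_local_dce (const struct toy_stmt *stmts, size_t n, bool *live)
{
  bool needed[TOY_MAX_NAMES] = { false };
  size_t dead = 0;

  for (size_t i = n; i-- > 0; )
    {
      const struct toy_stmt *s = &stmts[i];
      assert (s->def >= TOY_NO_NAME && s->def < TOY_MAX_NAMES);

      live[i] = s->is_root || (s->def != TOY_NO_NAME && needed[s->def]);
      if (!live[i])
        {
          dead++;
          continue;
        }

      /* Operands of a live statement become needed.  */
      for (int k = 0; k < TOY_MAX_OPS; k++)
        if (s->ops[k] != TOY_NO_NAME)
          {
            assert (s->ops[k] >= 0 && s->ops[k] < TOY_MAX_NAMES);
            needed[s->ops[k]] = true;
          }
    }
  return dead;
}

/* The tail of the combined block from the dump, encoded by hand.  */
size_t
toy_example_dead_count (void)
{
  static const struct toy_stmt block[] = {
    { 28, {  8, TOY_NO_NAME, TOY_NO_NAME }, false }, /* _28 = _8;  */
    { 29, {  9, TOY_NO_NAME, TOY_NO_NAME }, false }, /* _29 = _9 != 0;  */
    { 30, { 28, 29, TOY_NO_NAME }, false },          /* _30 = _28 & _29;  */
    { 31, {  8, TOY_NO_NAME, TOY_NO_NAME }, false }, /* _31 = _8;  */
    { 32, {  9, TOY_NO_NAME, TOY_NO_NAME }, false }, /* _32 = _9 != 0;  */
    { 33, { 32, TOY_NO_NAME, TOY_NO_NAME }, false }, /* _33 = ~_32;  */
    { 34, { 31, 33, TOY_NO_NAME }, false },          /* _34 = _31 & _33;  */
    { 35, { 30, TOY_NO_NAME, TOY_NO_NAME }, false }, /* _ifc__35 = _30 ? 1 : 0;  */
    {  1, { 15, 35, TOY_NO_NAME }, true },           /* res_1 = res_15 + _ifc__35;  */
  };
  bool live[sizeof block / sizeof block[0]];

  return toy_local_dce (block, sizeof block / sizeof block[0], live);
}
```

On the block from the dump, the backward walk leaves _31 to _34 unmarked,
so toy_example_dead_count () returns 4. One backward pass is enough in
this sketch because the block is straight-line SSA code in which every
definition precedes its uses.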

New patch is attached.

Thanks.
Yuri.

2014-12-19 14:45 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Thu, Dec 18, 2014 at 2:45 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> I am sending you the full patch (~1000 lines), but if you need only patch.1
>> and patch.2, please let me know and I'll send you a reduced patch.
>>
>> Below are few comments regarding your remarks for patch.3.
>>
>> 1. I deleted sub-phase ifcvt_local_dce since I did not find test-case
>> when dead code elimination is required to vectorize loop, i.e. dead
>> statement is marked as relevant.
>> 2. You wrote:
>>> The "retry" code also looks odd - why do you walk the BB multiple
>>> times instead of just doing sth like
>>>
>>>  while (!has_single_use (lhs))
>>>    {
>>>      gimple copy = ifcvt_split_def_stmt (def_stmt);
>>>      ifcvt_walk_pattern_tree (copy);
>>>    }
>>>
>>> thus returning the copy you create and re-process it (the copy should
>>> now have a single-use).
>>
>> The problem is that not only top SSA_NAME (lhs) may have multiple uses
>> but some intermediate variables too. For example, for the following
>> test-case
>>
>> float a[1000];
>> int c[1000];
>>
>> int foo()
>> {
>>   int i, res = 0;
>> #pragma omp simd safelen(8)
>>   for (i=0; i<512; i++)
>>   {
>>     float t = a[i];
>>     if (t > 0.0f & t < 1.0e+17f)
>>       if (c[i] != 0)
>>         res += 1;
>>   }
>>   return res;
>> }
>>
>> After combine_blocks we have the following bb:
>>
>> <bb 3>:
>> # res_15 = PHI <res_1(7), 0(15)>
>> # i_16 = PHI <i_11(7), 0(15)>
>> # ivtmp_14 = PHI <ivtmp_13(7), 512(15)>
>> t_5 = a[i_16];
>> _6 = t_5 > 0.0;
>> _7 = t_5 < 9.9999998430674944e+16;
>> _8 = _6 & _7;
>> _10 = &c[i_16];
>> _ifc__32 = _8 ? 4294967295 : 0;
>> _9 = MASK_LOAD (_10, 0B, _ifc__32);
>> _28 = _8;
>> _29 = _9 != 0;
>> _30 = _28 & _29;
>> _ifc__31 = _30 ? 1 : 0;
>> res_1 = res_15 + _ifc__31;
>> i_11 = i_16 + 1;
>> ivtmp_13 = ivtmp_14 - 1;
>> if (ivtmp_13 != 0)
>>   goto <bb 7>;
>> else
>>   goto <bb 8>;
>>
>> and we can see that _8 has multiple uses. Also note that after splitting of
>> _8 = _6 & _7
>> we also get multiple uses for definition of  _6 and _7. So I used this
>> iterative algorithm as the simplest one.
>
> But it walks the entire pattern again and again while you only need to
> ensure you walk the pattern tree of the now single-use DEF again
> (in fact, rather than replacing a random USE in ifcvt_split_def_stmt
> you should pass down the user_operand_p that you need to make
> single-use).
>
>> I think it would be nice to re-use some utility from tree-vect-patterns.c
>> for stmt_is_root_of_bool_pattern.
>>
>> I assume that function stmt_is_root_of_bool_pattern can be simplified
>> to check on COND_EXPR only since PHI predication and memory access
>> predication produced only such statements,i.e. it can look like
>>
>> static bool
>> stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
>> {
>>   enum tree_code code;
>>   tree lhs, rhs;
>>
>>   code = gimple_assign_rhs_code (stmt);
>>   if (code == COND_EXPR)
>>     {
>>       rhs = gimple_assign_rhs1 (stmt);
>>       if (TREE_CODE (rhs) != SSA_NAME)
>>         return false;
>>       *var = rhs;
>>       return true;
>>     }
>>   return false;
>> }
>>
>> I also did few minor changes in patch.2.
>>
>> 3. You can also notice that I inserted code in tree_if_conversion to
>> do loop version if explicit option "-ftree-loop-if-convert" was not
>> passed to compiler, i.e. we perform if-conversion for loop
>> vectorization only and if it does not take place, we should delete
>> if-converted version of loop.
>> What is your opinion?
>
> Overall part 1 and part 2 look good to me, predicate_scalar_phi
> looks in need of some refactoring to avoid duplicate code.  We can
> do that a followup.
>
> Part 3 still needs the iteration to be resolved and make the use we
> actually care about single-use, not a random one so we can avoid
> iterating completely.
>
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi Richard,
>>>>
>>>> Here is updated patch which includes
>>>> (1) split critical edges for aggressive if conversion.
>>>> (2) delete all stuff related to support of critical edge predication.
>>>> (3) only one function - predicate_scalar_phi performs predication.
>>>> (4) function find_phi_replacement_condition was deleted since it was
>>>> included in predicate_scalar_phi for phi with two arguments.
>>>>
>>>> I checked that patch works in stress testing mode, i.e. with
>>>> aggressive if conversion by default.
>>>>
>>>> What is your opinion?
>>>
>>> Looks ok overall, but please simply do
>>>
>>>   FOR_EACH_EDGE (e, ei, bb->succs)
>>>     if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
>>>       split_edge (e);
>>>
>>> for all blocks apart from the latch.
>>>
>>> Can you please send a combined patch up to this one?  Looking at
>>> the incremental diff is somewhat hard.  Thus a patch including all
>>> patches from patch1 to this one.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>
>>>> Thanks.
>>>> Yuri.
>>>>
>>>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> Thanks for your reply!
>>>>>>
>>>>>> I didn't understand your point:
>>>>>>
>>>>>> Well, I don't mind splitting all critical edges unconditionally
>>>>>>
>>>>>> but you do it unconditionally in proposed patch.
>>>>>
>>>>> I don't mind means I am fine with it.
>>>>>
>>>>>> Also I assume that
>>>>>> call of split_critical_edges() can break ssa. For example, we can
>>>>>> split headers of loops, loop exit blocks etc.
>>>>>
>>>>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>>>>> be surprised if so but that may be possible.
>>>>>
>>>>>> I prefer to do something
>>>>>> more loop-specialized, e.g. call split_edge() for critical edges
>>>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>>>>> destination bb belongs to loop).
>>>>>
>>>>> That works for me as well but it is more complicated to implement.
>>>>> Ideally you'd only split one edge if you find a block with only critical
>>>>> predecessors (where we'd currently give up).  But note that this
>>>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>>>>> will change loop->num_nodes so we have to be more careful in
>>>>> constructing the loop calling if_convertible_bb_p.
>>>>>
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> Sorry that I forgot to delete debug dump from my fix.
>>>>>>>> I have few questions about your comments.
>>>>>>>>
>>>>>>>> 1. You wrote :
>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>> path
>>>>>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>>>>>> predicate_extended_scalar_phi into one function?
>>>>>>>> Please note that if the additional flag is not set (i.e.
>>>>>>>> aggressive_if_conv is false), extended predication requires more
>>>>>>>> compile time since it builds a hash_map.
>>>>>>>
>>>>>>> Its compile-time complexity is reasonable enough even for
>>>>>>> non-aggressive if-conversion.
>>>>>>>
>>>>>>>> 2. About critical edge splitting.
>>>>>>>>
>>>>>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>>>>>> option only; (2) should we split all critical edges.
>>>>>>>> Note that this leads to recomputing of topological order.
>>>>>>>
>>>>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>>>>> do something like
>>>>>>>
>>>>>>> Index: gcc/tree-if-conv.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>>>>    if (number_of_loops (fun) <= 1)
>>>>>>>      return 0;
>>>>>>>
>>>>>>> +  bool critical_edges_split_p = false;
>>>>>>>    FOR_EACH_LOOP (loop, 0)
>>>>>>>      if (flag_tree_loop_if_convert == 1
>>>>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>>>>             && !loop->dont_vectorize))
>>>>>>> -      todo |= tree_if_conversion (loop);
>>>>>>> +      {
>>>>>>> +       if (!critical_edges_split_p)
>>>>>>> +         {
>>>>>>> +           split_critical_edges ();
>>>>>>> +           critical_edges_split_p = true;
>>>>>>> +           todo |= TODO_cleanup_cfg;
>>>>>>> +         }
>>>>>>> +       todo |= tree_if_conversion (loop);
>>>>>>> +      }
>>>>>>>
>>>>>>>  #ifdef ENABLE_CHECKING
>>>>>>>    {
>>>>>>>
>>>>>>>> It is worth noting that in current implementation bb's with 2
>>>>>>>> predecessors and both are on critical edges are accepted without
>>>>>>>> additional option.
>>>>>>>
>>>>>>> Yes, I know.
>>>>>>>
>>>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>>>>> to it and even fix the critical edge missed optimization with splitting
>>>>>>> critical edges then I am all for that solution.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>> Thanks ahead.
>>>>>>>> Yuri.
>>>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> Here is updated patch2 with the following changes:
>>>>>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>>>>>> 2. Use only one function for extended predication -
>>>>>>>>>> predicate_extended_scalar_phi.
>>>>>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>>>>>> blocks if it has 2 predecessors and
>>>>>>>>>> both incoming edges are critical or it has more than 2 predecessors
>>>>>>>>>> and at least one incoming edge
>>>>>>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>>>>>>
>>>>>>>>>> Here is a motivating test-case which explains this point.
>>>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>>>>> The problem phi is in bb-7:
>>>>>>>>>>
>>>>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>>>>   {
>>>>>>>>>>     <bb 5>:
>>>>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>     else
>>>>>>>>>>       goto <bb 9>;
>>>>>>>>>>
>>>>>>>>>>   }
>>>>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>>>>   {
>>>>>>>>>>     <bb 6>:
>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>     else
>>>>>>>>>>       goto <bb 8>;
>>>>>>>>>>
>>>>>>>>>>   }
>>>>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>>>>   {
>>>>>>>>>>     <bb 7>:
>>>>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>     goto <bb 11>;
>>>>>>>>>>
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>>>>>> #if 0
>>>>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>>>>  else
>>>>>>>>>> #endif
>>>>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>>>>
>>>>>>>>>> we will get ICE:
>>>>>>>>>> t5.c: In function 'foo':
>>>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>>>>  void foo (int n)
>>>>>>>>>>       ^
>>>>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>>>>> _52 = _1 & _3;
>>>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>>>>
>>>>>>>>>> since predicate computations were inserted in bb_7.
>>>>>>>>>
>>>>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>>>>> by insert_gimplified_predicates.
>>>>>>>>>
>>>>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>>>>> but push those to e->dest which makes this really messy.
>>>>>>>>>
>>>>>>>>> Rather than having a separate phase where we insert all
>>>>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>>>>> predicating a PHI.
>>>>>>>>>
>>>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>>>>> the printfs properly.
>>>>>>>>>
>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>> paths.
>>>>>>>>>
>>>>>>>>> I'm not at all happy with this code.  It may be the existing if-conv code's
>>>>>>>>> fault, but making it even worse is not an option.
>>>>>>>>>
>>>>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>>>>> commit edge insertions before merging the blocks.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> ChangeLog is
>>>>>>>>>>
>>>>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>
>>>>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>>>>> statement iterator.
>>>>>>>>>> (bb_insert_point): New function.
>>>>>>>>>> (set_bb_insert_point): New function.
>>>>>>>>>> (has_pred_critical_p): New function.
>>>>>>>>>> (if_convertible_bb_p): Allow bb to have more than 2 predecessors if
>>>>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>>>>> Change check that block containing reduction statement candidate
>>>>>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>>>>> is_cond_scalar_reduction.
>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>>>>>> than 2 predecessors and at least one incoming edge is critical.
>>>>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>>>>>> to save gsi before insertion of predicate computations. Set it to
>>>>>>>>>> true for BB with 2 predecessors and critical incoming edges, or if the
>>>>>>>>>> number of predecessors is greater than 2 and at least one incoming edge
>>>>>>>>>> is critical.
>>>>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>>>>> Insert predicate computation of BB just after label if
>>>>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>>>>
>>>>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>>>>
>>>>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>>>>   tree predicate;
>>>>>>>>>>>>
>>>>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>>>>
>>>>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>>>>
>>>>>>>>>>>> and this iterator is saved in insert_gimplified_predicates before
>>>>>>>>>>>> inserting the code for predicate computation. I checked that this fix
>>>>>>>>>>>> works.
>>>>>>>>>>>
>>>>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>>>>> after the PHI we predicate.
>>>>>>>>>>>
>>>>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards.
>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>>>>> 2. What is your opinion about using a simpler decision for the
>>>>>>>>>>>>>> insertion point: if bb has a use of the phi result, insert phi predication
>>>>>>>>>>>>>> before it, and at the bb end otherwise? I assume that critical edge
>>>>>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am resending patch1 and patch2 with minor changes:
>>>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>>>>>> I am also very sorry that I sent you a bad patch.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Now let me answer your questions related to the second patch.
>>>>>>>>>>>>>>>> 1. Why do we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> we can see the following phi node corresponding to res:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It is clear that we can optimize it to a phi node with only 2 arguments,
>>>>>>>>>>>>>>>> and only one check is needed for phi predication (for reduction in
>>>>>>>>>>>>>>>> our case), namely the predicate of bb_5. In the general case we can't do it
>>>>>>>>>>>>>>>> even if we sort all phi argument values, since we still have to produce
>>>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Of course it is worthwhile to also sort single occurrences first
>>>>>>>>>>>>>>> so your case gets just the condition for edge 5 and its inversion
>>>>>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. Why do we need to introduce find_insertion_point?
>>>>>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr (t5.c is
>>>>>>>>>>>>>>>> attached): we can see that bb_7 and bb_9, which contain phi nodes, have
>>>>>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It is evident that we cannot put phi predication at the block
>>>>>>>>>>>>>>>> beginning but need to put it after the predicate computations.
>>>>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments, the
>>>>>>>>>>>>>>>> insertion point will be "after labels". Note also that the phi result can
>>>>>>>>>>>>>>>> have a use in this block too, so we can't put predication code at the
>>>>>>>>>>>>>>>> block end.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.
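
The value-grouping idea discussed in the thread above (group PHI arguments
by value, OR together the edge predicates of each group, and treat the most
frequent value as the 'else' case so it needs no condition) can be sketched
outside of GCC roughly as follows. This is only an illustration under stated
assumptions, not the code from the patch: PhiArg, group_by_value and
predicate_phi are hypothetical names, edge predicates and values are plain
strings instead of trees, and the result is a textual chain of conditional
expressions rather than gimplified statements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one PHI argument: the incoming value and the
// predicate of the edge it arrives on (both just names here).
struct PhiArg {
  std::string value;
  std::string edge_pred;
};

using Group = std::pair<std::string, std::vector<std::string>>;

// Collect the edge predicates of all arguments carrying the same value,
// keeping values in first-seen order.
static std::vector<Group> group_by_value(const std::vector<PhiArg> &args)
{
  std::vector<Group> groups;
  for (const PhiArg &a : args) {
    auto it = std::find_if(groups.begin(), groups.end(),
                           [&](const Group &g) { return g.first == a.value; });
    if (it == groups.end())
      groups.emplace_back(a.value, std::vector<std::string>{a.edge_pred});
    else
      it->second.push_back(a.edge_pred);
  }
  return groups;
}

// Build "(c1) ? v1 : (c2) ? v2 : v_last" as text.
std::string predicate_phi(const std::vector<PhiArg> &args)
{
  assert(!args.empty());
  std::vector<Group> groups = group_by_value(args);

  // Fewest occurrences first; the stable sort keeps first-seen order for
  // ties.  The last group becomes the unconditional 'else' value.
  std::stable_sort(groups.begin(), groups.end(),
                   [](const Group &a, const Group &b) {
                     return a.second.size() < b.second.size();
                   });

  std::string result = groups.back().first;
  // Walk the remaining groups from last to first so the first group ends
  // up as the outermost condition.
  for (std::size_t i = groups.size() - 1; i-- > 0; ) {
    std::string cond;
    for (const std::string &p : groups[i].second)
      cond += (cond.empty() ? "" : " | ") + p;
    result = "(" + cond + ") ? " + groups[i].first + " : " + result;
  }
  return result;
}
```

For the PHI <res_15(3), res_15(4), res_10(5)> example from the thread, this
sketch needs only the predicate of the edge from bb_5; the two res_15
arguments fall into the 'else' case.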

[-- Attachment #2: patch.20141221 --]
[-- Type: application/octet-stream, Size: 34388 bytes --]

diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index f7befac..1fe2b18 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -127,10 +127,14 @@ along with GCC; see the file COPYING3.  If not see
 #include "expr.h"
 #include "insn-codes.h"
 #include "optabs.h"
+#include "hash-map.h"
 
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
 
+/* Apply more aggressive (extended) if-conversion if true.  */
+static bool aggressive_if_conv;
+
 /* Structure used to predicate basic blocks.  This is attached to the
    ->aux field of the BBs in the loop to be if-converted.  */
 typedef struct bb_predicate_s {
@@ -373,6 +377,22 @@ static tree
 fold_build_cond_expr (tree type, tree cond, tree rhs, tree lhs)
 {
   tree rhs1, lhs1, cond_expr;
+
+  /* If COND is a comparison r != 0 and r has boolean type, convert COND
+     to SSA_NAME so that it is accepted by the vect bool pattern.  */
+  if (TREE_CODE (cond) == NE_EXPR)
+    {
+      tree op0 = TREE_OPERAND (cond, 0);
+      tree op1 = TREE_OPERAND (cond, 1);
+      if (TREE_CODE (op0) == SSA_NAME
+	  && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE
+	  && (integer_zerop (op1)))
+	cond = op0;
+      else if (TREE_CODE (op1) == SSA_NAME
+	       && TREE_CODE (TREE_TYPE (op1)) == BOOLEAN_TYPE
+	       && (integer_zerop (op0)))
+	cond = op1;
+    }
   cond_expr = fold_ternary (COND_EXPR, type, cond,
 			    rhs, lhs);
 
@@ -485,10 +505,11 @@ add_to_dst_predicate_list (struct loop *loop, edge e,
     cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
 			prev_cond, cond);
 
-  add_to_predicate_list (loop, e->dest, cond);
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
+    add_to_predicate_list (loop, e->dest, cond);
 }
 
-/* Return true if one of the successor edges of BB exits LOOP.  */
+/* Returns true if one of the successor edges of BB exits LOOP.  */
 
 static bool
 bb_with_exit_edge_p (struct loop *loop, basic_block bb)
@@ -512,7 +533,9 @@ bb_with_exit_edge_p (struct loop *loop, basic_block bb)
    When the flag_tree_loop_if_convert_stores is not set, PHI is not
    if-convertible if:
    - a virtual PHI is immediately used in another PHI node,
-   - there is a virtual PHI in a BB other than the loop->header.  */
+   - there is a virtual PHI in a BB other than the loop->header.
+   When the aggressive_if_conv is set, PHI can have more than
+   two arguments.  */
 
 static bool
 if_convertible_phi_p (struct loop *loop, basic_block bb, gphi *phi,
@@ -524,11 +547,17 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gphi *phi,
       print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
     }
 
-  if (bb != loop->header && gimple_phi_num_args (phi) != 2)
+  if (bb != loop->header)
     {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "More than two phi node args.\n");
-      return false;
+      if (gimple_phi_num_args (phi) != 2)
+	{
+	  if (!aggressive_if_conv)
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "More than two phi node args.\n");
+	      return false;
+	    }
+        }
     }
 
   if (flag_tree_loop_if_convert_stores || any_mask_load_store)
@@ -895,7 +924,8 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
 
    A statement is if-convertible if:
    - it is an if-convertible GIMPLE_ASSIGN,
-   - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
+   - it is a GIMPLE_LABEL or a GIMPLE_COND,
+   - it is a builtin call.  */
 
 static bool
 if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
@@ -942,6 +972,35 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
   return true;
 }
 
+/* Assumes that BB has more than 1 predecessor.
+   Returns false if at least one incoming edge is not critical
+   and true otherwise.  */
+
+static inline bool
+all_preds_critical_p (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (EDGE_COUNT (e->src->succs) == 1)
+      return false;
+  return true;
+}
+
+/* Returns true if at least one incoming edge is critical.  */
+static inline bool
+has_pred_critical_p (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (EDGE_COUNT (e->src->succs) > 1)
+      return true;
+  return false;
+}
+
 /* Return true when BB is if-convertible.  This routine does not check
    basic block's statements and phis.
 
@@ -950,6 +1009,9 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
    - it is after the exit block but before the latch,
    - its edges are not normal.
 
+   Last restriction will be deleted after adding support for extended
+   predication.
+
    EXIT_BB is the basic block containing the exit of the LOOP.  BB is
    inside LOOP.  */
 
@@ -962,10 +1024,15 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "----------[%d]-------------\n", bb->index);
 
-  if (EDGE_COUNT (bb->preds) > 2
-      || EDGE_COUNT (bb->succs) > 2)
+  if (EDGE_COUNT (bb->succs) > 2)
     return false;
 
+  if (EDGE_COUNT (bb->preds) > 2)
+    {
+      if (!aggressive_if_conv)
+	return false;
+    }
+
   if (exit_bb)
     {
       if (bb != loop->latch)
@@ -1001,20 +1068,15 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
 
   /* At least one incoming edge has to be non-critical as otherwise edge
      predicates are not equal to basic-block predicates of the edge
-     source.  */
-  if (EDGE_COUNT (bb->preds) > 1
-      && bb != loop->header)
-    {
-      bool found = false;
-      FOR_EACH_EDGE (e, ei, bb->preds)
-	if (EDGE_COUNT (e->src->succs) == 1)
-	  found = true;
-      if (!found)
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "only critical predecessors\n");
-	  return false;
-	}
+     source.  This check is skipped if aggressive_if_conv is true.  */
+  if (!aggressive_if_conv
+      && EDGE_COUNT (bb->preds) > 1
+      && bb != loop->header
+      && all_preds_critical_p (bb))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "only critical predecessors\n");
+      return false;
     }
 
   return true;
@@ -1126,11 +1188,12 @@ predicate_bbs (loop_p loop)
       tree cond;
       gimple stmt;
 
-      /* The loop latch is always executed and has no extra conditions
-	 to be processed: skip it.  */
-      if (bb == loop->latch)
+      /* The loop latch and loop exit block are always executed and
+	 have no extra conditions to be processed: skip them.  */
+      if (bb == loop->latch
+	  || bb_with_exit_edge_p (loop, bb))
 	{
-	  reset_bb_predicate (loop->latch);
+	  reset_bb_predicate (bb);
 	  continue;
 	}
 
@@ -1141,7 +1204,7 @@ predicate_bbs (loop_p loop)
 	  tree c2;
 	  edge true_edge, false_edge;
 	  location_t loc = gimple_location (stmt);
-	  tree c = fold_build2_loc (loc, gimple_cond_code (stmt),
+	  tree c = build2_loc (loc, gimple_cond_code (stmt),
 				    boolean_type_node,
 				    gimple_cond_lhs (stmt),
 				    gimple_cond_rhs (stmt));
@@ -1363,60 +1426,6 @@ if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
   return res;
 }
 
-/* Basic block BB has two predecessors.  Using predecessor's bb
-   predicate, set an appropriate condition COND for the PHI node
-   replacement.  Return the true block whose phi arguments are
-   selected when cond is true.  LOOP is the loop containing the
-   if-converted region, GSI is the place to insert the code for the
-   if-conversion.  */
-
-static basic_block
-find_phi_replacement_condition (basic_block bb, tree *cond,
-				gimple_stmt_iterator *gsi)
-{
-  edge first_edge, second_edge;
-  tree tmp_cond;
-
-  gcc_assert (EDGE_COUNT (bb->preds) == 2);
-  first_edge = EDGE_PRED (bb, 0);
-  second_edge = EDGE_PRED (bb, 1);
-
-  /* Prefer an edge with a not negated predicate.
-     ???  That's a very weak cost model.  */
-  tmp_cond = bb_predicate (first_edge->src);
-  gcc_assert (tmp_cond);
-  if (TREE_CODE (tmp_cond) == TRUTH_NOT_EXPR)
-    {
-      edge tmp_edge;
-
-      tmp_edge = first_edge;
-      first_edge = second_edge;
-      second_edge = tmp_edge;
-    }
-
-  /* Check if the edge we take the condition from is not critical.
-     We know that at least one non-critical edge exists.  */
-  if (EDGE_COUNT (first_edge->src->succs) > 1)
-    {
-      *cond = bb_predicate (second_edge->src);
-
-      if (TREE_CODE (*cond) == TRUTH_NOT_EXPR)
-	*cond = TREE_OPERAND (*cond, 0);
-      else
-	/* Select non loop header bb.  */
-	first_edge = second_edge;
-    }
-  else
-    *cond = bb_predicate (first_edge->src);
-
-  /* Gimplify the condition to a valid cond-expr conditonal operand.  */
-  *cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (*cond),
-				      is_gimple_condexpr, NULL_TREE,
-				      true, GSI_SAME_STMT);
-
-  return first_edge->src;
-}
-
 /* Returns true if def-stmt for phi argument ARG is simple increment/decrement
    which is in predicated basic block.
    In fact, the following PHI pattern is searching:
@@ -1427,14 +1436,15 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
 	  reduc_3 = ...
 	reduc_2 = PHI <reduc_1, reduc_3>
 
-   REDUC, OP0 and OP1 contain reduction stmt and its operands.  */
+   ARG_0 and ARG_1 are correspondent PHI arguments.
+   REDUC, OP0 and OP1 contain reduction stmt and its operands.
+   EXTENDED is true if PHI has > 2 arguments.  */
 
 static bool
-is_cond_scalar_reduction (gimple phi, gimple *reduc,
-			  tree *op0, tree *op1)
+is_cond_scalar_reduction (gimple phi, gimple *reduc, tree arg_0, tree arg_1,
+			  tree *op0, tree *op1, bool extended)
 {
   tree lhs, r_op1, r_op2;
-  tree arg_0, arg_1;
   gimple stmt;
   gimple header_phi = NULL;
   enum tree_code reduction_op;
@@ -1443,13 +1453,13 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
   edge latch_e = loop_latch_edge (loop);
   imm_use_iterator imm_iter;
   use_operand_p use_p;
-
-  arg_0 = PHI_ARG_DEF (phi, 0);
-  arg_1 = PHI_ARG_DEF (phi, 1);
+  edge e;
+  edge_iterator ei;
+  bool result = false;
   if (TREE_CODE (arg_0) != SSA_NAME || TREE_CODE (arg_1) != SSA_NAME)
     return false;
 
-  if (gimple_code (SSA_NAME_DEF_STMT (arg_0)) == GIMPLE_PHI)
+  if (!extended && gimple_code (SSA_NAME_DEF_STMT (arg_0)) == GIMPLE_PHI)
     {
       lhs = arg_1;
       header_phi = SSA_NAME_DEF_STMT (arg_0);
@@ -1480,8 +1490,13 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
     return false;
 
   /* Check that stmt-block is predecessor of phi-block.  */
-  if (EDGE_PRED (bb, 0)->src != gimple_bb (stmt)
-      && EDGE_PRED (bb, 1)->src != gimple_bb (stmt))
+  FOR_EACH_EDGE (e, ei, gimple_bb (stmt)->succs)
+    if (e->dest == bb)
+      {
+	result = true;
+	break;
+      }
+  if (!result)
     return false;
 
   if (!has_single_use (lhs))
@@ -1578,9 +1593,66 @@ convert_scalar_cond_reduction (gimple reduc, gimple_stmt_iterator *gsi,
   return rhs;
 }
 
+/* Helpers for PHI arguments hashtable map.  */
+
+struct phi_args_hash_traits : default_hashmap_traits
+{
+  static inline hashval_t hash (tree);
+  static inline bool equal_keys (tree, tree);
+};
+
+inline hashval_t
+phi_args_hash_traits::hash (tree value)
+{
+  return iterative_hash_expr (value, 0);
+}
+
+inline bool
+phi_args_hash_traits::equal_keys (tree value1, tree value2)
+{
+  return operand_equal_p (value1, value2, 0);
+}
+
+  /* Produce condition for all occurrences of ARG in PHI node.  */
+
+static tree
+gen_phi_arg_condition (gphi *phi, vec<int> *occur,
+		       gimple_stmt_iterator *gsi)
+{
+  int len;
+  int i;
+  tree cond = NULL_TREE;
+  tree c;
+  edge e;
+
+  len = occur->length ();
+  gcc_assert (len > 0);
+  for (i = 0; i < len; i++)
+    {
+      e = gimple_phi_arg_edge (phi, (*occur)[i]);
+      c = bb_predicate (e->src);
+      if (is_true_predicate (c))
+	continue;
+      c = force_gimple_operand_gsi_1 (gsi, unshare_expr (c),
+				      is_gimple_condexpr, NULL_TREE,
+				      true, GSI_SAME_STMT);
+      if (cond != NULL_TREE)
+	{
+	  /* Must build OR expression.  */
+	  cond = fold_or_predicates (EXPR_LOCATION (c), c, cond);
+	  cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					     is_gimple_condexpr, NULL_TREE,
+					     true, GSI_SAME_STMT);
+	}
+      else
+	cond = c;
+    }
+  gcc_assert (cond != NULL_TREE);
+  return cond;
+}
+
 /* Replace a scalar PHI node with a COND_EXPR using COND as condition.
-   This routine does not handle PHI nodes with more than two
-   arguments.
+   This routine can handle PHI nodes with more than two arguments.
 
    For example,
      S1: A = PHI <x1(1), x2(5)>
@@ -1588,69 +1660,209 @@ convert_scalar_cond_reduction (gimple reduc, gimple_stmt_iterator *gsi,
      S2: A = cond ? x1 : x2;
 
    The generated code is inserted at GSI that points to the top of
-   basic block's statement list.  When COND is true, phi arg from
-   TRUE_BB is selected.  */
+   basic block's statement list.
+   If PHI node has more than two arguments a chain of conditional
+   expression is produced.  */
+
 
 static void
-predicate_scalar_phi (gphi *phi, tree cond,
-		      basic_block true_bb,
-		      gimple_stmt_iterator *gsi)
+predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
 {
-  gimple new_stmt;
+  gimple new_stmt = NULL, reduc;
+  tree rhs, res, arg0, arg1, op0, op1, scev;
+  tree cond;
+  unsigned int index0;
+  unsigned int max, args_len;
+  edge e;
   basic_block bb;
-  tree rhs, res, arg, scev;
-
-  gcc_assert (gimple_code (phi) == GIMPLE_PHI
-	      && gimple_phi_num_args (phi) == 2);
+  unsigned int i;
 
   res = gimple_phi_result (phi);
-  /* Do not handle virtual phi nodes.  */
   if (virtual_operand_p (res))
     return;
 
-  bb = gimple_bb (phi);
-
-  if ((arg = degenerate_phi_result (phi))
+  if ((rhs = degenerate_phi_result (phi))
       || ((scev = analyze_scalar_evolution (gimple_bb (phi)->loop_father,
 					    res))
 	  && !chrec_contains_undetermined (scev)
 	  && scev != res
-	  && (arg = gimple_phi_arg_def (phi, 0))))
-    rhs = arg;
-  else
-    {
-      tree arg_0, arg_1;
-      tree op0, op1;
-      gimple reduc;
+	  && (rhs = gimple_phi_arg_def (phi, 0)))) {
+    if (dump_file && (dump_flags & TDF_DETAILS))
+      {
+	fprintf (dump_file, "Degenerate phi!\n");
+	print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
+      }
+    new_stmt = gimple_build_assign (res, rhs);
+    gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+    update_stmt (new_stmt);
+    return;
+  }
 
-      /* Use condition that is not TRUTH_NOT_EXPR in conditional modify expr.  */
+  bb = gimple_bb (phi);
+  if (EDGE_COUNT (bb->preds) == 2)
+    {
+      /* Predicate ordinary PHI node with 2 arguments.  */
+      edge first_edge, second_edge;
+      basic_block true_bb;
+      first_edge = EDGE_PRED (bb, 0);
+      second_edge = EDGE_PRED (bb, 1);
+      cond = bb_predicate (first_edge->src);
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  edge tmp_edge = first_edge;
+	  first_edge = second_edge;
+	  second_edge = tmp_edge;
+	}
+      if (EDGE_COUNT (first_edge->src->succs) > 1)
+	{
+	  cond = bb_predicate (second_edge->src);
+	  if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	    cond = TREE_OPERAND (cond, 0);
+	  else
+	    first_edge = second_edge;
+	}
+      else
+	cond = bb_predicate (first_edge->src);
+      /* Gimplify the condition to a valid cond-expr conditional operand.  */
+      cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					 is_gimple_condexpr, NULL_TREE,
+					 true, GSI_SAME_STMT);
+      true_bb = first_edge->src;
       if (EDGE_PRED (bb, 1)->src == true_bb)
 	{
-	  arg_0 = gimple_phi_arg_def (phi, 1);
-	  arg_1 = gimple_phi_arg_def (phi, 0);
+	  arg0 = gimple_phi_arg_def (phi, 1);
+	  arg1 = gimple_phi_arg_def (phi, 0);
 	}
       else
 	{
-	  arg_0 = gimple_phi_arg_def (phi, 0);
-	  arg_1 = gimple_phi_arg_def (phi, 1);
+	  arg0 = gimple_phi_arg_def (phi, 0);
+	  arg1 = gimple_phi_arg_def (phi, 1);
 	}
-      if (is_cond_scalar_reduction (phi, &reduc, &op0, &op1))
+      if (is_cond_scalar_reduction (phi, &reduc, arg0, arg1,
+				    &op0, &op1, false))
 	/* Convert reduction stmt into vectorizable form.  */
 	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
 					     true_bb != gimple_bb (reduc));
       else
 	/* Build new RHS using selected condition and arguments.  */
 	rhs = fold_build_cond_expr (TREE_TYPE (res), unshare_expr (cond),
-				    arg_0, arg_1);
+				    arg0, arg1);
+      new_stmt = gimple_build_assign (res, rhs);
+      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+      update_stmt (new_stmt);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "new phi replacement stmt\n");
+	  print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+	}
+      return;
+    }
+
+  /* Create hashmap for PHI node which contain vector of argument indexes
+     having the same value.  */
+  bool swap = false;
+  hash_map<tree, auto_vec<int>, phi_args_hash_traits> phi_arg_map;
+  unsigned int num_args = gimple_phi_num_args (phi);
+  int max_ind = -1;
+  /* Vector of different PHI argument values.  */
+  auto_vec<tree> args (num_args);
+
+  /* Compute phi_arg_map.  */
+  for (i = 0; i < num_args; i++)
+    {
+      tree arg;
+
+      arg = gimple_phi_arg_def (phi, i);
+      if (!phi_arg_map.get (arg))
+	args.quick_push (arg);
+      phi_arg_map.get_or_insert (arg).safe_push (i);
     }
 
-  new_stmt = gimple_build_assign (res, rhs);
-  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
-  update_stmt (new_stmt);
+  /* Determine element with max number of occurrences.  */
+  max_ind = -1;
+  max = 1;
+  args_len = args.length ();
+  for (i = 0; i < args_len; i++)
+    {
+      unsigned int len;
+      if ((len = phi_arg_map.get (args[i])->length ()) > max)
+	{
+	  max_ind = (int) i;
+	  max = len;
+	}
+    }
+
+  /* Put element with max number of occurrences to the end of ARGS.  */
+  if (max_ind != -1 && max_ind +1 != (int) args_len)
+    {
+      tree tmp = args[args_len - 1];
+      args[args_len - 1] = args[max_ind];
+      args[max_ind] = tmp;
+    }
+
+  /* Handle one special case: there are exactly 2 distinct argument values
+     and one of them occurs only once.  Such a PHI can be handled as if it
+     had only 2 arguments.  */
+  if (args_len == 2 && phi_arg_map.get (args[0])->length () == 1)
+    {
+      vec<int> *indexes;
+      indexes = phi_arg_map.get (args[0]);
+      index0 = (*indexes)[0];
+      arg0 = args[0];
+      arg1 = args[1];
+      e = gimple_phi_arg_edge (phi, index0);
+      cond = bb_predicate (e->src);
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  swap = true;
+	  cond = TREE_OPERAND (cond, 0);
+	}
+      /* Gimplify the condition to a valid cond-expr conditional operand.  */
+      cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					 is_gimple_condexpr, NULL_TREE,
+					 true, GSI_SAME_STMT);
+      if (!(is_cond_scalar_reduction (phi, &reduc, arg0 , arg1,
+				      &op0, &op1, true)))
+	rhs = fold_build_cond_expr (TREE_TYPE (res), unshare_expr (cond),
+				    swap? arg1 : arg0,
+				    swap? arg0 : arg1);
+      else
+	/* Convert reduction stmt into vectorizable form.  */
+	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
+					     swap);
+      new_stmt = gimple_build_assign (res, rhs);
+      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+      update_stmt (new_stmt);
+    }
+  else
+    {
+      /* Common case.  */
+      vec<int> *indexes;
+      tree type = TREE_TYPE (gimple_phi_result (phi));
+      tree lhs;
+      arg1 = args[1];
+      for (i = 0; i < args_len; i++)
+	{
+	  arg0 = args[i];
+	  indexes = phi_arg_map.get (args[i]);
+	  if (i != args_len - 1)
+	    lhs = make_temp_ssa_name (type, NULL, "_ifc_");
+	  else
+	    lhs = res;
+	  cond = gen_phi_arg_condition (phi, indexes, gsi);
+	  rhs = fold_build_cond_expr (type, unshare_expr (cond),
+				      arg0, arg1);
+	  new_stmt = gimple_build_assign (lhs, rhs);
+	  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+	  update_stmt (new_stmt);
+	  arg1 = lhs;
+	}
+    }
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
-      fprintf (dump_file, "new phi replacement stmt\n");
+      fprintf (dump_file, "new extended phi replacement stmt\n");
       print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
     }
 }
@@ -1668,28 +1880,25 @@ predicate_all_scalar_phis (struct loop *loop)
   for (i = 1; i < orig_loop_num_nodes; i++)
     {
       gphi *phi;
-      tree cond = NULL_TREE;
       gimple_stmt_iterator gsi;
       gphi_iterator phi_gsi;
-      basic_block true_bb = NULL;
       bb = ifc_bbs[i];
 
       if (bb == loop->header)
 	continue;
 
+      if (EDGE_COUNT (bb->preds) == 1)
+	continue;
+
       phi_gsi = gsi_start_phis (bb);
       if (gsi_end_p (phi_gsi))
 	continue;
 
-      /* BB has two predecessors.  Using predecessor's aux field, set
-	 appropriate condition for the PHI node replacement.  */
       gsi = gsi_after_labels (bb);
-      true_bb = find_phi_replacement_condition (bb, &cond, &gsi);
-
       while (!gsi_end_p (phi_gsi))
 	{
 	  phi = phi_gsi.phi ();
-	  predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  predicate_scalar_phi (phi, &gsi);
 	  release_phi_node (phi);
 	  gsi_next (&phi_gsi);
 	}
@@ -1710,7 +1919,8 @@ insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
     {
       basic_block bb = ifc_bbs[i];
       gimple_seq stmts;
-
+      if (!is_predicated (bb))
+	gcc_assert (bb_predicate_gimplified_stmts (bb) == NULL);
       if (!is_predicated (bb))
 	{
 	  /* Do not insert statements for a basic block that is not
@@ -1862,7 +2072,8 @@ insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
 static void
 predicate_mem_writes (loop_p loop)
 {
-  unsigned int i, orig_loop_num_nodes = loop->num_nodes;
+  unsigned int i, j, orig_loop_num_nodes = loop->num_nodes;
+  tree mask_vec[10];
 
   for (i = 1; i < orig_loop_num_nodes; i++)
     {
@@ -1882,6 +2093,9 @@ predicate_mem_writes (loop_p loop)
 	  cond = TREE_OPERAND (cond, 0);
 	}
 
+      for (j=0; j<10; j++)
+	mask_vec[j] = NULL_TREE;
+
       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 	if (!gimple_assign_single_p (stmt = gsi_stmt (gsi)))
 	  continue;
@@ -1892,21 +2106,26 @@ predicate_mem_writes (loop_p loop)
 	    tree ref, addr, ptr, masktype, mask_op0, mask_op1, mask;
 	    gimple new_stmt;
 	    int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs)));
-
-	    masktype = build_nonstandard_integer_type (bitsize, 1);
-	    mask_op0 = build_int_cst (masktype, swap ? 0 : -1);
-	    mask_op1 = build_int_cst (masktype, swap ? -1 : 0);
 	    ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs;
 	    mark_addressable (ref);
 	    addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (ref),
 					     true, NULL_TREE, true,
 					     GSI_SAME_STMT);
-	    cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
-					       is_gimple_condexpr, NULL_TREE,
-					       true, GSI_SAME_STMT);
-	    mask = fold_build_cond_expr (masktype, unshare_expr (cond),
-					 mask_op0, mask_op1);
-	    mask = ifc_temp_var (masktype, mask, &gsi);
+	    gcc_assert (exact_log2 (bitsize) != -1);
+	    if ((mask = mask_vec[exact_log2 (bitsize)]) == NULL_TREE)
+	      {
+		masktype = build_nonstandard_integer_type (bitsize, 1);
+		mask_op0 = build_int_cst (masktype, swap ? 0 : -1);
+		mask_op1 = build_int_cst (masktype, swap ? -1 : 0);
+		cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
+					           is_gimple_condexpr,
+						   NULL_TREE,
+					           true, GSI_SAME_STMT);
+		mask = fold_build_cond_expr (masktype, unshare_expr (cond),
+					     mask_op0, mask_op1);
+		mask = ifc_temp_var (masktype, mask, &gsi);
+		mask_vec[exact_log2 (bitsize)] = mask;
+	      }
 	    ptr = build_int_cst (reference_alias_ptr_type (ref), 0);
 	    /* Copy points-to info if possible.  */
 	    if (TREE_CODE (addr) == SSA_NAME && !SSA_NAME_PTR_INFO (addr))
@@ -2134,6 +2353,308 @@ version_loop_for_if_conversion (struct loop *loop)
   return true;
 }
 
+/* Performs splitting of critical edges if aggressive_if_conv is true.
+   Returns false if loop won't be if converted and true otherwise.  */
+
+static bool
+ifcvt_split_critical_edges (struct loop *loop)
+{
+  basic_block *body;
+  basic_block bb;
+  unsigned int num = loop->num_nodes;
+  unsigned int i;
+  gimple stmt;
+  edge e;
+  edge_iterator ei;
+
+  if (num <= 2)
+    return false;
+  if (loop->inner)
+    return false;
+  if (!single_exit (loop))
+    return false;
+
+  body = get_loop_body (loop);
+  for (i = 0; i < num; i++)
+    {
+      bb = body[i];
+      if (bb == loop->latch
+	  || bb_with_exit_edge_p (loop, bb))
+	continue;
+      stmt = last_stmt (bb);
+      /* Skip basic blocks not ending with conditional branch.  */
+      if (!(stmt && gimple_code (stmt) == GIMPLE_COND))
+	continue;
+      FOR_EACH_EDGE (e, ei, bb->succs)
+	if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
+	  split_edge (e);
+    }
+  free (body);
+  return true;
+}
+
+/* Assumes that lhs of DEF_STMT have multiple uses.
+   Delete one use by (1) creation of copy DEF_STMT with
+   unique lhs; (2) change original use of lhs in one
+   use statement with newly created lhs.  */
+
+static void
+ifcvt_split_def_stmt (gimple def_stmt, gimple use_stmt)
+{
+  tree var;
+  tree lhs;
+  gimple copy_stmt;
+  gimple_stmt_iterator gsi;
+  use_operand_p use_p;
+  imm_use_iterator imm_iter;
+
+  var = gimple_assign_lhs (def_stmt);
+  copy_stmt = gimple_copy (def_stmt);
+  lhs = make_temp_ssa_name (TREE_TYPE (var), NULL, "_ifc_");
+  gimple_assign_set_lhs (copy_stmt, lhs);
+  SSA_NAME_DEF_STMT (lhs) = copy_stmt;
+  /* Insert copy of DEF_STMT.  */
+  gsi = gsi_for_stmt (def_stmt);
+  gsi_insert_after (&gsi, copy_stmt, GSI_SAME_STMT);
+  /* Change use of var to lhs in use_stmt.  */
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Change use of var  ");
+      print_generic_expr (dump_file, var, TDF_SLIM);
+      fprintf (dump_file, " to ");
+      print_generic_expr (dump_file, lhs, TDF_SLIM);
+      fprintf (dump_file, "\n");
+    }
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, var)
+    {
+      if (USE_STMT (use_p) != use_stmt)
+	continue;
+      SET_USE (use_p, lhs);
+      break;
+    }
+}
+
+/* Traverse bool pattern recursively starting from VAR.
+   Save its def and use statements to defuse_list if VAR does
+   not have single use.  */
+
+static void
+ifcvt_walk_pattern_tree (tree var, vec<gimple> *defuse_list,
+			 gimple use_stmt)
+{
+  tree rhs1, rhs2;
+  enum tree_code code;
+  gimple def_stmt;
+
+  def_stmt = SSA_NAME_DEF_STMT (var);
+  if (gimple_code (def_stmt) != GIMPLE_ASSIGN)
+    return;
+  if (!has_single_use (var))
+    {
+      /* Put def and use stmts into defuse_list.  */
+      defuse_list->safe_push (def_stmt);
+      defuse_list->safe_push (use_stmt);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Multiple lhs uses in stmt\n");
+	  print_gimple_stmt (dump_file, def_stmt, 0, TDF_SLIM);
+	}
+    }
+  rhs1 = gimple_assign_rhs1 (def_stmt);
+  code = gimple_assign_rhs_code (def_stmt);
+  switch (code)
+    {
+    case SSA_NAME:
+      ifcvt_walk_pattern_tree (rhs1, defuse_list, def_stmt);
+      break;
+    CASE_CONVERT:
+      if ((TYPE_PRECISION (TREE_TYPE (rhs1)) != 1
+	   || !TYPE_UNSIGNED (TREE_TYPE (rhs1)))
+	  && TREE_CODE (TREE_TYPE (rhs1)) != BOOLEAN_TYPE)
+	break;
+      ifcvt_walk_pattern_tree (rhs1, defuse_list, def_stmt);
+      break;
+    case BIT_NOT_EXPR:
+      ifcvt_walk_pattern_tree (rhs1, defuse_list, def_stmt);
+      break;
+    case BIT_AND_EXPR:
+    case BIT_IOR_EXPR:
+    case BIT_XOR_EXPR:
+      ifcvt_walk_pattern_tree (rhs1, defuse_list, def_stmt);
+      rhs2 = gimple_assign_rhs2 (def_stmt);
+      ifcvt_walk_pattern_tree (rhs2, defuse_list, def_stmt);
+      break;
+    default:
+      break;
+    }
+  return;
+}
+
+/* Returns true if STMT can be a root of a bool pattern applied
+   by the vectorizer.  VAR contains the SSA_NAME which starts the pattern.  */
+
+static bool
+stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
+{
+  enum tree_code code;
+  tree lhs, rhs;
+
+  code = gimple_assign_rhs_code (stmt);
+  if (CONVERT_EXPR_CODE_P (code))
+    {
+      lhs = gimple_assign_lhs (stmt);
+      rhs = gimple_assign_rhs1 (stmt);
+      if (TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE)
+	return false;
+      if (TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE)
+	return false;
+      *var = rhs;
+      return true;
+    }
+  else if (code == COND_EXPR)
+    {
+      rhs = gimple_assign_rhs1 (stmt);
+      if (TREE_CODE (rhs) != SSA_NAME)
+	return false;
+      *var = rhs;
+      return true;
+    }
+  return false;
+}
+
+/*  Traverse all statements in BB, which corresponds to the loop header,
+    to find all statements that can start a bool pattern applied by the
+    vectorizer, and convert multiple uses in them to conform to pattern
+    restrictions.  Such a case can occur if the same predicate is used both
+    for phi node conversion and load/store mask.  */
+
+static void
+ifcvt_repair_bool_pattern (basic_block bb)
+{
+  tree rhs;
+  gimple stmt;
+  gimple_stmt_iterator gsi;
+  vec<gimple> defuse_list = vNULL;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      if ( gimple_code (stmt) != GIMPLE_ASSIGN)
+	continue;
+      if (!stmt_is_root_of_bool_pattern (stmt, &rhs))
+	continue;
+      ifcvt_walk_pattern_tree (rhs, &defuse_list, stmt);
+      while (defuse_list.length () > 0)
+	{
+	  gimple def_stmt, use_stmt;
+	  use_stmt = defuse_list.pop ();
+	  def_stmt = defuse_list.pop ();
+	  ifcvt_split_def_stmt (def_stmt, use_stmt);
+	}
+    }
+}
+
+/* Delete redundant statements produced by predication which prevents
+   loop vectorization.  */
+
+static void
+ifcvt_local_dce (basic_block bb)
+{
+  gimple stmt;
+  gimple stmt1;
+  gimple phi;
+  gimple_stmt_iterator gsi;
+  vec<gimple> worklist;
+  enum gimple_code code;
+  use_operand_p use_p;
+  imm_use_iterator imm_iter;
+
+
+  worklist.create (64);
+  /* Consider all phi as live statements.  */
+  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      phi = gsi_stmt (gsi);
+      gimple_set_plf (phi, GF_PLF_2, true);
+      worklist.safe_push (phi);
+    }
+  /* Consider load/store statements, CALL and COND as live.  */
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      if (gimple_store_p (stmt)
+	  || gimple_assign_load_p (stmt))
+	{
+	  gimple_set_plf (stmt, GF_PLF_2, true);
+	  worklist.safe_push (stmt);
+	  continue;
+	}
+      code = gimple_code (stmt);
+      if (code == GIMPLE_COND || code == GIMPLE_CALL)
+	{
+	  gimple_set_plf (stmt, GF_PLF_2, true);
+	  worklist.safe_push (stmt);
+	  continue;
+	}
+      gimple_set_plf (stmt, GF_PLF_2, false);
+
+      if (code == GIMPLE_ASSIGN)
+	{
+	  tree lhs = gimple_assign_lhs (stmt);
+	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
+	    {
+	      stmt1 = USE_STMT (use_p);
+	      if (gimple_bb (stmt1) != bb)
+		{
+		  gimple_set_plf (stmt, GF_PLF_2, true);
+		  worklist.safe_push (stmt);
+		  break;
+		}
+	    }
+	}
+    }
+  /* Propagate liveness through arguments of live stmt.  */
+  while (worklist.length () > 0)
+    {
+      ssa_op_iter iter;
+      use_operand_p use_p;
+      tree use;
+
+      stmt = worklist.pop ();
+      FOR_EACH_PHI_OR_STMT_USE (use_p, stmt, iter, SSA_OP_USE)
+	{
+	  use = USE_FROM_PTR (use_p);
+	  if (TREE_CODE (use) != SSA_NAME)
+	    continue;
+	  stmt1 = SSA_NAME_DEF_STMT (use);
+	  if (gimple_bb (stmt1) != bb
+	      || gimple_plf (stmt1, GF_PLF_2))
+	    continue;
+	  gimple_set_plf (stmt1, GF_PLF_2, true);
+	  worklist.safe_push (stmt1);
+	}
+    }
+  /* Delete dead statements.  */
+  gsi = gsi_start_bb (bb);
+  while (!gsi_end_p (gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      if (gimple_plf (stmt, GF_PLF_2))
+	{
+	  gsi_next (&gsi);
+	  continue;
+	}
+      gcc_assert (gimple_code (stmt) == GIMPLE_ASSIGN);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Delete dead stmt in bb#%d\n", bb->index);
+	  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+	}
+      gsi_remove (&gsi, true);
+      release_defs (stmt);
+    }
+}
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -2145,6 +2666,20 @@ tree_if_conversion (struct loop *loop)
   ifc_bbs = NULL;
   bool any_mask_load_store = false;
 
+  /* Set-up aggressive if-conversion for loops marked with simd pragma.  */
+  aggressive_if_conv = loop->force_vectorize;
+  /* Check whether the outer loop was marked with simd pragma.  */
+  if (!aggressive_if_conv)
+    {
+      struct loop *outer_loop = loop_outer (loop);
+      if (outer_loop && outer_loop->force_vectorize)
+	aggressive_if_conv = true;
+    }
+
+  if (aggressive_if_conv)
+    if (!ifcvt_split_critical_edges (loop))
+      goto cleanup;
+
   if (!if_convertible_loop_p (loop, &any_mask_load_store)
       || !dbg_cnt (if_conversion_tree))
     goto cleanup;
@@ -2154,7 +2689,9 @@ tree_if_conversion (struct loop *loop)
 	  || loop->dont_vectorize))
     goto cleanup;
 
-  if (any_mask_load_store && !version_loop_for_if_conversion (loop))
+  if ((any_mask_load_store
+       || flag_tree_loop_if_convert != 1)
+      && !version_loop_for_if_conversion (loop))
     goto cleanup;
 
   /* Now all statements are if-convertible.  Combine all the basic
@@ -2162,6 +2699,14 @@ tree_if_conversion (struct loop *loop)
      on-the-fly.  */
   combine_blocks (loop, any_mask_load_store);
 
+  /* Delete dead predicate computations and repair the tree corresponding
+     to the bool pattern to remove multiple uses of predicates.  */
+  if (aggressive_if_conv)
+    {
+      ifcvt_local_dce (loop->header);
+      ifcvt_repair_bool_pattern (loop->header);
+    }
+
   todo |= TODO_cleanup_cfg;
   if (flag_tree_loop_if_convert_stores || any_mask_load_store)
     {
@@ -2175,7 +2720,10 @@ tree_if_conversion (struct loop *loop)
       unsigned int i;
 
       for (i = 0; i < loop->num_nodes; i++)
-	free_bb_predicate (ifc_bbs[i]);
+	{
+	  basic_block bb = ifc_bbs[i];
+	  free_bb_predicate (bb);
+	}
 
       free (ifc_bbs);
       ifc_bbs = NULL;


* Re: [PATCH 2/3] Extended if-conversion
  2014-12-22 14:49                                   ` Yuri Rumyantsev
@ 2015-01-09 12:31                                     ` Richard Biener
  2015-01-14 13:33                                       ` Yuri Rumyantsev
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Biener @ 2015-01-09 12:31 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Mon, Dec 22, 2014 at 3:39 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> I changed the algorithm for bool pattern repair.
> It turned out that the ifcvt_local_dce phase is required, since for the
> test-case I sent you in the previous mail vectorization is not performed
> without dead code elimination:
>
> For the loop
> #pragma omp simd safelen(8)
>   for (i=0; i<512; i++)
>   {
>     float t = a[i];
>     if (t > 0.0f & t < 1.0e+17f)
>       if (c[i] != 0)
>         res += 1;
>   }
>
> I've got the following message from vectorizer:
>
> t3.c:10:11: note: ==> examining statement: _ifc__39 = t_5 > 0.0;
>
> t3.c:10:11: note: bit-precision arithmetic not supported.
> t3.c:10:11: note: not vectorized: relevant stmt not supported:
> _ifc__39 = t_5 > 0.0;
>
> It is caused by the following dead predicate computations after
> critical edge splitting:
>
> (after combine blocks):
>
> <bb 3>:
> # res_15 = PHI <res_1(7), 0(19)>
> # i_16 = PHI <i_11(7), 0(19)>
> # ivtmp_14 = PHI <ivtmp_13(7), 512(19)>
> t_5 = a[i_16];
> _6 = t_5 > 0.0;
> _7 = t_5 < 9.9999998430674944e+16;
> _8 = _6 & _7;
> _10 = &c[i_16];
> _ifc__36 = _8 ? 4294967295 : 0;
> _9 = MASK_LOAD (_10, 0B, _ifc__36);
> _28 = _8;
> _29 = _9 != 0;
> _30 = _28 & _29;
> // Statements below are dead!!
> _31 = _8;
> _32 = _9 != 0;
> _33 = ~_32;
> _34 = _31 & _33;
> // End of dead statements.
> _ifc__35 = _30 ? 1 : 0;
> res_1 = res_15 + _ifc__35;
> i_11 = i_16 + 1;
> ivtmp_13 = ivtmp_14 - 1;
> if (ivtmp_13 != 0)
>   goto <bb 7>;
> else
>   goto <bb 8>;
>
> But if we delete these statements, the loop is vectorized.

Hm, ok.  We insert predicates too early obviously and not only when
needed.  But let's fix that later.
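
For reference, the mark-and-sweep idea behind ifcvt_local_dce can be
sketched in standalone C on the statement list quoted above.  This is only
a toy model under stated assumptions, not GCC code: struct toy_stmt,
toy_local_dce and toy_example_dead_count are hypothetical names, and a
statement is simply flagged as a root when it is a memory access, call or
branch, or when its value feeds a PHI or a use outside the block.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_STMTS 64

/* Hypothetical toy statement, not a GCC type.  OPS holds the indexes of
   the in-block statements that define its operands, or -1 for none.  */
struct toy_stmt
{
  const char *text;
  bool root;   /* Live on its own (assumption: load, call, branch, or a
                  value feeding a PHI or a use outside the block).  */
  int ops[2];
  bool live;
};

/* Mark everything reachable from the roots as live and return the
   number of statements left dead.  Each statement is pushed at most
   once, so the worklist never holds more than N entries.  */
static size_t
toy_local_dce (struct toy_stmt *stmts, size_t n)
{
  size_t worklist[TOY_MAX_STMTS];
  size_t top = 0, dead = 0, i;

  if (n > TOY_MAX_STMTS)
    return 0;  /* Too large for this toy: report nothing as dead.  */

  for (i = 0; i < n; i++)
    {
      stmts[i].live = stmts[i].root;
      if (stmts[i].root)
        worklist[top++] = i;
    }
  /* Propagate liveness from live statements to their operands.  */
  while (top > 0)
    {
      const struct toy_stmt *s = &stmts[worklist[--top]];
      int k;
      for (k = 0; k < 2; k++)
        {
          int d = s->ops[k];
          if (d < 0 || stmts[d].live)
            continue;
          stmts[d].live = true;
          worklist[top++] = (size_t) d;
        }
    }
  for (i = 0; i < n; i++)
    if (!stmts[i].live)
      dead++;
  return dead;
}

/* The non-PHI statements of <bb 3> from the dump, in order.  */
static size_t
toy_example_dead_count (void)
{
  struct toy_stmt bb[] = {
    { "t_5 = a[i_16]",                       true,  { -1, -1 }, false },
    { "_6 = t_5 > 0.0",                      false, {  0, -1 }, false },
    { "_7 = t_5 < 9.9999998430674944e+16",   false, {  0, -1 }, false },
    { "_8 = _6 & _7",                        false, {  1,  2 }, false },
    { "_10 = &c[i_16]",                      false, { -1, -1 }, false },
    { "_ifc__36 = _8 ? 4294967295 : 0",      false, {  3, -1 }, false },
    { "_9 = MASK_LOAD (_10, 0B, _ifc__36)",  true,  {  4,  5 }, false },
    { "_28 = _8",                            false, {  3, -1 }, false },
    { "_29 = _9 != 0",                       false, {  6, -1 }, false },
    { "_30 = _28 & _29",                     false, {  7,  8 }, false },
    { "_31 = _8",                            false, {  3, -1 }, false },
    { "_32 = _9 != 0",                       false, {  6, -1 }, false },
    { "_33 = ~_32",                          false, { 11, -1 }, false },
    { "_34 = _31 & _33",                     false, { 10, 12 }, false },
    { "_ifc__35 = _30 ? 1 : 0",              false, {  9, -1 }, false },
    { "res_1 = res_15 + _ifc__35",           true,  { 14, -1 }, false },
    { "i_11 = i_16 + 1",                     true,  { -1, -1 }, false },
    { "ivtmp_13 = ivtmp_14 - 1",             true,  { -1, -1 }, false },
    { "if (ivtmp_13 != 0)",                  true,  { 17, -1 }, false },
  };
  return toy_local_dce (bb, sizeof bb / sizeof bb[0]);
}
```

In this model only _31, _32, _33 and _34 stay unmarked, which matches the
statements flagged as dead in the dump.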

> New patch is attached.

 fold_build_cond_expr (tree type, tree cond, tree rhs, tree lhs)
 {
   tree rhs1, lhs1, cond_expr;
+
+  /* If COND is comparison r != 0 and r has boolean type, convert COND
+     to SSA_NAME to accept by vect bool pattern.  */
+  if (TREE_CODE (cond) == NE_EXPR)
+    {
+      tree op0 = TREE_OPERAND (cond, 0);
+      tree op1 = TREE_OPERAND (cond, 1);
+      if (TREE_CODE (op0) == SSA_NAME
+         && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE
+         && (integer_zerop (op1)))
+       cond = op0;
+      else if (TREE_CODE (op1) == SSA_NAME
+              && TREE_CODE (TREE_TYPE (op1)) == BOOLEAN_TYPE
+              && (integer_zerop (op0)))
+       cond = op1;

The 2nd form, 0 != SSA_NAME doesn't happen due to operand
canonicalization.  Please remove its handling.

+      if (gimple_phi_num_args (phi) != 2)
+       {
+         if (!aggressive_if_conv)

&& !aggressive_if_conv

+  if (EDGE_COUNT (bb->preds) > 2)
+    {
+      if (!aggressive_if_conv)

Likewise.

-      gimple reduc;
+         && (rhs = gimple_phi_arg_def (phi, 0)))) {

the { goes to the next line

 static void
 predicate_mem_writes (loop_p loop)
 {
-  unsigned int i, orig_loop_num_nodes = loop->num_nodes;
+  unsigned int i, j, orig_loop_num_nodes = loop->num_nodes;
+  tree mask_vec[10];

an upper limit of 10?

+      for (j=0; j<10; j++)

spaces around '<' and '='

+       mask_vec[j] = NULL_TREE;
+

+           gcc_assert (exact_log2 (bitsize) != -1);
+           if ((mask = mask_vec[exact_log2 (bitsize)]) == NULL_TREE)
+             {

this seems to be a completely separate "optimization"?  Note that
there are targets with non-power-of-two bitsize modes (PSImode),
so the assert will likely trigger.  I would prefer if you separate this
part of the patch.
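
To make the concern concrete, here is a minimal standalone sketch in plain
C (not GCC code) of indexing a 10-entry cache by the base-2 logarithm of
the bitsize.  exact_log2_sketch is a hypothetical stand-in that mimics the
documented behaviour of exact_log2 (the log2 for powers of two, -1
otherwise); mask_cache_slot is likewise a made-up helper, and the sizes
used below are chosen purely for illustration.

```c
/* Hypothetical stand-in for GCC's exact_log2: return log2 of X when X
   is a power of two, otherwise -1.  */
static int
exact_log2_sketch (unsigned long x)
{
  int n = 0;

  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  while ((x >>= 1) != 0)
    n++;
  return n;
}

/* Return the slot of a 10-entry mask cache that BITSIZE maps to, or -1
   when BITSIZE is not a power of two or the slot would be out of
   range.  The caller is expected to fall back to building the mask
   without the cache in that case instead of asserting.  */
static int
mask_cache_slot (unsigned long bitsize)
{
  int slot = exact_log2_sketch (bitsize);

  return (slot >= 0 && slot < 10) ? slot : -1;
}
```

With these helpers a 32-bit size maps to slot 5, while a 24-bit size or a
1024-bit size yields -1, so neither the assert nor the fixed bound of 10
is safe to rely on unconditionally.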

+      if ( gimple_code (stmt) != GIMPLE_ASSIGN)
+       continue;

no space before gimple_code

+  imm_use_iterator imm_iter;
+
+
+  worklist.create (64);

excessive vertical space.

The patch misses the addition of new testcases - please add some,
otherwise the code will be totally untested.

I assume the patch passes bootstrap and regtest (you didn't say so).
Can you also do a bootstrap with aggressive_if_conv forced to
true and --with-build-config=bootstrap-O3 --disable-werror?

Thanks,
Richard.

> Thanks.
> Yuri.
>
> 2014-12-19 14:45 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Thu, Dec 18, 2014 at 2:45 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> I am sending you the full patch (~1000 lines), but if you need only
>>> patch.1 and patch.2, let me know and I'll send you a reduced patch.
>>>
>>> Below are a few comments regarding your remarks for patch.3.
>>>
>>> 1. I deleted sub-phase ifcvt_local_dce since I did not find test-case
>>> when dead code elimination is required to vectorize loop, i.e. dead
>>> statement is marked as relevant.
>>> 2. You wrote:
>>>> The "retry" code also looks odd - why do you walk the BB multiple
>>>> times instead of just doing sth like
>>>>
>>>>  while (!has_single_use (lhs))
>>>>    {
>>>>      gimple copy = ifcvt_split_def_stmt (def_stmt);
>>>>      ifcvt_walk_pattern_tree (copy);
>>>>    }
>>>>
>>>> thus returning the copy you create and re-process it (the copy should
>>>> now have a single-use).
>>>
>>> The problem is that not only top SSA_NAME (lhs) may have multiple uses
>>> but some intermediate variables too. For example, for the following
>>> test-case
>>>
>>> float a[1000];
>>> int c[1000];
>>>
>>> int foo()
>>> {
>>>   int i, res = 0;
>>> #pragma omp simd safelen(8)
>>>   for (i=0; i<512; i++)
>>>   {
>>>     float t = a[i];
>>>     if (t > 0.0f & t < 1.0e+17f)
>>>       if (c[i] != 0)
>>>         res += 1;
>>>   }
>>>   return res;
>>> }
>>>
>>> After combine_blocks we have the following bb:
>>>
>>> <bb 3>:
>>> # res_15 = PHI <res_1(7), 0(15)>
>>> # i_16 = PHI <i_11(7), 0(15)>
>>> # ivtmp_14 = PHI <ivtmp_13(7), 512(15)>
>>> t_5 = a[i_16];
>>> _6 = t_5 > 0.0;
>>> _7 = t_5 < 9.9999998430674944e+16;
>>> _8 = _6 & _7;
>>> _10 = &c[i_16];
>>> _ifc__32 = _8 ? 4294967295 : 0;
>>> _9 = MASK_LOAD (_10, 0B, _ifc__32);
>>> _28 = _8;
>>> _29 = _9 != 0;
>>> _30 = _28 & _29;
>>> _ifc__31 = _30 ? 1 : 0;
>>> res_1 = res_15 + _ifc__31;
>>> i_11 = i_16 + 1;
>>> ivtmp_13 = ivtmp_14 - 1;
>>> if (ivtmp_13 != 0)
>>>   goto <bb 7>;
>>> else
>>>   goto <bb 8>;
>>>
>>> and we can see that _8 has multiple uses. Also note that after splitting of
>>> _8 = _6 & _7
>>> we also get multiple uses for definition of  _6 and _7. So I used this
>>> iterative algorithm as the simplest one.
>>
>> But it walks the entire pattern again and again while you only need to
>> ensure you walk the pattern tree of the now single-use DEF again
>> (in fact, rather than replacing a random USE in ifcvt_split_def_stmt
>> you should pass down the use_operand_p that you need to make
>> single-use).
>>
>>> I think it would be nice to re-use some utility from tree-vect-patterns.c
>>> for stmt_is_root_of_bool_pattern.
>>>
>>> I assume that function stmt_is_root_of_bool_pattern can be simplified
>>> to check on COND_EXPR only since PHI predication and memory access
>>> predication produce only such statements, i.e. it can look like
>>>
>>> static bool
>>> stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
>>> {
>>>   enum tree_code code;
>>>   tree lhs, rhs;
>>>
>>>   code = gimple_assign_rhs_code (stmt);
>>>   if (code == COND_EXPR)
>>>     {
>>>       rhs = gimple_assign_rhs1 (stmt);
>>>       if (TREE_CODE (rhs) != SSA_NAME)
>>>         return false;
>>>       *var = rhs;
>>>       return true;
>>>     }
>>>   return false;
>>> }
>>>
>>> I also made a few minor changes in patch.2.
>>>
>>> 3. You can also notice that I inserted code in tree_if_conversion to
>>> do loop versioning if the explicit option "-ftree-loop-if-convert" was
>>> not passed to the compiler, i.e. we perform if-conversion for loop
>>> vectorization only and if it does not take place, we should delete
>>> the if-converted version of the loop.
>>> What is your opinion?
>>
>> Overall part 1 and part 2 look good to me, predicate_scalar_phi
>> looks in need of some refactoring to avoid duplicate code.  We can
>> do that as a followup.
>>
>> Part 3 still needs the iteration to be resolved and make the use we
>> actually care about single-use, not a random one so we can avoid
>> iterating completely.
>>
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Hi Richard,
>>>>>
>>>>> Here is updated patch which includes
>>>>> (1) split critical edges for aggressive if conversion.
>>>>> (2) delete all stuff related to support of critical edge predication.
>>>>> (3) only one function - predicate_scalar_phi performs predication.
>>>>> (4) function find_phi_replacement_condition was deleted since it was
>>>>> included in predicate_scalar_phi for phi with two arguments.
>>>>>
>>>>> I checked that patch works in stress testing mode, i.e. with
>>>>> aggressive if conversion by default.
>>>>>
>>>>> What is your opinion?
>>>>
>>>> Looks ok overall, but please simply do
>>>>
>>>>   FOR_EACH_EDGE (e, ei, bb->succs)
>>>>     if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
>>>>       split_edge (e);
>>>>
>>>> for all blocks apart from the latch.
>>>>
>>>> Can you please send a combined patch up to this one?  Looking at
>>>> the incremental diff is somewhat hard.  Thus a patch including all
>>>> patches from patch1 to this one.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>>
>>>>> Thanks.
>>>>> Yuri.
>>>>>
>>>>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> Thanks for your reply!
>>>>>>>
>>>>>>> I didn't understand your point:
>>>>>>>
>>>>>>> Well, I don't mind splitting all critical edges unconditionally
>>>>>>>
>>>>>>> but you do it unconditionally in proposed patch.
>>>>>>
>>>>>> I don't mind means I am fine with it.
>>>>>>
>>>>>>> Also I assume that
>>>>>>> call of split_critical_edges() can break ssa. For example, we can
>>>>>>> split headers of loops, loop exit blocks etc.
>>>>>>
>>>>>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>>>>>> be surprised if so but that may be possible.
>>>>>>
>>>>>>> I prefer to do something
>>>>>>> more loop-specialized, e.g. call split_edge() for critical edges
>>>>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>>>>>> destination bb belongs to loop).
>>>>>>
>>>>>> That works for me as well but it is more complicated to implement.
>>>>>> Ideally you'd only split one edge if you find a block with only critical
>>>>>> predecessors (where we'd currently give up).  But note that this
>>>>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>>>>>> will change loop->num_nodes so we have to be more careful in
>>>>>> constructing the loop calling if_convertible_bb_p.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>>
>>>>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> Sorry that I forgot to delete the debug dump from my fix.
>>>>>>>>> I have a few questions about your comments.
>>>>>>>>>
>>>>>>>>> 1. You wrote :
>>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>>> path
>>>>>>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>>>>>>> predicate_extended_scalar_phi into one function?
>>>>>>>>> Please note that if the additional flag is not set (i.e.
>>>>>>>>> aggressive_if_conv is false), extended predication requires more
>>>>>>>>> compile time since it builds a hash_map.
>>>>>>>>
>>>>>>>> Its compile-time complexity is reasonable enough even for
>>>>>>>> non-aggressive if-conversion.
>>>>>>>>
>>>>>>>>> 2. About critical edge splitting.
>>>>>>>>>
>>>>>>>>> Did you mean that (1) we should perform it under the aggressive_if_conv
>>>>>>>>> option only, and (2) we should split all critical edges?
>>>>>>>>> Note that this leads to recomputation of the topological order.
>>>>>>>>
>>>>>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>>>>>> do something like
>>>>>>>>
>>>>>>>> Index: gcc/tree-if-conv.c
>>>>>>>> ===================================================================
>>>>>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>>>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>>>>>    if (number_of_loops (fun) <= 1)
>>>>>>>>      return 0;
>>>>>>>>
>>>>>>>> +  bool critical_edges_split_p = false;
>>>>>>>>    FOR_EACH_LOOP (loop, 0)
>>>>>>>>      if (flag_tree_loop_if_convert == 1
>>>>>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>>>>>             && !loop->dont_vectorize))
>>>>>>>> -      todo |= tree_if_conversion (loop);
>>>>>>>> +      {
>>>>>>>> +       if (!critical_edges_split_p)
>>>>>>>> +         {
>>>>>>>> +           split_critical_edges ();
>>>>>>>> +           critical_edges_split_p = true;
>>>>>>>> +           todo |= TODO_cleanup_cfg;
>>>>>>>> +         }
>>>>>>>> +       todo |= tree_if_conversion (loop);
>>>>>>>> +      }
>>>>>>>>
>>>>>>>>  #ifdef ENABLE_CHECKING
>>>>>>>>    {
>>>>>>>>
>>>>>>>>> It is worth noting that in the current implementation, bbs with 2
>>>>>>>>> predecessors that are both on critical edges are accepted without
>>>>>>>>> an additional option.
>>>>>>>>
>>>>>>>> Yes, I know.
>>>>>>>>
>>>>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>>>>>> to it and even fix the critical edge missed optimization with splitting
>>>>>>>> critical edges then I am all for that solution.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> Thanks ahead.
>>>>>>>>> Yuri.
>>>>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> Here is updated patch2 with the following changes:
>>>>>>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>>>>>>> 2. Use only one function for extended predication -
>>>>>>>>>>> predicate_extended_scalar_phi.
>>>>>>>>>>> 3. Save gsi before insertion of predicate computations for a basic
>>>>>>>>>>> block if it has 2 predecessors and
>>>>>>>>>>> both incoming edges are critical, or it has more than 2 predecessors
>>>>>>>>>>> and at least one incoming edge
>>>>>>>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>>>>>>>
>>>>>>>>>>> Here is a motivating test-case which explains this point.
>>>>>>>>>>> The test-case is attached (t5.c) and it must be compiled with the -O2
>>>>>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>>>>>> The problem phi is in bb_7:
>>>>>>>>>>>
>>>>>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>>>>>   {
>>>>>>>>>>>     <bb 5>:
>>>>>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>>     else
>>>>>>>>>>>       goto <bb 9>;
>>>>>>>>>>>
>>>>>>>>>>>   }
>>>>>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>>>>>   {
>>>>>>>>>>>     <bb 6>:
>>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>>     else
>>>>>>>>>>>       goto <bb 8>;
>>>>>>>>>>>
>>>>>>>>>>>   }
>>>>>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>>>>>   {
>>>>>>>>>>>     <bb 7>:
>>>>>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>     goto <bb 11>;
>>>>>>>>>>>
>>>>>>>>>>>   }
>>>>>>>>>>>
>>>>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>>>>>>> #if 0
>>>>>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>>>>>  else
>>>>>>>>>>> #endif
>>>>>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>>>>>
>>>>>>>>>>> we will get ICE:
>>>>>>>>>>> t5.c: In function 'foo':
>>>>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>>>>>  void foo (int n)
>>>>>>>>>>>       ^
>>>>>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>>>>>> _52 = _1 & _3;
>>>>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>>>>>
>>>>>>>>>>> since predicate computations were inserted in bb_7.
>>>>>>>>>>
>>>>>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>>>>>> by insert_gimplified_predicates.
>>>>>>>>>>
>>>>>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>>>>>> but push those to e->dest which makes this really messy.
>>>>>>>>>>
>>>>>>>>>> Rather than having a separate phase where we insert all
>>>>>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>>>>>> predicating a PHI.
>>>>>>>>>>
>>>>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>>>>>> the printfs properly.
>>>>>>>>>>
>>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>>> paths.
>>>>>>>>>>
>>>>>>>>>> I'm not at all happy with this code.  It may be the existing if-conv
>>>>>>>>>> code's fault but making it even worse is not an option.
>>>>>>>>>>
>>>>>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>>>>>> commit edge insertions before merging the blocks.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> ChangeLog is
>>>>>>>>>>>
>>>>>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>>>>>> statement iterator.
>>>>>>>>>>> (bb_insert_point): New function.
>>>>>>>>>>> (set_bb_insert_point): New function.
>>>>>>>>>>> (has_pred_critical_p): New function.
>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>>>>>> Change check that block containing reduction statement candidate
>>>>>>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>>>>>> is_cond_scalar_reduction.
>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>>>>>>> than 2 predecessors and at least one incoming edge is critical.
>>>>>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>>>>>>> to save gsi before insertion of predicate computations. Set it to
>>>>>>>>>>> true for BB with 2 predecessors and critical incoming edges, or if the
>>>>>>>>>>> number of predecessors is greater than 2 and at least one incoming edge is
>>>>>>>>>>> critical.
>>>>>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>>>>>> Insert predicate computation of BB just after label if
>>>>>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>>>>>
>>>>>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>>>>>
>>>>>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>>>>>   tree predicate;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>>>>>
>>>>>>>>>>>>> and this iterator is saved in insert_gimplified_predicates before
>>>>>>>>>>>>> inserting the code for predicate computation. I checked that this fix
>>>>>>>>>>>>> works.
>>>>>>>>>>>>
>>>>>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>>>>>> after the PHI we predicate.
>>>>>>>>>>>>
>>>>>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>>>>>
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>>>>>> 2. What is your opinion about using a simpler decision for the
>>>>>>>>>>>>>>> insertion point: if the bb has a use of the phi result, insert phi
>>>>>>>>>>>>>>> predication before it, and at the bb end otherwise? I assume that
>>>>>>>>>>>>>>> critical edge splitting is not a good decision.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am resending patch1 and patch2 with minor changes:
>>>>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>>>>>>> I am also very sorry that I sent you a bad patch.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Now let me answer your questions related to the second patch.
>>>>>>>>>>>>>>>>> 1. Why do we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> we can see the following phi node corresponding to res:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurrences first
>>>>>>>>>>>>>>>> so your case gets just the condition for edge 5 and its inversion
>>>>>>>>>>>>>>>> used for edges 3 and 4 combined.
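The grouping idea described above can be pictured with a small sketch in plain C rather than GIMPLE. Everything named here (`struct phi_arg`, `select_phi_value`) is hypothetical and is not code from tree-if-conv.c; it assumes the edge predicates are already evaluated and that exactly one of them is true on any given iteration. The caller is also assumed to have ordered the arguments so that the value meant to be the 'else' case comes last.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one PHI argument: the value flowing in along an
   incoming edge and the already-evaluated predicate of that edge.  */
struct phi_arg
{
  int value;
  bool edge_pred;
};

/* Pick the PHI result the way a chain of conditional selects would:
   OR together the predicates of all edges that carry the same value,
   test the groups in order, and treat the last value as the 'else'
   case so that its predicate never has to be computed.  */
int
select_phi_value (const struct phi_arg *args, int n)
{
  assert (n > 0);
  int else_value = args[n - 1].value;

  for (int i = 0; i < n; i++)
    {
      int v = args[i].value;
      bool first_occurrence = true;
      bool cond = false;

      if (v == else_value)
        continue;               /* Covered by the final 'else'.  */
      for (int j = 0; j < i; j++)
        if (args[j].value == v)
          first_occurrence = false;
      if (!first_occurrence)
        continue;               /* This group was already tested.  */
      for (int j = i; j < n; j++)
        if (args[j].value == v && args[j].edge_pred)
          cond = true;          /* Group condition: OR of edge predicates.  */
      if (cond)
        return v;
    }
  return else_value;
}
```

For PHI <res_15(3), res_15(4), res_10(5)> this tests only (pred 3 || pred 4) and falls through to res_10 otherwise, so no predicate for edge 5 is needed.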
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr (t5.c is
>>>>>>>>>>>>>>>>> attached). We can see that bb_7 and bb_9, which contain phi nodes, have
>>>>>>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments, the
>>>>>>>>>>>>>>>>> insertion point will be "after labels". Note also that the phi result can
>>>>>>>>>>>>>>>>> have a use in this block too, so we can't put predication code at the
>>>>>>>>>>>>>>>>> block end.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>>>>>> for critical edges unless you split them).
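To make the term concrete: an edge is critical when its source has more than one successor and its destination has more than one predecessor, so there is no block that belongs to that edge alone. The sketch below shows detection and splitting on a toy edge-list CFG. `struct toy_cfg` and both functions are hypothetical illustrations and do not use GCC's basic_block/edge structures or its split_critical_edges routine.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 32
#define MAX_EDGES 64

/* Toy CFG stored as a flat edge list (illustrative only).  */
struct toy_cfg
{
  int n_blocks;
  int n_edges;
  int src[MAX_EDGES];
  int dest[MAX_EDGES];
};

/* An edge is critical if its source has several successors and its
   destination has several predecessors.  */
bool
edge_critical_p (const struct toy_cfg *g, int e)
{
  int succs = 0, preds = 0;
  for (int i = 0; i < g->n_edges; i++)
    {
      if (g->src[i] == g->src[e])
        succs++;
      if (g->dest[i] == g->dest[e])
        preds++;
    }
  return succs > 1 && preds > 1;
}

/* Route every critical edge through a fresh empty block and return the
   number of edges that were split.  Splitting one edge does not change
   the successor or predecessor counts seen by the other original edges,
   so a single pass over the original edges is enough.  */
int
split_all_critical_edges (struct toy_cfg *g)
{
  int n_split = 0;
  int n_orig = g->n_edges;

  for (int e = 0; e < n_orig; e++)
    if (edge_critical_p (g, e))
      {
        assert (g->n_blocks < MAX_BLOCKS && g->n_edges < MAX_EDGES);
        int new_bb = g->n_blocks++;
        int old_dest = g->dest[e];
        g->dest[e] = new_bb;              /* src -> new_bb  */
        g->src[g->n_edges] = new_bb;      /* new_bb -> old_dest  */
        g->dest[g->n_edges] = old_dest;
        g->n_edges++;
        n_split++;
      }
  return n_split;
}
```

With the edges from the t5.c dump (4->5, 4->6, 5->7, 5->9, 6->7, 6->8, 7->11), the two edges into bb 7 are the critical ones; after splitting, each has its own block where predicate code for that edge could live.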
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.

* Re: [PATCH 2/3] Extended if-conversion
  2015-01-09 12:31                                     ` Richard Biener
@ 2015-01-14 13:33                                       ` Yuri Rumyantsev
  2015-01-14 15:00                                         ` Richard Biener
  0 siblings, 1 reply; 22+ messages in thread
From: Yuri Rumyantsev @ 2015-01-14 13:33 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 38147 bytes --]

Richard,

I made all the changes you proposed and added a couple of tests.
Bootstrap, including the aggressive one you proposed, and regression
testing did not show any new failures.

Is it OK for trunk?

ChangeLog:

2015-01-14  Yuri Rumyantsev  <ysrumyan@gmail.com>

* tree-if-conv.c: Include hash-map.h.
(aggressive_if_conv): New variable.
(fold_build_cond_expr): Add simplification of non-zero condition.
(add_to_dst_predicate_list): Invoke add_to_predicate_list if edge
destination block is not always executed.
(if_convertible_phi_p): Fix commentary, allow phi nodes have more
than two predecessors if AGGRESSIVE_IF_CONV is true.
(if_convertible_stmt_p): Fix commentary.
(all_preds_critical_p): New function.
(has_pred_critical_p): New function.
(if_convertible_bb_p): Fix commentary, if AGGRESSIVE_IF_CONV is true
BB can have more than two predecessors and all incoming edges can be
critical.
(predicate_bbs): Skip predication for loop exit block, use build2_loc
to compute predicate for true edge.
(find_phi_replacement_condition): Delete this function.
(is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
Allow interchange PHI arguments if EXTENDED is false.
Change check that block containing reduction statement candidate
is predecessor of phi-block since phi may have more than two arguments.
(phi_args_hash_traits): New helper structure.
(struct phi_args_hash_traits): New type.
(phi_args_hash_traits::hash): New function.
(phi_args_hash_traits::equal_keys): New function.
(gen_phi_arg_condition): New function.
(predicate_scalar_phi): Add handling of phi nodes with more than two
arguments, delete COND and TRUE_BB arguments, insert body of
find_phi_replacement_condition to predicate ordinary phi nodes.
(predicate_all_scalar_phis): Skip blocks with the only predecessor,
delete call of find_phi_replacement_condition and invoke
predicate_scalar_phi with two arguments.
(insert_gimplified_predicates): Add assert that non-predicated block
don't have statements to insert.
(ifcvt_split_critical_edges): New function.
(ifcvt_split_def_stmt): Likewise.
(ifcvt_walk_pattern_tree): Likewise.
(stmt_is_root_of_bool_pattern): Likewise.
(ifcvt_repair_bool_pattern): Likewise.
(ifcvt_local_dce): Likewise.
(tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
is copy of inner or outer loop force_vectorize field, invoke
ifcvt_split_critical_edges, ifcvt_local_dce and
ifcvt_repair_bool_pattern for aggressive if-conversion.

gcc/testsuite/ChangeLog

* gcc.dg/vect/vect-aggressive-1.c: New test.
* gcc.target/i386/avx2-vect-aggressive.c: Likewise.

2015-01-09 15:27 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Mon, Dec 22, 2014 at 3:39 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Richard,
>>
>> I changed the algorithm for bool pattern repair.
>> It turned out that the ifcvt_local_dce phase is required, since for the
>> test-case I sent you in the previous mail vectorization is not performed
>> without dead code elimination:
>>
>> For the loop
>> #pragma omp simd safelen(8)
>>   for (i=0; i<512; i++)
>>   {
>>     float t = a[i];
>>     if (t > 0.0f & t < 1.0e+17f)
>>       if (c[i] != 0)
>>         res += 1;
>>   }
>>
>> I've got the following message from vectorizer:
>>
>> t3.c:10:11: note: ==> examining statement: _ifc__39 = t_5 > 0.0;
>>
>> t3.c:10:11: note: bit-precision arithmetic not supported.
>> t3.c:10:11: note: not vectorized: relevant stmt not supported:
>> _ifc__39 = t_5 > 0.0;
>>
>> It is caused by the following dead predicate computations after
>> critical edge splitting:
>>
>> (after combine blocks):
>>
>> <bb 3>:
>> # res_15 = PHI <res_1(7), 0(19)>
>> # i_16 = PHI <i_11(7), 0(19)>
>> # ivtmp_14 = PHI <ivtmp_13(7), 512(19)>
>> t_5 = a[i_16];
>> _6 = t_5 > 0.0;
>> _7 = t_5 < 9.9999998430674944e+16;
>> _8 = _6 & _7;
>> _10 = &c[i_16];
>> _ifc__36 = _8 ? 4294967295 : 0;
>> _9 = MASK_LOAD (_10, 0B, _ifc__36);
>> _28 = _8;
>> _29 = _9 != 0;
>> _30 = _28 & _29;
>> // Statements below are dead!!
>> _31 = _8;
>> _32 = _9 != 0;
>> _33 = ~_32;
>> _34 = _31 & _33;
>> // End of dead statements.
>> _ifc__35 = _30 ? 1 : 0;
>> res_1 = res_15 + _ifc__35;
>> i_11 = i_16 + 1;
>> ivtmp_13 = ivtmp_14 - 1;
>> if (ivtmp_13 != 0)
>>   goto <bb 7>;
>> else
>>   goto <bb 8>;
>>
>> But if we delete these statements loop will be vectorized.
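A minimal sketch of the kind of block-local dead code elimination discussed above, assuming a straight-line block in which each statement defines at most one name and uses at most two. `struct toy_stmt` and `mark_live_stmts` are hypothetical and are not the ifcvt_local_dce from the patch; statements whose results are needed outside the block are simply flagged as roots by the caller.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OPERAND (-1)
#define MAX_STMTS 64

/* Toy straight-line statement (illustrative only).  */
struct toy_stmt
{
  int def;      /* Name defined by the statement, or NO_OPERAND.  */
  int use[2];   /* Names used by the statement, or NO_OPERAND.  */
  bool root;    /* Result is needed outside the block.  */
};

/* Mark statements live by walking backwards from the roots with a
   worklist.  Fills LIVE[0..N-1] and returns the number of dead
   statements.  */
int
mark_live_stmts (const struct toy_stmt *stmts, int n, bool *live)
{
  int worklist[MAX_STMTS];
  int top = 0, n_live = 0;

  assert (n <= MAX_STMTS);
  for (int i = 0; i < n; i++)
    {
      live[i] = stmts[i].root;
      if (live[i])
        worklist[top++] = i;
    }

  while (top > 0)
    {
      int s = worklist[--top];
      n_live++;
      for (int k = 0; k < 2; k++)
        {
          int name = stmts[s].use[k];
          if (name == NO_OPERAND)
            continue;
          /* The defining statement of a used name becomes live.  Each
             statement is pushed at most once, so the worklist cannot
             overflow.  */
          for (int j = 0; j < n; j++)
            if (stmts[j].def == name && !live[j])
              {
                live[j] = true;
                worklist[top++] = j;
              }
        }
    }
  return n - n_live;
}
```

Applied to the statements _28 through res_1 in the dump above, with res_1 as the only root, the four statements defining _31, _32, _33 and _34 are the ones left unmarked.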
>
> Hm, ok.  We insert predicates too early obviously and not only when
> needed.  But let's fix that later.
>
>> New patch is attached.
>
>  fold_build_cond_expr (tree type, tree cond, tree rhs, tree lhs)
>  {
>    tree rhs1, lhs1, cond_expr;
> +
> +  /* If COND is comparison r != 0 and r has boolean type, convert COND
> +     to SSA_NAME to accept by vect bool pattern.  */
> +  if (TREE_CODE (cond) == NE_EXPR)
> +    {
> +      tree op0 = TREE_OPERAND (cond, 0);
> +      tree op1 = TREE_OPERAND (cond, 1);
> +      if (TREE_CODE (op0) == SSA_NAME
> +         && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE
> +         && (integer_zerop (op1)))
> +       cond = op0;
> +      else if (TREE_CODE (op1) == SSA_NAME
> +              && TREE_CODE (TREE_TYPE (op1)) == BOOLEAN_TYPE
> +              && (integer_zerop (op0)))
> +       cond = op1;
>
> The 2nd form, 0 != SSA_NAME doesn't happen due to operand
> canonicalization.  Please remove its handling.
>
> +      if (gimple_phi_num_args (phi) != 2)
> +       {
> +         if (!aggressive_if_conv)
>
> && !aggressive_if_conv
>
> +  if (EDGE_COUNT (bb->preds) > 2)
> +    {
> +      if (!aggressive_if_conv)
>
> Likewise.
>
> -      gimple reduc;
> +         && (rhs = gimple_phi_arg_def (phi, 0)))) {
>
> the { goes to the next line
>
>  static void
>  predicate_mem_writes (loop_p loop)
>  {
> -  unsigned int i, orig_loop_num_nodes = loop->num_nodes;
> +  unsigned int i, j, orig_loop_num_nodes = loop->num_nodes;
> +  tree mask_vec[10];
>
> an upper limit of 10?
>
> +      for (j=0; j<10; j++)
>
> spaces around '<' and '='
>
> +       mask_vec[j] = NULL_TREE;
> +
>
> +           gcc_assert (exact_log2 (bitsize) != -1);
> +           if ((mask = mask_vec[exact_log2 (bitsize)]) == NULL_TREE)
> +             {
>
> this seems to be a completely separate "optimization"?  Note that
> there are targets with non-power-of-two bitsize modes (PSImode),
> so the assert will likely trigger.  I would prefer if you separate this
> part of the patch.
>
> +      if ( gimple_code (stmt) != GIMPLE_ASSIGN)
> +       continue;
>
> no space before gimple_code
>
> +  imm_use_iterator imm_iter;
> +
> +
> +  worklist.create (64);
>
> excessive vertical space.
>
> The patch misses the addition of new testcases - please add some,
> otherwise the code will be totally untested.
>
> I assume the patch passes bootstrap and regtest (you didn't say so).
> Can you also do a bootstrap with aggressive_if_conv forced to
> true and --with-build-config=bootstrap-O3 --disable-werror?
>
> Thanks,
> Richard.
>
>> Thanks.
>> Yuri.
>>
>> 2014-12-19 14:45 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Thu, Dec 18, 2014 at 2:45 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> I am sending you the full patch (~1000 lines), but if you need only patch.1
>>>> and patch.2, let me know and I'll send you a reduced patch.
>>>>
>>>> Below are few comments regarding your remarks for patch.3.
>>>>
>>>> 1. I deleted sub-phase ifcvt_local_dce since I did not find test-case
>>>> when dead code elimination is required to vectorize loop, i.e. dead
>>>> statement is marked as relevant.
>>>> 2. You wrote:
>>>>> The "retry" code also looks odd - why do you walk the BB multiple
>>>>> times instead of just doing sth like
>>>>>
>>>>>  while (!has_single_use (lhs))
>>>>>    {
>>>>>      gimple copy = ifcvt_split_def_stmt (def_stmt);
>>>>>      ifcvt_walk_pattern_tree (copy);
>>>>>    }
>>>>>
>>>>> thus returning the copy you create and re-process it (the copy should
>>>>> now have a single-use).
>>>>
>>>> The problem is that not only top SSA_NAME (lhs) may have multiple uses
>>>> but some intermediate variables too. For example, for the following
>>>> test-case
>>>>
>>>> float a[1000];
>>>> int c[1000];
>>>>
>>>> int foo()
>>>> {
>>>>   int i, res = 0;
>>>> #pragma omp simd safelen(8)
>>>>   for (i=0; i<512; i++)
>>>>   {
>>>>     float t = a[i];
>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>       if (c[i] != 0)
>>>>         res += 1;
>>>>   }
>>>>   return res;
>>>> }
>>>>
>>>> After combine_blocks we have the following bb:
>>>>
>>>> <bb 3>:
>>>> # res_15 = PHI <res_1(7), 0(15)>
>>>> # i_16 = PHI <i_11(7), 0(15)>
>>>> # ivtmp_14 = PHI <ivtmp_13(7), 512(15)>
>>>> t_5 = a[i_16];
>>>> _6 = t_5 > 0.0;
>>>> _7 = t_5 < 9.9999998430674944e+16;
>>>> _8 = _6 & _7;
>>>> _10 = &c[i_16];
>>>> _ifc__32 = _8 ? 4294967295 : 0;
>>>> _9 = MASK_LOAD (_10, 0B, _ifc__32);
>>>> _28 = _8;
>>>> _29 = _9 != 0;
>>>> _30 = _28 & _29;
>>>> _ifc__31 = _30 ? 1 : 0;
>>>> res_1 = res_15 + _ifc__31;
>>>> i_11 = i_16 + 1;
>>>> ivtmp_13 = ivtmp_14 - 1;
>>>> if (ivtmp_13 != 0)
>>>>   goto <bb 7>;
>>>> else
>>>>   goto <bb 8>;
>>>>
>>>> and we can see that _8 has multiple uses. Also note that after splitting of
>>>> _8 = _6 & _7
>>>> we also get multiple uses for definition of  _6 and _7. So I used this
>>>> iterative algorithm as the simplest one.
>>>
>>> But it walks the entire pattern again and again while you only need to
>>> ensure you walk the pattern tree of the now single-use DEF again
>>> (in fact, rather than replacing a random USE in ifcvt_split_def_stmt
>>> you should pass down the user_operand_p that you need to make
>>> single-use).
>>>
>>>> I think it would be nice to re-use some utility from tree-vect-patterns.c
>>>> for stmt_is_root_of_bool_pattern.
>>>>
>>>> I assume that function stmt_is_root_of_bool_pattern can be simplified
>>>> to check for COND_EXPR only, since PHI predication and memory access
>>>> predication produce only such statements, i.e. it can look like
>>>>
>>>> static bool
>>>> stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
>>>> {
>>>>   enum tree_code code;
>>>>   tree lhs, rhs;
>>>>
>>>>   code = gimple_assign_rhs_code (stmt);
>>>>   if (code == COND_EXPR)
>>>>     {
>>>>       rhs = gimple_assign_rhs1 (stmt);
>>>>       if (TREE_CODE (rhs) != SSA_NAME)
>>>>         return false;
>>>>       *var = rhs;
>>>>       return true;
>>>>     }
>>>>   return false;
>>>> }
>>>>
>>>> I also made a few minor changes in patch.2.
>>>>
>>>> 3. You can also notice that I inserted code in tree_if_conversion to
>>>> do loop versioning if the explicit option "-ftree-loop-if-convert" was
>>>> not passed to the compiler, i.e. we perform if-conversion for loop
>>>> vectorization only, and if vectorization does not take place we should
>>>> delete the if-converted version of the loop.
>>>> What is your opinion?
>>>
>>> Overall part 1 and part 2 look good to me, predicate_scalar_phi
>>> looks in need of some refactoring to avoid duplicate code.  We can
>>> do that as a followup.
>>>
>>> Part 3 still needs the iteration to be resolved and make the use we
>>> actually care about single-use, not a random one so we can avoid
>>> iterating completely.
>>>
>>> Richard.
>>>
>>>> Thanks.
>>>> Yuri.
>>>>
>>>> 2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Hi Richard,
>>>>>>
>>>>>> Here is updated patch which includes
>>>>>> (1) split critical edges for aggressive if conversion.
>>>>>> (2) delete all stuff related to support of critical edge predication.
>>>>>> (3) only one function - predicate_scalar_phi performs predication.
>>>>>> (4) function find_phi_replacement_condition was deleted since it was
>>>>>> included in predicate_scalar_phi for phi with two arguments.
>>>>>>
>>>>>> I checked that patch works in stress testing mode, i.e. with
>>>>>> aggressive if conversion by default.
>>>>>>
>>>>>> What is your opinion?
>>>>>
>>>>> Looks ok overall, but please simply do
>>>>>
>>>>>   FOR_EACH_EDGE (e, ei, bb->succs)
>>>>>     if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
>>>>>       split_edge (e);
>>>>>
>>>>> for all blocks apart from the latch.
>>>>>
>>>>> Can you please send a combined patch up to this one?  Looking at
>>>>> the incremental diff is somewhat hard.  Thus a patch including all
>>>>> patches from patch1 to this one.
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>> Yuri.
>>>>>>
>>>>>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> Thanks for your reply!
>>>>>>>>
>>>>>>>> I didn't understand your point:
>>>>>>>>
>>>>>>>> Well, I don't mind splitting all critical edges unconditionally
>>>>>>>>
>>>>>>>> but you do it unconditionally in proposed patch.
>>>>>>>
>>>>>>> I don't mind means I am fine with it.
>>>>>>>
>>>>>>>> Also I assume that
>>>>>>>> call of split_critical_edges() can break ssa. For example, we can
>>>>>>>> split headers of loops, loop exit blocks etc.
>>>>>>>
>>>>>>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>>>>>>> be surprised if so but that may be possible.
>>>>>>>
>>>>>>>> I prefer to do something
>>>>>>>> more loop-specialized, e.g. call split_edge() for critical edges
>>>>>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>>>>>>> destination bb belongs to loop).
>>>>>>>
>>>>>>> That works for me as well but it is more complicated to implement.
>>>>>>> Ideally you'd only split one edge if you find a block with only critical
>>>>>>> predecessors (where we'd currently give up).  But note that this
>>>>>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>>>>>>> will change loop->num_nodes so we have to be more careful in
>>>>>>> constructing the loop calling if_convertible_bb_p.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>>
>>>>>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> Sorry that I forgot to delete debug dump from my fix.
>>>>>>>>>> I have few questions about your comments.
>>>>>>>>>>
>>>>>>>>>> 1. You wrote :
>>>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>>>> path
>>>>>>>>>> Did you mean that I must combine predicate_scalar_phi and
>>>>>>>>>> predicate_extended_scalar_phi into one function?
>>>>>>>>>> Please note that if the additional flag is not set (i.e.
>>>>>>>>>> aggressive_if_conv is false), extended predication requires more
>>>>>>>>>> compile time since it builds a hash_map.
>>>>>>>>>
>>>>>>>>> Its compile-time complexity is reasonable enough even for
>>>>>>>>> non-aggressive if-conversion.
>>>>>>>>>
>>>>>>>>>> 2. About critical edge splitting.
>>>>>>>>>>
>>>>>>>>>> Did you mean that (1) we should perform it under the
>>>>>>>>>> aggressive_if_conv option only, and (2) we should split all
>>>>>>>>>> critical edges?
>>>>>>>>>> Note that this leads to recomputation of the topological order.
>>>>>>>>>
>>>>>>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>>>>>>> do something like
>>>>>>>>>
>>>>>>>>> Index: gcc/tree-if-conv.c
>>>>>>>>> ===================================================================
>>>>>>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>>>>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>>>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>>>>>>    if (number_of_loops (fun) <= 1)
>>>>>>>>>      return 0;
>>>>>>>>>
>>>>>>>>> +  bool critical_edges_split_p = false;
>>>>>>>>>    FOR_EACH_LOOP (loop, 0)
>>>>>>>>>      if (flag_tree_loop_if_convert == 1
>>>>>>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>>>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>>>>>>             && !loop->dont_vectorize))
>>>>>>>>> -      todo |= tree_if_conversion (loop);
>>>>>>>>> +      {
>>>>>>>>> +       if (!critical_edges_split_p)
>>>>>>>>> +         {
>>>>>>>>> +           split_critical_edges ();
>>>>>>>>> +           critical_edges_split_p = true;
>>>>>>>>> +           todo |= TODO_cleanup_cfg;
>>>>>>>>> +         }
>>>>>>>>> +       todo |= tree_if_conversion (loop);
>>>>>>>>> +      }
>>>>>>>>>
>>>>>>>>>  #ifdef ENABLE_CHECKING
>>>>>>>>>    {
>>>>>>>>>
>>>>>>>>>> It is worth noting that in current implementation bb's with 2
>>>>>>>>>> predecessors and both are on critical edges are accepted without
>>>>>>>>>> additional option.
>>>>>>>>>
>>>>>>>>> Yes, I know.
>>>>>>>>>
>>>>>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>>>>>>> to it and even fix the critical edge missed optimization with splitting
>>>>>>>>> critical edges then I am all for that solution.
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>> Thanks ahead.
>>>>>>>>>> Yuri.
>>>>>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>> Richard,
>>>>>>>>>>>>
>>>>>>>>>>>> Here is updated patch2 with the following changes:
>>>>>>>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>>>>>>>> 2. Use only one function for extended predication -
>>>>>>>>>>>> predicate_extended_scalar_phi.
>>>>>>>>>>>> 3. Save gsi before insertion of predicate computations for a basic
>>>>>>>>>>>> block if it has 2 predecessors and both incoming edges are
>>>>>>>>>>>> critical, or if it has more than 2 predecessors and at least one
>>>>>>>>>>>> incoming edge is critical. This saved iterator can be used by
>>>>>>>>>>>> extended phi predication.
>>>>>>>>>>>>
>>>>>>>>>>>> Here is a motivating test-case which explains this point.
>>>>>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>>>>>>> The problem phi is in bb-7:
>>>>>>>>>>>>
>>>>>>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>>>>>>   {
>>>>>>>>>>>>     <bb 5>:
>>>>>>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>>>     else
>>>>>>>>>>>>       goto <bb 9>;
>>>>>>>>>>>>
>>>>>>>>>>>>   }
>>>>>>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>>>>>>   {
>>>>>>>>>>>>     <bb 6>:
>>>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>>>     else
>>>>>>>>>>>>       goto <bb 8>;
>>>>>>>>>>>>
>>>>>>>>>>>>   }
>>>>>>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>>>>>>   {
>>>>>>>>>>>>     <bb 7>:
>>>>>>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>     goto <bb 11>;
>>>>>>>>>>>>
>>>>>>>>>>>>   }
>>>>>>>>>>>>
>>>>>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>>>>>>>> #if 0
>>>>>>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>>>>>>  else
>>>>>>>>>>>> #endif
>>>>>>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>>>>>>
>>>>>>>>>>>> we will get ICE:
>>>>>>>>>>>> t5.c: In function 'foo':
>>>>>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>>>>>>  void foo (int n)
>>>>>>>>>>>>       ^
>>>>>>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>>>>>>> _52 = _1 & _3;
>>>>>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>>>>>>
>>>>>>>>>>>> since predicate computations were inserted in bb_7.
>>>>>>>>>>>
>>>>>>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>>>>>>> by insert_gimplified_predicates.
>>>>>>>>>>>
>>>>>>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>>>>>>> but push those to e->dest which makes this really messy.
>>>>>>>>>>>
>>>>>>>>>>> Rather than having a separate phase where we insert all
>>>>>>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>>>>>>> predicating a PHI.
>>>>>>>>>>>
>>>>>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>>>>>>> the printfs properly.
>>>>>>>>>>>
>>>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>>>> paths.
>>>>>>>>>>>
>>>>>>>>>>> I'm not at all happy with this code.  It may be the existing if-conv
>>>>>>>>>>> code's fault but making it even worse is not an option.
>>>>>>>>>>>
>>>>>>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>>>>>>> commit edge insertions before merging the blocks.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>>> ChangeLog is
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>
>>>>>>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>>>>>>> statement iterator.
>>>>>>>>>>>> (bb_insert_point): New function.
>>>>>>>>>>>> (set_bb_insert_point): New function.
>>>>>>>>>>>> (has_pred_critical_p): New function.
>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>>>>>>> Change check that block containing reduction statement candidate
>>>>>>>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>>>>>>> is_cond_scalar_reduction.
>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>>>>>>>> than 2 predecessors and at least one incoming edge is critical.
>>>>>>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>>>>>>>> to save gsi before insertion of predicate computations.  Set it to
>>>>>>>>>>>> true for BB with 2 predecessors and critical incoming edges, or if
>>>>>>>>>>>> the number of predecessors is greater than 2 and at least one
>>>>>>>>>>>> incoming edge is critical.
>>>>>>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>>>>>>> Insert predicate computation of BB just after label if
>>>>>>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>>>>>>   tree predicate;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>>>>>>>> works.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>>>>>>> after the PHI we predicate.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>>>>>>>> I am also very sorry that I sent you a bad patch.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Now let me answer your questions related to the second patch.
>>>>>>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurrences first
>>>>>>>>>>>>>>>>> so your case gets just the condition for edge 5 and its inversion
>>>>>>>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>>>>>>> Let's consider another test-case extracted from 175.vpr (t5.c is
>>>>>>>>>>>>>>>>>> attached). We can see that bb_7 and bb_9, which contain phi nodes,
>>>>>>>>>>>>>>>>>> have only critical incoming edges and both contain code computing
>>>>>>>>>>>>>>>>>> edge predicates, e.g.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments,
>>>>>>>>>>>>>>>>>> the insertion point will be "after labels". Note also that the phi
>>>>>>>>>>>>>>>>>> result can have a use in this block too, so we can't put
>>>>>>>>>>>>>>>>>> predication code at the block end.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>>>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>>>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.
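
The suggestion made earlier in the thread, to sort the (edge, value) pairs
of a PHI by value and treat equal values as one group, can be sketched
outside of GCC as follows. This is only an illustration under stated
assumptions, not the code in the attached patch: struct phi_arg, struct
arg_group and group_phi_args are hypothetical names, and SSA names and
edges are modelled as plain integers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one PHI argument: the value it carries
   (an integer id instead of an SSA name) and its incoming edge index.  */
struct phi_arg { int value; int edge; };

/* One group of incoming edges that all carry the same value.  FIRST is
   the index of the group's first element in the sorted argument array.  */
struct arg_group { int value; int first; int count; };

static int
cmp_by_value (const void *a, const void *b)
{
  const struct phi_arg *x = a, *y = b;
  if (x->value != y->value)
    return x->value < y->value ? -1 : 1;
  return x->edge - y->edge;
}

static int
cmp_by_count (const void *a, const void *b)
{
  const struct arg_group *x = a, *y = b;
  if (x->count != y->count)
    return x->count - y->count;
  return x->first - y->first;
}

/* Sort ARGS by value, collect runs of equal values into GROUPS, and order
   the groups by ascending occurrence count.  GROUPS must have room for N
   entries.  Returns the number of groups.  The last group is the natural
   candidate for the "else" arm, since it needs no condition of its own.  */
static int
group_phi_args (struct phi_arg *args, int n, struct arg_group *groups)
{
  int ngroups = 0;

  assert (n > 0);
  qsort (args, n, sizeof *args, cmp_by_value);
  for (int i = 0; i < n; i++)
    {
      if (ngroups > 0 && groups[ngroups - 1].value == args[i].value)
        groups[ngroups - 1].count++;
      else
        {
          groups[ngroups].value = args[i].value;
          groups[ngroups].first = i;
          groups[ngroups].count = 1;
          ngroups++;
        }
    }
  qsort (groups, ngroups, sizeof *groups, cmp_by_count);
  return ngroups;
}
```

For a PHI shaped like PHI <res_15(3), res_15(4), res_10(5)>, this yields two
groups with the single-occurrence value on edge 5 first, so one condition
(the predicate of edge 5) selects res_10 and everything else falls through
to res_15.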

[-- Attachment #2: patch.20150114 --]
[-- Type: application/octet-stream, Size: 32431 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-aggressive-1.c b/gcc/testsuite/gcc.dg/vect/vect-aggressive-1.c
new file mode 100755
index 0000000..366d705
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-aggressive-1.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+/* { dg-require-effective-target fopenmp } */
+/* { dg-options "-fopenmp" } */
+
+int a[1000];
+int c[1000];
+
+int foo()
+{
+  int i, res = 0;
+#pragma omp simd safelen(8)
+  for (i=0; i<512; i++)
+  {
+    int t = a[i];
+    if (c[i] != 0)
+      if (t != 100 & t > 5)
+	res += 1;
+  }
+  return res;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vect-aggressive.c b/gcc/testsuite/gcc.target/i386/avx2-vect-aggressive.c
new file mode 100755
index 0000000..236d4fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vect-aggressive.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+/* { dg-options "-mavx2 -ffast-math -O3 -fopenmp -fdump-tree-vect-details" } */
+
+float a[1000];
+int c[1000];
+
+int foo()
+{
+  int i, res = 0;
+#pragma omp simd safelen(8)
+  for (i=0; i<512; i++)
+  {
+    float t = a[i];
+    if (t > 0.0f & t < 1.0e+17f)
+      if (c[i] != 0)
+	res += 1;
+  }
+  return res;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index f7befac..8b113fe 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -127,10 +127,14 @@ along with GCC; see the file COPYING3.  If not see
 #include "expr.h"
 #include "insn-codes.h"
 #include "optabs.h"
+#include "hash-map.h"
 
 /* List of basic blocks in if-conversion-suitable order.  */
 static basic_block *ifc_bbs;
 
+/* Apply more aggressive (extended) if-conversion if true.  */
+static bool aggressive_if_conv;
+
 /* Structure used to predicate basic blocks.  This is attached to the
    ->aux field of the BBs in the loop to be if-converted.  */
 typedef struct bb_predicate_s {
@@ -373,6 +377,18 @@ static tree
 fold_build_cond_expr (tree type, tree cond, tree rhs, tree lhs)
 {
   tree rhs1, lhs1, cond_expr;
+
+  /* If COND is the comparison r != 0 and r has boolean type, convert COND
+     to SSA_NAME so that it is accepted by the vect bool pattern.  */
+  if (TREE_CODE (cond) == NE_EXPR)
+    {
+      tree op0 = TREE_OPERAND (cond, 0);
+      tree op1 = TREE_OPERAND (cond, 1);
+      if (TREE_CODE (op0) == SSA_NAME
+	  && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE
+	  && (integer_zerop (op1)))
+	cond = op0;
+    }
   cond_expr = fold_ternary (COND_EXPR, type, cond,
 			    rhs, lhs);
 
@@ -485,7 +501,8 @@ add_to_dst_predicate_list (struct loop *loop, edge e,
     cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
 			prev_cond, cond);
 
-  add_to_predicate_list (loop, e->dest, cond);
+  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
+    add_to_predicate_list (loop, e->dest, cond);
 }
 
 /* Return true if one of the successor edges of BB exits LOOP.  */
@@ -512,7 +529,9 @@ bb_with_exit_edge_p (struct loop *loop, basic_block bb)
    When the flag_tree_loop_if_convert_stores is not set, PHI is not
    if-convertible if:
    - a virtual PHI is immediately used in another PHI node,
-   - there is a virtual PHI in a BB other than the loop->header.  */
+   - there is a virtual PHI in a BB other than the loop->header.
+   When aggressive_if_conv is set, a PHI can have more than
+   two arguments.  */
 
 static bool
 if_convertible_phi_p (struct loop *loop, basic_block bb, gphi *phi,
@@ -524,11 +543,15 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gphi *phi,
       print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
     }
 
-  if (bb != loop->header && gimple_phi_num_args (phi) != 2)
+  if (bb != loop->header)
     {
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, "More than two phi node args.\n");
-      return false;
+      if (gimple_phi_num_args (phi) != 2
+	  && !aggressive_if_conv)
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, "More than two phi node args.\n");
+	  return false;
+        }
     }
 
   if (flag_tree_loop_if_convert_stores || any_mask_load_store)
@@ -895,7 +918,8 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
 
    A statement is if-convertible if:
    - it is an if-convertible GIMPLE_ASSIGN,
-   - it is a GIMPLE_LABEL or a GIMPLE_COND.  */
+   - it is a GIMPLE_LABEL or a GIMPLE_COND,
+   - it is a call to a builtin.  */
 
 static bool
 if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
@@ -942,6 +966,35 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
   return true;
 }
 
+/* Assumes that BB has more than one predecessor.
+   Returns false if at least one incoming edge is not critical
+   and true otherwise.  */
+
+static inline bool
+all_preds_critical_p (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (EDGE_COUNT (e->src->succs) == 1)
+      return false;
+  return true;
+}
+
+/* Returns true if at least one incoming edge of BB is critical.  */
+static inline bool
+has_pred_critical_p (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    if (EDGE_COUNT (e->src->succs) > 1)
+      return true;
+  return false;
+}
+
 /* Return true when BB is if-convertible.  This routine does not check
    basic block's statements and phis.
 
@@ -950,6 +1003,8 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
    - it is after the exit block but before the latch,
    - its edges are not normal.
 
+   The last restriction applies only if aggressive_if_conv is false.
+
    EXIT_BB is the basic block containing the exit of the LOOP.  BB is
    inside LOOP.  */
 
@@ -962,8 +1017,11 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "----------[%d]-------------\n", bb->index);
 
+  if (EDGE_COUNT (bb->succs) > 2)
+    return false;
+
   if (EDGE_COUNT (bb->preds) > 2
-      || EDGE_COUNT (bb->succs) > 2)
+      && !aggressive_if_conv)
     return false;
 
   if (exit_bb)
@@ -1001,20 +1059,15 @@ if_convertible_bb_p (struct loop *loop, basic_block bb, basic_block exit_bb)
 
   /* At least one incoming edge has to be non-critical as otherwise edge
      predicates are not equal to basic-block predicates of the edge
-     source.  */
-  if (EDGE_COUNT (bb->preds) > 1
-      && bb != loop->header)
-    {
-      bool found = false;
-      FOR_EACH_EDGE (e, ei, bb->preds)
-	if (EDGE_COUNT (e->src->succs) == 1)
-	  found = true;
-      if (!found)
-	{
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "only critical predecessors\n");
-	  return false;
-	}
+     source.  This check is skipped if aggressive_if_conv is true.  */
+  if (!aggressive_if_conv
+      && EDGE_COUNT (bb->preds) > 1
+      && bb != loop->header
+      && all_preds_critical_p (bb))
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "only critical predecessors\n");
+      return false;
     }
 
   return true;
@@ -1126,11 +1179,12 @@ predicate_bbs (loop_p loop)
       tree cond;
       gimple stmt;
 
-      /* The loop latch is always executed and has no extra conditions
-	 to be processed: skip it.  */
-      if (bb == loop->latch)
+      /* The loop latch and loop exit block are always executed and
+	 have no extra conditions to be processed: skip them.  */
+      if (bb == loop->latch
+	  || bb_with_exit_edge_p (loop, bb))
 	{
-	  reset_bb_predicate (loop->latch);
+	  reset_bb_predicate (bb);
 	  continue;
 	}
 
@@ -1141,7 +1195,7 @@ predicate_bbs (loop_p loop)
 	  tree c2;
 	  edge true_edge, false_edge;
 	  location_t loc = gimple_location (stmt);
-	  tree c = fold_build2_loc (loc, gimple_cond_code (stmt),
+	  tree c = build2_loc (loc, gimple_cond_code (stmt),
 				    boolean_type_node,
 				    gimple_cond_lhs (stmt),
 				    gimple_cond_rhs (stmt));
@@ -1363,60 +1417,6 @@ if_convertible_loop_p (struct loop *loop, bool *any_mask_load_store)
   return res;
 }
 
-/* Basic block BB has two predecessors.  Using predecessor's bb
-   predicate, set an appropriate condition COND for the PHI node
-   replacement.  Return the true block whose phi arguments are
-   selected when cond is true.  LOOP is the loop containing the
-   if-converted region, GSI is the place to insert the code for the
-   if-conversion.  */
-
-static basic_block
-find_phi_replacement_condition (basic_block bb, tree *cond,
-				gimple_stmt_iterator *gsi)
-{
-  edge first_edge, second_edge;
-  tree tmp_cond;
-
-  gcc_assert (EDGE_COUNT (bb->preds) == 2);
-  first_edge = EDGE_PRED (bb, 0);
-  second_edge = EDGE_PRED (bb, 1);
-
-  /* Prefer an edge with a not negated predicate.
-     ???  That's a very weak cost model.  */
-  tmp_cond = bb_predicate (first_edge->src);
-  gcc_assert (tmp_cond);
-  if (TREE_CODE (tmp_cond) == TRUTH_NOT_EXPR)
-    {
-      edge tmp_edge;
-
-      tmp_edge = first_edge;
-      first_edge = second_edge;
-      second_edge = tmp_edge;
-    }
-
-  /* Check if the edge we take the condition from is not critical.
-     We know that at least one non-critical edge exists.  */
-  if (EDGE_COUNT (first_edge->src->succs) > 1)
-    {
-      *cond = bb_predicate (second_edge->src);
-
-      if (TREE_CODE (*cond) == TRUTH_NOT_EXPR)
-	*cond = TREE_OPERAND (*cond, 0);
-      else
-	/* Select non loop header bb.  */
-	first_edge = second_edge;
-    }
-  else
-    *cond = bb_predicate (first_edge->src);
-
-  /* Gimplify the condition to a valid cond-expr conditonal operand.  */
-  *cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (*cond),
-				      is_gimple_condexpr, NULL_TREE,
-				      true, GSI_SAME_STMT);
-
-  return first_edge->src;
-}
-
 /* Returns true if def-stmt for phi argument ARG is simple increment/decrement
    which is in predicated basic block.
    In fact, the following PHI pattern is searching:
@@ -1427,14 +1427,15 @@ find_phi_replacement_condition (basic_block bb, tree *cond,
 	  reduc_3 = ...
 	reduc_2 = PHI <reduc_1, reduc_3>
 
-   REDUC, OP0 and OP1 contain reduction stmt and its operands.  */
+   ARG_0 and ARG_1 are the corresponding PHI arguments.
+   REDUC, OP0 and OP1 contain reduction stmt and its operands.
+   EXTENDED is true if PHI has > 2 arguments.  */
 
 static bool
-is_cond_scalar_reduction (gimple phi, gimple *reduc,
-			  tree *op0, tree *op1)
+is_cond_scalar_reduction (gimple phi, gimple *reduc, tree arg_0, tree arg_1,
+			  tree *op0, tree *op1, bool extended)
 {
   tree lhs, r_op1, r_op2;
-  tree arg_0, arg_1;
   gimple stmt;
   gimple header_phi = NULL;
   enum tree_code reduction_op;
@@ -1443,13 +1444,13 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
   edge latch_e = loop_latch_edge (loop);
   imm_use_iterator imm_iter;
   use_operand_p use_p;
-
-  arg_0 = PHI_ARG_DEF (phi, 0);
-  arg_1 = PHI_ARG_DEF (phi, 1);
+  edge e;
+  edge_iterator ei;
+  bool result = false;
   if (TREE_CODE (arg_0) != SSA_NAME || TREE_CODE (arg_1) != SSA_NAME)
     return false;
 
-  if (gimple_code (SSA_NAME_DEF_STMT (arg_0)) == GIMPLE_PHI)
+  if (!extended && gimple_code (SSA_NAME_DEF_STMT (arg_0)) == GIMPLE_PHI)
     {
       lhs = arg_1;
       header_phi = SSA_NAME_DEF_STMT (arg_0);
@@ -1480,8 +1481,13 @@ is_cond_scalar_reduction (gimple phi, gimple *reduc,
     return false;
 
   /* Check that stmt-block is predecessor of phi-block.  */
-  if (EDGE_PRED (bb, 0)->src != gimple_bb (stmt)
-      && EDGE_PRED (bb, 1)->src != gimple_bb (stmt))
+  FOR_EACH_EDGE (e, ei, gimple_bb (stmt)->succs)
+    if (e->dest == bb)
+      {
+	result = true;
+	break;
+      }
+  if (!result)
     return false;
 
   if (!has_single_use (lhs))
@@ -1578,9 +1584,66 @@ convert_scalar_cond_reduction (gimple reduc, gimple_stmt_iterator *gsi,
   return rhs;
 }
 
+/* Helpers for PHI arguments hashtable map.  */
+
+struct phi_args_hash_traits : default_hashmap_traits
+{
+  static inline hashval_t hash (tree);
+  static inline bool equal_keys (tree, tree);
+};
+
+inline hashval_t
+phi_args_hash_traits::hash (tree value)
+{
+  return iterative_hash_expr (value, 0);
+}
+
+inline bool
+phi_args_hash_traits::equal_keys (tree value1, tree value2)
+{
+  return operand_equal_p (value1, value2, 0);
+}
+
+/* Produce condition for all PHI argument occurrences listed in OCCUR.  */
+
+static tree
+gen_phi_arg_condition (gphi *phi, vec<int> *occur,
+		       gimple_stmt_iterator *gsi)
+{
+  int len;
+  int i;
+  tree cond = NULL_TREE;
+  tree c;
+  edge e;
+
+  len = occur->length ();
+  gcc_assert (len > 0);
+  for (i = 0; i < len; i++)
+    {
+      e = gimple_phi_arg_edge (phi, (*occur)[i]);
+      c = bb_predicate (e->src);
+      if (is_true_predicate (c))
+	continue;
+      c = force_gimple_operand_gsi_1 (gsi, unshare_expr (c),
+				      is_gimple_condexpr, NULL_TREE,
+				      true, GSI_SAME_STMT);
+      if (cond != NULL_TREE)
+	{
+	  /* Must build OR expression.  */
+	  cond = fold_or_predicates (EXPR_LOCATION (c), c, cond);
+	  cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					     is_gimple_condexpr, NULL_TREE,
+					     true, GSI_SAME_STMT);
+	}
+      else
+	cond = c;
+    }
+  gcc_assert (cond != NULL_TREE);
+  return cond;
+}
+
 /* Replace a scalar PHI node with a COND_EXPR using COND as condition.
-   This routine does not handle PHI nodes with more than two
-   arguments.
+   This routine can handle PHI nodes with more than two arguments.
 
    For example,
      S1: A = PHI <x1(1), x2(5)>
@@ -1588,69 +1651,210 @@ convert_scalar_cond_reduction (gimple reduc, gimple_stmt_iterator *gsi,
      S2: A = cond ? x1 : x2;
 
    The generated code is inserted at GSI that points to the top of
-   basic block's statement list.  When COND is true, phi arg from
-   TRUE_BB is selected.  */
+   basic block's statement list.
+   If the PHI node has more than two arguments, a chain of conditional
+   expressions is produced.  */
+
 
 static void
-predicate_scalar_phi (gphi *phi, tree cond,
-		      basic_block true_bb,
-		      gimple_stmt_iterator *gsi)
+predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
 {
-  gimple new_stmt;
+  gimple new_stmt = NULL, reduc;
+  tree rhs, res, arg0, arg1, op0, op1, scev;
+  tree cond;
+  unsigned int index0;
+  unsigned int max, args_len;
+  edge e;
   basic_block bb;
-  tree rhs, res, arg, scev;
-
-  gcc_assert (gimple_code (phi) == GIMPLE_PHI
-	      && gimple_phi_num_args (phi) == 2);
+  unsigned int i;
 
   res = gimple_phi_result (phi);
-  /* Do not handle virtual phi nodes.  */
   if (virtual_operand_p (res))
     return;
 
-  bb = gimple_bb (phi);
-
-  if ((arg = degenerate_phi_result (phi))
+  if ((rhs = degenerate_phi_result (phi))
       || ((scev = analyze_scalar_evolution (gimple_bb (phi)->loop_father,
 					    res))
 	  && !chrec_contains_undetermined (scev)
 	  && scev != res
-	  && (arg = gimple_phi_arg_def (phi, 0))))
-    rhs = arg;
-  else
+	  && (rhs = gimple_phi_arg_def (phi, 0))))
     {
-      tree arg_0, arg_1;
-      tree op0, op1;
-      gimple reduc;
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Degenerate phi!\n");
+	  print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
+	}
+      new_stmt = gimple_build_assign (res, rhs);
+      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+      update_stmt (new_stmt);
+      return;
+    }
 
-      /* Use condition that is not TRUTH_NOT_EXPR in conditional modify expr.  */
+  bb = gimple_bb (phi);
+  if (EDGE_COUNT (bb->preds) == 2)
+    {
+      /* Predicate ordinary PHI node with 2 arguments.  */
+      edge first_edge, second_edge;
+      basic_block true_bb;
+      first_edge = EDGE_PRED (bb, 0);
+      second_edge = EDGE_PRED (bb, 1);
+      cond = bb_predicate (first_edge->src);
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  edge tmp_edge = first_edge;
+	  first_edge = second_edge;
+	  second_edge = tmp_edge;
+	}
+      if (EDGE_COUNT (first_edge->src->succs) > 1)
+	{
+	  cond = bb_predicate (second_edge->src);
+	  if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	    cond = TREE_OPERAND (cond, 0);
+	  else
+	    first_edge = second_edge;
+	}
+      else
+	cond = bb_predicate (first_edge->src);
+      /* Gimplify the condition to a valid cond-expr conditional operand.  */
+      cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					 is_gimple_condexpr, NULL_TREE,
+					 true, GSI_SAME_STMT);
+      true_bb = first_edge->src;
       if (EDGE_PRED (bb, 1)->src == true_bb)
 	{
-	  arg_0 = gimple_phi_arg_def (phi, 1);
-	  arg_1 = gimple_phi_arg_def (phi, 0);
+	  arg0 = gimple_phi_arg_def (phi, 1);
+	  arg1 = gimple_phi_arg_def (phi, 0);
 	}
       else
 	{
-	  arg_0 = gimple_phi_arg_def (phi, 0);
-	  arg_1 = gimple_phi_arg_def (phi, 1);
+	  arg0 = gimple_phi_arg_def (phi, 0);
+	  arg1 = gimple_phi_arg_def (phi, 1);
 	}
-      if (is_cond_scalar_reduction (phi, &reduc, &op0, &op1))
+      if (is_cond_scalar_reduction (phi, &reduc, arg0, arg1,
+				    &op0, &op1, false))
 	/* Convert reduction stmt into vectorizable form.  */
 	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
 					     true_bb != gimple_bb (reduc));
       else
 	/* Build new RHS using selected condition and arguments.  */
 	rhs = fold_build_cond_expr (TREE_TYPE (res), unshare_expr (cond),
-				    arg_0, arg_1);
+				    arg0, arg1);
+      new_stmt = gimple_build_assign (res, rhs);
+      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+      update_stmt (new_stmt);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "new phi replacement stmt\n");
+	  print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
+	}
+      return;
+    }
+
+  /* Create a hash map for the PHI node that maps each argument value to
+     the vector of argument indexes having that value.  */
+  bool swap = false;
+  hash_map<tree, auto_vec<int>, phi_args_hash_traits> phi_arg_map;
+  unsigned int num_args = gimple_phi_num_args (phi);
+  int max_ind = -1;
+  /* Vector of different PHI argument values.  */
+  auto_vec<tree> args (num_args);
+
+  /* Compute phi_arg_map.  */
+  for (i = 0; i < num_args; i++)
+    {
+      tree arg;
+
+      arg = gimple_phi_arg_def (phi, i);
+      if (!phi_arg_map.get (arg))
+	args.quick_push (arg);
+      phi_arg_map.get_or_insert (arg).safe_push (i);
+    }
+
+  /* Determine element with max number of occurrences.  */
+  max_ind = -1;
+  max = 1;
+  args_len = args.length ();
+  for (i = 0; i < args_len; i++)
+    {
+      unsigned int len;
+      if ((len = phi_arg_map.get (args[i])->length ()) > max)
+	{
+	  max_ind = (int) i;
+	  max = len;
+	}
+    }
+
+  /* Put element with max number of occurrences at the end of ARGS.  */
+  if (max_ind != -1 && max_ind +1 != (int) args_len)
+    {
+      tree tmp = args[args_len - 1];
+      args[args_len - 1] = args[max_ind];
+      args[max_ind] = tmp;
     }
 
-  new_stmt = gimple_build_assign (res, rhs);
-  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
-  update_stmt (new_stmt);
+  /* Handle one special case: the number of different argument values
+     is 2 and one of them occurs only once.  Such a PHI can be
+     handled as if it had only 2 arguments.  */
+  if (args_len == 2 && phi_arg_map.get (args[0])->length () == 1)
+    {
+      vec<int> *indexes;
+      indexes = phi_arg_map.get (args[0]);
+      index0 = (*indexes)[0];
+      arg0 = args[0];
+      arg1 = args[1];
+      e = gimple_phi_arg_edge (phi, index0);
+      cond = bb_predicate (e->src);
+      if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
+	{
+	  swap = true;
+	  cond = TREE_OPERAND (cond, 0);
+	}
+      /* Gimplify the condition to a valid cond-expr conditional operand.  */
+      cond = force_gimple_operand_gsi_1 (gsi, unshare_expr (cond),
+					 is_gimple_condexpr, NULL_TREE,
+					 true, GSI_SAME_STMT);
+      if (!(is_cond_scalar_reduction (phi, &reduc, arg0 , arg1,
+				      &op0, &op1, true)))
+	rhs = fold_build_cond_expr (TREE_TYPE (res), unshare_expr (cond),
+				    swap? arg1 : arg0,
+				    swap? arg0 : arg1);
+      else
+	/* Convert reduction stmt into vectorizable form.  */
+	rhs = convert_scalar_cond_reduction (reduc, gsi, cond, op0, op1,
+					     swap);
+      new_stmt = gimple_build_assign (res, rhs);
+      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+      update_stmt (new_stmt);
+    }
+  else
+    {
+      /* Common case.  */
+      vec<int> *indexes;
+      tree type = TREE_TYPE (gimple_phi_result (phi));
+      tree lhs;
+      arg1 = args[1];
+      for (i = 0; i < args_len; i++)
+	{
+	  arg0 = args[i];
+	  indexes = phi_arg_map.get (args[i]);
+	  if (i != args_len - 1)
+	    lhs = make_temp_ssa_name (type, NULL, "_ifc_");
+	  else
+	    lhs = res;
+	  cond = gen_phi_arg_condition (phi, indexes, gsi);
+	  rhs = fold_build_cond_expr (type, unshare_expr (cond),
+				      arg0, arg1);
+	  new_stmt = gimple_build_assign (lhs, rhs);
+	  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+	  update_stmt (new_stmt);
+	  arg1 = lhs;
+	}
+    }
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
-      fprintf (dump_file, "new phi replacement stmt\n");
+      fprintf (dump_file, "new extended phi replacement stmt\n");
       print_gimple_stmt (dump_file, new_stmt, 0, TDF_SLIM);
     }
 }
@@ -1668,28 +1872,25 @@ predicate_all_scalar_phis (struct loop *loop)
   for (i = 1; i < orig_loop_num_nodes; i++)
     {
       gphi *phi;
-      tree cond = NULL_TREE;
       gimple_stmt_iterator gsi;
       gphi_iterator phi_gsi;
-      basic_block true_bb = NULL;
       bb = ifc_bbs[i];
 
       if (bb == loop->header)
 	continue;
 
+      if (EDGE_COUNT (bb->preds) == 1)
+	continue;
+
       phi_gsi = gsi_start_phis (bb);
       if (gsi_end_p (phi_gsi))
 	continue;
 
-      /* BB has two predecessors.  Using predecessor's aux field, set
-	 appropriate condition for the PHI node replacement.  */
       gsi = gsi_after_labels (bb);
-      true_bb = find_phi_replacement_condition (bb, &cond, &gsi);
-
       while (!gsi_end_p (phi_gsi))
 	{
 	  phi = phi_gsi.phi ();
-	  predicate_scalar_phi (phi, cond, true_bb, &gsi);
+	  predicate_scalar_phi (phi, &gsi);
 	  release_phi_node (phi);
 	  gsi_next (&phi_gsi);
 	}
@@ -1710,7 +1911,8 @@ insert_gimplified_predicates (loop_p loop, bool any_mask_load_store)
     {
       basic_block bb = ifc_bbs[i];
       gimple_seq stmts;
-
+      if (!is_predicated (bb))
+	gcc_assert (bb_predicate_gimplified_stmts (bb) == NULL);
       if (!is_predicated (bb))
 	{
 	  /* Do not insert statements for a basic block that is not
@@ -2134,6 +2336,307 @@ version_loop_for_if_conversion (struct loop *loop)
   return true;
 }
 
+/* Performs splitting of critical edges if aggressive_if_conv is true.
+   Returns false if loop won't be if converted and true otherwise.  */
+
+static bool
+ifcvt_split_critical_edges (struct loop *loop)
+{
+  basic_block *body;
+  basic_block bb;
+  unsigned int num = loop->num_nodes;
+  unsigned int i;
+  gimple stmt;
+  edge e;
+  edge_iterator ei;
+
+  if (num <= 2)
+    return false;
+  if (loop->inner)
+    return false;
+  if (!single_exit (loop))
+    return false;
+
+  body = get_loop_body (loop);
+  for (i = 0; i < num; i++)
+    {
+      bb = body[i];
+      if (bb == loop->latch
+	  || bb_with_exit_edge_p (loop, bb))
+	continue;
+      stmt = last_stmt (bb);
+      /* Skip basic blocks not ending with conditional branch.  */
+      if (!(stmt && gimple_code (stmt) == GIMPLE_COND))
+	continue;
+      FOR_EACH_EDGE (e, ei, bb->succs)
+	if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
+	  split_edge (e);
+    }
+  free (body);
+  return true;
+}
+
+/* Assumes that the lhs of DEF_STMT has multiple uses.
+   Remove one use by (1) creating a copy of DEF_STMT with a
+   unique lhs and (2) replacing the use of the original lhs in
+   USE_STMT with the newly created lhs.  */
+
+static void
+ifcvt_split_def_stmt (gimple def_stmt, gimple use_stmt)
+{
+  tree var;
+  tree lhs;
+  gimple copy_stmt;
+  gimple_stmt_iterator gsi;
+  use_operand_p use_p;
+  imm_use_iterator imm_iter;
+
+  var = gimple_assign_lhs (def_stmt);
+  copy_stmt = gimple_copy (def_stmt);
+  lhs = make_temp_ssa_name (TREE_TYPE (var), NULL, "_ifc_");
+  gimple_assign_set_lhs (copy_stmt, lhs);
+  SSA_NAME_DEF_STMT (lhs) = copy_stmt;
+  /* Insert copy of DEF_STMT.  */
+  gsi = gsi_for_stmt (def_stmt);
+  gsi_insert_after (&gsi, copy_stmt, GSI_SAME_STMT);
+  /* Change use of var to lhs in use_stmt.  */
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Change use of var  ");
+      print_generic_expr (dump_file, var, TDF_SLIM);
+      fprintf (dump_file, " to ");
+      print_generic_expr (dump_file, lhs, TDF_SLIM);
+      fprintf (dump_file, "\n");
+    }
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, var)
+    {
+      if (USE_STMT (use_p) != use_stmt)
+	continue;
+      SET_USE (use_p, lhs);
+      break;
+    }
+}
+
+/* Traverse bool pattern recursively starting from VAR.
+   Save its def and use statements to defuse_list if VAR does
+   not have single use.  */
+
+static void
+ifcvt_walk_pattern_tree (tree var, vec<gimple> *defuse_list,
+			 gimple use_stmt)
+{
+  tree rhs1, rhs2;
+  enum tree_code code;
+  gimple def_stmt;
+
+  def_stmt = SSA_NAME_DEF_STMT (var);
+  if (gimple_code (def_stmt) != GIMPLE_ASSIGN)
+    return;
+  if (!has_single_use (var))
+    {
+      /* Put def and use stmts into defuse_list.  */
+      defuse_list->safe_push (def_stmt);
+      defuse_list->safe_push (use_stmt);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Multiple lhs uses in stmt\n");
+	  print_gimple_stmt (dump_file, def_stmt, 0, TDF_SLIM);
+	}
+    }
+  rhs1 = gimple_assign_rhs1 (def_stmt);
+  code = gimple_assign_rhs_code (def_stmt);
+  switch (code)
+    {
+    case SSA_NAME:
+      ifcvt_walk_pattern_tree (rhs1, defuse_list, def_stmt);
+      break;
+    CASE_CONVERT:
+      if ((TYPE_PRECISION (TREE_TYPE (rhs1)) != 1
+	   || !TYPE_UNSIGNED (TREE_TYPE (rhs1)))
+	  && TREE_CODE (TREE_TYPE (rhs1)) != BOOLEAN_TYPE)
+	break;
+      ifcvt_walk_pattern_tree (rhs1, defuse_list, def_stmt);
+      break;
+    case BIT_NOT_EXPR:
+      ifcvt_walk_pattern_tree (rhs1, defuse_list, def_stmt);
+      break;
+    case BIT_AND_EXPR:
+    case BIT_IOR_EXPR:
+    case BIT_XOR_EXPR:
+      ifcvt_walk_pattern_tree (rhs1, defuse_list, def_stmt);
+      rhs2 = gimple_assign_rhs2 (def_stmt);
+      ifcvt_walk_pattern_tree (rhs2, defuse_list, def_stmt);
+      break;
+    default:
+      break;
+    }
+  return;
+}
+
+/* Returns true if STMT can be the root of a bool pattern applied
+   by the vectorizer.  VAR is set to the SSA_NAME that starts the pattern.  */
+
+static bool
+stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
+{
+  enum tree_code code;
+  tree lhs, rhs;
+
+  code = gimple_assign_rhs_code (stmt);
+  if (CONVERT_EXPR_CODE_P (code))
+    {
+      lhs = gimple_assign_lhs (stmt);
+      rhs = gimple_assign_rhs1 (stmt);
+      if (TREE_CODE (TREE_TYPE (rhs)) != BOOLEAN_TYPE)
+	return false;
+      if (TREE_CODE (TREE_TYPE (lhs)) == BOOLEAN_TYPE)
+	return false;
+      *var = rhs;
+      return true;
+    }
+  else if (code == COND_EXPR)
+    {
+      rhs = gimple_assign_rhs1 (stmt);
+      if (TREE_CODE (rhs) != SSA_NAME)
+	return false;
+      *var = rhs;
+      return true;
+    }
+  return false;
+}
+
+/* Traverse all statements in BB, which corresponds to the loop header, to
+   find all statements that can start a bool pattern applied by the
+   vectorizer, and convert multiple uses in it to conform to the pattern
+   restrictions.  This can occur if the same predicate is used both
+   for phi node conversion and as a load/store mask.  */
+
+static void
+ifcvt_repair_bool_pattern (basic_block bb)
+{
+  tree rhs;
+  gimple stmt;
+  gimple_stmt_iterator gsi;
+  vec<gimple> defuse_list = vNULL;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      if (gimple_code (stmt) != GIMPLE_ASSIGN)
+	continue;
+      if (!stmt_is_root_of_bool_pattern (stmt, &rhs))
+	continue;
+      ifcvt_walk_pattern_tree (rhs, &defuse_list, stmt);
+      while (defuse_list.length () > 0)
+	{
+	  gimple def_stmt, use_stmt;
+	  use_stmt = defuse_list.pop ();
+	  def_stmt = defuse_list.pop ();
+	  ifcvt_split_def_stmt (def_stmt, use_stmt);
+	}
+    }
+}
+
+/* Delete redundant statements produced by predication that prevent
+   loop vectorization.  */
+
+static void
+ifcvt_local_dce (basic_block bb)
+{
+  gimple stmt;
+  gimple stmt1;
+  gimple phi;
+  gimple_stmt_iterator gsi;
+  vec<gimple> worklist;
+  enum gimple_code code;
+  use_operand_p use_p;
+  imm_use_iterator imm_iter;
+
+  worklist.create (64);
+  /* Consider all phi as live statements.  */
+  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      phi = gsi_stmt (gsi);
+      gimple_set_plf (phi, GF_PLF_2, true);
+      worklist.safe_push (phi);
+    }
+  /* Consider load/store statements, CALL and COND as live.  */
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      if (gimple_store_p (stmt)
+	  || gimple_assign_load_p (stmt)
+	  || is_gimple_debug (stmt))
+	{
+	  gimple_set_plf (stmt, GF_PLF_2, true);
+	  worklist.safe_push (stmt);
+	  continue;
+	}
+      code = gimple_code (stmt);
+      if (code == GIMPLE_COND || code == GIMPLE_CALL)
+	{
+	  gimple_set_plf (stmt, GF_PLF_2, true);
+	  worklist.safe_push (stmt);
+	  continue;
+	}
+      gimple_set_plf (stmt, GF_PLF_2, false);
+
+      if (code == GIMPLE_ASSIGN)
+	{
+	  tree lhs = gimple_assign_lhs (stmt);
+	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
+	    {
+	      stmt1 = USE_STMT (use_p);
+	      if (gimple_bb (stmt1) != bb)
+		{
+		  gimple_set_plf (stmt, GF_PLF_2, true);
+		  worklist.safe_push (stmt);
+		  break;
+		}
+	    }
+	}
+    }
+  /* Propagate liveness through arguments of live stmt.  */
+  while (worklist.length () > 0)
+    {
+      ssa_op_iter iter;
+      use_operand_p use_p;
+      tree use;
+
+      stmt = worklist.pop ();
+      FOR_EACH_PHI_OR_STMT_USE (use_p, stmt, iter, SSA_OP_USE)
+	{
+	  use = USE_FROM_PTR (use_p);
+	  if (TREE_CODE (use) != SSA_NAME)
+	    continue;
+	  stmt1 = SSA_NAME_DEF_STMT (use);
+	  if (gimple_bb (stmt1) != bb
+	      || gimple_plf (stmt1, GF_PLF_2))
+	    continue;
+	  gimple_set_plf (stmt1, GF_PLF_2, true);
+	  worklist.safe_push (stmt1);
+	}
+    }
+  /* Delete dead statements.  */
+  gsi = gsi_start_bb (bb);
+  while (!gsi_end_p (gsi))
+    {
+      stmt = gsi_stmt (gsi);
+      if (gimple_plf (stmt, GF_PLF_2))
+	{
+	  gsi_next (&gsi);
+	  continue;
+	}
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Delete dead stmt in bb#%d\n", bb->index);
+	  print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
+	}
+      gsi_remove (&gsi, true);
+      release_defs (stmt);
+    }
+}
+
 /* If-convert LOOP when it is legal.  For the moment this pass has no
    profitability analysis.  Returns non-zero todo flags when something
    changed.  */
@@ -2145,6 +2648,20 @@ tree_if_conversion (struct loop *loop)
   ifc_bbs = NULL;
   bool any_mask_load_store = false;
 
+  /* Set-up aggressive if-conversion for loops marked with simd pragma.  */
+  aggressive_if_conv = loop->force_vectorize;
+  /* Check whether the outer loop was marked with simd pragma.  */
+  if (!aggressive_if_conv)
+    {
+      struct loop *outer_loop = loop_outer (loop);
+      if (outer_loop && outer_loop->force_vectorize)
+	aggressive_if_conv = true;
+    }
+
+  if (aggressive_if_conv)
+    if (!ifcvt_split_critical_edges (loop))
+      goto cleanup;
+
   if (!if_convertible_loop_p (loop, &any_mask_load_store)
       || !dbg_cnt (if_conversion_tree))
     goto cleanup;
@@ -2162,6 +2679,14 @@ tree_if_conversion (struct loop *loop)
      on-the-fly.  */
   combine_blocks (loop, any_mask_load_store);
 
+  /* Delete dead predicate computations and repair the tree corresponding
+     to the bool pattern to remove multiple uses of predicates.  */
+  if (aggressive_if_conv)
+    {
+      ifcvt_local_dce (loop->header);
+      ifcvt_repair_bool_pattern (loop->header);
+    }
+
   todo |= TODO_cleanup_cfg;
   if (flag_tree_loop_if_convert_stores || any_mask_load_store)
     {
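
As a rough illustration of the chain of conditional expressions that the
common case of predicate_scalar_phi builds for a PHI with more than two
arguments, here is a scalar model in plain C.  It is only a sketch and
not GCC code: select_phi_value, VALS and CONDS are hypothetical names,
and CONDS[i] is assumed to already hold the OR of the predicates of all
incoming edges that carry VALS[i] (the job of gen_phi_arg_condition in
the patch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar model of the COND_EXPR chain; not GCC code.
   VALS[i] is the i-th distinct PHI argument value and CONDS[i] is the
   OR of the predicates of all incoming edges that carry that value.  */
int
select_phi_value (const int *vals, const bool *conds, size_t n)
{
  assert (n >= 2);
  /* The first select falls back to VALS[1], as in the patch.  */
  int acc = vals[1];
  for (size_t i = 0; i < n; i++)
    /* Models  _ifc_N = conds[i] ? vals[i] : <previous result>.  */
    acc = conds[i] ? vals[i] : acc;
  return acc;
}
```

Under the assumption that the edge predicates are mutually exclusive and
exactly one of them holds, the chain yields the value attached to that
predicate regardless of the order of the links.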


* Re: [PATCH 2/3] Extended if-conversion
  2015-01-14 13:33                                       ` Yuri Rumyantsev
@ 2015-01-14 15:00                                         ` Richard Biener
  0 siblings, 0 replies; 22+ messages in thread
From: Richard Biener @ 2015-01-14 15:00 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Wed, Jan 14, 2015 at 2:14 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> I made all the changes you proposed and added a couple of tests.
> Bootstrap, including the aggressive one you proposed, and regression
> testing did not show any new failures.
>
> Is it OK for trunk?

+++ b/gcc/testsuite/gcc.dg/vect/vect-aggressive-1.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_condition } */
+/* { dg-require-effective-target fopenmp } */
+/* { dg-options "-fopenmp" } */

please use { dg-additional-options "-fopenmp-simd" } instead and
vect_simd_clones target, not fopenmp target.

+/* { dg-options "-mavx2 -ffast-math -O3 -fopenmp -fdump-tree-vect-details" } */

likewise for -ffast-math.  Don't use -mavx2; instead require a proper
vect target on the scan-tree-dump-times line.

It would also be nice to have these as runtime testcases so it can
be verified that the code executes correctly instead of possibly producing
random crap ;)
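
A minimal sketch of what such a runtime check could look like, using the
loop shape discussed in this thread.  Everything here is an assumption
made for illustration: the element types of a and c, the input pattern,
the function names, and the exact set of dg directives.  A real test
would also add a main that calls abort () when the two counts differ.

```c
/* { dg-do run } */
/* { dg-require-effective-target vect_simd_clones } */
/* { dg-additional-options "-fopenmp-simd -ffast-math" } */

#define N 512

/* Hypothetical inputs; the element types are an assumption.  */
float a[N];
int c[N];

void
init_inputs (void)
{
  for (int i = 0; i < N; i++)
    {
      a[i] = (i % 3 == 0) ? -1.0f : (float) i;
      c[i] = i % 2;
    }
}

/* Loop in the form the extended if-conversion is meant to handle.  */
int
count_simd (void)
{
  int res = 0;
#pragma omp simd reduction(+:res)
  for (int i = 0; i < N; i++)
    {
      float t = a[i];
      if ((t > 0.0f) & (t < 1.0e+17f))
        if (c[i] != 0)
          res += 1;
    }
  return res;
}

/* Reference: the same condition written as a plain short-circuit test.  */
int
count_ref (void)
{
  int res = 0;
  for (int i = 0; i < N; i++)
    if (a[i] > 0.0f && a[i] < 1.0e+17f && c[i] != 0)
      res++;
  return res;
}
```

Comparing count_simd () against count_ref () after init_inputs () gives a
simple pass/fail criterion that does not depend on the dump output.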

The patch is ok with the above changes.

Thanks,
Richard.

> ChangeLog:
>
> 2015-01-14  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * tree-if-conv.c: Include hash-map.h.
> (aggressive_if_conv): New variable.
> (fold_build_cond_expr): Add simplification of non-zero condition.
> (add_to_dst_predicate_list): Invoke add_to_predicate_list if edge
> destination block is not always executed.
> (if_convertible_phi_p): Fix commentary, allow phi nodes have more
> than two predecessors if AGGRESSIVE_IF_CONV is true.
> (if_convertible_stmt_p): Fix commentary.
> (all_preds_critical_p): New function.
> (has_pred_critical_p): New function.
> (if_convertible_bb_p): Fix commentary, if AGGRESSIVE_IF_CONV is true
> BB can have more than two predecessors and all incoming edges can be
> critical.
> (predicate_bbs): Skip predication for loop exit block, use build2_loc
> to compute predicate for true edge.
> (find_phi_replacement_condition): Delete this function.
> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
> Allow interchange PHI arguments if EXTENDED is false.
> Change check that block containing reduction statement candidate
> is predecessor of phi-block since phi may have more than two arguments.
> (phi_args_hash_traits): New helper structure.
> (struct phi_args_hash_traits): New type.
> (phi_args_hash_traits::hash): New function.
> (phi_args_hash_traits::equal_keys): New function.
> (gen_phi_arg_condition): New function.
> (predicate_scalar_phi): Add handling of phi nodes with more than two
> arguments, delete COND and TRUE_BB arguments, insert body of
> find_phi_replacement_condition to predicate ordinary phi nodes.
> (predicate_all_scalar_phis): Skip blocks with the only predecessor,
> delete call of find_phi_replacement_condition and invoke
> predicate_scalar_phi with two arguments.
> (insert_gimplified_predicates): Add assert that non-predicated blocks
> don't have statements to insert.
> (ifcvt_split_critical_edges): New function.
> (ifcvt_split_def_stmt): Likewise.
> (ifcvt_walk_pattern_tree): Likewise.
> (stmt_is_root_of_bool_pattern): Likewise.
> (ifcvt_repair_bool_pattern): Likewise.
> (ifcvt_local_dce): Likewise.
> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
> is copy of inner or outer loop force_vectorize field, invoke
> ifcvt_split_critical_edges, ifcvt_local_dce and
> ifcvt_repair_bool_pattern for aggressive if-conversion.
>
> gcc/testsuite/ChangeLog
>
> * gcc.dg/vect/vect-aggressive-1.c: New test.
> * gcc.target/i386/avx2-vect-aggressive.c: Likewise.
>
> 2015-01-09 15:27 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Mon, Dec 22, 2014 at 3:39 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Richard,
>>>
>>> I changed algorithm for bool pattern repair.
>>> It turned out that ifcvt_local_dce phase is required since for
>>> test-case I sent you in previous mail vectorization is not performed
>>> without dead code elimination:
>>>
>>> For the loop
>>> #pragma omp simd safelen(8)
>>>   for (i=0; i<512; i++)
>>>   {
>>>     float t = a[i];
>>>     if (t > 0.0f & t < 1.0e+17f)
>>>       if (c[i] != 0)
>>>         res += 1;
>>>   }
>>>
>>> I've got the following message from vectorizer:
>>>
>>> t3.c:10:11: note: ==> examining statement: _ifc__39 = t_5 > 0.0;
>>>
>>> t3.c:10:11: note: bit-precision arithmetic not supported.
>>> t3.c:10:11: note: not vectorized: relevant stmt not supported:
>>> _ifc__39 = t_5 > 0.0;
>>>
>>> It is caused by the following dead predicate computations after
>>> critical edge splitting:
>>>
>>> (after combine blocks):
>>>
>>> <bb 3>:
>>> # res_15 = PHI <res_1(7), 0(19)>
>>> # i_16 = PHI <i_11(7), 0(19)>
>>> # ivtmp_14 = PHI <ivtmp_13(7), 512(19)>
>>> t_5 = a[i_16];
>>> _6 = t_5 > 0.0;
>>> _7 = t_5 < 9.9999998430674944e+16;
>>> _8 = _6 & _7;
>>> _10 = &c[i_16];
>>> _ifc__36 = _8 ? 4294967295 : 0;
>>> _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>> _28 = _8;
>>> _29 = _9 != 0;
>>> _30 = _28 & _29;
>>> // Statements below are dead!!
>>> _31 = _8;
>>> _32 = _9 != 0;
>>> _33 = ~_32;
>>> _34 = _31 & _33;
>>> // End of dead statements.
>>> _ifc__35 = _30 ? 1 : 0;
>>> res_1 = res_15 + _ifc__35;
>>> i_11 = i_16 + 1;
>>> ivtmp_13 = ivtmp_14 - 1;
>>> if (ivtmp_13 != 0)
>>>   goto <bb 7>;
>>> else
>>>   goto <bb 8>;
>>>
>>> But if we delete these statements loop will be vectorized.
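
The effect of such a local dead code elimination can be modelled outside
of GCC.  The following is only a sketch under stated assumptions: the
struct stmt_model type, the NO_OPERAND marker and all function names are
hypothetical, SSA names are reduced to their numbers, and only the tail
of the block quoted above is represented.  It is not the code of
ifcvt_local_dce.

```c
#include <stdbool.h>

#define NO_OPERAND (-1)

/* One statement of a straight-line block: DEF = op (USE0, USE1).  */
struct stmt_model
{
  int def;
  int use0, use1;
  bool root;   /* Must be kept, e.g. it feeds a PHI of the next iteration.  */
  bool live;
};

/* Mark statements reachable from the roots and return the number of
   dead ones.  Definitions precede uses, so one backward walk suffices.  */
static int
local_dce_model (struct stmt_model *stmts, int n)
{
  int dead = 0;

  for (int i = 0; i < n; i++)
    stmts[i].live = stmts[i].root;

  for (int i = n - 1; i >= 0; i--)
    {
      if (!stmts[i].live)
        continue;
      for (int j = 0; j < i; j++)
        if (stmts[j].def == stmts[i].use0 || stmts[j].def == stmts[i].use1)
          stmts[j].live = true;
    }

  for (int i = 0; i < n; i++)
    if (!stmts[i].live)
      dead++;
  return dead;
}

/* Statements _28 .. res_1 from the dump; only res_1 is a root.  */
int
demo_dead_count (void)
{
  struct stmt_model block[] = {
    { 28, 8, NO_OPERAND, false, false },   /* _28 = _8 */
    { 29, 9, NO_OPERAND, false, false },   /* _29 = _9 != 0 */
    { 30, 28, 29, false, false },          /* _30 = _28 & _29 */
    { 31, 8, NO_OPERAND, false, false },   /* _31 = _8 */
    { 32, 9, NO_OPERAND, false, false },   /* _32 = _9 != 0 */
    { 33, 32, NO_OPERAND, false, false },  /* _33 = ~_32 */
    { 34, 31, 33, false, false },          /* _34 = _31 & _33 */
    { 35, 30, NO_OPERAND, false, false },  /* _ifc__35 = _30 ? 1 : 0 */
    { 1, 15, 35, true, false },            /* res_1 = res_15 + _ifc__35 */
  };
  return local_dce_model (block, (int) (sizeof block / sizeof block[0]));
}
```

In this model the four statements defining _31, _32, _33 and _34 are
never reached from res_1, which matches the statements marked as dead in
the dump.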
>>
>> Hm, ok.  We insert predicates too early obviously and not only when
>> needed.  But let's fix that later.
>>
>>> New patch is attached.
>>
>>  fold_build_cond_expr (tree type, tree cond, tree rhs, tree lhs)
>>  {
>>    tree rhs1, lhs1, cond_expr;
>> +
>> +  /* If COND is comparison r != 0 and r has boolean type, convert COND
>> +     to SSA_NAME to accept by vect bool pattern.  */
>> +  if (TREE_CODE (cond) == NE_EXPR)
>> +    {
>> +      tree op0 = TREE_OPERAND (cond, 0);
>> +      tree op1 = TREE_OPERAND (cond, 1);
>> +      if (TREE_CODE (op0) == SSA_NAME
>> +         && TREE_CODE (TREE_TYPE (op0)) == BOOLEAN_TYPE
>> +         && (integer_zerop (op1)))
>> +       cond = op0;
>> +      else if (TREE_CODE (op1) == SSA_NAME
>> +              && TREE_CODE (TREE_TYPE (op1)) == BOOLEAN_TYPE
>> +              && (integer_zerop (op0)))
>> +       cond = op1;
>>
>> The 2nd form, 0 != SSA_NAME doesn't happen due to operand
>> canonicalization.  Please remove its handling.
>>
>> +      if (gimple_phi_num_args (phi) != 2)
>> +       {
>> +         if (!aggressive_if_conv)
>>
>> && !aggressive_if_conv
>>
>> +  if (EDGE_COUNT (bb->preds) > 2)
>> +    {
>> +      if (!aggressive_if_conv)
>>
>> Likewise.
>>
>> -      gimple reduc;
>> +         && (rhs = gimple_phi_arg_def (phi, 0)))) {
>>
>> the { goes to the next line
>>
>>  static void
>>  predicate_mem_writes (loop_p loop)
>>  {
>> -  unsigned int i, orig_loop_num_nodes = loop->num_nodes;
>> +  unsigned int i, j, orig_loop_num_nodes = loop->num_nodes;
>> +  tree mask_vec[10];
>>
>> an upper limit of 10?
>>
>> +      for (j=0; j<10; j++)
>>
>> spaces around '<' and '='
>>
>> +       mask_vec[j] = NULL_TREE;
>> +
>>
>> +           gcc_assert (exact_log2 (bitsize) != -1);
>> +           if ((mask = mask_vec[exact_log2 (bitsize)]) == NULL_TREE)
>> +             {
>>
>> this seems to be a completely separate "optimization"?  Note that
>> there are targets with non-power-of-two bitsize modes (PSImode),
>> so the assert will likely trigger.  I would prefer if you separate this
>> part of the patch.
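
The hazard can be shown with a small standalone model.  This is a sketch
only: exact_log2_model is a stand-in written for this example (it
returns -1 for values that are not a power of two, which is the case the
assert in the hunk rejects), MASK_SLOTS mirrors the fixed size of 10,
and the bitsizes used with it are illustrative.

```c
#define MASK_SLOTS 10

/* Stand-in for exact_log2: log2 of X if X is a power of two, else -1.  */
static int
exact_log2_model (unsigned long x)
{
  int n = 0;

  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  while (x >>= 1)
    n++;
  return n;
}

/* Return the cache slot for BITSIZE, or -1 if the value cannot be used
   as an index, so the caller can fall back to building a fresh mask
   instead of asserting or writing out of bounds.  */
int
mask_slot (unsigned long bitsize)
{
  int idx = exact_log2_model (bitsize);

  if (idx < 0 || idx >= MASK_SLOTS)
    return -1;
  return idx;
}
```

With this shape a bitsize of 32 maps to slot 5, while a non-power-of-two
bitsize such as 24, or one too large for the array, is rejected
explicitly.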
>>
>> +      if ( gimple_code (stmt) != GIMPLE_ASSIGN)
>> +       continue;
>>
>> no space before gimple_code
>>
>> +  imm_use_iterator imm_iter;
>> +
>> +
>> +  worklist.create (64);
>>
>> excessive vertical space.
>>
>> The patch misses the addition of new testcases - please add some,
>> otherwise the code will be totally untested.
>>
>> I assume the patch passes bootstrap and regtest (you didn't say so).
>> Can you also do a bootstrap with aggressive_if_conv forced to
>> true and --with-build-config=bootstrap-O3 --disable-werror?
>>
>> Thanks,
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2014-12-19 14:45 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Thu, Dec 18, 2014 at 2:45 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> I am sending you full patch (~1000 lines) but if you need only patch.1
>>>>> and patch.2 please let me know and I'll send you a reduced patch.
>>>>>
>>>>> Below are few comments regarding your remarks for patch.3.
>>>>>
>>>>> 1. I deleted sub-phase ifcvt_local_dce since I did not find test-case
>>>>> when dead code elimination is required to vectorize loop, i.e. dead
>>>>> statement is marked as relevant.
>>>>> 2. You wrote:
>>>>>> The "retry" code also looks odd - why do you walk the BB multiple
>>>>>> times instead of just doing sth like
>>>>>>
>>>>>>  while (!has_single_use (lhs))
>>>>>>    {
>>>>>>      gimple copy = ifcvt_split_def_stmt (def_stmt);
>>>>>>      ifcvt_walk_pattern_tree (copy);
>>>>>>    }
>>>>>>
>>>>>> thus returning the copy you create and re-process it (the copy should
>>>>>> now have a single-use).
>>>>>
>>>>> The problem is that not only top SSA_NAME (lhs) may have multiple uses
>>>>> but some intermediate variables too. For example, for the following
>>>>> test-case
>>>>>
>>>>> float a[1000];
>>>>> int c[1000];
>>>>>
>>>>> int foo()
>>>>> {
>>>>>   int i, res = 0;
>>>>> #pragma omp simd safelen(8)
>>>>>   for (i=0; i<512; i++)
>>>>>   {
>>>>>     float t = a[i];
>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>       if (c[i] != 0)
>>>>>         res += 1;
>>>>>   }
>>>>>   return res;
>>>>> }
>>>>>
>>>>> After combine_blocks we have the following bb:
>>>>>
>>>>> <bb 3>:
>>>>> # res_15 = PHI <res_1(7), 0(15)>
>>>>> # i_16 = PHI <i_11(7), 0(15)>
>>>>> # ivtmp_14 = PHI <ivtmp_13(7), 512(15)>
>>>>> t_5 = a[i_16];
>>>>> _6 = t_5 > 0.0;
>>>>> _7 = t_5 < 9.9999998430674944e+16;
>>>>> _8 = _6 & _7;
>>>>> _10 = &c[i_16];
>>>>> _ifc__32 = _8 ? 4294967295 : 0;
>>>>> _9 = MASK_LOAD (_10, 0B, _ifc__32);
>>>>> _28 = _8;
>>>>> _29 = _9 != 0;
>>>>> _30 = _28 & _29;
>>>>> _ifc__31 = _30 ? 1 : 0;
>>>>> res_1 = res_15 + _ifc__31;
>>>>> i_11 = i_16 + 1;
>>>>> ivtmp_13 = ivtmp_14 - 1;
>>>>> if (ivtmp_13 != 0)
>>>>>   goto <bb 7>;
>>>>> else
>>>>>   goto <bb 8>;
>>>>>
>>>>> and we can see that _8 has multiple uses. Also note that after splitting of
>>>>> _8 = _6 & _7
>>>>> we also get multiple uses for definition of  _6 and _7. So I used this
>>>>> iterative algorithm as the simplest one.
>>>>
>>>> But it walks the entire pattern again and again while you only need to
>>>> ensure you walk the pattern tree of the now single-use DEF again
>>>> (in fact, rather than replacing a random USE in ifcvt_split_def_stmt
>>>> you should pass down the user_operand_p that you need to make
>>>> single-use).
>>>>
>>>>> I think it would be nice to re-use some utility from tree-vect-patterns.c
>>>>> for stmt_is_root_of_bool_pattern.
>>>>>
>>>>> I assume that function stmt_is_root_of_bool_pattern can be simplified
>>>>> to check on COND_EXPR only since PHI predication and memory access
>>>>> predication produced only such statements, i.e. it can look like
>>>>>
>>>>> static bool
>>>>> stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
>>>>> {
>>>>>   enum tree_code code;
>>>>>   tree lhs, rhs;
>>>>>
>>>>>   code = gimple_assign_rhs_code (stmt);
>>>>>   if (code == COND_EXPR)
>>>>>     {
>>>>>       rhs = gimple_assign_rhs1 (stmt);
>>>>>       if (TREE_CODE (rhs) != SSA_NAME)
>>>>>         return false;
>>>>>       *var = rhs;
>>>>>       return true;
>>>>>     }
>>>>>   return false;
>>>>> }
>>>>>
>>>>> I also did few minor changes in patch.2.
>>>>>
>>>>> 3. You can also notice that I inserted code in tree_if_conversion to
>>>>> do loop version if explicit option "-ftree-loop-if-convert" was not
>>>>> passed to compiler, i.e. we perform if-conversion for loop
>>>>> vectorization only and if it does not take place, we should delete
>>>>> if-converted version of loop.
>>>>> What is your opinion?
>>>>
>>>> Overall part 1 and part 2 look good to me, predicate_scalar_phi
>>>> looks in need of some refactoring to avoid duplicate code.  We can
>>>> do that as a followup.
>>>>
>>>> Part 3 still needs the iteration to be resolved and make the use we
>>>> actually care about single-use, not a random one so we can avoid
>>>> iterating completely.
>>>>
>>>> Richard.
>>>>
>>>>> Thanks.
>>>>> Yuri.
>>>>>
>>>>> 2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Hi Richard,
>>>>>>>
>>>>>>> Here is updated patch which includes
>>>>>>> (1) split critical edges for aggressive if conversion.
>>>>>>> (2) delete all stuff related to support of critical edge predication.
>>>>>>> (3) only one function - predicate_scalar_phi performs predication.
>>>>>>> (4) function find_phi_replacement_condition was deleted since it was
>>>>>>> included in predicate_scalar_phi for phi with two arguments.
>>>>>>>
>>>>>>> I checked that patch works in stress testing mode, i.e. with
>>>>>>> aggressive if conversion by default.
>>>>>>>
>>>>>>> What is your opinion?
>>>>>>
>>>>>> Looks ok overall, but please simply do
>>>>>>
>>>>>>   FOR_EACH_EDGE (e, ei, bb->succs)
>>>>>>     if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
>>>>>>       split_edge (e);
>>>>>>
>>>>>> for all blocks apart from the latch.
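
What splitting a critical edge does to the CFG can be sketched with a
toy model.  Everything here is an assumption made for illustration: the
cfg_model and edge_model types and the *_model functions are not GCC
APIs, loop membership and the latch exception are ignored, and the
example graph uses only the edges of bb_4 .. bb_9 that are visible in
the dump quoted further down.

```c
#include <stdbool.h>

#define MAX_EDGES 32

struct edge_model { int src, dest; };

struct cfg_model
{
  int n_blocks;
  int n_edges;
  struct edge_model edges[MAX_EDGES];
};

static int
count_succs (const struct cfg_model *g, int bb)
{
  int n = 0;
  for (int e = 0; e < g->n_edges; e++)
    n += g->edges[e].src == bb;
  return n;
}

static int
count_preds (const struct cfg_model *g, int bb)
{
  int n = 0;
  for (int e = 0; e < g->n_edges; e++)
    n += g->edges[e].dest == bb;
  return n;
}

/* An edge is critical if its source has several successors and its
   destination has several predecessors.  */
static bool
edge_critical_p (const struct cfg_model *g, int e)
{
  return count_succs (g, g->edges[e].src) > 1
         && count_preds (g, g->edges[e].dest) > 1;
}

/* Route edge E through a fresh block that has one pred and one succ.  */
static void
split_edge_model (struct cfg_model *g, int e)
{
  int new_bb = g->n_blocks++;
  int old_dest = g->edges[e].dest;

  g->edges[e].dest = new_bb;
  g->edges[g->n_edges].src = new_bb;
  g->edges[g->n_edges].dest = old_dest;
  g->n_edges++;
}

/* Split every critical edge present on entry; return how many.  */
static int
split_critical_edges_model (struct cfg_model *g)
{
  int original_edges = g->n_edges, split = 0;

  for (int e = 0; e < original_edges && g->n_edges < MAX_EDGES; e++)
    if (edge_critical_p (g, e))
      {
        split_edge_model (g, e);
        split++;
      }
  return split;
}

/* Edges 4->5, 4->6, 5->7, 5->9, 6->7, 6->8.  */
int
demo_split_count (void)
{
  struct cfg_model g
    = { 10, 6, { {4, 5}, {4, 6}, {5, 7}, {5, 9}, {6, 7}, {6, 8} } };
  return split_critical_edges_model (&g);
}
```

In this model only 5->7 and 6->7 are critical.  After the split each of
them gets its own single-predecessor block, which gives the edge
predicate a place to be computed outside the block holding the PHI.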
>>>>>>
>>>>>> Can you please send a combined patch up to this one?  Looking at
>>>>>> the incremental diff is somewhat hard.  Thus a patch including all
>>>>>> patches from patch1 to this one.
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>>
>>>>>>> Thanks.
>>>>>>> Yuri.
>>>>>>>
>>>>>>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> Thanks for your reply!
>>>>>>>>>
>>>>>>>>> I didn't understand your point:
>>>>>>>>>
>>>>>>>>> Well, I don't mind splitting all critical edges unconditionally
>>>>>>>>>
>>>>>>>>> but you do it unconditionally in proposed patch.
>>>>>>>>
>>>>>>>> I don't mind means I am fine with it.
>>>>>>>>
>>>>>>>>> Also I assume that
>>>>>>>>> call of split_critical_edges() can break ssa. For example, we can
>>>>>>>>> split headers of loops, loop exit blocks etc.
>>>>>>>>
>>>>>>>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>>>>>>>> be surprised if so but that may be possible.
>>>>>>>>
>>>>>>>>> I prefer to do something
>>>>>>>>> more loop-specialized, e.g. call split_edge() for critical edges
>>>>>>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>>>>>>>> destination bb belongs to loop).
>>>>>>>>
>>>>>>>> That works for me as well but it is more complicated to implement.
>>>>>>>> Ideally you'd only split one edge if you find a block with only critical
>>>>>>>> predecessors (where we'd currently give up).  But note that this
>>>>>>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>>>>>>>> will change loop->num_nodes so we have to be more careful in
>>>>>>>> constructing the loop calling if_convertible_bb_p.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> Sorry that I forgot to delete debug dump from my fix.
>>>>>>>>>>> I have few questions about your comments.
>>>>>>>>>>>
>>>>>>>>>>> 1. You wrote :
>>>>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>>>>> path
>>>>>>>>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>>>>>>>>> predicate_extended_scalar_phi to one function?
>>>>>>>>>>> Please note that if additional flag was not set up (i.e.
>>>>>>>>>>> aggressive_if_conv is false) extended predication is required more
>>>>>>>>>>> compile time since it builds hash_map.
>>>>>>>>>>
>>>>>>>>>> Its compile-time complexity is reasonable enough even for
>>>>>>>>>> non-aggressive if-conversion.
>>>>>>>>>>
>>>>>>>>>>> 2. About critical edge splitting.
>>>>>>>>>>>
>>>>>>>>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>>>>>>>>> option only; (2) should we split all critical edges.
>>>>>>>>>>> Note that this leads to recomputing of topological order.
>>>>>>>>>>
>>>>>>>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>>>>>>>> do something like
>>>>>>>>>>
>>>>>>>>>> Index: gcc/tree-if-conv.c
>>>>>>>>>> ===================================================================
>>>>>>>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>>>>>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>>>>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>>>>>>>    if (number_of_loops (fun) <= 1)
>>>>>>>>>>      return 0;
>>>>>>>>>>
>>>>>>>>>> +  bool critical_edges_split_p = false;
>>>>>>>>>>    FOR_EACH_LOOP (loop, 0)
>>>>>>>>>>      if (flag_tree_loop_if_convert == 1
>>>>>>>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>>>>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>>>>>>>             && !loop->dont_vectorize))
>>>>>>>>>> -      todo |= tree_if_conversion (loop);
>>>>>>>>>> +      {
>>>>>>>>>> +       if (!critical_edges_split_p)
>>>>>>>>>> +         {
>>>>>>>>>> +           split_critical_edges ();
>>>>>>>>>> +           critical_edges_split_p = true;
>>>>>>>>>> +           todo |= TODO_cleanup_cfg;
>>>>>>>>>> +         }
>>>>>>>>>> +       todo |= tree_if_conversion (loop);
>>>>>>>>>> +      }
>>>>>>>>>>
>>>>>>>>>>  #ifdef ENABLE_CHECKING
>>>>>>>>>>    {
>>>>>>>>>>
>>>>>>>>>>> It is worth noting that in current implementation bb's with 2
>>>>>>>>>>> predecessors and both are on critical edges are accepted without
>>>>>>>>>>> additional option.
>>>>>>>>>>
>>>>>>>>>> Yes, I know.
>>>>>>>>>>
>>>>>>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>>>>>>>> to it and even fix the critical edge missed optimization with splitting
>>>>>>>>>> critical edges then I am all for that solution.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> Thanks ahead.
>>>>>>>>>>> Yuri.
>>>>>>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is updated patch2 with the following changes:
>>>>>>>>>>>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>>>>>>>>>>>> 2. Use only one function for extended predication -
>>>>>>>>>>>>> predicate_extended_scalar_phi.
>>>>>>>>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>>>>>>>>> blocks if it has 2 predecessors and
>>>>>>>>>>>>> both incoming edges are critical or it has more than 2 predecessors
>>>>>>>>>>>>> and at least one incoming edge
>>>>>>>>>>>>> is critical. This saved iterator can be used by extended phi predication.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is motivated test-case which explains this point.
>>>>>>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>>>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>>>>>>>> The problem phi is in bb-7:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>>>>>>>   {
>>>>>>>>>>>>>     <bb 5>:
>>>>>>>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>>>>     else
>>>>>>>>>>>>>       goto <bb 9>;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>>>>>>>   {
>>>>>>>>>>>>>     <bb 6>:
>>>>>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>>>>>       goto <bb 7>;
>>>>>>>>>>>>>     else
>>>>>>>>>>>>>       goto <bb 8>;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>>>>>>>   {
>>>>>>>>>>>>>     <bb 7>:
>>>>>>>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>     goto <bb 11>;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>
>>>>>>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>>>>>>>> restoring gsi in predicate_all_scalar_phis:
>>>>>>>>>>>>> #if 0
>>>>>>>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>>>>>>>  else
>>>>>>>>>>>>> #endif
>>>>>>>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>>>>>>>
>>>>>>>>>>>>> we will get ICE:
>>>>>>>>>>>>> t5.c: In function 'foo':
>>>>>>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>>>>>>>  void foo (int n)
>>>>>>>>>>>>>       ^
>>>>>>>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>>>>>>>> _52 = _1 & _3;
>>>>>>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>>>>>>>
>>>>>>>>>>>>> since predicate computations were inserted in bb_7.
>>>>>>>>>>>>
>>>>>>>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>>>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>>>>>>>> by insert_gimplified_predicates.
>>>>>>>>>>>>
>>>>>>>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>>>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>>>>>>>> but push those to e->dest which makes this really messy.
>>>>>>>>>>>>
>>>>>>>>>>>> Rather than having a separate phase where we insert all
>>>>>>>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>>>>>>>> predicating a PHI.
>>>>>>>>>>>>
>>>>>>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>>>>>>>> the printfs properly.
>>>>>>>>>>>>
>>>>>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>>>>>> paths.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm not at all happy with this code.  It may be the existing if-conv
>>>>>>>>>>>> code's fault but making it even worse is not an option.
>>>>>>>>>>>>
>>>>>>>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>>>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>>>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>>>>>>>> commit edge insertions before merging the blocks.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>> ChangeLog is
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>>>>>>>> statement iterator.
>>>>>>>>>>>>> (bb_insert_point): New function.
>>>>>>>>>>>>> (set_bb_insert_point): New function.
>>>>>>>>>>>>> (has_pred_critical_p): New function.
>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>>>>>>>> Change check that block containing reduction statement candidate
>>>>>>>>>>>>> is predecessor of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>>>>>>>> is_cond_scalar_reduction.
>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>>>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>>>>>>>>> than 2 predecessors and at least one incoming edge is critical.
>>>>>>>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>>>>>>>>> to save gsi before insertion of predicate computations. Set it to
>>>>>>>>>>>>> true for BB with 2 predecessors and critical incoming edges, or if the
>>>>>>>>>>>>> number of predecessors is greater than 2 and at least one incoming edge is
>>>>>>>>>>>>> critical.
>>>>>>>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>>>>>>>> Insert predicate computation of BB just after label if
>>>>>>>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>> Richard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>>>>>>>   tree predicate;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>>>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>>>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>>>>>>>>> works.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>>>>>>>> after the PHI we predicate.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>>>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>>>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>>>>>>>>>>>> I am also very sorry that I sent you a bad patch.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>>>>>>>>         res += 1;
>>>>>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>>>>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>>>>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurrences first
>>>>>>>>>>>>>>>>>> so your case gets just the condition for edge 5 and its inversion
>>>>>>>>>>>>>>>>>> used for edges 3 and 4 combined.
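
The grouping idea can be sketched as a standalone model.  This is not
the code of predicate_scalar_phi: the phi_arg_model type and the
function names are hypothetical, SSA values are replaced by plain
integers, and the edge predicates are assumed to be already evaluated
for one loop iteration.  Arguments are expected to be sorted so that
equal values are adjacent; the last group acts as the 'else' arm and
needs no condition of its own.

```c
#include <stdbool.h>

/* One PHI argument: the incoming value and its edge predicate.  */
struct phi_arg_model
{
  int value;
  bool pred;
};

/* Evaluate the PHI as the select chain
     c1 ? v1 : (c2 ? v2 : ... : v_last)
   where each c_k is the OR of the predicates of all edges carrying
   v_k.  ARGS must be sorted so that equal values are adjacent.  */
static int
predicate_phi_model (const struct phi_arg_model *args, int n)
{
  int i = 0;

  while (i < n)
    {
      int j = i;
      bool cond = false;

      /* Combine the predicates of all edges with the same value.  */
      while (j < n && args[j].value == args[i].value)
        cond |= args[j++].pred;

      /* The last group is the else arm; earlier groups need COND.  */
      if (j == n || cond)
        return args[i].value;
      i = j;
    }
  return 0;  /* Only reached for an empty argument list.  */
}

/* Models res_1 = PHI <res_15(3), res_15(4), res_10(5)> with the
   single-occurrence value sorted first; 10 and 15 stand in for the
   values of res_10 and res_15.  */
int
demo_phi (bool edge5_taken)
{
  struct phi_arg_model args[] = {
    { 10, edge5_taken },    /* res_10 (5) */
    { 15, !edge5_taken },   /* res_15 (3) */
    { 15, false },          /* res_15 (4) */
  };
  return predicate_phi_model (args, 3);
}
```

With the single-occurrence value first, the three-argument PHI needs
only the predicate of edge 5: the model returns the stand-in for res_10
when that edge is taken and the stand-in for res_15 otherwise.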
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containing phi nodes have
>>>>>>>>>>>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>>>>>>>>>> insertion point will be "after labels". Note also that phi result can
>>>>>>>>>>>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>>>>>>>>>>>> block end.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to such issues,
>>>>>>>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect such phi.
>>>>>>>>>>>>>>>>>>>>> 3. The original algorithm for PHI predication assumed that at least
>>>>>>>>>>>>>>>>>>>>> one incoming edge of a block containing a PHI is not critical. This
>>>>>>>>>>>>>>>>>>>>> guarantees that all computations related to the predicate of a normal
>>>>>>>>>>>>>>>>>>>>> edge are already inserted above this block, so the code for PHI
>>>>>>>>>>>>>>>>>>>>> predication can be inserted at the beginning of the block. But this
>>>>>>>>>>>>>>>>>>>>> is not true for critical edges, for which the predicate computations
>>>>>>>>>>>>>>>>>>>>> are in the block where the code for PHI predication must be inserted.
>>>>>>>>>>>>>>>>>>>>> So a new function, find_insertion_point, is introduced. It simply
>>>>>>>>>>>>>>>>>>>>> finds the last statement in the block that defines predicates
>>>>>>>>>>>>>>>>>>>>> corresponding to all incoming edges and inserts the PHI predication
>>>>>>>>>>>>>>>>>>>>> code after it (with some minor exceptions).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>>>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>>>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course needs
>>>>>>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb to have more than 2 predecessors if
>>>>>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>>>>>>>>> true (which corresponds to argument BEFORE) to call of
>>>>>>>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.
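
The suggestion quoted above, to sort an array of (edge, value) pairs by value and treat equal values as one group, can be sketched outside of GCC roughly as follows. This is only an illustration under stated assumptions: struct phi_arg, cmp_phi_arg and group_phi_args are hypothetical names, plain integers stand in for edges and SSA values, and nothing here is taken from tree-if-conv.c or the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one PHI argument: the index of the incoming
   edge and an integer id naming the value that arrives on that edge.  */
struct phi_arg
{
  int edge;
  int value;
};

/* Order by value first, then by edge index, so the result does not
   depend on how qsort treats equal elements.  */
static int
cmp_phi_arg (const void *pa, const void *pb)
{
  const struct phi_arg *a = (const struct phi_arg *) pa;
  const struct phi_arg *b = (const struct phi_arg *) pb;

  if (a->value != b->value)
    return a->value < b->value ? -1 : 1;
  return (a->edge > b->edge) - (a->edge < b->edge);
}

/* Sort ARGS by value and return the number of distinct values.  After
   the sort, each run of equal values is one group; a real implementation
   could give such a group a single predicate (the OR of the predicates
   of its edges) instead of one predicate per edge.  */
static size_t
group_phi_args (struct phi_arg *args, size_t n)
{
  size_t groups = 0;

  assert (args != NULL || n == 0);
  if (n == 0)
    return 0;

  qsort (args, n, sizeof (args[0]), cmp_phi_arg);
  for (size_t i = 0; i < n; i++)
    if (i == 0 || args[i].value != args[i - 1].value)
      groups++;
  return groups;
}
```

For a PHI shaped like PHI <a, a, b, c, c>, the five arguments fall into three groups, so only three distinct values need to be selected between rather than five.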


Thread overview: 22+ messages
2014-11-12 13:36 [PATCH 2/3] Extended if-conversion Yuri Rumyantsev
2014-11-28 12:46 ` Richard Biener
2014-12-01 15:53   ` Yuri Rumyantsev
2014-12-02 13:28     ` Richard Biener
2014-12-02 15:29       ` Yuri Rumyantsev
2014-12-04 12:41         ` Richard Biener
2014-12-04 13:15           ` Yuri Rumyantsev
2014-12-04 13:37             ` Richard Biener
2014-12-09 13:11               ` Yuri Rumyantsev
2014-12-09 15:21                 ` Richard Biener
2014-12-10 10:54                   ` Yuri Rumyantsev
2014-12-10 14:31                     ` Richard Biener
2014-12-10 15:22                       ` Yuri Rumyantsev
2014-12-11  8:59                         ` Richard Biener
2014-12-16 15:16                           ` Yuri Rumyantsev
2014-12-17 15:45                             ` Richard Biener
2014-12-18 13:48                               ` Yuri Rumyantsev
2014-12-19 11:46                                 ` Richard Biener
2014-12-22 14:49                                   ` Yuri Rumyantsev
2015-01-09 12:31                                     ` Richard Biener
2015-01-14 13:33                                       ` Yuri Rumyantsev
2015-01-14 15:00                                         ` Richard Biener
