public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 1/2]middle-end ifcvt: Reduce comparisons on conditionals by tracking truths [PR109154]
@ 2023-07-07  9:37 Tamar Christina
  2023-07-07  9:37 ` [PATCH 2/2]middle-end ifcvt: Sort PHI arguments not only occurrences but also complexity [PR109154] Tamar Christina
  2023-07-11 10:59 ` [PATCH 1/2]middle-end ifcvt: Reduce comparisons on conditionals by tracking truths [PR109154] Richard Biener
  0 siblings, 2 replies; 4+ messages in thread
From: Tamar Christina @ 2023-07-07  9:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 8914 bytes --]

Hi All,

Following on from Jakub's patch in g:de0ee9d14165eebb3d31c84e98260c05c3b33acb
these two patches finishes the work fixing the regression and improves codegen.

As explained in that commit, ifconvert sorts PHI args in increasing number of
occurrences in order to reduce the number of comparisons done while
traversing the tree.

The remaining task that this patch fixes is dealing with the long chain of
comparisons that can be created from phi nodes, particularly when they share
any common successor (classical example is a diamond node).

on a PHI-node the true and else branches carry a condition, true will
carry `a` and false `~a`.  The issue is that at the moment GCC tests both `a`
and `~a` when the phi node has more than 2 arguments. Clearly this isn't
needed.  The deeper the nesting of phi nodes the larger the repetition.

As an example, for

foo (int *f, int d, int e)
{
  for (int i = 0; i < 1024; i++)
    {
      int a = f[i];
      int t;
      if (a < 0)
	t = 1;
      else if (a < e)
	t = 1 - a * d;
      else
	t = 0;
      f[i] = t;
    }
}

after Jakub's patch we generate:

  _7 = a_10 < 0;
  _21 = a_10 >= 0;
  _22 = a_10 < e_11(D);
  _23 = _21 & _22;
  _ifc__42 = _23 ? t_13 : 0;
  t_6 = _7 ? 1 : _ifc__42

but while better than before it is still inefficient, since in the false
branch, where we know ~_7 is true, we still test _21.

This leads to superfluous tests for every diamond node.  After this patch we
generate

 _7 = a_10 < 0;
 _22 = a_10 < e_11(D);
 _ifc__42 = _22 ? t_13 : 0;
 t_6 = _7 ? 1 : _ifc__42;

Which correctly elides the test of _21.  This is done by borrowing the
vectorizer's helper functions to limit predicate mask usages.  Ifcvt will chain
conditionals on the false edge (unless specifically inverted) so this patch on
creating cond a ? b : c, will register ~a when traversing c.  If c is a
conditional then c will be simplified to the smaller possible predicate given
the assumptions we already know to be true.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Not sure how to write a non-fragile testcase for this as the
conditionals chosen depends on threading etc. Any Suggestions?

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/109154
	* tree-if-conv.cc (gen_simplified_condition,
	gen_phi_nest_statement): New.
	(gen_phi_arg_condition, predicate_scalar_phi): Use it.

--- inline copy of patch -- 
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index e342532a343a3c066142adeec5fdfaf736a653e5..16b36dd8b0226f796c1a3fc6d45a9059385e812b 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1870,12 +1870,44 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
   return rhs;
 }
 
+/* Generate a simplified conditional.  */
+
+static tree
+gen_simplified_condition (tree cond, scalar_cond_masked_set_type &cond_set)
+{
+  /* Check if the value is already live in a previous branch.  This resolves
+     nested conditionals from diamond PHI reductions.  */
+  if (TREE_CODE (cond) == SSA_NAME)
+    {
+      gimple *stmt = SSA_NAME_DEF_STMT (cond);
+      gassign *assign = NULL;
+      if ((assign = as_a <gassign *> (stmt))
+	   && gimple_assign_rhs_code (assign) == BIT_AND_EXPR)
+	{
+	  tree arg1 = gimple_assign_rhs1 (assign);
+	  tree arg2 = gimple_assign_rhs2 (assign);
+	  if (cond_set.contains ({ arg1, 1 }))
+	    arg1 = boolean_true_node;
+	  else
+	    arg1 = gen_simplified_condition (arg1, cond_set);
+
+	  if (cond_set.contains ({ arg2, 1 }))
+	    arg2 = boolean_true_node;
+	  else
+	    arg2 = gen_simplified_condition (arg2, cond_set);
+
+	  cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, arg1, arg2);
+	}
+    }
+  return cond;
+}
+
 /* Produce condition for all occurrences of ARG in PHI node.  Set *INVERT
    as to whether the condition is inverted.  */
 
 static tree
-gen_phi_arg_condition (gphi *phi, vec<int> *occur,
-		       gimple_stmt_iterator *gsi, bool *invert)
+gen_phi_arg_condition (gphi *phi, vec<int> *occur, gimple_stmt_iterator *gsi,
+		       scalar_cond_masked_set_type &cond_set, bool *invert)
 {
   int len;
   int i;
@@ -1902,6 +1934,8 @@ gen_phi_arg_condition (gphi *phi, vec<int> *occur,
 	  c = TREE_OPERAND (c, 0);
 	  *invert = true;
 	}
+
+      c = gen_simplified_condition (c, cond_set);
       c = force_gimple_operand_gsi (gsi, unshare_expr (c),
 				    true, NULL_TREE, true, GSI_SAME_STMT);
       if (cond != NULL_TREE)
@@ -1913,11 +1947,79 @@ gen_phi_arg_condition (gphi *phi, vec<int> *occur,
 	}
       else
 	cond = c;
+
+      /* Register the new possibly simplified conditional.  When more than 2
+	 entries in a phi node we chain entries in the false branch, so the
+	 inverted condition is active.  */
+      scalar_cond_masked_key pred_cond ({ cond, 1 });
+      if (!invert)
+	pred_cond.inverted_p = !pred_cond.inverted_p;
+      cond_set.add (pred_cond);
     }
   gcc_assert (cond != NULL_TREE);
   return cond;
 }
 
+/* Create the smallest nested conditional possible.  On pre-order we record
+   which conditionals are live, and on post-order rewrite the chain by removing
+   already active conditions.
+
+   As an example we simplify:
+
+  _7 = a_10 < 0;
+  _21 = a_10 >= 0;
+  _22 = a_10 < e_11(D);
+  _23 = _21 & _22;
+  _ifc__42 = _23 ? t_13 : 0;
+  t_6 = _7 ? 1 : _ifc__42
+
+  into
+
+  _7 = a_10 < 0;
+  _22 = a_10 < e_11(D);
+  _ifc__42 = _22 ? t_13 : 0;
+  t_6 = _7 ? 1 : _ifc__42;
+
+  which produces better code.  */
+
+static tree
+gen_phi_nest_statement (gphi *phi, gimple_stmt_iterator *gsi,
+			scalar_cond_masked_set_type &cond_set, tree type,
+			hash_map<tree_operand_hash, auto_vec<int>> &phi_arg_map,
+			gimple **res_stmt, tree lhs0, vec<tree> &args,
+			unsigned idx)
+{
+  if (idx == args.length ())
+    return args[idx - 1];
+
+  vec<int> *indexes = phi_arg_map.get (args[idx - 1]);
+  bool invert;
+  tree cond = gen_phi_arg_condition (phi, indexes, gsi, cond_set, &invert);
+  tree arg1 = gen_phi_nest_statement (phi, gsi, cond_set, type, phi_arg_map,
+				      res_stmt, lhs0, args, idx + 1);
+
+  unsigned prev = idx;
+  unsigned curr = prev - 1;
+  tree arg0 = args[curr];
+  tree rhs, lhs;
+  if (idx > 1)
+    lhs = make_temp_ssa_name (type, NULL, "_ifc_");
+  else
+    lhs = lhs0;
+
+  if (invert)
+    rhs = fold_build_cond_expr (type, unshare_expr (cond),
+				arg1, arg0);
+  else
+    rhs = fold_build_cond_expr (type, unshare_expr (cond),
+				arg0, arg1);
+  gassign *new_stmt = gimple_build_assign (lhs, rhs);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  update_stmt (new_stmt);
+  *res_stmt = new_stmt;
+  return lhs;
+}
+
 /* Replace a scalar PHI node with a COND_EXPR using COND as condition.
    This routine can handle PHI nodes with more than two arguments.
 
@@ -1968,6 +2070,8 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
     }
 
   bb = gimple_bb (phi);
+  /* Keep track of conditionals already seen.  */
+  scalar_cond_masked_set_type cond_set;
   if (EDGE_COUNT (bb->preds) == 2)
     {
       /* Predicate ordinary PHI node with 2 arguments.  */
@@ -1976,6 +2080,7 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
       first_edge = EDGE_PRED (bb, 0);
       second_edge = EDGE_PRED (bb, 1);
       cond = bb_predicate (first_edge->src);
+      cond_set.add ({ cond, 1 });
       if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
 	std::swap (first_edge, second_edge);
       if (EDGE_COUNT (first_edge->src->succs) > 1)
@@ -1988,7 +2093,9 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
 	}
       else
 	cond = bb_predicate (first_edge->src);
+
       /* Gimplify the condition to a valid cond-expr conditonal operand.  */
+      cond = gen_simplified_condition (cond, cond_set);
       cond = force_gimple_operand_gsi (gsi, unshare_expr (cond), true,
 				       NULL_TREE, true, GSI_SAME_STMT);
       true_bb = first_edge->src;
@@ -2110,31 +2217,9 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
   else
     {
       /* Common case.  */
-      vec<int> *indexes;
       tree type = TREE_TYPE (gimple_phi_result (phi));
-      tree lhs;
-      arg1 = args[args_len - 1];
-      for (i = args_len - 1; i > 0; i--)
-	{
-	  arg0 = args[i - 1];
-	  indexes = phi_arg_map.get (args[i - 1]);
-	  if (i != 1)
-	    lhs = make_temp_ssa_name (type, NULL, "_ifc_");
-	  else
-	    lhs = res;
-	  bool invert;
-	  cond = gen_phi_arg_condition (phi, indexes, gsi, &invert);
-	  if (invert)
-	    rhs = fold_build_cond_expr (type, unshare_expr (cond),
-					arg1, arg0);
-	  else
-	    rhs = fold_build_cond_expr (type, unshare_expr (cond),
-					arg0, arg1);
-	  new_stmt = gimple_build_assign (lhs, rhs);
-	  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
-	  update_stmt (new_stmt);
-	  arg1 = lhs;
-	}
+      gen_phi_nest_statement (phi, gsi, cond_set, type, phi_arg_map,
+			      &new_stmt, res, args, 1);
     }
 
   if (dump_file && (dump_flags & TDF_DETAILS))




-- 

[-- Attachment #2: rb17550.patch --]
[-- Type: text/plain, Size: 6506 bytes --]

diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index e342532a343a3c066142adeec5fdfaf736a653e5..16b36dd8b0226f796c1a3fc6d45a9059385e812b 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1870,12 +1870,44 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
   return rhs;
 }
 
+/* Generate a simplified conditional.  */
+
+static tree
+gen_simplified_condition (tree cond, scalar_cond_masked_set_type &cond_set)
+{
+  /* Check if the value is already live in a previous branch.  This resolves
+     nested conditionals from diamond PHI reductions.  */
+  if (TREE_CODE (cond) == SSA_NAME)
+    {
+      gimple *stmt = SSA_NAME_DEF_STMT (cond);
+      gassign *assign = NULL;
+      if ((assign = as_a <gassign *> (stmt))
+	   && gimple_assign_rhs_code (assign) == BIT_AND_EXPR)
+	{
+	  tree arg1 = gimple_assign_rhs1 (assign);
+	  tree arg2 = gimple_assign_rhs2 (assign);
+	  if (cond_set.contains ({ arg1, 1 }))
+	    arg1 = boolean_true_node;
+	  else
+	    arg1 = gen_simplified_condition (arg1, cond_set);
+
+	  if (cond_set.contains ({ arg2, 1 }))
+	    arg2 = boolean_true_node;
+	  else
+	    arg2 = gen_simplified_condition (arg2, cond_set);
+
+	  cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, arg1, arg2);
+	}
+    }
+  return cond;
+}
+
 /* Produce condition for all occurrences of ARG in PHI node.  Set *INVERT
    as to whether the condition is inverted.  */
 
 static tree
-gen_phi_arg_condition (gphi *phi, vec<int> *occur,
-		       gimple_stmt_iterator *gsi, bool *invert)
+gen_phi_arg_condition (gphi *phi, vec<int> *occur, gimple_stmt_iterator *gsi,
+		       scalar_cond_masked_set_type &cond_set, bool *invert)
 {
   int len;
   int i;
@@ -1902,6 +1934,8 @@ gen_phi_arg_condition (gphi *phi, vec<int> *occur,
 	  c = TREE_OPERAND (c, 0);
 	  *invert = true;
 	}
+
+      c = gen_simplified_condition (c, cond_set);
       c = force_gimple_operand_gsi (gsi, unshare_expr (c),
 				    true, NULL_TREE, true, GSI_SAME_STMT);
       if (cond != NULL_TREE)
@@ -1913,11 +1947,79 @@ gen_phi_arg_condition (gphi *phi, vec<int> *occur,
 	}
       else
 	cond = c;
+
+      /* Register the new possibly simplified conditional.  When more than 2
+	 entries in a phi node we chain entries in the false branch, so the
+	 inverted condition is active.  */
+      scalar_cond_masked_key pred_cond ({ cond, 1 });
+      if (!invert)
+	pred_cond.inverted_p = !pred_cond.inverted_p;
+      cond_set.add (pred_cond);
     }
   gcc_assert (cond != NULL_TREE);
   return cond;
 }
 
+/* Create the smallest nested conditional possible.  On pre-order we record
+   which conditionals are live, and on post-order rewrite the chain by removing
+   already active conditions.
+
+   As an example we simplify:
+
+  _7 = a_10 < 0;
+  _21 = a_10 >= 0;
+  _22 = a_10 < e_11(D);
+  _23 = _21 & _22;
+  _ifc__42 = _23 ? t_13 : 0;
+  t_6 = _7 ? 1 : _ifc__42
+
+  into
+
+  _7 = a_10 < 0;
+  _22 = a_10 < e_11(D);
+  _ifc__42 = _22 ? t_13 : 0;
+  t_6 = _7 ? 1 : _ifc__42;
+
+  which produces better code.  */
+
+static tree
+gen_phi_nest_statement (gphi *phi, gimple_stmt_iterator *gsi,
+			scalar_cond_masked_set_type &cond_set, tree type,
+			hash_map<tree_operand_hash, auto_vec<int>> &phi_arg_map,
+			gimple **res_stmt, tree lhs0, vec<tree> &args,
+			unsigned idx)
+{
+  if (idx == args.length ())
+    return args[idx - 1];
+
+  vec<int> *indexes = phi_arg_map.get (args[idx - 1]);
+  bool invert;
+  tree cond = gen_phi_arg_condition (phi, indexes, gsi, cond_set, &invert);
+  tree arg1 = gen_phi_nest_statement (phi, gsi, cond_set, type, phi_arg_map,
+				      res_stmt, lhs0, args, idx + 1);
+
+  unsigned prev = idx;
+  unsigned curr = prev - 1;
+  tree arg0 = args[curr];
+  tree rhs, lhs;
+  if (idx > 1)
+    lhs = make_temp_ssa_name (type, NULL, "_ifc_");
+  else
+    lhs = lhs0;
+
+  if (invert)
+    rhs = fold_build_cond_expr (type, unshare_expr (cond),
+				arg1, arg0);
+  else
+    rhs = fold_build_cond_expr (type, unshare_expr (cond),
+				arg0, arg1);
+  gassign *new_stmt = gimple_build_assign (lhs, rhs);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  update_stmt (new_stmt);
+  *res_stmt = new_stmt;
+  return lhs;
+}
+
 /* Replace a scalar PHI node with a COND_EXPR using COND as condition.
    This routine can handle PHI nodes with more than two arguments.
 
@@ -1968,6 +2070,8 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
     }
 
   bb = gimple_bb (phi);
+  /* Keep track of conditionals already seen.  */
+  scalar_cond_masked_set_type cond_set;
   if (EDGE_COUNT (bb->preds) == 2)
     {
       /* Predicate ordinary PHI node with 2 arguments.  */
@@ -1976,6 +2080,7 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
       first_edge = EDGE_PRED (bb, 0);
       second_edge = EDGE_PRED (bb, 1);
       cond = bb_predicate (first_edge->src);
+      cond_set.add ({ cond, 1 });
       if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
 	std::swap (first_edge, second_edge);
       if (EDGE_COUNT (first_edge->src->succs) > 1)
@@ -1988,7 +2093,9 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
 	}
       else
 	cond = bb_predicate (first_edge->src);
+
       /* Gimplify the condition to a valid cond-expr conditonal operand.  */
+      cond = gen_simplified_condition (cond, cond_set);
       cond = force_gimple_operand_gsi (gsi, unshare_expr (cond), true,
 				       NULL_TREE, true, GSI_SAME_STMT);
       true_bb = first_edge->src;
@@ -2110,31 +2217,9 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
   else
     {
       /* Common case.  */
-      vec<int> *indexes;
       tree type = TREE_TYPE (gimple_phi_result (phi));
-      tree lhs;
-      arg1 = args[args_len - 1];
-      for (i = args_len - 1; i > 0; i--)
-	{
-	  arg0 = args[i - 1];
-	  indexes = phi_arg_map.get (args[i - 1]);
-	  if (i != 1)
-	    lhs = make_temp_ssa_name (type, NULL, "_ifc_");
-	  else
-	    lhs = res;
-	  bool invert;
-	  cond = gen_phi_arg_condition (phi, indexes, gsi, &invert);
-	  if (invert)
-	    rhs = fold_build_cond_expr (type, unshare_expr (cond),
-					arg1, arg0);
-	  else
-	    rhs = fold_build_cond_expr (type, unshare_expr (cond),
-					arg0, arg1);
-	  new_stmt = gimple_build_assign (lhs, rhs);
-	  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
-	  update_stmt (new_stmt);
-	  arg1 = lhs;
-	}
+      gen_phi_nest_statement (phi, gsi, cond_set, type, phi_arg_map,
+			      &new_stmt, res, args, 1);
     }
 
   if (dump_file && (dump_flags & TDF_DETAILS))




^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 2/2]middle-end ifcvt: Sort PHI arguments not only occurrences but also complexity [PR109154]
  2023-07-07  9:37 [PATCH 1/2]middle-end ifcvt: Reduce comparisons on conditionals by tracking truths [PR109154] Tamar Christina
@ 2023-07-07  9:37 ` Tamar Christina
  2023-07-11 10:59   ` Richard Biener
  2023-07-11 10:59 ` [PATCH 1/2]middle-end ifcvt: Reduce comparisons on conditionals by tracking truths [PR109154] Richard Biener
  1 sibling, 1 reply; 4+ messages in thread
From: Tamar Christina @ 2023-07-07  9:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw

[-- Attachment #1: Type: text/plain, Size: 10112 bytes --]

Hi All,

This patch builds on the previous patch by fixing another issue with the
way ifcvt currently picks which branches to test.

The issue with the current implementation is while it sorts for
occurrences of the argument, it doesn't check for complexity of the arguments.

As an example:

  <bb 15> [local count: 528603100]:
  ...
  if (distbb_75 >= 0.0)
    goto <bb 17>; [59.00%]
  else
    goto <bb 16>; [41.00%]

  <bb 16> [local count: 216727269]:
  ...
  goto <bb 19>; [100.00%]

  <bb 17> [local count: 311875831]:
  ...
  if (distbb_75 < iftmp.0_98)
    goto <bb 18>; [20.00%]
  else
    goto <bb 19>; [80.00%]

  <bb 18> [local count: 62375167]:
  ...

  <bb 19> [local count: 528603100]:
  # prephitmp_175 = PHI <_173(18), 0.0(17), _174(16)>

All tree arguments to the PHI have the same number of occurrences, namely 1,
however it makes a big difference which comparison we test first.

Sorting only on occurrences we'll pick the compares coming from BB 18 and BB 17,
This means we end up generating 4 comparisons, while 2 would have been enough.

By keeping track of the "complexity" of the COND in each BB, (i.e. the number
of comparisons needed to traverse from the start [BB 15] to end [BB 19]) and
using a key tuple of <occurrences, complexity> we end up selecting the compare
from BB 16 and BB 18 first.  BB 16 only requires 1 compare, and BB 18, after we
test BB 16 also only requires one additional compare.  This change paired with
the one previous above results in the optimal 2 compares.

For deep nesting, i.e. for

...
  _79 = vr_15 > 20;
  _80 = _68 & _79;
  _82 = vr_15 <= 20;
  _83 = _68 & _82;
  _84 = vr_15 < -20;
  _85 = _73 & _84;
  _87 = vr_15 >= -20;
  _88 = _73 & _87;
  _ifc__111 = _55 ? 10 : 12;
  _ifc__112 = _70 ? 7 : _ifc__111;
  _ifc__113 = _85 ? 8 : _ifc__112;
  _ifc__114 = _88 ? 9 : _ifc__113;
  _ifc__115 = _45 ? 1 : _ifc__114;
  _ifc__116 = _63 ? 3 : _ifc__115;
  _ifc__117 = _65 ? 4 : _ifc__116;
  _ifc__118 = _83 ? 6 : _ifc__117;
  _ifc__119 = _60 ? 2 : _ifc__118;
  _ifc__120 = _43 ? 13 : _ifc__119;
  _ifc__121 = _75 ? 11 : _ifc__120;
  vw_1 = _80 ? 5 : _ifc__121;

Most of the comparisons are still needed because the chain of
occurrences to not negate eachother. i.e. _80 is _73 & vr_15 >= -20 and
_85 is _73 & vr_15 < -20.  clearly given _73 needs to be true in both branches,
the only additional test needed is on vr_15, where the one test is the negation
of the other.  So we don't need to do the comparison of _73 twice.

The changes in the patch reduces the overall number of compares by one, but has
a bigger effect on the dependency chain.

Previously we would generate 5 instructions chain:

	cmple   p7.s, p4/z, z29.s, z30.s
	cmpne   p7.s, p7/z, z29.s, #0
	cmple   p6.s, p7/z, z31.s, z30.s
	cmpge   p6.s, p6/z, z27.s, z25.s
	cmplt   p15.s, p6/z, z28.s, z21.s

as the longest chain.  With this patch we generate 3:

	cmple   p7.s, p3/z, z27.s, z30.s
	cmpne   p7.s, p7/z, z27.s, #0
	cmpgt   p7.s, p7/z, z31.s, z30.s

and I don't think (x <= y) && (x != 0) && (z > y) can be reduced further.

Bootstrapped and Regtested on aarch64-none-linux-gnu and no issues.

Not sure how to write a non-fragile testcase for this as the
conditionals chosen depends on threading etc. Any Suggestions?

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/109154
	* tree-if-conv.cc (INCLUDE_ALGORITHM): Include.
	(struct bb_predicate): Add no_predicate_stmts.
	(set_bb_predicate): Increase predicate count.
	(set_bb_predicate_gimplified_stmts): Conditionally initialize
	no_predicate_stmts.
	(get_bb_num_predicate_stmts): New.
	(init_bb_predicate): Initialzie no_predicate_stmts.
	(release_bb_predicate): Cleanup no_predicate_stmts.
	(insert_gimplified_predicates): Preserve no_predicate_stmts.

--- inline copy of patch -- 
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 16b36dd8b0226f796c1a3fc6d45a9059385e812b..0ed50d99c46f99a4d1ea0e827ee2b2a3f494b2da 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -80,6 +80,7 @@ along with GCC; see the file COPYING3.  If not see
      <L18>:;
 */
 
+#define INCLUDE_ALGORITHM
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -231,6 +232,10 @@ struct bb_predicate {
      recorded here, in order to avoid the duplication of computations
      that occur in previous conditions.  See PR44483.  */
   gimple_seq predicate_gimplified_stmts;
+
+  /* Records the number of statements recorded into
+     PREDICATE_GIMPLIFIED_STMTS.   */
+  unsigned no_predicate_stmts;
 };
 
 /* Returns true when the basic block BB has a predicate.  */
@@ -254,10 +259,16 @@ bb_predicate (basic_block bb)
 static inline void
 set_bb_predicate (basic_block bb, tree cond)
 {
+  auto aux = (struct bb_predicate *) bb->aux;
   gcc_assert ((TREE_CODE (cond) == TRUTH_NOT_EXPR
 	       && is_gimple_val (TREE_OPERAND (cond, 0)))
 	      || is_gimple_val (cond));
-  ((struct bb_predicate *) bb->aux)->predicate = cond;
+  aux->predicate = cond;
+  aux->no_predicate_stmts++;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Recording block %d value %d\n", bb->index,
+	     aux->no_predicate_stmts);
 }
 
 /* Returns the sequence of statements of the gimplification of the
@@ -270,12 +281,16 @@ bb_predicate_gimplified_stmts (basic_block bb)
 }
 
 /* Sets the sequence of statements STMTS of the gimplification of the
-   predicate for basic block BB.  */
+   predicate for basic block BB.  If PRESERVE_COUNTS then don't clear the predicate
+   counts.  */
 
 static inline void
-set_bb_predicate_gimplified_stmts (basic_block bb, gimple_seq stmts)
+set_bb_predicate_gimplified_stmts (basic_block bb, gimple_seq stmts,
+				   bool preserve_counts)
 {
   ((struct bb_predicate *) bb->aux)->predicate_gimplified_stmts = stmts;
+  if (stmts == NULL && !preserve_counts)
+    ((struct bb_predicate *) bb->aux)->no_predicate_stmts = 0;
 }
 
 /* Adds the sequence of statements STMTS to the sequence of statements
@@ -296,18 +311,28 @@ add_bb_predicate_gimplified_stmts (basic_block bb, gimple_seq stmts)
       gimple *stmt = gsi_stmt (gsi);
       delink_stmt_imm_use (stmt);
       gimple_set_modified (stmt, true);
+      ((struct bb_predicate *) bb->aux)->no_predicate_stmts++;
     }
   gimple_seq_add_seq_without_update
     (&(((struct bb_predicate *) bb->aux)->predicate_gimplified_stmts), stmts);
 }
 
+/* Return the number of statements the predicate of the basic block consists
+   of.  */
+
+static inline unsigned
+get_bb_num_predicate_stmts (basic_block bb)
+{
+  return ((struct bb_predicate *) bb->aux)->no_predicate_stmts;
+}
+
 /* Initializes to TRUE the predicate of basic block BB.  */
 
 static inline void
 init_bb_predicate (basic_block bb)
 {
   bb->aux = XNEW (struct bb_predicate);
-  set_bb_predicate_gimplified_stmts (bb, NULL);
+  set_bb_predicate_gimplified_stmts (bb, NULL, false);
   set_bb_predicate (bb, boolean_true_node);
 }
 
@@ -327,7 +352,7 @@ release_bb_predicate (basic_block bb)
 
       /* Discard them.  */
       gimple_seq_discard (stmts);
-      set_bb_predicate_gimplified_stmts (bb, NULL);
+      set_bb_predicate_gimplified_stmts (bb, NULL, false);
     }
 }
 
@@ -2041,7 +2066,6 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
   tree rhs, res, arg0, arg1, op0, op1, scev;
   tree cond;
   unsigned int index0;
-  unsigned int max, args_len;
   edge e;
   basic_block bb;
   unsigned int i;
@@ -2145,7 +2169,6 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
   bool swap = false;
   hash_map<tree_operand_hash, auto_vec<int> > phi_arg_map;
   unsigned int num_args = gimple_phi_num_args (phi);
-  int max_ind = -1;
   /* Vector of different PHI argument values.  */
   auto_vec<tree> args (num_args);
 
@@ -2160,28 +2183,38 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
       phi_arg_map.get_or_insert (arg).safe_push (i);
     }
 
-  /* Determine element with max number of occurrences.  */
-  max_ind = -1;
-  max = 1;
-  args_len = args.length ();
-  for (i = 0; i < args_len; i++)
+  /* Determine element with max number of occurrences and complexity.  Looking at only
+     number of occurrences as a measure for complexity isn't enough as all usages can
+     be unique but the comparisons to reach the PHI node differ per branch.  */
+  typedef std::pair <tree, std::pair <unsigned, unsigned>> ArgEntry;
+  auto_vec<ArgEntry> argsKV;
+  for (i = 0; i < args.length (); i++)
     {
-      unsigned int len;
-      if ((len = phi_arg_map.get (args[i])->length ()) > max)
+      unsigned int len = 0;
+      for (int index : phi_arg_map.get (args[i]))
 	{
-	  max_ind = (int) i;
-	  max = len;
+	  edge e = gimple_phi_arg_edge (phi, index);
+	  len += get_bb_num_predicate_stmts (e->src);
 	}
+
+      unsigned occur = phi_arg_map.get (args[i])->length ();
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Ranking %d as len=%d, idx=%d\n", i, len, occur);
+      argsKV.safe_push ({ args[i], { len, occur }});
     }
 
-  /* Put element with max number of occurences to the end of ARGS.  */
-  if (max_ind != -1 && max_ind + 1 != (int) args_len)
-    std::swap (args[args_len - 1], args[max_ind]);
+  /* Sort elements based on rankings ARGS.  */
+  std::sort(argsKV.begin(), argsKV.end(), [](ArgEntry &left, ArgEntry &right) {
+    return left.second < right.second;
+  });
+
+  for (i = 0; i < args.length (); i++)
+    args[i] = argsKV[i].first;
 
   /* Handle one special case when number of arguments with different values
      is equal 2 and one argument has the only occurrence.  Such PHI can be
      handled as if would have only 2 arguments.  */
-  if (args_len == 2 && phi_arg_map.get (args[0])->length () == 1)
+  if (args.length () == 2 && phi_arg_map.get (args[0])->length () == 1)
     {
       vec<int> *indexes;
       indexes = phi_arg_map.get (args[0]);
@@ -2317,7 +2350,7 @@ insert_gimplified_predicates (loop_p loop)
 	    }
 
 	  /* Once the sequence is code generated, set it to NULL.  */
-	  set_bb_predicate_gimplified_stmts (bb, NULL);
+	  set_bb_predicate_gimplified_stmts (bb, NULL, true);
 	}
     }
 }




-- 

[-- Attachment #2: rb17551.patch --]
[-- Type: text/plain, Size: 6322 bytes --]

diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 16b36dd8b0226f796c1a3fc6d45a9059385e812b..0ed50d99c46f99a4d1ea0e827ee2b2a3f494b2da 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -80,6 +80,7 @@ along with GCC; see the file COPYING3.  If not see
      <L18>:;
 */
 
+#define INCLUDE_ALGORITHM
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -231,6 +232,10 @@ struct bb_predicate {
      recorded here, in order to avoid the duplication of computations
      that occur in previous conditions.  See PR44483.  */
   gimple_seq predicate_gimplified_stmts;
+
+  /* Records the number of statements recorded into
+     PREDICATE_GIMPLIFIED_STMTS.   */
+  unsigned no_predicate_stmts;
 };
 
 /* Returns true when the basic block BB has a predicate.  */
@@ -254,10 +259,16 @@ bb_predicate (basic_block bb)
 static inline void
 set_bb_predicate (basic_block bb, tree cond)
 {
+  auto aux = (struct bb_predicate *) bb->aux;
   gcc_assert ((TREE_CODE (cond) == TRUTH_NOT_EXPR
 	       && is_gimple_val (TREE_OPERAND (cond, 0)))
 	      || is_gimple_val (cond));
-  ((struct bb_predicate *) bb->aux)->predicate = cond;
+  aux->predicate = cond;
+  aux->no_predicate_stmts++;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Recording block %d value %d\n", bb->index,
+	     aux->no_predicate_stmts);
 }
 
 /* Returns the sequence of statements of the gimplification of the
@@ -270,12 +281,16 @@ bb_predicate_gimplified_stmts (basic_block bb)
 }
 
 /* Sets the sequence of statements STMTS of the gimplification of the
-   predicate for basic block BB.  */
+   predicate for basic block BB.  If PRESERVE_COUNTS then don't clear the predicate
+   counts.  */
 
 static inline void
-set_bb_predicate_gimplified_stmts (basic_block bb, gimple_seq stmts)
+set_bb_predicate_gimplified_stmts (basic_block bb, gimple_seq stmts,
+				   bool preserve_counts)
 {
   ((struct bb_predicate *) bb->aux)->predicate_gimplified_stmts = stmts;
+  if (stmts == NULL && !preserve_counts)
+    ((struct bb_predicate *) bb->aux)->no_predicate_stmts = 0;
 }
 
 /* Adds the sequence of statements STMTS to the sequence of statements
@@ -296,18 +311,28 @@ add_bb_predicate_gimplified_stmts (basic_block bb, gimple_seq stmts)
       gimple *stmt = gsi_stmt (gsi);
       delink_stmt_imm_use (stmt);
       gimple_set_modified (stmt, true);
+      ((struct bb_predicate *) bb->aux)->no_predicate_stmts++;
     }
   gimple_seq_add_seq_without_update
     (&(((struct bb_predicate *) bb->aux)->predicate_gimplified_stmts), stmts);
 }
 
+/* Return the number of statements the predicate of the basic block consists
+   of.  */
+
+static inline unsigned
+get_bb_num_predicate_stmts (basic_block bb)
+{
+  return ((struct bb_predicate *) bb->aux)->no_predicate_stmts;
+}
+
 /* Initializes to TRUE the predicate of basic block BB.  */
 
 static inline void
 init_bb_predicate (basic_block bb)
 {
   bb->aux = XNEW (struct bb_predicate);
-  set_bb_predicate_gimplified_stmts (bb, NULL);
+  set_bb_predicate_gimplified_stmts (bb, NULL, false);
   set_bb_predicate (bb, boolean_true_node);
 }
 
@@ -327,7 +352,7 @@ release_bb_predicate (basic_block bb)
 
       /* Discard them.  */
       gimple_seq_discard (stmts);
-      set_bb_predicate_gimplified_stmts (bb, NULL);
+      set_bb_predicate_gimplified_stmts (bb, NULL, false);
     }
 }
 
@@ -2041,7 +2066,6 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
   tree rhs, res, arg0, arg1, op0, op1, scev;
   tree cond;
   unsigned int index0;
-  unsigned int max, args_len;
   edge e;
   basic_block bb;
   unsigned int i;
@@ -2145,7 +2169,6 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
   bool swap = false;
   hash_map<tree_operand_hash, auto_vec<int> > phi_arg_map;
   unsigned int num_args = gimple_phi_num_args (phi);
-  int max_ind = -1;
   /* Vector of different PHI argument values.  */
   auto_vec<tree> args (num_args);
 
@@ -2160,28 +2183,38 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
       phi_arg_map.get_or_insert (arg).safe_push (i);
     }
 
-  /* Determine element with max number of occurrences.  */
-  max_ind = -1;
-  max = 1;
-  args_len = args.length ();
-  for (i = 0; i < args_len; i++)
+  /* Determine element with max number of occurrences and complexity.  Looking at only
+     number of occurrences as a measure for complexity isn't enough as all usages can
+     be unique but the comparisons to reach the PHI node differ per branch.  */
+  typedef std::pair <tree, std::pair <unsigned, unsigned>> ArgEntry;
+  auto_vec<ArgEntry> argsKV;
+  for (i = 0; i < args.length (); i++)
     {
-      unsigned int len;
-      if ((len = phi_arg_map.get (args[i])->length ()) > max)
+      unsigned int len = 0;
+      for (int index : phi_arg_map.get (args[i]))
 	{
-	  max_ind = (int) i;
-	  max = len;
+	  edge e = gimple_phi_arg_edge (phi, index);
+	  len += get_bb_num_predicate_stmts (e->src);
 	}
+
+      unsigned occur = phi_arg_map.get (args[i])->length ();
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, "Ranking %d as len=%d, idx=%d\n", i, len, occur);
+      argsKV.safe_push ({ args[i], { len, occur }});
     }
 
-  /* Put element with max number of occurences to the end of ARGS.  */
-  if (max_ind != -1 && max_ind + 1 != (int) args_len)
-    std::swap (args[args_len - 1], args[max_ind]);
+  /* Sort elements based on rankings ARGS.  */
+  std::sort(argsKV.begin(), argsKV.end(), [](ArgEntry &left, ArgEntry &right) {
+    return left.second < right.second;
+  });
+
+  for (i = 0; i < args.length (); i++)
+    args[i] = argsKV[i].first;
 
   /* Handle one special case when number of arguments with different values
      is equal 2 and one argument has the only occurrence.  Such PHI can be
      handled as if would have only 2 arguments.  */
-  if (args_len == 2 && phi_arg_map.get (args[0])->length () == 1)
+  if (args.length () == 2 && phi_arg_map.get (args[0])->length () == 1)
     {
       vec<int> *indexes;
       indexes = phi_arg_map.get (args[0]);
@@ -2317,7 +2350,7 @@ insert_gimplified_predicates (loop_p loop)
 	    }
 
 	  /* Once the sequence is code generated, set it to NULL.  */
-	  set_bb_predicate_gimplified_stmts (bb, NULL);
+	  set_bb_predicate_gimplified_stmts (bb, NULL, true);
 	}
     }
 }




^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2]middle-end ifcvt: Reduce comparisons on conditionals by tracking truths [PR109154]
  2023-07-07  9:37 [PATCH 1/2]middle-end ifcvt: Reduce comparisons on conditionals by tracking truths [PR109154] Tamar Christina
  2023-07-07  9:37 ` [PATCH 2/2]middle-end ifcvt: Sort PHI arguments not only occurrences but also complexity [PR109154] Tamar Christina
@ 2023-07-11 10:59 ` Richard Biener
  1 sibling, 0 replies; 4+ messages in thread
From: Richard Biener @ 2023-07-11 10:59 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Fri, 7 Jul 2023, Tamar Christina wrote:

> Hi All,
> 
> Following on from Jakub's patch in g:de0ee9d14165eebb3d31c84e98260c05c3b33acb
> these two patches finishes the work fixing the regression and improves codegen.
> 
> As explained in that commit, ifconvert sorts PHI args in increasing number of
> occurrences in order to reduce the number of comparisons done while
> traversing the tree.
> 
> The remaining task that this patch fixes is dealing with the long chain of
> comparisons that can be created from phi nodes, particularly when they share
> any common successor (classical example is a diamond node).
> 
> on a PHI-node the true and else branches carry a condition, true will
> carry `a` and false `~a`.  The issue is that at the moment GCC tests both `a`
> and `~a` when the phi node has more than 2 arguments. Clearly this isn't
> needed.  The deeper the nesting of phi nodes the larger the repetition.
> 
> As an example, for
> 
> foo (int *f, int d, int e)
> {
>   for (int i = 0; i < 1024; i++)
>     {
>       int a = f[i];
>       int t;
>       if (a < 0)
> 	t = 1;
>       else if (a < e)
> 	t = 1 - a * d;
>       else
> 	t = 0;
>       f[i] = t;
>     }
> }
> 
> after Jakub's patch we generate:
> 
>   _7 = a_10 < 0;
>   _21 = a_10 >= 0;
>   _22 = a_10 < e_11(D);
>   _23 = _21 & _22;
>   _ifc__42 = _23 ? t_13 : 0;
>   t_6 = _7 ? 1 : _ifc__42
> 
> but while better than before it is still inefficient, since in the false
> branch, where we know ~_7 is true, we still test _21.
> 
> This leads to superfluous tests for every diamond node.  After this patch we
> generate
> 
>  _7 = a_10 < 0;
>  _22 = a_10 < e_11(D);
>  _ifc__42 = _22 ? t_13 : 0;
>  t_6 = _7 ? 1 : _ifc__42;
> 
> Which correctly elides the test of _21.  This is done by borrowing the
> vectorizer's helper functions to limit predicate mask usages.  Ifcvt will chain
> conditionals on the false edge (unless specifically inverted) so this patch on
> creating cond a ? b : c, will register ~a when traversing c.  If c is a
> conditional then c will be simplified to the smaller possible predicate given
> the assumptions we already know to be true.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Not sure how to write a non-fragile testcase for this as the
> conditionals chosen depends on threading etc. Any Suggestions?
> 
> Ok for master?

OK.

For a testcase I wonder if you can produce a GIMPLE FE one starting
with pass_fix_loops?  (I think it's still not possible to start
when in LC SSA)

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/109154
> 	* tree-if-conv.cc (gen_simplified_condition,
> 	gen_phi_nest_statement): New.
> 	(gen_phi_arg_condition, predicate_scalar_phi): Use it.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index e342532a343a3c066142adeec5fdfaf736a653e5..16b36dd8b0226f796c1a3fc6d45a9059385e812b 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1870,12 +1870,44 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
>    return rhs;
>  }
>  
> +/* Generate a simplified conditional.  */
> +
> +static tree
> +gen_simplified_condition (tree cond, scalar_cond_masked_set_type &cond_set)
> +{
> +  /* Check if the value is already live in a previous branch.  This resolves
> +     nested conditionals from diamond PHI reductions.  */
> +  if (TREE_CODE (cond) == SSA_NAME)
> +    {
> +      gimple *stmt = SSA_NAME_DEF_STMT (cond);
> +      gassign *assign = NULL;
> +      if ((assign = as_a <gassign *> (stmt))
> +	   && gimple_assign_rhs_code (assign) == BIT_AND_EXPR)
> +	{
> +	  tree arg1 = gimple_assign_rhs1 (assign);
> +	  tree arg2 = gimple_assign_rhs2 (assign);
> +	  if (cond_set.contains ({ arg1, 1 }))
> +	    arg1 = boolean_true_node;
> +	  else
> +	    arg1 = gen_simplified_condition (arg1, cond_set);
> +
> +	  if (cond_set.contains ({ arg2, 1 }))
> +	    arg2 = boolean_true_node;
> +	  else
> +	    arg2 = gen_simplified_condition (arg2, cond_set);
> +
> +	  cond = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, arg1, arg2);
> +	}
> +    }
> +  return cond;
> +}
> +
>  /* Produce condition for all occurrences of ARG in PHI node.  Set *INVERT
>     as to whether the condition is inverted.  */
>  
>  static tree
> -gen_phi_arg_condition (gphi *phi, vec<int> *occur,
> -		       gimple_stmt_iterator *gsi, bool *invert)
> +gen_phi_arg_condition (gphi *phi, vec<int> *occur, gimple_stmt_iterator *gsi,
> +		       scalar_cond_masked_set_type &cond_set, bool *invert)
>  {
>    int len;
>    int i;
> @@ -1902,6 +1934,8 @@ gen_phi_arg_condition (gphi *phi, vec<int> *occur,
>  	  c = TREE_OPERAND (c, 0);
>  	  *invert = true;
>  	}
> +
> +      c = gen_simplified_condition (c, cond_set);
>        c = force_gimple_operand_gsi (gsi, unshare_expr (c),
>  				    true, NULL_TREE, true, GSI_SAME_STMT);
>        if (cond != NULL_TREE)
> @@ -1913,11 +1947,79 @@ gen_phi_arg_condition (gphi *phi, vec<int> *occur,
>  	}
>        else
>  	cond = c;
> +
> +      /* Register the new possibly simplified conditional.  When more than 2
> +	 entries in a phi node we chain entries in the false branch, so the
> +	 inverted condition is active.  */
> +      scalar_cond_masked_key pred_cond ({ cond, 1 });
> +      if (!invert)
> +	pred_cond.inverted_p = !pred_cond.inverted_p;
> +      cond_set.add (pred_cond);
>      }
>    gcc_assert (cond != NULL_TREE);
>    return cond;
>  }
>  
> +/* Create the smallest nested conditional possible.  On pre-order we record
> +   which conditionals are live, and on post-order rewrite the chain by removing
> +   already active conditions.
> +
> +   As an example we simplify:
> +
> +  _7 = a_10 < 0;
> +  _21 = a_10 >= 0;
> +  _22 = a_10 < e_11(D);
> +  _23 = _21 & _22;
> +  _ifc__42 = _23 ? t_13 : 0;
> +  t_6 = _7 ? 1 : _ifc__42
> +
> +  into
> +
> +  _7 = a_10 < 0;
> +  _22 = a_10 < e_11(D);
> +  _ifc__42 = _22 ? t_13 : 0;
> +  t_6 = _7 ? 1 : _ifc__42;
> +
> +  which produces better code.  */
> +
> +static tree
> +gen_phi_nest_statement (gphi *phi, gimple_stmt_iterator *gsi,
> +			scalar_cond_masked_set_type &cond_set, tree type,
> +			hash_map<tree_operand_hash, auto_vec<int>> &phi_arg_map,
> +			gimple **res_stmt, tree lhs0, vec<tree> &args,
> +			unsigned idx)
> +{
> +  if (idx == args.length ())
> +    return args[idx - 1];
> +
> +  vec<int> *indexes = phi_arg_map.get (args[idx - 1]);
> +  bool invert;
> +  tree cond = gen_phi_arg_condition (phi, indexes, gsi, cond_set, &invert);
> +  tree arg1 = gen_phi_nest_statement (phi, gsi, cond_set, type, phi_arg_map,
> +				      res_stmt, lhs0, args, idx + 1);
> +
> +  unsigned prev = idx;
> +  unsigned curr = prev - 1;
> +  tree arg0 = args[curr];
> +  tree rhs, lhs;
> +  if (idx > 1)
> +    lhs = make_temp_ssa_name (type, NULL, "_ifc_");
> +  else
> +    lhs = lhs0;
> +
> +  if (invert)
> +    rhs = fold_build_cond_expr (type, unshare_expr (cond),
> +				arg1, arg0);
> +  else
> +    rhs = fold_build_cond_expr (type, unshare_expr (cond),
> +				arg0, arg1);
> +  gassign *new_stmt = gimple_build_assign (lhs, rhs);
> +  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
> +  update_stmt (new_stmt);
> +  *res_stmt = new_stmt;
> +  return lhs;
> +}
> +
>  /* Replace a scalar PHI node with a COND_EXPR using COND as condition.
>     This routine can handle PHI nodes with more than two arguments.
>  
> @@ -1968,6 +2070,8 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
>      }
>  
>    bb = gimple_bb (phi);
> +  /* Keep track of conditionals already seen.  */
> +  scalar_cond_masked_set_type cond_set;
>    if (EDGE_COUNT (bb->preds) == 2)
>      {
>        /* Predicate ordinary PHI node with 2 arguments.  */
> @@ -1976,6 +2080,7 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
>        first_edge = EDGE_PRED (bb, 0);
>        second_edge = EDGE_PRED (bb, 1);
>        cond = bb_predicate (first_edge->src);
> +      cond_set.add ({ cond, 1 });
>        if (TREE_CODE (cond) == TRUTH_NOT_EXPR)
>  	std::swap (first_edge, second_edge);
>        if (EDGE_COUNT (first_edge->src->succs) > 1)
> @@ -1988,7 +2093,9 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
>  	}
>        else
>  	cond = bb_predicate (first_edge->src);
> +
>        /* Gimplify the condition to a valid cond-expr conditonal operand.  */
> +      cond = gen_simplified_condition (cond, cond_set);
>        cond = force_gimple_operand_gsi (gsi, unshare_expr (cond), true,
>  				       NULL_TREE, true, GSI_SAME_STMT);
>        true_bb = first_edge->src;
> @@ -2110,31 +2217,9 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
>    else
>      {
>        /* Common case.  */
> -      vec<int> *indexes;
>        tree type = TREE_TYPE (gimple_phi_result (phi));
> -      tree lhs;
> -      arg1 = args[args_len - 1];
> -      for (i = args_len - 1; i > 0; i--)
> -	{
> -	  arg0 = args[i - 1];
> -	  indexes = phi_arg_map.get (args[i - 1]);
> -	  if (i != 1)
> -	    lhs = make_temp_ssa_name (type, NULL, "_ifc_");
> -	  else
> -	    lhs = res;
> -	  bool invert;
> -	  cond = gen_phi_arg_condition (phi, indexes, gsi, &invert);
> -	  if (invert)
> -	    rhs = fold_build_cond_expr (type, unshare_expr (cond),
> -					arg1, arg0);
> -	  else
> -	    rhs = fold_build_cond_expr (type, unshare_expr (cond),
> -					arg0, arg1);
> -	  new_stmt = gimple_build_assign (lhs, rhs);
> -	  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
> -	  update_stmt (new_stmt);
> -	  arg1 = lhs;
> -	}
> +      gen_phi_nest_statement (phi, gsi, cond_set, type, phi_arg_map,
> +			      &new_stmt, res, args, 1);
>      }
>  
>    if (dump_file && (dump_flags & TDF_DETAILS))
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2]middle-end ifcvt: Sort PHI arguments not only occurrences but also complexity [PR109154]
  2023-07-07  9:37 ` [PATCH 2/2]middle-end ifcvt: Sort PHI arguments not only occurrences but also complexity [PR109154] Tamar Christina
@ 2023-07-11 10:59   ` Richard Biener
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Biener @ 2023-07-11 10:59 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw

On Fri, 7 Jul 2023, Tamar Christina wrote:

> Hi All,
> 
> This patch builds on the previous patch by fixing another issue with the
> way ifcvt currently picks which branches to test.
> 
> The issue with the current implementation is while it sorts for
> occurrences of the argument, it doesn't check for complexity of the arguments.
> 
> As an example:
> 
>   <bb 15> [local count: 528603100]:
>   ...
>   if (distbb_75 >= 0.0)
>     goto <bb 17>; [59.00%]
>   else
>     goto <bb 16>; [41.00%]
> 
>   <bb 16> [local count: 216727269]:
>   ...
>   goto <bb 19>; [100.00%]
> 
>   <bb 17> [local count: 311875831]:
>   ...
>   if (distbb_75 < iftmp.0_98)
>     goto <bb 18>; [20.00%]
>   else
>     goto <bb 19>; [80.00%]
> 
>   <bb 18> [local count: 62375167]:
>   ...
> 
>   <bb 19> [local count: 528603100]:
>   # prephitmp_175 = PHI <_173(18), 0.0(17), _174(16)>
> 
> All tree arguments to the PHI have the same number of occurrences, namely 1,
> however it makes a big difference which comparison we test first.
> 
> Sorting only on occurrences we'll pick the compares coming from BB 18 and BB 17,
> This means we end up generating 4 comparisons, while 2 would have been enough.
> 
> By keeping track of the "complexity" of the COND in each BB, (i.e. the number
> of comparisons needed to traverse from the start [BB 15] to end [BB 19]) and
> using a key tuple of <occurrences, complexity> we end up selecting the compare
> from BB 16 and BB 18 first.  BB 16 only requires 1 compare, and BB 18, after we
> test BB 16 also only requires one additional compare.  This change paired with
> the one previous above results in the optimal 2 compares.
> 
> For deep nesting, i.e. for
> 
> ...
>   _79 = vr_15 > 20;
>   _80 = _68 & _79;
>   _82 = vr_15 <= 20;
>   _83 = _68 & _82;
>   _84 = vr_15 < -20;
>   _85 = _73 & _84;
>   _87 = vr_15 >= -20;
>   _88 = _73 & _87;
>   _ifc__111 = _55 ? 10 : 12;
>   _ifc__112 = _70 ? 7 : _ifc__111;
>   _ifc__113 = _85 ? 8 : _ifc__112;
>   _ifc__114 = _88 ? 9 : _ifc__113;
>   _ifc__115 = _45 ? 1 : _ifc__114;
>   _ifc__116 = _63 ? 3 : _ifc__115;
>   _ifc__117 = _65 ? 4 : _ifc__116;
>   _ifc__118 = _83 ? 6 : _ifc__117;
>   _ifc__119 = _60 ? 2 : _ifc__118;
>   _ifc__120 = _43 ? 13 : _ifc__119;
>   _ifc__121 = _75 ? 11 : _ifc__120;
>   vw_1 = _80 ? 5 : _ifc__121;
> 
> Most of the comparisons are still needed because the chain of
> occurrences to not negate eachother. i.e. _80 is _73 & vr_15 >= -20 and
> _85 is _73 & vr_15 < -20.  clearly given _73 needs to be true in both branches,
> the only additional test needed is on vr_15, where the one test is the negation
> of the other.  So we don't need to do the comparison of _73 twice.
> 
> The changes in the patch reduces the overall number of compares by one, but has
> a bigger effect on the dependency chain.
> 
> Previously we would generate 5 instructions chain:
> 
> 	cmple   p7.s, p4/z, z29.s, z30.s
> 	cmpne   p7.s, p7/z, z29.s, #0
> 	cmple   p6.s, p7/z, z31.s, z30.s
> 	cmpge   p6.s, p6/z, z27.s, z25.s
> 	cmplt   p15.s, p6/z, z28.s, z21.s
> 
> as the longest chain.  With this patch we generate 3:
> 
> 	cmple   p7.s, p3/z, z27.s, z30.s
> 	cmpne   p7.s, p7/z, z27.s, #0
> 	cmpgt   p7.s, p7/z, z31.s, z30.s
> 
> and I don't think (x <= y) && (x != 0) && (z > y) can be reduced further.
> 
> Bootstrapped and Regtested on aarch64-none-linux-gnu and no issues.
> 
> Not sure how to write a non-fragile testcase for this as the
> conditionals chosen depends on threading etc. Any Suggestions?
> 
> Ok for master?

OK.

Likewise for the testcase - GIMPLE one starting at fix_loops.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/109154
> 	* tree-if-conv.cc (INCLUDE_ALGORITHM): Include.
> 	(struct bb_predicate): Add no_predicate_stmts.
> 	(set_bb_predicate): Increase predicate count.
> 	(set_bb_predicate_gimplified_stmts): Conditionally initialize
> 	no_predicate_stmts.
> 	(get_bb_num_predicate_stmts): New.
> 	(init_bb_predicate): Initialzie no_predicate_stmts.
> 	(release_bb_predicate): Cleanup no_predicate_stmts.
> 	(insert_gimplified_predicates): Preserve no_predicate_stmts.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 16b36dd8b0226f796c1a3fc6d45a9059385e812b..0ed50d99c46f99a4d1ea0e827ee2b2a3f494b2da 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -80,6 +80,7 @@ along with GCC; see the file COPYING3.  If not see
>       <L18>:;
>  */
>  
> +#define INCLUDE_ALGORITHM
>  #include "config.h"
>  #include "system.h"
>  #include "coretypes.h"
> @@ -231,6 +232,10 @@ struct bb_predicate {
>       recorded here, in order to avoid the duplication of computations
>       that occur in previous conditions.  See PR44483.  */
>    gimple_seq predicate_gimplified_stmts;
> +
> +  /* Records the number of statements recorded into
> +     PREDICATE_GIMPLIFIED_STMTS.   */
> +  unsigned no_predicate_stmts;
>  };
>  
>  /* Returns true when the basic block BB has a predicate.  */
> @@ -254,10 +259,16 @@ bb_predicate (basic_block bb)
>  static inline void
>  set_bb_predicate (basic_block bb, tree cond)
>  {
> +  auto aux = (struct bb_predicate *) bb->aux;
>    gcc_assert ((TREE_CODE (cond) == TRUTH_NOT_EXPR
>  	       && is_gimple_val (TREE_OPERAND (cond, 0)))
>  	      || is_gimple_val (cond));
> -  ((struct bb_predicate *) bb->aux)->predicate = cond;
> +  aux->predicate = cond;
> +  aux->no_predicate_stmts++;
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    fprintf (dump_file, "Recording block %d value %d\n", bb->index,
> +	     aux->no_predicate_stmts);
>  }
>  
>  /* Returns the sequence of statements of the gimplification of the
> @@ -270,12 +281,16 @@ bb_predicate_gimplified_stmts (basic_block bb)
>  }
>  
>  /* Sets the sequence of statements STMTS of the gimplification of the
> -   predicate for basic block BB.  */
> +   predicate for basic block BB.  If PRESERVE_COUNTS then don't clear the predicate
> +   counts.  */
>  
>  static inline void
> -set_bb_predicate_gimplified_stmts (basic_block bb, gimple_seq stmts)
> +set_bb_predicate_gimplified_stmts (basic_block bb, gimple_seq stmts,
> +				   bool preserve_counts)
>  {
>    ((struct bb_predicate *) bb->aux)->predicate_gimplified_stmts = stmts;
> +  if (stmts == NULL && !preserve_counts)
> +    ((struct bb_predicate *) bb->aux)->no_predicate_stmts = 0;
>  }
>  
>  /* Adds the sequence of statements STMTS to the sequence of statements
> @@ -296,18 +311,28 @@ add_bb_predicate_gimplified_stmts (basic_block bb, gimple_seq stmts)
>        gimple *stmt = gsi_stmt (gsi);
>        delink_stmt_imm_use (stmt);
>        gimple_set_modified (stmt, true);
> +      ((struct bb_predicate *) bb->aux)->no_predicate_stmts++;
>      }
>    gimple_seq_add_seq_without_update
>      (&(((struct bb_predicate *) bb->aux)->predicate_gimplified_stmts), stmts);
>  }
>  
> +/* Return the number of statements the predicate of the basic block consists
> +   of.  */
> +
> +static inline unsigned
> +get_bb_num_predicate_stmts (basic_block bb)
> +{
> +  return ((struct bb_predicate *) bb->aux)->no_predicate_stmts;
> +}
> +
>  /* Initializes to TRUE the predicate of basic block BB.  */
>  
>  static inline void
>  init_bb_predicate (basic_block bb)
>  {
>    bb->aux = XNEW (struct bb_predicate);
> -  set_bb_predicate_gimplified_stmts (bb, NULL);
> +  set_bb_predicate_gimplified_stmts (bb, NULL, false);
>    set_bb_predicate (bb, boolean_true_node);
>  }
>  
> @@ -327,7 +352,7 @@ release_bb_predicate (basic_block bb)
>  
>        /* Discard them.  */
>        gimple_seq_discard (stmts);
> -      set_bb_predicate_gimplified_stmts (bb, NULL);
> +      set_bb_predicate_gimplified_stmts (bb, NULL, false);
>      }
>  }
>  
> @@ -2041,7 +2066,6 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
>    tree rhs, res, arg0, arg1, op0, op1, scev;
>    tree cond;
>    unsigned int index0;
> -  unsigned int max, args_len;
>    edge e;
>    basic_block bb;
>    unsigned int i;
> @@ -2145,7 +2169,6 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
>    bool swap = false;
>    hash_map<tree_operand_hash, auto_vec<int> > phi_arg_map;
>    unsigned int num_args = gimple_phi_num_args (phi);
> -  int max_ind = -1;
>    /* Vector of different PHI argument values.  */
>    auto_vec<tree> args (num_args);
>  
> @@ -2160,28 +2183,38 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
>        phi_arg_map.get_or_insert (arg).safe_push (i);
>      }
>  
> -  /* Determine element with max number of occurrences.  */
> -  max_ind = -1;
> -  max = 1;
> -  args_len = args.length ();
> -  for (i = 0; i < args_len; i++)
> +  /* Determine element with max number of occurrences and complexity.  Looking at only
> +     number of occurrences as a measure for complexity isn't enough as all usages can
> +     be unique but the comparisons to reach the PHI node differ per branch.  */
> +  typedef std::pair <tree, std::pair <unsigned, unsigned>> ArgEntry;
> +  auto_vec<ArgEntry> argsKV;
> +  for (i = 0; i < args.length (); i++)
>      {
> -      unsigned int len;
> -      if ((len = phi_arg_map.get (args[i])->length ()) > max)
> +      unsigned int len = 0;
> +      for (int index : phi_arg_map.get (args[i]))
>  	{
> -	  max_ind = (int) i;
> -	  max = len;
> +	  edge e = gimple_phi_arg_edge (phi, index);
> +	  len += get_bb_num_predicate_stmts (e->src);
>  	}
> +
> +      unsigned occur = phi_arg_map.get (args[i])->length ();
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +	fprintf (dump_file, "Ranking %d as len=%d, idx=%d\n", i, len, occur);
> +      argsKV.safe_push ({ args[i], { len, occur }});
>      }
>  
> -  /* Put element with max number of occurences to the end of ARGS.  */
> -  if (max_ind != -1 && max_ind + 1 != (int) args_len)
> -    std::swap (args[args_len - 1], args[max_ind]);
> +  /* Sort elements based on rankings ARGS.  */
> +  std::sort(argsKV.begin(), argsKV.end(), [](ArgEntry &left, ArgEntry &right) {
> +    return left.second < right.second;
> +  });
> +
> +  for (i = 0; i < args.length (); i++)
> +    args[i] = argsKV[i].first;
>  
>    /* Handle one special case when number of arguments with different values
>       is equal 2 and one argument has the only occurrence.  Such PHI can be
>       handled as if would have only 2 arguments.  */
> -  if (args_len == 2 && phi_arg_map.get (args[0])->length () == 1)
> +  if (args.length () == 2 && phi_arg_map.get (args[0])->length () == 1)
>      {
>        vec<int> *indexes;
>        indexes = phi_arg_map.get (args[0]);
> @@ -2317,7 +2350,7 @@ insert_gimplified_predicates (loop_p loop)
>  	    }
>  
>  	  /* Once the sequence is code generated, set it to NULL.  */
> -	  set_bb_predicate_gimplified_stmts (bb, NULL);
> +	  set_bb_predicate_gimplified_stmts (bb, NULL, true);
>  	}
>      }
>  }
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-07-11 10:59 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-07  9:37 [PATCH 1/2]middle-end ifcvt: Reduce comparisons on conditionals by tracking truths [PR109154] Tamar Christina
2023-07-07  9:37 ` [PATCH 2/2]middle-end ifcvt: Sort PHI arguments not only occurrences but also complexity [PR109154] Tamar Christina
2023-07-11 10:59   ` Richard Biener
2023-07-11 10:59 ` [PATCH 1/2]middle-end ifcvt: Reduce comparisons on conditionals by tracking truths [PR109154] Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).