[PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
@ 2021-06-24  9:54 Di Zhao
  2021-06-24 12:29 ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Di Zhao @ 2021-06-24  9:54 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 5071 bytes --]

This patch enhances FRE by recording equivalences generated by conditional
statements, so that we can find more optimize opportunities in situations like
following two cases:

case 1:
    void f (unsigned int a, unsigned int b)
    {
      if (a == b)
        {
          for (unsigned i = 0; i < a; i++)
                      {
                        if (i == b)
                          foo (); /* Unreachable */
                      }
        }
    }
The "i == b" condition is always false, yet original FRE cannot predict this.
Because VN only stores "i < a" and "a == b", so there won't be "i == b"'s
result. (Moreover, VRP can't infer "i == b" is false either, as its boundary
calculation is hindered by the "unsigned" modifier.)

case 2:
Consider GIMPLE code:
              <bb 2> :
              if (a != 0)
              goto <bb 3>; [INV]
              else
              goto <bb 4>; [INV]

              <bb 3> :

              <bb 4> :
              # c = PHI <y(2), x(3)>
              if (a != 0)
              goto <bb 5>; [INV]
              else
              goto <bb 7>; [INV]

              <bb 5> :
              if (c > x)
              goto <bb 6>; [INV]
              else
              goto <bb 8>; [INV]

              <bb 6> :
              __builtin_puts (&"Unreachable!"[0]);

              <bb 7> :
              __builtin_puts (&"a"[0]);

              <bb 8> :
              ...
The result of "if (c > x)" in bb4 is unknown. However bb2 and bb4 have the same
conditional statement, meanwhile bb2 dominates bb4, so it is predictable that
c==x at bb5 and c==y at bb7. Keeping record of this might bring further
optimizations, such as removing the conditional in bb5.

The basic idea is to use a hashmap to record additional equivalence information
generated by conditional statements.

Insert to the map:
    1. When recording an EQ_EXPR=true predicate, e.g. "x==y is true", record
    the equivalence of x and y at edge destiny in the map.
    2. Consider case 2 above, when we fail to predict the result of a
    conditional expression (at bb4), if following conditions be satisfied:
              a. There is a previous corresponding predicate. In this case, it will
              be "a != 0 is true" at bb3.
              b. bb3's single predecessor bb2 dominates bb4.
              c. bb3's conditional statement's predicate code (or inverted code) is
              identical with that of bb4. (This condition can be relaxed.)
    Then we can try to find a PHI operation along A's true and false edge, and
    record the equivalence of PHI operation's result and arguments at bb4's
    true and false destinations. Regarding this case, "c==x at bb5" and
    "c==y at bb7" will be recorded.

Use the map:
    When we cannot find a prediction result for a conditional statement by
    original process, replace conditional expression's operands with qualified
    equivalence and try again.

As searching predicates and the ssa names to record are based on
value numbering, we need to "unwind" the equivalence map for iteration.

Bootstrapped and tested on x86_64-pc-linux-gnu and aarch64-unknown-linux-gnu.

Regards,
Di Zhao

Extend FRE with an "equivalence map" for condition prediction.

2021-06-24  Di Zhao  <dizhao@os.amperecomputing.com>

gcc/ChangeLog:
        PR tree-optimization/101186
        * tree-ssa-sccvn.c (vn_tracking_edge): Extracted utility function.
        (dominated_by_p_w_unex): Moved upward, no change.
        (vn_nary_op_get_predicated_value): Moved upward, no change.
        (struct val_equiv_hasher): Hasher for the "equivalence map".
        (compute_single_op_hash): Compute hashcode from ssa name.
        (is_vn_qualified_at_bb): Check if vn_pval is valid at BB.
        (val_equiv_insert): Insert into "equivalence map".
        (vn_lookup_cond_result): Lookup conditional expression's result by VN.
        (find_predicated_value_by_equiv): Search for predicated value,
        utilizing equivalences that we have recorded.
        (record_equiv_from_previous_edge): Record equivalence relation from a
        previouse edge on current edge.
        (record_equiv_from_previous_cond): Search for equivalences generated by
        previous conditional statements, and record them on current BB's
        successors.
        (vn_nary_op_insert_pieces_predicated): Extract utility function. Insert
        into the "equivalence map" for predicate like "x==y is true".
        (free_rpo_vn): Free the "equivalence map".
        (process_bb): Insert into & lookup from the "equivalence map".
        (struct unwind_state): Add "equivalence map" unwind state.
        (do_unwind): Unwind the "equivalence map".
        (do_rpo_vn): Update "equivalence map" unwind state.

gcc/testsuite/ChangeLog:
        PR tree-optimization/101186
        * gcc.dg/tree-ssa/vrp03.c: Disable fre.
        * gcc.dg/tree-ssa/ssa-fre-95.c: New test.
        * gcc.dg/tree-ssa/ssa-fre-96.c: New test.

[-- Attachment #2: tree-optimization-101186.patch --]
[-- Type: application/octet-stream, Size: 20679 bytes --]

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c
new file mode 100644
index 00000000000..9ba1c8c61ba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c
@@ -0,0 +1,80 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+extern void bar(void);
+extern void bag(void);
+extern void boo(void);
+extern void bas(void);
+extern void foo(void);
+
+void f (unsigned int a, unsigned int b)
+{
+  if (a == b)
+    {
+      for (unsigned i = 0; i < a; i++)
+	{
+	  bar ();
+	  if (i >= b)
+	    /* Unreachable, cannot be eliminated by VRP. FRE can eliminate this.
+	     */
+	    foo ();
+	}
+    }
+}
+
+void g (unsigned int a, unsigned int b)
+{
+  for (unsigned i = 0; i < a; i++)
+    {
+      if (a == b)
+	{
+	  if (i >= b)
+	    /* Unreachable, cannot be eliminated by VRP. FRE can eliminate this.
+	     */
+	    foo ();
+	}
+	bag ();
+    }
+}
+
+void h (unsigned int a, unsigned int b, unsigned int c)
+{
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a == b)
+	{
+	  if (b == c)
+	    {
+	      boo ();
+	      if (i >= a)
+		/* Unreachable.   */
+		foo ();
+	    }
+	}
+    }
+}
+
+void k (int a, int b, int x, int y)
+{
+  int c = y;
+  if (a != 0)
+    c = x;
+  while (b < 1000)
+    {
+      if (a != 0)
+	{
+	  if (c > x)
+	    /* Unreachable. */
+	    foo ();
+	}
+      else
+	bas ();
+      b++;
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "foo" "fre1" } } */
+/* { dg-final { scan-tree-dump "bag" "fre1" } } */
+/* { dg-final { scan-tree-dump "bar" "fre1" } } */
+/* { dg-final { scan-tree-dump "boo" "fre1" } } */
+/* { dg-final { scan-tree-dump "bas" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c
new file mode 100644
index 00000000000..8e3a133f1ea
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+extern void bar(void);
+extern void bag(void);
+extern void foo(void);
+
+void f (unsigned int a, unsigned int b)
+{
+  if (a == b)
+    {
+      for (unsigned i = a; i < 100; i++)
+	{
+	  if (i > b)
+	    /* Should not be eliminated.
+	     */
+	    bar ();
+	}
+    }
+}
+
+void g (unsigned int a, unsigned int b)
+{
+  unsigned int c = 100;
+  if (a != 0)
+    {
+      c = b;
+    }
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a != 0)
+	{
+	  if (i >= b)
+	    /* Should be eliminated.
+	     */
+	    foo ();
+	}
+      i++;
+    }
+}
+
+void h (unsigned int a, unsigned int b)
+{
+  unsigned int c = 100;
+  if (b % 2)
+    {
+      if (a != 0)
+	{
+	  c = b;
+	}
+    }
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a != 0)
+	{
+	  if (i >= b)
+	    /* Should not be eliminated.
+	     */
+	    bag ();
+	}
+      i++;
+    }
+}
+/* { dg-final { scan-tree-dump "bar" "fre1" } } */
+/* { dg-final { scan-tree-dump "bag" "fre1" } } */
+/* { dg-final { scan-tree-dump-not "foo" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
index bafb65a53d6..bea2f1b0de5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1" } */
+/* { dg-options "-O2 -fdisable-tree-evrp -fdisable-tree-fre1 -fdisable-tree-fre2 -fdisable-tree-fre3 -fdump-tree-vrp1" } */
 
 struct A
 {
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 64e3a707f5c..bcc3ea8ec17 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -3781,6 +3781,366 @@ vn_reference_insert_pieces (tree vuse, alias_set_type set,
   return vr1;
 }
 
+static inline bool
+vn_tracking_edge (edge pred_e)
+{
+  if (! single_pred_p (pred_e->dest))
+    {
+      /* Never record for backedges.  */
+      if (pred_e->flags & EDGE_DFS_BACK)
+	return false;
+      edge_iterator ei;
+      edge e;
+      int cnt = 0;
+      /* Ignore backedges.  */
+      FOR_EACH_EDGE (e, ei, pred_e->dest->preds)
+	if (! dominated_by_p (CDI_DOMINATORS, e->src, e->dest))
+	  cnt++;
+      if (cnt != 1)
+	return false;
+    }
+    return true;
+}
+
+static bool
+dominated_by_p_w_unex (basic_block bb1, basic_block bb2, bool);
+
+static tree
+vn_nary_op_get_predicated_value (vn_nary_op_t vno, basic_block bb)
+{
+  if (! vno->predicated_values)
+    return vno->u.result;
+  for (vn_pval *val = vno->u.values; val; val = val->next)
+    for (unsigned i = 0; i < val->n; ++i)
+      /* Do not handle backedge executability optimistically since
+	 when figuring out whether to iterate we do not consider
+	 changed predication.  */
+      if (dominated_by_p_w_unex
+	    (bb, BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]),
+	     false))
+	return val->result;
+  return NULL_TREE;
+}
+
+/* hashtable & helpers to record equivalences at given bb.  */
+
+typedef struct val_equiv_s
+{
+  val_equiv_s *next;
+  val_equiv_s *unwind_to;
+  hashval_t hashcode;
+  /* SSA name this val_equiv_s is associated with.  */
+  tree name;
+  /* RESULT in a vn_pval entry is SSA name of a equivalence.  */
+  vn_pval *values;
+} * val_equiv_t;
+
+struct val_equiv_hasher : nofree_ptr_hash<val_equiv_s>
+{
+  static inline hashval_t hash (const val_equiv_s *entry)
+  {
+    return entry->hashcode;
+  }
+  static inline bool equal (const val_equiv_t &entry, const val_equiv_t &other)
+  {
+    return other->name == entry->name;
+  }
+};
+
+static hash_table<val_equiv_hasher> *val_equiv_hash;
+typedef hash_table<val_equiv_hasher>::iterator val_equiv_iterator_type;
+
+static val_equiv_t last_inserted_equiv;
+
+static hashval_t
+compute_single_op_hash (tree name)
+{
+  inchash::hash hstate;
+  inchash::add_expr (name, hstate);
+  return hstate.end ();
+}
+
+static inline bool
+is_vn_qualified_at_bb (vn_pval *val, basic_block bb)
+{
+  for (unsigned i = 0; i < val->n; ++i)
+    {
+      basic_block val_bb
+	= BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]);
+      if (dominated_by_p (CDI_DOMINATORS, bb, val_bb))
+	{
+	  return true;
+	}
+    }
+  return false;
+}
+
+static val_equiv_t
+val_equiv_insert (tree name, tree equiv, basic_block bb, bool recursive)
+{
+  if (TREE_CODE (name) == SSA_NAME)
+    name = VN_INFO (name)->valnum;
+  if (TREE_CODE (equiv) == SSA_NAME)
+    equiv = VN_INFO (equiv)->valnum;
+  if (name == equiv)
+    return NULL;
+  val_equiv_t new_equiv
+    = (val_equiv_t) obstack_alloc (&vn_tables_obstack, sizeof (val_equiv_s));
+  new_equiv->name = name;
+  new_equiv->values
+    = (vn_pval *) obstack_alloc (&vn_tables_obstack, sizeof (vn_pval));
+  new_equiv->values->next = NULL;
+  new_equiv->values->n = 1;
+  new_equiv->values->valid_dominated_by_p[0] = bb->index;
+  new_equiv->values->result = equiv;
+  new_equiv->hashcode = compute_single_op_hash (name);
+  val_equiv_t *slot
+    = val_equiv_hash->find_slot_with_hash (new_equiv, new_equiv->hashcode,
+					   INSERT);
+  new_equiv->unwind_to = *slot;
+  auto_vec<tree> todo_list;
+
+  if (*slot)
+    {
+      bool found = false;
+      vn_pval *nval = new_equiv->values;
+      vn_pval **next = &(new_equiv->values);
+      for (vn_pval *val = (*slot)->values; val; val = val->next)
+	{
+	  if (val->result == equiv)
+	    {
+	      found = true;
+	      if (is_vn_qualified_at_bb (val, bb))
+		return *slot;
+	      /* Append value.  */
+	      *next = (vn_pval *) obstack_alloc (&vn_tables_obstack,
+						 sizeof (vn_pval)
+						   + val->n * sizeof (int));
+	      (*next)->next = NULL;
+	      (*next)->result = val->result;
+	      (*next)->n = val->n + 1;
+	      memcpy ((*next)->valid_dominated_by_p, val->valid_dominated_by_p,
+		      val->n * sizeof (int));
+	      (*next)->valid_dominated_by_p[val->n] = bb->index;
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Appending equivalence.\n");
+	      next = &(*next)->next;
+	      continue;
+	    }
+	  /* Copy others.  */
+	  unsigned val_size = sizeof (vn_pval) + (val->n - 1) * sizeof (int);
+	  *next = (vn_pval *) obstack_alloc (&vn_tables_obstack, val_size);
+	  memcpy (*next, val, val_size);
+	  (*next)->next = NULL;
+	  next = &(*next)->next;
+	  /* Record transitive equivalences.  */
+	  if (recursive && is_vn_qualified_at_bb (val, bb))
+	      todo_list.safe_push (val->result);
+	}
+      if (!found)
+	// append new equiv at last
+	*next = nval;
+    }
+
+  *slot = new_equiv;
+  new_equiv->next = last_inserted_equiv;
+  last_inserted_equiv = new_equiv;
+  if (recursive)
+    for (unsigned i = 0; i < todo_list.length (); ++i)
+      {
+	val_equiv_insert (todo_list[i], equiv, bb, false);
+	val_equiv_insert (equiv, todo_list[i], bb, false);
+      }
+  return new_equiv;
+}
+
+static tree
+vn_lookup_cond_result (tree lhs, tree rhs, tree_code code, basic_block bb)
+{
+  // Shortcut for gimple_simplify, as it can get confused by the ~X == 1
+  // -> X == 0 transform.
+  if (lhs == rhs)
+    switch (code)
+      {
+      case EQ_EXPR:
+      case LE_EXPR:
+      case GE_EXPR:
+	return boolean_true_node;
+      case LT_EXPR:
+      case GT_EXPR:
+      case NE_EXPR:
+	return boolean_false_node;
+      default:;
+      }
+  tree val;
+  tree ops[2];
+  ops[0] = lhs;
+  ops[1] = rhs;
+  vn_nary_op_t vno;
+  val = vn_nary_op_lookup_pieces (2, code, boolean_type_node, ops, &vno);
+  /* Did we get a predicated value?  */
+  if (!val && vno && vno->predicated_values)
+    val = vn_nary_op_get_predicated_value (vno, bb);
+  return val;
+}
+
+static tree
+find_predicated_value_by_equiv (tree lhs, tree rhs, tree_code code,
+				basic_block bb)
+{
+  if (TREE_CODE (lhs) == SSA_NAME)
+    lhs = VN_INFO (lhs)->valnum;
+  struct val_equiv_s equiv;
+  tree result;
+  equiv.name = lhs;
+  equiv.hashcode = compute_single_op_hash (lhs);
+  val_equiv_t *slot
+    = val_equiv_hash->find_slot_with_hash (&equiv, equiv.hashcode, NO_INSERT);
+  auto_vec<tree> lhs_equivs;
+  // Search by lhs's equivalences.
+  if (slot)
+    {
+      for (vn_pval *val = (*slot)->values; val; val = val->next)
+	{
+	  if (!is_vn_qualified_at_bb (val, bb))
+	    continue;
+	  if ((result = vn_lookup_cond_result (val->result, rhs, code, bb)))
+	    return result;
+	  // Store equivalence to list, in case we need to search by rhs.
+	  lhs_equivs.safe_push (val->result);
+	}
+    }
+  // Search by rhs's equivalences.
+  if (TREE_CODE (rhs) == SSA_NAME)
+    rhs = VN_INFO (rhs)->valnum;
+  equiv.name = rhs;
+  equiv.hashcode = compute_single_op_hash (rhs);
+  slot
+    = val_equiv_hash->find_slot_with_hash (&equiv, equiv.hashcode, NO_INSERT);
+  if (!slot)
+    return NULL_TREE;
+  for (vn_pval *val = (*slot)->values; val; val = val->next)
+    {
+      if (!is_vn_qualified_at_bb (val, bb))
+	continue;
+      if ((result = vn_lookup_cond_result (lhs, val->result, code, bb)))
+	return result;
+      // Search by both sides's equivalences.
+      for (unsigned j = 0; j < lhs_equivs.length (); ++j)
+	if ((result
+	     = vn_lookup_cond_result (lhs_equivs[j], val->result, code, bb)))
+	  return result;
+    }
+  return NULL_TREE;
+}
+
+/* Record equivalence generated by PHI node, on current visiting edge's
+ * destination.  */
+
+static void
+record_equiv_from_previous_edge (basic_block pred, basic_block dest, edge e)
+{
+  if (!vn_tracking_edge (e))
+    return;
+  while (single_succ_p (dest) && single_pred_p (dest))
+    {
+      pred = dest;
+      dest = single_succ (dest);
+    }
+  if (single_pred_p (dest))
+    return;
+  /* Currently situations like this are not handled:
+	  pred
+	  /  \
+	 /t   \f
+	B       C
+	 \    /  \
+	  \  /    \
+      D (PHI node)  E
+	    |
+	    |
+	  current
+	   / \
+	  /t  \f
+	   ...
+  */
+  for (gphi_iterator gsi = gsi_start_nonvirtual_phis (dest); !gsi_end_p (gsi);
+       gsi_next_nonvirtual_phi (&gsi))
+    {
+      gphi *phi = gsi.phi ();
+      tree res = vn_valueize (PHI_RESULT (phi));
+      for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	{
+	  edge in = gimple_phi_arg_edge (phi, i);
+	  if (in->src == pred)
+	    {
+	      tree arg = vn_valueize (gimple_phi_arg_def (phi, i));
+	      if (arg == res)
+	      	continue;
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		{
+		  fprintf (dump_file, "Recording equivalence of ");
+		  print_generic_expr (dump_file, res);
+		  fprintf (dump_file, " and ");
+		  print_generic_expr (dump_file, arg);
+		  fprintf (dump_file, " at BB%d\n", e->dest->index);
+		}
+	      val_equiv_insert (res, arg, e->dest, true);
+	      val_equiv_insert (arg, res, e->dest, true);
+	    }
+	}
+    }
+}
+
+static void
+record_equiv_from_previous_cond (vn_nary_op_t vno, edge true_e, edge false_e,
+				 basic_block bb)
+{
+  if (!vno->predicated_values)
+    return;
+  gimple *pred_last;
+  for (vn_pval *val = vno->u.values; val; val = val->next)
+    for (unsigned i = 0; i < val->n; ++i)
+      {
+	basic_block bb1
+	  = BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]);
+	if (!single_pred_p (bb1))
+	  continue;
+	basic_block p = single_pred (bb1);
+	if (p && dominated_by_p_w_unex (bb, p, false))
+	  {
+	    // Check whether there's an edge "dominates" bb's true/false edge.
+	    pred_last = last_stmt (p);
+	    if (!pred_last || gimple_code (pred_last) != GIMPLE_COND)
+	      continue;
+	    enum tree_code pred_code = gimple_cond_code (pred_last);
+	    enum tree_code pred_icode = swap_tree_comparison (pred_code);
+	    // For now only identical conditions are handled.
+	    // TODO: even if the conditions are not the same, we still have
+	    // one edge that can "derive" one of the current edges.
+	    if (pred_code != vno->opcode && pred_icode != vno->opcode)
+	      continue;
+	    // Find out previous true & false destinies (here "true" and "false"
+	    // are corresponding to current bb.
+	    // First initialize them with bb1 and "p's successor that is not
+	    // bb1" respectively, then switch them if result in VNO is false.
+	    basic_block pre_true_dest = bb1;
+	    basic_block pre_false_dest = EDGE_SUCC (p, 0)->dest != bb1
+					   ? EDGE_SUCC (p, 0)->dest
+					   : EDGE_SUCC (p, 1)->dest;
+	    if (val->result == boolean_false_node)
+	      {
+		pre_true_dest = pre_false_dest;
+		pre_false_dest = bb1;
+	      }
+	    if (true_e)
+	      record_equiv_from_previous_edge (p, pre_true_dest, true_e);
+	    if (false_e)
+	      record_equiv_from_previous_edge (p, pre_false_dest, false_e);
+	  }
+      }
+}
+
 /* Compute and return the hash value for nary operation VBO1.  */
 
 static hashval_t
@@ -4155,21 +4515,8 @@ vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
 				     edge pred_e)
 {
   /* ???  Currently tracking BBs.  */
-  if (! single_pred_p (pred_e->dest))
-    {
-      /* Never record for backedges.  */
-      if (pred_e->flags & EDGE_DFS_BACK)
-	return NULL;
-      edge_iterator ei;
-      edge e;
-      int cnt = 0;
-      /* Ignore backedges.  */
-      FOR_EACH_EDGE (e, ei, pred_e->dest->preds)
-	if (! dominated_by_p (CDI_DOMINATORS, e->src, e->dest))
-	  cnt++;
-      if (cnt != 1)
+  if (!vn_tracking_edge (pred_e))
 	return NULL;
-    }
   if (dump_file && (dump_flags & TDF_DETAILS)
       /* ???  Fix dumping, but currently we only get comparisons.  */
       && TREE_CODE_CLASS (code) == tcc_comparison)
@@ -4182,6 +4529,12 @@ vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
       fprintf (dump_file, " == %s\n",
 	       integer_zerop (result) ? "false" : "true");
     }
+  if (code == EQ_EXPR && result == boolean_true_node)
+    /* NE_EXPRs with false result are handled by inverted condition.  */
+    {
+      val_equiv_insert (ops[0], ops[1], pred_e->dest, true);
+      val_equiv_insert (ops[1], ops[0], pred_e->dest, true);
+    }
   vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
   init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
   vno1->predicated_values = 1;
@@ -4194,26 +4547,6 @@ vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
   return vn_nary_op_insert_into (vno1, valid_info->nary, true);
 }
 
-static bool
-dominated_by_p_w_unex (basic_block bb1, basic_block bb2, bool);
-
-static tree
-vn_nary_op_get_predicated_value (vn_nary_op_t vno, basic_block bb)
-{
-  if (! vno->predicated_values)
-    return vno->u.result;
-  for (vn_pval *val = vno->u.values; val; val = val->next)
-    for (unsigned i = 0; i < val->n; ++i)
-      /* Do not handle backedge executability optimistically since
-	 when figuring out whether to iterate we do not consider
-	 changed predication.  */
-      if (dominated_by_p_w_unex
-	    (bb, BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]),
-	     false))
-	return val->result;
-  return NULL_TREE;
-}
-
 /* Insert the rhs of STMT into the current hash table with a value number of
    RESULT.  */
 
@@ -6863,6 +7196,8 @@ free_rpo_vn (void)
       release_ssa_name (info->name);
   obstack_free (&vn_ssa_aux_obstack, NULL);
   delete vn_ssa_aux_hash;
+  delete val_equiv_hash;
+  val_equiv_hash = NULL;
 
   delete constant_to_value_id;
   constant_to_value_id = NULL;
@@ -7241,11 +7576,11 @@ process_bb (rpo_elim &avail, basic_block bb,
 	    tree val = gimple_simplify (gimple_cond_code (last),
 					boolean_type_node, lhs, rhs,
 					NULL, vn_valueize);
+	    vn_nary_op_t vnresult = NULL;
 	    /* If the condition didn't simplfy see if we have recorded
 	       an expression from sofar taken edges.  */
 	    if (! val || TREE_CODE (val) != INTEGER_CST)
 	      {
-		vn_nary_op_t vnresult;
 		tree ops[2];
 		ops[0] = lhs;
 		ops[1] = rhs;
@@ -7264,6 +7599,21 @@ process_bb (rpo_elim &avail, basic_block bb,
 			print_gimple_stmt (dump_file, last, TDF_SLIM);
 		      }
 		  }
+		if (!val && iterate)
+		  {
+		    // Try to find equivalences in val_equiv_hash and search
+		    // again.
+		    val
+		      = find_predicated_value_by_equiv (lhs, rhs,
+							gimple_cond_code (last),
+							bb);
+		    if (val && dump_file && (dump_flags & TDF_DETAILS))
+		      {
+			fprintf (dump_file, "Found predicated value ");
+			print_generic_expr (dump_file, val, TDF_SLIM);
+			fprintf (dump_file, " by equivalence.\n");
+		      }
+		  }
 	      }
 	    if (val)
 	      e = find_taken_edge (bb, val);
@@ -7316,6 +7666,17 @@ process_bb (rpo_elim &avail, basic_block bb,
 		    if (false_e)
 		      insert_related_predicates_on_edge (icode, ops, false_e);
 		  }
+
+		/* If we got a previous condition, but no predicated
+		   value, try to record what we know from previous true & false
+		   branches on current true & false edges.  This might provide
+		   some opportunities for futher prediction.
+		   For now we only record equivalences generated by PHI nodes,
+		   that depends on the previous condition.
+		    */
+		if (iterate && vnresult && (true_e || false_e))
+		  record_equiv_from_previous_cond (vnresult, true_e, false_e,
+						   bb);
 	      }
 	    break;
 	  }
@@ -7437,6 +7798,7 @@ struct unwind_state
   vn_reference_t ref_top;
   vn_phi_t phi_top;
   vn_nary_op_t nary_top;
+  val_equiv_t equiv_top;
   vn_avail *avail_top;
 };
 
@@ -7475,6 +7837,18 @@ do_unwind (unwind_state *to, rpo_elim &avail)
       (*slot)->operands.release ();
       valid_info->references->clear_slot (slot);
     }
+  for (; last_inserted_equiv != to->equiv_top;
+       last_inserted_equiv = last_inserted_equiv->next)
+    {
+      val_equiv_t *slot;
+      slot = val_equiv_hash->find_slot_with_hash (last_inserted_equiv,
+						  last_inserted_equiv->hashcode,
+						  NO_INSERT);
+      if ((*slot)->unwind_to)
+	*slot = (*slot)->unwind_to;
+      else
+	val_equiv_hash->clear_slot (slot);
+    }
   obstack_free (&vn_tables_obstack, to->ob_top);
 
   /* Prune [rpo_idx, ] from avail.  */
@@ -7590,6 +7964,7 @@ do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
   next_constant_value_id = -1;
 
   vn_ssa_aux_hash = new hash_table <vn_ssa_aux_hasher> (region_size * 2);
+  val_equiv_hash = new hash_table <val_equiv_hasher> (5);
   gcc_obstack_init (&vn_ssa_aux_obstack);
 
   gcc_obstack_init (&vn_tables_obstack);
@@ -7600,6 +7975,7 @@ do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
   last_inserted_phi = NULL;
   last_inserted_nary = NULL;
   last_pushed_avail = NULL;
+  last_inserted_equiv = NULL;
 
   vn_valueize = rpo_vn_valueize;
 
@@ -7693,6 +8069,7 @@ do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
 	    rpo_state[idx].ref_top = last_inserted_ref;
 	    rpo_state[idx].phi_top = last_inserted_phi;
 	    rpo_state[idx].nary_top = last_inserted_nary;
+	    rpo_state[idx].equiv_top = last_inserted_equiv;
 	    rpo_state[idx].avail_top
 	      = last_pushed_avail ? last_pushed_avail->avail : NULL;
 	  }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
  2021-06-24  9:54 [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction Di Zhao
@ 2021-06-24 12:29 ` Richard Biener
  2021-06-24 13:25   ` Andrew MacLeod
  2021-07-18 19:25   ` Di Zhao OS
  0 siblings, 2 replies; 11+ messages in thread
From: Richard Biener @ 2021-06-24 12:29 UTC (permalink / raw)
  To: Di Zhao, Andrew MacLeod; +Cc: gcc-patches

On Thu, Jun 24, 2021 at 11:55 AM Di Zhao via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> This patch enhances FRE by recording equivalences generated by conditional
> statements, so that we can find more optimize opportunities in situations like
> following two cases:
>
> case 1:
>     void f (unsigned int a, unsigned int b)
>     {
>       if (a == b)
>         {
>           for (unsigned i = 0; i < a; i++)
>                       {
>                         if (i == b)
>                           foo (); /* Unreachable */
>                       }
>         }
>     }
> The "i == b" condition is always false, yet original FRE cannot predict this.
> Because VN only stores "i < a" and "a == b", so there won't be "i == b"'s
> result. (Moreover, VRP can't infer "i == b" is false either, as its boundary
> calculation is hindered by the "unsigned" modifier.)
>
> case 2:
> Consider GIMPLE code:
>               <bb 2> :
>               if (a != 0)
>               goto <bb 3>; [INV]
>               else
>               goto <bb 4>; [INV]
>
>               <bb 3> :
>
>               <bb 4> :
>               # c = PHI <y(2), x(3)>
>               if (a != 0)
>               goto <bb 5>; [INV]
>               else
>               goto <bb 7>; [INV]
>
>               <bb 5> :
>               if (c > x)
>               goto <bb 6>; [INV]
>               else
>               goto <bb 8>; [INV]
>
>               <bb 6> :
>               __builtin_puts (&"Unreachable!"[0]);
>
>               <bb 7> :
>               __builtin_puts (&"a"[0]);
>
>               <bb 8> :
>               ...
> The result of "if (c > x)" in bb4 is unknown. However bb2 and bb4 have the same
> conditional statement, meanwhile bb2 dominates bb4, so it is predictable that
> c==x at bb5 and c==y at bb7. Keeping record of this might bring further
> optimizations, such as removing the conditional in bb5.
>
> The basic idea is to use a hashmap to record additional equivalence information
> generated by conditional statements.
>
> Insert to the map:
>     1. When recording an EQ_EXPR=true predicate, e.g. "x==y is true", record
>     the equivalence of x and y at edge destiny in the map.
>     2. Consider case 2 above, when we fail to predict the result of a
>     conditional expression (at bb4), if following conditions be satisfied:
>               a. There is a previous corresponding predicate. In this case, it will
>               be "a != 0 is true" at bb3.
>               b. bb3's single predecessor bb2 dominates bb4.
>               c. bb3's conditional statement's predicate code (or inverted code) is
>               identical with that of bb4. (This condition can be relaxed.)
>     Then we can try to find a PHI operation along A's true and false edge, and
>     record the equivalence of PHI operation's result and arguments at bb4's
>     true and false destinations. Regarding this case, "c==x at bb5" and
>     "c==y at bb7" will be recorded.
>
> Use the map:
>     When we cannot find a prediction result for a conditional statement by
>     original process, replace conditional expression's operands with qualified
>     equivalence and try again.
>
> As searching predicates and the ssa names to record are based on
> value numbering, we need to "unwind" the equivalence map for iteration.
>
> Bootstrapped and tested on x86_64-pc-linux-gnu and aarch64-unknown-linux-gnu.

I have some reservations about extending the ad-hoc "predicated value" code.

Some comments on the patch:

+/* hashtable & helpers to record equivalences at given bb.  */
+
+typedef struct val_equiv_s
+{
+  val_equiv_s *next;
+  val_equiv_s *unwind_to;
+  hashval_t hashcode;
+  /* SSA name this val_equiv_s is associated with.  */
+  tree name;
+  /* RESULT in a vn_pval entry is SSA name of a equivalence.  */
+  vn_pval *values;
+} * val_equiv_t;

all of this (and using a hashtable for recording) is IMHO a bit overkill.
Since you only ever record equivalences for values the more natural
place to hook those in is the vn_ssa_aux structure where we also
record the availability chain.

There's little commentary in the new code, in particular function-level
comments are missing everywhere.

There's complexity issues, like I see val_equiv_insert has a "recurse"
feature but also find_predicated_value_by_equiv is quadratic in
the number of equivalences of the lhs/rhs.  Without knowing what
the recursion on the former is for - nothing tells me - I suspect
either of both should be redundant.

You seem to record equivalences at possible use points which
looks odd at best - I'd expected equivalences being recorded
at the same point we record predicated values and for the
current condition, not the one determining some other predication.
What was the motivation to do it the way you do it?

Why is the code conditional on 'iterate'?

You've probably realized that there's no "nice" way to
handle temporary equivalences in a VN scheme using
hashing for expressions (unless you degrade hashes a lot).

You quote opportunities that are catched with this like

+  if (a != 0)
+    {
+      c = b;
+    }
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a != 0)
+       {
+         if (i >= b)
+           /* Should be eliminated.
+            */
+           foo ();

but say other "obvious" cases like

+  if (a != 0)
+    {
+      c = b;
+    }
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a != 0)
+       {
+           /* Should be zero.  */
             return b - c;

are not handled.  That's in line with the current "value predication"
which mainly aims at catching simple jump threading opportunities;
you only simplify conditions with the recorded equivalences.  But
then the complexity of handling equivalences does probably not
outweight the opportunities catched - can you share some numbers
on how many branches are newly known taken during VN with
this patch during say bootstrap or build of SPEC CPU?

I've hoped to ditch the current "value predication" code by eventually
using the relation oracle from ranger but did not yet have the chance
to look into that.  Now, the predicated equivalences are likely not
something that infrastructure can handle?

In the end I think we should research into maintaining an alternate
expression table for conditions (those we like to simplify with
equivalences) and use a data structure that more easily allows to
introduce (temporary) equivalences.  Like by maintaining
back-references of values we've recorded a condition for and a way
to quickly re-canonicalize conditions.  Well - it requires some research,
as said.

Richard.

> Regards,
> Di Zhao
>
> Extend FRE with an "equivalence map" for condition prediction.
>
> 2021-06-24  Di Zhao  <dizhao@os.amperecomputing.com>
>
> gcc/ChangeLog:
>         PR tree-optimization/101186
>         * tree-ssa-sccvn.c (vn_tracking_edge): Extracted utility function.
>         (dominated_by_p_w_unex): Moved upward, no change.
>         (vn_nary_op_get_predicated_value): Moved upward, no change.
>         (struct val_equiv_hasher): Hasher for the "equivalence map".
>         (compute_single_op_hash): Compute hashcode from ssa name.
>         (is_vn_qualified_at_bb): Check if vn_pval is valid at BB.
>         (val_equiv_insert): Insert into "equivalence map".
>         (vn_lookup_cond_result): Lookup conditional expression's result by VN.
>         (find_predicated_value_by_equiv): Search for predicated value,
>         utilizing equivalences that we have recorded.
>         (record_equiv_from_previous_edge): Record equivalence relation from a
>         previouse edge on current edge.
>         (record_equiv_from_previous_cond): Search for equivalences generated by
>         previous conditional statements, and record them on current BB's
>         successors.
>         (vn_nary_op_insert_pieces_predicated): Extract utility function. Insert
>         into the "equivalence map" for predicate like "x==y is true".
>         (free_rpo_vn): Free the "equivalence map".
>         (process_bb): Insert into & lookup from the "equivalence map".
>         (struct unwind_state): Add "equivalence map" unwind state.
>         (do_unwind): Unwind the "equivalence map".
>         (do_rpo_vn): Update "equivalence map" unwind state.
>
> gcc/testsuite/ChangeLog:
>         PR tree-optimization/101186
>         * gcc.dg/tree-ssa/vrp03.c: Disable fre.
>         * gcc.dg/tree-ssa/ssa-fre-95.c: New test.
>         * gcc.dg/tree-ssa/ssa-fre-96.c: New test.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
  2021-06-24 12:29 ` Richard Biener
@ 2021-06-24 13:25   ` Andrew MacLeod
  2021-06-24 15:01     ` Andrew MacLeod
  2021-07-18 19:25   ` Di Zhao OS
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew MacLeod @ 2021-06-24 13:25 UTC (permalink / raw)
  To: Richard Biener, Di Zhao; +Cc: gcc-patches, Aldy Hernandez

On 6/24/21 8:29 AM, Richard Biener wrote:
> On Thu, Jun 24, 2021 at 11:55 AM Di Zhao via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>
> You quote opportunities that are catched with this like
>
> +  if (a != 0)
> +    {
> +      c = b;
> +    }
> +  for (unsigned i = 0; i < c; i++)
> +    {
> +      if (a != 0)
> +       {
> +         if (i >= b)
> +           /* Should be eliminated.
> +            */
> +           foo ();
>
> but say other "obvious" cases like
>
> +  if (a != 0)
> +    {
> +      c = b;
> +    }
> +  for (unsigned i = 0; i < c; i++)
> +    {
> +      if (a != 0)
> +       {
> +           /* Should be zero.  */
>               return b - c;
>
> are not handled.  That's in line with the current "value predication"
> which mainly aims at catching simple jump threading opportunities;
> you only simplify conditions with the recorded equivalences.  But
> then the complexity of handling equivalences does probably not
> outweight the opportunities catched - can you share some numbers
> on how many branches are newly known taken during VN with
> this patch during say bootstrap or build of SPEC CPU?
>
> I've hoped to ditch the current "value predication" code by eventually
> using the relation oracle from ranger but did not yet have the chance
> to look into that.  Now, the predicated equivalences are likely not
> something that infrastructure can handle?
>
> In the end I think we should research into maintaining an alternate
> expression table for conditions (those we like to simplify with
> equivalences) and use a data structure that more easily allows to
> introduce (temporary) equivalences.  Like by maintaining
> back-references of values we've recorded a condition for and a way
> to quickly re-canonicalize conditions.  Well - it requires some research,
> as said.
>
The idea would be to handle all this eventually if it doesnt now.  It'll 
certainly be capable of it. we havent looked into any missed cases yet. 
early days :-)

The oracle  handles predicated things just fine.  We're missing 
transitive relations, and any time an edge has multiple predecessors we 
sort of bail on predicated things for now.  I also haven't gotten to 
producing a nice relation/equivlance map of what we actually know... 
That might be worthwhile sooner than later.

THe original function in EVRP currently looks like:

  =========== BB 2 ============
     <bb 2> :
     if (a_5(D) == b_6(D))
       goto <bb 8>; [INV]
     else
       goto <bb 7>; [INV]

=========== BB 8 ============
Equivalence set : [a_5(D), b_6(D)]                 edge 2->8 provides 
a_5 and b_6 as equivalences
     <bb 8> :
     goto <bb 6>; [100.00%]

=========== BB 6 ============
     <bb 6> :
     # i_1 = PHI <0(8), i_10(5)>
     if (i_1 < a_5(D))
       goto <bb 3>; [INV]
     else
       goto <bb 7>; [INV]

=========== BB 3 ============
Relational : (i_1 < a_5(D))                         edge 6->3 provides 
this relation
     <bb 3> :
     if (i_1 == b_6(D))
       goto <bb 4>; [INV]
     else
       goto <bb 5>; [INV]


So It knows that a_5 and b_6 are equivalence, and it knows that i_1 < 
a_5 in BB3 as well..

so we should be able to indicate that  i_1 == b_6 as [0,0]..  we 
currently aren't.   I think I had turned on equivalence mapping during 
relational processing, so should be able to tag that without transitive 
relations...  I'll have a look at why.

And once we get a bit further along, you will be able to access this 
without ranger.. if one wants to simply register the relations directly.

Anyway, I'll get back to you why its currently being missed.

Andrew




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
  2021-06-24 13:25   ` Andrew MacLeod
@ 2021-06-24 15:01     ` Andrew MacLeod
  2021-06-25  7:38       ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew MacLeod @ 2021-06-24 15:01 UTC (permalink / raw)
  To: Richard Biener, Di Zhao; +Cc: gcc-patches, Aldy Hernandez

On 6/24/21 9:25 AM, Andrew MacLeod wrote:
> On 6/24/21 8:29 AM, Richard Biener wrote:
>
>
> THe original function in EVRP currently looks like:
>
>  =========== BB 2 ============
>     <bb 2> :
>     if (a_5(D) == b_6(D))
>       goto <bb 8>; [INV]
>     else
>       goto <bb 7>; [INV]
>
> =========== BB 8 ============
> Equivalence set : [a_5(D), b_6(D)]                 edge 2->8 provides 
> a_5 and b_6 as equivalences
>     <bb 8> :
>     goto <bb 6>; [100.00%]
>
> =========== BB 6 ============
>     <bb 6> :
>     # i_1 = PHI <0(8), i_10(5)>
>     if (i_1 < a_5(D))
>       goto <bb 3>; [INV]
>     else
>       goto <bb 7>; [INV]
>
> =========== BB 3 ============
> Relational : (i_1 < a_5(D))                         edge 6->3 provides 
> this relation
>     <bb 3> :
>     if (i_1 == b_6(D))
>       goto <bb 4>; [INV]
>     else
>       goto <bb 5>; [INV]
>
>
> So It knows that a_5 and b_6 are equivalence, and it knows that i_1 < 
> a_5 in BB3 as well..
>
> so we should be able to indicate that  i_1 == b_6 as [0,0]..  we 
> currently aren't.   I think I had turned on equivalence mapping during 
> relational processing, so should be able to tag that without 
> transitive relations...  I'll have a look at why.
>
> And once we get a bit further along, you will be able to access this 
> without ranger.. if one wants to simply register the relations directly.
>
> Anyway, I'll get back to you why its currently being missed.
>
> Andrew
>
>
>
As promised.  There was a typo in the equivalency comparisons... so it 
was getting missed.  With the fix, the oracle identifies the relation 
and evrp will now fold that expression away and the IL becomes:

   <bb 2> :
   if (a_5(D) == b_6(D))
     goto <bb 4>; [INV]
   else
     goto <bb 5>; [INV]

   <bb 3> :
   i_10 = i_1 + 1;

   <bb 4> :
   # i_1 = PHI <0(2), i_10(3)>
   if (i_1 < a_5(D))
     goto <bb 3>; [INV]
   else
     goto <bb 5>; [INV]

   <bb 5> :
   return;

for the other cases you quote, there are no predictions such that if a 
!= 0 then this equivalency exists...

+  if (a != 0)
+    {
+      c = b;
+    }

but the oracle would register that in the TRUE block,  c and b are 
equivalent... so some other pass that was interested in tracking 
conditions that make a block relevant would be able to compare relations...

Andrew



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
  2021-06-24 15:01     ` Andrew MacLeod
@ 2021-06-25  7:38       ` Richard Biener
  2021-06-27 15:46         ` Aldy Hernandez
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2021-06-25  7:38 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: Di Zhao, gcc-patches, Aldy Hernandez

On Thu, Jun 24, 2021 at 5:01 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
> On 6/24/21 9:25 AM, Andrew MacLeod wrote:
> > On 6/24/21 8:29 AM, Richard Biener wrote:
> >
> >
> > THe original function in EVRP currently looks like:
> >
> >  =========== BB 2 ============
> >     <bb 2> :
> >     if (a_5(D) == b_6(D))
> >       goto <bb 8>; [INV]
> >     else
> >       goto <bb 7>; [INV]
> >
> > =========== BB 8 ============
> > Equivalence set : [a_5(D), b_6(D)]                 edge 2->8 provides
> > a_5 and b_6 as equivalences
> >     <bb 8> :
> >     goto <bb 6>; [100.00%]
> >
> > =========== BB 6 ============
> >     <bb 6> :
> >     # i_1 = PHI <0(8), i_10(5)>
> >     if (i_1 < a_5(D))
> >       goto <bb 3>; [INV]
> >     else
> >       goto <bb 7>; [INV]
> >
> > =========== BB 3 ============
> > Relational : (i_1 < a_5(D))                         edge 6->3 provides
> > this relation
> >     <bb 3> :
> >     if (i_1 == b_6(D))
> >       goto <bb 4>; [INV]
> >     else
> >       goto <bb 5>; [INV]
> >
> >
> > So It knows that a_5 and b_6 are equivalence, and it knows that i_1 <
> > a_5 in BB3 as well..
> >
> > so we should be able to indicate that  i_1 == b_6 as [0,0]..  we
> > currently aren't.   I think I had turned on equivalence mapping during
> > relational processing, so should be able to tag that without
> > transitive relations...  I'll have a look at why.
> >
> > And once we get a bit further along, you will be able to access this
> > without ranger.. if one wants to simply register the relations directly.
> >
> > Anyway, I'll get back to you why its currently being missed.
> >
> > Andrew
> >
> >
> >
> As promised.  There was a typo in the equivalency comparisons... so it
> was getting missed.  With the fix, the oracle identifies the relation
> and evrp will now fold that expression away and the IL becomes:
>
>    <bb 2> :
>    if (a_5(D) == b_6(D))
>      goto <bb 4>; [INV]
>    else
>      goto <bb 5>; [INV]
>
>    <bb 3> :
>    i_10 = i_1 + 1;
>
>    <bb 4> :
>    # i_1 = PHI <0(2), i_10(3)>
>    if (i_1 < a_5(D))
>      goto <bb 3>; [INV]
>    else
>      goto <bb 5>; [INV]
>
>    <bb 5> :
>    return;
>
> for the other cases you quote, there are no predictions such that if a
> != 0 then this equivalency exists...
>
> +  if (a != 0)
> +    {
> +      c = b;
> +    }
>
> but the oracle would register that in the TRUE block,  c and b are
> equivalent... so some other pass that was interested in tracking
> conditions that make a block relevant would be able to compare relations...

I guess to fully leverage optimizations for cases like

  if (a != 0)
    c = b;
  ...
  if (a != 0)
    {
        if (c == b)
...
    }

one would need to consider the "optimally jump threaded path" to the
program point where the to be optimized stmt resides, making all
originally conditional but on the jump threaded path unconditional
relations and equivalences available.

For VN that could be done by unwinding to the CFG merge after
the first if (a != 0), treating only one of the predecessor edges
as executable and registering the appropriate a != 0 result and
continue VN up to the desired point, committing to the result
until before the CFG merge after the second if (a != 0).  And then
unwinding again for the "else" path.  Sounds like a possible
explosion in complexity as well if second-order opportunities
arise.

That is, we'd do simplifications exposed by jump threading but
without actually doing the jump threading (which will of course
not allow all possible simplifications w/o inserting extra PHIs
for computations we might want to re-use).

Richard.

> Andrew
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
  2021-06-25  7:38       ` Richard Biener
@ 2021-06-27 15:46         ` Aldy Hernandez
  2021-06-28  8:12           ` Richard Biener
  2021-06-28 13:15           ` Andrew MacLeod
  0 siblings, 2 replies; 11+ messages in thread
From: Aldy Hernandez @ 2021-06-27 15:46 UTC (permalink / raw)
  To: Richard Biener, Andrew MacLeod; +Cc: Di Zhao, gcc-patches, Jeff Law



On 6/25/21 9:38 AM, Richard Biener wrote:
> On Thu, Jun 24, 2021 at 5:01 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>>
>> On 6/24/21 9:25 AM, Andrew MacLeod wrote:
>>> On 6/24/21 8:29 AM, Richard Biener wrote:
>>>
>>>
>>> THe original function in EVRP currently looks like:
>>>
>>>   =========== BB 2 ============
>>>      <bb 2> :
>>>      if (a_5(D) == b_6(D))
>>>        goto <bb 8>; [INV]
>>>      else
>>>        goto <bb 7>; [INV]
>>>
>>> =========== BB 8 ============
>>> Equivalence set : [a_5(D), b_6(D)]                 edge 2->8 provides
>>> a_5 and b_6 as equivalences
>>>      <bb 8> :
>>>      goto <bb 6>; [100.00%]
>>>
>>> =========== BB 6 ============
>>>      <bb 6> :
>>>      # i_1 = PHI <0(8), i_10(5)>
>>>      if (i_1 < a_5(D))
>>>        goto <bb 3>; [INV]
>>>      else
>>>        goto <bb 7>; [INV]
>>>
>>> =========== BB 3 ============
>>> Relational : (i_1 < a_5(D))                         edge 6->3 provides
>>> this relation
>>>      <bb 3> :
>>>      if (i_1 == b_6(D))
>>>        goto <bb 4>; [INV]
>>>      else
>>>        goto <bb 5>; [INV]
>>>
>>>
>>> So It knows that a_5 and b_6 are equivalence, and it knows that i_1 <
>>> a_5 in BB3 as well..
>>>
>>> so we should be able to indicate that  i_1 == b_6 as [0,0]..  we
>>> currently aren't.   I think I had turned on equivalence mapping during
>>> relational processing, so should be able to tag that without
>>> transitive relations...  I'll have a look at why.
>>>
>>> And once we get a bit further along, you will be able to access this
>>> without ranger.. if one wants to simply register the relations directly.
>>>
>>> Anyway, I'll get back to you why its currently being missed.
>>>
>>> Andrew
>>>
>>>
>>>
>> As promised.  There was a typo in the equivalency comparisons... so it
>> was getting missed.  With the fix, the oracle identifies the relation
>> and evrp will now fold that expression away and the IL becomes:
>>
>>     <bb 2> :
>>     if (a_5(D) == b_6(D))
>>       goto <bb 4>; [INV]
>>     else
>>       goto <bb 5>; [INV]
>>
>>     <bb 3> :
>>     i_10 = i_1 + 1;
>>
>>     <bb 4> :
>>     # i_1 = PHI <0(2), i_10(3)>
>>     if (i_1 < a_5(D))
>>       goto <bb 3>; [INV]
>>     else
>>       goto <bb 5>; [INV]
>>
>>     <bb 5> :
>>     return;
>>
>> for the other cases you quote, there are no predictions such that if a
>> != 0 then this equivalency exists...
>>
>> +  if (a != 0)
>> +    {
>> +      c = b;
>> +    }
>>
>> but the oracle would register that in the TRUE block,  c and b are
>> equivalent... so some other pass that was interested in tracking
>> conditions that make a block relevant would be able to compare relations...
> 
> I guess to fully leverage optimizations for cases like
> 
>    if (a != 0)
>      c = b;
>    ...
>    if (a != 0)
>      {
>          if (c == b)
> ...
>      }
> 
> one would need to consider the "optimally jump threaded path" to the
> program point where the to be optimized stmt resides, making all
> originally conditional but on the jump threaded path unconditional
> relations and equivalences available.
> 
> For VN that could be done by unwinding to the CFG merge after
> the first if (a != 0), treating only one of the predecessor edges
> as executable and registering the appropriate a != 0 result and
> continue VN up to the desired point, committing to the result
> until before the CFG merge after the second if (a != 0).  And then
> unwinding again for the "else" path.  Sounds like a possible
> explosion in complexity as well if second-order opportunities
> arise.
> 
> That is, we'd do simplifications exposed by jump threading but
> without actually doing the jump threading (which will of course
> not allow all possible simplifications w/o inserting extra PHIs
> for computations we might want to re-use).

FWIW, as I mention in the PR, if the upcoming threader work could be 
taught to use the relation oracle, it could easily solve the conditional 
flowing through the a!=0 path.  However, we wouldn't be able to thread 
it because in this particular case, the path crosses loop boundaries.

I leave it to Jeff/others to pontificate on whether the jump-threader 
path duplicator could be taught to through loops. ??

Aldy


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
  2021-06-27 15:46         ` Aldy Hernandez
@ 2021-06-28  8:12           ` Richard Biener
  2021-06-28 13:15           ` Andrew MacLeod
  1 sibling, 0 replies; 11+ messages in thread
From: Richard Biener @ 2021-06-28  8:12 UTC (permalink / raw)
  To: Aldy Hernandez; +Cc: Andrew MacLeod, Di Zhao, gcc-patches, Jeff Law

On Sun, Jun 27, 2021 at 5:46 PM Aldy Hernandez <aldyh@redhat.com> wrote:
>
>
>
> On 6/25/21 9:38 AM, Richard Biener wrote:
> > On Thu, Jun 24, 2021 at 5:01 PM Andrew MacLeod <amacleod@redhat.com> wrote:
> >>
> >> On 6/24/21 9:25 AM, Andrew MacLeod wrote:
> >>> On 6/24/21 8:29 AM, Richard Biener wrote:
> >>>
> >>>
> >>> THe original function in EVRP currently looks like:
> >>>
> >>>   =========== BB 2 ============
> >>>      <bb 2> :
> >>>      if (a_5(D) == b_6(D))
> >>>        goto <bb 8>; [INV]
> >>>      else
> >>>        goto <bb 7>; [INV]
> >>>
> >>> =========== BB 8 ============
> >>> Equivalence set : [a_5(D), b_6(D)]                 edge 2->8 provides
> >>> a_5 and b_6 as equivalences
> >>>      <bb 8> :
> >>>      goto <bb 6>; [100.00%]
> >>>
> >>> =========== BB 6 ============
> >>>      <bb 6> :
> >>>      # i_1 = PHI <0(8), i_10(5)>
> >>>      if (i_1 < a_5(D))
> >>>        goto <bb 3>; [INV]
> >>>      else
> >>>        goto <bb 7>; [INV]
> >>>
> >>> =========== BB 3 ============
> >>> Relational : (i_1 < a_5(D))                         edge 6->3 provides
> >>> this relation
> >>>      <bb 3> :
> >>>      if (i_1 == b_6(D))
> >>>        goto <bb 4>; [INV]
> >>>      else
> >>>        goto <bb 5>; [INV]
> >>>
> >>>
> >>> So It knows that a_5 and b_6 are equivalence, and it knows that i_1 <
> >>> a_5 in BB3 as well..
> >>>
> >>> so we should be able to indicate that  i_1 == b_6 as [0,0]..  we
> >>> currently aren't.   I think I had turned on equivalence mapping during
> >>> relational processing, so should be able to tag that without
> >>> transitive relations...  I'll have a look at why.
> >>>
> >>> And once we get a bit further along, you will be able to access this
> >>> without ranger.. if one wants to simply register the relations directly.
> >>>
> >>> Anyway, I'll get back to you why its currently being missed.
> >>>
> >>> Andrew
> >>>
> >>>
> >>>
> >> As promised.  There was a typo in the equivalency comparisons... so it
> >> was getting missed.  With the fix, the oracle identifies the relation
> >> and evrp will now fold that expression away and the IL becomes:
> >>
> >>     <bb 2> :
> >>     if (a_5(D) == b_6(D))
> >>       goto <bb 4>; [INV]
> >>     else
> >>       goto <bb 5>; [INV]
> >>
> >>     <bb 3> :
> >>     i_10 = i_1 + 1;
> >>
> >>     <bb 4> :
> >>     # i_1 = PHI <0(2), i_10(3)>
> >>     if (i_1 < a_5(D))
> >>       goto <bb 3>; [INV]
> >>     else
> >>       goto <bb 5>; [INV]
> >>
> >>     <bb 5> :
> >>     return;
> >>
> >> for the other cases you quote, there are no predictions such that if a
> >> != 0 then this equivalency exists...
> >>
> >> +  if (a != 0)
> >> +    {
> >> +      c = b;
> >> +    }
> >>
> >> but the oracle would register that in the TRUE block,  c and b are
> >> equivalent... so some other pass that was interested in tracking
> >> conditions that make a block relevant would be able to compare relations...
> >
> > I guess to fully leverage optimizations for cases like
> >
> >    if (a != 0)
> >      c = b;
> >    ...
> >    if (a != 0)
> >      {
> >          if (c == b)
> > ...
> >      }
> >
> > one would need to consider the "optimally jump threaded path" to the
> > program point where the to be optimized stmt resides, making all
> > originally conditional but on the jump threaded path unconditional
> > relations and equivalences available.
> >
> > For VN that could be done by unwinding to the CFG merge after
> > the first if (a != 0), treating only one of the predecessor edges
> > as executable and registering the appropriate a != 0 result and
> > continue VN up to the desired point, committing to the result
> > until before the CFG merge after the second if (a != 0).  And then
> > unwinding again for the "else" path.  Sounds like a possible
> > explosion in complexity as well if second-order opportunities
> > arise.
> >
> > That is, we'd do simplifications exposed by jump threading but
> > without actually doing the jump threading (which will of course
> > not allow all possible simplifications w/o inserting extra PHIs
> > for computations we might want to re-use).
>
> FWIW, as I mention in the PR, if the upcoming threader work could be
> taught to use the relation oracle, it could easily solve the conditional
> flowing through the a!=0 path.  However, we wouldn't be able to thread
> it because in this particular case, the path crosses loop boundaries.
>
> I leave it to Jeff/others to pontificate on whether the jump-threader
> path duplicator could be taught to through loops. ??

If the path doesn't end outside of the loop then it will usually
create an alternate entry and thus turn the loop into an irreducible
region - that's quite a bad thing to do unless we somehow get a
high profitability hint and are late in the compilation.

Richard.

>
> Aldy
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
  2021-06-27 15:46         ` Aldy Hernandez
  2021-06-28  8:12           ` Richard Biener
@ 2021-06-28 13:15           ` Andrew MacLeod
  2021-06-29 10:54             ` Richard Biener
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew MacLeod @ 2021-06-28 13:15 UTC (permalink / raw)
  To: Aldy Hernandez, Richard Biener; +Cc: Di Zhao, gcc-patches, Jeff Law

On 6/27/21 11:46 AM, Aldy Hernandez wrote:
>
>
> On 6/25/21 9:38 AM, Richard Biener wrote:
>> On Thu, Jun 24, 2021 at 5:01 PM Andrew MacLeod <amacleod@redhat.com> 
>> wrote:
>>>
>>> On 6/24/21 9:25 AM, Andrew MacLeod wrote:
>>>> On 6/24/21 8:29 AM, Richard Biener wrote:
>>>>
>>>>
>>>> THe original function in EVRP currently looks like:
>>>>
>>>>   =========== BB 2 ============
>>>>      <bb 2> :
>>>>      if (a_5(D) == b_6(D))
>>>>        goto <bb 8>; [INV]
>>>>      else
>>>>        goto <bb 7>; [INV]
>>>>
>>>> =========== BB 8 ============
>>>> Equivalence set : [a_5(D), b_6(D)]                 edge 2->8 provides
>>>> a_5 and b_6 as equivalences
>>>>      <bb 8> :
>>>>      goto <bb 6>; [100.00%]
>>>>
>>>> =========== BB 6 ============
>>>>      <bb 6> :
>>>>      # i_1 = PHI <0(8), i_10(5)>
>>>>      if (i_1 < a_5(D))
>>>>        goto <bb 3>; [INV]
>>>>      else
>>>>        goto <bb 7>; [INV]
>>>>
>>>> =========== BB 3 ============
>>>> Relational : (i_1 < a_5(D))                         edge 6->3 provides
>>>> this relation
>>>>      <bb 3> :
>>>>      if (i_1 == b_6(D))
>>>>        goto <bb 4>; [INV]
>>>>      else
>>>>        goto <bb 5>; [INV]
>>>>
>>>>
>>>> So It knows that a_5 and b_6 are equivalence, and it knows that i_1 <
>>>> a_5 in BB3 as well..
>>>>
>>>> so we should be able to indicate that  i_1 == b_6 as [0,0]..  we
>>>> currently aren't.   I think I had turned on equivalence mapping during
>>>> relational processing, so should be able to tag that without
>>>> transitive relations...  I'll have a look at why.
>>>>
>>>> And once we get a bit further along, you will be able to access this
>>>> without ranger.. if one wants to simply register the relations 
>>>> directly.
>>>>
>>>> Anyway, I'll get back to you why its currently being missed.
>>>>
>>>> Andrew
>>>>
>>>>
>>>>
>>> As promised.  There was a typo in the equivalency comparisons... so it
>>> was getting missed.  With the fix, the oracle identifies the relation
>>> and evrp will now fold that expression away and the IL becomes:
>>>
>>>     <bb 2> :
>>>     if (a_5(D) == b_6(D))
>>>       goto <bb 4>; [INV]
>>>     else
>>>       goto <bb 5>; [INV]
>>>
>>>     <bb 3> :
>>>     i_10 = i_1 + 1;
>>>
>>>     <bb 4> :
>>>     # i_1 = PHI <0(2), i_10(3)>
>>>     if (i_1 < a_5(D))
>>>       goto <bb 3>; [INV]
>>>     else
>>>       goto <bb 5>; [INV]
>>>
>>>     <bb 5> :
>>>     return;
>>>
>>> for the other cases you quote, there are no predictions such that if a
>>> != 0 then this equivalency exists...
>>>
>>> +  if (a != 0)
>>> +    {
>>> +      c = b;
>>> +    }
>>>
>>> but the oracle would register that in the TRUE block,  c and b are
>>> equivalent... so some other pass that was interested in tracking
>>> conditions that make a block relevant would be able to compare 
>>> relations...
>>
>> I guess to fully leverage optimizations for cases like
>>
>>    if (a != 0)
>>      c = b;
>>    ...
>>    if (a != 0)
>>      {
>>          if (c == b)
>> ...
>>      }
>>
>> That is, we'd do simplifications exposed by jump threading but
>> without actually doing the jump threading (which will of course
>> not allow all possible simplifications w/o inserting extra PHIs
>> for computations we might want to re-use).
>
> FWIW, as I mention in the PR, if the upcoming threader work could be 
> taught to use the relation oracle, it could easily solve the 
> conditional flowing through the a!=0 path.  However, we wouldn't be 
> able to thread it because in this particular case, the path crosses 
> loop boundaries.
>
> I leave it to Jeff/others to pontificate on whether the jump-threader 
> path duplicator could be taught to through loops. ??
>
> Aldy
>
This is still bouncing around in my head. I think we have the tools to 
do this better than via threading,  Ranger is now trivially capable of 
calculating when a predicate expression is true or false at another 
location in the IL. Combine this with flagging relations that are true 
when the predicate is, and that relation could be simply added into the 
oracle.

ie:

     <bb 2> :
     if (a_5(D) != 0)
       goto <bb 3>; [INV]
     else
       goto <bb 4>; [INV]

     <bb 3> :
     <bb 4> :
     # c_1 = PHI <c_6(D)(2), b_7(D)(3)>

the predicate and relations are:
     (a_5 != 0)  ->  c_1 == b_7
     (a_5 == 0) -> c_1 == c_6

then :

<bb 9> :
     # i_2 = PHI <0(4), i_12(8)>
     if (c_1 > i_2)
       goto <bb 5>; [INV]
     else
       goto <bb 10>; [INV]

     <bb 5> :                                9->5 registers c_1 > 1_2 
with the oracle
     if (a_5(D) != 0)
       goto <bb 6>; [INV]
     else
       goto <bb 8>; [INV]

     <bb 6> :
     if (i_2 == b_7(D))
       goto <bb 7>; [INV]
     else
       goto <bb 8>; [INV]
..
If we know to check the predicate list in bb_6, ranger can answer the 
question: on the branch in bb6, a_5 != 0.
This in turn means the predicated relation c_1 == b_7 can be applied to 
bb6 and register with the oracle.
Once that is done,  we already know c_1 > i_2 so we'll fold i_2 == b_7 
as [0, 0] as the equivalency between b_7 and c_1 is now applied.

So the capability is built in.. it boils down to finding the predicated 
relations we care about and knowing to apply them.
This one is pretty straightforward because the condition is exactly the 
same.   When we see a_5 != 0, a_5 is in the export list and we just 
check to see if there are any predicated flagged on a_5.  The actual 
expression could be more complicated, and it would still be able to 
answer it.  This is very similar to how the new threader finds 
threads..  It matches imports and exports.. here we mostly care just 
about the exports and figuring out what the predicates we care about are,

Anyway, theres a fit in there somewhere.  It might be something as 
straightforward as a an enhancement to the relation oracle.   I plan to 
get to some experiments eventually.

Andrew



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
  2021-06-28 13:15           ` Andrew MacLeod
@ 2021-06-29 10:54             ` Richard Biener
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Biener @ 2021-06-29 10:54 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: Aldy Hernandez, Di Zhao, gcc-patches, Jeff Law

On Mon, Jun 28, 2021 at 3:15 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>
> On 6/27/21 11:46 AM, Aldy Hernandez wrote:
> >
> >
> > On 6/25/21 9:38 AM, Richard Biener wrote:
> >> On Thu, Jun 24, 2021 at 5:01 PM Andrew MacLeod <amacleod@redhat.com>
> >> wrote:
> >>>
> >>> On 6/24/21 9:25 AM, Andrew MacLeod wrote:
> >>>> On 6/24/21 8:29 AM, Richard Biener wrote:
> >>>>
> >>>>
> >>>> THe original function in EVRP currently looks like:
> >>>>
> >>>>   =========== BB 2 ============
> >>>>      <bb 2> :
> >>>>      if (a_5(D) == b_6(D))
> >>>>        goto <bb 8>; [INV]
> >>>>      else
> >>>>        goto <bb 7>; [INV]
> >>>>
> >>>> =========== BB 8 ============
> >>>> Equivalence set : [a_5(D), b_6(D)]                 edge 2->8 provides
> >>>> a_5 and b_6 as equivalences
> >>>>      <bb 8> :
> >>>>      goto <bb 6>; [100.00%]
> >>>>
> >>>> =========== BB 6 ============
> >>>>      <bb 6> :
> >>>>      # i_1 = PHI <0(8), i_10(5)>
> >>>>      if (i_1 < a_5(D))
> >>>>        goto <bb 3>; [INV]
> >>>>      else
> >>>>        goto <bb 7>; [INV]
> >>>>
> >>>> =========== BB 3 ============
> >>>> Relational : (i_1 < a_5(D))                         edge 6->3 provides
> >>>> this relation
> >>>>      <bb 3> :
> >>>>      if (i_1 == b_6(D))
> >>>>        goto <bb 4>; [INV]
> >>>>      else
> >>>>        goto <bb 5>; [INV]
> >>>>
> >>>>
> >>>> So It knows that a_5 and b_6 are equivalence, and it knows that i_1 <
> >>>> a_5 in BB3 as well..
> >>>>
> >>>> so we should be able to indicate that  i_1 == b_6 as [0,0]..  we
> >>>> currently aren't.   I think I had turned on equivalence mapping during
> >>>> relational processing, so should be able to tag that without
> >>>> transitive relations...  I'll have a look at why.
> >>>>
> >>>> And once we get a bit further along, you will be able to access this
> >>>> without ranger.. if one wants to simply register the relations
> >>>> directly.
> >>>>
> >>>> Anyway, I'll get back to you why its currently being missed.
> >>>>
> >>>> Andrew
> >>>>
> >>>>
> >>>>
> >>> As promised.  There was a typo in the equivalency comparisons... so it
> >>> was getting missed.  With the fix, the oracle identifies the relation
> >>> and evrp will now fold that expression away and the IL becomes:
> >>>
> >>>     <bb 2> :
> >>>     if (a_5(D) == b_6(D))
> >>>       goto <bb 4>; [INV]
> >>>     else
> >>>       goto <bb 5>; [INV]
> >>>
> >>>     <bb 3> :
> >>>     i_10 = i_1 + 1;
> >>>
> >>>     <bb 4> :
> >>>     # i_1 = PHI <0(2), i_10(3)>
> >>>     if (i_1 < a_5(D))
> >>>       goto <bb 3>; [INV]
> >>>     else
> >>>       goto <bb 5>; [INV]
> >>>
> >>>     <bb 5> :
> >>>     return;
> >>>
> >>> for the other cases you quote, there are no predictions such that if a
> >>> != 0 then this equivalency exists...
> >>>
> >>> +  if (a != 0)
> >>> +    {
> >>> +      c = b;
> >>> +    }
> >>>
> >>> but the oracle would register that in the TRUE block,  c and b are
> >>> equivalent... so some other pass that was interested in tracking
> >>> conditions that make a block relevant would be able to compare
> >>> relations...
> >>
> >> I guess to fully leverage optimizations for cases like
> >>
> >>    if (a != 0)
> >>      c = b;
> >>    ...
> >>    if (a != 0)
> >>      {
> >>          if (c == b)
> >> ...
> >>      }
> >>
> >> That is, we'd do simplifications exposed by jump threading but
> >> without actually doing the jump threading (which will of course
> >> not allow all possible simplifications w/o inserting extra PHIs
> >> for computations we might want to re-use).
> >
> > FWIW, as I mention in the PR, if the upcoming threader work could be
> > taught to use the relation oracle, it could easily solve the
> > conditional flowing through the a!=0 path.  However, we wouldn't be
> > able to thread it because in this particular case, the path crosses
> > loop boundaries.
> >
> > I leave it to Jeff/others to pontificate on whether the jump-threader
> > path duplicator could be taught to through loops. ??
> >
> > Aldy
> >
> This is still bouncing around in my head. I think we have the tools to
> do this better than via threading,  Ranger is now trivially capable of
> calculating when a predicate expression is true or false at another
> location in the IL. Combine this with flagging relations that are true
> when the predicate is, and that relation could be simply added into the
> oracle.
>
> ie:
>
>      <bb 2> :
>      if (a_5(D) != 0)
>        goto <bb 3>; [INV]
>      else
>        goto <bb 4>; [INV]
>
>      <bb 3> :
>      <bb 4> :
>      # c_1 = PHI <c_6(D)(2), b_7(D)(3)>
>
> the predicate and relations are:
>      (a_5 != 0)  ->  c_1 == b_7
>      (a_5 == 0) -> c_1 == c_6
>
> then :
>
> <bb 9> :
>      # i_2 = PHI <0(4), i_12(8)>
>      if (c_1 > i_2)
>        goto <bb 5>; [INV]
>      else
>        goto <bb 10>; [INV]
>
>      <bb 5> :                                9->5 registers c_1 > 1_2
> with the oracle
>      if (a_5(D) != 0)
>        goto <bb 6>; [INV]
>      else
>        goto <bb 8>; [INV]
>
>      <bb 6> :
(*) >      if (i_2 == b_7(D))
>        goto <bb 7>; [INV]
>      else
>        goto <bb 8>; [INV]
> ..
> If we know to check the predicate list in bb_6, ranger can answer the
> question: on the branch in bb6, a_5 != 0.
> This in turn means the predicated relation c_1 == b_7 can be applied to
> bb6 and register with the oracle.
> Once that is done,  we already know c_1 > i_2 so we'll fold i_2 == b_7
> as [0, 0] as the equivalency between b_7 and c_1 is now applied.
>
> So the capability is built in.. it boils down to finding the predicated
> relations we care about and knowing to apply them.
> This one is pretty straightforward because the condition is exactly the
> same.   When we see a_5 != 0, a_5 is in the export list and we just
> check to see if there are any predicated flagged on a_5.  The actual
> expression could be more complicated, and it would still be able to
> answer it.  This is very similar to how the new threader finds
> threads..  It matches imports and exports.. here we mostly care just
> about the exports and figuring out what the predicates we care about are,
>
> Anyway, theres a fit in there somewhere.  It might be something as
> straightforward as a an enhancement to the relation oracle.   I plan to
> get to some experiments eventually.

The important part is to not waste too many cycles here.  For example
for the exact equalities i_2 == b_7(D) at (*) you could lookup whether
there are any "predicated" relations registered that match this and then
see whether the current active predicate matches the recorded one.
For sth more complicated like i_2 < b_7(D) you'd then need to
improve this lookup to say "any relation between i_2 and b_7(D)?".

Doing it this way is I guess cheaper than looking for all relations
recorded predicated on a predicate that's currently active.

But note that such "relation between i_2 and b_7(D)"  map will be
interesting to maintain since following equivalences may need to
alter existing relations ... but things quickly become quadratic
or worse if you want to cover all cases and I've not yet come up
with a scheme to improve that.  But there must be existing
research papers about the topic ...

Richard.

> Andrew
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
  2021-06-24 12:29 ` Richard Biener
  2021-06-24 13:25   ` Andrew MacLeod
@ 2021-07-18 19:25   ` Di Zhao OS
  2021-07-28  9:38     ` Richard Biener
  1 sibling, 1 reply; 11+ messages in thread
From: Di Zhao OS @ 2021-07-18 19:25 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 10367 bytes --]


I tried to improve the patch following your advices and to catch more
opportunities. Hope it'll be helpful.

On 6/24/21 8:29 AM, Richard Biener wrote: 
> On Thu, Jun 24, 2021 at 11:55 AM Di Zhao via Gcc-patches <gcc-
> patches@gcc.gnu.org> wrote:
> 
> I have some reservations about extending the ad-hoc "predicated value" code.
> 
> Some comments on the patch:
> 
> +/* hashtable & helpers to record equivalences at given bb.  */
> +
> +typedef struct val_equiv_s
> +{
> +  val_equiv_s *next;
> +  val_equiv_s *unwind_to;
> +  hashval_t hashcode;
> +  /* SSA name this val_equiv_s is associated with.  */
> +  tree name;
> +  /* RESULT in a vn_pval entry is SSA name of a equivalence.  */
> +  vn_pval *values;
> +} * val_equiv_t;
> 
> all of this (and using a hashtable for recording) is IMHO a bit overkill.
> Since you only ever record equivalences for values the more natural place to
> hook those in is the vn_ssa_aux structure where we also record the availability
> chain.
 
I tried to store the equivalences in the vn_ssa_aux structure, but I didn't  
optimize the second case successfully: I need to record the equivalence
of a PHI expression's result and arguments, but their value number results will
become VARYING first, so they won't be changed. Maybe I'm missing something, or
can I force change a VARYING result?

Besides, the way currently used, equivalences only need to be "predictable"
rather than available, maybe availability chains do not represent them very
well?

> There's little commentary in the new code, in particular function-level
> comments are missing everywhere.

Added more comments.

> There's complexity issues, like I see val_equiv_insert has a "recurse"
> feature but also find_predicated_value_by_equiv is quadratic in the number of
> equivalences of the lhs/rhs.  Without knowing what the recursion on the
> former is for - nothing tells me - I suspect either of both should be redundant.

The intention was, given {A==B, B==C, X==Y, Y==Z} and a previous result of
"C opcode Z", to find the result of "A opcode Y". I removed the "recurse"
feature and modified the searching logic so solve the issue. Now a temporary
hash_set is used to record the equivalences that are visited when searching.

> You seem to record equivalences at possible use points which looks odd at best
> - I'd expected equivalences being recorded at the same point we record
> predicated values and for the current condition, not the one determining some
> other predication.
> What was the motivation to do it the way you do it?

The purpose is to "bring down" what can be known from a previous basic-block
that effectively dominates current block, but not actually does so (in the
example it is because jump threading is hindered by a loop). For example in
this case:

  if (a != 0)
      // Nothing useful can be recorded here, because this BB doesn't dominate
      // the BB that we want to simplify.
      c = b;
  for (unsigned i = 0; i < c; i++)
    {
      if (a != 0)  // The recording is triggered here.
       {
         // c == b will be recorded here, so it can be used for simplification.
         // In gimple it is the equivalence of a PHI's result and argument.
         if (i >= b) 
           foo ();

These requires finding a previous condition that is identical with current
one, so it is convenient to do this in FRE. Besides, as FRE records derived
predicate, so for relative conditions there also might be opportunities for
optimization. In the new patch code this is included.

Besides, to find more opportunities, added a hashmap to store mappings from
immediate dominators to basic-blocks with PHIs of interest.

> Why is the code conditional on 'iterate'?

I haven't worked it out to fit the non-iterate mode, so it now breaks the
if_conversion pass. I think this is because some of the equivalence-recordings
are too optimistic for non-iterate mode.

> You've probably realized that there's no "nice" way to handle temporary
> equivalences in a VN scheme using hashing for expressions (unless you degrade
> hashes a lot).

I modified the code to use TREE_HASH on ssa names. Would that be better?  

> You quote opportunities that are catched with this like
> 
> +  if (a != 0)
> +    {
> +      c = b;
> +    }
> +  for (unsigned i = 0; i < c; i++)
> +    {
> +      if (a != 0)
> +       {
> +         if (i >= b)
> +           /* Should be eliminated.
> +            */
> +           foo ();
> 
> but say other "obvious" cases like
> 
> +  if (a != 0)
> +    {
> +      c = b;
> +    }
> +  for (unsigned i = 0; i < c; i++)
> +    {
> +      if (a != 0)
> +       {
> +           /* Should be zero.  */
>              return b - c;
> 
> are not handled.  That's in line with the current "value predication"
> which mainly aims at catching simple jump threading opportunities; you only
> simplify conditions with the recorded equivalences.  But then the complexity of
> handling equivalences does probably not outweight the opportunities catched -
> can you share some numbers on how many branches are newly known taken
> during VN with this patch during say bootstrap or build of SPEC CPU?

I extended the code a little to cover the cases like "A - A" and "A xor A".

Here are some results on one bootstrap step: 
           | values found | more bb removed | values found in all
           | in fre1      | at fre1         | fre & pre passes   
-----------------------------------------------------------------
 bootstrap | 592          | 40              | 1272

As the code is different for bootstrap, the "more bb removed" metric is not
precise. I also tested on SPEC CPU 2017 integer cases:
                | values found | more bb removed | values found in all
                | in fre1      | at fre1         | fre & pre passes
-----------------------------------------------------------------
 500.perlbench_r| 3            |  0              | 9
 502.gcc_r      | 25           |  39             | 241
 520.omnetpp    | 9            |  6              | 34
 523.xalancbmk_r| 12           |  0              | 35
 541.leela_r    | 2            |  0              | 2

In cases not listed above there's no value found by equivalences. Benefits
after fre1 are not counted as CGF may be different from here (for
523.xalancbmk_r there're 8 more basic-blocks removed at fre3). Although the
chances are not plenty, there might be potential benefits, such as making a
function pure.

> I've hoped to ditch the current "value predication" code by eventually using the
> relation oracle from ranger but did not yet have the chance to look into that.
> Now, the predicated equivalences are likely not something that infrastructure
> can handle?
> 
> In the end I think we should research into maintaining an alternate expression
> table for conditions (those we like to simplify with
> equivalences) and use a data structure that more easily allows to introduce
> (temporary) equivalences.  Like by maintaining back-references of values we've
> recorded a condition for and a way to quickly re-canonicalize conditions.  Well -
> it requires some research, as said.
> 
> Richard.
> 
> > Regards,
> > Di Zhao
> >
> > Extend FRE with an "equivalence map" for condition prediction.
> >
> > 2021-06-24  Di Zhao  <dizhao@os.amperecomputing.com>
> >

Thanks,
Di Zhao

--------
Extend FRE with an "equivalence map" for condition prediction.

2021-07-18  Di Zhao  <dizhao@os.amperecomputing.com>

gcc/ChangeLog:
        PR tree-optimization/101186
        * tree-ssa-sccvn.c (vn_tracking_edge): Extracted utility function.
        (dominated_by_p_w_unex): Moved upward, no change.
        (vn_nary_op_get_predicated_value): Moved upward, no change.
        (struct val_equiv_hasher): Hasher for the "equivalence map".
        (is_vn_valid_at_bb): Check if vn_pval is valid at BB.
        (val_equiv_insert): Insert into "equivalence map".
        (vn_lookup_binary_op_result): Lookup binary expression's result by VN.
        (iterate_val_equivs): Iterate on equivalences and returns a non-NULL
        result returned by callback.
        (find_predicated_binary_by_lhs_equiv): Callback for iterate_val_equivs.
        Lookup a binary operations result by LHS equivalences.
        (find_predicated_binary_by_rhs_equiv): Callback for iterate_val_equivs.
        Lookup a binary operations result by RHS equivalences.
        (find_predicated_binary_by_equiv): Lookup predicated value of a binary
        operation by equivalences.
        (is_relaxed_cond_code): Whether operation code is a relaxed condition
        code derived from original code.
        (branch_may_come_from_another): Whether there's a path from the one
        true or false destination to another.
        (record_equiv_from_previous_edge): Record equivalence relation from a
        previous condition on current bb' true and false edges. 
        (record_equiv_from_previous_cond): Record equivalences generated by
        previous conditions on current BB's true and false edges.
        (vn_nary_op_insert_pieces_predicated): Extract utility function. Insert
        into the "equivalence map" for predicate like "x==y is true".
        (record_dom_to_phi_bb): Record mappings from immediate dominator to
        basic_block with PHIs.
        (vn_phi_lookup): Record mappings from immediate dominator to PHIs.
        (visit_nary_op): Add lookup predicated values of binaries by
        equivalences.
        (free_rpo_vn): Free the "equivalence map".
        (process_bb): Insert into & lookup from the "equivalence map".
        (struct unwind_state): Add "equivalence map" unwind state.
        (do_unwind): Unwind the "equivalence map".
        (do_rpo_vn): Update "equivalence map" unwind state.

gcc/testsuite/ChangeLog:
        PR tree-optimization/101186
        * gcc.dg/tree-ssa/pr71947-1.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-2.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-3.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-5.c: Disable fre.
        * gcc.dg/tree-ssa/vrp03.c: Disable fre.
        * gcc.dg/tree-ssa/ssa-fre-95.c: New test.
        * gcc.dg/tree-ssa/ssa-fre-96.c: New test.

[-- Attachment #2: tree-optimization-101186-1.patch --]
[-- Type: application/octet-stream, Size: 31859 bytes --]

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c
index ac8271cc574..e9a797e82b0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c
index b2c09cbb021..bcbc419575c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c
index 2316f7e1c04..d63291da440 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 int f(int x, int y)
 {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
index e7038d0237f..c1e5667b45a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 static inline long load(long *p)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c
new file mode 100644
index 00000000000..c9dee01cec0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c
@@ -0,0 +1,83 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+extern void bar(void);
+extern void bag(void);
+extern void boo(void);
+extern void bas(void);
+extern void foo(void);
+
+void f (unsigned int a, unsigned int b)
+{
+  if (a == b)
+    {
+      for (unsigned i = 0; i < a; i++)
+	{
+	  bar ();
+	  if (i >= b)
+	    /* Unreachable.  */
+	    foo ();
+	}
+    }
+}
+
+void g (unsigned int a, unsigned int b, unsigned int c)
+{
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a == b)
+	{
+	  if (b == c)
+	    {
+	      boo ();
+	      if (i >= a)
+		/* Unreachable.  */
+		foo ();
+	    }
+	}
+    }
+}
+
+void h (int a, int b, int x, int y)
+{
+  int c = y;
+  if (a != 0)
+    c = x;
+  while (b < 1000)
+    {
+      if (a != 0)
+	{
+	  if (c > x)
+	    /* Unreachable.  */
+	    foo ();
+	}
+      else
+	bas ();
+      b++;
+    }
+}
+
+void k (int a, int b, int x, int y)
+{
+  int c = y;
+  if (a > 0)
+    c = x;
+  while (b < 1000)
+    {
+      if (a != 0) // "a != 0" can be derived from "a > 0" 
+	{
+	  if (c > x)
+	    /* Unreachable.  */
+	    foo ();
+	}
+      else
+	bag ();
+      b++;
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "foo" "fre1" } } */
+/* { dg-final { scan-tree-dump "bag" "fre1" } } */
+/* { dg-final { scan-tree-dump "bar" "fre1" } } */
+/* { dg-final { scan-tree-dump "boo" "fre1" } } */
+/* { dg-final { scan-tree-dump "bas" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c
new file mode 100644
index 00000000000..c6a94bc0f22
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c
@@ -0,0 +1,74 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+extern void bar(void);
+extern void bag(void);
+extern void bas(void);
+extern void foo(void);
+
+void f (unsigned int a, unsigned int b)
+{
+  if (a == b)
+    {
+      for (unsigned i = a; i < 100; i++)
+	{
+	  if (i > b)
+	    /* Should not be eliminated.
+	     */
+	    bar ();
+	}
+    }
+}
+
+void g (unsigned int a, unsigned int b, unsigned int d)
+{
+  unsigned int c = 100;
+  if (a > 0)
+    c = b;
+  else
+    c = d;
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a >= 0)
+	{
+	  if (i >= b)
+	    /* Should not be eliminated, as "a >= 0" can be derived from "a >
+	     * 0".
+	     */
+	    bas ();
+	}
+      else
+	{
+	  if (i >= d)
+	    /* Should be eliminated, as "a < 0" can be derived from "a <= 0".
+	     */
+	    foo ();
+	}
+      i++;
+    }
+}
+
+void h (unsigned int a, unsigned int b)
+{
+  unsigned int c = 100;
+  if (b % 2)
+    {
+      if (a != 0)
+	c = b;
+    }
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a != 0)
+	{
+	  if (i >= b)
+	    /* Should not be eliminated.
+	     */
+	    bag ();
+	}
+      i++;
+    }
+}
+/* { dg-final { scan-tree-dump "bar" "fre1" } } */
+/* { dg-final { scan-tree-dump "bas" "fre1" } } */
+/* { dg-final { scan-tree-dump "bag" "fre1" } } */
+/* { dg-final { scan-tree-dump-not "foo" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
index bafb65a53d6..bea2f1b0de5 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1" } */
+/* { dg-options "-O2 -fdisable-tree-evrp -fno-tree-fre -fdump-tree-vrp1" } */
 
 struct A
 {
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 7900df946f4..5e29ffa9345 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -194,6 +194,12 @@ vn_phi_hasher::equal (const vn_phi_s *vp1, const vn_phi_s *vp2)
 typedef hash_table<vn_phi_hasher> vn_phi_table_type;
 typedef vn_phi_table_type::iterator vn_phi_iterator_type;
 
+typedef hash_map<basic_block, hash_set<basic_block> > bb_map_t;
+
+/* Maps a basic-block P to a list of basic-blocks with PHIs and are immediately
+ * dominated by P.  This is to help finding temporary equivalences introduced by
+ * PHIs.  */
+static bb_map_t *dominator_to_phi_map = NULL;
 
 /* Compare two reference operands P1 and P2 for equality.  Return true if
    they are equal, and false otherwise.  */
@@ -3779,6 +3785,515 @@ vn_reference_insert_pieces (tree vuse, alias_set_type set,
   return vr1;
 }
 
+/* Returns whether edge PRED_E is currently being tracked.  */
+
+static inline bool
+vn_tracking_edge (edge pred_e)
+{
+  if (! single_pred_p (pred_e->dest))
+    {
+      /* Never record for backedges.  */
+      if (pred_e->flags & EDGE_DFS_BACK)
+	return false;
+      edge_iterator ei;
+      edge e;
+      int cnt = 0;
+      /* Ignore backedges.  */
+      FOR_EACH_EDGE (e, ei, pred_e->dest->preds)
+	if (! dominated_by_p (CDI_DOMINATORS, e->src, e->dest))
+	  cnt++;
+      if (cnt != 1)
+	return false;
+    }
+    return true;
+}
+
+static bool
+dominated_by_p_w_unex (basic_block bb1, basic_block bb2, bool);
+
+static tree
+vn_nary_op_get_predicated_value (vn_nary_op_t vno, basic_block bb)
+{
+  if (! vno->predicated_values)
+    return vno->u.result;
+  for (vn_pval *val = vno->u.values; val; val = val->next)
+    for (unsigned i = 0; i < val->n; ++i)
+      /* Do not handle backedge executability optimistically since
+	 when figuring out whether to iterate we do not consider
+	 changed predication.  */
+      if (dominated_by_p_w_unex
+	    (bb, BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]),
+	     false))
+	return val->result;
+  return NULL_TREE;
+}
+
+/* hashtable & helpers to record equivalences.  */
+
+typedef struct val_equiv_s
+{
+  val_equiv_s *next;
+  val_equiv_s *unwind_to;
+  /* SSA name this val_equiv_s is associated with.  */
+  tree name;
+  /* RESULT in a vn_pval entry is SSA name of a equivalence.  */
+  vn_pval *values;
+} * val_equiv_t;
+
+struct val_equiv_hasher : nofree_ptr_hash<val_equiv_s>
+{
+  static inline hashval_t hash (const val_equiv_s *entry)
+  {
+    return TREE_HASH (entry->name);
+  }
+  static inline bool equal (const val_equiv_t &entry, const val_equiv_t &other)
+  {
+    return other->name == entry->name;
+  }
+};
+
+static hash_table<val_equiv_hasher> *val_equiv_hash;
+typedef hash_table<val_equiv_hasher>::iterator val_equiv_iterator_type;
+
+static val_equiv_t last_inserted_equiv;
+
+/* Whether the predicated value in VAL is valid at basic-block BB.  */
+
+static inline bool
+is_vn_valid_at_bb (vn_pval *val, basic_block bb)
+{
+  for (unsigned i = 0; i < val->n; ++i)
+    {
+      basic_block val_bb
+	= BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]);
+      if (dominated_by_p (CDI_DOMINATORS, bb, val_bb))
+	{
+	  return true;
+	}
+    }
+  return false;
+}
+
+/* Record in hash table that EQUIV is an equivalence of NAME at location BB.
+   Return the resulting reference structure we created.  */
+
+static val_equiv_t
+val_equiv_insert (tree name, tree equiv, basic_block bb)
+{
+  if (TREE_CODE (name) == SSA_NAME)
+    name = VN_INFO (name)->valnum;
+  if (TREE_CODE (equiv) == SSA_NAME)
+    equiv = VN_INFO (equiv)->valnum;
+  if (name == equiv)
+    return NULL;
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Recording equivalence of ");
+      print_generic_expr (dump_file, name);
+      fprintf (dump_file, " and ");
+      print_generic_expr (dump_file, equiv);
+      fprintf (dump_file, " at BB%d\n", bb->index);
+    }
+  /* Similar with inserting vn_nary_op_t, maintain a stack of states to unwind
+   * for iteration.  */
+  val_equiv_t new_equiv
+    = (val_equiv_t) obstack_alloc (&vn_tables_obstack, sizeof (val_equiv_s));
+  new_equiv->name = name;
+  new_equiv->values
+    = (vn_pval *) obstack_alloc (&vn_tables_obstack, sizeof (vn_pval));
+  new_equiv->values->next = NULL;
+  new_equiv->values->n = 1;
+  new_equiv->values->valid_dominated_by_p[0] = bb->index;
+  new_equiv->values->result = equiv;
+  val_equiv_t *slot
+    = val_equiv_hash->find_slot_with_hash (new_equiv, TREE_HASH (name), INSERT);
+  new_equiv->unwind_to = *slot;
+
+  if (*slot)
+    {
+      bool found = false;
+      vn_pval *nval = new_equiv->values;
+      vn_pval **next = &(new_equiv->values);
+      for (vn_pval *val = (*slot)->values; val; val = val->next)
+	{
+	  if (val->result == equiv)
+	    {
+	      found = true;
+	      if (is_vn_valid_at_bb (val, bb))
+		/* More generic equivalence has been recorded.  */
+		return *slot;
+	      /* Append value.  */
+	      *next = (vn_pval *) obstack_alloc (&vn_tables_obstack,
+						 sizeof (vn_pval)
+						   + val->n * sizeof (int));
+	      (*next)->next = NULL;
+	      (*next)->result = val->result;
+	      (*next)->n = val->n + 1;
+	      memcpy ((*next)->valid_dominated_by_p, val->valid_dominated_by_p,
+		      val->n * sizeof (int));
+	      (*next)->valid_dominated_by_p[val->n] = bb->index;
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		fprintf (dump_file, "Appending equivalence.\n");
+	      next = &(*next)->next;
+	      continue;
+	    }
+	  /* Copy others.  */
+	  unsigned val_size = sizeof (vn_pval) + (val->n - 1) * sizeof (int);
+	  *next = (vn_pval *) obstack_alloc (&vn_tables_obstack, val_size);
+	  memcpy (*next, val, val_size);
+	  (*next)->next = NULL;
+	  next = &(*next)->next;
+	}
+      if (!found)
+	// append new equiv at last
+	*next = nval;
+    }
+
+  *slot = new_equiv;
+  new_equiv->next = last_inserted_equiv;
+  last_inserted_equiv = new_equiv;
+  return new_equiv;
+}
+
+/* Lookup result (value number) of a binary operation in the hash table.  If no
+ * result is found, NULL_TREE will be returned.  */
+
+static tree
+vn_lookup_binary_op_result (tree lhs, tree rhs, tree_code code, basic_block bb)
+{
+  tree val = NULL_TREE;
+  // Try to simplify first.
+  if (lhs == rhs)
+    switch (code)
+      {
+      case EQ_EXPR:
+      case LE_EXPR:
+      case GE_EXPR:
+	return boolean_true_node;
+      case LT_EXPR:
+      case GT_EXPR:
+      case NE_EXPR:
+	return boolean_false_node;
+      case MINUS_EXPR:
+      case BIT_XOR_EXPR:
+	return integer_zero_node;
+      default:;
+      }
+  tree ops[2];
+  ops[0] = lhs;
+  ops[1] = rhs;
+  vn_nary_op_t vno;
+  val = vn_nary_op_lookup_pieces (2, code, boolean_type_node, ops, &vno);
+  /* Did we get a predicated value?  */
+  if (!val && vno && vno->predicated_values)
+    val = vn_nary_op_get_predicated_value (vno, bb);
+  return val;
+}
+
+/* Callback function type for iterate_val_equivs.  */
+
+typedef tree (*handle_equiv) (tree, basic_block, void *);
+
+/* Iterate on all the equivalences of NAME that are valid at BB, until
+ * CALLBACK returns a non-NULL result for an equivalence.  If there's no
+ * equivalence or non-NULL result, return NULL_TREE.  */
+
+tree
+iterate_val_equivs (tree name, basic_block bb, handle_equiv callback,
+		    void *callback_arg)
+{
+  struct val_equiv_s val_equiv;
+  /* todo_list stores the equivalences we need to search iteratively.
+     For example, in the hashtable we have "X's equivalence = {Y}" and "X's
+     equivalence = {Z}", by searching iteratively we have "X==Z".
+     done_set stores the equivalences that are already visited, to avoid
+     searching repeatedly.  */
+  hash_set<tree> done_set;
+  auto_vec<tree> todo_list;
+  unsigned int todo_index = 0;
+  todo_list.safe_push (name);
+  tree result;
+  val_equiv_t *slot;
+  /* Iterate all elements in the todo_list.  */
+  while (todo_index < todo_list.length ())
+    {
+      val_equiv.name = todo_list[todo_index];
+      todo_index++;
+      slot = val_equiv_hash->find_slot_with_hash (&val_equiv,
+						  TREE_HASH (val_equiv.name),
+						  NO_INSERT);
+      if (!slot)
+	/* Possible for NAME itself.  */
+	continue;
+      /* Process all valid equivalences of current element.  */
+      for (vn_pval *val = (*slot)->values; val; val = val->next)
+	{
+	  if (done_set.contains (val->result)
+	      || !is_vn_valid_at_bb (val, bb))
+	    continue;
+	  if ((result = callback (val->result, bb, callback_arg)))
+	    return result;
+	  done_set.add (val->result);
+	  todo_list.safe_push (val->result);
+	}
+    }
+    return NULL_TREE;
+}
+
+/* Helpers to lookup the results of binary operations by equivalences.  */
+
+typedef struct find_binary_by_equiv_arg_s
+{
+  /* The other operand.  */
+  tree other;
+  tree_code code;
+  auto_vec<tree> *vec;
+} * find_binary_by_equiv_arg_t;
+
+/* Callback for find_predicated_binary_by_equiv.  Lookup a binary operations'
+ * result by lhs equivalences.  */
+
+tree
+find_predicated_binary_by_lhs_equiv (tree equiv, basic_block bb,
+				     void *callback_arg)
+{
+  find_binary_by_equiv_arg_t args
+    = (find_binary_by_equiv_arg_t) callback_arg;
+  tree result
+    = vn_lookup_binary_op_result (equiv, args->other, args->code, bb);
+  if (!result)
+    /* If no result is found, store equivalence to list, in case we need to
+       search by rhs.  */
+    args->vec->safe_push (equiv);
+  return result;
+}
+
+/* Callback for find_predicated_binary_by_equiv.  Lookup a binary operations'
+ * result by rhs equivalences.  */
+
+tree
+find_predicated_binary_by_rhs_equiv (tree equiv, basic_block bb,
+				     void *callback_arg)
+{
+  find_binary_by_equiv_arg_t args
+    = (find_binary_by_equiv_arg_t) callback_arg;
+  tree result
+    = vn_lookup_binary_op_result (args->other, equiv, args->code, bb);
+  if (result)
+    return result;
+  // Search by both sides' equivalences.
+  for (unsigned j = 0; j < args->vec->length (); ++j)
+    if ((result = vn_lookup_binary_op_result ((*args->vec)[j], equiv,
+					 args->code, bb)))
+      return result;
+  return NULL_TREE;
+}
+
+/* Lookup result (value number) of a binary operation by the equivalences of LHS
+ * and RHS.  If no result is found, NULL_TREE will be returned.  */
+
+static tree
+find_predicated_binary_by_equiv (tree lhs, tree rhs, tree_code code,
+				 basic_block bb)
+{
+  if (TREE_CODE (lhs) == SSA_NAME)
+    lhs = VN_INFO (lhs)->valnum;
+  // search by lhs
+  auto_vec<tree> lhs_equivs;
+  find_binary_by_equiv_arg_s args;
+  args.vec = &lhs_equivs;
+  args.other = rhs;
+  args.code = code;
+  tree result
+    = iterate_val_equivs (lhs, bb, find_predicated_binary_by_lhs_equiv,
+			  (void *) (&args));
+  if (result)
+    return result;
+  // search by rhs
+  if (TREE_CODE (rhs) == SSA_NAME)
+    rhs = VN_INFO (rhs)->valnum;
+  args.other = lhs;
+  return iterate_val_equivs (rhs, bb, find_predicated_binary_by_rhs_equiv,
+			     (void *) (&args));
+}
+
+/* Returns whether DERIVED_CODE is a relaxed condition code derived from
+ * ORIG_CODE.  For example: "a>=b" is relaxed from "a>b", so
+ * is_relaxed_cond_code (>, >=) is true; "a!=b" is reversion of "a==b",
+ * is_relaxed_cond_code (==, !=) is false.  */
+
+static bool
+is_relaxed_cond_code (tree_code orig_code, tree_code derived_code)
+{
+  if (orig_code == derived_code
+      || swap_tree_comparison (orig_code) == derived_code)
+    return false;
+  if (commutative_tree_code (orig_code) && commutative_tree_code (derived_code))
+    // NE_EXPR <=> EQ_EXPR
+    return false;
+  return true;
+}
+
+/* Given that BB is basic block P's true or false destination, returns whether
+ * there's a path from the other true or false destination to it.  */
+
+static inline bool
+branch_may_come_from_another (basic_block p, basic_block bb)
+{
+  if (EDGE_COUNT (bb->preds) == 1)
+    return false;
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      /* Checking that the other true or false destination dominates e->src
+	   * isn't enough.  See PR79194.  */
+      if (e->src != p && !((e->flags & EDGE_DFS_BACK)
+	    && dominated_by_p (CDI_DOMINATORS, e->src, bb)))
+	return true;
+    }
+  return false;
+}
+
+/* Given a slot VNO that holds the same conditional expression with that in the
+   branching statement of BB, record valid equivalence relation on BB's true and
+   false edges.  For now we record equivalences of PHI results and arguments
+   that depend on the predicate of VNO.  */
+
+static void
+record_equiv_from_previous_cond (vn_nary_op_t vno, edge true_e, edge false_e,
+				 basic_block bb)
+{
+  if (!vno->predicated_values)
+    return;
+  if (true_e && !vn_tracking_edge (true_e))
+    true_e = NULL;
+  if (false_e && !vn_tracking_edge (false_e))
+    false_e = NULL;
+  if (!true_e && !false_e)
+    return;
+
+  /* Process all results in VNO, where basic-block containing the original
+     condition dominates BB.  */
+  for (vn_pval *val = vno->u.values; val; val = val->next)
+    for (unsigned i = 0; i < val->n; ++i)
+      {
+	basic_block bb1
+	  = BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]);
+
+	// Original condition is in bb1's single_pred.
+	if (!single_pred_p (bb1))
+	  continue;
+	basic_block p = single_pred (bb1);
+	if (!dominated_by_p_w_unex (bb, p, false))
+	  continue;
+
+	gimple *pred_last = last_stmt (p);
+	if (!pred_last || gimple_code (pred_last) != GIMPLE_COND)
+	  continue;
+	hash_set<basic_block> *phi_bbs = dominator_to_phi_map->get (p);
+	if (!phi_bbs)
+	  continue;
+
+	enum tree_code pred_code = gimple_cond_code (pred_last);
+	/* Find out previous true & false edge destinations (here "true" and
+	   "false" correspond to current bb).
+	   First initialize them with bb1 and "p's successor that is not
+	   bb1" respectively, then switch them if result in VNO is false.  */
+	basic_block pre_true_dest = bb1;
+	basic_block pre_false_dest = EDGE_SUCC (p, 0)->dest != bb1
+				       ? EDGE_SUCC (p, 0)->dest
+				       : EDGE_SUCC (p, 1)->dest;
+	if (val->result == boolean_false_node)
+	  std::swap (pre_true_dest, pre_false_dest);
+
+	bool insert_by_true = true, insert_by_false = true;
+	/* If the condition in vno is not identical with original condition, but
+	   a relaxed one, then only edge p->bb1 is helpful.  */
+	if (is_relaxed_cond_code (pred_code, vno->opcode))
+	  {
+	    if (val->result == boolean_true_node)
+	      insert_by_false = false;
+	    else
+	      insert_by_true = false;
+	  }
+
+	/* Iterate on basic-blocks with PHIs immediately dominated by p.  */
+	for (hash_set<basic_block>::iterator it = phi_bbs->begin ();
+	     it != phi_bbs->end (); ++it)
+	  {
+	    basic_block phi_bb = *it;
+	    if (!dominated_by_p_w_unex (bb, phi_bb, false))
+	      continue;
+
+	    /* Among true & false edges, if one edge can lead to another before
+	       reaching phi_bb, then we can't derive equivalences by the former.
+	       Part of the situations (when the joint is pre_true_dest or
+	       pre_false_dest) are ruled out by the branch_may_come_from_another
+	       check.  The others (when the joint is a descendant of
+	       pre_true_dest or pre_false_dest) are handled by calculating and
+	       checking possible_true_args & possible_false_args.  */
+	    bool insert_by_true_1
+	      = insert_by_true
+		&& (pre_false_dest == phi_bb
+		    || !branch_may_come_from_another (p, pre_false_dest));
+	    bool insert_by_false_1
+	      = insert_by_false
+		&& (pre_true_dest == phi_bb
+		    || !branch_may_come_from_another (p, pre_true_dest));
+
+	    /* Process PHIs, record the equivalences that can be derived from
+	       previous true or false branch to current bb's successors.  */
+	    for (gphi_iterator gsi = gsi_start_nonvirtual_phis (phi_bb);
+		 !gsi_end_p (gsi); gsi_next_nonvirtual_phi (&gsi))
+	      {
+		gphi *phi = gsi.phi ();
+		tree res = vn_valueize (PHI_RESULT (phi));
+		/* Calculate possible arguments collections for previous true
+		   and false edges.  */
+		hash_set<tree> possible_true_args;
+		hash_set<tree> possible_false_args;
+		for (unsigned k = 0; k < gimple_phi_num_args (phi); ++k)
+		  {
+		    edge in = gimple_phi_arg_edge (phi, k);
+		    tree arg = vn_valueize (gimple_phi_arg_def (phi, k));
+		    if ((in->src == p && pre_true_dest == phi_bb)
+			|| dominated_by_p (CDI_DOMINATORS, in->src,
+					   pre_true_dest))
+		      possible_true_args.add (arg);
+		    else
+		      possible_false_args.add (arg);
+
+		    if ((in->src == p && pre_false_dest == phi_bb)
+			|| dominated_by_p (CDI_DOMINATORS, in->src,
+					   pre_false_dest))
+		      possible_false_args.add (arg);
+		    else
+		      possible_true_args.add (arg);
+		  }
+
+		/* Record equivalences if there's one single argument in the
+		   possible collection. (Otherwise we need to check that all
+		   arguments in the collection are effectively equal.)  */
+		if (true_e && insert_by_true_1
+		    && possible_true_args.elements () == 1)
+		  {
+		    tree arg = *possible_true_args.begin ();
+		    val_equiv_insert (arg, res, true_e->dest);
+		    val_equiv_insert (res, arg, true_e->dest);
+		  }
+		if (false_e && insert_by_false_1
+		    && possible_false_args.elements () == 1)
+		  {
+		    tree arg = *possible_false_args.begin ();
+		    val_equiv_insert (arg, res, false_e->dest);
+		    val_equiv_insert (res, arg, false_e->dest);
+		  }
+	      }
+	  }
+      }
+}
+
 /* Compute and return the hash value for nary operation VBO1.  */
 
 static hashval_t
@@ -4153,21 +4668,8 @@ vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
 				     edge pred_e)
 {
   /* ???  Currently tracking BBs.  */
-  if (! single_pred_p (pred_e->dest))
-    {
-      /* Never record for backedges.  */
-      if (pred_e->flags & EDGE_DFS_BACK)
-	return NULL;
-      edge_iterator ei;
-      edge e;
-      int cnt = 0;
-      /* Ignore backedges.  */
-      FOR_EACH_EDGE (e, ei, pred_e->dest->preds)
-	if (! dominated_by_p (CDI_DOMINATORS, e->src, e->dest))
-	  cnt++;
-      if (cnt != 1)
+  if (!vn_tracking_edge (pred_e))
 	return NULL;
-    }
   if (dump_file && (dump_flags & TDF_DETAILS)
       /* ???  Fix dumping, but currently we only get comparisons.  */
       && TREE_CODE_CLASS (code) == tcc_comparison)
@@ -4180,6 +4682,12 @@ vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
       fprintf (dump_file, " == %s\n",
 	       integer_zerop (result) ? "false" : "true");
     }
+  if (code == EQ_EXPR && result == boolean_true_node)
+    /* NE_EXPRs with false result are handled by inverted condition.  */
+    {
+      val_equiv_insert (ops[0], ops[1], pred_e->dest);
+      val_equiv_insert (ops[1], ops[0], pred_e->dest);
+    }
   vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
   init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
   vno1->predicated_values = 1;
@@ -4192,26 +4700,6 @@ vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
   return vn_nary_op_insert_into (vno1, valid_info->nary, true);
 }
 
-static bool
-dominated_by_p_w_unex (basic_block bb1, basic_block bb2, bool);
-
-static tree
-vn_nary_op_get_predicated_value (vn_nary_op_t vno, basic_block bb)
-{
-  if (! vno->predicated_values)
-    return vno->u.result;
-  for (vn_pval *val = vno->u.values; val; val = val->next)
-    for (unsigned i = 0; i < val->n; ++i)
-      /* Do not handle backedge executability optimistically since
-	 when figuring out whether to iterate we do not consider
-	 changed predication.  */
-      if (dominated_by_p_w_unex
-	    (bb, BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]),
-	     false))
-	return val->result;
-  return NULL_TREE;
-}
-
 /* Insert the rhs of STMT into the current hash table with a value number of
    RESULT.  */
 
@@ -4409,6 +4897,22 @@ vn_phi_eq (const_vn_phi_t const vp1, const_vn_phi_t const vp2)
   return true;
 }
 
+/* Record mappings from immediate dominator to basic_block with PHIs.  */
+
+static void
+record_dom_to_phi_bb (basic_block idom, basic_block phi_bb)
+{
+  if (!dominator_to_phi_map)
+    return;
+  /* Rule out when there is a backedge between idom and phi_bb.  */
+  basic_block loop_header = phi_bb->loop_father->header;
+  if (!dominated_by_p (CDI_DOMINATORS, idom, loop_header)
+      && dominated_by_p (CDI_DOMINATORS, loop_header, idom))
+    return;
+  hash_set<basic_block> &phi_bbs = dominator_to_phi_map->get_or_insert (idom);
+  phi_bbs.add (phi_bb);
+}
+
 /* Lookup PHI in the current hash table, and return the resulting
    value number if it exists in the hash table.  Return NULL_TREE if
    it does not exist in the hash table. */
@@ -4447,6 +4951,9 @@ vn_phi_lookup (gimple *phi, bool backedges_varying_p)
 	   allow VN_TOP.  */
 	vp1->cclhs = vn_valueize (gimple_cond_lhs (last1));
 	vp1->ccrhs = vn_valueize (gimple_cond_rhs (last1));
+	/* Record the mapping, this will be used in
+	 * record_equiv_from_previous_cond.  */
+	record_dom_to_phi_bb (idom1, vp1->block);
       }
   vp1->hashcode = vn_phi_compute_hash (vp1);
   slot = valid_info->phis->find_slot_with_hash (vp1, vp1->hashcode, NO_INSERT);
@@ -4819,8 +5326,27 @@ visit_nary_op (tree lhs, gassign *stmt)
 {
   vn_nary_op_t vnresult;
   tree result = vn_nary_op_lookup_stmt (stmt, &vnresult);
-  if (! result && vnresult)
+  if (!result && vnresult)
     result = vn_nary_op_get_predicated_value (vnresult, gimple_bb (stmt));
+  if (!result)
+    {
+      /* Try to find a predicated result by equivalences.  */
+      tree_code opcode = gimple_assign_rhs_code (stmt);
+      // binary operation
+      if (gimple_num_ops (stmt) == 3)
+	result = find_predicated_binary_by_equiv (gimple_op (stmt, 1),
+						  gimple_op (stmt, 2), opcode,
+						  gimple_bb (stmt));
+      if (result
+	  && !useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (result)))
+	result = fold_convert (TREE_TYPE (lhs), result);
+      if (result && dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Found predicated value ");
+	  print_generic_expr (dump_file, result, TDF_SLIM);
+	  fprintf (dump_file, " by equivalence.\n");
+	}
+    }
   if (result)
     return set_ssa_val_to (lhs, result);
 
@@ -6861,9 +7387,13 @@ free_rpo_vn (void)
       release_ssa_name (info->name);
   obstack_free (&vn_ssa_aux_obstack, NULL);
   delete vn_ssa_aux_hash;
+  delete val_equiv_hash;
+  val_equiv_hash = NULL;
 
   delete constant_to_value_id;
   constant_to_value_id = NULL;
+  delete dominator_to_phi_map;
+  dominator_to_phi_map = NULL;
 }
 
 /* Hook for maybe_push_res_to_seq, lookup the expression in the VN tables.  */
@@ -7239,11 +7769,11 @@ process_bb (rpo_elim &avail, basic_block bb,
 	    tree val = gimple_simplify (gimple_cond_code (last),
 					boolean_type_node, lhs, rhs,
 					NULL, vn_valueize);
+	    vn_nary_op_t vnresult = NULL;
 	    /* If the condition didn't simplfy see if we have recorded
 	       an expression from sofar taken edges.  */
 	    if (! val || TREE_CODE (val) != INTEGER_CST)
 	      {
-		vn_nary_op_t vnresult;
 		tree ops[2];
 		ops[0] = lhs;
 		ops[1] = rhs;
@@ -7262,6 +7792,20 @@ process_bb (rpo_elim &avail, basic_block bb,
 			print_gimple_stmt (dump_file, last, TDF_SLIM);
 		      }
 		  }
+		if (!val && iterate)
+		  {
+		    /* Try to find a result by equivalences of lhs and rhs.  */
+		    val
+		      = find_predicated_binary_by_equiv (lhs, rhs,
+							gimple_cond_code (last),
+							bb);
+		    if (val && dump_file && (dump_flags & TDF_DETAILS))
+		      {
+			fprintf (dump_file, "Found predicated value ");
+			print_generic_expr (dump_file, val, TDF_SLIM);
+			fprintf (dump_file, " by equivalence.\n");
+		      }
+		  }
 	      }
 	    if (val)
 	      e = find_taken_edge (bb, val);
@@ -7314,6 +7858,16 @@ process_bb (rpo_elim &avail, basic_block bb,
 		    if (false_e)
 		      insert_related_predicates_on_edge (icode, ops, false_e);
 		  }
+
+		/* If we got a previous condition, but no predicated value, try
+		   to record what we know from previous true & false branches on
+		   current true & false edges.  This might provide some
+		   opportunities for futher prediction.  For now we only record
+		   equivalences generated by PHI nodes, that depends on the
+		   previous condition.  */
+		if (iterate && vnresult && (true_e || false_e))
+		  record_equiv_from_previous_cond (vnresult, true_e, false_e,
+						   bb);
 	      }
 	    break;
 	  }
@@ -7435,6 +7989,7 @@ struct unwind_state
   vn_reference_t ref_top;
   vn_phi_t phi_top;
   vn_nary_op_t nary_top;
+  val_equiv_t equiv_top;
   vn_avail *avail_top;
 };
 
@@ -7473,6 +8028,17 @@ do_unwind (unwind_state *to, rpo_elim &avail)
       (*slot)->operands.release ();
       valid_info->references->clear_slot (slot);
     }
+  for (; last_inserted_equiv != to->equiv_top;
+       last_inserted_equiv = last_inserted_equiv->next)
+    {
+      val_equiv_t *slot;
+      slot = val_equiv_hash->find_slot_with_hash (
+	last_inserted_equiv, TREE_HASH (last_inserted_equiv->name), NO_INSERT);
+      if ((*slot)->unwind_to)
+	*slot = (*slot)->unwind_to;
+      else
+	val_equiv_hash->clear_slot (slot);
+    }
   obstack_free (&vn_tables_obstack, to->ob_top);
 
   /* Prune [rpo_idx, ] from avail.  */
@@ -7588,6 +8154,7 @@ do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
   next_constant_value_id = -1;
 
   vn_ssa_aux_hash = new hash_table <vn_ssa_aux_hasher> (region_size * 2);
+  val_equiv_hash = new hash_table <val_equiv_hasher> (5);
   gcc_obstack_init (&vn_ssa_aux_obstack);
 
   gcc_obstack_init (&vn_tables_obstack);
@@ -7598,6 +8165,7 @@ do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
   last_inserted_phi = NULL;
   last_inserted_nary = NULL;
   last_pushed_avail = NULL;
+  last_inserted_equiv = NULL;
 
   vn_valueize = rpo_vn_valueize;
 
@@ -7668,6 +8236,7 @@ do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
 		  }
 	      rpo_state[bb_to_rpo[header->index]].iterate = non_latch_backedge;
 	    }
+      dominator_to_phi_map = new bb_map_t;
     }
 
   uint64_t nblk = 0;
@@ -7691,6 +8260,7 @@ do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
 	    rpo_state[idx].ref_top = last_inserted_ref;
 	    rpo_state[idx].phi_top = last_inserted_phi;
 	    rpo_state[idx].nary_top = last_inserted_nary;
+	    rpo_state[idx].equiv_top = last_inserted_equiv;
 	    rpo_state[idx].avail_top
 	      = last_pushed_avail ? last_pushed_avail->avail : NULL;
 	  }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
  2021-07-18 19:25   ` Di Zhao OS
@ 2021-07-28  9:38     ` Richard Biener
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Biener @ 2021-07-28  9:38 UTC (permalink / raw)
  To: Di Zhao OS; +Cc: gcc-patches

On Sun, Jul 18, 2021 at 9:25 PM Di Zhao OS
<dizhao@os.amperecomputing.com> wrote:
>
>
> I tried to improve the patch following your advices and to catch more
> opportunities. Hope it'll be helpful.

Sorry for the late reply.

> On 6/24/21 8:29 AM, Richard Biener wrote:
> > On Thu, Jun 24, 2021 at 11:55 AM Di Zhao via Gcc-patches <gcc-
> > patches@gcc.gnu.org> wrote:
> >
> > I have some reservations about extending the ad-hoc "predicated value" code.
> >
> > Some comments on the patch:
> >
> > +/* hashtable & helpers to record equivalences at given bb.  */
> > +
> > +typedef struct val_equiv_s
> > +{
> > +  val_equiv_s *next;
> > +  val_equiv_s *unwind_to;
> > +  hashval_t hashcode;
> > +  /* SSA name this val_equiv_s is associated with.  */
> > +  tree name;
> > +  /* RESULT in a vn_pval entry is SSA name of a equivalence.  */
> > +  vn_pval *values;
> > +} * val_equiv_t;
> >
> > all of this (and using a hashtable for recording) is IMHO a bit overkill.
> > Since you only ever record equivalences for values the more natural place to
> > hook those in is the vn_ssa_aux structure where we also record the availability
> > chain.
>
> I tried to store the equivalences in the vn_ssa_aux structure, but I didn't
> optimize the second case successfully: I need to record the equivalence
> of a PHI expression's result and arguments, but their value number results will
> become VARYING first, so they won't be changed. Maybe I'm missing something, or
> can I force change a VARYING result?

But VARYING still has a value-number - it's the result itself?

> Besides, the way currently used, equivalences only need to be "predictable"
> rather than available, maybe availability chains do not represent them very
> well?

Sure, they are a different beast - I'm only commenting on the place you store
them as being not too efficient.

> > There's little commentary in the new code, in particular function-level
> > comments are missing everywhere.
>
> Added more comments.
>
> > There's complexity issues, like I see val_equiv_insert has a "recurse"
> > feature but also find_predicated_value_by_equiv is quadratic in the number of
> > equivalences of the lhs/rhs.  Without knowing what the recursion on the
> > former is for - nothing tells me - I suspect either of both should be redundant.
>
> The intention was, given {A==B, B==C, X==Y, Y==Z} and a previous result of
> "C opcode Z", to find the result of "A opcode Y". I removed the "recurse"
> feature and modified the searching logic so solve the issue. Now a temporary
> hash_set is used to record the equivalences that are visited when searching.

OK, so you're covering transitivity at query time - that looks
expensive to me.  As
said I wonder if there's a more efficient way to store equivalences here.

> > You seem to record equivalences at possible use points which looks odd at best
> > - I'd expected equivalences being recorded at the same point we record
> > predicated values and for the current condition, not the one determining some
> > other predication.
> > What was the motivation to do it the way you do it?
>
> The purpose is to "bring down" what can be known from a previous basic-block
> that effectively dominates current block, but not actually does so (in the
> example it is because jump threading is hindered by a loop). For example in
> this case:
>
>   if (a != 0)
>       // Nothing useful can be recorded here, because this BB doesn't dominate
>       // the BB that we want to simplify.
>       c = b;
>   for (unsigned i = 0; i < c; i++)
>     {
>       if (a != 0)  // The recording is triggered here.
>        {
>          // c == b will be recorded here, so it can be used for simplification.
>          // In gimple it is the equivalence of a PHI's result and argument.
>          if (i >= b)
>            foo ();
>
> These requires finding a previous condition that is identical with current
> one, so it is convenient to do this in FRE. Besides, as FRE records derived
> predicate, so for relative conditions there also might be opportunities for
> optimization. In the new patch code this is included.

I still don't quite understand why you cannot record the c = b equivalence
when processing its block.  You'd record "c == b if a != 0" and later
you look for equivalences on b and see if they are valid at the use site?
That's how predicated values work.

I'd like to see this equivalence stuff more naturally integrated with the
value lattice, not a collection of bolted-on hashmaps.

> Besides, to find more opportunities, added a hashmap to store mappings from
> immediate dominators to basic-blocks with PHIs of interest.
>
> > Why is the code conditional on 'iterate'?
>
> I haven't worked it out to fit the non-iterate mode, so it now breaks the
> if_conversion pass. I think this is because some of the equivalence-recordings
> are too optimistic for non-iterate mode.

Huh.  The non-iterative mode should be easier to deal with, in fact if you
are running into correctness issues this hints at latent issues even with the
iterative mode.

> > You've probably realized that there's no "nice" way to handle temporary
> > equivalences in a VN scheme using hashing for expressions (unless you degrade
> > hashes a lot).
>
> I modified the code to use TREE_HASH on ssa names. Would that be better?
>
> > You quote opportunities that are catched with this like
> >
> > +  if (a != 0)
> > +    {
> > +      c = b;
> > +    }
> > +  for (unsigned i = 0; i < c; i++)
> > +    {
> > +      if (a != 0)
> > +       {
> > +         if (i >= b)
> > +           /* Should be eliminated.
> > +            */
> > +           foo ();
> >
> > but say other "obvious" cases like
> >
> > +  if (a != 0)
> > +    {
> > +      c = b;
> > +    }
> > +  for (unsigned i = 0; i < c; i++)
> > +    {
> > +      if (a != 0)
> > +       {
> > +           /* Should be zero.  */
> >              return b - c;
> >
> > are not handled.  That's in line with the current "value predication"
> > which mainly aims at catching simple jump threading opportunities; you only
> > simplify conditions with the recorded equivalences.  But then the complexity of
> > handling equivalences does probably not outweight the opportunities catched -
> > can you share some numbers on how many branches are newly known taken
> > during VN with this patch during say bootstrap or build of SPEC CPU?
>
> I extended the code a little to cover the cases like "A - A" and "A xor A".
>
> Here are some results on one bootstrap step:
>            | values found | more bb removed | values found in all
>            | in fre1      | at fre1         | fre & pre passes
> -----------------------------------------------------------------
>  bootstrap | 592          | 40              | 1272
>
> As the code is different for bootstrap, the "more bb removed" metric is not
> precise. I also tested on SPEC CPU 2017 integer cases:
>                 | values found | more bb removed | values found in all
>                 | in fre1      | at fre1         | fre & pre passes
> -----------------------------------------------------------------
>  500.perlbench_r| 3            |  0              | 9
>  502.gcc_r      | 25           |  39             | 241
>  520.omnetpp    | 9            |  6              | 34
>  523.xalancbmk_r| 12           |  0              | 35
>  541.leela_r    | 2            |  0              | 2
>
> In cases not listed above there's no value found by equivalences. Benefits
> after fre1 are not counted as CGF may be different from here (for
> 523.xalancbmk_r there're 8 more basic-blocks removed at fre3). Although the
> chances are not plenty, there might be potential benefits, such as making a
> function pure.

That's very little improvements for this large pice of code :/

Richard.

> > I've hoped to ditch the current "value predication" code by eventually using the
> > relation oracle from ranger but did not yet have the chance to look into that.
> > Now, the predicated equivalences are likely not something that infrastructure
> > can handle?
> >
> > In the end I think we should research into maintaining an alternate expression
> > table for conditions (those we like to simplify with
> > equivalences) and use a data structure that more easily allows to introduce
> > (temporary) equivalences.  Like by maintaining back-references of values we've
> > recorded a condition for and a way to quickly re-canonicalize conditions.  Well -
> > it requires some research, as said.
> >
> > Richard.
> >
> > > Regards,
> > > Di Zhao
> > >
> > > Extend FRE with an "equivalence map" for condition prediction.
> > >
> > > 2021-06-24  Di Zhao  <dizhao@os.amperecomputing.com>
> > >
>
> Thanks,
> Di Zhao
>
> --------
> Extend FRE with an "equivalence map" for condition prediction.
>
> 2021-07-18  Di Zhao  <dizhao@os.amperecomputing.com>
>
> gcc/ChangeLog:
>         PR tree-optimization/101186
>         * tree-ssa-sccvn.c (vn_tracking_edge): Extracted utility function.
>         (dominated_by_p_w_unex): Moved upward, no change.
>         (vn_nary_op_get_predicated_value): Moved upward, no change.
>         (struct val_equiv_hasher): Hasher for the "equivalence map".
>         (is_vn_valid_at_bb): Check if vn_pval is valid at BB.
>         (val_equiv_insert): Insert into "equivalence map".
>         (vn_lookup_binary_op_result): Lookup binary expression's result by VN.
>         (iterate_val_equivs): Iterate on equivalences and returns a non-NULL
>         result returned by callback.
>         (find_predicated_binary_by_lhs_equiv): Callback for iterate_val_equivs.
>         Lookup a binary operations result by LHS equivalences.
>         (find_predicated_binary_by_rhs_equiv): Callback for iterate_val_equivs.
>         Lookup a binary operations result by RHS equivalences.
>         (find_predicated_binary_by_equiv): Lookup predicated value of a binary
>         operation by equivalences.
>         (is_relaxed_cond_code): Whether operation code is a relaxed condition
>         code derived from original code.
>         (branch_may_come_from_another): Whether there's a path from the one
>         true or false destination to another.
>         (record_equiv_from_previous_edge): Record equivalence relation from a
>         previous condition on current bb' true and false edges.
>         (record_equiv_from_previous_cond): Record equivalences generated by
>         previous conditions on current BB's true and false edges.
>         (vn_nary_op_insert_pieces_predicated): Extract utility function. Insert
>         into the "equivalence map" for predicate like "x==y is true".
>         (record_dom_to_phi_bb): Record mappings from immediate dominator to
>         basic_block with PHIs.
>         (vn_phi_lookup): Record mappings from immediate dominator to PHIs.
>         (visit_nary_op): Add lookup predicated values of binaries by
>         equivalences.
>         (free_rpo_vn): Free the "equivalence map".
>         (process_bb): Insert into & lookup from the "equivalence map".
>         (struct unwind_state): Add "equivalence map" unwind state.
>         (do_unwind): Unwind the "equivalence map".
>         (do_rpo_vn): Update "equivalence map" unwind state.
>
> gcc/testsuite/ChangeLog:
>         PR tree-optimization/101186
>         * gcc.dg/tree-ssa/pr71947-1.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-2.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-3.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-5.c: Disable fre.
>         * gcc.dg/tree-ssa/vrp03.c: Disable fre.
>         * gcc.dg/tree-ssa/ssa-fre-95.c: New test.
>         * gcc.dg/tree-ssa/ssa-fre-96.c: New test.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-07-28  9:39 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-24  9:54 [PATCH] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction Di Zhao
2021-06-24 12:29 ` Richard Biener
2021-06-24 13:25   ` Andrew MacLeod
2021-06-24 15:01     ` Andrew MacLeod
2021-06-25  7:38       ` Richard Biener
2021-06-27 15:46         ` Aldy Hernandez
2021-06-28  8:12           ` Richard Biener
2021-06-28 13:15           ` Andrew MacLeod
2021-06-29 10:54             ` Richard Biener
2021-07-18 19:25   ` Di Zhao OS
2021-07-28  9:38     ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).