public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
@ 2021-09-16 18:13 Di Zhao OS
  2021-10-01 12:59 ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Di Zhao OS @ 2021-09-16 18:13 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 8207 bytes --]

Sorry about updating on this after so long. It took me much time to work out a
new plan and pass the tests.

The new idea is to use one variable to represent a set of equal variables at
some basic-block. This variable is called a "equivalence head" or "equiv-head"
in the code. (There's no-longer a "equivalence map".)

- Initially an SSA_NAME's "equivalence head" is its value number. Temporary
  equivalence heads are recorded as unary NOP_EXPR results in the vn_nary_op_t
  map. Besides, when inserting into vn_nary_op_t map, make the new result at
  front of the vn_pval list, so that when searching for a variable's
  equivalence head, the first result represents the largest equivalence set at
  current location.
- In vn_ssa_aux_t, maintain a list of references to valid_info->nary entry.
  For recorded equivalences, the reference is result->entry; for normal N-ary
  operations, the reference is operand->entry.
- When recording equivalences, if one side A is constant or has more refs, make
  it the new equivalence head of the other side B. Traverse B's ref-list, if a
  variable C's previous equiv-head is B, update to A. And re-insert B's n-ary
  operations by replacing B with A.
- When inserting and looking for the results of n-ary operations, insert and
  lookup by the operands' equiv-heads.

So except for the refs in vn_ssa_aux_t, this scheme uses the original
infrastructure to its best. Quadric search time is avoided at the cost of some
re-insertions. Test results on SPEC2017 intrate (counts and percentages):

                |more bb |more bb |more stmt|more stmt|more       |more
                |removed |removed |removed  |removed  |nv_nary_ops|nv_nary_ops
                |at fre1 |at fre1 |at fre1  |at fre1  |inserted   |inserted
------------------------------------------------------------------------------
 500.perlbench_r| 64     | 1.98%  | 103     | 0.19%   | 11260     | 12.16%
 502.gcc_r      | 671    | 4.80%  | 317     | 0.23%   | 13964     | 6.09%
 505.mcf_r      | 5      | 35.71% | 9       | 1.40%   | 32        | 2.52%
 520.omnetpp    | 132    | 5.45%  | 39      | 0.11%   | 1895      | 3.91%
 523.xalancbmk_r| 238    | 3.26%  | 313     | 0.36%   | 1417      | 1.27%
 525.x264_r     | 4      | 1.36%  | 27      | 0.11%   | 1752      | 6.78%
 531.deepsjeng_r| 1      | 3.45%  | 2       | 0.14%   | 228       | 8.67%
 541.leela_r    | 2      | 0.63%  | 0       | 0.00%   | 92        | 1.14%
 548.exchange2_r| 0      | 0.00%  | 3       | 0.04%   | 49        | 1.03%
 557.xz_r       | 0      | 0.00%  | 3       | 0.07%   | 272       | 7.55%

There're more basic_blocks and statements removed compared with last
implementation, the reasons are:
1) "CONST op CONST" simplification is included. It is missed in previous patch.
2) By inserting RHS of statements on equiv-heads, more N-ary operations can be
   simplified. One example is in 'ssa-fre-97.c' in the patch file.

While jump-threading & vrp also utilize temporary equivalences (so some of the
newly removed blocks and statements can also be covered by them), I think this
patch is a supplement, in cases when jump threading cannot take place (the
original example), or value number info needs to be involved (the
'ssa-fre-97.c' example).

Fixed the former issue with non-iterate mode.

About recording the temporary equivalences generated by PHIs (i.e. the
'record_equiv_from_previous_cond' stuff), I have to admit it looks strange and
the code size is large, but I haven't find a better way yet. Consider a piece
of CFG like the one below, if we want to record x==x2 on the true edge when
processing bb1, the location (following current practice) will be bb2. But that
is not useful at bb5 or bb6, because bb2 doesn't dominate them. And I can't
find a place to record x==x1 when processing bb1.
If we can record things on edges rather than blocks, say x==x1 on 1->3 and
x==x2 on 1->2, then perhaps with an extra check for "a!=0", x2 can be a valid
equiv-head for x since bb5. But I think it lacks efficiency and is not
persuasive. It is more efficient to find a valid previous predicate when
processing bb4, because the vn_nary_op_t will be fetched anyway.
        --------------
        | if (a != 0) | bb1
        --------------
        f |      \ t
          |    -------
          |    | bb2 | 
          |    -------
          |      /
    -------------------------
    | x = PHI<x1(1), x2(2)> | bb3
    -------------------------
               |
              ....
               |
       --------------
       | if (a != 0) | bb4
       --------------
           |f     \t
    -------------  -------
bb7 | where     |  | bb5 |  ==> where "x==x2" is recorded now
    | "x==x1" is|  -------
    | recorded  |    \
    | now       |    ...
    -------------     |
                   -------
                   | bb6 |  ==> where "x==x2" needs to be used
                   -------
Although I think I can remove the 'dominator_to_phi_map' and generalize this a
little, but the major logic will be similar. So I left this unchanged for now,
and would like to hear your suggestions on the new scheme overall.

Thanks,
Di Zhao

--------
Extend FRE with temporary equivalences.

2021-09-16  Di Zhao  <dizhao@os.amperecomputing.com>

gcc/ChangeLog:
        PR tree-optimization/101186
        * tree-ssa-sccvn.c (dominated_by_p_w_unex): Moved upward, no change.
        (vn_nary_op_get_predicated_value): Moved upward, no change.
        (is_vn_valid_at_bb): Check if vn_pval is valid at BB.
        (lookup_equiv_head): Lookup the "equivalence head" of given node.
        (lookup_equiv_heads): Lookup the "equivalence head"s of given nodes.
        (vn_tracking_edge): Extracted utility function.
        (init_vn_nary_op_from_stmt): Insert and lookup by "equivalence head"s.
        (vn_nary_op_insert_into): Insert new value at the front.
        (vn_nary_op_insert_pieces_predicated_1): Insert as predicated values
        from pieces.
        (simplify_nary_op): Try to simplify N-ary expressions.
        (push_new_nary_ref): Insert a back-reference to vn_nary_op_t.
        (vn_nary_op_insert_pieces_predicated): Insert by "equivalence head"s.
        (is_relaxed_cond_code): Whether operation code is a relaxed condition
        code derived from original code.
        (branch_may_come_from_another): Whether there's a path from the one
        true or false destination to another.
        (val_equiv_insert): Record temporary equivalence.
        (record_equiv_from_previous_cond): Record equivalences generated by
        previous conditions on current BB's true and false edges.
        (find_predicated_value_by_equivs): Find predicated value by the
        operands' equivalences.
        (record_dom_to_phi_bb): Record mappings from immediate dominator to
        basic_block with PHIs.
        (vn_phi_insert): Record mapping from the immediate dominator to a PHI
        node.
        (visit_nary_op): Add lookup previous results of N-ary operations by
        equivalences.
        (free_rpo_vn): Free dominator_to_phi_map.
        (process_bb): Add lookup predicated values by equivalences.
        (struct unwind_state): Unwind state of back-refs to vn_nary_op_t.
        (do_unwind): Unwind the back-refs to vn_nary_op_t.
        (do_rpo_vn): Update back-reference unwind state.
        * tree-ssa-sccvn.h (struct nary_ref): hold a lists of references to the
        nary map entries.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/pr68619-2.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-1.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-2.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-3.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-5.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-7.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-8.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-9.c: Disable fre.
        * gcc.dg/tree-ssa/vrp03.c: Disable fre.
        * gcc.dg/tree-ssa/ssa-fre-95.c: New test.
        * gcc.dg/tree-ssa/ssa-fre-96.c: New test.
        * gcc.dg/tree-ssa/ssa-fre-97.c: New test.


[-- Attachment #2: v2-tree-optimization-101186.patch --]
[-- Type: application/octet-stream, Size: 37635 bytes --]

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c
index cca706e0c4f..0a99c3c620e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dom2-details -w" } */
+/* { dg-options "-O2 -fno-tree-fre -fdump-tree-dom2-details -w" } */
 
 typedef union tree_node *tree;
 struct gcc_options
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c
index ac8271cc574..e9a797e82b0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c
index b2c09cbb021..bcbc419575c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c
index 2316f7e1c04..d63291da440 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 int f(int x, int y)
 {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
index e7038d0237f..c1e5667b45a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 static inline long load(long *p)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-7.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-7.c
index b44c064aa23..069e4a106c3 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-8.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-8.c
index 26e5abbdc29..26b303e896e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-8.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-9.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-9.c
index 22b68d5ede0..3ed3bf6ad29 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-9.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c
new file mode 100644
index 00000000000..c9dee01cec0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-95.c
@@ -0,0 +1,83 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+extern void bar(void);
+extern void bag(void);
+extern void boo(void);
+extern void bas(void);
+extern void foo(void);
+
+void f (unsigned int a, unsigned int b)
+{
+  if (a == b)
+    {
+      for (unsigned i = 0; i < a; i++)
+	{
+	  bar ();
+	  if (i >= b)
+	    /* Unreachable.  */
+	    foo ();
+	}
+    }
+}
+
+void g (unsigned int a, unsigned int b, unsigned int c)
+{
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a == b)
+	{
+	  if (b == c)
+	    {
+	      boo ();
+	      if (i >= a)
+		/* Unreachable.  */
+		foo ();
+	    }
+	}
+    }
+}
+
+void h (int a, int b, int x, int y)
+{
+  int c = y;
+  if (a != 0)
+    c = x;
+  while (b < 1000)
+    {
+      if (a != 0)
+	{
+	  if (c > x)
+	    /* Unreachable.  */
+	    foo ();
+	}
+      else
+	bas ();
+      b++;
+    }
+}
+
+void k (int a, int b, int x, int y)
+{
+  int c = y;
+  if (a > 0)
+    c = x;
+  while (b < 1000)
+    {
+      if (a != 0) // "a != 0" can be derived from "a > 0" 
+	{
+	  if (c > x)
+	    /* Unreachable.  */
+	    foo ();
+	}
+      else
+	bag ();
+      b++;
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "foo" "fre1" } } */
+/* { dg-final { scan-tree-dump "bag" "fre1" } } */
+/* { dg-final { scan-tree-dump "bar" "fre1" } } */
+/* { dg-final { scan-tree-dump "boo" "fre1" } } */
+/* { dg-final { scan-tree-dump "bas" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c
new file mode 100644
index 00000000000..ce11c7f2096
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-96.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+extern void bar(void);
+extern void bag(void);
+extern void bas(void);
+extern void foo(void);
+
+void f (unsigned int a, unsigned int b)
+{
+  if (a == b)
+    {
+      for (unsigned i = a; i < 100; i++)
+	{
+	  if (i > b)
+	    /* Should not be eliminated.
+	     */
+	    bar ();
+	}
+    }
+}
+
+void g (unsigned int a, unsigned int b, unsigned int d)
+{
+  unsigned int c = 100;
+  if (a > 0)
+    c = b;
+  else
+    c = d;
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a >= 0)
+	{
+	  if (i >= b)
+	    /* Should not be eliminated. ("a >= 0" cannot derive "a > 0".)  */
+	    bas ();
+	}
+      else
+	{
+	  if (i >= d)
+	    /* Should be eliminated, as "a < 0" can derive "a <= 0".  */
+	    foo ();
+	}
+      i++;
+    }
+}
+
+void h (unsigned int a, unsigned int b)
+{
+  unsigned int c = 100;
+  if (b % 2)
+    {
+      if (a != 0)
+	c = b;
+    }
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a != 0)
+	{
+	  if (i >= b)
+	    /* Should not be eliminated.  */
+	    bag ();
+	}
+      i++;
+    }
+}
+/* { dg-final { scan-tree-dump "bar" "fre1" } } */
+/* { dg-final { scan-tree-dump "bas" "fre1" } } */
+/* { dg-final { scan-tree-dump "bag" "fre1" } } */
+/* { dg-final { scan-tree-dump-not "foo" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-97.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-97.c
new file mode 100644
index 00000000000..d560ce090a5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-97.c
@@ -0,0 +1,43 @@
+
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+extern void bar(void);
+extern void bag(void);
+extern void foo(void);
+int g;
+
+void
+f (unsigned int a, unsigned int b, unsigned int x, unsigned int y)
+{
+  if (a == b)
+    {
+      g = 100;
+      if (a + x == 99)
+	g = x + b; /* should be simplified to 99 */
+    }
+  else
+    // a!= b
+    {
+      if (a == 100 - x)
+	{
+	  g++;
+	  if (b == 100 - x)
+	    foo (); /* should be removed */
+	}
+      else
+	{
+	  g--;
+	  if (b == 100 - x)
+	    bag (); /* should not be removed */
+	  else
+	    bar (); /* should not be removed */
+	}
+    }
+}
+
+/* { dg-final { scan-tree-dump "bag" "fre1" } } */
+/* { dg-final { scan-tree-dump "bar" "fre1" } } */
+/* { dg-final { scan-tree-dump-not "foo" "fre1" } } */
+/* { dg-final { scan-tree-dump-not "= b_* + x_" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
index 1d7ea4e8ffb..87df31d4f0d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1 -fdisable-tree-ethread -fdisable-tree-thread1" } */
+/* { dg-options "-O2 -fdisable-tree-evrp -fno-tree-fre -fdump-tree-vrp1 -fdisable-tree-ethread -fdisable-tree-thread1" } */
 
 struct A
 {
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index bfa516b6cf9..6c59c135e01 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -194,6 +194,12 @@ vn_phi_hasher::equal (const vn_phi_s *vp1, const vn_phi_s *vp2)
 typedef hash_table<vn_phi_hasher> vn_phi_table_type;
 typedef vn_phi_table_type::iterator vn_phi_iterator_type;
 
+typedef hash_map<basic_block, hash_set<basic_block> > bb_map_t;
+
+/* Maps a basic-block P to a list of basic-blocks with PHIs and are immediately
+ * dominated by P.  This is to help finding temporary equivalences introduced
+ * by PHIs.  */
+static bb_map_t *dominator_to_phi_map = NULL;
 
 /* Compare two reference operands P1 and P2 for equality.  Return true if
    they are equal, and false otherwise.  */
@@ -348,6 +354,8 @@ static vn_reference_t last_inserted_ref;
 static vn_phi_t last_inserted_phi;
 static vn_nary_op_t last_inserted_nary;
 static vn_ssa_aux_t last_pushed_avail;
+static vn_ssa_aux_t last_pushed_nary_ref;
+static nary_ref* nary_ref_freelist;
 
 /* Valid hashtables storing information we have proven to be
    correct.  */
@@ -3823,6 +3831,114 @@ vn_reference_insert_pieces (tree vuse, alias_set_type set,
   return vr1;
 }
 
+static bool
+dominated_by_p_w_unex (basic_block bb1, basic_block bb2, bool);
+
+static tree
+vn_nary_op_get_predicated_value (vn_nary_op_t vno, basic_block bb)
+{
+  if (! vno->predicated_values)
+    return vno->u.result;
+  for (vn_pval *val = vno->u.values; val; val = val->next)
+    for (unsigned i = 0; i < val->n; ++i)
+      {
+	/* Do not handle backedge executability optimistically since
+	 when figuring out whether to iterate we do not consider
+	 changed predication.  */
+      if (dominated_by_p_w_unex
+	    (bb, BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]),
+	     false))
+	return val->result;
+      }
+  return NULL_TREE;
+}
+
+/* Whether a predicated value VAL is valid at BB.  */
+
+static inline bool
+is_vn_valid_at_bb (vn_pval *val, basic_block bb)
+{
+  for (unsigned i = 0; i < val->n; ++i)
+    {
+      basic_block val_bb
+	= BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]);
+      if (dominated_by_p_w_unex (bb, val_bb, false))
+	return true;
+    }
+  return false;
+}
+
+/* Lookup the "equivalence head" of given node NAME, which represents a set of
+   equivalences of NAME at basic-block BB.  */
+
+static inline tree
+lookup_equiv_head (tree name, basic_block bb)
+{
+  if (TREE_CODE (name) != SSA_NAME)
+    return name;
+  tree result = VN_INFO (name)->valnum;
+  if (result == VN_TOP)
+    /* VN_TOP is not allowed, otherwise it will be too optimistic for
+     * non-iterating mode.  */
+    return name;
+  vn_nary_op_t vnresult;
+  /* As we insert new value at front in vn_nary_op_insert_into, we don't need
+   * search iteratively for the most general equiv-head.  */
+  vn_nary_op_lookup_pieces (1, NOP_EXPR, TREE_TYPE (result), &result,
+			    &vnresult);
+  if (vnresult)
+    {
+      tree eq = vn_nary_op_get_predicated_value (vnresult, bb);
+      if (eq)
+	return eq;
+    }
+  return result;
+}
+
+/* Lookup the "equivalence head"s of OPS at BB, and store them in TO.  N is the
+ * size of OPS.  */
+
+static void
+lookup_equiv_heads (unsigned n, tree *ops, tree *to, basic_block bb)
+{
+  for (unsigned i = 0; i < n; i++)
+    {
+      tree equiv_head = lookup_equiv_head (ops[i], bb);
+      if (dump_file && (dump_flags & TDF_DETAILS) && equiv_head != ops[i])
+	{
+	  fprintf (dump_file, "Use equiv-head ");
+	  print_generic_expr (dump_file, equiv_head, TDF_NONE);
+	  fprintf (dump_file, " for ");
+	  print_generic_expr (dump_file, ops[i], TDF_NONE);
+	  fprintf (dump_file, "\n");
+	}
+      to[i] = equiv_head;
+    }
+}
+
+/* Returns whether edge PRED_E is currently being tracked.  */
+
+static inline bool
+vn_tracking_edge (edge pred_e)
+{
+  if (!single_pred_p (pred_e->dest))
+    {
+      /* Never record for backedges.  */
+      if (pred_e->flags & EDGE_DFS_BACK)
+	return false;
+      edge_iterator ei;
+      edge e;
+      int cnt = 0;
+      /* Ignore backedges.  */
+      FOR_EACH_EDGE (e, ei, pred_e->dest->preds)
+	if (!dominated_by_p (CDI_DOMINATORS, e->src, e->dest))
+	  cnt++;
+      if (cnt != 1)
+	return false;
+    }
+  return true;
+}
+
 /* Compute and return the hash value for nary operation VBO1.  */
 
 static hashval_t
@@ -3960,6 +4076,9 @@ init_vn_nary_op_from_stmt (vn_nary_op_t vno, gassign *stmt)
       for (i = 0; i < vno->length; ++i)
 	vno->op[i] = gimple_op (stmt, i + 1);
     }
+  /* Insert and lookup N-ary results by the operands' equivalence heads.  */
+  if (gimple_bb (stmt))
+    lookup_equiv_heads (vno->length, vno->op, vno->op, gimple_bb (stmt));
 }
 
 /* Compute the hashcode for VNO and look for it in the hash table;
@@ -4145,7 +4264,12 @@ vn_nary_op_insert_into (vn_nary_op_t vno, vn_nary_op_table_type *table,
 	      next = &(*next)->next;
 	    }
 	  if (!found)
-	    *next = nval;
+	    {
+	      /* Insert new value at the front, so that lookup_equiv_head can
+	       * find the latest result.  */
+	      nval->next = vno->u.values;
+	      vno->u.values = nval;
+	    }
 
 	  *slot = vno;
 	  vno->next = last_inserted_nary;
@@ -4190,6 +4314,75 @@ vn_nary_op_insert_pieces (unsigned int length, enum tree_code code,
   return vn_nary_op_insert_into (vno1, valid_info->nary, true);
 }
 
+/* Similar with vn_nary_op_insert_pieces, but insert as predicated values.  */
+
+static vn_nary_op_t
+vn_nary_op_insert_pieces_predicated_1 (unsigned int length, enum tree_code code,
+				       tree type, tree *ops, tree result,
+				       unsigned int value_id, basic_block bb)
+{
+  vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
+  init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
+  vno1->predicated_values = 1;
+  vno1->u.values
+    = (vn_pval *) obstack_alloc (&vn_tables_obstack, sizeof (vn_pval));
+  vno1->u.values->next = NULL;
+  vno1->u.values->result = result;
+  vno1->u.values->n = 1;
+  vno1->u.values->valid_dominated_by_p[0] = bb->index;
+  return vn_nary_op_insert_into (vno1, valid_info->nary, true);
+}
+
+/* Try to simplify N-ary expressions, return NULL_TREE if not folded.  */
+
+static tree
+simplify_nary_op (unsigned length, tree_code opcode, tree *op, tree type)
+{
+  switch (length)
+    {
+    case 1:
+      if (CONSTANT_CLASS_P (op[0]))
+	return gimple_simplify (opcode, type, op[0], NULL, no_follow_ssa_edges);
+      break;
+    case 2:
+      if ((CONSTANT_CLASS_P (op[0]) && CONSTANT_CLASS_P (op[1]))
+	  || op[0] == op[1])
+	return gimple_simplify (opcode, type, op[0], op[1], NULL,
+				no_follow_ssa_edges);
+      break;
+    case 3:
+      if (CONSTANT_CLASS_P (op[0]) && CONSTANT_CLASS_P (op[1])
+	  && CONSTANT_CLASS_P (op[2]))
+	return gimple_simplify (opcode, type, op[0], op[1], op[2], NULL,
+				no_follow_ssa_edges);
+      break;
+    default:
+      break;
+    }
+  return NULL_TREE;
+}
+
+/* Insert a back-reference to vn_nary_op_t in vn_ssa_aux.  */
+
+static inline void
+push_new_nary_ref (vn_nary_op_t slot, vn_ssa_aux_t info)
+{
+  nary_ref *ref;
+  if (nary_ref_freelist)
+    {
+      ref = nary_ref_freelist;
+      nary_ref_freelist = nary_ref_freelist->next;
+    }
+  else
+    ref = XOBNEW (&vn_ssa_aux_obstack, nary_ref);
+  ref->next = info->ref;
+  ref->next_undo = last_pushed_nary_ref;
+  ref->slot = slot;
+  last_pushed_nary_ref = info;
+  info->ref = ref;
+  ++info->num_nary_ref;
+}
+
 static vn_nary_op_t
 vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
 				     tree type, tree *ops,
@@ -4197,21 +4390,8 @@ vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
 				     edge pred_e)
 {
   /* ???  Currently tracking BBs.  */
-  if (! single_pred_p (pred_e->dest))
-    {
-      /* Never record for backedges.  */
-      if (pred_e->flags & EDGE_DFS_BACK)
-	return NULL;
-      edge_iterator ei;
-      edge e;
-      int cnt = 0;
-      /* Ignore backedges.  */
-      FOR_EACH_EDGE (e, ei, pred_e->dest->preds)
-	if (! dominated_by_p (CDI_DOMINATORS, e->src, e->dest))
-	  cnt++;
-      if (cnt != 1)
-	return NULL;
-    }
+  if (!vn_tracking_edge (pred_e))
+    return NULL;
   if (dump_file && (dump_flags & TDF_DETAILS)
       /* ???  Fix dumping, but currently we only get comparisons.  */
       && TREE_CODE_CLASS (code) == tcc_comparison)
@@ -4224,36 +4404,345 @@ vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
       fprintf (dump_file, " == %s\n",
 	       integer_zerop (result) ? "false" : "true");
     }
-  vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
-  init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
-  vno1->predicated_values = 1;
-  vno1->u.values = (vn_pval *) obstack_alloc (&vn_tables_obstack,
-					      sizeof (vn_pval));
-  vno1->u.values->next = NULL;
-  vno1->u.values->result = result;
-  vno1->u.values->n = 1;
-  vno1->u.values->valid_dominated_by_p[0] = pred_e->dest->index;
-  return vn_nary_op_insert_into (vno1, valid_info->nary, true);
+  /* Insert by operands' equiv-heads.  */
+  tree new_ops[3];
+  lookup_equiv_heads (length, ops, new_ops, pred_e->dest);
+  /* Skip if the result is known.  */
+  tree simplified = simplify_nary_op (length, code, new_ops, type);
+  if (simplified)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Result is known: ");
+	  print_generic_expr (dump_file, simplified, TDF_NONE);
+	  fprintf (dump_file, ", skipped.\n");
+	}
+      return NULL;
+    }
+  vn_nary_op_t val
+    = vn_nary_op_insert_pieces_predicated_1 (length, code, type, new_ops,
+					     result, value_id, pred_e->dest);
+  /* Insert back-refs (operand->slot) to vn_ssa_aux structure.  */
+  for (unsigned i = 0; i < length; i++)
+    {
+      if (TREE_CODE (new_ops[i]) == SSA_NAME)
+	{
+	  vn_ssa_aux_t info = VN_INFO (new_ops[i]);
+	  if (info->valnum == new_ops[i])
+	    push_new_nary_ref (val, info);
+	}
+    }
+  return val;
 }
 
+/* Returns whether DERIVED_CODE is a relaxed condition code derived from
+ * ORIG_CODE.  For example: "a>=b" is relaxed from "a>b", so
+ * is_relaxed_cond_code (>, >=) is true; "a!=b" is reversion of "a==b",
+ * is_relaxed_cond_code (==, !=) is false.  */
+
 static bool
-dominated_by_p_w_unex (basic_block bb1, basic_block bb2, bool);
+is_relaxed_cond_code (tree_code orig_code, tree_code derived_code)
+{
+  if (orig_code == derived_code
+      || swap_tree_comparison (orig_code) == derived_code)
+    return false;
+  if (commutative_tree_code (orig_code) && commutative_tree_code (derived_code))
+    /* NE_EXPR and EQ_EXPR.  */
+    return false;
+  return true;
+}
 
-static tree
-vn_nary_op_get_predicated_value (vn_nary_op_t vno, basic_block bb)
+/* Given that BB is basic block P's true or false destination, returns whether
+ * there's a path from the other true or false destination to it.  */
+
+static inline bool
+branch_may_come_from_another (basic_block p, basic_block bb)
 {
-  if (! vno->predicated_values)
-    return vno->u.result;
+  if (EDGE_COUNT (bb->preds) == 1)
+    return false;
+  edge e;
+  edge_iterator ei;
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      /* Checking that the other true or false destination dominates e->src
+       * isn't enough.  See PR79194.  */
+      if (e->src != p
+	  && !((e->flags & EDGE_DFS_BACK)
+	       && dominated_by_p (CDI_DOMINATORS, e->src, bb)))
+	return true;
+    }
+  return false;
+}
+
+/* Record the equivalence of LHS and RHS at basic_block BB.  */
+
+static vn_nary_op_t
+val_equiv_insert (tree op1, tree op2, edge e)
+{
+  /* Lookup current LHS and RHS's equivalence heads.  Set RHS's equiv-head to
+   * be new equiv-head for LHS.  */
+  basic_block bb = e->dest;
+  tree lhs = lookup_equiv_head (op1, bb);
+  tree rhs = lookup_equiv_head (op2, bb);
+  if (lhs == rhs)
+    return NULL;
+  if (is_gimple_min_invariant (lhs))
+    std::swap (lhs, rhs);
+  if (is_gimple_min_invariant (lhs) || TREE_CODE (lhs) != SSA_NAME)
+    /* Possible if invoked from record_equiv_from_previous_cond.  */
+    return NULL;
+
+  /* Make the hand-side with more recorded n-ary expressions new
+   * equivalence-head, to make fewer re-insertions.  */
+  if (TREE_CODE (rhs) == SSA_NAME
+      && VN_INFO (rhs)->num_nary_ref < VN_INFO (lhs)->num_nary_ref)
+    std::swap (lhs, rhs);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Recording equivalence of ");
+      print_generic_expr (dump_file, lhs);
+      fprintf (dump_file, " and ");
+      print_generic_expr (dump_file, rhs);
+      fprintf (dump_file, " at BB%d\n", bb->index);
+    }
+  /* Record equivalence as unary NOP_EXPR.  */
+  vn_nary_op_t val
+    = vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR, TREE_TYPE (lhs),
+					     &lhs, rhs, 0, bb);
+  /* Back-ref for equivalences is result->slot rather than op->slot, so that
+   * when new equiv-head is appended, we can update them.  */
+  if (TREE_CODE (rhs) == SSA_NAME)
+    push_new_nary_ref (val, VN_INFO (rhs));
+
+  /* Iterate on LHS's references, first update equiv-heads, then re-insert
+   * former predicates, so that the result will be latest.  */
+  auto_vec<vn_nary_op_t> predicates;
+  auto_vec<tree> results;
+  vn_nary_op_t slot;
+  vn_ssa_aux_t old_info = VN_INFO (lhs);
+  tree old_head = old_info->valnum;
+  nary_ref *ref = old_info->ref;
+  for (; ref; ref = ref->next)
+    {
+      slot = ref->slot;
+      if (!slot->predicated_values)
+	continue;
+      /* Update recorded equivalences with new equiv-head.  */
+      if (slot->opcode == NOP_EXPR && slot->length == 1)
+	{
+	  if (slot->op[0] == rhs)
+	    continue;
+	  for (vn_pval *pval = slot->u.values; pval; pval = pval->next)
+	    {
+	      if (pval->result == old_head && is_vn_valid_at_bb (pval, bb))
+		{
+		  if (dump_file && (dump_flags & TDF_DETAILS))
+		    {
+		      fprintf (dump_file, "Updating equiv-head of ");
+		      print_generic_expr (dump_file, slot->op[0], TDF_SLIM);
+		      fprintf (dump_file, " to ");
+		      print_generic_expr (dump_file, rhs, TDF_SLIM);
+		      fprintf (dump_file, " at BB%d\n", bb->index);
+		    }
+		  vn_nary_op_t new_val = vn_nary_op_insert_pieces_predicated_1 (
+		    1, NOP_EXPR, slot->type, slot->op, rhs, 0, bb);
+		  /* Push back-ref (result->slot) for new equivalence.  */
+		  if (TREE_CODE (rhs) == SSA_NAME)
+		    push_new_nary_ref (new_val, VN_INFO (rhs));
+		}
+	    }
+	}
+      else
+	/* Predicated values are collected, to be re-inserted later.  */
+	for (vn_pval *pval = slot->u.values; pval; pval = pval->next)
+	  {
+	    if (!is_vn_valid_at_bb (pval, bb))
+	      continue;
+	    predicates.safe_push (slot);
+	    results.safe_push (pval->result);
+	  }
+    }
+  unsigned i;
+  if (predicates.length () > 0 && dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Re-inserting %d predicates...\n",
+	     predicates.length ());
+  FOR_EACH_VEC_ELT (predicates, i, slot)
+    {
+      /* vn_nary_op_insert_pieces_predicated will do insertion with new
+       * equiv-head as operand, since it's already recorded.  */
+      vn_nary_op_insert_pieces_predicated (slot->length, slot->opcode,
+					   slot->type, slot->op, results[i],
+					   slot->value_id, e);
+    }
+  return val;
+}
+
+/* Given a slot VNO that holds the same conditional expression with that in the
+   branching statement of BB, record valid equivalence relation on BB's true and
+   false edges.  For now we record equivalences of PHI results and arguments
+   that depend on the predicate of VNO.  */
+
+static void
+record_equiv_from_previous_cond (vn_nary_op_t vno, edge true_e, edge false_e,
+				 basic_block bb)
+{
+  if (!vno->predicated_values)
+    return;
+  if (true_e && !vn_tracking_edge (true_e))
+    true_e = NULL;
+  if (false_e && !vn_tracking_edge (false_e))
+    false_e = NULL;
+  if (!true_e && !false_e)
+    return;
+
+  /* Process all results in VNO, where basic-block containing the original
+     condition dominates BB.  */
   for (vn_pval *val = vno->u.values; val; val = val->next)
+    /* Process each predicated value.  */
     for (unsigned i = 0; i < val->n; ++i)
-      /* Do not handle backedge executability optimistically since
-	 when figuring out whether to iterate we do not consider
-	 changed predication.  */
-      if (dominated_by_p_w_unex
-	    (bb, BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]),
-	     false))
-	return val->result;
-  return NULL_TREE;
+      {
+	basic_block bb1
+	  = BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]);
+
+	/* Original condition is in bb1's single_pred.  */
+	if (!single_pred_p (bb1))
+	  continue;
+	basic_block p = single_pred (bb1);
+	if (!dominated_by_p_w_unex (bb, p, false))
+	  continue;
+
+	gimple *pred_last = last_stmt (p);
+	if (!pred_last || gimple_code (pred_last) != GIMPLE_COND)
+	  continue;
+	/* If the predicate is re-inserted when updating equiv-heads, then it
+	 * has nothing to do here.  */
+	gcond *cond = as_a<gcond *> (pred_last);
+	tree cond_lhs = lookup_equiv_head (gimple_cond_lhs (cond), p);
+	tree cond_rhs = lookup_equiv_head (gimple_cond_rhs (cond), p);
+	if (!((expressions_equal_p (vno->op[0], cond_lhs)
+	       && expressions_equal_p (vno->op[1], cond_rhs))
+	      || (expressions_equal_p (vno->op[0], cond_lhs)
+		  && expressions_equal_p (vno->op[1], cond_rhs))))
+	  continue;
+
+	hash_set<basic_block> *phi_bbs = dominator_to_phi_map->get (p);
+	if (!phi_bbs)
+	  continue;
+	enum tree_code pred_code = gimple_cond_code (pred_last);
+	/* Find out previous true & false edge destinations (here "true" and
+	   "false" correspond to current bb).
+	   First initialize them with bb1 and "p's successor that is not
+	   bb1" respectively, then switch them if result in VNO is false.  */
+	basic_block pre_true_dest = bb1;
+	basic_block pre_false_dest = EDGE_SUCC (p, 0)->dest != bb1
+				       ? EDGE_SUCC (p, 0)->dest
+				       : EDGE_SUCC (p, 1)->dest;
+	if (val->result == boolean_false_node)
+	  std::swap (pre_true_dest, pre_false_dest);
+
+	bool insert_by_true = true, insert_by_false = true;
+	/* If the condition in vno is not identical with original condition, but
+	   a relaxed one, then only edge p->bb1 is helpful.  */
+	if (is_relaxed_cond_code (pred_code, vno->opcode))
+	  {
+	    if (val->result == boolean_true_node)
+	      insert_by_false = false;
+	    else
+	      insert_by_true = false;
+	  }
+
+	/* Iterate on basic-blocks with PHIs immediately dominated by p.  */
+	for (hash_set<basic_block>::iterator it = phi_bbs->begin ();
+	     it != phi_bbs->end (); ++it)
+	  {
+	    basic_block phi_bb = *it;
+	    if (!dominated_by_p_w_unex (bb, phi_bb, false))
+	      continue;
+
+	    /* Among true & false edges, if one edge can lead to another before
+	       reaching phi_bb, then we can't derive equivalences by the former.
+	       Part of the situations (when the joint is pre_true_dest or
+	       pre_false_dest) are ruled out by the branch_may_come_from_another
+	       check.  The others (when the joint is a descendant of
+	       pre_true_dest or pre_false_dest) are handled by calculating and
+	       checking possible_true_args & possible_false_args.  */
+	    bool insert_by_true_1
+	      = insert_by_true
+		&& (pre_false_dest == phi_bb
+		    || !branch_may_come_from_another (p, pre_false_dest));
+	    bool insert_by_false_1
+	      = insert_by_false
+		&& (pre_true_dest == phi_bb
+		    || !branch_may_come_from_another (p, pre_true_dest));
+
+	    /* Process PHIs, record the equivalences that can be derived from
+	       previous true or false branch to current bb's successors.  */
+	    for (gphi_iterator gsi = gsi_start_nonvirtual_phis (phi_bb);
+		 !gsi_end_p (gsi); gsi_next_nonvirtual_phi (&gsi))
+	      {
+		gphi *phi = gsi.phi ();
+		tree res = vn_valueize (PHI_RESULT (phi));
+		/* Calculate possible arguments collections for previous true
+		   and false edges.  */
+		hash_set<tree> possible_true_args;
+		hash_set<tree> possible_false_args;
+		for (unsigned k = 0; k < gimple_phi_num_args (phi); ++k)
+		  {
+		    edge in = gimple_phi_arg_edge (phi, k);
+		    tree arg = vn_valueize (gimple_phi_arg_def (phi, k));
+		    if ((in->src == p && pre_true_dest == phi_bb)
+			|| dominated_by_p (CDI_DOMINATORS, in->src,
+					   pre_true_dest))
+		      possible_true_args.add (arg);
+		    else
+		      possible_false_args.add (arg);
+
+		    if ((in->src == p && pre_false_dest == phi_bb)
+			|| dominated_by_p (CDI_DOMINATORS, in->src,
+					   pre_false_dest))
+		      possible_false_args.add (arg);
+		    else
+		      possible_true_args.add (arg);
+		  }
+
+		/* Record equivalences if there's one single argument in the
+		   possible collection. (Otherwise we need to check that all
+		   arguments in the collection are effectively equal.)  */
+		if (true_e && insert_by_true_1
+		    && possible_true_args.elements () == 1)
+		  {
+		    tree arg = *possible_true_args.begin ();
+		    val_equiv_insert (arg, res, true_e);
+		  }
+		if (false_e && insert_by_false_1
+		    && possible_false_args.elements () == 1)
+		  {
+		    tree arg = *possible_false_args.begin ();
+		    val_equiv_insert (arg, res, false_e);
+		  }
+	      }
+	  }
+      }
+}
+
+/* Find predicated value of vn_nary_op by the operands' equivalences.  Return
+ * NULL_TREE if no known result is found.  */
+
+static tree
+find_predicated_value_by_equivs (vn_nary_op_t vno, basic_block bb,
+				 vn_nary_op_t *vnresult)
+{
+  lookup_equiv_heads (vno->length, vno->op, vno->op, bb);
+  tree result
+    = simplify_nary_op (vno->length, vno->opcode, vno->op, vno->type);
+  if (result)
+    {
+      if (!useless_type_conversion_p (vno->type, TREE_TYPE (result)))
+	result = fold_convert (vno->type, result);
+      return result;
+    }
+  return vn_nary_op_lookup_1 (vno, vnresult);
 }
 
 /* Insert the rhs of STMT into the current hash table with a value number of
@@ -4499,6 +4988,20 @@ vn_phi_lookup (gimple *phi, bool backedges_varying_p)
   return (*slot)->result;
 }
 
+/* Record mappings from immediate dominator to basic_block with PHIs.  */
+
+static void
+record_dom_to_phi_bb (basic_block idom, basic_block phi_bb)
+{
+  /* Rule out when there is a backedge between idom and phi_bb.  */
+  basic_block loop_header = phi_bb->loop_father->header;
+  if (!dominated_by_p (CDI_DOMINATORS, idom, loop_header)
+      && dominated_by_p (CDI_DOMINATORS, loop_header, idom))
+    return;
+  hash_set<basic_block> &phi_bbs = dominator_to_phi_map->get_or_insert (idom);
+  phi_bbs.add (phi_bb);
+}
+
 /* Insert PHI into the current hash table with a value number of
    RESULT.  */
 
@@ -4536,6 +5039,9 @@ vn_phi_insert (gimple *phi, tree result, bool backedges_varying_p)
 	   allow VN_TOP.  */
 	vp1->cclhs = vn_valueize (gimple_cond_lhs (last1));
 	vp1->ccrhs = vn_valueize (gimple_cond_rhs (last1));
+	/* Record the mapping, this will be used in
+	 * record_equiv_from_previous_cond.  */
+	record_dom_to_phi_bb (idom1, vp1->block);
       }
   vp1->result = result;
   vp1->hashcode = vn_phi_compute_hash (vp1);
@@ -4862,8 +5368,32 @@ static bool
 visit_nary_op (tree lhs, gassign *stmt)
 {
   vn_nary_op_t vnresult;
-  tree result = vn_nary_op_lookup_stmt (stmt, &vnresult);
-  if (! result && vnresult)
+  unsigned length = vn_nary_length_from_stmt (stmt);
+  vn_nary_op_t vno
+    = XALLOCAVAR (struct vn_nary_op_s, sizeof_vn_nary_op (length));
+  init_vn_nary_op_from_stmt (vno, stmt);
+  tree result = vn_nary_op_lookup_1 (vno, &vnresult);
+
+  /* Because we search by equivalences, there's a change to get a simplified
+     result.  */
+  if (!result)
+    {
+      gimple *use_stmt;
+      use_operand_p use_p;
+      /* Replacing variable used in PHI at loop exit may cause
+       * simplify_peeled_chrec/constant_after_peeling to miss the loop.  */
+      if (!(single_imm_use (lhs, &use_p, &use_stmt)
+	    && gimple_code (use_stmt) == GIMPLE_PHI
+	    && single_succ_p (gimple_bb (use_stmt))
+	    && (single_succ_edge (gimple_bb (use_stmt))->flags
+		& EDGE_DFS_BACK)))
+	result
+	  = simplify_nary_op (vno->length, vno->opcode, vno->op, vno->type);
+      if (result && !useless_type_conversion_p (vno->type, TREE_TYPE (result)))
+	result = fold_convert (vno->type, result);
+    }
+
+  if (!result && vnresult)
     result = vn_nary_op_get_predicated_value (vnresult, gimple_bb (stmt));
   if (result)
     return set_ssa_val_to (lhs, result);
@@ -6932,6 +7462,8 @@ free_rpo_vn (void)
 
   delete constant_to_value_id;
   constant_to_value_id = NULL;
+  delete dominator_to_phi_map;
+  dominator_to_phi_map = NULL;
 }
 
 /* Hook for maybe_push_res_to_seq, lookup the expression in the VN tables.  */
@@ -7307,17 +7839,23 @@ process_bb (rpo_elim &avail, basic_block bb,
 	    tree val = gimple_simplify (gimple_cond_code (last),
 					boolean_type_node, lhs, rhs,
 					NULL, vn_valueize);
+	    vn_nary_op_t vnresult = NULL;
 	    /* If the condition didn't simplfy see if we have recorded
 	       an expression from sofar taken edges.  */
 	    if (! val || TREE_CODE (val) != INTEGER_CST)
 	      {
-		vn_nary_op_t vnresult;
 		tree ops[2];
 		ops[0] = lhs;
 		ops[1] = rhs;
-		val = vn_nary_op_lookup_pieces (2, gimple_cond_code (last),
-						boolean_type_node, ops,
-						&vnresult);
+		vn_nary_op_t vno1
+		  = XALLOCAVAR (struct vn_nary_op_s, sizeof_vn_nary_op (2));
+		init_vn_nary_op_from_pieces (vno1, 2, gimple_cond_code (last),
+					     boolean_type_node, ops);
+		val = vn_nary_op_lookup_1 (vno1, &vnresult);
+		if ((!val || TREE_CODE (val) != INTEGER_CST))
+		  /* If no simplified result has been found, try to find one by
+		   * equivalences.  */
+		  val = find_predicated_value_by_equivs (vno1, bb, &vnresult);
 		/* Did we get a predicated value?  */
 		if (! val && vnresult && vnresult->predicated_values)
 		  {
@@ -7382,6 +7920,22 @@ process_bb (rpo_elim &avail, basic_block bb,
 		    if (false_e)
 		      insert_related_predicates_on_edge (icode, ops, false_e);
 		  }
+
+		/* Record new equivalence generated by EQ_EXPR.  */
+		if (code == EQ_EXPR && true_e && vn_tracking_edge (true_e))
+		  val_equiv_insert (ops[0], ops[1], true_e);
+		if (code == NE_EXPR && false_e && vn_tracking_edge (false_e))
+		  val_equiv_insert (ops[0], ops[1], false_e);
+
+		/* If we got a previous condition, but no predicated value, try
+		   to record what we know from previous true & false branches on
+		   current true & false edges.  This might provide some
+		   opportunities for futher prediction.  For now we only record
+		   equivalences generated by PHI nodes, that depends on the
+		   previous condition.  */
+		if (vnresult && (true_e || false_e))
+		  record_equiv_from_previous_cond (vnresult, true_e, false_e,
+						   bb);
 	      }
 	    break;
 	  }
@@ -7504,6 +8058,7 @@ struct unwind_state
   vn_phi_t phi_top;
   vn_nary_op_t nary_top;
   vn_avail *avail_top;
+  nary_ref *nary_ref_top;
 };
 
 /* Unwind the RPO VN state for iteration.  */
@@ -7553,6 +8108,17 @@ do_unwind (unwind_state *to, rpo_elim &avail)
       av->next = avail.m_avail_freelist;
       avail.m_avail_freelist = av;
     }
+  /* Drain nary_refs to freelist, to reuse the memory.  */
+  for (; last_pushed_nary_ref && last_pushed_nary_ref->ref != to->nary_ref_top;)
+    {
+      vn_ssa_aux_t val = last_pushed_nary_ref;
+      nary_ref *ref = val->ref;
+      val->ref = ref->next;
+      last_pushed_nary_ref = ref->next_undo;
+      ref->next = nary_ref_freelist;
+      nary_ref_freelist = ref;
+      --val->num_nary_ref;
+    }
 }
 
 /* Do VN on a SEME region specified by ENTRY and EXIT_BBS in FN.
@@ -7666,6 +8232,9 @@ do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
   last_inserted_phi = NULL;
   last_inserted_nary = NULL;
   last_pushed_avail = NULL;
+  last_pushed_nary_ref = NULL;
+  nary_ref_freelist = NULL;
+  dominator_to_phi_map = new bb_map_t;
 
   vn_valueize = rpo_vn_valueize;
 
@@ -7760,6 +8329,8 @@ do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
 	    rpo_state[idx].nary_top = last_inserted_nary;
 	    rpo_state[idx].avail_top
 	      = last_pushed_avail ? last_pushed_avail->avail : NULL;
+	    rpo_state[idx].nary_ref_top
+	      = last_pushed_nary_ref ? last_pushed_nary_ref->ref : NULL;
 	  }
 
 	if (!(bb->flags & BB_EXECUTABLE))
diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h
index 96100596d2e..545285a4439 100644
--- a/gcc/tree-ssa-sccvn.h
+++ b/gcc/tree-ssa-sccvn.h
@@ -216,6 +216,16 @@ struct vn_avail
   struct vn_ssa_aux *next_undo;
 };
 
+/* In vn_ssa_aux structure, hold a lists of references to the nary map entries,
+   so that when recording new equivalence to the value number, we can re-insert
+   expressions' results based on this.  */
+struct nary_ref
+{
+  nary_ref *next;
+  vn_nary_op_t slot;
+  struct vn_ssa_aux *next_undo;
+};
+
 typedef struct vn_ssa_aux
 {
   /* SSA name this vn_ssa_aux is associated with in the lattice.  */
@@ -229,6 +239,12 @@ typedef struct vn_ssa_aux
      for SSA names also serving as values (NAME == VALNUM).  */
   vn_avail *avail;
 
+  /* References to slots in the nary map, with VALNUM as operand.  */
+  nary_ref *ref;
+
+  /* nary_ref count.  */
+  unsigned num_nary_ref;
+
   /* Unique identifier that all expressions with the same value have. */
   unsigned int value_id;
 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
  2021-09-16 18:13 [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction Di Zhao OS
@ 2021-10-01 12:59 ` Richard Biener
  2021-10-24 19:03   ` Di Zhao OS
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2021-10-01 12:59 UTC (permalink / raw)
  To: Di Zhao OS; +Cc: gcc-patches

On Thu, Sep 16, 2021 at 8:13 PM Di Zhao OS
<dizhao@os.amperecomputing.com> wrote:
>
> Sorry about updating on this after so long. It took me much time to work out a
> new plan and pass the tests.
>
> The new idea is to use one variable to represent a set of equal variables at
> some basic-block. This variable is called a "equivalence head" or "equiv-head"
> in the code. (There's no-longer a "equivalence map".)
>
> - Initially an SSA_NAME's "equivalence head" is its value number. Temporary
>   equivalence heads are recorded as unary NOP_EXPR results in the vn_nary_op_t
>   map. Besides, when inserting into vn_nary_op_t map, make the new result at
>   front of the vn_pval list, so that when searching for a variable's
>   equivalence head, the first result represents the largest equivalence set at
>   current location.
> - In vn_ssa_aux_t, maintain a list of references to valid_info->nary entry.
>   For recorded equivalences, the reference is result->entry; for normal N-ary
>   operations, the reference is operand->entry.
> - When recording equivalences, if one side A is constant or has more refs, make
>   it the new equivalence head of the other side B. Traverse B's ref-list, if a
>   variable C's previous equiv-head is B, update to A. And re-insert B's n-ary
>   operations by replacing B with A.
> - When inserting and looking for the results of n-ary operations, insert and
>   lookup by the operands' equiv-heads.
>
> So except for the refs in vn_ssa_aux_t, this scheme uses the original
> infrastructure to its best. Quadric search time is avoided at the cost of some
> re-insertions. Test results on SPEC2017 intrate (counts and percentages):
>
>                 |more bb |more bb |more stmt|more stmt|more       |more
>                 |removed |removed |removed  |removed  |nv_nary_ops|nv_nary_ops
>                 |at fre1 |at fre1 |at fre1  |at fre1  |inserted   |inserted
> ------------------------------------------------------------------------------
>  500.perlbench_r| 64     | 1.98%  | 103     | 0.19%   | 11260     | 12.16%
>  502.gcc_r      | 671    | 4.80%  | 317     | 0.23%   | 13964     | 6.09%
>  505.mcf_r      | 5      | 35.71% | 9       | 1.40%   | 32        | 2.52%
>  520.omnetpp    | 132    | 5.45%  | 39      | 0.11%   | 1895      | 3.91%
>  523.xalancbmk_r| 238    | 3.26%  | 313     | 0.36%   | 1417      | 1.27%
>  525.x264_r     | 4      | 1.36%  | 27      | 0.11%   | 1752      | 6.78%
>  531.deepsjeng_r| 1      | 3.45%  | 2       | 0.14%   | 228       | 8.67%
>  541.leela_r    | 2      | 0.63%  | 0       | 0.00%   | 92        | 1.14%
>  548.exchange2_r| 0      | 0.00%  | 3       | 0.04%   | 49        | 1.03%
>  557.xz_r       | 0      | 0.00%  | 3       | 0.07%   | 272       | 7.55%
>
> There're more basic_blocks and statements removed compared with last
> implementation, the reasons are:
> 1) "CONST op CONST" simplification is included. It is missed in previous patch.
> 2) By inserting RHS of statements on equiv-heads, more N-ary operations can be
>    simplified. One example is in 'ssa-fre-97.c' in the patch file.
>
> While jump-threading & vrp also utilize temporary equivalences (so some of the
> newly removed blocks and statements can also be covered by them), I think this
> patch is a supplement, in cases when jump threading cannot take place (the
> original example), or value number info needs to be involved (the
> 'ssa-fre-97.c' example).
>
> Fixed the former issue with non-iterate mode.
>
> About recording the temporary equivalences generated by PHIs (i.e. the
> 'record_equiv_from_previous_cond' stuff), I have to admit it looks strange and
> the code size is large, but I haven't find a better way yet. Consider a piece
> of CFG like the one below, if we want to record x==x2 on the true edge when
> processing bb1, the location (following current practice) will be bb2. But that
> is not useful at bb5 or bb6, because bb2 doesn't dominate them. And I can't
> find a place to record x==x1 when processing bb1.
> If we can record things on edges rather than blocks, say x==x1 on 1->3 and
> x==x2 on 1->2, then perhaps with an extra check for "a!=0", x2 can be a valid
> equiv-head for x since bb5. But I think it lacks efficiency and is not
> persuasive. It is more efficient to find a valid previous predicate when
> processing bb4, because the vn_nary_op_t will be fetched anyway.
>         --------------
>         | if (a != 0) | bb1
>         --------------
>         f |      \ t
>           |    -------
>           |    | bb2 |
>           |    -------
>           |      /
>     -------------------------
>     | x = PHI<x1(1), x2(2)> | bb3
>     -------------------------
>                |
>               ....
>                |
>        --------------
>        | if (a != 0) | bb4
>        --------------
>            |f     \t
>     -------------  -------
> bb7 | where     |  | bb5 |  ==> where "x==x2" is recorded now
>     | "x==x1" is|  -------
>     | recorded  |    \
>     | now       |    ...
>     -------------     |
>                    -------
>                    | bb6 |  ==> where "x==x2" needs to be used
>                    -------
> Although I think I can remove the 'dominator_to_phi_map' and generalize this a
> little, but the major logic will be similar. So I left this unchanged for now,
> and would like to hear your suggestions on the new scheme overall.
>
> Thanks,
> Di Zhao
>
> --------
> Extend FRE with temporary equivalences.

Comments on the patch:

+  /* nary_ref count.  */
+  unsigned num_nary_ref;
+

I think a unsigned short should be enough and that would nicely
pack after value_id together with the bitfield (maybe change that
to unsigned short :1 then).

@@ -7307,17 +7839,23 @@ process_bb (rpo_elim &avail, basic_block bb,
            tree val = gimple_simplify (gimple_cond_code (last),
                                        boolean_type_node, lhs, rhs,
                                        NULL, vn_valueize);
+           vn_nary_op_t vnresult = NULL;
            /* If the condition didn't simplfy see if we have recorded
               an expression from sofar taken edges.  */
            if (! val || TREE_CODE (val) != INTEGER_CST)
              {
-               vn_nary_op_t vnresult;

looks like you don't need vnresult outside of the if()?

+                 /* If no simplified result has been found, try to find one by
+                  * equivalences.  */

no * on the second line.

+/* Find predicated value of vn_nary_op by the operands' equivalences.  Return
+ * NULL_TREE if no known result is found.  */
+
+static tree
+find_predicated_value_by_equivs (vn_nary_op_t vno, basic_block bb,
+                                vn_nary_op_t *vnresult)
+{
+  lookup_equiv_heads (vno->length, vno->op, vno->op, bb);
+  tree result
+    = simplify_nary_op (vno->length, vno->opcode, vno->op, vno->type);

why is it necessary to simplify here?  It looks like the caller
already does this.
I wonder whether it's valid to always perform find_predicated_value_by_equivs
from inside vn_nary_op_get_predicated_value instead of bolting it to only
a single place?

+  if (result)
+    {
+      if (!useless_type_conversion_p (vno->type, TREE_TYPE (result)))
+       result = fold_convert (vno->type, result);
+      return result;
+    }
+  return vn_nary_op_lookup_1 (vno, vnresult);
 }

+/* Record the equivalence of LHS and RHS at basic_block BB.  */

I suppose E->dest instead of BB

+
+static vn_nary_op_t
+val_equiv_insert (tree op1, tree op2, edge e)
+{

+  if (is_gimple_min_invariant (lhs))
+    std::swap (lhs, rhs);
+  if (is_gimple_min_invariant (lhs) || TREE_CODE (lhs) != SSA_NAME)
+    /* Possible if invoked from record_equiv_from_previous_cond.  */
+    return NULL;

Better formulate all of the above in terms of only SSA_NAME checks since...

+  /* Make the hand-side with more recorded n-ary expressions new
+   * equivalence-head, to make fewer re-insertions.  */
+  if (TREE_CODE (rhs) == SSA_NAME
+      && VN_INFO (rhs)->num_nary_ref < VN_INFO (lhs)->num_nary_ref)
+    std::swap (lhs, rhs);

here LHS needs to be an SSA_NAME.

+  /* Record equivalence as unary NOP_EXPR.  */
+  vn_nary_op_t val
+    = vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR, TREE_TYPE (lhs),
+                                            &lhs, rhs, 0, bb);

so you are recording an equivalence of lhs to rhs as (NOP_EXPR)lhs
with value rhs?
Presumably you think that's good enough to not overlap with entries
for "real" IL?
In particular why did you choose to _not_ use (2, EQ_EXPR, boolean_type_node,
&[lhs, rhs], boolean_true_node, ...) like I think what would be
inserted for by the
existing code in process_bb via vn_nary_op_insert_pieces_predicated?
It would probably felt more natural if the "new" code bolted on the case where
that is called with an equivalence?

Btw, I didn't find any condition that prevents equivalences being used for
non-integer types?  Are you sure this is all valid for FP compares without
further restrictions?

@@ -4224,36 +4404,345 @@ vn_nary_op_insert_pieces_predicated (unsigned
int length, enum tree_code code,
       fprintf (dump_file, " == %s\n",
               integer_zerop (result) ? "false" : "true");
     }
-  vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
-  init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
-  vno1->predicated_values = 1;
-  vno1->u.values = (vn_pval *) obstack_alloc (&vn_tables_obstack,
-                                             sizeof (vn_pval));
-  vno1->u.values->next = NULL;
-  vno1->u.values->result = result;
-  vno1->u.values->n = 1;
-  vno1->u.values->valid_dominated_by_p[0] = pred_e->dest->index;
-  return vn_nary_op_insert_into (vno1, valid_info->nary, true);
+  /* Insert by operands' equiv-heads.  */
+  tree new_ops[3];
+  lookup_equiv_heads (length, ops, new_ops, pred_e->dest);
+  /* Skip if the result is known.  */
+  tree simplified = simplify_nary_op (length, code, new_ops, type);
+  if (simplified)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+       {
+         fprintf (dump_file, "Result is known: ");
+         print_generic_expr (dump_file, simplified, TDF_NONE);
+         fprintf (dump_file, ", skipped.\n");
+       }
+      return NULL;
+    }

soo - how do the existing predicated expressions change to be handled here?
Why would they ever be simplified?  Should this simplification not better happen
upthread when visiting the conditional?  Maybe that needs to use the equivs
already?

You re-use the predicated value list which I think is great, can you explain
how "latest" is meant in the following?

@@ -4145,7 +4264,12 @@ vn_nary_op_insert_into (vn_nary_op_t vno,
vn_nary_op_table_type *table,
              next = &(*next)->next;
            }
          if (!found)
-           *next = nval;
+           {
+             /* Insert new value at the front, so that lookup_equiv_head can
+              * find the latest result.  */
+             nval->next = vno->u.values;
+             vno->u.values = nval;
+           }


OK, I have to stop now but will try to understand and review further next week.
Sorry for the delays but the patch is quite complex to understand :/

Richard.

> 2021-09-16  Di Zhao  <dizhao@os.amperecomputing.com>
>
> gcc/ChangeLog:
>         PR tree-optimization/101186
>         * tree-ssa-sccvn.c (dominated_by_p_w_unex): Moved upward, no change.
>         (vn_nary_op_get_predicated_value): Moved upward, no change.
>         (is_vn_valid_at_bb): Check if vn_pval is valid at BB.
>         (lookup_equiv_head): Lookup the "equivalence head" of given node.
>         (lookup_equiv_heads): Lookup the "equivalence head"s of given nodes.
>         (vn_tracking_edge): Extracted utility function.
>         (init_vn_nary_op_from_stmt): Insert and lookup by "equivalence head"s.
>         (vn_nary_op_insert_into): Insert new value at the front.
>         (vn_nary_op_insert_pieces_predicated_1): Insert as predicated values
>         from pieces.
>         (simplify_nary_op): Try to simplify N-ary expressions.
>         (push_new_nary_ref): Insert a back-reference to vn_nary_op_t.
>         (vn_nary_op_insert_pieces_predicated): Insert by "equivalence head"s.
>         (is_relaxed_cond_code): Whether operation code is a relaxed condition
>         code derived from original code.
>         (branch_may_come_from_another): Whether there's a path from the one
>         true or false destination to another.
>         (val_equiv_insert): Record temporary equivalence.
>         (record_equiv_from_previous_cond): Record equivalences generated by
>         previous conditions on current BB's true and false edges.
>         (find_predicated_value_by_equivs): Find predicated value by the
>         operands' equivalences.
>         (record_dom_to_phi_bb): Record mappings from immediate dominator to
>         basic_block with PHIs.
>         (vn_phi_insert): Record mapping from the immediate dominator to a PHI
>         node.
>         (visit_nary_op): Add lookup previous results of N-ary operations by
>         equivalences.
>         (free_rpo_vn): Free dominator_to_phi_map.
>         (process_bb): Add lookup predicated values by equivalences.
>         (struct unwind_state): Unwind state of back-refs to vn_nary_op_t.
>         (do_unwind): Unwind the back-refs to vn_nary_op_t.
>         (do_rpo_vn): Update back-reference unwind state.
>         * tree-ssa-sccvn.h (struct nary_ref): hold a lists of references to the
>         nary map entries.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/pr68619-2.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-1.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-2.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-3.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-5.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-7.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-8.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-9.c: Disable fre.
>         * gcc.dg/tree-ssa/vrp03.c: Disable fre.
>         * gcc.dg/tree-ssa/ssa-fre-95.c: New test.
>         * gcc.dg/tree-ssa/ssa-fre-96.c: New test.
>         * gcc.dg/tree-ssa/ssa-fre-97.c: New test.
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
  2021-10-01 12:59 ` Richard Biener
@ 2021-10-24 19:03   ` Di Zhao OS
  2021-11-10  9:43     ` Richard Biener
  0 siblings, 1 reply; 7+ messages in thread
From: Di Zhao OS @ 2021-10-24 19:03 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 15774 bytes --]

Hi,

Attached is a new version of the patch, mainly for improving performance
and simplifying the code.

First, regarding the comments:

> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Friday, October 1, 2021 9:00 PM
> To: Di Zhao OS <dizhao@os.amperecomputing.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH v2] tree-optimization/101186 - extend FRE with
> "equivalence map" for condition prediction
> 
> On Thu, Sep 16, 2021 at 8:13 PM Di Zhao OS
> <dizhao@os.amperecomputing.com> wrote:
> >
> > Sorry about updating on this after so long. It took me much time to work out a
> > new plan and pass the tests.
> >
> > The new idea is to use one variable to represent a set of equal variables at
> > some basic-block. This variable is called a "equivalence head" or "equiv-head"
> > in the code. (There's no-longer a "equivalence map".)
> >
> > - Initially an SSA_NAME's "equivalence head" is its value number. Temporary
> >   equivalence heads are recorded as unary NOP_EXPR results in the vn_nary_op_t
> >   map. Besides, when inserting into vn_nary_op_t map, make the new result at
> >   front of the vn_pval list, so that when searching for a variable's
> >   equivalence head, the first result represents the largest equivalence set at
> >   current location.
> > - In vn_ssa_aux_t, maintain a list of references to valid_info->nary entry.
> >   For recorded equivalences, the reference is result->entry; for normal N-ary
> >   operations, the reference is operand->entry.
> > - When recording equivalences, if one side A is constant or has more refs, make
> >   it the new equivalence head of the other side B. Traverse B's ref-list, if a
> >   variable C's previous equiv-head is B, update to A. And re-insert B's n-ary
> >   operations by replacing B with A.
> > - When inserting and looking for the results of n-ary operations, insert and
> >   lookup by the operands' equiv-heads.
> > ...
> >
> > Thanks,
> > Di Zhao
> >
> > --------
> > Extend FRE with temporary equivalences.
> 
> Comments on the patch:
> 
> +  /* nary_ref count.  */
> +  unsigned num_nary_ref;
> +
> 
> I think a unsigned short should be enough and that would nicely
> pack after value_id together with the bitfield (maybe change that
> to unsigned short :1 then).

Changed num_nary_ref to unsigned short and moved after value_id.

> @@ -7307,17 +7839,23 @@ process_bb (rpo_elim &avail, basic_block bb,
>             tree val = gimple_simplify (gimple_cond_code (last),
>                                         boolean_type_node, lhs, rhs,
>                                         NULL, vn_valueize);
> +           vn_nary_op_t vnresult = NULL;
>             /* If the condition didn't simplfy see if we have recorded
>                an expression from sofar taken edges.  */
>             if (! val || TREE_CODE (val) != INTEGER_CST)
>               {
> -               vn_nary_op_t vnresult;
> 
> looks like you don't need vnresult outside of the if()?

vnresult is reused later to record equivalences generated by PHI nodes.

> +/* Find predicated value of vn_nary_op by the operands' equivalences.  Return
> + * NULL_TREE if no known result is found.  */
> +
> +static tree
> +find_predicated_value_by_equivs (vn_nary_op_t vno, basic_block bb,
> +                                vn_nary_op_t *vnresult)
> +{
> +  lookup_equiv_heads (vno->length, vno->op, vno->op, bb);
> +  tree result
> +    = simplify_nary_op (vno->length, vno->opcode, vno->op, vno->type);
> 
> why is it necessary to simplify here?  It looks like the caller
> already does this.

In the new patch, changed the code a little to remove redundant calculation.

> I wonder whether it's valid to always perform find_predicated_value_by_equivs
> from inside vn_nary_op_get_predicated_value instead of bolting it to only
> a single place?

Removed function find_predicated_value_by_equivs and inlined the code.

Because lookup_equiv_head uses vn_nary_op_get_predicated_value, so I left
vn_nary_op_get_predicated_value unchanged. Instead, operands are set to
equiv-heads in init_vn_nary_op_from_stmt. So altogether, predicates are always
inserted and searched by equiv-heads.

> +
> +static vn_nary_op_t
> +val_equiv_insert (tree op1, tree op2, edge e)
> +{
> 
> +  if (is_gimple_min_invariant (lhs))
> +    std::swap (lhs, rhs);
> +  if (is_gimple_min_invariant (lhs) || TREE_CODE (lhs) != SSA_NAME)
> +    /* Possible if invoked from record_equiv_from_previous_cond.  */
> +    return NULL;
> 
> Better formulate all of the above in terms of only SSA_NAME checks since...
> 
> +  /* Make the hand-side with more recorded n-ary expressions new
> +   * equivalence-head, to make fewer re-insertions.  */
> +  if (TREE_CODE (rhs) == SSA_NAME
> +      && VN_INFO (rhs)->num_nary_ref < VN_INFO (lhs)->num_nary_ref)
> +    std::swap (lhs, rhs);
> 
> here LHS needs to be an SSA_NAME.

Tried to fix this in the new patch.

> +  /* Record equivalence as unary NOP_EXPR.  */
> +  vn_nary_op_t val
> +    = vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR, TREE_TYPE (lhs),
> +                                            &lhs, rhs, 0, bb);
> 
> so you are recording an equivalence of lhs to rhs as (NOP_EXPR)lhs
> with value rhs?
> Presumably you think that's good enough to not overlap with entries
> for "real" IL?

By checking related codes, it seems to me there's no overlap as long as
equivalences are inserted and searched as predicated_values. It should be easy
to replace this with other preferred opcode.

> In particular why did you choose to _not_ use (2, EQ_EXPR, boolean_type_node,
> &[lhs, rhs], boolean_true_node, ...) like I think what would be
> inserted for by the
> existing code in process_bb via vn_nary_op_insert_pieces_predicated?
> It would probably felt more natural if the "new" code bolted on the case where
> that is called with an equivalence?

The "equivalence head"s actually implement Disjoint-sets. So by a hashtable
search in the nary-map, the set of equivalences of a variable can be found.
Also it's easy to find out whether two variables are equal. And the 
disjoint-sets are path-compressed by updating "equivalence head" and
re-inserting predicates.

In the new attached patch, predicates like "A==B is true" and "A!=B is false"
are not inserted, as they can be computed from equivalences. Same with the
predicates derived from them. This reduces insertions of predicates by a lot (
counting the equivalences themselves and re-inserted predicates). Below are
test results on SPEC intrate, comparing with trunk (f45610a4, Oct 19):

              |more bb |more bb |more stmt|more stmt|less      |less
              |removed |removed |removed  |removed  |predicates|predicates
              |at fre1 |at fre1 |at fre1  |at fre1  |inserted  |inserted
--------------------------------------------------------------------------
500.perlbench_r| 64    | 1.56%  | 102    | 0.18%   | 51928    | 53.44%
502.gcc_r      | 679   | 4.88%  | 290    | 0.21%   | 149968   | 65.5%
505.mcf_r      | 5     | 35.71% | 9      | 1.40%   | 349      | 27.48%
520.omnetpp    | 132   | 5.36%  | 48     | 0.14%   | 28192    | 58.38%
523.xalancbmk_r| 231   | 3.15%  | 308    | 0.35%   | 65131    | 58.95%
525.x264_r     | 4     | 1.36%  | 27     | 0.11%   | 10401    | 40.68%
531.deepsjeng_r| 1     | 3.45%  | 2      | 0.14%   | 1412     | 53.81%
541.leela_r    | 2     | 0.62%  | 5      | 0.06%   | 3874     | 48.56%
548.exchange2_r| 0     | 0.00%  | 3      | 0.04%   | 159      | 3.33%
557.xz_r       | 0     | 0.00%  | 3      | 0.07%   | 1894     | 52.39%

Considering the copying and unwinding work saved, this patch could reduce
compile time rather than increase it.

> Btw, I didn't find any condition that prevents equivalences being used for
> non-integer types?  Are you sure this is all valid for FP compares without
> further restrictions?

I think this approach follows sccvn's previous decisions on FP comparisons,
for equivalences are inserted only when EQ_EXPRs were evaluated to be true.
The patch is tested on regression tests and SPEC fprate cases.

> 
> @@ -4224,36 +4404,345 @@ vn_nary_op_insert_pieces_predicated (unsigned
> int length, enum tree_code code,
>        fprintf (dump_file, " == %s\n",
>                integer_zerop (result) ? "false" : "true");
>      }
> -  vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
> -  init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
> -  vno1->predicated_values = 1;
> -  vno1->u.values = (vn_pval *) obstack_alloc (&vn_tables_obstack,
> -                                             sizeof (vn_pval));
> -  vno1->u.values->next = NULL;
> -  vno1->u.values->result = result;
> -  vno1->u.values->n = 1;
> -  vno1->u.values->valid_dominated_by_p[0] = pred_e->dest->index;
> -  return vn_nary_op_insert_into (vno1, valid_info->nary, true);
> +  /* Insert by operands' equiv-heads.  */
> +  tree new_ops[3];
> +  lookup_equiv_heads (length, ops, new_ops, pred_e->dest);
> +  /* Skip if the result is known.  */
> +  tree simplified = simplify_nary_op (length, code, new_ops, type);
> +  if (simplified)
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       {
> +         fprintf (dump_file, "Result is known: ");
> +         print_generic_expr (dump_file, simplified, TDF_NONE);
> +         fprintf (dump_file, ", skipped.\n");
> +       }
> +      return NULL;
> +    }
> 
> soo - how do the existing predicated expressions change to be handled here?
> Why would they ever be simplified?  Should this simplification not better
> happen
> upthread when visiting the conditional?  Maybe that needs to use the equivs
> already?

This happens when updating variable A's equiv_head from B to C, and 
re-inserting former predicates of B with C. For example, "x < 0" is recorded,
then x is assigned with new equiv_head -1 at some location, then -1 < 0 can
simplify. It shouldn't happen when inserting new predicates.

> 
> You re-use the predicated value list which I think is great, can you explain
> how "latest" is meant in the following?
> 
> @@ -4145,7 +4264,12 @@ vn_nary_op_insert_into (vn_nary_op_t vno,
> vn_nary_op_table_type *table,
>               next = &(*next)->next;
>             }
>           if (!found)
> -           *next = nval;
> +           {
> +             /* Insert new value at the front, so that lookup_equiv_head can
> +              * find the latest result.  */
> +             nval->next = vno->u.values;
> +             vno->u.values = nval;
> +           }

It meant the "most general" result. (I picked the wrong word. The description
is changed in the new patch file. )

There can be multiple valid results at the same time. As we traverse in RPO and
update equiv-heads (i.e. unite disjoint-sets), the last inserted result
represents the most complete set of equivalences of the variable.

(In the last patch version, when "Appending predicate to value", the new result
is not moved to the front. That violates the property of equiv-heads and some
opportunities will be lost. Fixed this in the new patch file.)

> 
> OK, I have to stop now but will try to understand and review further next week.
> Sorry for the delays but the patch is quite complex to understand :/
> 
> Richard.
> 
> > 2021-09-16  Di Zhao  <dizhao@os.amperecomputing.com>
> >

Other changes:
- Fixed some errs in comments.
- On recording temporary equivalences generated by PHI nodes, altered the
  logic, hope it can be a bit more clear.
  1. When a conditional expression's true/false edge E is assessed to be 
     executable, search for conflicting predicates for E to be taken. If a
     conflicting predicate is found at L, and L's single predecessor P 
     dominates current BB, then we know P->L cannot be taken if E is taken.
  2. Process phi_bb (immediately dominated by P, and dominates BB), rule out
     PHI arguments (x2) that can only come from P->L. If there's a single 
     PHI argument (x1) left, record the equivalence of x1 and PHI result x
     at edge E.
          -----
          | P | 
          -----
           |  \
           |   -----
           |   | L |  <== conflicting predicate's location
           |   -----
           |    /
  -------------------------
  | x = PHI<x1(P), x2(L)> | phi_bb
  -------------------------
             |
            ....
             |
           -----
           | BB | <== current bb
           -----
          /     \ edge E
        ...     -------
                | dest |
                -------

- Fixed a correctness problem when recording equivalences from PHI in previous
  version. Also test case ssa-fre-95.c:k in previous version was incorrect,
  fixed that.

Thanks,
Di Zhao

---
Extend FRE with temporary equivalences.

2021-10-24  Di Zhao  <dizhao@os.amperecomputing.com>

gcc/ChangeLog:
        PR tree-optimization/101186
        * tree-ssa-sccvn.c (VN_INFO): remove assertions (there could be a
        predicate already).
        (dominated_by_p_w_unex): Moved upward.
        (vn_nary_op_get_predicated_value): Moved upward.
        (is_vn_valid_at_bb): Check if vn_pval is valid at BB.
        (lookup_equiv_head): Lookup the "equivalence head" of given node.
        (lookup_equiv_heads): Lookup the "equivalence head"s of given nodes.
        (vn_tracking_edge): Extracted utility function.
        (init_vn_nary_op_from_stmt): Insert and lookup by "equivalence head"s.
        (vn_nary_op_insert_into): Insert new value at the front.
        (vn_nary_op_insert_pieces_predicated_1): Insert as predicated values
        from pieces.
        (fold_const_from_equiv_heads): Fold N-ary expression of equiv-heads.
        (push_new_nary_ref): Insert a back-reference to vn_nary_op_t.
        (val_equiv_insert): Record temporary equivalence.
        (vn_nary_op_insert_pieces_predicated): Record equivalences instead of
        some predicates; insert back-refs.
        (record_equiv_from_prev_phi_1): Record temporary equivalences generated
        by PHI nodes.
        (record_equiv_from_prev_phi): Given an outgoing edge of a conditional
        expression taken, record equivalences generated by PHI nodes.
        (visit_nary_op): Add lookup previous results of N-ary operations by
        equivalences.
        (insert_related_predicates_on_edge): Some predicates can be computed
        from equivalences, no need to insert them.
        (process_bb): Add lookup predicated values by equivalenc~/tmp/time-frees.
        (struct unwind_state): Unwind state of back-refs to vn_nary_op_t.
        (do_unwind): Unwind the back-refs to vn_nary_op_t.
        (do_rpo_vn): Update back-reference unwind state.
        * tree-ssa-sccvn.h (struct nary_ref): hold a lists of references to the
        nary map entries.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/pr68619-2.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-1.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-2.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-3.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-5.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-7.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-8.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-9.c: Disable fre.
        * gcc.dg/tree-ssa/vrp03.c: Disable fre.
        * gcc.dg/tree-ssa/ssa-fre-100.c: New test.
        * gcc.dg/tree-ssa/ssa-fre-101.c: New test.
        * gcc.dg/tree-ssa/ssa-fre-102.c: New test.
        * gcc.dg/tree-ssa/ssa-pre-34.c: New test.


[-- Attachment #2: v3-tree-optimization-101186.patch --]
[-- Type: application/octet-stream, Size: 37568 bytes --]

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c
index cca706e0c4f..0a99c3c620e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dom2-details -w" } */
+/* { dg-options "-O2 -fno-tree-fre -fdump-tree-dom2-details -w" } */
 
 typedef union tree_node *tree;
 struct gcc_options
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c
index ac8271cc574..e9a797e82b0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c
index b2c09cbb021..bcbc419575c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c
index 2316f7e1c04..d63291da440 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 int f(int x, int y)
 {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
index e7038d0237f..c1e5667b45a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 static inline long load(long *p)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-7.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-7.c
index b44c064aa23..069e4a106c3 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-8.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-8.c
index 26e5abbdc29..26b303e896e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-8.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-9.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-9.c
index 22b68d5ede0..3ed3bf6ad29 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-9.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
new file mode 100644
index 00000000000..ccb0fdaa698
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+extern void bar(void);
+extern void boo(void);
+extern void foo(void);
+
+/* Test temporary equivalences: basic.  */
+
+void f (unsigned int a, unsigned int b)
+{
+  if (a == b)
+    {
+      for (unsigned i = 0; i < a; i++)
+	{
+	  bar ();
+	  if (i >= b)
+	    /* Unreachable.  */
+	    foo ();
+	}
+    }
+}
+
+/* Test temporary equivalences: transitive.  */
+
+void g (unsigned int a, unsigned int b, unsigned int c)
+{
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a == b)
+	{
+	  if (b == c)
+	    {
+	      boo ();
+	      if (i >= a)
+		/* Unreachable.  */
+		foo ();
+	    }
+	}
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "foo" "fre1" } } */
+/* { dg-final { scan-tree-dump "bar" "fre1" } } */
+/* { dg-final { scan-tree-dump "boo" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-101.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-101.c
new file mode 100644
index 00000000000..48954d50f2e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-101.c
@@ -0,0 +1,74 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+extern void bar(void);
+extern void bag(void);
+extern void bas(void);
+extern void foo(void);
+
+void f (unsigned int a, unsigned int b)
+{
+  if (a == b)
+    {
+      for (unsigned i = a; i < 100; i++)
+	{
+	  if (i > b)
+	    /* Should not be eliminated.  */
+	    bar ();
+	}
+    }
+}
+
+/* Test temporary equivalences: derived condition.  */
+
+void g (unsigned int a, unsigned int b, unsigned int d)
+{
+  unsigned int c = 100;
+  if (a > 0)
+    c = b;
+  else
+    c = d;
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a >= 0)
+	{
+	  if (i >= b)
+	    /* Should not be eliminated. ("a >= 0" cannot derive "a > 0".)  */
+	    bas ();
+	}
+      else
+	{
+	  if (i >= d)
+	    /* Should be eliminated, as "a < 0" can derive "a <= 0".  */
+	    foo ();
+	}
+      i++;
+    }
+}
+
+/* Test temporary equivalences: no domination.  */
+
+void h (unsigned int a, unsigned int b)
+{
+  unsigned int c = 100;
+  if (b % 2)
+    {
+      if (a != 0)
+	c = b;
+    }
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a != 0)
+	{
+	  if (i >= b)
+	    /* Should not be eliminated.  */
+	    bag ();
+	}
+      i++;
+    }
+}
+
+/* { dg-final { scan-tree-dump "bar" "fre1" } } */
+/* { dg-final { scan-tree-dump "bas" "fre1" } } */
+/* { dg-final { scan-tree-dump "bag" "fre1" } } */
+/* { dg-final { scan-tree-dump-not "foo" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c
new file mode 100644
index 00000000000..02f12929cbb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c
@@ -0,0 +1,45 @@
+
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+extern void bar(void);
+extern void bag(void);
+extern void foo(void);
+int g;
+
+/* Test temporary equivalences: optimize N-ary expressions.  */
+
+void
+f (unsigned int a, unsigned int b, unsigned int x, unsigned int y)
+{
+  if (a == b)
+    {
+      g = 100;
+      if (a + x == 99)
+	g = x + b; /* should be simplified to 99 */
+    }
+  else
+    // a!= b
+    {
+      if (a == 100 - x)
+	{
+	  g++;
+	  if (b == 100 - x)
+	    foo (); /* should be removed */
+	}
+      else
+	{
+	  g--;
+	  if (b == 100 - x)
+	    bag (); /* should not be removed */
+	  else
+	    bar (); /* should not be removed */
+	}
+    }
+}
+
+/* { dg-final { scan-tree-dump "bag" "fre1" } } */
+/* { dg-final { scan-tree-dump "bar" "fre1" } } */
+/* { dg-final { scan-tree-dump-not "foo" "fre1" } } */
+/* { dg-final { scan-tree-dump-not "= b_* + x_" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-34.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-34.c
new file mode 100644
index 00000000000..dba9aed338a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-34.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+extern void foo(void);
+extern void bag(void);
+extern void bas(void);
+
+/* Test temporary equivalences: identical condition.  */
+
+void h (int a, int b, int x, int y)
+{
+  int c = y;
+  if (a != 0)
+    c = x;
+  while (b < 1000)
+    {
+      if (a != 0)
+	{
+	  if (c > x)
+	    /* Unreachable.  */
+	    foo ();
+	}
+      else
+	bas ();
+      b++;
+    }
+}
+
+/* Test temporary equivalences: derived condition.  */
+
+void k (int a, int b, int x, int y)
+{
+  int c = y;
+  if (a != 0)
+    c = x;
+  while (b < 1000)
+    {
+      if (a > 0)
+	/* All paths reaching "a>0 is true" come from "a!=0 is true".  */
+	{
+	  if (c > x)
+	    /* Unreachable.  */
+	    foo ();
+	}
+      else
+	bag ();
+      b++;
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "foo" "fre1" } } */
+/* { dg-final { scan-tree-dump "bag" "fre1" } } */
+/* { dg-final { scan-tree-dump "bas" "fre1" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
index 1d7ea4e8ffb..87df31d4f0d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1 -fdisable-tree-ethread -fdisable-tree-thread1" } */
+/* { dg-options "-O2 -fdisable-tree-evrp -fno-tree-fre -fdump-tree-vrp1 -fdisable-tree-ethread -fdisable-tree-thread1" } */
 
 struct A
 {
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 893b1d0ddaa..8e3cf28b937 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -353,6 +353,8 @@ static vn_reference_t last_inserted_ref;
 static vn_phi_t last_inserted_phi;
 static vn_nary_op_t last_inserted_nary;
 static vn_ssa_aux_t last_pushed_avail;
+static vn_ssa_aux_t last_pushed_nary_ref;
+static nary_ref* nary_ref_freelist;
 
 /* Valid hashtables storing information we have proven to be
    correct.  */
@@ -491,9 +493,9 @@ VN_INFO (tree name)
 	    nary->predicated_values = 0;
 	    nary->u.result = boolean_true_node;
 	    vn_nary_op_insert_into (nary, valid_info->nary, true);
-	    gcc_assert (nary->unwind_to == NULL);
 	    /* Also do not link it into the undo chain.  */
 	    last_inserted_nary = nary->next;
+	    /* There could be a predicate already.  */
 	    nary->next = (vn_nary_op_t)(void *)-1;
 	    nary = alloc_vn_nary_op_noinit (2, &vn_tables_insert_obstack);
 	    init_vn_nary_op_from_pieces (nary, 2, EQ_EXPR,
@@ -501,7 +503,6 @@ VN_INFO (tree name)
 	    nary->predicated_values = 0;
 	    nary->u.result = boolean_false_node;
 	    vn_nary_op_insert_into (nary, valid_info->nary, true);
-	    gcc_assert (nary->unwind_to == NULL);
 	    last_inserted_nary = nary->next;
 	    nary->next = (vn_nary_op_t)(void *)-1;
 	    if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3847,6 +3848,114 @@ vn_reference_insert_pieces (tree vuse, alias_set_type set,
   return vr1;
 }
 
+static bool
+dominated_by_p_w_unex (basic_block bb1, basic_block bb2, bool);
+
+static tree
+vn_nary_op_get_predicated_value (vn_nary_op_t vno, basic_block bb)
+{
+  if (! vno->predicated_values)
+    return vno->u.result;
+  for (vn_pval *val = vno->u.values; val; val = val->next)
+    for (unsigned i = 0; i < val->n; ++i)
+      {
+	/* Do not handle backedge executability optimistically since
+	 when figuring out whether to iterate we do not consider
+	 changed predication.  */
+	if (dominated_by_p_w_unex (
+	      bb, BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]),
+	      false))
+	return val->result;
+      }
+  return NULL_TREE;
+}
+
+/* Whether a predicated value VAL is valid at BB.  */
+
+static inline bool
+is_vn_valid_at_bb (vn_pval *val, basic_block bb)
+{
+  for (unsigned i = 0; i < val->n; ++i)
+    {
+      basic_block val_bb
+	= BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]);
+      if (dominated_by_p_w_unex (bb, val_bb, false))
+	return true;
+    }
+  return false;
+}
+
+/* Lookup the variable that represents a set of equivalences of NAME at
+   basic-block BB.  This variable is called the "equivalence head" of NAME.  */
+
+static tree
+lookup_equiv_head (tree name, basic_block bb)
+{
+  if (TREE_CODE (name) != SSA_NAME)
+    return name;
+  tree result = VN_INFO (name)->valnum;
+  if (result == VN_TOP)
+    /* VN_TOP is not allowed, otherwise it will be too optimistic for
+       non-iterating mode.  */
+    return name;
+  vn_nary_op_t vnresult;
+  /* As we insert new value at front in vn_nary_op_insert_into, we don't need
+     search iteratively for the most general equiv-head.  */
+  vn_nary_op_lookup_pieces (1, NOP_EXPR, TREE_TYPE (result), &result,
+			    &vnresult);
+  if (vnresult && vnresult->predicated_values)
+    {
+      tree eq = vn_nary_op_get_predicated_value (vnresult, bb);
+      if (eq)
+	return eq;
+    }
+  return result;
+}
+
+/* Lookup the "equivalence head"s of OPS at BB, and store them in TO.  N is the
+   size of OPS.  */
+
+static void
+lookup_equiv_heads (unsigned n, tree *ops, tree *to, basic_block bb)
+{
+  for (unsigned i = 0; i < n; i++)
+    {
+      tree equiv_head = lookup_equiv_head (ops[i], bb);
+      if (dump_file && (dump_flags & TDF_DETAILS) && equiv_head != ops[i])
+	{
+	  fprintf (dump_file, "Use equiv-head ");
+	  print_generic_expr (dump_file, equiv_head, TDF_NONE);
+	  fprintf (dump_file, " for ");
+	  print_generic_expr (dump_file, ops[i], TDF_NONE);
+	  fprintf (dump_file, "\n");
+	}
+      to[i] = equiv_head;
+    }
+}
+
+/* Returns whether edge PRED_E is currently being tracked.  */
+
+static inline bool
+vn_tracking_edge (edge pred_e)
+{
+  if (!single_pred_p (pred_e->dest))
+    {
+      /* Never record for backedges.  */
+      if (pred_e->flags & EDGE_DFS_BACK)
+	return false;
+      edge_iterator ei;
+      edge e;
+      int cnt = 0;
+      /* Ignore backedges.  */
+      FOR_EACH_EDGE (e, ei, pred_e->dest->preds)
+	if (!dominated_by_p (CDI_DOMINATORS, e->src, e->dest))
+	  cnt++;
+      if (cnt != 1)
+	return false;
+    }
+  return true;
+}
+
 /* Compute and return the hash value for nary operation VBO1.  */
 
 static hashval_t
@@ -3984,6 +4093,9 @@ init_vn_nary_op_from_stmt (vn_nary_op_t vno, gassign *stmt)
       for (i = 0; i < vno->length; ++i)
 	vno->op[i] = gimple_op (stmt, i + 1);
     }
+  /* Insert and lookup N-ary results by the operands' equivalence heads.  */
+  if (gimple_bb (stmt))
+    lookup_equiv_heads (vno->length, vno->op, vno->op, gimple_bb (stmt));
 }
 
 /* Compute the hashcode for VNO and look for it in the hash table;
@@ -4126,12 +4238,10 @@ vn_nary_op_insert_into (vn_nary_op_t vno, vn_nary_op_table_type *table,
 	    = BASIC_BLOCK_FOR_FN (cfun, vno->u.values->valid_dominated_by_p[0]);
 	  vn_pval *nval = vno->u.values;
 	  vn_pval **next = &vno->u.values;
-	  bool found = false;
 	  for (vn_pval *val = (*slot)->u.values; val; val = val->next)
 	    {
 	      if (expressions_equal_p (val->result, nval->result))
 		{
-		  found = true;
 		  for (unsigned i = 0; i < val->n; ++i)
 		    {
 		      basic_block val_bb
@@ -4145,17 +4255,15 @@ vn_nary_op_insert_into (vn_nary_op_t vno, vn_nary_op_table_type *table,
 			gcc_unreachable ();
 		    }
 		  /* Append value.  */
-		  *next = (vn_pval *) obstack_alloc (&vn_tables_obstack,
-						     sizeof (vn_pval)
-						     + val->n * sizeof (int));
-		  (*next)->next = NULL;
-		  (*next)->result = val->result;
-		  (*next)->n = val->n + 1;
-		  memcpy ((*next)->valid_dominated_by_p,
-			  val->valid_dominated_by_p,
+		  nval = (vn_pval *) obstack_alloc (&vn_tables_obstack,
+						    sizeof (vn_pval)
+						      + val->n * sizeof (int));
+		  nval->next = NULL;
+		  nval->result = val->result;
+		  nval->n = val->n + 1;
+		  memcpy (nval->valid_dominated_by_p, val->valid_dominated_by_p,
 			  val->n * sizeof (int));
-		  (*next)->valid_dominated_by_p[val->n] = vno_bb->index;
-		  next = &(*next)->next;
+		  nval->valid_dominated_by_p[val->n] = vno_bb->index;
 		  if (dump_file && (dump_flags & TDF_DETAILS))
 		    fprintf (dump_file, "Appending predicate to value.\n");
 		  continue;
@@ -4168,8 +4276,10 @@ vn_nary_op_insert_into (vn_nary_op_t vno, vn_nary_op_table_type *table,
 	      (*next)->next = NULL;
 	      next = &(*next)->next;
 	    }
-	  if (!found)
-	    *next = nval;
+	  /* Put new result at front, so lookup_equiv_head can find the last
+	     inserted (most general) result.  */
+	  nval->next = vno->u.values;
+	  vno->u.values = nval;
 
 	  *slot = vno;
 	  vno->next = last_inserted_nary;
@@ -4214,6 +4324,86 @@ vn_nary_op_insert_pieces (unsigned int length, enum tree_code code,
   return vn_nary_op_insert_into (vno1, valid_info->nary, true);
 }
 
+/* Similar with vn_nary_op_insert_pieces, but insert as predicated values.  */
+
+static vn_nary_op_t
+vn_nary_op_insert_pieces_predicated_1 (unsigned int length, enum tree_code code,
+				       tree type, tree *ops, tree result,
+				       unsigned int value_id, basic_block bb)
+{
+  vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
+  init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
+  vno1->predicated_values = 1;
+  vno1->u.values
+    = (vn_pval *) obstack_alloc (&vn_tables_obstack, sizeof (vn_pval));
+  vno1->u.values->next = NULL;
+  vno1->u.values->result = result;
+  vno1->u.values->n = 1;
+  vno1->u.values->valid_dominated_by_p[0] = bb->index;
+  return vn_nary_op_insert_into (vno1, valid_info->nary, true);
+}
+
+/* Fold N-ary expression of equiv-heads, return NULL_TREE if not folded.  */
+
+static tree
+fold_const_from_equiv_heads (unsigned length, tree_code opcode, tree *op,
+			     tree type)
+{
+  tree result = NULL_TREE;
+  /* We're using equiv-heads, i.e. the most general operands, shortcut for
+     situations that can fold.  */
+  switch (length)
+    {
+    case 1:
+      if (CONSTANT_CLASS_P (op[0]))
+	result
+	  = gimple_simplify (opcode, type, op[0], NULL, no_follow_ssa_edges);
+      break;
+    case 2:
+      if ((CONSTANT_CLASS_P (op[0]) && CONSTANT_CLASS_P (op[1]))
+	  || op[0] == op[1])
+	result = gimple_simplify (opcode, type, op[0], op[1], NULL,
+				  no_follow_ssa_edges);
+      break;
+    case 3:
+      if (CONSTANT_CLASS_P (op[0]) && CONSTANT_CLASS_P (op[1])
+	  && CONSTANT_CLASS_P (op[2]))
+	result = gimple_simplify (opcode, type, op[0], op[1], op[2], NULL,
+				  no_follow_ssa_edges);
+      break;
+    default:
+      break;
+    }
+  if (result && !useless_type_conversion_p (type, TREE_TYPE (result)))
+    result = fold_convert (type, result);
+  return result;
+}
+
+/* Insert a back-reference to vn_nary_op_t in vn_ssa_aux.  */
+
+static inline void
+push_new_nary_ref (vn_nary_op_t slot, vn_ssa_aux_t info)
+{
+  nary_ref *ref;
+  if (nary_ref_freelist)
+    {
+      ref = nary_ref_freelist;
+      nary_ref_freelist = nary_ref_freelist->next;
+    }
+  else
+    ref = XOBNEW (&vn_ssa_aux_obstack, nary_ref);
+  ref->next = info->ref;
+  ref->next_undo = last_pushed_nary_ref;
+  ref->slot = slot;
+  last_pushed_nary_ref = info;
+  info->ref = ref;
+  ++info->num_nary_ref;
+  gcc_assert (info->num_nary_ref > 0);
+}
+
+static vn_nary_op_t
+val_equiv_insert (tree op1, tree op2, edge e);
+
 static vn_nary_op_t
 vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
 				     tree type, tree *ops,
@@ -4221,21 +4411,15 @@ vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
 				     edge pred_e)
 {
   /* ???  Currently tracking BBs.  */
-  if (! single_pred_p (pred_e->dest))
-    {
-      /* Never record for backedges.  */
-      if (pred_e->flags & EDGE_DFS_BACK)
-	return NULL;
-      edge_iterator ei;
-      edge e;
-      int cnt = 0;
-      /* Ignore backedges.  */
-      FOR_EACH_EDGE (e, ei, pred_e->dest->preds)
-	if (! dominated_by_p (CDI_DOMINATORS, e->src, e->dest))
-	  cnt++;
-      if (cnt != 1)
-	return NULL;
-    }
+  if (!vn_tracking_edge (pred_e))
+    return NULL;
+  /* Equivalence can represent this.  */
+  if (code == EQ_EXPR && result == boolean_true_node)
+    return val_equiv_insert (ops[0], ops[1], pred_e);
+  /* Leave this for inverted condition.  */
+  if (code == NE_EXPR && result == boolean_false_node)
+    return NULL;
+
   if (dump_file && (dump_flags & TDF_DETAILS)
       /* ???  Fix dumping, but currently we only get comparisons.  */
       && TREE_CODE_CLASS (code) == tcc_comparison)
@@ -4248,36 +4432,252 @@ vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
       fprintf (dump_file, " == %s\n",
 	       integer_zerop (result) ? "false" : "true");
     }
-  vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
-  init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
-  vno1->predicated_values = 1;
-  vno1->u.values = (vn_pval *) obstack_alloc (&vn_tables_obstack,
-					      sizeof (vn_pval));
-  vno1->u.values->next = NULL;
-  vno1->u.values->result = result;
-  vno1->u.values->n = 1;
-  vno1->u.values->valid_dominated_by_p[0] = pred_e->dest->index;
-  return vn_nary_op_insert_into (vno1, valid_info->nary, true);
+  for (unsigned i = 0; i < length; i++)
+    gcc_checking_assert (lookup_equiv_head (ops[i], pred_e->dest) == ops[i]);
+
+  /* Skip if the result is known. (Caused by re-inserting predicates.)  */
+  tree simplified = fold_const_from_equiv_heads (length, code, ops, type);
+  if (simplified)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Result is known: ");
+	  print_generic_expr (dump_file, simplified, TDF_NONE);
+	  fprintf (dump_file, ", skipped.\n");
+	}
+      return NULL;
+    }
+
+  vn_nary_op_t val
+    = vn_nary_op_insert_pieces_predicated_1 (length, code, type, ops,
+					     result, value_id, pred_e->dest);
+  /* Insert back-refs (operand->slot) to vn_ssa_aux structure.  */
+  for (unsigned i = 0; i < length; i++)
+    if (TREE_CODE (ops[i]) == SSA_NAME)
+      {
+	vn_ssa_aux_t info = VN_INFO (ops[i]);
+	if (info->valnum == ops[i])
+	  push_new_nary_ref (val, info);
+      }
+  return val;
 }
 
-static bool
-dominated_by_p_w_unex (basic_block bb1, basic_block bb2, bool);
+/* Record the equivalence of OP1 and OP2 on edge E.  */
 
-static tree
-vn_nary_op_get_predicated_value (vn_nary_op_t vno, basic_block bb)
+static vn_nary_op_t
+val_equiv_insert (tree op1, tree op2, edge e)
 {
-  if (! vno->predicated_values)
-    return vno->u.result;
+  /* Record by equivalence heads.  */
+  basic_block bb = e->dest;
+  tree lhs = lookup_equiv_head (op1, bb);
+  tree rhs = lookup_equiv_head (op2, bb);
+  if (lhs == rhs)
+    return NULL;
+  /* Set RHS to be new equiv-head for LHS.  LHS must be SSA_NAME.  */
+  if (TREE_CODE (lhs) != SSA_NAME)
+    {
+      if (TREE_CODE (rhs) == SSA_NAME)
+	std::swap (lhs, rhs);
+      else
+	return NULL;
+    }
+  vn_ssa_aux_t rhs_info = NULL, lhs_info = VN_INFO (lhs);
+
+  /* Choose the hand-side with more recorded n-ary expressions as new
+     equivalence head, to make fewer re-insertions.  */
+  if (TREE_CODE (rhs) == SSA_NAME)
+    {
+      rhs_info = VN_INFO (rhs);
+      if (rhs_info->num_nary_ref < lhs_info->num_nary_ref)
+	{
+	  std::swap (lhs, rhs);
+	  std::swap (rhs_info, lhs_info);
+	}
+    }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Recording equivalence of ");
+      print_generic_expr (dump_file, lhs);
+      fprintf (dump_file, " and ");
+      print_generic_expr (dump_file, rhs);
+      fprintf (dump_file, " at BB%d\n", bb->index);
+    }
+  /* Record equivalence as unary NOP_EXPR.  */
+  vn_nary_op_t val
+    = vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR, TREE_TYPE (lhs),
+					     &lhs, rhs, 0, bb);
+  /* Back-ref for equivalences is result->slot rather than op->slot, so that
+     when new equiv-head is appended, we can update them.  */
+  if (TREE_CODE (rhs) == SSA_NAME)
+    push_new_nary_ref (val, rhs_info);
+
+  /* Iterate on LHS's references, first update equiv-heads, then re-insert
+     former predicates, so that the result will be latest.  */
+  auto_vec<vn_nary_op_t> predicates;
+  auto_vec<tree> results;
+  vn_nary_op_t slot;
+  tree old_head = lhs_info->valnum;
+  nary_ref *ref = lhs_info->ref;
+  for (; ref; ref = ref->next)
+    {
+      slot = ref->slot;
+      if (!slot->predicated_values)
+	continue;
+      /* Update recorded equivalences with new equiv-head.  */
+      if (slot->opcode == NOP_EXPR && slot->length == 1)
+	{
+	  if (slot->op[0] == rhs)
+	    continue;
+	  for (vn_pval *pval = slot->u.values; pval; pval = pval->next)
+	    if (pval->result == old_head && is_vn_valid_at_bb (pval, bb))
+	      {
+		if (dump_file && (dump_flags & TDF_DETAILS))
+		  {
+		    fprintf (dump_file, "Updating equiv-head of ");
+		    print_generic_expr (dump_file, slot->op[0], TDF_SLIM);
+		    fprintf (dump_file, " to ");
+		    print_generic_expr (dump_file, rhs, TDF_SLIM);
+		    fprintf (dump_file, " at BB%d\n", bb->index);
+		  }
+		vn_nary_op_t new_val
+		  = vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR,
+							   slot->type, slot->op,
+							   rhs, 0, bb);
+		/* Push back-ref (result->slot) for new equivalence.  */
+		if (TREE_CODE (rhs) == SSA_NAME)
+		  push_new_nary_ref (new_val, rhs_info);
+	      }
+	}
+      else
+	/* Predicated values are collected, to be re-inserted later.  */
+	for (vn_pval *pval = slot->u.values; pval; pval = pval->next)
+	  {
+	    if (!is_vn_valid_at_bb (pval, bb))
+	      continue;
+	    predicates.safe_push (slot);
+	    results.safe_push (pval->result);
+	  }
+    }
+  unsigned i;
+  if (predicates.length () > 0 && dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Re-inserting %d predicates...\n",
+	     predicates.length ());
+  tree new_ops[3];
+  FOR_EACH_VEC_ELT (predicates, i, slot)
+    {
+      lookup_equiv_heads (slot->length, slot->op, new_ops, e->dest);
+      vn_nary_op_insert_pieces_predicated (slot->length, slot->opcode,
+					   slot->type, new_ops, results[i],
+					   slot->value_id, e);
+    }
+  return val;
+}
+
+/* Record temporary equivalences generated by PHI nodes, and are valid when
+   BB's outgoing edge E is taken.  VNO contains the interested expression.  If
+   INVERSE is true, rule out VNO's results that are different with RESULT;
+   otherwise rule out RESULT itself.  */
+
+static void
+record_equiv_from_prev_phi_1 (vn_nary_op_t vno, edge e, basic_block bb,
+			      tree result, bool inverse)
+{
+  if (!(vno && vno->predicated_values))
+    return;
+
+  /* Process predicated values in VNO, looking for a conflicting result, and
+     the location's single_pred dominates BB.  */
   for (vn_pval *val = vno->u.values; val; val = val->next)
-    for (unsigned i = 0; i < val->n; ++i)
-      /* Do not handle backedge executability optimistically since
-	 when figuring out whether to iterate we do not consider
-	 changed predication.  */
-      if (dominated_by_p_w_unex
-	    (bb, BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]),
-	     false))
-	return val->result;
-  return NULL_TREE;
+    {
+      if (inverse ? val->result == result : val->result != result)
+	continue;
+      /* Found a result to rule out, process each location.  */
+      for (unsigned i = 0; i < val->n; ++i)
+	{
+	  basic_block bb1
+	    = BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]);
+	  if (!single_pred_p (bb1))
+	    continue;
+	  basic_block p = single_pred (bb1);
+	  if (!dominated_by_p_w_unex (bb, p, false))
+	    continue;
+	  gcc_assert (EDGE_COUNT (p->succs) == 2);
+	  /* Iterate on possible phi_bbs (can be BB itself). Considering
+	     non-executable edges, there can be multiple phi_bbs.  */
+	  for (basic_block phi_bb : get_dominated_by (CDI_DOMINATORS, p))
+	    {
+	      if (!dominated_by_p_w_unex (bb, phi_bb, false))
+		continue;
+	      /* Process PHIs, rule out incoming edges that definitely come
+		 from p->bb1.  If there's a single PHI argument X left, record
+		 the equivalence of X and PHI result at edge e.  */
+	      for (gphi_iterator gsi = gsi_start_nonvirtual_phis (phi_bb);
+		   !gsi_end_p (gsi); gsi_next_nonvirtual_phi (&gsi))
+		{
+		  gphi *phi = gsi.phi ();
+		  tree equiv = NULL_TREE;
+		  for (unsigned j = 0; j < gimple_phi_num_args (phi); ++j)
+		    {
+		      edge in = gimple_phi_arg_edge (phi, j);
+		      if (in->src == bb1
+			  || dominated_by_p_w_unex (in->src, bb1, false))
+			/* Definitely comes from p->bb1.  */
+			continue;
+		      tree arg = vn_valueize (gimple_phi_arg_def (phi, j));
+		      if (equiv && equiv != arg)
+			{
+			  equiv = NULL_TREE;
+			  break;
+			}
+		      equiv = arg;
+		    }
+		  if (equiv)
+		    val_equiv_insert (equiv, PHI_RESULT (phi), e);
+		}
+	    }
+	}
+    }
+}
+
+/* Record temporary equivalences generated by PHI nodes, and are valid when
+   BB's outgoing edge E is taken.  CODE, OPS and RESULT are the condition and
+   result for E to be taken.  VNO contains the condition's results recorded in
+   nary map.
+
+   The idea is to find a conflicting predicate at some location L, where L's
+   predecessor P dominates BB, so we know path P->L will not be taken if E is
+   taken.  If by ruling out P->L can generate some equivalences, record them at
+   E->dest.  */
+
+static void
+record_equiv_from_prev_phi (vn_nary_op_t vno, edge e, basic_block bb,
+			    tree_code code, tree *ops, tree result)
+{
+  if (!vn_tracking_edge (e))
+    return;
+  record_equiv_from_prev_phi_1 (vno, e, bb, result, true);
+
+  /* For predicates like "a==b is false" or "a!=b is true", equivalences are
+     conflicting results, i.e. "NO_OP (a) is b" or "NO_OP (b) is a".  */
+  if ((result == boolean_true_node
+       && (code == NE_EXPR || code == GT_EXPR || code == LT_EXPR))
+      || (result == boolean_false_node && code == EQ_EXPR))
+    {
+      vn_nary_op_t vnresult;
+      if (TREE_CODE (ops[0]) == SSA_NAME)
+	{
+	  vn_nary_op_lookup_pieces (1, NOP_EXPR, TREE_TYPE (ops[0]), ops,
+				    &vnresult);
+	  record_equiv_from_prev_phi_1 (vnresult, e, bb, ops[1], false);
+	}
+      if (TREE_CODE (ops[1]) == SSA_NAME)
+	{
+	  vn_nary_op_lookup_pieces (1, NOP_EXPR, TREE_TYPE (ops[1]), &ops[1],
+				    &vnresult);
+	  record_equiv_from_prev_phi_1 (vnresult, e, bb, ops[0], false);
+	}
+    }
 }
 
 /* Insert the rhs of STMT into the current hash table with a value number of
@@ -4896,8 +5296,25 @@ static bool
 visit_nary_op (tree lhs, gassign *stmt)
 {
   vn_nary_op_t vnresult;
-  tree result = vn_nary_op_lookup_stmt (stmt, &vnresult);
-  if (! result && vnresult)
+  unsigned length = vn_nary_length_from_stmt (stmt);
+  vn_nary_op_t vno
+    = XALLOCAVAR (struct vn_nary_op_s, sizeof_vn_nary_op (length));
+  init_vn_nary_op_from_stmt (vno, stmt);
+  tree result = NULL_TREE;
+  /* Try to get a simplified result.  */
+  /* Do not simplify variable used in PHI at loop exit, or
+     simplify_peeled_chrec/constant_after_peeling may miss the loop.  */
+  gimple *use_stmt;
+  use_operand_p use_p;
+  if (!(single_imm_use (lhs, &use_p, &use_stmt)
+	&& gimple_code (use_stmt) == GIMPLE_PHI
+	&& single_succ_p (gimple_bb (use_stmt))
+	&& (single_succ_edge (gimple_bb (use_stmt))->flags & EDGE_DFS_BACK)))
+    result = fold_const_from_equiv_heads (vno->length, vno->opcode, vno->op,
+					  vno->type);
+  if (!result)
+    result = vn_nary_op_lookup_1 (vno, &vnresult);
+  if (!result && vnresult)
     result = vn_nary_op_get_predicated_value (vnresult, gimple_bb (stmt));
   if (result)
     return set_ssa_val_to (lhs, result);
@@ -7170,12 +7587,8 @@ insert_related_predicates_on_edge (enum tree_code code, tree *ops, edge pred_e)
 					   ops, boolean_false_node, 0, pred_e);
       break;
     case EQ_EXPR:
-      /* a == b -> ! a {<,>} b */
-      vn_nary_op_insert_pieces_predicated (2, LT_EXPR, boolean_type_node,
-					   ops, boolean_false_node, 0, pred_e);
-      vn_nary_op_insert_pieces_predicated (2, GT_EXPR, boolean_type_node,
-					   ops, boolean_false_node, 0, pred_e);
-      break;
+      /* No need to insert derived predicates for '==', as they can be computed
+	 by equiv-heads & fold_const_from_equiv_heads.  */
     case LE_EXPR:
     case GE_EXPR:
     case NE_EXPR:
@@ -7342,24 +7755,31 @@ process_bb (rpo_elim &avail, basic_block bb,
       switch (gimple_code (last))
 	{
 	case GIMPLE_SWITCH:
+	  /* TODO: utilize temporary equivalences.  */
 	  e = find_taken_edge (bb, vn_valueize (gimple_switch_index
 						(as_a <gswitch *> (last))));
 	  break;
 	case GIMPLE_COND:
 	  {
-	    tree lhs = vn_valueize (gimple_cond_lhs (last));
-	    tree rhs = vn_valueize (gimple_cond_rhs (last));
-	    tree val = gimple_simplify (gimple_cond_code (last),
-					boolean_type_node, lhs, rhs,
-					NULL, vn_valueize);
+	    tree lhs = lookup_equiv_head (gimple_cond_lhs (last), bb);
+	    tree rhs = lookup_equiv_head (gimple_cond_rhs (last), bb);
+	    tree ops[2];
+	    ops[0] = lhs;
+	    ops[1] = rhs;
+	    tree val = fold_const_from_equiv_heads (2, gimple_cond_code (last),
+						    ops, boolean_type_node);
+	    if (val && (dump_flags & TDF_DETAILS))
+	      {
+		fprintf (dump_file, "Got simplified result ");
+		print_generic_expr (dump_file, val, TDF_NONE);
+		fprintf (dump_file, " for ");
+		print_gimple_stmt (dump_file, last, TDF_SLIM);
+	      }
+	    vn_nary_op_t vnresult = NULL;
 	    /* If the condition didn't simplfy see if we have recorded
 	       an expression from sofar taken edges.  */
 	    if (! val || TREE_CODE (val) != INTEGER_CST)
 	      {
-		vn_nary_op_t vnresult;
-		tree ops[2];
-		ops[0] = lhs;
-		ops[1] = rhs;
 		val = vn_nary_op_lookup_pieces (2, gimple_cond_code (last),
 						boolean_type_node, ops,
 						&vnresult);
@@ -7427,6 +7847,15 @@ process_bb (rpo_elim &avail, basic_block bb,
 		    if (false_e)
 		      insert_related_predicates_on_edge (icode, ops, false_e);
 		  }
+
+		/* Try to record equivalences generated by previous PHI nodes
+		    on current true & false edges.  */
+		if (true_e)
+		  record_equiv_from_prev_phi (vnresult, true_e, bb, code, ops,
+					      boolean_true_node);
+		if (false_e)
+		  record_equiv_from_prev_phi (vnresult, false_e, bb, code, ops,
+					      boolean_false_node);
 	      }
 	    break;
 	  }
@@ -7549,6 +7978,7 @@ struct unwind_state
   vn_phi_t phi_top;
   vn_nary_op_t nary_top;
   vn_avail *avail_top;
+  nary_ref *nary_ref_top;
 };
 
 /* Unwind the RPO VN state for iteration.  */
@@ -7598,6 +8028,17 @@ do_unwind (unwind_state *to, rpo_elim &avail)
       av->next = avail.m_avail_freelist;
       avail.m_avail_freelist = av;
     }
+  /* Drain nary_refs to freelist, to reuse the memory.  */
+  for (; last_pushed_nary_ref && last_pushed_nary_ref->ref != to->nary_ref_top;)
+    {
+      vn_ssa_aux_t val = last_pushed_nary_ref;
+      nary_ref *ref = val->ref;
+      val->ref = ref->next;
+      last_pushed_nary_ref = ref->next_undo;
+      ref->next = nary_ref_freelist;
+      nary_ref_freelist = ref;
+      --val->num_nary_ref;
+    }
 }
 
 /* Do VN on a SEME region specified by ENTRY and EXIT_BBS in FN.
@@ -7711,6 +8152,8 @@ do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
   last_inserted_phi = NULL;
   last_inserted_nary = NULL;
   last_pushed_avail = NULL;
+  last_pushed_nary_ref = NULL;
+  nary_ref_freelist = NULL;
 
   vn_valueize = rpo_vn_valueize;
 
@@ -7805,6 +8248,8 @@ do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
 	    rpo_state[idx].nary_top = last_inserted_nary;
 	    rpo_state[idx].avail_top
 	      = last_pushed_avail ? last_pushed_avail->avail : NULL;
+	    rpo_state[idx].nary_ref_top
+	      = last_pushed_nary_ref ? last_pushed_nary_ref->ref : NULL;
 	  }
 
 	if (!(bb->flags & BB_EXECUTABLE))
diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h
index 8a1b649c726..65240c9bf25 100644
--- a/gcc/tree-ssa-sccvn.h
+++ b/gcc/tree-ssa-sccvn.h
@@ -217,6 +217,16 @@ struct vn_avail
   struct vn_ssa_aux *next_undo;
 };
 
+/* In vn_ssa_aux structure, hold a lists of references to the nary map entries,
+   so that when recording new equivalence to the value number, we can re-insert
+   expressions' results based on this.  */
+struct nary_ref
+{
+  nary_ref *next;
+  vn_nary_op_t slot;
+  struct vn_ssa_aux *next_undo;
+};
+
 typedef struct vn_ssa_aux
 {
   /* SSA name this vn_ssa_aux is associated with in the lattice.  */
@@ -230,9 +240,15 @@ typedef struct vn_ssa_aux
      for SSA names also serving as values (NAME == VALNUM).  */
   vn_avail *avail;
 
+  /* References to slots in the nary map, with VALNUM as operand.  */
+  nary_ref *ref;
+
   /* Unique identifier that all expressions with the same value have. */
   unsigned int value_id;
 
+  /* nary_ref count.  */
+  unsigned short num_nary_ref;
+
   /* Whether the SSA_NAME has been processed at least once.  */
   unsigned visited : 1;
 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
  2021-10-24 19:03   ` Di Zhao OS
@ 2021-11-10  9:43     ` Richard Biener
  2021-11-15 17:24       ` Di Zhao OS
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Biener @ 2021-11-10  9:43 UTC (permalink / raw)
  To: Di Zhao OS; +Cc: gcc-patches, Andrew MacLeod

On Sun, Oct 24, 2021 at 9:03 PM Di Zhao OS
<dizhao@os.amperecomputing.com> wrote:
>
> Hi,
>
> Attached is a new version of the patch, mainly for improving performance
> and simplifying the code.

The patch doesn't apply anymore, can you update it please?

I see the new ssa-fre-101.c test already passing without the patch.
Likewise ssa-fre-100.c and ssa-fre-102.c would PASS if you scan
the pass dump after fre1 which is evrp so it seems that evrp already
handles the equivalences (likely with the relation oracle) now?
I'm sure there are second order effects when eliminating conditions
in FRE but did you re-evaluate what made you improve VN to see
if the cases are handled as expected now without this change?

I will still look at and consider the change btw, but given the EVRP
improvements I'm also considering to remove the predication
support from VN alltogether.  At least in the non-iterating mode
it should be trivially easy to use rangers relation oracle to simplify
predicates.  For the iterating mode it might not be 100% effective
since I'm not sure we can make it use the current SSA values and
how it would behave with those eventually changing to worse.

Andrew, how would one ask the relation oracle to simplify a
condition?  Do I have to do any bookkeeping to register
predicates on edges for it?

Thanks,
Richard.

> First, regarding the comments:
>
> > -----Original Message-----
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: Friday, October 1, 2021 9:00 PM
> > To: Di Zhao OS <dizhao@os.amperecomputing.com>
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH v2] tree-optimization/101186 - extend FRE with
> > "equivalence map" for condition prediction
> >
> > On Thu, Sep 16, 2021 at 8:13 PM Di Zhao OS
> > <dizhao@os.amperecomputing.com> wrote:
> > >
> > > Sorry about updating on this after so long. It took me much time to work out a
> > > new plan and pass the tests.
> > >
> > > The new idea is to use one variable to represent a set of equal variables at
> > > some basic-block. This variable is called a "equivalence head" or "equiv-head"
> > > in the code. (There's no-longer a "equivalence map".)
> > >
> > > - Initially an SSA_NAME's "equivalence head" is its value number. Temporary
> > >   equivalence heads are recorded as unary NOP_EXPR results in the vn_nary_op_t
> > >   map. Besides, when inserting into vn_nary_op_t map, make the new result at
> > >   front of the vn_pval list, so that when searching for a variable's
> > >   equivalence head, the first result represents the largest equivalence set at
> > >   current location.
> > > - In vn_ssa_aux_t, maintain a list of references to valid_info->nary entry.
> > >   For recorded equivalences, the reference is result->entry; for normal N-ary
> > >   operations, the reference is operand->entry.
> > > - When recording equivalences, if one side A is constant or has more refs, make
> > >   it the new equivalence head of the other side B. Traverse B's ref-list, if a
> > >   variable C's previous equiv-head is B, update to A. And re-insert B's n-ary
> > >   operations by replacing B with A.
> > > - When inserting and looking for the results of n-ary operations, insert and
> > >   lookup by the operands' equiv-heads.
> > > ...
> > >
> > > Thanks,
> > > Di Zhao
> > >
> > > --------
> > > Extend FRE with temporary equivalences.
> >
> > Comments on the patch:
> >
> > +  /* nary_ref count.  */
> > +  unsigned num_nary_ref;
> > +
> >
> > I think a unsigned short should be enough and that would nicely
> > pack after value_id together with the bitfield (maybe change that
> > to unsigned short :1 then).
>
> Changed num_nary_ref to unsigned short and moved after value_id.
>
> > @@ -7307,17 +7839,23 @@ process_bb (rpo_elim &avail, basic_block bb,
> >             tree val = gimple_simplify (gimple_cond_code (last),
> >                                         boolean_type_node, lhs, rhs,
> >                                         NULL, vn_valueize);
> > +           vn_nary_op_t vnresult = NULL;
> >             /* If the condition didn't simplfy see if we have recorded
> >                an expression from sofar taken edges.  */
> >             if (! val || TREE_CODE (val) != INTEGER_CST)
> >               {
> > -               vn_nary_op_t vnresult;
> >
> > looks like you don't need vnresult outside of the if()?
>
> vnresult is reused later to record equivalences generated by PHI nodes.
>
> > +/* Find predicated value of vn_nary_op by the operands' equivalences.  Return
> > + * NULL_TREE if no known result is found.  */
> > +
> > +static tree
> > +find_predicated_value_by_equivs (vn_nary_op_t vno, basic_block bb,
> > +                                vn_nary_op_t *vnresult)
> > +{
> > +  lookup_equiv_heads (vno->length, vno->op, vno->op, bb);
> > +  tree result
> > +    = simplify_nary_op (vno->length, vno->opcode, vno->op, vno->type);
> >
> > why is it necessary to simplify here?  It looks like the caller
> > already does this.
>
> In the new patch, changed the code a little to remove redundant calculation.
>
> > I wonder whether it's valid to always perform find_predicated_value_by_equivs
> > from inside vn_nary_op_get_predicated_value instead of bolting it to only
> > a single place?
>
> Removed function find_predicated_value_by_equivs and inlined the code.
>
> Because lookup_equiv_head uses vn_nary_op_get_predicated_value, so I left
> vn_nary_op_get_predicated_value unchanged. Instead, operands are set to
> equiv-heads in init_vn_nary_op_from_stmt. So altogether, predicates are always
> inserted and searched by equiv-heads.
>
> > +
> > +static vn_nary_op_t
> > +val_equiv_insert (tree op1, tree op2, edge e)
> > +{
> >
> > +  if (is_gimple_min_invariant (lhs))
> > +    std::swap (lhs, rhs);
> > +  if (is_gimple_min_invariant (lhs) || TREE_CODE (lhs) != SSA_NAME)
> > +    /* Possible if invoked from record_equiv_from_previous_cond.  */
> > +    return NULL;
> >
> > Better formulate all of the above in terms of only SSA_NAME checks since...
> >
> > +  /* Make the hand-side with more recorded n-ary expressions new
> > +   * equivalence-head, to make fewer re-insertions.  */
> > +  if (TREE_CODE (rhs) == SSA_NAME
> > +      && VN_INFO (rhs)->num_nary_ref < VN_INFO (lhs)->num_nary_ref)
> > +    std::swap (lhs, rhs);
> >
> > here LHS needs to be an SSA_NAME.
>
> Tried to fix this in the new patch.
>
> > +  /* Record equivalence as unary NOP_EXPR.  */
> > +  vn_nary_op_t val
> > +    = vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR, TREE_TYPE (lhs),
> > +                                            &lhs, rhs, 0, bb);
> >
> > so you are recording an equivalence of lhs to rhs as (NOP_EXPR)lhs
> > with value rhs?
> > Presumably you think that's good enough to not overlap with entries
> > for "real" IL?
>
> By checking related codes, it seems to me there's no overlap as long as
> equivalences are inserted and searched as predicated_values. It should be easy
> to replace this with other preferred opcode.
>
> > In particular why did you choose to _not_ use (2, EQ_EXPR, boolean_type_node,
> > &[lhs, rhs], boolean_true_node, ...) like I think what would be
> > inserted for by the
> > existing code in process_bb via vn_nary_op_insert_pieces_predicated?
> > It would probably felt more natural if the "new" code bolted on the case where
> > that is called with an equivalence?
>
> The "equivalence head"s actually implement Disjoint-sets. So by a hashtable
> search in the nary-map, the set of equivalences of a variable can be found.
> Also it's easy to find out whether two variables are equal. And the
> disjoint-sets are path-compressed by updating "equivalence head" and
> re-inserting predicates.
>
> In the new attached patch, predicates like "A==B is true" and "A!=B is false"
> are not inserted, as they can be computed from equivalences. Same with the
> predicates derived from them. This reduces insertions of predicates by a lot (
> counting the equivalences themselves and re-inserted predicates). Below are
> test results on SPEC intrate, comparing with trunk (f45610a4, Oct 19):
>
>               |more bb |more bb |more stmt|more stmt|less      |less
>               |removed |removed |removed  |removed  |predicates|predicates
>               |at fre1 |at fre1 |at fre1  |at fre1  |inserted  |inserted
> --------------------------------------------------------------------------
> 500.perlbench_r| 64    | 1.56%  | 102    | 0.18%   | 51928    | 53.44%
> 502.gcc_r      | 679   | 4.88%  | 290    | 0.21%   | 149968   | 65.5%
> 505.mcf_r      | 5     | 35.71% | 9      | 1.40%   | 349      | 27.48%
> 520.omnetpp    | 132   | 5.36%  | 48     | 0.14%   | 28192    | 58.38%
> 523.xalancbmk_r| 231   | 3.15%  | 308    | 0.35%   | 65131    | 58.95%
> 525.x264_r     | 4     | 1.36%  | 27     | 0.11%   | 10401    | 40.68%
> 531.deepsjeng_r| 1     | 3.45%  | 2      | 0.14%   | 1412     | 53.81%
> 541.leela_r    | 2     | 0.62%  | 5      | 0.06%   | 3874     | 48.56%
> 548.exchange2_r| 0     | 0.00%  | 3      | 0.04%   | 159      | 3.33%
> 557.xz_r       | 0     | 0.00%  | 3      | 0.07%   | 1894     | 52.39%
>
> Considering the copying and unwinding work saved, this patch could reduce
> compile time rather than increase it.
>
> > Btw, I didn't find any condition that prevents equivalences being used for
> > non-integer types?  Are you sure this is all valid for FP compares without
> > further restrictions?
>
> I think this approach follows sccvn's previous decisions on FP comparisons,
> for equivalences are inserted only when EQ_EXPRs were evaluated to be true.
> The patch is tested on regression tests and SPEC fprate cases.
>
> >
> > @@ -4224,36 +4404,345 @@ vn_nary_op_insert_pieces_predicated (unsigned
> > int length, enum tree_code code,
> >        fprintf (dump_file, " == %s\n",
> >                integer_zerop (result) ? "false" : "true");
> >      }
> > -  vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
> > -  init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
> > -  vno1->predicated_values = 1;
> > -  vno1->u.values = (vn_pval *) obstack_alloc (&vn_tables_obstack,
> > -                                             sizeof (vn_pval));
> > -  vno1->u.values->next = NULL;
> > -  vno1->u.values->result = result;
> > -  vno1->u.values->n = 1;
> > -  vno1->u.values->valid_dominated_by_p[0] = pred_e->dest->index;
> > -  return vn_nary_op_insert_into (vno1, valid_info->nary, true);
> > +  /* Insert by operands' equiv-heads.  */
> > +  tree new_ops[3];
> > +  lookup_equiv_heads (length, ops, new_ops, pred_e->dest);
> > +  /* Skip if the result is known.  */
> > +  tree simplified = simplify_nary_op (length, code, new_ops, type);
> > +  if (simplified)
> > +    {
> > +      if (dump_file && (dump_flags & TDF_DETAILS))
> > +       {
> > +         fprintf (dump_file, "Result is known: ");
> > +         print_generic_expr (dump_file, simplified, TDF_NONE);
> > +         fprintf (dump_file, ", skipped.\n");
> > +       }
> > +      return NULL;
> > +    }
> >
> > soo - how do the existing predicated expressions change to be handled here?
> > Why would they ever be simplified?  Should this simplification not better
> > happen
> > upthread when visiting the conditional?  Maybe that needs to use the equivs
> > already?
>
> This happens when updating variable A's equiv_head from B to C, and
> re-inserting former predicates of B with C. For example, "x < 0" is recorded,
> then x is assigned with new equiv_head -1 at some location, then -1 < 0 can
> simplify. It shouldn't happen when inserting new predicates.
>
> >
> > You re-use the predicated value list which I think is great, can you explain
> > how "latest" is meant in the following?
> >
> > @@ -4145,7 +4264,12 @@ vn_nary_op_insert_into (vn_nary_op_t vno,
> > vn_nary_op_table_type *table,
> >               next = &(*next)->next;
> >             }
> >           if (!found)
> > -           *next = nval;
> > +           {
> > +             /* Insert new value at the front, so that lookup_equiv_head can
> > +              * find the latest result.  */
> > +             nval->next = vno->u.values;
> > +             vno->u.values = nval;
> > +           }
>
> It meant the "most general" result. (I picked the wrong word. The description
> is changed in the new patch file. )
>
> There can be multiple valid results at the same time. As we traverse in RPO and
> update equiv-heads (i.e. unite disjoint-sets), the last inserted result
> represents the most complete set of equivalences of the variable.
>
> (In the last patch version, when "Appending predicate to value", the new result
> is not moved to the front. That violates the property of equiv-heads and some
> opportunities will be lost. Fixed this in the new patch file.)
>
> >
> > OK, I have to stop now but will try to understand and review further next week.
> > Sorry for the delays but the patch is quite complex to understand :/
> >
> > Richard.
> >
> > > 2021-09-16  Di Zhao  <dizhao@os.amperecomputing.com>
> > >
>
> Other changes:
> - Fixed some errs in comments.
> - On recording temporary equivalences generated by PHI nodes, altered the
>   logic, hope it can be a bit more clear.
>   1. When a conditional expression's true/false edge E is assessed to be
>      executable, search for conflicting predicates for E to be taken. If a
>      conflicting predicate is found at L, and L's single predecessor P
>      dominates current BB, then we know P->L cannot be taken if E is taken.
>   2. Process phi_bb (immediately dominated by P, and dominates BB), rule out
>      PHI arguments (x2) that can only come from P->L. If there's a single
>      PHI argument (x1) left, record the equivalence of x1 and PHI result x
>      at edge E.
>           -----
>           | P |
>           -----
>            |  \
>            |   -----
>            |   | L |  <== conflicting predicate's location
>            |   -----
>            |    /
>   -------------------------
>   | x = PHI<x1(P), x2(L)> | phi_bb
>   -------------------------
>              |
>             ....
>              |
>            -----
>            | BB | <== current bb
>            -----
>           /     \ edge E
>         ...     -------
>                 | dest |
>                 -------
>
> - Fixed a correctness problem when recording equivalences from PHI in previous
>   version. Also test case ssa-fre-95.c:k in previous version was incorrect,
>   fixed that.
>
> Thanks,
> Di Zhao
>
> ---
> Extend FRE with temporary equivalences.
>
> 2021-10-24  Di Zhao  <dizhao@os.amperecomputing.com>
>
> gcc/ChangeLog:
>         PR tree-optimization/101186
>         * tree-ssa-sccvn.c (VN_INFO): remove assertions (there could be a
>         predicate already).
>         (dominated_by_p_w_unex): Moved upward.
>         (vn_nary_op_get_predicated_value): Moved upward.
>         (is_vn_valid_at_bb): Check if vn_pval is valid at BB.
>         (lookup_equiv_head): Lookup the "equivalence head" of given node.
>         (lookup_equiv_heads): Lookup the "equivalence head"s of given nodes.
>         (vn_tracking_edge): Extracted utility function.
>         (init_vn_nary_op_from_stmt): Insert and lookup by "equivalence head"s.
>         (vn_nary_op_insert_into): Insert new value at the front.
>         (vn_nary_op_insert_pieces_predicated_1): Insert as predicated values
>         from pieces.
>         (fold_const_from_equiv_heads): Fold N-ary expression of equiv-heads.
>         (push_new_nary_ref): Insert a back-reference to vn_nary_op_t.
>         (val_equiv_insert): Record temporary equivalence.
>         (vn_nary_op_insert_pieces_predicated): Record equivalences instead of
>         some predicates; insert back-refs.
>         (record_equiv_from_prev_phi_1): Record temporary equivalences generated
>         by PHI nodes.
>         (record_equiv_from_prev_phi): Given an outgoing edge of a conditional
>         expression taken, record equivalences generated by PHI nodes.
>         (visit_nary_op): Add lookup previous results of N-ary operations by
>         equivalences.
>         (insert_related_predicates_on_edge): Some predicates can be computed
>         from equivalences, no need to insert them.
>         (process_bb): Add lookup predicated values by equivalenc~/tmp/time-frees.
>         (struct unwind_state): Unwind state of back-refs to vn_nary_op_t.
>         (do_unwind): Unwind the back-refs to vn_nary_op_t.
>         (do_rpo_vn): Update back-reference unwind state.
>         * tree-ssa-sccvn.h (struct nary_ref): hold a lists of references to the
>         nary map entries.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/pr68619-2.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-1.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-2.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-3.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-5.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-7.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-8.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-9.c: Disable fre.
>         * gcc.dg/tree-ssa/vrp03.c: Disable fre.
>         * gcc.dg/tree-ssa/ssa-fre-100.c: New test.
>         * gcc.dg/tree-ssa/ssa-fre-101.c: New test.
>         * gcc.dg/tree-ssa/ssa-fre-102.c: New test.
>         * gcc.dg/tree-ssa/ssa-pre-34.c: New test.
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
  2021-11-10  9:43     ` Richard Biener
@ 2021-11-15 17:24       ` Di Zhao OS
  2021-11-18 15:47         ` Andrew MacLeod
  2021-12-03  6:42         ` Di Zhao OS
  0 siblings, 2 replies; 7+ messages in thread
From: Di Zhao OS @ 2021-11-15 17:24 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 19590 bytes --]

Attached is the updated patch. Fixed some errors in testcases.

> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: Wednesday, November 10, 2021 5:44 PM
> To: Di Zhao OS <dizhao@os.amperecomputing.com>
> Cc: gcc-patches@gcc.gnu.org; Andrew MacLeod <amacleod@redhat.com>
> Subject: Re: [PATCH v2] tree-optimization/101186 - extend FRE with
> "equivalence map" for condition prediction
> 
> On Sun, Oct 24, 2021 at 9:03 PM Di Zhao OS
> <dizhao@os.amperecomputing.com> wrote:
> >
> > Hi,
> >
> > Attached is a new version of the patch, mainly for improving performance
> > and simplifying the code.
> 
> The patch doesn't apply anymore, can you update it please?
> 
> I see the new ssa-fre-101.c test already passing without the patch.

It was a mistake in test ssa-fre-101.c::g to define the variables with
the unsigned integers, in this way "a >= 0" is always true. After modified
the case, now fre1 in the patch can remove unreachable call "foo ()", and
evrp on trunk does not.

> Likewise ssa-fre-100.c and ssa-fre-102.c would PASS if you scan
> the pass dump after fre1 which is evrp so it seems that evrp already
> handles the equivalences (likely with the relation oracle) now?
> I'm sure there are second order effects when eliminating conditions
> in FRE but did you re-evaluate what made you improve VN to see
> if the cases are handled as expected now without this change?

In case ssa-fre-102.c, the unreachable call "foo ()" can be removed by
evrp, but fre in the patch can additionally replace "g = x + b" with
"g = 99". (Again I'm sorry the regex to check this was wrong..)

Test cases to simulate the original problem I found are moved into
gcc.dg/tree-ssa/ssa-pre-34.c. The unreachable calls to "foo ()" are
still removed by pre with the patch. (If compiled with -O3, the
"foo ()"s in test file can now be removed by thread2/threadfull2 and
dom3 on trunk. This relies on jump threading across the loops, so
even with -O3, similar cases may not get optimized say if there're
too many statements to copy.)

Thanks,
Di Zhao

> 
> I will still look at and consider the change btw, but given the EVRP
> improvements I'm also considering to remove the predication
> support from VN alltogether.  At least in the non-iterating mode
> it should be trivially easy to use rangers relation oracle to simplify
> predicates.  For the iterating mode it might not be 100% effective
> since I'm not sure we can make it use the current SSA values and
> how it would behave with those eventually changing to worse.
> 
> Andrew, how would one ask the relation oracle to simplify a
> condition?  Do I have to do any bookkeeping to register
> predicates on edges for it?
> 
> Thanks,
> Richard.
> 
> > First, regarding the comments:
> >
> > > -----Original Message-----
> > > From: Richard Biener <richard.guenther@gmail.com>
> > > Sent: Friday, October 1, 2021 9:00 PM
> > > To: Di Zhao OS <dizhao@os.amperecomputing.com>
> > > Cc: gcc-patches@gcc.gnu.org
> > > Subject: Re: [PATCH v2] tree-optimization/101186 - extend FRE with
> > > "equivalence map" for condition prediction
> > >
> > > On Thu, Sep 16, 2021 at 8:13 PM Di Zhao OS
> > > <dizhao@os.amperecomputing.com> wrote:
> > > >
> > > > Sorry about updating on this after so long. It took me much time to work out a
> > > > new plan and pass the tests.
> > > >
> > > > The new idea is to use one variable to represent a set of equal variables at
> > > > some basic-block. This variable is called a "equivalence head" or "equiv-head"
> > > > in the code. (There's no-longer a "equivalence map".)
> > > >
> > > > - Initially an SSA_NAME's "equivalence head" is its value number. Temporary
> > > >   equivalence heads are recorded as unary NOP_EXPR results in the vn_nary_op_t
> > > >   map. Besides, when inserting into vn_nary_op_t map, make the new result at
> > > >   front of the vn_pval list, so that when searching for a variable's
> > > >   equivalence head, the first result represents the largest equivalence set at
> > > >   current location.
> > > > - In vn_ssa_aux_t, maintain a list of references to valid_info->nary entry.
> > > >   For recorded equivalences, the reference is result->entry; for normal N-ary
> > > >   operations, the reference is operand->entry.
> > > > - When recording equivalences, if one side A is constant or has more refs, make
> > > >   it the new equivalence head of the other side B. Traverse B's ref-list,if a
> > > >   variable C's previous equiv-head is B, update to A. And re-insert B's n-ary
> > > >   operations by replacing B with A.
> > > > - When inserting and looking for the results of n-ary operations, insert and
> > > >   lookup by the operands' equiv-heads.
> > > > ...
> > > >
> > > > Thanks,
> > > > Di Zhao
> > > >
> > > > --------
> > > > Extend FRE with temporary equivalences.
> > >
> > > Comments on the patch:
> > >
> > > +  /* nary_ref count.  */
> > > +  unsigned num_nary_ref;
> > > +
> > >
> > > I think a unsigned short should be enough and that would nicely
> > > pack after value_id together with the bitfield (maybe change that
> > > to unsigned short :1 then).
> >
> > Changed num_nary_ref to unsigned short and moved after value_id.
> >
> > > @@ -7307,17 +7839,23 @@ process_bb (rpo_elim &avail, basic_block bb,
> > >             tree val = gimple_simplify (gimple_cond_code (last),
> > >                                         boolean_type_node, lhs, rhs,
> > >                                         NULL, vn_valueize);
> > > +           vn_nary_op_t vnresult = NULL;
> > >             /* If the condition didn't simplfy see if we have recorded
> > >                an expression from sofar taken edges.  */
> > >             if (! val || TREE_CODE (val) != INTEGER_CST)
> > >               {
> > > -               vn_nary_op_t vnresult;
> > >
> > > looks like you don't need vnresult outside of the if()?
> >
> > vnresult is reused later to record equivalences generated by PHI nodes.
> >
> > > +/* Find predicated value of vn_nary_op by the operands' equivalences. Return
> > > + * NULL_TREE if no known result is found.  */
> > > +
> > > +static tree
> > > +find_predicated_value_by_equivs (vn_nary_op_t vno, basic_block bb,
> > > +                                vn_nary_op_t *vnresult)
> > > +{
> > > +  lookup_equiv_heads (vno->length, vno->op, vno->op, bb);
> > > +  tree result
> > > +    = simplify_nary_op (vno->length, vno->opcode, vno->op, vno->type);
> > >
> > > why is it necessary to simplify here?  It looks like the caller
> > > already does this.
> >
> > In the new patch, changed the code a little to remove redundant calculation.
> >
> > > I wonder whether it's valid to always perform find_predicated_value_by_equivs
> > > from inside vn_nary_op_get_predicated_value instead of bolting it to only
> > > a single place?
> >
> > Removed function find_predicated_value_by_equivs and inlined the code.
> >
> > Because lookup_equiv_head uses vn_nary_op_get_predicated_value, so I left
> > vn_nary_op_get_predicated_value unchanged. Instead, operands are set to
> > equiv-heads in init_vn_nary_op_from_stmt. So altogether, predicates are always
> > inserted and searched by equiv-heads.
> >
> > > +
> > > +static vn_nary_op_t
> > > +val_equiv_insert (tree op1, tree op2, edge e)
> > > +{
> > >
> > > +  if (is_gimple_min_invariant (lhs))
> > > +    std::swap (lhs, rhs);
> > > +  if (is_gimple_min_invariant (lhs) || TREE_CODE (lhs) != SSA_NAME)
> > > +    /* Possible if invoked from record_equiv_from_previous_cond.  */
> > > +    return NULL;
> > >
> > > Better formulate all of the above in terms of only SSA_NAME checks since...
> > >
> > > +  /* Make the hand-side with more recorded n-ary expressions new
> > > +   * equivalence-head, to make fewer re-insertions.  */
> > > +  if (TREE_CODE (rhs) == SSA_NAME
> > > +      && VN_INFO (rhs)->num_nary_ref < VN_INFO (lhs)->num_nary_ref)
> > > +    std::swap (lhs, rhs);
> > >
> > > here LHS needs to be an SSA_NAME.
> >
> > Tried to fix this in the new patch.
> >
> > > +  /* Record equivalence as unary NOP_EXPR.  */
> > > +  vn_nary_op_t val
> > > +    = vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR, TREE_TYPE (lhs),
> > > +                                            &lhs, rhs, 0, bb);
> > >
> > > so you are recording an equivalence of lhs to rhs as (NOP_EXPR)lhs
> > > with value rhs?
> > > Presumably you think that's good enough to not overlap with entries
> > > for "real" IL?
> >
> > By checking related codes, it seems to me there's no overlap as long as
> > equivalences are inserted and searched as predicated_values. It should be easy
> > to replace this with other preferred opcode.
> >
> > > In particular why did you choose to _not_ use (2, EQ_EXPR, boolean_type_node,
> > > &[lhs, rhs], boolean_true_node, ...) like I think what would be
> > > inserted for by the
> > > existing code in process_bb via vn_nary_op_insert_pieces_predicated?
> > > It would probably felt more natural if the "new" code bolted on the case where
> > > that is called with an equivalence?
> >
> > The "equivalence head"s actually implement Disjoint-sets. So by a hashtable
> > search in the nary-map, the set of equivalences of a variable can be found.
> > Also it's easy to find out whether two variables are equal. And the
> > disjoint-sets are path-compressed by updating "equivalence head" and
> > re-inserting predicates.
> >
> > In the new attached patch, predicates like "A==B is true" and "A!=B is false"
> > are not inserted, as they can be computed from equivalences. Same with the
> > predicates derived from them. This reduces insertions of predicates by a lot (
> > counting the equivalences themselves and re-inserted predicates). Below are
> > test results on SPEC intrate, comparing with trunk (f45610a4, Oct 19):
> >
> >               |more bb |more bb |more stmt|more stmt|less      |less
> >               |removed |removed |removed  |removed  |predicates|predicates
> >               |at fre1 |at fre1 |at fre1  |at fre1  |inserted  |inserted
> > --------------------------------------------------------------------------
> > 500.perlbench_r| 64    | 1.56%  | 102    | 0.18%   | 51928    | 53.44%
> > 502.gcc_r      | 679   | 4.88%  | 290    | 0.21%   | 149968   | 65.5%
> > 505.mcf_r      | 5     | 35.71% | 9      | 1.40%   | 349      | 27.48%
> > 520.omnetpp    | 132   | 5.36%  | 48     | 0.14%   | 28192    | 58.38%
> > 523.xalancbmk_r| 231   | 3.15%  | 308    | 0.35%   | 65131    | 58.95%
> > 525.x264_r     | 4     | 1.36%  | 27     | 0.11%   | 10401    | 40.68%
> > 531.deepsjeng_r| 1     | 3.45%  | 2      | 0.14%   | 1412     | 53.81%
> > 541.leela_r    | 2     | 0.62%  | 5      | 0.06%   | 3874     | 48.56%
> > 548.exchange2_r| 0     | 0.00%  | 3      | 0.04%   | 159      | 3.33%
> > 557.xz_r       | 0     | 0.00%  | 3      | 0.07%   | 1894     | 52.39%
> >
> > Considering the copying and unwinding work saved, this patch could reduce
> > compile time rather than increase it.
> >
> > > Btw, I didn't find any condition that prevents equivalences being used for
> > > non-integer types?  Are you sure this is all valid for FP compares without
> > > further restrictions?
> >
> > I think this approach follows sccvn's previous decisions on FP comparisons,
> > for equivalences are inserted only when EQ_EXPRs were evaluated to be true.
> > The patch is tested on regression tests and SPEC fprate cases.
> >
> > >
> > > @@ -4224,36 +4404,345 @@ vn_nary_op_insert_pieces_predicated (unsigned
> > > int length, enum tree_code code,
> > >        fprintf (dump_file, " == %s\n",
> > >                integer_zerop (result) ? "false" : "true");
> > >      }
> > > -  vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
> > > -  init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
> > > -  vno1->predicated_values = 1;
> > > -  vno1->u.values = (vn_pval *) obstack_alloc (&vn_tables_obstack,
> > > -                                             sizeof (vn_pval));
> > > -  vno1->u.values->next = NULL;
> > > -  vno1->u.values->result = result;
> > > -  vno1->u.values->n = 1;
> > > -  vno1->u.values->valid_dominated_by_p[0] = pred_e->dest->index;
> > > -  return vn_nary_op_insert_into (vno1, valid_info->nary, true);
> > > +  /* Insert by operands' equiv-heads.  */
> > > +  tree new_ops[3];
> > > +  lookup_equiv_heads (length, ops, new_ops, pred_e->dest);
> > > +  /* Skip if the result is known.  */
> > > +  tree simplified = simplify_nary_op (length, code, new_ops, type);
> > > +  if (simplified)
> > > +    {
> > > +      if (dump_file && (dump_flags & TDF_DETAILS))
> > > +       {
> > > +         fprintf (dump_file, "Result is known: ");
> > > +         print_generic_expr (dump_file, simplified, TDF_NONE);
> > > +         fprintf (dump_file, ", skipped.\n");
> > > +       }
> > > +      return NULL;
> > > +    }
> > >
> > > soo - how do the existing predicated expressions change to be handled here?
> > > Why would they ever be simplified?  Should this simplification not better happen
> > > upthread when visiting the conditional?  Maybe that needs to use the equivs
> > > already?
> >
> > This happens when updating variable A's equiv_head from B to C, and
> > re-inserting former predicates of B with C. For example, "x < 0" is recorded,
> > then x is assigned with new equiv_head -1 at some location, then -1 < 0 can
> > simplify. It shouldn't happen when inserting new predicates.
> >
> > >
> > > You re-use the predicated value list which I think is great, can you explain
> > > how "latest" is meant in the following?
> > >
> > > @@ -4145,7 +4264,12 @@ vn_nary_op_insert_into (vn_nary_op_t vno,
> > > vn_nary_op_table_type *table,
> > >               next = &(*next)->next;
> > >             }
> > >           if (!found)
> > > -           *next = nval;
> > > +           {
> > > +             /* Insert new value at the front, so that lookup_equiv_head can
> > > +              * find the latest result.  */
> > > +             nval->next = vno->u.values;
> > > +             vno->u.values = nval;
> > > +           }
> >
> > It meant the "most general" result. (I picked the wrong word. The description
> > is changed in the new patch file. )
> >
> > There can be multiple valid results at the same time. As we traverse in RPO and
> > update equiv-heads (i.e. unite disjoint-sets), the last inserted result
> > represents the most complete set of equivalences of the variable.
> >
> > (In the last patch version, when "Appending predicate to value", the new result
> > is not moved to the front. That violates the property of equiv-heads and some
> > opportunities will be lost. Fixed this in the new patch file.)
> >
> > >
> > > OK, I have to stop now but will try to understand and review further next week.
> > > Sorry for the delays but the patch is quite complex to understand :/
> > >
> > > Richard.
> > >
> > > > 2021-09-16  Di Zhao  <dizhao@os.amperecomputing.com>
> > > >
> >
> > Other changes:
> > - Fixed some errs in comments.
> > - On recording temporary equivalences generated by PHI nodes, altered the
> >   logic, hope it can be a bit more clear.
> >   1. When a conditional expression's true/false edge E is assessed to be
> >      executable, search for conflicting predicates for E to be taken. If a
> >      conflicting predicate is found at L, and L's single predecessor P
> >      dominates current BB, then we know P->L cannot be taken if E is taken.
> >   2. Process phi_bb (immediately dominated by P, and dominates BB), rule out
> >      PHI arguments (x2) that can only come from P->L. If there's a single
> >      PHI argument (x1) left, record the equivalence of x1 and PHI result x
> >      at edge E.
> >           -----
> >           | P |
> >           -----
> >            |  \
> >            |   -----
> >            |   | L |  <== conflicting predicate's location
> >            |   -----
> >            |    /
> >   -------------------------
> >   | x = PHI<x1(P), x2(L)> | phi_bb
> >   -------------------------
> >              |
> >             ....
> >              |
> >            -----
> >            | BB | <== current bb
> >            -----
> >           /     \ edge E
> >         ...     -------
> >                 | dest |
> >                 -------
> >
> > - Fixed a correctness problem when recording equivalences from PHI in previous
> >   version. Also test case ssa-fre-95.c:k in previous version was incorrect,
> >   fixed that.
> >
> > Thanks,
> > Di Zhao

---

Extend FRE with temporary equivalences.

2021-11-15  Di Zhao  <dizhao@os.amperecomputing.com>

gcc/ChangeLog:
        PR tree-optimization/101186
        * tree-ssa-sccvn.c (VN_INFO): remove assertions (there could be a
        predicate already).
        (dominated_by_p_w_unex): Moved upward.
        (vn_nary_op_get_predicated_value): Moved upward.
        (is_vn_valid_at_bb): Check if vn_pval is valid at BB.
        (lookup_equiv_head): Lookup the "equivalence head" of given node.
        (lookup_equiv_heads): Lookup the "equivalence head"s of given nodes.
        (vn_tracking_edge): Extracted utility function.
        (init_vn_nary_op_from_stmt): Insert and lookup by "equivalence head"s.
        (vn_nary_op_insert_into): Insert new value at the front.
        (vn_nary_op_insert_pieces_predicated_1): Insert as predicated values
        from pieces.
        (fold_const_from_equiv_heads): Fold N-ary expression of equiv-heads.
        (push_new_nary_ref): Insert a back-reference to vn_nary_op_t.
        (val_equiv_insert): Record temporary equivalence.
        (vn_nary_op_insert_pieces_predicated): Record equivalences instead of
        some predicates; insert back-refs.
        (record_equiv_from_prev_phi_1): Record temporary equivalences generated
        by PHI nodes.
        (record_equiv_from_prev_phi): Given an outgoing edge of a conditional
        expression taken, record equivalences generated by PHI nodes.
        (visit_nary_op): Add lookup previous results of N-ary operations by
        equivalences.
        (insert_related_predicates_on_edge): Some predicates can be computed
        from equivalences, no need to insert them.
        (process_bb): Add lookup predicated values by equivalenc~/tmp/time-frees.
        (struct unwind_state): Unwind state of back-refs to vn_nary_op_t.
        (do_unwind): Unwind the back-refs to vn_nary_op_t.
        (do_rpo_vn): Update back-reference unwind state.
        * tree-ssa-sccvn.h (struct nary_ref): hold a lists of references to the
        nary map entries.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/pr68619-2.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-1.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-2.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-3.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-5.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-7.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-8.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-9.c: Disable fre.
        * gcc.dg/tree-ssa/vrp03.c: Disable fre.
        * gcc.dg/tree-ssa/ssa-fre-100.c: New test.
        * gcc.dg/tree-ssa/ssa-fre-101.c: New test.
        * gcc.dg/tree-ssa/ssa-fre-102.c: New test.
        * gcc.dg/tree-ssa/ssa-pre-34.c: New test.

[-- Attachment #2: v4-tree-optimization-101186.patch --]
[-- Type: application/octet-stream, Size: 37565 bytes --]

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c
index cca706e0c4f..0a99c3c620e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dom2-details -w" } */
+/* { dg-options "-O2 -fno-tree-fre -fdump-tree-dom2-details -w" } */
 
 typedef union tree_node *tree;
 struct gcc_options
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c
index ac8271cc574..e9a797e82b0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c
index b2c09cbb021..bcbc419575c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c
index 2316f7e1c04..d63291da440 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 int f(int x, int y)
 {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
index e7038d0237f..c1e5667b45a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 static inline long load(long *p)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-7.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-7.c
index b44c064aa23..069e4a106c3 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-8.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-8.c
index 26e5abbdc29..26b303e896e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-8.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-9.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-9.c
index 22b68d5ede0..3ed3bf6ad29 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-9.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
new file mode 100644
index 00000000000..ccb0fdaa698
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+extern void bar(void);
+extern void boo(void);
+extern void foo(void);
+
+/* Test temporary equivalences: basic.  */
+
+void f (unsigned int a, unsigned int b)
+{
+  if (a == b)
+    {
+      for (unsigned i = 0; i < a; i++)
+	{
+	  bar ();
+	  if (i >= b)
+	    /* Unreachable.  */
+	    foo ();
+	}
+    }
+}
+
+/* Test temporary equivalences: transitive.  */
+
+void g (unsigned int a, unsigned int b, unsigned int c)
+{
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a == b)
+	{
+	  if (b == c)
+	    {
+	      boo ();
+	      if (i >= a)
+		/* Unreachable.  */
+		foo ();
+	    }
+	}
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "foo" "fre1" } } */
+/* { dg-final { scan-tree-dump "bar" "fre1" } } */
+/* { dg-final { scan-tree-dump "boo" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-101.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-101.c
new file mode 100644
index 00000000000..6178437a309
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-101.c
@@ -0,0 +1,83 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+extern void bar(void);
+extern void bag(void);
+extern void bas(void);
+extern void foo(void);
+
+void f (unsigned int a, unsigned int b)
+{
+  if (a == b)
+    {
+      for (unsigned i = a; i < 100; i++)
+	{
+	  if (i > b)
+	    /* Should not be eliminated.  */
+	    bar ();
+	}
+    }
+}
+
+/* Test temporary equivalences: derived condition.  */
+
+int gg;
+
+void g (int a, int b, int d)
+{
+  int c = 100;
+  if (a > 0)
+    {
+      c = b;
+      gg++;
+    }
+  else
+    /* a <= 0 */
+    {
+      c = d;
+      gg--;
+    }
+  for (int i = 0; i < c; i++)
+    {
+      if (a >= 0)
+	{
+	  if (i >= b)
+	    /* Should not be eliminated. ("a >= 0" cannot derive "a > 0".)  */
+	    bas ();
+	}
+      else
+	/* a < 0 */
+	{
+	  if (i >= d)
+	    /* Should be eliminated, as "a < 0" can derive "a <= 0".  */
+	    foo ();
+	}
+    }
+}
+
+/* Test temporary equivalences: no domination.  */
+
+void h (unsigned int a, unsigned int b)
+{
+  unsigned int c = 100;
+  if (b % 2)
+    {
+      if (a != 0)
+	c = b;
+    }
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a != 0)
+	{
+	  if (i >= b)
+	    /* Should not be eliminated.  */
+	    bag ();
+	}
+      i++;
+    }
+}
+
+/* { dg-final { scan-tree-dump "bar" "fre1" } } */
+/* { dg-final { scan-tree-dump "bas" "fre1" } } */
+/* { dg-final { scan-tree-dump "bag" "fre1" } } */
+/* { dg-final { scan-tree-dump-not "foo" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c
new file mode 100644
index 00000000000..3c8900c4219
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c
@@ -0,0 +1,45 @@
+
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+extern void bar(void);
+extern void bag(void);
+extern void foo(void);
+int g;
+
+/* Test temporary equivalences: optimize N-ary expressions.  */
+
+void
+f (unsigned int a, unsigned int b, unsigned int x, unsigned int y)
+{
+  if (a == b)
+    {
+      g = 100;
+      if (a + x == 99)
+	g = x + b; /* should be simplified to 99 */
+    }
+  else
+    // a!= b
+    {
+      if (a == 100 - x)
+	{
+	  g++;
+	  if (b == 100 - x)
+	    foo (); /* should be removed */
+	}
+      else
+	{
+	  g--;
+	  if (b == 100 - x)
+	    bag (); /* should not be removed */
+	  else
+	    bar (); /* should not be removed */
+	}
+    }
+}
+
+/* { dg-final { scan-tree-dump "bag" "fre1" } } */
+/* { dg-final { scan-tree-dump "bar" "fre1" } } */
+/* { dg-final { scan-tree-dump-not "foo" "fre1" } } */
+/* { dg-final { scan-tree-dump-not "= b_\[0-9]+\\(\[A-Z]+\\) \\+ x_" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-34.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-34.c
new file mode 100644
index 00000000000..091a67ec0a2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-34.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+extern void foo(void);
+extern void bag(void);
+extern void bas(void);
+
+/* Test temporary equivalences: identical condition.  */
+
+void h (int a, int b, int x, int y)
+{
+  int c = y;
+  if (a != 0)
+    c = x;
+  while (b < 1000)
+    {
+      if (a != 0)
+	{
+	  if (c > x)
+	    /* Unreachable.  */
+	    foo ();
+	}
+      else
+	bas ();
+      b++;
+    }
+}
+
+/* Test temporary equivalences: derived condition.  */
+
+void k (int a, int b, int x, int y)
+{
+  int c = y;
+  if (a != b)
+    c = x;
+  for (int i = 0; i < 1000; i++)
+    {
+      if (a < b)
+	/* All paths reaching "a>0 is true" come from "a!=0 is true".  */
+	{
+	  if (c > x)
+	    /* Unreachable.  */
+	    foo ();
+	}
+      else
+	bag ();
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "foo \\(\\)" "pre" } } */
+/* { dg-final { scan-tree-dump "bag" "pre" } } */
+/* { dg-final { scan-tree-dump "bas" "pre" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
index 4cbaca41332..69b83b6cc30 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1 -fno-thread-jumps" } */
+/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1 -fno-thread-jumps -fno-tree-fre" } */
 
 struct A
 {
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 149674e6a16..2061647ae06 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -353,6 +353,8 @@ static vn_reference_t last_inserted_ref;
 static vn_phi_t last_inserted_phi;
 static vn_nary_op_t last_inserted_nary;
 static vn_ssa_aux_t last_pushed_avail;
+static vn_ssa_aux_t last_pushed_nary_ref;
+static nary_ref* nary_ref_freelist;
 
 /* Valid hashtables storing information we have proven to be
    correct.  */
@@ -491,9 +493,9 @@ VN_INFO (tree name)
 	    nary->predicated_values = 0;
 	    nary->u.result = boolean_true_node;
 	    vn_nary_op_insert_into (nary, valid_info->nary);
-	    gcc_assert (nary->unwind_to == NULL);
 	    /* Also do not link it into the undo chain.  */
 	    last_inserted_nary = nary->next;
+	    /* There could be a predicate already.  */
 	    nary->next = (vn_nary_op_t)(void *)-1;
 	    nary = alloc_vn_nary_op_noinit (2, &vn_tables_insert_obstack);
 	    init_vn_nary_op_from_pieces (nary, 2, EQ_EXPR,
@@ -501,7 +503,6 @@ VN_INFO (tree name)
 	    nary->predicated_values = 0;
 	    nary->u.result = boolean_false_node;
 	    vn_nary_op_insert_into (nary, valid_info->nary);
-	    gcc_assert (nary->unwind_to == NULL);
 	    last_inserted_nary = nary->next;
 	    nary->next = (vn_nary_op_t)(void *)-1;
 	    if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3850,6 +3851,114 @@ vn_reference_insert_pieces (tree vuse, alias_set_type set,
   return vr1;
 }
 
+static bool
+dominated_by_p_w_unex (basic_block bb1, basic_block bb2, bool);
+
+static tree
+vn_nary_op_get_predicated_value (vn_nary_op_t vno, basic_block bb)
+{
+  if (! vno->predicated_values)
+    return vno->u.result;
+  for (vn_pval *val = vno->u.values; val; val = val->next)
+    for (unsigned i = 0; i < val->n; ++i)
+      {
+	/* Do not handle backedge executability optimistically since
+	 when figuring out whether to iterate we do not consider
+	 changed predication.  */
+	if (dominated_by_p_w_unex (
+	      bb, BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]),
+	      false))
+	return val->result;
+      }
+  return NULL_TREE;
+}
+
+/* Whether a predicated value VAL is valid at BB.  */
+
+static inline bool
+is_vn_valid_at_bb (vn_pval *val, basic_block bb)
+{
+  for (unsigned i = 0; i < val->n; ++i)
+    {
+      basic_block val_bb
+	= BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]);
+      if (dominated_by_p_w_unex (bb, val_bb, false))
+	return true;
+    }
+  return false;
+}
+
+/* Lookup the variable that represents a set of equivalences of NAME at
+   basic-block BB.  This variable is called the "equivalence head" of NAME.  */
+
+static tree
+lookup_equiv_head (tree name, basic_block bb)
+{
+  if (TREE_CODE (name) != SSA_NAME)
+    return name;
+  tree result = VN_INFO (name)->valnum;
+  if (result == VN_TOP)
+    /* VN_TOP is not allowed, otherwise it will be too optimistic for
+       non-iterating mode.  */
+    return name;
+  vn_nary_op_t vnresult;
+  /* As we insert new value at front in vn_nary_op_insert_into, we don't need
+     search iteratively for the most general equiv-head.  */
+  vn_nary_op_lookup_pieces (1, NOP_EXPR, TREE_TYPE (result), &result,
+			    &vnresult);
+  if (vnresult && vnresult->predicated_values)
+    {
+      tree eq = vn_nary_op_get_predicated_value (vnresult, bb);
+      if (eq)
+	return eq;
+    }
+  return result;
+}
+
+/* Lookup the "equivalence head"s of OPS at BB, and store them in TO.  N is the
+   size of OPS.  */
+
+static void
+lookup_equiv_heads (unsigned n, tree *ops, tree *to, basic_block bb)
+{
+  for (unsigned i = 0; i < n; i++)
+    {
+      tree equiv_head = lookup_equiv_head (ops[i], bb);
+      if (dump_file && (dump_flags & TDF_DETAILS) && equiv_head != ops[i])
+	{
+	  fprintf (dump_file, "Use equiv-head ");
+	  print_generic_expr (dump_file, equiv_head, TDF_NONE);
+	  fprintf (dump_file, " for ");
+	  print_generic_expr (dump_file, ops[i], TDF_NONE);
+	  fprintf (dump_file, "\n");
+	}
+      to[i] = equiv_head;
+    }
+}
+
+/* Returns whether edge PRED_E is currently being tracked.  */
+
+static inline bool
+vn_tracking_edge (edge pred_e)
+{
+  if (!single_pred_p (pred_e->dest))
+    {
+      /* Never record for backedges.  */
+      if (pred_e->flags & EDGE_DFS_BACK)
+	return false;
+      edge_iterator ei;
+      edge e;
+      int cnt = 0;
+      /* Ignore backedges.  */
+      FOR_EACH_EDGE (e, ei, pred_e->dest->preds)
+	if (!dominated_by_p (CDI_DOMINATORS, e->src, e->dest))
+	  cnt++;
+      if (cnt != 1)
+	return false;
+    }
+  return true;
+}
+
 /* Compute and return the hash value for nary operation VBO1.  */
 
 static hashval_t
@@ -3983,6 +4092,9 @@ init_vn_nary_op_from_stmt (vn_nary_op_t vno, gassign *stmt)
       for (i = 0; i < vno->length; ++i)
 	vno->op[i] = gimple_op (stmt, i + 1);
     }
+  /* Insert and lookup N-ary results by the operands' equivalence heads.  */
+  if (gimple_bb (stmt))
+    lookup_equiv_heads (vno->length, vno->op, vno->op, gimple_bb (stmt));
 }
 
 /* Compute the hashcode for VNO and look for it in the hash table;
@@ -4128,12 +4240,10 @@ vn_nary_op_insert_into (vn_nary_op_t vno, vn_nary_op_table_type *table)
 	    = BASIC_BLOCK_FOR_FN (cfun, vno->u.values->valid_dominated_by_p[0]);
 	  vn_pval *nval = vno->u.values;
 	  vn_pval **next = &vno->u.values;
-	  bool found = false;
 	  for (vn_pval *val = (*slot)->u.values; val; val = val->next)
 	    {
 	      if (expressions_equal_p (val->result, nval->result))
 		{
-		  found = true;
 		  for (unsigned i = 0; i < val->n; ++i)
 		    {
 		      basic_block val_bb
@@ -4147,17 +4257,15 @@ vn_nary_op_insert_into (vn_nary_op_t vno, vn_nary_op_table_type *table)
 			gcc_unreachable ();
 		    }
 		  /* Append value.  */
-		  *next = (vn_pval *) obstack_alloc (&vn_tables_obstack,
-						     sizeof (vn_pval)
-						     + val->n * sizeof (int));
-		  (*next)->next = NULL;
-		  (*next)->result = val->result;
-		  (*next)->n = val->n + 1;
-		  memcpy ((*next)->valid_dominated_by_p,
-			  val->valid_dominated_by_p,
+		  nval = (vn_pval *) obstack_alloc (&vn_tables_obstack,
+						    sizeof (vn_pval)
+						      + val->n * sizeof (int));
+		  nval->next = NULL;
+		  nval->result = val->result;
+		  nval->n = val->n + 1;
+		  memcpy (nval->valid_dominated_by_p, val->valid_dominated_by_p,
 			  val->n * sizeof (int));
-		  (*next)->valid_dominated_by_p[val->n] = vno_bb->index;
-		  next = &(*next)->next;
+		  nval->valid_dominated_by_p[val->n] = vno_bb->index;
 		  if (dump_file && (dump_flags & TDF_DETAILS))
 		    fprintf (dump_file, "Appending predicate to value.\n");
 		  continue;
@@ -4170,8 +4278,10 @@ vn_nary_op_insert_into (vn_nary_op_t vno, vn_nary_op_table_type *table)
 	      (*next)->next = NULL;
 	      next = &(*next)->next;
 	    }
-	  if (!found)
-	    *next = nval;
+	  /* Put new result at front, so lookup_equiv_head can find the last
+	     inserted (most general) result.  */
+	  nval->next = vno->u.values;
+	  vno->u.values = nval;
 
 	  *slot = vno;
 	  vno->next = last_inserted_nary;
@@ -4216,6 +4326,86 @@ vn_nary_op_insert_pieces (unsigned int length, enum tree_code code,
   return vn_nary_op_insert_into (vno1, valid_info->nary);
 }
 
+/* Similar with vn_nary_op_insert_pieces, but insert as predicated values.  */
+
+static vn_nary_op_t
+vn_nary_op_insert_pieces_predicated_1 (unsigned int length, enum tree_code code,
+				       tree type, tree *ops, tree result,
+				       unsigned int value_id, basic_block bb)
+{
+  vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
+  init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
+  vno1->predicated_values = 1;
+  vno1->u.values
+    = (vn_pval *) obstack_alloc (&vn_tables_obstack, sizeof (vn_pval));
+  vno1->u.values->next = NULL;
+  vno1->u.values->result = result;
+  vno1->u.values->n = 1;
+  vno1->u.values->valid_dominated_by_p[0] = bb->index;
+  return vn_nary_op_insert_into (vno1, valid_info->nary);
+}
+
+/* Fold N-ary expression of equiv-heads, return NULL_TREE if not folded.  */
+
+static tree
+fold_const_from_equiv_heads (unsigned length, tree_code opcode, tree *op,
+			     tree type)
+{
+  tree result = NULL_TREE;
+  /* We're using equiv-heads, i.e. the most general operands, shortcut for
+     situations that can fold.  */
+  switch (length)
+    {
+    case 1:
+      if (CONSTANT_CLASS_P (op[0]))
+	result
+	  = gimple_simplify (opcode, type, op[0], NULL, no_follow_ssa_edges);
+      break;
+    case 2:
+      if ((CONSTANT_CLASS_P (op[0]) && CONSTANT_CLASS_P (op[1]))
+	  || op[0] == op[1])
+	result = gimple_simplify (opcode, type, op[0], op[1], NULL,
+				  no_follow_ssa_edges);
+      break;
+    case 3:
+      if (CONSTANT_CLASS_P (op[0]) && CONSTANT_CLASS_P (op[1])
+	  && CONSTANT_CLASS_P (op[2]))
+	result = gimple_simplify (opcode, type, op[0], op[1], op[2], NULL,
+				  no_follow_ssa_edges);
+      break;
+    default:
+      break;
+    }
+  if (result && !useless_type_conversion_p (type, TREE_TYPE (result)))
+    result = fold_convert (type, result);
+  return result;
+}
+
+/* Insert a back-reference to vn_nary_op_t in vn_ssa_aux.  */
+
+static inline void
+push_new_nary_ref (vn_nary_op_t slot, vn_ssa_aux_t info)
+{
+  nary_ref *ref;
+  if (nary_ref_freelist)
+    {
+      ref = nary_ref_freelist;
+      nary_ref_freelist = nary_ref_freelist->next;
+    }
+  else
+    ref = XOBNEW (&vn_ssa_aux_obstack, nary_ref);
+  ref->next = info->ref;
+  ref->next_undo = last_pushed_nary_ref;
+  ref->slot = slot;
+  last_pushed_nary_ref = info;
+  info->ref = ref;
+  ++info->num_nary_ref;
+  gcc_assert (info->num_nary_ref > 0);
+}
+
+static vn_nary_op_t
+val_equiv_insert (tree op1, tree op2, edge e);
+
 static vn_nary_op_t
 vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
 				     tree type, tree *ops,
@@ -4223,21 +4413,15 @@ vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
 				     edge pred_e)
 {
   /* ???  Currently tracking BBs.  */
-  if (! single_pred_p (pred_e->dest))
-    {
-      /* Never record for backedges.  */
-      if (pred_e->flags & EDGE_DFS_BACK)
-	return NULL;
-      edge_iterator ei;
-      edge e;
-      int cnt = 0;
-      /* Ignore backedges.  */
-      FOR_EACH_EDGE (e, ei, pred_e->dest->preds)
-	if (! dominated_by_p (CDI_DOMINATORS, e->src, e->dest))
-	  cnt++;
-      if (cnt != 1)
-	return NULL;
-    }
+  if (!vn_tracking_edge (pred_e))
+    return NULL;
+  /* Equivalence can represent this.  */
+  if (code == EQ_EXPR && result == boolean_true_node)
+    return val_equiv_insert (ops[0], ops[1], pred_e);
+  /* Leave this for inverted condition.  */
+  if (code == NE_EXPR && result == boolean_false_node)
+    return NULL;
+
   if (dump_file && (dump_flags & TDF_DETAILS)
       /* ???  Fix dumping, but currently we only get comparisons.  */
       && TREE_CODE_CLASS (code) == tcc_comparison)
@@ -4250,36 +4434,252 @@ vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
       fprintf (dump_file, " == %s\n",
 	       integer_zerop (result) ? "false" : "true");
     }
-  vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
-  init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
-  vno1->predicated_values = 1;
-  vno1->u.values = (vn_pval *) obstack_alloc (&vn_tables_obstack,
-					      sizeof (vn_pval));
-  vno1->u.values->next = NULL;
-  vno1->u.values->result = result;
-  vno1->u.values->n = 1;
-  vno1->u.values->valid_dominated_by_p[0] = pred_e->dest->index;
-  return vn_nary_op_insert_into (vno1, valid_info->nary);
+  for (unsigned i = 0; i < length; i++)
+    gcc_checking_assert (lookup_equiv_head (ops[i], pred_e->dest) == ops[i]);
+
+  /* Skip if the result is known. (Caused by re-inserting predicates.)  */
+  tree simplified = fold_const_from_equiv_heads (length, code, ops, type);
+  if (simplified)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Result is known: ");
+	  print_generic_expr (dump_file, simplified, TDF_NONE);
+	  fprintf (dump_file, ", skipped.\n");
+	}
+      return NULL;
+    }
+
+  vn_nary_op_t val
+    = vn_nary_op_insert_pieces_predicated_1 (length, code, type, ops,
+					     result, value_id, pred_e->dest);
+  /* Insert back-refs (operand->slot) to vn_ssa_aux structure.  */
+  for (unsigned i = 0; i < length; i++)
+    if (TREE_CODE (ops[i]) == SSA_NAME)
+      {
+	vn_ssa_aux_t info = VN_INFO (ops[i]);
+	if (info->valnum == ops[i])
+	  push_new_nary_ref (val, info);
+      }
+  return val;
 }
 
-static bool
-dominated_by_p_w_unex (basic_block bb1, basic_block bb2, bool);
+/* Record the equivalence of OP1 and OP2 on edge E.  */
 
-static tree
-vn_nary_op_get_predicated_value (vn_nary_op_t vno, basic_block bb)
+static vn_nary_op_t
+val_equiv_insert (tree op1, tree op2, edge e)
 {
-  if (! vno->predicated_values)
-    return vno->u.result;
+  /* Record by equivalence heads.  */
+  basic_block bb = e->dest;
+  tree lhs = lookup_equiv_head (op1, bb);
+  tree rhs = lookup_equiv_head (op2, bb);
+  if (lhs == rhs)
+    return NULL;
+  /* Set RHS to be new equiv-head for LHS.  LHS must be SSA_NAME.  */
+  if (TREE_CODE (lhs) != SSA_NAME)
+    {
+      if (TREE_CODE (rhs) == SSA_NAME)
+	std::swap (lhs, rhs);
+      else
+	return NULL;
+    }
+  vn_ssa_aux_t rhs_info = NULL, lhs_info = VN_INFO (lhs);
+
+  /* Choose the hand-side with more recorded n-ary expressions as new
+     equivalence head, to make fewer re-insertions.  */
+  if (TREE_CODE (rhs) == SSA_NAME)
+    {
+      rhs_info = VN_INFO (rhs);
+      if (rhs_info->num_nary_ref < lhs_info->num_nary_ref)
+	{
+	  std::swap (lhs, rhs);
+	  std::swap (rhs_info, lhs_info);
+	}
+    }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Recording equivalence of ");
+      print_generic_expr (dump_file, lhs);
+      fprintf (dump_file, " and ");
+      print_generic_expr (dump_file, rhs);
+      fprintf (dump_file, " at BB%d\n", bb->index);
+    }
+  /* Record equivalence as unary NOP_EXPR.  */
+  vn_nary_op_t val
+    = vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR, TREE_TYPE (lhs),
+					     &lhs, rhs, 0, bb);
+  /* Back-ref for equivalences is result->slot rather than op->slot, so that
+     when new equiv-head is appended, we can update them.  */
+  if (TREE_CODE (rhs) == SSA_NAME)
+    push_new_nary_ref (val, rhs_info);
+
+  /* Iterate on LHS's references, first update equiv-heads, then re-insert
+     former predicates, so that the result will be latest.  */
+  auto_vec<vn_nary_op_t> predicates;
+  auto_vec<tree> results;
+  vn_nary_op_t slot;
+  tree old_head = lhs_info->valnum;
+  nary_ref *ref = lhs_info->ref;
+  for (; ref; ref = ref->next)
+    {
+      slot = ref->slot;
+      if (!slot->predicated_values)
+	continue;
+      /* Update recorded equivalences with new equiv-head.  */
+      if (slot->opcode == NOP_EXPR && slot->length == 1)
+	{
+	  if (slot->op[0] == rhs)
+	    continue;
+	  for (vn_pval *pval = slot->u.values; pval; pval = pval->next)
+	    if (pval->result == old_head && is_vn_valid_at_bb (pval, bb))
+	      {
+		if (dump_file && (dump_flags & TDF_DETAILS))
+		  {
+		    fprintf (dump_file, "Updating equiv-head of ");
+		    print_generic_expr (dump_file, slot->op[0], TDF_SLIM);
+		    fprintf (dump_file, " to ");
+		    print_generic_expr (dump_file, rhs, TDF_SLIM);
+		    fprintf (dump_file, " at BB%d\n", bb->index);
+		  }
+		vn_nary_op_t new_val
+		  = vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR,
+							   slot->type, slot->op,
+							   rhs, 0, bb);
+		/* Push back-ref (result->slot) for new equivalence.  */
+		if (TREE_CODE (rhs) == SSA_NAME)
+		  push_new_nary_ref (new_val, rhs_info);
+	      }
+	}
+      else
+	/* Predicated values are collected, to be re-inserted later.  */
+	for (vn_pval *pval = slot->u.values; pval; pval = pval->next)
+	  {
+	    if (!is_vn_valid_at_bb (pval, bb))
+	      continue;
+	    predicates.safe_push (slot);
+	    results.safe_push (pval->result);
+	  }
+    }
+  unsigned i;
+  if (predicates.length () > 0 && dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Re-inserting %d predicates...\n",
+	     predicates.length ());
+  tree new_ops[3];
+  FOR_EACH_VEC_ELT (predicates, i, slot)
+    {
+      lookup_equiv_heads (slot->length, slot->op, new_ops, e->dest);
+      vn_nary_op_insert_pieces_predicated (slot->length, slot->opcode,
+					   slot->type, new_ops, results[i],
+					   slot->value_id, e);
+    }
+  return val;
+}
+
+/* Record temporary equivalences generated by PHI nodes, and are valid when
+   BB's outgoing edge E is taken.  VNO contains the interested expression.  If
+   INVERSE is true, rule out VNO's results that are different with RESULT;
+   otherwise rule out RESULT itself.  */
+
+static void
+record_equiv_from_prev_phi_1 (vn_nary_op_t vno, edge e, basic_block bb,
+			      tree result, bool inverse)
+{
+  if (!(vno && vno->predicated_values))
+    return;
+
+  /* Process predicated values in VNO, looking for a conflicting result, and
+     the location's single_pred dominates BB.  */
   for (vn_pval *val = vno->u.values; val; val = val->next)
-    for (unsigned i = 0; i < val->n; ++i)
-      /* Do not handle backedge executability optimistically since
-	 when figuring out whether to iterate we do not consider
-	 changed predication.  */
-      if (dominated_by_p_w_unex
-	    (bb, BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]),
-	     false))
-	return val->result;
-  return NULL_TREE;
+    {
+      if (inverse ? val->result == result : val->result != result)
+	continue;
+      /* Found a result to rule out, process each location.  */
+      for (unsigned i = 0; i < val->n; ++i)
+	{
+	  basic_block bb1
+	    = BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]);
+	  if (!single_pred_p (bb1))
+	    continue;
+	  basic_block p = single_pred (bb1);
+	  if (!dominated_by_p_w_unex (bb, p, false))
+	    continue;
+	  gcc_assert (EDGE_COUNT (p->succs) == 2);
+	  /* Iterate on possible phi_bbs (can be BB itself). Considering
+	     non-executable edges, there can be multiple phi_bbs.  */
+	  for (basic_block phi_bb : get_dominated_by (CDI_DOMINATORS, p))
+	    {
+	      if (!dominated_by_p_w_unex (bb, phi_bb, false))
+		continue;
+	      /* Process PHIs, rule out incoming edges that definitely come
+		 from p->bb1.  If there's a single PHI argument X left, record
+		 the equivalence of X and PHI result at edge e.  */
+	      for (gphi_iterator gsi = gsi_start_nonvirtual_phis (phi_bb);
+		   !gsi_end_p (gsi); gsi_next_nonvirtual_phi (&gsi))
+		{
+		  gphi *phi = gsi.phi ();
+		  tree equiv = NULL_TREE;
+		  for (unsigned j = 0; j < gimple_phi_num_args (phi); ++j)
+		    {
+		      edge in = gimple_phi_arg_edge (phi, j);
+		      if (in->src == bb1
+			  || dominated_by_p_w_unex (in->src, bb1, false))
+			/* Definitely comes from p->bb1.  */
+			continue;
+		      tree arg = vn_valueize (gimple_phi_arg_def (phi, j));
+		      if (equiv && equiv != arg)
+			{
+			  equiv = NULL_TREE;
+			  break;
+			}
+		      equiv = arg;
+		    }
+		  if (equiv)
+		    val_equiv_insert (equiv, PHI_RESULT (phi), e);
+		}
+	    }
+	}
+    }
+}
+
+/* Record temporary equivalences generated by PHI nodes, and are valid when
+   BB's outgoing edge E is taken.  CODE, OPS and RESULT are the condition and
+   result for E to be taken.  VNO contains the condition's results recorded in
+   nary map.
+
+   The idea is to find a conflicting predicate at some location L, where L's
+   predecessor P dominates BB, so we know path P->L will not be taken if E is
+   taken.  If by ruling out P->L can generate some equivalences, record them at
+   E->dest.  */
+
+static void
+record_equiv_from_prev_phi (vn_nary_op_t vno, edge e, basic_block bb,
+			    tree_code code, tree *ops, tree result)
+{
+  if (!vn_tracking_edge (e))
+    return;
+  record_equiv_from_prev_phi_1 (vno, e, bb, result, true);
+
+  /* For predicates like "a==b is false" or "a!=b is true", equivalences are
+     conflicting results, i.e. "NO_OP (a) is b" or "NO_OP (b) is a".  */
+  if ((result == boolean_true_node
+       && (code == NE_EXPR || code == GT_EXPR || code == LT_EXPR))
+      || (result == boolean_false_node && code == EQ_EXPR))
+    {
+      vn_nary_op_t vnresult;
+      if (TREE_CODE (ops[0]) == SSA_NAME)
+	{
+	  vn_nary_op_lookup_pieces (1, NOP_EXPR, TREE_TYPE (ops[0]), ops,
+				    &vnresult);
+	  record_equiv_from_prev_phi_1 (vnresult, e, bb, ops[1], false);
+	}
+      if (TREE_CODE (ops[1]) == SSA_NAME)
+	{
+	  vn_nary_op_lookup_pieces (1, NOP_EXPR, TREE_TYPE (ops[1]), &ops[1],
+				    &vnresult);
+	  record_equiv_from_prev_phi_1 (vnresult, e, bb, ops[0], false);
+	}
+    }
 }
 
 /* Insert the rhs of STMT into the current hash table with a value number of
@@ -4902,8 +5302,25 @@ static bool
 visit_nary_op (tree lhs, gassign *stmt)
 {
   vn_nary_op_t vnresult;
-  tree result = vn_nary_op_lookup_stmt (stmt, &vnresult);
-  if (! result && vnresult)
+  unsigned length = vn_nary_length_from_stmt (stmt);
+  vn_nary_op_t vno
+    = XALLOCAVAR (struct vn_nary_op_s, sizeof_vn_nary_op (length));
+  init_vn_nary_op_from_stmt (vno, stmt);
+  tree result = NULL_TREE;
+  /* Try to get a simplified result.  */
+  /* Do not simplify variable used in PHI at loop exit, or
+     simplify_peeled_chrec/constant_after_peeling may miss the loop.  */
+  gimple *use_stmt;
+  use_operand_p use_p;
+  if (!(single_imm_use (lhs, &use_p, &use_stmt)
+	&& gimple_code (use_stmt) == GIMPLE_PHI
+	&& single_succ_p (gimple_bb (use_stmt))
+	&& (single_succ_edge (gimple_bb (use_stmt))->flags & EDGE_DFS_BACK)))
+    result = fold_const_from_equiv_heads (vno->length, vno->opcode, vno->op,
+					  vno->type);
+  if (!result)
+    result = vn_nary_op_lookup_1 (vno, &vnresult);
+  if (!result && vnresult)
     result = vn_nary_op_get_predicated_value (vnresult, gimple_bb (stmt));
   if (result)
     return set_ssa_val_to (lhs, result);
@@ -7179,12 +7596,8 @@ insert_related_predicates_on_edge (enum tree_code code, tree *ops, edge pred_e)
 					   ops, boolean_false_node, 0, pred_e);
       break;
     case EQ_EXPR:
-      /* a == b -> ! a {<,>} b */
-      vn_nary_op_insert_pieces_predicated (2, LT_EXPR, boolean_type_node,
-					   ops, boolean_false_node, 0, pred_e);
-      vn_nary_op_insert_pieces_predicated (2, GT_EXPR, boolean_type_node,
-					   ops, boolean_false_node, 0, pred_e);
-      break;
+      /* No need to insert derived predicates for '==', as they can be computed
+	 by equiv-heads & fold_const_from_equiv_heads.  */
     case LE_EXPR:
     case GE_EXPR:
     case NE_EXPR:
@@ -7351,24 +7764,31 @@ process_bb (rpo_elim &avail, basic_block bb,
       switch (gimple_code (last))
 	{
 	case GIMPLE_SWITCH:
+	  /* TODO: utilize temporary equivalences.  */
 	  e = find_taken_edge (bb, vn_valueize (gimple_switch_index
 						(as_a <gswitch *> (last))));
 	  break;
 	case GIMPLE_COND:
 	  {
-	    tree lhs = vn_valueize (gimple_cond_lhs (last));
-	    tree rhs = vn_valueize (gimple_cond_rhs (last));
-	    tree val = gimple_simplify (gimple_cond_code (last),
-					boolean_type_node, lhs, rhs,
-					NULL, vn_valueize);
+	    tree lhs = lookup_equiv_head (gimple_cond_lhs (last), bb);
+	    tree rhs = lookup_equiv_head (gimple_cond_rhs (last), bb);
+	    tree ops[2];
+	    ops[0] = lhs;
+	    ops[1] = rhs;
+	    tree val = fold_const_from_equiv_heads (2, gimple_cond_code (last),
+						    ops, boolean_type_node);
+	    if (val && (dump_flags & TDF_DETAILS))
+	      {
+		fprintf (dump_file, "Got simplified result ");
+		print_generic_expr (dump_file, val, TDF_NONE);
+		fprintf (dump_file, " for ");
+		print_gimple_stmt (dump_file, last, TDF_SLIM);
+	      }
+	    vn_nary_op_t vnresult = NULL;
 	    /* If the condition didn't simplfy see if we have recorded
 	       an expression from sofar taken edges.  */
 	    if (! val || TREE_CODE (val) != INTEGER_CST)
 	      {
-		vn_nary_op_t vnresult;
-		tree ops[2];
-		ops[0] = lhs;
-		ops[1] = rhs;
 		val = vn_nary_op_lookup_pieces (2, gimple_cond_code (last),
 						boolean_type_node, ops,
 						&vnresult);
@@ -7436,6 +7856,15 @@ process_bb (rpo_elim &avail, basic_block bb,
 		    if (false_e)
 		      insert_related_predicates_on_edge (icode, ops, false_e);
 		  }
+
+		/* Try to record equivalences generated by previous PHI nodes
+		    on current true & false edges.  */
+		if (true_e)
+		  record_equiv_from_prev_phi (vnresult, true_e, bb, code, ops,
+					      boolean_true_node);
+		if (false_e)
+		  record_equiv_from_prev_phi (vnresult, false_e, bb, code, ops,
+					      boolean_false_node);
 	      }
 	    break;
 	  }
@@ -7558,6 +7987,7 @@ struct unwind_state
   vn_phi_t phi_top;
   vn_nary_op_t nary_top;
   vn_avail *avail_top;
+  nary_ref *nary_ref_top;
 };
 
 /* Unwind the RPO VN state for iteration.  */
@@ -7607,6 +8037,17 @@ do_unwind (unwind_state *to, rpo_elim &avail)
       av->next = avail.m_avail_freelist;
       avail.m_avail_freelist = av;
     }
+  /* Drain nary_refs to freelist, to reuse the memory.  */
+  for (; last_pushed_nary_ref && last_pushed_nary_ref->ref != to->nary_ref_top;)
+    {
+      vn_ssa_aux_t val = last_pushed_nary_ref;
+      nary_ref *ref = val->ref;
+      val->ref = ref->next;
+      last_pushed_nary_ref = ref->next_undo;
+      ref->next = nary_ref_freelist;
+      nary_ref_freelist = ref;
+      --val->num_nary_ref;
+    }
 }
 
 /* Do VN on a SEME region specified by ENTRY and EXIT_BBS in FN.
@@ -7720,6 +8161,8 @@ do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
   last_inserted_phi = NULL;
   last_inserted_nary = NULL;
   last_pushed_avail = NULL;
+  last_pushed_nary_ref = NULL;
+  nary_ref_freelist = NULL;
 
   vn_valueize = rpo_vn_valueize;
 
@@ -7814,6 +8257,8 @@ do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
 	    rpo_state[idx].nary_top = last_inserted_nary;
 	    rpo_state[idx].avail_top
 	      = last_pushed_avail ? last_pushed_avail->avail : NULL;
+	    rpo_state[idx].nary_ref_top
+	      = last_pushed_nary_ref ? last_pushed_nary_ref->ref : NULL;
 	  }
 
 	if (!(bb->flags & BB_EXECUTABLE))
diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h
index 7d53ab5e39f..0476dc07292 100644
--- a/gcc/tree-ssa-sccvn.h
+++ b/gcc/tree-ssa-sccvn.h
@@ -217,6 +217,16 @@ struct vn_avail
   struct vn_ssa_aux *next_undo;
 };
 
+/* In vn_ssa_aux structure, hold a lists of references to the nary map entries,
+   so that when recording new equivalence to the value number, we can re-insert
+   expressions' results based on this.  */
+struct nary_ref
+{
+  nary_ref *next;
+  vn_nary_op_t slot;
+  struct vn_ssa_aux *next_undo;
+};
+
 typedef struct vn_ssa_aux
 {
   /* SSA name this vn_ssa_aux is associated with in the lattice.  */
@@ -230,9 +240,15 @@ typedef struct vn_ssa_aux
      for SSA names also serving as values (NAME == VALNUM).  */
   vn_avail *avail;
 
+  /* References to slots in the nary map, with VALNUM as operand.  */
+  nary_ref *ref;
+
   /* Unique identifier that all expressions with the same value have. */
   unsigned int value_id;
 
+  /* nary_ref count.  */
+  unsigned short num_nary_ref;
+
   /* Whether the SSA_NAME has been processed at least once.  */
   unsigned visited : 1;
 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
  2021-11-15 17:24       ` Di Zhao OS
@ 2021-11-18 15:47         ` Andrew MacLeod
  2021-12-03  6:42         ` Di Zhao OS
  1 sibling, 0 replies; 7+ messages in thread
From: Andrew MacLeod @ 2021-11-18 15:47 UTC (permalink / raw)
  To: Di Zhao OS, Richard Biener; +Cc: gcc-patches, Aldy Hernandez

On 11/15/21 12:24, Di Zhao OS via Gcc-patches wrote:
> Attached is the updated patch. Fixed some errors in testcases.
>
>> -----Original Message-----
>> From: Richard Biener <richard.guenther@gmail.com>
>> Sent: Wednesday, November 10, 2021 5:44 PM
>> To: Di Zhao OS <dizhao@os.amperecomputing.com>
>> Cc: gcc-patches@gcc.gnu.org; Andrew MacLeod <amacleod@redhat.com>
>> Subject: Re: [PATCH v2] tree-optimization/101186 - extend FRE with
>> "equivalence map" for condition prediction
>>
>> On Sun, Oct 24, 2021 at 9:03 PM Di Zhao OS
>> <dizhao@os.amperecomputing.com> wrote:
>>> Hi,
>>>
>>> Attached is a new version of the patch, mainly for improving performance
>>> and simplifying the code.
>> The patch doesn't apply anymore, can you update it please?
>>
>> I see the new ssa-fre-101.c test already passing without the patch.
> It was a mistake in test ssa-fre-101.c::g to define the variables with
> the unsigned integers, in this way "a >= 0" is always true. After modified
> the case, now fre1 in the patch can remove unreachable call "foo ()", and
> evrp on trunk does not.
>
>> Likewise ssa-fre-100.c and ssa-fre-102.c would PASS if you scan
>> the pass dump after fre1 which is evrp so it seems that evrp already
>> handles the equivalences (likely with the relation oracle) now?
>> I'm sure there are second order effects when eliminating conditions
>> in FRE but did you re-evaluate what made you improve VN to see
>> if the cases are handled as expected now without this change?
> In case ssa-fre-102.c, the unreachable call "foo ()" can be removed by
> evrp, but fre in the patch can additionally replace "g = x + b" with
> "g = 99". (Again I'm sorry the regex to check this was wrong..)
>
> Test cases to simulate the original problem I found are moved into
> gcc.dg/tree-ssa/ssa-pre-34.c. The unreachable calls to "foo ()" are
> still removed by pre with the patch. (If compiled with -O3, the
> "foo ()"s in test file can now be removed by thread2/threadfull2 and
> dom3 on trunk. This relies on jump threading across the loops, so
> even with -O3, similar cases may not get optimized say if there're
> too many statements to copy.)
>
> Thanks,
> Di Zhao
>
>> I will still look at and consider the change btw, but given the EVRP
>> improvements I'm also considering to remove the predication
>> support from VN alltogether.  At least in the non-iterating mode
>> it should be trivially easy to use rangers relation oracle to simplify
>> predicates.  For the iterating mode it might not be 100% effective
>> since I'm not sure we can make it use the current SSA values and
>> how it would behave with those eventually changing to worse.
>>
>> Andrew, how would one ask the relation oracle to simplify a
>> condition?  Do I have to do any bookkeeping to register
>> predicates on edges for it?
>>
Sorry, missed this.

Presuming you have an active ranger,  the range_query class has a 
query_relation method which works on either a stmt or an edge. That 
method has an optional  flag (get_range) which defaults to true, and if 
true, it triggers a range query before asking if there is a 
relationship.. Which means even if you have done nothing else, it will 
go pre-populate whatever is necessary in order determine those ranges at 
that point which means filling in the relation oracle.

What that boils down to is on any edge or stmt,  you can ask:

   get_range_query (cfun)->query_relation (edge e, a_5, b_8)
And it will go and figure out the ranges of a_5 and b_8 at the point, 
and any relations between them.   so bookkeeping should already be done, 
you just have to ask.

if you want to simplify a condition, say you are looking at

c_3 = b_4 < x_7

you dont need to do anything more than ask for the range_of_stmt ().  if 
its [1,1] or [0,0] you already have your answer. similarly if that was 
an if (b_4 < x_7).. There doesn't need to be a LHS.

If you want to manually do it for whatever reason, ie you are tracking a 
table or something,   then

query_relation(stmt, b_4, x_7)  will return a tree code for the 
relation,  whatever it knows..  ie,  maybe LT_EXPR or  VREL_NONE if nothing.

value-relation.h has a set of transformation routines:

// General relation kind transformations.
relation_kind relation_union (relation_kind r1, relation_kind r2);
relation_kind relation_intersect (relation_kind r1, relation_kind r2);
relation_kind relation_negate (relation_kind r);
relation_kind relation_swap (relation_kind r);

so you can intersect or union 2 relations to get a result.   This is 
effectively what asking for the range of c_3 in the above example does 
under the covers.

It gets the known relation, intersects it with the condition on the 
stmt, and if the result is VREL_EMPTY (empty intersect), the condition 
is not true, and you get [0,0]
Similarly, if the known relation union with the condition on the stmt is 
the same as the stmt's condition, then we know it is a subset and 
therefore true, and return [1,1].   everything else produces [0,1]/VARYING.

I'm sorry I haven't had a chance to document all this stuff yet... I 
have most of December scheduled for documentation :-)

Andrew

PS. if you have expressions as trees you are tracking instead of in the 
IL, the range_of_expr() query supports arbitrary expressions trees I 
believe (which I think we can leverage eventually into predicate 
anaylsis) .  so if you have stashed the tree (a_3 < b_6 && a_3 > d_2) 
somewhere, and wonder what that evaluates at some stmt, you simply query:
   if (range_of_expr (r, expr_tree, stmt))   and you should get a [0,0] 
[1,1] or [0,1] based on what the values are at that point.
Also note that by using ranger this way instead of just looking at 
relations, it will also pick up the ranges which can cause one or more 
expressions to be true

Ie, if at that point in the IL  a_3 = [100, 200] and d_2 = [0, 14] the  
a_3 > d_2 will be true, regardless of whether there is a relation 
between them.

This should all work, but hasn't been super extensively stressed yet.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction
  2021-11-15 17:24       ` Di Zhao OS
  2021-11-18 15:47         ` Andrew MacLeod
@ 2021-12-03  6:42         ` Di Zhao OS
  1 sibling, 0 replies; 7+ messages in thread
From: Di Zhao OS @ 2021-12-03  6:42 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 20858 bytes --]

I'm very sorry there seems to be encoding issue in the attachment
in my last email. Attached is the new patch.  

Thanks,
Di Zhao

> -----Original Message-----
> From: Di Zhao OS
> Sent: Tuesday, November 16, 2021 1:24 AM
> To: 'Richard Biener' <richard.guenther@gmail.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [PATCH v2] tree-optimization/101186 - extend FRE with
> "equivalence map" for condition prediction
> 
> Attached is the updated patch. Fixed some errors in testcases.
> 
> > -----Original Message-----
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: Wednesday, November 10, 2021 5:44 PM
> > To: Di Zhao OS <dizhao@os.amperecomputing.com>
> > Cc: gcc-patches@gcc.gnu.org; Andrew MacLeod <amacleod@redhat.com>
> > Subject: Re: [PATCH v2] tree-optimization/101186 - extend FRE with
> > "equivalence map" for condition prediction
> >
> > On Sun, Oct 24, 2021 at 9:03 PM Di Zhao OS
> > <dizhao@os.amperecomputing.com> wrote:
> > >
> > > Hi,
> > >
> > > Attached is a new version of the patch, mainly for improving performance
> > > and simplifying the code.
> >
> > The patch doesn't apply anymore, can you update it please?
> >
> > I see the new ssa-fre-101.c test already passing without the patch.
> 
> It was a mistake in test ssa-fre-101.c::g to define the variables with
> the unsigned integers, in this way "a >= 0" is always true. After modified
> the case, now fre1 in the patch can remove unreachable call "foo ()", and
> evrp on trunk does not.
> 
> > Likewise ssa-fre-100.c and ssa-fre-102.c would PASS if you scan
> > the pass dump after fre1 which is evrp so it seems that evrp already
> > handles the equivalences (likely with the relation oracle) now?
> > I'm sure there are second order effects when eliminating conditions
> > in FRE but did you re-evaluate what made you improve VN to see
> > if the cases are handled as expected now without this change?
> 
> In case ssa-fre-102.c, the unreachable call "foo ()" can be removed by
> evrp, but fre in the patch can additionally replace "g = x + b" with
> "g = 99". (Again I'm sorry the regex to check this was wrong..)
> 
> Test cases to simulate the original problem I found are moved into
> gcc.dg/tree-ssa/ssa-pre-34.c. The unreachable calls to "foo ()" are
> still removed by pre with the patch. (If compiled with -O3, the
> "foo ()"s in test file can now be removed by thread2/threadfull2 and
> dom3 on trunk. This relies on jump threading across the loops, so
> even with -O3, similar cases may not get optimized say if there're
> too many statements to copy.)
> 
> Thanks,
> Di Zhao
> 
> >
> > I will still look at and consider the change btw, but given the EVRP
> > improvements I'm also considering to remove the predication
> > support from VN alltogether.  At least in the non-iterating mode
> > it should be trivially easy to use rangers relation oracle to simplify
> > predicates.  For the iterating mode it might not be 100% effective
> > since I'm not sure we can make it use the current SSA values and
> > how it would behave with those eventually changing to worse.
> >
> > Andrew, how would one ask the relation oracle to simplify a
> > condition?  Do I have to do any bookkeeping to register
> > predicates on edges for it?
> >
> > Thanks,
> > Richard.
> >
> > > First, regarding the comments:
> > >
> > > > -----Original Message-----
> > > > From: Richard Biener <richard.guenther@gmail.com>
> > > > Sent: Friday, October 1, 2021 9:00 PM
> > > > To: Di Zhao OS <dizhao@os.amperecomputing.com>
> > > > Cc: gcc-patches@gcc.gnu.org
> > > > Subject: Re: [PATCH v2] tree-optimization/101186 - extend FRE with
> > > > "equivalence map" for condition prediction
> > > >
> > > > On Thu, Sep 16, 2021 at 8:13 PM Di Zhao OS
> > > > <dizhao@os.amperecomputing.com> wrote:
> > > > >
> > > > > Sorry about updating on this after so long. It took me much time to work out a
> > > > > new plan and pass the tests.
> > > > >
> > > > > The new idea is to use one variable to represent a set of equal variables at
> > > > > some basic-block. This variable is called a "equivalence head" or "equiv-head"
> > > > > in the code. (There's no-longer a "equivalence map".)
> > > > >
> > > > > - Initially an SSA_NAME's "equivalence head" is its value number. Temporary
> > > > >   equivalence heads are recorded as unary NOP_EXPR results in the vn_nary_op_t
> > > > >   map. Besides, when inserting into vn_nary_op_t map, make the new result at
> > > > >   front of the vn_pval list, so that when searching for a variable's
> > > > >   equivalence head, the first result represents the largest equivalence set at
> > > > >   current location.
> > > > > - In vn_ssa_aux_t, maintain a list of references to valid_info->nary entry.
> > > > >   For recorded equivalences, the reference is result->entry; for normal N-ary
> > > > >   operations, the reference is operand->entry.
> > > > > - When recording equivalences, if one side A is constant or has more refs, make
> > > > >   it the new equivalence head of the other side B. Traverse B's ref- list,if a
> > > > >   variable C's previous equiv-head is B, update to A. And re-insert B's n-ary
> > > > >   operations by replacing B with A.
> > > > > - When inserting and looking for the results of n-ary operations, insert and
> > > > >   lookup by the operands' equiv-heads.
> > > > > ...
> > > > >
> > > > > Thanks,
> > > > > Di Zhao
> > > > >
> > > > > --------
> > > > > Extend FRE with temporary equivalences.
> > > >
> > > > Comments on the patch:
> > > >
> > > > +  /* nary_ref count.  */
> > > > +  unsigned num_nary_ref;
> > > > +
> > > >
> > > > I think a unsigned short should be enough and that would nicely
> > > > pack after value_id together with the bitfield (maybe change that
> > > > to unsigned short :1 then).
> > >
> > > Changed num_nary_ref to unsigned short and moved after value_id.
> > >
> > > > @@ -7307,17 +7839,23 @@ process_bb (rpo_elim &avail, basic_block bb,
> > > >             tree val = gimple_simplify (gimple_cond_code (last),
> > > >                                         boolean_type_node, lhs, rhs,
> > > >                                         NULL, vn_valueize);
> > > > +           vn_nary_op_t vnresult = NULL;
> > > >             /* If the condition didn't simplfy see if we have recorded
> > > >                an expression from sofar taken edges.  */
> > > >             if (! val || TREE_CODE (val) != INTEGER_CST)
> > > >               {
> > > > -               vn_nary_op_t vnresult;
> > > >
> > > > looks like you don't need vnresult outside of the if()?
> > >
> > > vnresult is reused later to record equivalences generated by PHI nodes.
> > >
> > > > +/* Find predicated value of vn_nary_op by the operands' equivalences. Return
> > > > + * NULL_TREE if no known result is found.  */
> > > > +
> > > > +static tree
> > > > +find_predicated_value_by_equivs (vn_nary_op_t vno, basic_block bb,
> > > > +                                vn_nary_op_t *vnresult)
> > > > +{
> > > > +  lookup_equiv_heads (vno->length, vno->op, vno->op, bb);
> > > > +  tree result
> > > > +    = simplify_nary_op (vno->length, vno->opcode, vno->op, vno->type);
> > > >
> > > > why is it necessary to simplify here?  It looks like the caller
> > > > already does this.
> > >
> > > In the new patch, changed the code a little to remove redundant calculation.
> > >
> > > > I wonder whether it's valid to always perform find_predicated_value_by_equivs
> > > > from inside vn_nary_op_get_predicated_value instead of bolting it to only
> > > > a single place?
> > >
> > > Removed function find_predicated_value_by_equivs and inlined the code.
> > >
> > > Because lookup_equiv_head uses vn_nary_op_get_predicated_value, so I left
> > > vn_nary_op_get_predicated_value unchanged. Instead, operands are set to
> > > equiv-heads in init_vn_nary_op_from_stmt. So altogether, predicates are always
> > > inserted and searched by equiv-heads.
> > >
> > > > +
> > > > +static vn_nary_op_t
> > > > +val_equiv_insert (tree op1, tree op2, edge e)
> > > > +{
> > > >
> > > > +  if (is_gimple_min_invariant (lhs))
> > > > +    std::swap (lhs, rhs);
> > > > +  if (is_gimple_min_invariant (lhs) || TREE_CODE (lhs) != SSA_NAME)
> > > > +    /* Possible if invoked from record_equiv_from_previous_cond.  */
> > > > +    return NULL;
> > > >
> > > > Better formulate all of the above in terms of only SSA_NAME checks since...
> > > >
> > > > +  /* Make the hand-side with more recorded n-ary expressions new
> > > > +   * equivalence-head, to make fewer re-insertions.  */
> > > > +  if (TREE_CODE (rhs) == SSA_NAME
> > > > +      && VN_INFO (rhs)->num_nary_ref < VN_INFO (lhs)->num_nary_ref)
> > > > +    std::swap (lhs, rhs);
> > > >
> > > > here LHS needs to be an SSA_NAME.
> > >
> > > Tried to fix this in the new patch.
> > >
> > > > +  /* Record equivalence as unary NOP_EXPR.  */
> > > > +  vn_nary_op_t val
> > > > +    = vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR, TREE_TYPE (lhs),
> > > > +                                            &lhs, rhs, 0, bb);
> > > >
> > > > so you are recording an equivalence of lhs to rhs as (NOP_EXPR)lhs
> > > > with value rhs?
> > > > Presumably you think that's good enough to not overlap with entries
> > > > for "real" IL?
> > >
> > > By checking related codes, it seems to me there's no overlap as long as
> > > equivalences are inserted and searched as predicated_values. It should be easy
> > > to replace this with other preferred opcode.
> > >
> > > > In particular why did you choose to _not_ use (2, EQ_EXPR, boolean_type_node,
> > > > &[lhs, rhs], boolean_true_node, ...) like I think what would be
> > > > inserted for by the
> > > > existing code in process_bb via vn_nary_op_insert_pieces_predicated?
> > > > It would probably felt more natural if the "new" code bolted on the case where
> > > > that is called with an equivalence?
> > >
> > > The "equivalence head"s actually implement Disjoint-sets. So by a hashtable
> > > search in the nary-map, the set of equivalences of a variable can be found.
> > > Also it's easy to find out whether two variables are equal. And the
> > > disjoint-sets are path-compressed by updating "equivalence head" and
> > > re-inserting predicates.
> > >
> > > In the new attached patch, predicates like "A==B is true" and "A!=B is false"
> > > are not inserted, as they can be computed from equivalences. Same with the
> > > predicates derived from them. This reduces insertions of predicates by a lot (
> > > counting the equivalences themselves and re-inserted predicates). Below are
> > > test results on SPEC intrate, comparing with trunk (f45610a4, Oct 19):
> > >
> > >               |more bb |more bb |more stmt|more stmt|less      |less
> > >               |removed |removed |removed  |removed  |predicates|predicates
> > >               |at fre1 |at fre1 |at fre1  |at fre1  |inserted  |inserted
> > > --------------------------------------------------------------------------
> > > 500.perlbench_r| 64    | 1.56%  | 102    | 0.18%   | 51928    | 53.44%
> > > 502.gcc_r      | 679   | 4.88%  | 290    | 0.21%   | 149968   | 65.5%
> > > 505.mcf_r      | 5     | 35.71% | 9      | 1.40%   | 349      | 27.48%
> > > 520.omnetpp    | 132   | 5.36%  | 48     | 0.14%   | 28192    | 58.38%
> > > 523.xalancbmk_r| 231   | 3.15%  | 308    | 0.35%   | 65131    | 58.95%
> > > 525.x264_r     | 4     | 1.36%  | 27     | 0.11%   | 10401    | 40.68%
> > > 531.deepsjeng_r| 1     | 3.45%  | 2      | 0.14%   | 1412     | 53.81%
> > > 541.leela_r    | 2     | 0.62%  | 5      | 0.06%   | 3874     | 48.56%
> > > 548.exchange2_r| 0     | 0.00%  | 3      | 0.04%   | 159      | 3.33%
> > > 557.xz_r       | 0     | 0.00%  | 3      | 0.07%   | 1894     | 52.39%
> > >
> > > Considering the copying and unwinding work saved, this patch could reduce
> > > compile time rather than increase it.
> > >
> > > > Btw, I didn't find any condition that prevents equivalences being used for
> > > > non-integer types?  Are you sure this is all valid for FP compares without
> > > > further restrictions?
> > >
> > > I think this approach follows sccvn's previous decisions on FP comparisons,
> > > for equivalences are inserted only when EQ_EXPRs were evaluated to be true.
> > > The patch is tested on regression tests and SPEC fprate cases.
> > >
> > > >
> > > > @@ -4224,36 +4404,345 @@ vn_nary_op_insert_pieces_predicated (unsigned
> > > > int length, enum tree_code code,
> > > >        fprintf (dump_file, " == %s\n",
> > > >                integer_zerop (result) ? "false" : "true");
> > > >      }
> > > > -  vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
> > > > -  init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
> > > > -  vno1->predicated_values = 1;
> > > > -  vno1->u.values = (vn_pval *) obstack_alloc (&vn_tables_obstack,
> > > > -                                             sizeof (vn_pval));
> > > > -  vno1->u.values->next = NULL;
> > > > -  vno1->u.values->result = result;
> > > > -  vno1->u.values->n = 1;
> > > > -  vno1->u.values->valid_dominated_by_p[0] = pred_e->dest->index;
> > > > -  return vn_nary_op_insert_into (vno1, valid_info->nary, true);
> > > > +  /* Insert by operands' equiv-heads.  */
> > > > +  tree new_ops[3];
> > > > +  lookup_equiv_heads (length, ops, new_ops, pred_e->dest);
> > > > +  /* Skip if the result is known.  */
> > > > +  tree simplified = simplify_nary_op (length, code, new_ops, type);
> > > > +  if (simplified)
> > > > +    {
> > > > +      if (dump_file && (dump_flags & TDF_DETAILS))
> > > > +       {
> > > > +         fprintf (dump_file, "Result is known: ");
> > > > +         print_generic_expr (dump_file, simplified, TDF_NONE);
> > > > +         fprintf (dump_file, ", skipped.\n");
> > > > +       }
> > > > +      return NULL;
> > > > +    }
> > > >
> > > > soo - how do the existing predicated expressions change to be handled here?
> > > > Why would they ever be simplified?  Should this simplification not better happen
> > > > upthread when visiting the conditional?  Maybe that needs to use the equivs
> > > > already?
> > >
> > > This happens when updating variable A's equiv_head from B to C, and
> > > re-inserting former predicates of B with C. For example, "x < 0" is recorded,
> > > then x is assigned with new equiv_head -1 at some location, then -1 < 0 can
> > > simplify. It shouldn't happen when inserting new predicates.
> > >
> > > >
> > > > You re-use the predicated value list which I think is great, can you explain
> > > > how "latest" is meant in the following?
> > > >
> > > > @@ -4145,7 +4264,12 @@ vn_nary_op_insert_into (vn_nary_op_t vno,
> > > > vn_nary_op_table_type *table,
> > > >               next = &(*next)->next;
> > > >             }
> > > >           if (!found)
> > > > -           *next = nval;
> > > > +           {
> > > > +             /* Insert new value at the front, so that lookup_equiv_head can
> > > > +              * find the latest result.  */
> > > > +             nval->next = vno->u.values;
> > > > +             vno->u.values = nval;
> > > > +           }
> > >
> > > It meant the "most general" result. (I picked the wrong word. The description
> > > is changed in the new patch file. )
> > >
> > > There can be multiple valid results at the same time. As we traverse in RPO and
> > > update equiv-heads (i.e. unite disjoint-sets), the last inserted result
> > > represents the most complete set of equivalences of the variable.
> > >
> > > (In the last patch version, when "Appending predicate to value", the new result
> > > is not moved to the front. That violates the property of equiv-heads and some
> > > opportunities will be lost. Fixed this in the new patch file.)
> > >
> > > >
> > > > OK, I have to stop now but will try to understand and review further next week.
> > > > Sorry for the delays but the patch is quite complex to understand :/
> > > >
> > > > Richard.
> > > >
> > > > > 2021-09-16  Di Zhao  <dizhao@os.amperecomputing.com>
> > > > >
> > >
> > > Other changes:
> > > - Fixed some errs in comments.
> > > - On recording temporary equivalences generated by PHI nodes, altered the
> > >   logic, hope it can be a bit more clear.
> > >   1. When a conditional expression's true/false edge E is assessed to be
> > >      executable, search for conflicting predicates for E to be taken. If a
> > >      conflicting predicate is found at L, and L's single predecessor P
> > >      dominates current BB, then we know P->L cannot be taken if E is taken.
> > >   2. Process phi_bb (immediately dominated by P, and dominates BB), rule out
> > >      PHI arguments (x2) that can only come from P->L. If there's a single
> > >      PHI argument (x1) left, record the equivalence of x1 and PHI result x
> > >      at edge E.
> > >           -----
> > >           | P |
> > >           -----
> > >            |  \
> > >            |   -----
> > >            |   | L |  <== conflicting predicate's location
> > >            |   -----
> > >            |    /
> > >   -------------------------
> > >   | x = PHI<x1(P), x2(L)> | phi_bb
> > >   -------------------------
> > >              |
> > >             ....
> > >              |
> > >            -----
> > >            | BB | <== current bb
> > >            -----
> > >           /     \ edge E
> > >         ...     -------
> > >                 | dest |
> > >                 -------
> > >
> > > - Fixed a correctness problem when recording equivalences from PHI in previous
> > >   version. Also test case ssa-fre-95.c:k in previous version was incorrect,
> > >   fixed that.
> > >
> > > Thanks,
> > > Di Zhao
> 
> ---
> 
> Extend FRE with temporary equivalences.
> 
> 2021-11-15  Di Zhao  <dizhao@os.amperecomputing.com>
> 
> gcc/ChangeLog:
>         PR tree-optimization/101186
>         * tree-ssa-sccvn.c (VN_INFO): remove assertions (there could be a
>         predicate already).
>         (dominated_by_p_w_unex): Moved upward.
>         (vn_nary_op_get_predicated_value): Moved upward.
>         (is_vn_valid_at_bb): Check if vn_pval is valid at BB.
>         (lookup_equiv_head): Lookup the "equivalence head" of given node.
>         (lookup_equiv_heads): Lookup the "equivalence head"s of given nodes.
>         (vn_tracking_edge): Extracted utility function.
>         (init_vn_nary_op_from_stmt): Insert and lookup by "equivalence head"s.
>         (vn_nary_op_insert_into): Insert new value at the front.
>         (vn_nary_op_insert_pieces_predicated_1): Insert as predicated values
>         from pieces.
>         (fold_const_from_equiv_heads): Fold N-ary expression of equiv-heads.
>         (push_new_nary_ref): Insert a back-reference to vn_nary_op_t.
>         (val_equiv_insert): Record temporary equivalence.
>         (vn_nary_op_insert_pieces_predicated): Record equivalences instead of
>         some predicates; insert back-refs.
>         (record_equiv_from_prev_phi_1): Record temporary equivalences generated
>         by PHI nodes.
>         (record_equiv_from_prev_phi): Given an outgoing edge of a conditional
>         expression taken, record equivalences generated by PHI nodes.
>         (visit_nary_op): Add lookup previous results of N-ary operations by
>         equivalences.
>         (insert_related_predicates_on_edge): Some predicates can be computed
>         from equivalences, no need to insert them.
>         (process_bb): Add lookup predicated values by equivalenc~/tmp/time- frees.
>         (struct unwind_state): Unwind state of back-refs to vn_nary_op_t.
>         (do_unwind): Unwind the back-refs to vn_nary_op_t.
>         (do_rpo_vn): Update back-reference unwind state.
>         * tree-ssa-sccvn.h (struct nary_ref): hold a lists of references to the
>         nary map entries.
> 
> gcc/testsuite/ChangeLog:
> 
>         * gcc.dg/tree-ssa/pr68619-2.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-1.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-2.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-3.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-5.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-7.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-8.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-9.c: Disable fre.
>         * gcc.dg/tree-ssa/vrp03.c: Disable fre.
>         * gcc.dg/tree-ssa/ssa-fre-100.c: New test.
>         * gcc.dg/tree-ssa/ssa-fre-101.c: New test.
>         * gcc.dg/tree-ssa/ssa-fre-102.c: New test.
>         * gcc.dg/tree-ssa/ssa-pre-34.c: New test.

[-- Attachment #2: v4-tree-optimization-101186.patch --]
[-- Type: application/octet-stream, Size: 36430 bytes --]

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c
index cca706e0c4f..0a99c3c620e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68619-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dom2-details -w" } */
+/* { dg-options "-O2 -fno-tree-fre -fdump-tree-dom2-details -w" } */
 
 typedef union tree_node *tree;
 struct gcc_options
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c
index ac8271cc574..e9a797e82b0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */ 
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c
index b2c09cbb021..bcbc419575c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c
index 2316f7e1c04..d63291da440 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 int f(int x, int y)
 {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
index e7038d0237f..c1e5667b45a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 static inline long load(long *p)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-7.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-7.c
index b44c064aa23..069e4a106c3 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-8.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-8.c
index 26e5abbdc29..26b303e896e 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-8.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-9.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-9.c
index 22b68d5ede0..3ed3bf6ad29 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-9.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
+/* { dg-options "-O2 -fno-tree-vrp -fno-tree-fre -fdump-tree-dom-details" } */
 
 
 int f(int x, int y)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
new file mode 100644
index 00000000000..ccb0fdaa698
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+extern void bar(void);
+extern void boo(void);
+extern void foo(void);
+
+/* Test temporary equivalences: basic.  */
+
+void f (unsigned int a, unsigned int b)
+{
+  if (a == b)
+    {
+      for (unsigned i = 0; i < a; i++)
+	{
+	  bar ();
+	  if (i >= b)
+	    /* Unreachable.  */
+	    foo ();
+	}
+    }
+}
+
+/* Test temporary equivalences: transitive.  */
+
+void g (unsigned int a, unsigned int b, unsigned int c)
+{
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a == b)
+	{
+	  if (b == c)
+	    {
+	      boo ();
+	      if (i >= a)
+		/* Unreachable.  */
+		foo ();
+	    }
+	}
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "foo" "fre1" } } */
+/* { dg-final { scan-tree-dump "bar" "fre1" } } */
+/* { dg-final { scan-tree-dump "boo" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-101.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-101.c
new file mode 100644
index 00000000000..6178437a309
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-101.c
@@ -0,0 +1,83 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+extern void bar(void);
+extern void bag(void);
+extern void bas(void);
+extern void foo(void);
+
+void f (unsigned int a, unsigned int b)
+{
+  if (a == b)
+    {
+      for (unsigned i = a; i < 100; i++)
+	{
+	  if (i > b)
+	    /* Should not be eliminated.  */
+	    bar ();
+	}
+    }
+}
+
+/* Test temporary equivalences: derived condition.  */
+
+int gg;
+
+void g (int a, int b, int d)
+{
+  int c = 100;
+  if (a > 0)
+    {
+      c = b;
+      gg++;
+    }
+  else
+    /* a <= 0 */
+    {
+      c = d;
+      gg--;
+    }
+  for (int i = 0; i < c; i++)
+    {
+      if (a >= 0)
+	{
+	  if (i >= b)
+	    /* Should not be eliminated. ("a >= 0" cannot derive "a > 0".)  */
+	    bas ();
+	}
+      else
+	/* a < 0 */
+	{
+	  if (i >= d)
+	    /* Should be eliminated, as "a < 0" can derive "a <= 0".  */
+	    foo ();
+	}
+    }
+}
+
+/* Test temporary equivalences: no domination.  */
+
+void h (unsigned int a, unsigned int b)
+{
+  unsigned int c = 100;
+  if (b % 2)
+    {
+      if (a != 0)
+	c = b;
+    }
+  for (unsigned i = 0; i < c; i++)
+    {
+      if (a != 0)
+	{
+	  if (i >= b)
+	    /* Should not be eliminated.  */
+	    bag ();
+	}
+      i++;
+    }
+}
+
+/* { dg-final { scan-tree-dump "bar" "fre1" } } */
+/* { dg-final { scan-tree-dump "bas" "fre1" } } */
+/* { dg-final { scan-tree-dump "bag" "fre1" } } */
+/* { dg-final { scan-tree-dump-not "foo" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c
new file mode 100644
index 00000000000..3c8900c4219
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-102.c
@@ -0,0 +1,45 @@
+
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+extern void bar(void);
+extern void bag(void);
+extern void foo(void);
+int g;
+
+/* Test temporary equivalences: optimize N-ary expressions.  */
+
+void
+f (unsigned int a, unsigned int b, unsigned int x, unsigned int y)
+{
+  if (a == b)
+    {
+      g = 100;
+      if (a + x == 99)
+	g = x + b; /* should be simplified to 99 */
+    }
+  else
+    // a!= b
+    {
+      if (a == 100 - x)
+	{
+	  g++;
+	  if (b == 100 - x)
+	    foo (); /* should be removed */
+	}
+      else
+	{
+	  g--;
+	  if (b == 100 - x)
+	    bag (); /* should not be removed */
+	  else
+	    bar (); /* should not be removed */
+	}
+    }
+}
+
+/* { dg-final { scan-tree-dump "bag" "fre1" } } */
+/* { dg-final { scan-tree-dump "bar" "fre1" } } */
+/* { dg-final { scan-tree-dump-not "foo" "fre1" } } */
+/* { dg-final { scan-tree-dump-not "= b_\[0-9]+\\(\[A-Z]+\\) \\+ x_" "fre1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-34.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-34.c
new file mode 100644
index 00000000000..091a67ec0a2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-34.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+extern void foo(void);
+extern void bag(void);
+extern void bas(void);
+
+/* Test temporary equivalences: identical condition.  */
+
+void h (int a, int b, int x, int y)
+{
+  int c = y;
+  if (a != 0)
+    c = x;
+  while (b < 1000)
+    {
+      if (a != 0)
+	{
+	  if (c > x)
+	    /* Unreachable.  */
+	    foo ();
+	}
+      else
+	bas ();
+      b++;
+    }
+}
+
+/* Test temporary equivalences: derived condition.  */
+
+void k (int a, int b, int x, int y)
+{
+  int c = y;
+  if (a != b)
+    c = x;
+  for (int i = 0; i < 1000; i++)
+    {
+      if (a < b)
+	/* All paths reaching "a>0 is true" come from "a!=0 is true".  */
+	{
+	  if (c > x)
+	    /* Unreachable.  */
+	    foo ();
+	}
+      else
+	bag ();
+    }
+}
+
+/* { dg-final { scan-tree-dump-not "foo \\(\\)" "pre" } } */
+/* { dg-final { scan-tree-dump "bag" "pre" } } */
+/* { dg-final { scan-tree-dump "bas" "pre" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
index 4cbaca41332..69b83b6cc30 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp03.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1 -fno-thread-jumps" } */
+/* { dg-options "-O2 -fdisable-tree-evrp -fdump-tree-vrp1 -fno-thread-jumps -fno-tree-fre" } */
 
 struct A
 {
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 149674e6a16..2061647ae06 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -353,6 +353,8 @@ static vn_reference_t last_inserted_ref;
 static vn_phi_t last_inserted_phi;
 static vn_nary_op_t last_inserted_nary;
 static vn_ssa_aux_t last_pushed_avail;
+static vn_ssa_aux_t last_pushed_nary_ref;
+static nary_ref* nary_ref_freelist;
 
 /* Valid hashtables storing information we have proven to be
    correct.  */
@@ -491,9 +493,9 @@ VN_INFO (tree name)
 	    nary->predicated_values = 0;
 	    nary->u.result = boolean_true_node;
 	    vn_nary_op_insert_into (nary, valid_info->nary);
-	    gcc_assert (nary->unwind_to == NULL);
 	    /* Also do not link it into the undo chain.  */
 	    last_inserted_nary = nary->next;
+	    /* There could be a predicate already.  */
 	    nary->next = (vn_nary_op_t)(void *)-1;
 	    nary = alloc_vn_nary_op_noinit (2, &vn_tables_insert_obstack);
 	    init_vn_nary_op_from_pieces (nary, 2, EQ_EXPR,
@@ -501,7 +503,6 @@ VN_INFO (tree name)
 	    nary->predicated_values = 0;
 	    nary->u.result = boolean_false_node;
 	    vn_nary_op_insert_into (nary, valid_info->nary);
-	    gcc_assert (nary->unwind_to == NULL);
 	    last_inserted_nary = nary->next;
 	    nary->next = (vn_nary_op_t)(void *)-1;
 	    if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3850,6 +3851,114 @@ vn_reference_insert_pieces (tree vuse, alias_set_type set,
   return vr1;
 }
 
+static bool
+dominated_by_p_w_unex (basic_block bb1, basic_block bb2, bool);
+
+static tree
+vn_nary_op_get_predicated_value (vn_nary_op_t vno, basic_block bb)
+{
+  if (! vno->predicated_values)
+    return vno->u.result;
+  for (vn_pval *val = vno->u.values; val; val = val->next)
+    for (unsigned i = 0; i < val->n; ++i)
+      {
+	/* Do not handle backedge executability optimistically since
+	 when figuring out whether to iterate we do not consider
+	 changed predication.  */
+	if (dominated_by_p_w_unex (
+	      bb, BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]),
+	      false))
+	return val->result;
+      }
+  return NULL_TREE;
+}
+
+/* Whether a predicated value VAL is valid at BB.  */
+
+static inline bool
+is_vn_valid_at_bb (vn_pval *val, basic_block bb)
+{
+  for (unsigned i = 0; i < val->n; ++i)
+    {
+      basic_block val_bb
+	= BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]);
+      if (dominated_by_p_w_unex (bb, val_bb, false))
+	return true;
+    }
+  return false;
+}
+
+/* Lookup the variable that represents a set of equivalences of NAME at
+   basic-block BB.  This variable is called the "equivalence head" of NAME.  */
+
+static tree
+lookup_equiv_head (tree name, basic_block bb)
+{
+  if (TREE_CODE (name) != SSA_NAME)
+    return name;
+  tree result = VN_INFO (name)->valnum;
+  if (result == VN_TOP)
+    /* VN_TOP is not allowed, otherwise it will be too optimistic for
+       non-iterating mode.  */
+    return name;
+  vn_nary_op_t vnresult;
+  /* As we insert new value at front in vn_nary_op_insert_into, we don't need
+     search iteratively for the most general equiv-head.  */
+  vn_nary_op_lookup_pieces (1, NOP_EXPR, TREE_TYPE (result), &result,
+			    &vnresult);
+  if (vnresult && vnresult->predicated_values)
+    {
+      tree eq = vn_nary_op_get_predicated_value (vnresult, bb);
+      if (eq)
+	return eq;
+    }
+  return result;
+}
+
+/* Lookup the "equivalence head"s of OPS at BB, and store them in TO.  N is the
+   size of OPS.  */
+
+static void
+lookup_equiv_heads (unsigned n, tree *ops, tree *to, basic_block bb)
+{
+  for (unsigned i = 0; i < n; i++)
+    {
+      tree equiv_head = lookup_equiv_head (ops[i], bb);
+      if (dump_file && (dump_flags & TDF_DETAILS) && equiv_head != ops[i])
+	{
+	  fprintf (dump_file, "Use equiv-head ");
+	  print_generic_expr (dump_file, equiv_head, TDF_NONE);
+	  fprintf (dump_file, " for ");
+	  print_generic_expr (dump_file, ops[i], TDF_NONE);
+	  fprintf (dump_file, "\n");
+	}
+      to[i] = equiv_head;
+    }
+}
+
+/* Returns whether edge PRED_E is currently being tracked.  */
+
+static inline bool
+vn_tracking_edge (edge pred_e)
+{
+  if (!single_pred_p (pred_e->dest))
+    {
+      /* Never record for backedges.  */
+      if (pred_e->flags & EDGE_DFS_BACK)
+	return false;
+      edge_iterator ei;
+      edge e;
+      int cnt = 0;
+      /* Ignore backedges.  */
+      FOR_EACH_EDGE (e, ei, pred_e->dest->preds)
+	if (!dominated_by_p (CDI_DOMINATORS, e->src, e->dest))
+	  cnt++;
+      if (cnt != 1)
+	return false;
+    }
+  return true;
+}
+
 /* Compute and return the hash value for nary operation VBO1.  */
 
 static hashval_t
@@ -3983,6 +4092,9 @@ init_vn_nary_op_from_stmt (vn_nary_op_t vno, gassign *stmt)
       for (i = 0; i < vno->length; ++i)
 	vno->op[i] = gimple_op (stmt, i + 1);
     }
+  /* Insert and lookup N-ary results by the operands' equivalence heads.  */
+  if (gimple_bb (stmt))
+    lookup_equiv_heads (vno->length, vno->op, vno->op, gimple_bb (stmt));
 }
 
 /* Compute the hashcode for VNO and look for it in the hash table;
@@ -4128,12 +4240,10 @@ vn_nary_op_insert_into (vn_nary_op_t vno, vn_nary_op_table_type *table)
 	    = BASIC_BLOCK_FOR_FN (cfun, vno->u.values->valid_dominated_by_p[0]);
 	  vn_pval *nval = vno->u.values;
 	  vn_pval **next = &vno->u.values;
-	  bool found = false;
 	  for (vn_pval *val = (*slot)->u.values; val; val = val->next)
 	    {
 	      if (expressions_equal_p (val->result, nval->result))
 		{
-		  found = true;
 		  for (unsigned i = 0; i < val->n; ++i)
 		    {
 		      basic_block val_bb
@@ -4147,17 +4257,15 @@ vn_nary_op_insert_into (vn_nary_op_t vno, vn_nary_op_table_type *table)
 			gcc_unreachable ();
 		    }
 		  /* Append value.  */
-		  *next = (vn_pval *) obstack_alloc (&vn_tables_obstack,
-						     sizeof (vn_pval)
-						     + val->n * sizeof (int));
-		  (*next)->next = NULL;
-		  (*next)->result = val->result;
-		  (*next)->n = val->n + 1;
-		  memcpy ((*next)->valid_dominated_by_p,
-			  val->valid_dominated_by_p,
+		  nval = (vn_pval *) obstack_alloc (&vn_tables_obstack,
+						    sizeof (vn_pval)
+						      + val->n * sizeof (int));
+		  nval->next = NULL;
+		  nval->result = val->result;
+		  nval->n = val->n + 1;
+		  memcpy (nval->valid_dominated_by_p, val->valid_dominated_by_p,
 			  val->n * sizeof (int));
-		  (*next)->valid_dominated_by_p[val->n] = vno_bb->index;
-		  next = &(*next)->next;
+		  nval->valid_dominated_by_p[val->n] = vno_bb->index;
 		  if (dump_file && (dump_flags & TDF_DETAILS))
 		    fprintf (dump_file, "Appending predicate to value.\n");
 		  continue;
@@ -4170,8 +4278,10 @@ vn_nary_op_insert_into (vn_nary_op_t vno, vn_nary_op_table_type *table)
 	      (*next)->next = NULL;
 	      next = &(*next)->next;
 	    }
-	  if (!found)
-	    *next = nval;
+	  /* Put new result at front, so lookup_equiv_head can find the last
+	     inserted (most general) result.  */
+	  nval->next = vno->u.values;
+	  vno->u.values = nval;
 
 	  *slot = vno;
 	  vno->next = last_inserted_nary;
@@ -4216,6 +4326,86 @@ vn_nary_op_insert_pieces (unsigned int length, enum tree_code code,
   return vn_nary_op_insert_into (vno1, valid_info->nary);
 }
 
+/* Similar with vn_nary_op_insert_pieces, but insert as predicated values.  */
+
+static vn_nary_op_t
+vn_nary_op_insert_pieces_predicated_1 (unsigned int length, enum tree_code code,
+				       tree type, tree *ops, tree result,
+				       unsigned int value_id, basic_block bb)
+{
+  vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
+  init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
+  vno1->predicated_values = 1;
+  vno1->u.values
+    = (vn_pval *) obstack_alloc (&vn_tables_obstack, sizeof (vn_pval));
+  vno1->u.values->next = NULL;
+  vno1->u.values->result = result;
+  vno1->u.values->n = 1;
+  vno1->u.values->valid_dominated_by_p[0] = bb->index;
+  return vn_nary_op_insert_into (vno1, valid_info->nary);
+}
+
+/* Fold N-ary expression of equiv-heads, return NULL_TREE if not folded.  */
+
+static tree
+fold_const_from_equiv_heads (unsigned length, tree_code opcode, tree *op,
+			     tree type)
+{
+  tree result = NULL_TREE;
+  /* We're using equiv-heads, i.e. the most general operands, shortcut for
+     situations that can fold.  */
+  switch (length)
+    {
+    case 1:
+      if (CONSTANT_CLASS_P (op[0]))
+	result
+	  = gimple_simplify (opcode, type, op[0], NULL, no_follow_ssa_edges);
+      break;
+    case 2:
+      if ((CONSTANT_CLASS_P (op[0]) && CONSTANT_CLASS_P (op[1]))
+	  || op[0] == op[1])
+	result = gimple_simplify (opcode, type, op[0], op[1], NULL,
+				  no_follow_ssa_edges);
+      break;
+    case 3:
+      if (CONSTANT_CLASS_P (op[0]) && CONSTANT_CLASS_P (op[1])
+	  && CONSTANT_CLASS_P (op[2]))
+	result = gimple_simplify (opcode, type, op[0], op[1], op[2], NULL,
+				  no_follow_ssa_edges);
+      break;
+    default:
+      break;
+    }
+  if (result && !useless_type_conversion_p (type, TREE_TYPE (result)))
+    result = fold_convert (type, result);
+  return result;
+}
+
+/* Insert a back-reference to vn_nary_op_t in vn_ssa_aux.  */
+
+static inline void
+push_new_nary_ref (vn_nary_op_t slot, vn_ssa_aux_t info)
+{
+  nary_ref *ref;
+  if (nary_ref_freelist)
+    {
+      ref = nary_ref_freelist;
+      nary_ref_freelist = nary_ref_freelist->next;
+    }
+  else
+    ref = XOBNEW (&vn_ssa_aux_obstack, nary_ref);
+  ref->next = info->ref;
+  ref->next_undo = last_pushed_nary_ref;
+  ref->slot = slot;
+  last_pushed_nary_ref = info;
+  info->ref = ref;
+  ++info->num_nary_ref;
+  gcc_assert (info->num_nary_ref > 0);
+}
+
+static vn_nary_op_t
+val_equiv_insert (tree op1, tree op2, edge e);
+
 static vn_nary_op_t
 vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
 				     tree type, tree *ops,
@@ -4223,21 +4413,15 @@ vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
 				     edge pred_e)
 {
   /* ???  Currently tracking BBs.  */
-  if (! single_pred_p (pred_e->dest))
-    {
-      /* Never record for backedges.  */
-      if (pred_e->flags & EDGE_DFS_BACK)
-	return NULL;
-      edge_iterator ei;
-      edge e;
-      int cnt = 0;
-      /* Ignore backedges.  */
-      FOR_EACH_EDGE (e, ei, pred_e->dest->preds)
-	if (! dominated_by_p (CDI_DOMINATORS, e->src, e->dest))
-	  cnt++;
-      if (cnt != 1)
-	return NULL;
-    }
+  if (!vn_tracking_edge (pred_e))
+    return NULL;
+  /* Equivalence can represent this.  */
+  if (code == EQ_EXPR && result == boolean_true_node)
+    return val_equiv_insert (ops[0], ops[1], pred_e);
+  /* Leave this for inverted condition.  */
+  if (code == NE_EXPR && result == boolean_false_node)
+    return NULL;
+
   if (dump_file && (dump_flags & TDF_DETAILS)
       /* ???  Fix dumping, but currently we only get comparisons.  */
       && TREE_CODE_CLASS (code) == tcc_comparison)
@@ -4250,36 +4434,252 @@ vn_nary_op_insert_pieces_predicated (unsigned int length, enum tree_code code,
       fprintf (dump_file, " == %s\n",
 	       integer_zerop (result) ? "false" : "true");
     }
-  vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
-  init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
-  vno1->predicated_values = 1;
-  vno1->u.values = (vn_pval *) obstack_alloc (&vn_tables_obstack,
-					      sizeof (vn_pval));
-  vno1->u.values->next = NULL;
-  vno1->u.values->result = result;
-  vno1->u.values->n = 1;
-  vno1->u.values->valid_dominated_by_p[0] = pred_e->dest->index;
-  return vn_nary_op_insert_into (vno1, valid_info->nary);
+  for (unsigned i = 0; i < length; i++)
+    gcc_checking_assert (lookup_equiv_head (ops[i], pred_e->dest) == ops[i]);
+
+  /* Skip if the result is known. (Caused by re-inserting predicates.)  */
+  tree simplified = fold_const_from_equiv_heads (length, code, ops, type);
+  if (simplified)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Result is known: ");
+	  print_generic_expr (dump_file, simplified, TDF_NONE);
+	  fprintf (dump_file, ", skipped.\n");
+	}
+      return NULL;
+    }
+
+  vn_nary_op_t val
+    = vn_nary_op_insert_pieces_predicated_1 (length, code, type, ops,
+					     result, value_id, pred_e->dest);
+  /* Insert back-refs (operand->slot) to vn_ssa_aux structure.  */
+  for (unsigned i = 0; i < length; i++)
+    if (TREE_CODE (ops[i]) == SSA_NAME)
+      {
+	vn_ssa_aux_t info = VN_INFO (ops[i]);
+	if (info->valnum == ops[i])
+	  push_new_nary_ref (val, info);
+      }
+  return val;
 }
 
-static bool
-dominated_by_p_w_unex (basic_block bb1, basic_block bb2, bool);
+/* Record the equivalence of OP1 and OP2 on edge E.  */
 
-static tree
-vn_nary_op_get_predicated_value (vn_nary_op_t vno, basic_block bb)
+static vn_nary_op_t
+val_equiv_insert (tree op1, tree op2, edge e)
 {
-  if (! vno->predicated_values)
-    return vno->u.result;
+  /* Record by equivalence heads.  */
+  basic_block bb = e->dest;
+  tree lhs = lookup_equiv_head (op1, bb);
+  tree rhs = lookup_equiv_head (op2, bb);
+  if (lhs == rhs)
+    return NULL;
+  /* Set RHS to be new equiv-head for LHS.  LHS must be SSA_NAME.  */
+  if (TREE_CODE (lhs) != SSA_NAME)
+    {
+      if (TREE_CODE (rhs) == SSA_NAME)
+	std::swap (lhs, rhs);
+      else
+	return NULL;
+    }
+  vn_ssa_aux_t rhs_info = NULL, lhs_info = VN_INFO (lhs);
+
+  /* Choose the hand-side with more recorded n-ary expressions as new
+     equivalence head, to make fewer re-insertions.  */
+  if (TREE_CODE (rhs) == SSA_NAME)
+    {
+      rhs_info = VN_INFO (rhs);
+      if (rhs_info->num_nary_ref < lhs_info->num_nary_ref)
+	{
+	  std::swap (lhs, rhs);
+	  std::swap (rhs_info, lhs_info);
+	}
+    }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Recording equivalence of ");
+      print_generic_expr (dump_file, lhs);
+      fprintf (dump_file, " and ");
+      print_generic_expr (dump_file, rhs);
+      fprintf (dump_file, " at BB%d\n", bb->index);
+    }
+  /* Record equivalence as unary NOP_EXPR.  */
+  vn_nary_op_t val
+    = vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR, TREE_TYPE (lhs),
+					     &lhs, rhs, 0, bb);
+  /* Back-ref for equivalences is result->slot rather than op->slot, so that
+     when new equiv-head is appended, we can update them.  */
+  if (TREE_CODE (rhs) == SSA_NAME)
+    push_new_nary_ref (val, rhs_info);
+
+  /* Iterate on LHS's references, first update equiv-heads, then re-insert
+     former predicates, so that the result will be latest.  */
+  auto_vec<vn_nary_op_t> predicates;
+  auto_vec<tree> results;
+  vn_nary_op_t slot;
+  tree old_head = lhs_info->valnum;
+  nary_ref *ref = lhs_info->ref;
+  for (; ref; ref = ref->next)
+    {
+      slot = ref->slot;
+      if (!slot->predicated_values)
+	continue;
+      /* Update recorded equivalences with new equiv-head.  */
+      if (slot->opcode == NOP_EXPR && slot->length == 1)
+	{
+	  if (slot->op[0] == rhs)
+	    continue;
+	  for (vn_pval *pval = slot->u.values; pval; pval = pval->next)
+	    if (pval->result == old_head && is_vn_valid_at_bb (pval, bb))
+	      {
+		if (dump_file && (dump_flags & TDF_DETAILS))
+		  {
+		    fprintf (dump_file, "Updating equiv-head of ");
+		    print_generic_expr (dump_file, slot->op[0], TDF_SLIM);
+		    fprintf (dump_file, " to ");
+		    print_generic_expr (dump_file, rhs, TDF_SLIM);
+		    fprintf (dump_file, " at BB%d\n", bb->index);
+		  }
+		vn_nary_op_t new_val
+		  = vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR,
+							   slot->type, slot->op,
+							   rhs, 0, bb);
+		/* Push back-ref (result->slot) for new equivalence.  */
+		if (TREE_CODE (rhs) == SSA_NAME)
+		  push_new_nary_ref (new_val, rhs_info);
+	      }
+	}
+      else
+	/* Predicated values are collected, to be re-inserted later.  */
+	for (vn_pval *pval = slot->u.values; pval; pval = pval->next)
+	  {
+	    if (!is_vn_valid_at_bb (pval, bb))
+	      continue;
+	    predicates.safe_push (slot);
+	    results.safe_push (pval->result);
+	  }
+    }
+  unsigned i;
+  if (predicates.length () > 0 && dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "Re-inserting %d predicates...\n",
+	     predicates.length ());
+  tree new_ops[3];
+  FOR_EACH_VEC_ELT (predicates, i, slot)
+    {
+      lookup_equiv_heads (slot->length, slot->op, new_ops, e->dest);
+      vn_nary_op_insert_pieces_predicated (slot->length, slot->opcode,
+					   slot->type, new_ops, results[i],
+					   slot->value_id, e);
+    }
+  return val;
+}
+
+/* Record temporary equivalences generated by PHI nodes, and are valid when
+   BB's outgoing edge E is taken.  VNO contains the interested expression.  If
+   INVERSE is true, rule out VNO's results that are different with RESULT;
+   otherwise rule out RESULT itself.  */
+
+static void
+record_equiv_from_prev_phi_1 (vn_nary_op_t vno, edge e, basic_block bb,
+			      tree result, bool inverse)
+{
+  if (!(vno && vno->predicated_values))
+    return;
+
+  /* Process predicated values in VNO, looking for a conflicting result, and
+     the location's single_pred dominates BB.  */
   for (vn_pval *val = vno->u.values; val; val = val->next)
-    for (unsigned i = 0; i < val->n; ++i)
-      /* Do not handle backedge executability optimistically since
-	 when figuring out whether to iterate we do not consider
-	 changed predication.  */
-      if (dominated_by_p_w_unex
-	    (bb, BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]),
-	     false))
-	return val->result;
-  return NULL_TREE;
+    {
+      if (inverse ? val->result == result : val->result != result)
+	continue;
+      /* Found a result to rule out, process each location.  */
+      for (unsigned i = 0; i < val->n; ++i)
+	{
+	  basic_block bb1
+	    = BASIC_BLOCK_FOR_FN (cfun, val->valid_dominated_by_p[i]);
+	  if (!single_pred_p (bb1))
+	    continue;
+	  basic_block p = single_pred (bb1);
+	  if (!dominated_by_p_w_unex (bb, p, false))
+	    continue;
+	  gcc_assert (EDGE_COUNT (p->succs) == 2);
+	  /* Iterate on possible phi_bbs (can be BB itself). Considering
+	     non-executable edges, there can be multiple phi_bbs.  */
+	  for (basic_block phi_bb : get_dominated_by (CDI_DOMINATORS, p))
+	    {
+	      if (!dominated_by_p_w_unex (bb, phi_bb, false))
+		continue;
+	      /* Process PHIs, rule out incoming edges that definitely come
+		 from p->bb1.  If there's a single PHI argument X left, record
+		 the equivalence of X and PHI result at edge e.  */
+	      for (gphi_iterator gsi = gsi_start_nonvirtual_phis (phi_bb);
+		   !gsi_end_p (gsi); gsi_next_nonvirtual_phi (&gsi))
+		{
+		  gphi *phi = gsi.phi ();
+		  tree equiv = NULL_TREE;
+		  for (unsigned j = 0; j < gimple_phi_num_args (phi); ++j)
+		    {
+		      edge in = gimple_phi_arg_edge (phi, j);
+		      if (in->src == bb1
+			  || dominated_by_p_w_unex (in->src, bb1, false))
+			/* Definitely comes from p->bb1.  */
+			continue;
+		      tree arg = vn_valueize (gimple_phi_arg_def (phi, j));
+		      if (equiv && equiv != arg)
+			{
+			  equiv = NULL_TREE;
+			  break;
+			}
+		      equiv = arg;
+		    }
+		  if (equiv)
+		    val_equiv_insert (equiv, PHI_RESULT (phi), e);
+		}
+	    }
+	}
+    }
+}
+
+/* Record temporary equivalences generated by PHI nodes, and are valid when
+   BB's outgoing edge E is taken.  CODE, OPS and RESULT are the condition and
+   result for E to be taken.  VNO contains the condition's results recorded in
+   nary map.
+
+   The idea is to find a conflicting predicate at some location L, where L's
+   predecessor P dominates BB, so we know path P->L will not be taken if E is
+   taken.  If by ruling out P->L can generate some equivalences, record them at
+   E->dest.  */
+
+static void
+record_equiv_from_prev_phi (vn_nary_op_t vno, edge e, basic_block bb,
+			    tree_code code, tree *ops, tree result)
+{
+  if (!vn_tracking_edge (e))
+    return;
+  record_equiv_from_prev_phi_1 (vno, e, bb, result, true);
+
+  /* For predicates like "a==b is false" or "a!=b is true", equivalences are
+     conflicting results, i.e. "NO_OP (a) is b" or "NO_OP (b) is a".  */
+  if ((result == boolean_true_node
+       && (code == NE_EXPR || code == GT_EXPR || code == LT_EXPR))
+      || (result == boolean_false_node && code == EQ_EXPR))
+    {
+      vn_nary_op_t vnresult;
+      if (TREE_CODE (ops[0]) == SSA_NAME)
+	{
+	  vn_nary_op_lookup_pieces (1, NOP_EXPR, TREE_TYPE (ops[0]), ops,
+				    &vnresult);
+	  record_equiv_from_prev_phi_1 (vnresult, e, bb, ops[1], false);
+	}
+      if (TREE_CODE (ops[1]) == SSA_NAME)
+	{
+	  vn_nary_op_lookup_pieces (1, NOP_EXPR, TREE_TYPE (ops[1]), &ops[1],
+				    &vnresult);
+	  record_equiv_from_prev_phi_1 (vnresult, e, bb, ops[0], false);
+	}
+    }
 }
 
 /* Insert the rhs of STMT into the current hash table with a value number of
@@ -4902,8 +5302,25 @@ static bool
 visit_nary_op (tree lhs, gassign *stmt)
 {
   vn_nary_op_t vnresult;
-  tree result = vn_nary_op_lookup_stmt (stmt, &vnresult);
-  if (! result && vnresult)
+  unsigned length = vn_nary_length_from_stmt (stmt);
+  vn_nary_op_t vno
+    = XALLOCAVAR (struct vn_nary_op_s, sizeof_vn_nary_op (length));
+  init_vn_nary_op_from_stmt (vno, stmt);
+  tree result = NULL_TREE;
+  /* Try to get a simplified result.  */
+  /* Do not simplify variable used in PHI at loop exit, or
+     simplify_peeled_chrec/constant_after_peeling may miss the loop.  */
+  gimple *use_stmt;
+  use_operand_p use_p;
+  if (!(single_imm_use (lhs, &use_p, &use_stmt)
+	&& gimple_code (use_stmt) == GIMPLE_PHI
+	&& single_succ_p (gimple_bb (use_stmt))
+	&& (single_succ_edge (gimple_bb (use_stmt))->flags & EDGE_DFS_BACK)))
+    result = fold_const_from_equiv_heads (vno->length, vno->opcode, vno->op,
+					  vno->type);
+  if (!result)
+    result = vn_nary_op_lookup_1 (vno, &vnresult);
+  if (!result && vnresult)
     result = vn_nary_op_get_predicated_value (vnresult, gimple_bb (stmt));
   if (result)
     return set_ssa_val_to (lhs, result);
@@ -7179,12 +7596,8 @@ insert_related_predicates_on_edge (enum tree_code code, tree *ops, edge pred_e)
 					   ops, boolean_false_node, 0, pred_e);
       break;
     case EQ_EXPR:
-      /* a == b -> ! a {<,>} b */
-      vn_nary_op_insert_pieces_predicated (2, LT_EXPR, boolean_type_node,
-					   ops, boolean_false_node, 0, pred_e);
-      vn_nary_op_insert_pieces_predicated (2, GT_EXPR, boolean_type_node,
-					   ops, boolean_false_node, 0, pred_e);
-      break;
+      /* No need to insert derived predicates for '==', as they can be computed
+	 by equiv-heads & fold_const_from_equiv_heads.  */
     case LE_EXPR:
     case GE_EXPR:
     case NE_EXPR:
@@ -7351,24 +7764,31 @@ process_bb (rpo_elim &avail, basic_block bb,
       switch (gimple_code (last))
 	{
 	case GIMPLE_SWITCH:
+	  /* TODO: utilize temporary equivalences.  */
 	  e = find_taken_edge (bb, vn_valueize (gimple_switch_index
 						(as_a <gswitch *> (last))));
 	  break;
 	case GIMPLE_COND:
 	  {
-	    tree lhs = vn_valueize (gimple_cond_lhs (last));
-	    tree rhs = vn_valueize (gimple_cond_rhs (last));
-	    tree val = gimple_simplify (gimple_cond_code (last),
-					boolean_type_node, lhs, rhs,
-					NULL, vn_valueize);
+	    tree lhs = lookup_equiv_head (gimple_cond_lhs (last), bb);
+	    tree rhs = lookup_equiv_head (gimple_cond_rhs (last), bb);
+	    tree ops[2];
+	    ops[0] = lhs;
+	    ops[1] = rhs;
+	    tree val = fold_const_from_equiv_heads (2, gimple_cond_code (last),
+						    ops, boolean_type_node);
+	    if (val && (dump_flags & TDF_DETAILS))
+	      {
+		fprintf (dump_file, "Got simplified result ");
+		print_generic_expr (dump_file, val, TDF_NONE);
+		fprintf (dump_file, " for ");
+		print_gimple_stmt (dump_file, last, TDF_SLIM);
+	      }
+	    vn_nary_op_t vnresult = NULL;
 	    /* If the condition didn't simplfy see if we have recorded
 	       an expression from sofar taken edges.  */
 	    if (! val || TREE_CODE (val) != INTEGER_CST)
 	      {
-		vn_nary_op_t vnresult;
-		tree ops[2];
-		ops[0] = lhs;
-		ops[1] = rhs;
 		val = vn_nary_op_lookup_pieces (2, gimple_cond_code (last),
 						boolean_type_node, ops,
 						&vnresult);
@@ -7436,6 +7856,15 @@ process_bb (rpo_elim &avail, basic_block bb,
 		    if (false_e)
 		      insert_related_predicates_on_edge (icode, ops, false_e);
 		  }
+
+		/* Try to record equivalences generated by previous PHI nodes
+		    on current true & false edges.  */
+		if (true_e)
+		  record_equiv_from_prev_phi (vnresult, true_e, bb, code, ops,
+					      boolean_true_node);
+		if (false_e)
+		  record_equiv_from_prev_phi (vnresult, false_e, bb, code, ops,
+					      boolean_false_node);
 	      }
 	    break;
 	  }
@@ -7558,6 +7987,7 @@ struct unwind_state
   vn_phi_t phi_top;
   vn_nary_op_t nary_top;
   vn_avail *avail_top;
+  nary_ref *nary_ref_top;
 };
 
 /* Unwind the RPO VN state for iteration.  */
@@ -7607,6 +8037,17 @@ do_unwind (unwind_state *to, rpo_elim &avail)
       av->next = avail.m_avail_freelist;
       avail.m_avail_freelist = av;
     }
+  /* Drain nary_refs to freelist, to reuse the memory.  */
+  for (; last_pushed_nary_ref && last_pushed_nary_ref->ref != to->nary_ref_top;)
+    {
+      vn_ssa_aux_t val = last_pushed_nary_ref;
+      nary_ref *ref = val->ref;
+      val->ref = ref->next;
+      last_pushed_nary_ref = ref->next_undo;
+      ref->next = nary_ref_freelist;
+      nary_ref_freelist = ref;
+      --val->num_nary_ref;
+    }
 }
 
 /* Do VN on a SEME region specified by ENTRY and EXIT_BBS in FN.
@@ -7720,6 +8161,8 @@ do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
   last_inserted_phi = NULL;
   last_inserted_nary = NULL;
   last_pushed_avail = NULL;
+  last_pushed_nary_ref = NULL;
+  nary_ref_freelist = NULL;
 
   vn_valueize = rpo_vn_valueize;
 
@@ -7814,6 +8257,8 @@ do_rpo_vn (function *fn, edge entry, bitmap exit_bbs,
 	    rpo_state[idx].nary_top = last_inserted_nary;
 	    rpo_state[idx].avail_top
 	      = last_pushed_avail ? last_pushed_avail->avail : NULL;
+	    rpo_state[idx].nary_ref_top
+	      = last_pushed_nary_ref ? last_pushed_nary_ref->ref : NULL;
 	  }
 
 	if (!(bb->flags & BB_EXECUTABLE))
diff --git a/gcc/tree-ssa-sccvn.h b/gcc/tree-ssa-sccvn.h
index 7d53ab5e39f..0476dc07292 100644
--- a/gcc/tree-ssa-sccvn.h
+++ b/gcc/tree-ssa-sccvn.h
@@ -217,6 +217,16 @@ struct vn_avail
   struct vn_ssa_aux *next_undo;
 };
 
+/* In vn_ssa_aux structure, hold a lists of references to the nary map entries,
+   so that when recording new equivalence to the value number, we can re-insert
+   expressions' results based on this.  */
+struct nary_ref
+{
+  nary_ref *next;
+  vn_nary_op_t slot;
+  struct vn_ssa_aux *next_undo;
+};
+
 typedef struct vn_ssa_aux
 {
   /* SSA name this vn_ssa_aux is associated with in the lattice.  */
@@ -230,9 +240,15 @@ typedef struct vn_ssa_aux
      for SSA names also serving as values (NAME == VALNUM).  */
   vn_avail *avail;
 
+  /* References to slots in the nary map, with VALNUM as operand.  */
+  nary_ref *ref;
+
   /* Unique identifier that all expressions with the same value have. */
   unsigned int value_id;
 
+  /* nary_ref count.  */
+  unsigned short num_nary_ref;
+
   /* Whether the SSA_NAME has been processed at least once.  */
   unsigned visited : 1;
 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-12-03  6:42 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-16 18:13 [PATCH v2] tree-optimization/101186 - extend FRE with "equivalence map" for condition prediction Di Zhao OS
2021-10-01 12:59 ` Richard Biener
2021-10-24 19:03   ` Di Zhao OS
2021-11-10  9:43     ` Richard Biener
2021-11-15 17:24       ` Di Zhao OS
2021-11-18 15:47         ` Andrew MacLeod
2021-12-03  6:42         ` Di Zhao OS

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).