public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2] A new copy propagation and PHI elimination pass
@ 2023-11-02 13:00 Filip Kastl
  2023-11-17 13:01 ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Filip Kastl @ 2023-11-02 13:00 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenther, hubicka

> Hi,
> 
> this is a patch that I submitted two months ago as an RFC. I have polished it
> since.
> 
> It is a new lightweight pass that removes redundant PHI functions and, as a
> bonus, does basic copy propagation. Jan Hubička and I measured that it is
> usually able to remove more than 5% of all PHI functions when run among the
> early passes (sometimes even 13% or more). Those are mostly PHI functions
> that would be optimized away later anyway, but this pass removes them early
> enough that they don't get streamed when running LTO (and also potentially
> inlined at multiple places). It is also able to remove some redundant PHIs
> that would otherwise still be present during RTL expansion.
> 
> Jakub Jelínek was concerned about debug info coverage, so I compiled cc1plus
> with and without this patch. These are the sizes of .debug_info and
> .debug_loclists:
> 
> .debug_info without patch 181694311
> .debug_info    with patch 181692320
> -0.0011% change
> 
> .debug_loclists without patch 47934753
> .debug_loclists    with patch 47934966
> +0.0004% change
> 
> I wanted to use dwlocstat to compare debug coverage but sadly didn't manage
> to get the program working on my machine. I hope this suffices. It seems to
> me that my patch doesn't have a significant impact on debug info.
> 
> Bootstrapped and tested* on x86_64-pc-linux-gnu.
> 
> * One testcase (pr79691.c) did regress. However, that is because the test
> depends on a certain variable not being copy propagated. I will go into more
> detail about this in a reply to this mail.
> 
> Ok to commit?

This is a second version of the patch.  In this version, I modified the
pr79691.c testcase so that it works as intended with the other changes from
the patch.

The pr79691.c testcase checks that we get constants from snprintf calls and
that they simplify into a single constant.  The testcase doesn't account for
the fact that this constant may be further copy propagated, which is exactly
what happens with this patch applied.
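
To illustrate with a hypothetical before/after fragment (not taken from the
actual dump): previously the folded snprintf result was kept in its own SSA
name, which the " = 9;" pattern matched,

  _3 = 9;
  return _3;

whereas with the additional copy propagation the constant reaches the return
statement and the optimized dump contains just

  return 9;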

Bootstrapped and tested on x86_64-pc-linux-gnu.

Ok to commit?

Filip Kastl

-- >8 --

This patch adds the strongly-connected copy propagation (SCCOPY) pass.
It is a lightweight GIMPLE copy propagation pass that also removes some
redundant PHI statements. It handles degenerate PHIs, e.g.:

_5 = PHI <_1>;
_6 = PHI <_6, _6, _1, _1>;
_7 = PHI <16, _7>;
// Replaces occurrences of _5 and _6 by _1, and of _7 by 16

It also handles more complicated situations, e.g.:

_8 = PHI <_9, _10>;
_9 = PHI <_8, _10>;
_10 = PHI <_8, _9, _1>;
// Replaces occurrences of _8, _9 and _10 by _1
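
The test behind these replacements can be sketched outside of GCC.  The
following standalone Python sketch is illustrative only (the pass proper works
on GIMPLE statements and isolates groups with Tarjan's SCC algorithm first):
a group of PHIs is a copy of a single value v exactly when the only operand
referenced from outside the group is v.

```python
# Illustrative sketch only -- not GCC code.  'phis' maps each PHI result
# to its argument list; the pass runs this test on one SCC of the
# dataflow graph at a time.

def redundant_phi_value(phis):
    """Return the single outside value the PHI group copies, or None."""
    outer = set()
    for args in phis.values():
        for arg in args:
            # An argument defined outside the group is an 'outer' value;
            # self-references and references within the group are ignored.
            if arg not in phis:
                outer.add(arg)
    # The group is redundant iff it references exactly one outside value.
    return outer.pop() if len(outer) == 1 else None

# The example above: _8, _9 and _10 only refer to each other or to _1.
phis = {'_8': ['_9', '_10'],
        '_9': ['_8', '_10'],
        '_10': ['_8', '_9', '_1']}
print(redundant_phi_value(phis))  # prints _1
```

The degenerate PHIs are the singleton case of the same test, e.g.
redundant_phi_value({'_7': ['16', '_7']}) yields '16'.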

gcc/ChangeLog:

	* Makefile.in: Add sccopy pass.
	* passes.def: Add sccopy pass before LTO streaming and before
	  RTL expansion.
	* tree-pass.h (make_pass_sccopy): Add sccopy pass.
	* tree-ssa-sccopy.cc: New file.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/pr79691.c: Update scan-tree-dump to account
	  for the additional copy propagation this patch adds.
	* gcc.dg/sccopy-1.c: New test.

Signed-off-by: Filip Kastl <fkastl@suse.cz>
---
 gcc/Makefile.in                         |   1 +
 gcc/passes.def                          |   3 +
 gcc/testsuite/gcc.dg/sccopy-1.c         |  78 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr79691.c |   2 +-
 gcc/tree-pass.h                         |   1 +
 gcc/tree-ssa-sccopy.cc                  | 867 ++++++++++++++++++++++++
 6 files changed, 951 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c
 create mode 100644 gcc/tree-ssa-sccopy.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a25a1e32fbc..2bd5a015676 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1736,6 +1736,7 @@ OBJS = \
 	tree-ssa-pre.o \
 	tree-ssa-propagate.o \
 	tree-ssa-reassoc.o \
+	tree-ssa-sccopy.o \
 	tree-ssa-sccvn.o \
 	tree-ssa-scopedtables.o \
 	tree-ssa-sink.o \
diff --git a/gcc/passes.def b/gcc/passes.def
index 1e1950bdb39..fa6c5a2c9fa 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -100,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_if_to_switch);
 	  NEXT_PASS (pass_convert_switch);
 	  NEXT_PASS (pass_cleanup_eh);
+	  NEXT_PASS (pass_sccopy);
 	  NEXT_PASS (pass_profile);
 	  NEXT_PASS (pass_local_pure_const);
 	  NEXT_PASS (pass_modref);
@@ -368,6 +369,7 @@ along with GCC; see the file COPYING3.  If not see
 	 However, this also causes us to misdiagnose cases that should be
 	 real warnings (e.g., testsuite/gcc.dg/pr18501.c).  */
       NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
+      NEXT_PASS (pass_sccopy);
       NEXT_PASS (pass_tail_calls);
       /* Split critical edges before late uninit warning to reduce the
          number of false positives from it.  */
@@ -409,6 +411,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_sancov);
       NEXT_PASS (pass_asan);
       NEXT_PASS (pass_tsan);
+      NEXT_PASS (pass_sccopy);
       /* ???  We do want some kind of loop invariant motion, but we possibly
          need to adjust LIM to be more friendly towards preserving accurate
 	 debug information here.  */
diff --git a/gcc/testsuite/gcc.dg/sccopy-1.c b/gcc/testsuite/gcc.dg/sccopy-1.c
new file mode 100644
index 00000000000..1e61a6b320e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/sccopy-1.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple -fdump-tree-sccopy -O2" } */
+/* { dg-final { scan-tree-dump "Replacing SCC of size 2" "sccopy1" } } */
+
+int __GIMPLE (ssa, startwith ("sccopy"))
+main ()
+{
+  int a;
+  int y;
+  int x;
+  int _1;
+  int _2;
+  int _13;
+
+  __BB(2):
+  if (x_7(D) == 5)
+    goto __BB3;
+  else
+    goto __BB4;
+
+  __BB(3):
+  a_10 = x_7(D);
+  goto __BB5;
+
+  __BB(4):
+  a_9 = y_8(D);
+  goto __BB5;
+
+  __BB(5):
+  a_3 = __PHI (__BB3: a_10, __BB4: a_9);
+  if (x_7(D) == y_8(D))
+    goto __BB6;
+  else
+    goto __BB11;
+
+  __BB(6):
+  a_11 = a_3 + 1;
+  goto __BB7;
+
+  __BB(7):
+  a_4 = __PHI (__BB6: a_11, __BB11: a_6);
+label1:
+  if (x_7(D) != y_8(D))
+    goto __BB8;
+  else
+    goto __BB10;
+
+  __BB(8):
+  goto __BB9;
+
+  __BB(9):
+  a_12 = __PHI (__BB8: a_4, __BB10: a_5);
+  goto __BB10;
+
+  __BB(10,loop_header(1)):
+  a_5 = __PHI (__BB7: a_4, __BB9: a_12);
+label2:
+  _1 = y_8(D) * 2;
+  if (x_7(D) == _1)
+    goto __BB9;
+  else
+    goto __BB11;
+
+  __BB(11):
+  a_6 = __PHI (__BB5: a_3, __BB10: a_5);
+  _2 = x_7(D) * 3;
+  if (y_8(D) == _2)
+    goto __BB7;
+  else
+    goto __BB12;
+
+  __BB(12):
+  _13 = 0;
+  return _13;
+
+}
+
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c b/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c
index bf889318c06..43770c95bca 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c
@@ -34,4 +34,4 @@ int f4 (int i)
 
 /* { dg-final { scan-tree-dump-times "sprintf" 1 "optimized" } }
    { dg-final { scan-tree-dump-times "snprintf" 1 "optimized" } }
-   { dg-final { scan-tree-dump " = 9;" "optimized" } } */
+   { dg-final { scan-tree-dump "return 9;" "optimized" } } */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 09e6ada5b2f..94a48461590 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -399,6 +399,7 @@ extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-sccopy.cc b/gcc/tree-ssa-sccopy.cc
new file mode 100644
index 00000000000..f6e462f4c71
--- /dev/null
+++ b/gcc/tree-ssa-sccopy.cc
@@ -0,0 +1,867 @@
+/* Strongly-connected copy propagation pass for the GNU compiler.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Filip Kastl <fkastl@suse.cz>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-iterator.h"
+#include "vec.h"
+#include "hash-set.h"
+#include <algorithm>
+#include "ssa-iterators.h"
+#include "gimple-fold.h"
+#include "gimplify.h"
+#include "tree-cfg.h"
+#include "tree-eh.h"
+#include "tree-cfgcleanup.h"
+#include "builtins.h"
+
+/* Strongly connected copy propagation pass.
+
+   This is a lightweight copy propagation pass that is also able to eliminate
+   redundant PHI statements.  The pass considers the following types of copy
+   statements:
+
+   1 An assignment statement with a single argument.
+
+   _3 = _2;
+   _4 = 5;
+
+   2 A degenerate PHI statement.  A degenerate PHI is a PHI that only refers to
+     itself or one other value.
+
+   _5 = PHI <_1>;
+   _6 = PHI <_6, _6, _1, _1>;
+   _7 = PHI <16, _7>;
+
+   3 A set of PHI statements that only refer to each other or to one other
+     value.
+
+   _8 = PHI <_9, _10>;
+   _9 = PHI <_8, _10>;
+   _10 = PHI <_8, _9, _1>;
+
+   All of these statements produce copies and can be eliminated from the
+   program.  For a copy statement we identify the value it creates a copy of
+   and replace references to the statement with the value -- we propagate the
+   copy.
+
+   _3 = _2; // Replace all occurrences of _3 by _2
+
+   _8 = PHI <_9, _10>;
+   _9 = PHI <_8, _10>;
+   _10 = PHI <_8, _9, _1>; // Replace all occurrences of _8, _9 and _10 by _1
+
+   To find all three types of copy statements we use an algorithm based on
+   strongly-connected components (SCCs) in the dataflow graph.  The algorithm
+   was introduced in an article from 2013[1].  We describe the algorithm below.
+
+   To identify SCCs we implement Tarjan's SCC algorithm.  For the SCC
+   computation we wrap potential copy statements in the 'vertex' struct.  To
+   each of these statements we also assign a vertex number ('vxnum').  Since
+   the main algorithm has to be able to compute SCCs of subgraphs of the
+   whole dataflow graph, we use GIMPLE stmt flags to prevent Tarjan's
+   algorithm from leaving the subgraph.
+
+   References:
+
+     [1] Simple and Efficient Construction of Static Single Assignment Form,
+     Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
+     Section 3.2.  */
+
+/* State of vertex during Tarjan computation.
+
+   unvisited  Vertex hasn't yet been popped from worklist.
+   vopen      DFS has visited vertex for the first time.  Vertex has been put
+	      on Tarjan stack.
+   closed     DFS has backtracked through vertex.  At this point, vertex
+	      doesn't have any unvisited neighbors.
+   in_scc     Vertex has been popped from Tarjan stack.  */
+
+enum vstate
+{
+  unvisited,
+  vopen,
+  closed,
+  in_scc
+};
+
+/* Information about a vertex.  Used by Tarjan.  */
+
+struct vertex
+{
+  vstate state;
+  unsigned index;
+  unsigned lowlink;
+  gimple* stmt;
+};
+
+/* Set 'dead' flag of gimple statement to true.
+   We remove these statements at the end of the pass.  */
+
+static void
+set_stmt_dead (gimple* stmt)
+{
+  gimple_set_plf (stmt, GF_PLF_1, true);
+}
+
+/* For each statement from given SCC, mark it 'dead'.  */
+
+static void
+set_scc_dead (vec<gimple *> scc)
+{
+  for (gimple *stmt : scc)
+    set_stmt_dead (stmt);
+}
+
+/* Clear 'dead' flag of gimple statement to false.  */
+
+static void
+clear_stmt_dead (gimple* stmt)
+{
+  gimple_set_plf (stmt, GF_PLF_1, false);
+}
+
+/* Return value of 'dead' flag of gimple statement.  */
+
+static bool
+is_stmt_dead (gimple* stmt)
+{
+  return gimple_plf (stmt, GF_PLF_1);
+}
+
+/* Set 'using' flag of gimple statement to true.
+   If the flag isn't set, Tarjan will ignore the statement.  */
+
+static void
+tarjan_set_using (gimple* stmt)
+{
+  gimple_set_plf (stmt, GF_PLF_2, true);
+}
+
+/* Clear 'using' flag of gimple statement to false.  */
+
+static void
+tarjan_clear_using (gimple* stmt)
+{
+  gimple_set_plf (stmt, GF_PLF_2, false);
+}
+
+/* Return value of 'using' flag of gimple statement.  */
+
+static bool
+tarjan_is_using (gimple* stmt)
+{
+  return gimple_plf (stmt, GF_PLF_2);
+}
+
+/* Set 'vxnum' (vertex number) of statement.  Used by Tarjan.  */
+
+static void
+tarjan_set_vxnum (gimple* stmt, unsigned num)
+{
+  gimple_set_uid (stmt, num);
+}
+
+/* Return 'vxnum' (vertex number) of statement.  Used by Tarjan.  */
+
+static unsigned
+tarjan_vxnum (gimple* stmt)
+{
+  return gimple_uid (stmt);
+}
+
+/* Create and initialize vertex struct for each given statement.  */
+
+static auto_vec<vertex>
+tarjan_stmts_to_vertices (auto_vec<gimple *> &stmts)
+{
+  auto_vec<vertex> result;
+  for (gimple *stmt : stmts)
+    {
+      vertex v;
+      v.state = unvisited;
+      v.index = 0;
+      v.lowlink = 0;
+      v.stmt = stmt;
+
+      result.safe_push (v);
+    }
+  return result;
+}
+
+/* Part of 'tarjan_compute_sccs ()'.  */
+
+static void
+tarjan_visit_neighbor (tree neigh_tree, unsigned parent_vxnum,
+		       auto_vec<vertex> &vs, auto_vec<unsigned> &worklist)
+{
+  if (TREE_CODE (neigh_tree) != SSA_NAME)
+    return; /* Skip any neighbor that isn't an SSA name.  */
+
+  gimple *neigh_stmt = SSA_NAME_DEF_STMT (neigh_tree);
+
+  /* Skip neighbors outside the induced subgraph that Tarjan currently works
+     with.  */
+  if (!tarjan_is_using (neigh_stmt))
+    return;
+  unsigned neigh_vxnum = tarjan_vxnum (neigh_stmt);
+
+  vstate neigh_state = vs[neigh_vxnum].state;
+  vstate parent_state = vs[parent_vxnum].state;
+  if (parent_state == vopen) /* We're currently opening parent.  */
+    {
+      /* Put unvisited neighbors on worklist.  Update lowlink of parent
+	 vertex according to indices of neighbors present on stack.  */
+      switch (neigh_state)
+	{
+	case unvisited:
+	  worklist.safe_push (neigh_vxnum);
+	  break;
+	case vopen:
+	case closed:
+	  vs[parent_vxnum].lowlink = std::min (vs[parent_vxnum].lowlink,
+					       vs[neigh_vxnum].index);
+	  break;
+	case in_scc:
+	  /* Ignore these edges.  */
+	  break;
+	}
+    }
+  else if (parent_state == closed) /* We're currently closing parent.  */
+    {
+      /* Update lowlink of parent vertex according to lowlinks of
+	 children of parent (in terms of DFS tree).  */
+      if (neigh_state == closed)
+	{
+	  vs[parent_vxnum].lowlink = std::min (vs[parent_vxnum].lowlink,
+					       vs[neigh_vxnum].lowlink);
+	}
+    }
+}
+
+/* Implementation of Tarjan's SCC algorithm.
+
+   Compute SCCs in dataflow graph on given statements.  Return the
+   SCCs in a reverse topological order.
+
+   Given statements should be only those for which stmt_may_generate_copy
+   returns 'true'.  */
+
+static auto_vec<vec<gimple *>>
+tarjan_compute_sccs (auto_vec<gimple *> &copy_stmts)
+{
+  auto_vec<vec<gimple *>> sccs;
+  auto_vec<unsigned> worklist; /* DFS stack.  */
+  auto_vec<unsigned> stack; /* Tarjan stack.  */
+  unsigned curr_index = 0;
+
+  auto_vec<vertex> vs = tarjan_stmts_to_vertices (copy_stmts);
+
+  /* Mark the subgraph induced by 'copy_stmts' and index it by vxnums.  */
+  unsigned i;
+  for (i = 0; i < vs.length (); i++)
+    {
+      gimple *stmt = vs[i].stmt;
+      tarjan_set_using (stmt);
+      tarjan_set_vxnum (stmt, i);
+    }
+
+  /* Put all vertices on worklist.  */
+  for (i = 0; i < vs.length (); i++)
+    {
+      worklist.safe_push (i);
+    }
+
+  /* Worklist loop.  */
+  while (!worklist.is_empty ())
+    {
+      unsigned i = worklist.pop ();
+      gimple *stmt = vs[i].stmt;
+      vstate state = vs[i].state;
+
+      if (state == unvisited)
+	{
+	  vs[i].state = vopen;
+
+	  /* Assign index to this vertex.  */
+	  vs[i].index = curr_index;
+	  vs[i].lowlink = curr_index;
+	  curr_index++;
+
+	  /* Put vertex on stack and also on worklist to be closed later.  */
+	  stack.safe_push (i);
+	  worklist.safe_push (i);
+	}
+      else if (state == vopen)
+	vs[i].state = closed;
+
+      /* Visit neighbors of this vertex.  */
+      tree op;
+      gphi *phi;
+      switch (gimple_code (stmt))
+	{
+	  case GIMPLE_PHI:
+	    phi = as_a <gphi *> (stmt);
+	    unsigned j;
+	    for (j = 0; j < gimple_phi_num_args (phi); j++)
+	      {
+		op = gimple_phi_arg_def (phi, j);
+		tarjan_visit_neighbor (op, i, vs, worklist);
+	      }
+	    break;
+	  case GIMPLE_ASSIGN:
+	    op = gimple_assign_rhs1 (stmt);
+	    tarjan_visit_neighbor (op, i, vs, worklist);
+	    break;
+	  default:
+	    gcc_unreachable ();
+	}
+
+      /* If we've just closed a root vertex of an SCC, pop the SCC.  */
+      if (state == vopen && vs[i].lowlink == vs[i].index)
+	{
+	  vec<gimple *> scc = vNULL;
+
+	  unsigned j;
+	  do
+	    {
+	      j = stack.pop ();
+	      scc.safe_push (vs[j].stmt);
+	      vs[j].state = in_scc;
+	    }
+	  while (j != i);
+
+	  sccs.safe_push (scc);
+	}
+    }
+
+  if (!stack.is_empty ())
+    gcc_unreachable ();
+
+  /* Clear copy stmts' 'using' flags.  */
+  for (vertex v : vs)
+    {
+      gimple *s = v.stmt;
+      tarjan_clear_using (s);
+    }
+
+  return sccs;
+}
+
+/* Could this statement potentially be a copy statement?
+
+   This pass only considers statements for which this function returns 'true'.
+   Those are basically PHI functions and assignment statements similar to
+
+   _2 = _1;
+   or
+   _2 = 5;  */
+
+static bool
+stmt_may_generate_copy (gimple *stmt)
+{
+  if (gimple_code (stmt) == GIMPLE_PHI)
+    {
+      gphi *phi = as_a <gphi *> (stmt);
+
+      /* No OCCURS_IN_ABNORMAL_PHI SSA names in lhs nor rhs.  */
+      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (phi)))
+	return false;
+
+      unsigned i;
+      for (i = 0; i < gimple_phi_num_args (phi); i++)
+	{
+	  tree op = gimple_phi_arg_def (phi, i);
+	  if (TREE_CODE (op) == SSA_NAME
+	      && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
+	    return false;
+	}
+
+      return true;
+    }
+
+  if (gimple_code (stmt) != GIMPLE_ASSIGN)
+    return false;
+
+  /* If the statement has volatile operands, it won't generate a
+     useful copy.  */
+  if (gimple_has_volatile_ops (stmt))
+    return false;
+
+  /* Statements with loads and/or stores will never generate a useful copy.  */
+  if (gimple_store_p (stmt) || gimple_assign_load_p (stmt))
+    return false;
+
+  if (!gimple_assign_single_p (stmt))
+    return false;
+
+  tree lhs = gimple_assign_lhs (stmt);
+  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+    return false;
+
+  /* If the assignment is from a constant it generates a useful copy.  */
+  if (is_gimple_min_invariant (gimple_assign_rhs1 (stmt)))
+    return true;
+
+  tree rhs = single_ssa_tree_operand (stmt, SSA_OP_USE);
+
+  if (!is_gimple_val (gimple_assign_rhs1 (stmt)))
+    return false;
+
+  if (!rhs)
+    return false;
+
+  /* rhs shouldn't flow through any abnormal edges.  */
+  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs))
+    return false;
+
+  /* If rhs and lhs are pointers, alignment of lhs and rhs should be the same.
+     Usage of __builtin_assume_aligned can cause alignment of lhs to be greater
+     than alignment of rhs.  In that case we don't want to propagate rhs since
+     we would lose the alignment information.  */
+  if (POINTER_TYPE_P (TREE_TYPE (lhs))
+      && POINTER_TYPE_P (TREE_TYPE (rhs))
+      && get_pointer_alignment (lhs) != get_pointer_alignment (rhs))
+    return false;
+
+  return true;
+}
+
+/* Return all statements in cfun that could generate copies.  All statements
+   for which stmt_may_generate_copy returns 'true'.  */
+
+static auto_vec<gimple *>
+get_all_stmt_may_generate_copy (void)
+{
+  auto_vec<gimple *> result;
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      gimple_stmt_iterator gsi;
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple *s = gsi_stmt (gsi);
+	  if (stmt_may_generate_copy (s))
+	    result.safe_push (s);
+	}
+
+      gphi_iterator pi;
+      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
+	{
+	  gimple *s = pi.phi ();
+	  if (stmt_may_generate_copy (s))
+	    result.safe_push (s);
+	}
+    }
+
+  return result;
+}
+
+/* Cleanup to be performed after every call of 'replace_use_by ()'.  */
+
+void
+cleanup_after_replace (gimple *old_stmt, gimple *stmt, bitmap need_eh_cleanup,
+		       bitmap need_ab_cleanup, vec<gimple *> stmts_to_fixup,
+		       bool can_make_abnormal_goto, bool was_noreturn)
+{
+  basic_block bb = stmt->bb;
+
+  /* If we cleaned up EH information from the statement,
+     remove EH edges.  */
+  if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
+    bitmap_set_bit (need_eh_cleanup, bb->index);
+
+  /* If we turned a call with possible abnormal control transfer
+     into one that doesn't, remove abnormal edges.  */
+  if (can_make_abnormal_goto
+      && !stmt_can_make_abnormal_goto (stmt))
+    bitmap_set_bit (need_ab_cleanup, bb->index);
+
+  /* If we turned a not noreturn call into a noreturn one
+     schedule it for fixup.  */
+  if (!was_noreturn
+      && is_gimple_call (stmt)
+      && gimple_call_noreturn_p (stmt))
+    stmts_to_fixup.safe_push (stmt);
+
+  if (gimple_assign_single_p (stmt))
+    {
+      tree rhs = gimple_assign_rhs1 (stmt);
+
+      if (TREE_CODE (rhs) == ADDR_EXPR)
+	recompute_tree_invariant_for_addr_expr (rhs);
+    }
+
+  update_stmt_if_modified (stmt);
+}
+
+/* Cleanup related to 'replace_use_by ()'.  In contrast to
+   'cleanup_after_replace ()', this function needs to be called only at the
+   end of the pass.  */
+
+void
+cleanup_after_all_replaces_done (bitmap need_eh_cleanup, bitmap
+				 need_ab_cleanup, vec<gimple *> stmts_to_fixup)
+{
+  if (!bitmap_empty_p (need_eh_cleanup))
+    gimple_purge_all_dead_eh_edges (need_eh_cleanup);
+  if (!bitmap_empty_p (need_ab_cleanup))
+    gimple_purge_all_dead_abnormal_call_edges (need_ab_cleanup);
+
+  /* Fixup stmts that became noreturn calls.  This may require splitting
+     blocks and thus isn't possible during the dominator walk.  Do this
+     in reverse order so we don't inadvertently remove a stmt we want to
+     fixup by visiting a dominating now noreturn call first.  */
+  while (!stmts_to_fixup.is_empty ())
+    {
+      gimple *stmt = stmts_to_fixup.pop ();
+      fixup_noreturn_call (stmt);
+    }
+}
+
+/* Replace each use of stmt 'get_replaced' by a use of stmt 'replace_by'.  */
+
+static void
+replace_use_by (tree get_replaced, tree replace_by, bitmap need_eh_cleanup,
+		bitmap need_ab_cleanup, vec<gimple *> stmts_to_fixup)
+{
+  use_operand_p use_p;
+  imm_use_iterator iter;
+  gimple *use_stmt;
+  FOR_EACH_IMM_USE_STMT (use_stmt, iter, get_replaced)
+    {
+      bool was_noreturn = false;
+      bool can_make_abnormal_goto = false;
+      if (is_gimple_call (use_stmt))
+	{
+	  was_noreturn = gimple_call_noreturn_p (use_stmt);
+	  can_make_abnormal_goto = stmt_can_make_abnormal_goto (use_stmt);
+	}
+
+      FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
+	SET_USE (use_p, unshare_expr (replace_by));
+
+      /* Recompute tree invariant.  */
+      if (gimple_assign_single_p (use_stmt))
+	{
+	  tree rhs = gimple_assign_rhs1 (use_stmt);
+
+	  if (TREE_CODE (rhs) == ADDR_EXPR)
+	    recompute_tree_invariant_for_addr_expr (rhs);
+	}
+
+      /* Cleanup.  */
+      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+      fold_stmt (&gsi);
+      gimple_set_modified (gsi_stmt (gsi), true);
+      cleanup_after_replace (use_stmt, gsi_stmt (gsi), need_eh_cleanup,
+			     need_ab_cleanup, stmts_to_fixup,
+			     can_make_abnormal_goto, was_noreturn);
+    }
+}
+
+/* For each statement from given SCC, replace its usages by value
+   'replace_by'.  */
+
+static void
+replace_scc_by_value (vec<gimple *> scc, tree replace_by, bitmap
+		      need_eh_cleanup, bitmap need_ab_cleanup, vec<gimple *>
+		      stmts_to_fixup)
+{
+  for (gimple *stmt : scc)
+    {
+      tree get_replaced = gimple_get_lhs (stmt);
+      replace_use_by (get_replaced, replace_by, need_eh_cleanup,
+		      need_ab_cleanup, stmts_to_fixup);
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "Replacing SCC of size %u\n", scc.length ());
+}
+
+/* Part of 'sccopy_propagate ()'.  */
+
+static void
+sccopy_visit_op (tree op, hash_set<tree> &outer_ops,
+		 hash_set<gimple *> &scc_set, bool &is_inner,
+		 tree &last_outer_op)
+{
+  bool op_in_scc = false;
+
+  if (TREE_CODE (op) == SSA_NAME)
+    {
+      gimple *op_stmt = SSA_NAME_DEF_STMT (op);
+      if (scc_set.contains (op_stmt))
+	op_in_scc = true;
+    }
+
+  if (!op_in_scc)
+    {
+      outer_ops.add (op);
+      last_outer_op = op;
+      is_inner = false;
+    }
+}
+
+/* Main function of this pass.  Find and propagate all three types of copy
+   statements (see pass description above).
+
+   This is an implementation of an algorithm from the paper Simple and
+   Efficient Construction of Static Single Assignment Form[1].  It is based
+   on strongly-connected components (SCCs) in the dataflow graph.  The
+   original algorithm only considers PHI statements.  We extend it to also
+   consider assignment statements of the form _2 = _1;.
+
+   The algorithm is based on this definition of a set of redundant PHIs[1]:
+
+     A non-empty set P of PHI functions is redundant iff the PHI functions just
+     reference each other or one other value
+
+   It uses this lemma[1]:
+
+     Let P be a redundant set of PHI functions.  Then there is a
+     strongly-connected component S subset of P that is also redundant.
+
+   The algorithm works in this way:
+
+     1 Find SCCs
+     2 For each SCC S in topological order:
+     3   Construct set 'inner' of statements that only have other statements
+	 from S on their right hand side
+     4   Construct set 'outer' of values that originate outside S and appear on
+	 right hand side of some statement from S
+     5   If |outer| = 1, outer only contains a value v.  Statements in S only
+	 refer to each other or to v -- they are redundant.  Propagate v.
+	 Else, recurse on statements in inner.
+
+   The implementation is non-recursive.
+
+   References:
+
+     [1] Simple and Efficient Construction of Static Single Assignment Form,
+     Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
+     Section 3.2.  */
+
+static void
+sccopy_propagate ()
+{
+  auto_vec<gimple *> useful_stmts = get_all_stmt_may_generate_copy ();
+
+  auto_vec<vec<gimple *>> worklist;
+  worklist = tarjan_compute_sccs (useful_stmts);
+
+  /* Prepare data structs for cleanup after stmt modification.  */
+  bitmap need_eh_cleanup = BITMAP_ALLOC (NULL);
+  bitmap need_ab_cleanup = BITMAP_ALLOC (NULL);
+  vec<gimple *> stmts_to_fixup = vNULL;
+
+  while (!worklist.is_empty ())
+    {
+      vec<gimple *> scc = worklist.pop ();
+
+      auto_vec<gimple *> inner;
+      hash_set<tree> outer_ops;
+      tree last_outer_op = NULL_TREE;
+
+      /* Prepare a hash set of the statements in the SCC to query later.  */
+      hash_set<gimple *> scc_set;
+      for (gimple *stmt : scc)
+	scc_set.add (stmt);
+
+      for (gimple *stmt : scc)
+	{
+	  bool is_inner = true;
+
+	  gphi *phi;
+	  tree op;
+
+	  switch (gimple_code (stmt))
+	    {
+	      case GIMPLE_PHI:
+		phi = as_a <gphi *> (stmt);
+		unsigned j;
+		for (j = 0; j < gimple_phi_num_args (phi); j++)
+		  {
+		    op = gimple_phi_arg_def (phi, j);
+		    sccopy_visit_op (op, outer_ops, scc_set, is_inner,
+				   last_outer_op);
+		  }
+		break;
+	      case GIMPLE_ASSIGN:
+		op = gimple_assign_rhs1 (stmt);
+		sccopy_visit_op (op, outer_ops, scc_set, is_inner,
+			       last_outer_op);
+		break;
+	      default:
+		gcc_unreachable ();
+	    }
+
+	  if (is_inner)
+	    {
+	      inner.safe_push (stmt);
+	    }
+	}
+
+      if (outer_ops.elements () == 1)
+	{
+	  /* The only operand in outer_ops.  */
+	  tree outer_op = last_outer_op;
+
+	  replace_scc_by_value (scc, outer_op, need_eh_cleanup,
+				need_ab_cleanup, stmts_to_fixup);
+	  set_scc_dead (scc);
+	}
+      else if (outer_ops.elements () > 1)
+	{
+	  /* Add inner sccs to worklist.  */
+	  auto_vec<vec<gimple *>> inner_sccs = tarjan_compute_sccs (inner);
+	  for (vec<gimple *> inner_scc : inner_sccs)
+	    worklist.safe_push (inner_scc);
+	}
+      else
+	gcc_unreachable ();
+
+      scc.release ();
+    }
+
+  cleanup_after_all_replaces_done (need_eh_cleanup, need_ab_cleanup,
+				   stmts_to_fixup);
+
+  /* Remove data structs for cleanup after stmt modification.  */
+  BITMAP_FREE (need_eh_cleanup);
+  BITMAP_FREE (need_ab_cleanup);
+  stmts_to_fixup.release ();
+}
+
+/* Called when pass execution starts.  */
+
+static void
+init_sccopy (void)
+{
+  /* Clear statement flags.  */
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      gimple_stmt_iterator gsi;
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple* stmt = gsi_stmt (gsi);
+	  clear_stmt_dead (stmt);
+	  tarjan_clear_using (stmt);
+	}
+
+      gphi_iterator pi;
+      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
+	{
+	  gimple *stmt = pi.phi ();
+	  clear_stmt_dead (stmt);
+	  tarjan_clear_using (stmt);
+	}
+    }
+}
+
+/* Called before pass execution ends.  */
+
+static void
+finalize_sccopy (void)
+{
+  basic_block bb;
+
+  /* Remove all propagated statements.  */
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      gimple_stmt_iterator gsi;
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  if (is_stmt_dead (stmt))
+	    gsi_remove (&gsi, true);
+	  else
+	    gsi_next (&gsi);
+	}
+
+      gphi_iterator pi;
+      for (pi = gsi_start_phis (bb); !gsi_end_p (pi);)
+	{
+	  gphi *stmt = pi.phi ();
+
+	  if (is_stmt_dead (stmt))
+	    remove_phi_node (&pi, true);
+	  else
+	    gsi_next (&pi);
+	}
+    }
+
+  /* More cleanup.  */
+  FOR_EACH_BB_FN (bb, cfun)
+    gimple_purge_dead_eh_edges (bb);
+}
+
+namespace {
+
+const pass_data pass_data_sccopy =
+{
+  GIMPLE_PASS, /* type */
+  "sccopy", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
+};
+
+class pass_sccopy : public gimple_opt_pass
+{
+public:
+  pass_sccopy (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_sccopy, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return true; }
+  virtual unsigned int execute (function *);
+  opt_pass * clone () final override { return new pass_sccopy (m_ctxt); }
+}; // class pass_sccopy
+
+unsigned
+pass_sccopy::execute (function *)
+{
+  init_sccopy ();
+  sccopy_propagate ();
+  finalize_sccopy ();
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_sccopy (gcc::context *ctxt)
+{
+  return new pass_sccopy (ctxt);
+}
-- 
2.42.0



* Re: [PATCH v2] A new copy propagation and PHI elimination pass
  2023-11-02 13:00 [PATCH v2] A new copy propagation and PHI elimination pass Filip Kastl
@ 2023-11-17 13:01 ` Richard Biener
  2023-11-22 14:20   ` Filip Kastl
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2023-11-17 13:01 UTC (permalink / raw)
  To: Filip Kastl; +Cc: gcc-patches, hubicka

On Thu, 2 Nov 2023, Filip Kastl wrote:

> > Hi,
> > 
> > this is a patch that I submitted two months ago as an RFC. I added some polish
> > since.
> > 
> > It is a new lightweight pass that removes redundant PHI functions and as a
> > bonus does basic copy propagation. With Jan Hubička we measured that it is able
> > to remove usually more than 5% of all PHI functions when run among early passes
> > (sometimes even 13% or more). Those are mostly PHI functions that would be
> > later optimized away but with this pass it is possible to remove them early
> > enough so that they don't get streamed when running LTO (and also potentially
> > inlined at multiple places). It is also able to remove some redundant PHIs
> > that otherwise would still be present during RTL expansion.
> > 
> > Jakub Jelínek was concerned about debug info coverage so I compiled cc1plus
> > with and without this patch. These are the sizes of .debug_info and
> > .debug_loclists
> > 
> > .debug_info without patch 181694311
> > .debug_info    with patch 181692320
> > +0.0011% change
> > 
> > .debug_loclists without patch 47934753
> > .debug_loclists    with patch 47934966
> > -0.0004% change
> > 
> > I wanted to use dwlocstat to compare debug coverages but didn't manage to get
> > the program working on my machine sadly. Hope this suffices. Seems to me that
> > my patch doesn't have a significant impact on debug info.
> > 
> > Bootstrapped and tested* on x86_64-pc-linux-gnu.
> > 
> > * One testcase (pr79691.c) did regress. However that is because the test is
> > dependent on a certain variable not being copy propagated. I will go into more
> > detail about this in a reply to this mail.
> > 
> > Ok to commit?
> 
> This is a second version of the patch.  In this version, I modified the
> pr79691.c testcase so that it works as intended with other changes from the
> patch.
> 
> The pr79691.c testcase checks that we get constants from snprintf calls and
> that they simplify into a single constant.  The testcase doesn't account for
> the fact that this constant may be further copy propagated which is exactly
> what happens with this patch applied.
> 
> Bootstrapped and tested on x86_64-pc-linux-gnu.
> 
> Ok to commit?
> 
> Filip Kastl
> 
> -- >8 --
> 
> This patch adds the strongly-connected copy propagation (SCCOPY) pass.
> It is a lightweight GIMPLE copy propagation pass that also removes some
> redundant PHI statements. It handles degenerate PHIs, e.g.:
> 
> _5 = PHI <_1>;
> _6 = PHI <_6, _6, _1, _1>;
> _7 = PHI <16, _7>;
> > // Replaces occurrences of _5 and _6 by _1 and _7 by 16
> 
> It also handles more complicated situations, e.g.:
> 
> _8 = PHI <_9, _10>;
> _9 = PHI <_8, _10>;
> _10 = PHI <_8, _9, _1>;
> > // Replaces occurrences of _8, _9 and _10 by _1
> 
> gcc/ChangeLog:
> 
> > 	* Makefile.in (OBJS): Add tree-ssa-sccopy.o.
> > 	* passes.def: Add sccopy pass before LTO streaming and before
> > 	RTL expansion.
> > 	* tree-pass.h (make_pass_sccopy): Declare.
> > 	* tree-ssa-sccopy.cc: New file.

Can you name the new file gimple-ssa-sccopy.cc please?

> 
> gcc/testsuite/ChangeLog:
> 
> > 	* gcc.dg/tree-ssa/pr79691.c: Update scan-tree-dump to account
> > 	for additional copy propagation this patch adds.
> > 	* gcc.dg/sccopy-1.c: New test.
> 
> Signed-off-by: Filip Kastl <fkastl@suse.cz>
> ---
>  gcc/Makefile.in                         |   1 +
>  gcc/passes.def                          |   3 +
>  gcc/testsuite/gcc.dg/sccopy-1.c         |  78 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr79691.c |   2 +-
>  gcc/tree-pass.h                         |   1 +
>  gcc/tree-ssa-sccopy.cc                  | 867 ++++++++++++++++++++++++
>  6 files changed, 951 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c
>  create mode 100644 gcc/tree-ssa-sccopy.cc
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index a25a1e32fbc..2bd5a015676 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1736,6 +1736,7 @@ OBJS = \
>  	tree-ssa-pre.o \
>  	tree-ssa-propagate.o \
>  	tree-ssa-reassoc.o \
> +	tree-ssa-sccopy.o \
>  	tree-ssa-sccvn.o \
>  	tree-ssa-scopedtables.o \
>  	tree-ssa-sink.o \
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 1e1950bdb39..fa6c5a2c9fa 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -100,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
>  	  NEXT_PASS (pass_if_to_switch);
>  	  NEXT_PASS (pass_convert_switch);
>  	  NEXT_PASS (pass_cleanup_eh);
> +	  NEXT_PASS (pass_sccopy);
>  	  NEXT_PASS (pass_profile);
>  	  NEXT_PASS (pass_local_pure_const);
>  	  NEXT_PASS (pass_modref);
> @@ -368,6 +369,7 @@ along with GCC; see the file COPYING3.  If not see
>  	 However, this also causes us to misdiagnose cases that should be
>  	 real warnings (e.g., testsuite/gcc.dg/pr18501.c).  */
>        NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
> +      NEXT_PASS (pass_sccopy);
>        NEXT_PASS (pass_tail_calls);
>        /* Split critical edges before late uninit warning to reduce the
>           number of false positives from it.  */
> @@ -409,6 +411,7 @@ along with GCC; see the file COPYING3.  If not see
>        NEXT_PASS (pass_sancov);
>        NEXT_PASS (pass_asan);
>        NEXT_PASS (pass_tsan);
> +      NEXT_PASS (pass_sccopy);
>        /* ???  We do want some kind of loop invariant motion, but we possibly
>           need to adjust LIM to be more friendly towards preserving accurate
>  	 debug information here.  */
> diff --git a/gcc/testsuite/gcc.dg/sccopy-1.c b/gcc/testsuite/gcc.dg/sccopy-1.c
> new file mode 100644
> index 00000000000..1e61a6b320e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/sccopy-1.c
> @@ -0,0 +1,78 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fgimple -fdump-tree-sccopy -O2" } */
> +/* { dg-final { scan-tree-dump "Replacing SCC of size 2" "sccopy1" } } */
> +
> +int __GIMPLE (ssa, startwith ("sccopy"))
> +main ()
> +{
> +  int a;
> +  int y;
> +  int x;
> +  int _1;
> +  int _2;
> +  int _13;
> +
> +  __BB(2):
> +  if (x_7(D) == 5)
> +    goto __BB3;
> +  else
> +    goto __BB4;
> +
> +  __BB(3):
> +  a_10 = x_7(D);
> +  goto __BB5;
> +
> +  __BB(4):
> +  a_9 = y_8(D);
> +  goto __BB5;
> +
> +  __BB(5):
> +  a_3 = __PHI (__BB3: a_10, __BB4: a_9);
> +  if (x_7(D) == y_8(D))
> +    goto __BB6;
> +  else
> +    goto __BB11;
> +
> +  __BB(6):
> +  a_11 = a_3 + 1;
> +  goto __BB7;
> +
> +  __BB(7):
> +  a_4 = __PHI (__BB6: a_11, __BB11: a_6);
> +label1:
> +  if (x_7(D) != y_8(D))
> +    goto __BB8;
> +  else
> +    goto __BB10;
> +
> +  __BB(8):
> +  goto __BB9;
> +
> +  __BB(9):
> +  a_12 = __PHI (__BB8: a_4, __BB10: a_5);
> +  goto __BB10;
> +
> +  __BB(10,loop_header(1)):
> +  a_5 = __PHI (__BB7: a_4, __BB9: a_12);
> +label2:
> +  _1 = y_8(D) * 2;
> +  if (x_7(D) == _1)
> +    goto __BB9;
> +  else
> +    goto __BB11;
> +
> +  __BB(11):
> +  a_6 = __PHI (__BB5: a_3, __BB10: a_5);
> +  _2 = x_7(D) * 3;
> +  if (y_8(D) == _2)
> +    goto __BB7;
> +  else
> +    goto __BB12;
> +
> +  __BB(12):
> +  _13 = 0;
> +  return _13;
> +
> +}
> +
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c b/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c
> index bf889318c06..43770c95bca 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c
> @@ -34,4 +34,4 @@ int f4 (int i)
>  
>  /* { dg-final { scan-tree-dump-times "sprintf" 1 "optimized" } }
>     { dg-final { scan-tree-dump-times "snprintf" 1 "optimized" } }
> -   { dg-final { scan-tree-dump " = 9;" "optimized" } } */
> +   { dg-final { scan-tree-dump "return 9;" "optimized" } } */
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 09e6ada5b2f..94a48461590 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -399,6 +399,7 @@ extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
> diff --git a/gcc/tree-ssa-sccopy.cc b/gcc/tree-ssa-sccopy.cc
> new file mode 100644
> index 00000000000..f6e462f4c71
> --- /dev/null
> +++ b/gcc/tree-ssa-sccopy.cc
> @@ -0,0 +1,867 @@
> +/* Strongly-connected copy propagation pass for the GNU compiler.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   Contributed by Filip Kastl <fkastl@suse.cz>
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "gimple-iterator.h"
> +#include "vec.h"
> +#include "hash-set.h"
> +#include <algorithm>
> +#include "ssa-iterators.h"
> +#include "gimple-fold.h"
> +#include "gimplify.h"
> +#include "tree-cfg.h"
> +#include "tree-eh.h"
> +#include "tree-cfgcleanup.h"
> +#include "builtins.h"
> +
> +/* Strongly connected copy propagation pass.
> +
> +   This is a lightweight copy propagation pass that is also able to eliminate
> +   redundant PHI statements.  The pass considers the following types of copy
> +   statements:
> +
> +   1 An assignment statement with a single argument.
> +
> +   _3 = _2;
> +   _4 = 5;
> +
> +   2 A degenerate PHI statement.  A degenerate PHI is a PHI that only refers to
> +     itself or one other value.
> +
> +   _5 = PHI <_1>;
> +   _6 = PHI <_6, _6, _1, _1>;
> +   _7 = PHI <16, _7>;
> +
> +   3 A set of PHI statements that only refer to each other or to one other
> +     value.
> +
> +   _8 = PHI <_9, _10>;
> +   _9 = PHI <_8, _10>;
> +   _10 = PHI <_8, _9, _1>;

this case necessarily involves a cyclic CFG, so maybe say

"This is a lightweight SSA copy propagation pass that is able to handle
cycles optimistically, eliminating PHIs within those."

?  Or is this a mis-characterization?
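The redundancy criterion behind these cyclic sets is easy to model outside GIMPLE: an SCC of PHIs is redundant iff the set of arguments defined outside the SCC collapses to a single value.  A minimal standalone sketch (strings stand in for SSA names; this is illustrative, not the patch's code):

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Model: each PHI result maps to the list of its argument names.
// An SCC of PHIs is redundant iff exactly one of its arguments is
// defined outside the SCC; every PHI in the SCC is then a copy of
// that value.
using phi_graph = std::map<std::string, std::vector<std::string>>;

// Return the unique outside value if SCC is redundant, else "".
std::string redundant_scc_value (const phi_graph &phis,
                                 const std::set<std::string> &scc)
{
  std::set<std::string> outer;
  for (const std::string &res : scc)
    for (const std::string &arg : phis.at (res))
      if (!scc.count (arg))
        outer.insert (arg);
  return outer.size () == 1 ? *outer.begin () : std::string ();
}
```

On the commit message's example (`_8 = PHI <_9, _10>; _9 = PHI <_8, _10>; _10 = PHI <_8, _9, _1>`) the unique outside value is `_1`, so all three PHIs can be replaced by it.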

> +   All of these statements produce copies and can be eliminated from the
> +   program.  For a copy statement we identify the value it creates a copy of
> +   and replace references to the statement with the value -- we propagate the
> +   copy.
> +
> > +   _3 = _2; // Replace all occurrences of _3 by _2
> +
> +   _8 = PHI <_9, _10>;
> +   _9 = PHI <_8, _10>;
> > +   _10 = PHI <_8, _9, _1>; // Replace all occurrences of _8, _9 and _10 by _1
> +
> > +   To find all three types of copy statements we use an algorithm based on
> > +   strongly-connected components (SCCs) in the dataflow graph.  The algorithm
> > +   was introduced in an article from 2013[1].  We describe the algorithm
> > +   below.
> +
> > +   To identify SCCs we implement Tarjan's SCC algorithm.  For the
> +   SCC computation we wrap potential copy statements in the 'vertex' struct.
> +   To each of these statements we also assign a vertex number ('vxnum'). Since
> +   the main algorithm has to be able to compute SCCs of subgraphs of the whole
> +   dataflow graph we use GIMPLE stmt flags to prevent Tarjan's algorithm from
> +   leaving the subgraph.
> +
> +   References:
> +
> > +     [1] Simple and Efficient Construction of Static Single Assignment Form,
> +     Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
> +     Section 3.2.  */
> +
> +/* State of vertex during Tarjan computation.
> +
> +   unvisited  Vertex hasn't yet been popped from worklist.
> +   vopen      DFS has visited vertex for the first time.  Vertex has been put
> +	      on Tarjan stack.
> +   closed     DFS has backtracked through vertex.  At this point, vertex
> +	      doesn't have any unvisited neighbors.
> +   in_scc     Vertex has been popped from Tarjan stack.  */
> +
> +enum vstate
> +{
> +  unvisited,
> +  vopen,
> +  closed,
> +  in_scc
> +};
> +
> +/* Information about a vertex.  Used by Tarjan.  */
> +
> +struct vertex
> +{
> +  vstate state;
> +  unsigned index;
> +  unsigned lowlink;
> +  gimple* stmt;
> +};
> +
> +/* Set 'dead' flag of gimple statement to true.
> +   We remove these statements at the end of the pass.  */
> +
> +static void
> +set_stmt_dead (gimple* stmt)
> +{
> +  gimple_set_plf (stmt, GF_PLF_1, true);
> +}
> +
> +/* For each statement from given SCC, mark it 'dead'.  */
> +
> +static void
> +set_scc_dead (vec<gimple *> scc)
> +{
> +  for (gimple *stmt : scc)
> +    set_stmt_dead (stmt);
> +}
> +
> > +/* Clear 'dead' flag of gimple statement.  */
> +
> +static void
> +clear_stmt_dead (gimple* stmt)
> +{
> +  gimple_set_plf (stmt, GF_PLF_1, false);
> +}
> +
> +/* Return value of 'dead' flag of gimple statement.  */
> +
> +static bool
> +is_stmt_dead (gimple* stmt)
> +{
> +  return gimple_plf (stmt, GF_PLF_1);
> +}
> +
> +/* Set 'using' flag of gimple statement to true.
> +   If the flag isn't set, Tarjan will ignore the statement.  */
> +
> +static void
> +tarjan_set_using (gimple* stmt)
> +{
> +  gimple_set_plf (stmt, GF_PLF_2, true);
> +}
> +
> > +/* Clear 'using' flag of gimple statement.  */
> +
> +static void
> +tarjan_clear_using (gimple* stmt)
> +{
> +  gimple_set_plf (stmt, GF_PLF_2, false);
> +}
> +
> +/* Return value of 'using' flag of gimple statement.  */
> +
> +static bool
> +tarjan_is_using (gimple* stmt)
> +{
> +  return gimple_plf (stmt, GF_PLF_2);
> +}
> +
> +/* Set 'vxnum' (vertex number) of statement.  Used by Tarjan.  */
> +
> +static void
> +tarjan_set_vxnum (gimple* stmt, unsigned num)
> +{
> +  gimple_set_uid (stmt, num);
> +}
> +
> +/* Return 'vxnum' (vertex number) of statement.  Used by Tarjan.  */
> +
> +static unsigned
> +tarjan_vxnum (gimple* stmt)
> +{
> +  return gimple_uid (stmt);
> +}
> +
> +/* Create and initialize vertex struct for each given statement.  */
> +
> +static auto_vec<vertex>
> +tarjan_stmts_to_vertices (auto_vec<gimple *> &stmts)
> +{
> +  auto_vec<vertex> result;
> +  for (gimple *stmt : stmts)
> +    {
> +      vertex v;
> +      v.state = unvisited;
> +      v.index = 0;
> +      v.lowlink = 0;
> +      v.stmt = stmt;
> +
> +      result.safe_push (v);
> +    }
> +  return result;
> +}
> +
> +/* Part of 'tarjan_compute_sccs ()'.  */
> +
> +static void
> +tarjan_visit_neighbor (tree neigh_tree, unsigned parent_vxnum,
> +		       auto_vec<vertex> &vs, auto_vec<unsigned> &worklist)
> +{
> +  if (TREE_CODE (neigh_tree) != SSA_NAME)
> +    return; /* Skip any neighbor that isn't an SSA name.  */
> +
> +  gimple *neigh_stmt = SSA_NAME_DEF_STMT (neigh_tree);
> +
> +  /* Skip neighbors outside the induced subgraph that Tarjan currently works
> +     with.  */
> +  if (!tarjan_is_using (neigh_stmt))
> +    return;
> +  unsigned neigh_vxnum = tarjan_vxnum (neigh_stmt);
> +
> +  vstate neigh_state = vs[neigh_vxnum].state;
> +  vstate parent_state = vs[parent_vxnum].state;
> +  if (parent_state == vopen) /* We're currently opening parent.  */
> +    {
> +      /* Put unvisited neighbors on worklist.  Update lowlink of parent
> +	 vertex according to indices of neighbors present on stack.  */
> +      switch (neigh_state)
> +	{
> +	case unvisited:
> +	  worklist.safe_push (neigh_vxnum);
> +	  break;
> +	case vopen:
> +	case closed:
> +	  vs[parent_vxnum].lowlink = std::min (vs[parent_vxnum].lowlink,
> +					       vs[neigh_vxnum].index);
> +	  break;
> +	case in_scc:
> +	  /* Ignore these edges.  */
> +	  break;
> +	}
> +    }
> +  else if (parent_state == closed) /* We're currently closing parent.  */
> +    {
> +      /* Update lowlink of parent vertex according to lowlinks of
> +	 children of parent (in terms of DFS tree).  */
> +      if (neigh_state == closed)
> +	{
> +	  vs[parent_vxnum].lowlink = std::min (vs[parent_vxnum].lowlink,
> +					       vs[neigh_vxnum].lowlink);
> +	}
> +    }
> +}
> +
> +/* Implementation of Tarjan's SCC algorithm.
> +
> > +   Compute SCCs in the dataflow graph induced by given statements.  Return the
> +   SCCs in a reverse topological order.
> +
> +   Given statements should be only those for which stmt_may_generate_copy
> +   returns 'true'.  */
> +
> +static auto_vec<vec<gimple *>>
> +tarjan_compute_sccs (auto_vec<gimple *> &copy_stmts)
> +{
> +  auto_vec<vec<gimple *>> sccs;
> +  auto_vec<unsigned> worklist; /* DFS stack.  */
> +  auto_vec<unsigned> stack; /* Tarjan stack.  */
> +  unsigned curr_index = 0;
> +
> +  auto_vec<vertex> vs = tarjan_stmts_to_vertices (copy_stmts);
> +
> +  /* Mark the subgraph induced by 'copy_stmts' and index it by vxnums.  */
> +  unsigned i;
> +  for (i = 0; i < vs.length (); i++)
> +    {
> +      gimple *stmt = vs[i].stmt;
> +      tarjan_set_using (stmt);
> +      tarjan_set_vxnum (stmt, i);
> +    }
> +
> +  /* Put all vertices on worklist.  */
> > +  for (i = 0; i < vs.length (); i++)
> > +    worklist.safe_push (i);
> +
> +  /* Worklist loop.  */
> +  while (!worklist.is_empty ())
> +    {
> +      unsigned i = worklist.pop ();
> +      gimple *stmt = vs[i].stmt;
> +      vstate state = vs[i].state;
> +
> +      if (state == unvisited)
> +	{
> +	  vs[i].state = vopen;
> +
> +	  /* Assign index to this vertex.  */
> +	  vs[i].index = curr_index;
> +	  vs[i].lowlink = curr_index;
> +	  curr_index++;
> +
> +	  /* Put vertex on stack and also on worklist to be closed later.  */
> +	  stack.safe_push (i);
> +	  worklist.safe_push (i);
> +	}
> +      else if (state == vopen)
> +	vs[i].state = closed;
> +
> +      /* Visit neighbors of this vertex.  */
> +      tree op;
> +      gphi *phi;
> +      switch (gimple_code (stmt))
> +	{
> +	  case GIMPLE_PHI:
> +	    phi = as_a <gphi *> (stmt);
> +	    unsigned j;
> +	    for (j = 0; j < gimple_phi_num_args (phi); j++)
> +	      {
> +		op = gimple_phi_arg_def (phi, j);
> +		tarjan_visit_neighbor (op, i, vs, worklist);
> +	      }
> +	    break;
> +	  case GIMPLE_ASSIGN:
> +	    op = gimple_assign_rhs1 (stmt);
> +	    tarjan_visit_neighbor (op, i, vs, worklist);
> +	    break;
> +	  default:
> +	    gcc_unreachable ();
> +	}
> +
> +      /* If we've just closed a root vertex of an scc, pop scc from stack.  */
> +      if (state == vopen && vs[i].lowlink == vs[i].index)
> +	{
> +	  vec<gimple *> scc = vNULL;
> +
> +	  unsigned j;
> +	  do
> +	    {
> +	      j = stack.pop ();
> +	      scc.safe_push (vs[j].stmt);
> +	      vs[j].state = in_scc;
> +	    }
> +	  while (j != i);
> +
> +	  sccs.safe_push (scc);

It might be nice to optimize SCCs of size 1 somehow, not sure how
many times these appear - possibly prevent them from even entering
the SCC discovery?

I'll note that while you are working with stmts everywhere that
you are really bound to using SSA defs and those would already
nicely have numbers (the SSA_NAME_VERSION).  In principle the
SCC lattice could be pre-allocated once, indexed by
SSA_NAME_VERSION and you could keep a "generation" number
indicating what SCC discovery round it belongs to (aka the
set_using).

There's a old SCC finding algorithm working on the SSA graph
in the old SCC based value-numbering, for example on the
gcc 7 branch in tree-ssa-sccvn.c:DFS

For reading it would be nice to put the SCC finding in its
own class.
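A standalone sketch of what such a class might look like: an iterative Tarjan SCC finder over plain integer vertices (think SSA_NAME_VERSION), emitting SCCs in reverse topological order.  Illustrative only, not GCC code:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Iterative Tarjan SCC finder wrapped in its own class.  Vertices are
// 0..n-1; sccs () returns components in reverse topological order of
// the condensation (each SCC after everything it refers to).
class scc_finder
{
public:
  explicit scc_finder (const std::vector<std::vector<int>> &adj)
    : m_adj (adj), m_index (adj.size (), -1), m_lowlink (adj.size (), 0),
      m_on_stack (adj.size (), false) {}

  std::vector<std::vector<int>> sccs ()
  {
    for (int v = 0; v < (int) m_adj.size (); v++)
      if (m_index[v] < 0)
        dfs (v);
    return m_sccs;
  }

private:
  void dfs (int root)
  {
    // Explicit DFS stack of (vertex, next outgoing edge) pairs.
    std::vector<std::pair<int, size_t>> work{{root, 0}};
    while (!work.empty ())
      {
        int v = work.back ().first;
        size_t &pos = work.back ().second;
        if (pos == 0)  // First visit: number the vertex, push it.
          {
            m_index[v] = m_lowlink[v] = m_next++;
            m_stack.push_back (v);
            m_on_stack[v] = true;
          }
        bool descended = false;
        while (pos < m_adj[v].size ())
          {
            int w = m_adj[v][pos++];
            if (m_index[w] < 0)
              {
                work.emplace_back (w, 0);  // Recurse into w.
                descended = true;
                break;
              }
            if (m_on_stack[w])
              m_lowlink[v] = std::min (m_lowlink[v], m_index[w]);
          }
        if (descended)
          continue;
        if (m_lowlink[v] == m_index[v])
          {
            // v is the root of an SCC: pop it off the Tarjan stack.
            std::vector<int> scc;
            int w;
            do
              {
                w = m_stack.back ();
                m_stack.pop_back ();
                m_on_stack[w] = false;
                scc.push_back (w);
              }
            while (w != v);
            m_sccs.push_back (scc);
          }
        work.pop_back ();
        if (!work.empty ())  // Propagate lowlink to the DFS parent.
          {
            int parent = work.back ().first;
            m_lowlink[parent] = std::min (m_lowlink[parent], m_lowlink[v]);
          }
      }
  }

  const std::vector<std::vector<int>> &m_adj;
  std::vector<int> m_index, m_lowlink;
  std::vector<bool> m_on_stack;
  std::vector<int> m_stack;
  int m_next = 0;
  std::vector<std::vector<int>> m_sccs;
};
```

The index/lowlink arrays are preallocated per vertex, which is where a generation number could replace the per-round clearing of the `using` flags.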

> +	}
> +    }
> +
> +  if (!stack.is_empty ())
> +    gcc_unreachable ();
> +
> +  /* Clear copy stmts' 'using' flags.  */
> +  for (vertex v : vs)
> +    {
> +      gimple *s = v.stmt;
> +      tarjan_clear_using (s);
> +    }
> +
> +  return sccs;
> +}
> +
> +/* Could this statement potentially be a copy statement?
> +
> +   This pass only considers statements for which this function returns 'true'.
> +   Those are basically PHI functions and assignment statements similar to
> +
> +   _2 = _1;
> +   or
> +   _2 = 5;  */
> +
> +static bool
> +stmt_may_generate_copy (gimple *stmt)
> +{
> +  if (gimple_code (stmt) == GIMPLE_PHI)
> +    {
> +      gphi *phi = as_a <gphi *> (stmt);
> +
> +      /* No OCCURS_IN_ABNORMAL_PHI SSA names in lhs nor rhs.  */
> +      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (phi)))
> +	return false;
> +
> +      unsigned i;
> +      for (i = 0; i < gimple_phi_num_args (phi); i++)
> +	{
> +	  tree op = gimple_phi_arg_def (phi, i);
> +	  if (TREE_CODE (op) == SSA_NAME
> +	      && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
> +	    return false;
> +	}

When there's more than one non-SSA PHI argument and they are not
the same then the stmt also cannot be a copy, right?

> +      return true;
> +    }
> +
> +  if (gimple_code (stmt) != GIMPLE_ASSIGN)
> +    return false;
> +
> +  /* If the statement has volatile operands, it won't generate a
> +     useful copy.  */
> +  if (gimple_has_volatile_ops (stmt))
> +    return false;
> +
> +  /* Statements with loads and/or stores will never generate a useful copy.  */
> +  if (gimple_store_p (stmt) || gimple_assign_load_p (stmt))
> +    return false;
> +
> +  if (!gimple_assign_single_p (stmt))
> +    return false;
> +
> +  tree lhs = gimple_assign_lhs (stmt);
> +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
> +    return false;
> +
> +  /* If the assignment is from a constant it generates a useful copy.  */
> +  if (is_gimple_min_invariant (gimple_assign_rhs1 (stmt)))
> +    return true;
> +
> +  tree rhs = single_ssa_tree_operand (stmt, SSA_OP_USE);
> +
> +  if (!is_gimple_val (gimple_assign_rhs1 (stmt)))
> +    return false;
> +
> > +  if (!rhs)
> > +    return false;
> > +
> > +  /* rhs shouldn't flow through any abnormal edges.  */
> > +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs))
> > +    return false;

so this means

 _1 = 0;

doesn't generate a copy?  But you allowed those in PHIs?

Generally I wanted to mention there's gimple_assign_ssa_name_copy_p
which exactly matches _1 = _2 (and nothing more).  There cannot
be volatiles or side-effects in those, no stores or loads
(but SSA_NAME_OCCURS_IN_ABNORMAL_PHI can appear).

Can you use that and remove the unneeded checks above?

> +
> +  /* If rhs and lhs are pointers, alignment of lhs and rhs should be the same.
> +     Usage of __builtin_assume_aligned can cause alignment of lhs to be greater
> +     than alignment of rhs.  In that case we don't want to propagate rhs since
> +     we would lose the alignment information.  */

The same might be true for value-ranges.  I think it would be better to
do it like copy-prop does and in cycles propagate the info to the
copy-of variable we substitute with.  For non-cyclic cases we cannot
generally copy flow-sensitive info.

A more conservative test would be to compare
SSA_NAME_PTR_INFO for pointers and SSA_NAME_RANGE_INFO for non-pointers.

> +  if (POINTER_TYPE_P (TREE_TYPE (lhs))
> +      && POINTER_TYPE_P (TREE_TYPE (rhs))
> +      && get_pointer_alignment (lhs) != get_pointer_alignment (rhs))
> +    return false;
> +
> +  return true;
> +}
> +
> > +/* Return all statements in cfun that could generate copies, i.e. all
> > +   statements for which stmt_may_generate_copy returns 'true'.  */
> +
> +static auto_vec<gimple *>
> +get_all_stmt_may_generate_copy (void)
> +{
> +  auto_vec<gimple *> result;
> +
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      gimple_stmt_iterator gsi;
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +	{
> +	  gimple *s = gsi_stmt (gsi);
> +	  if (stmt_may_generate_copy (s))
> +	    result.safe_push (s);
> +	}
> +
> +      gphi_iterator pi;
> +      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
> +	{
> +	  gimple *s = pi.phi ();
> +	  if (stmt_may_generate_copy (s))
> +	    result.safe_push (s);
> +	}
> +    }

I wonder if this can do stmt-to-vertices on-the-fly, thus not
build the vector of gimple *.  That is, you build very many
vec<>s (probably not many in practice, but ...)

> +  return result;
> +}
> +
> > +/* Cleanup to be performed after every call to 'replace_use_by ()'.  */
> +
> +void
> +cleanup_after_replace (gimple *old_stmt, gimple *stmt, bitmap need_eh_cleanup,
> +		       bitmap need_ab_cleanup, vec<gimple *> stmts_to_fixup,
> +		       bool can_make_abnormal_goto, bool was_noreturn)
> +{
> +  basic_block bb = stmt->bb;
> +
> +  /* If we cleaned up EH information from the statement,
> +     remove EH edges.  */
> +  if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
> +    bitmap_set_bit (need_eh_cleanup, bb->index);
> +
> +  /* If we turned a call with possible abnormal control transfer
> +     into one that doesn't, remove abnormal edges.  */
> +  if (can_make_abnormal_goto
> +      && !stmt_can_make_abnormal_goto (stmt))
> +    bitmap_set_bit (need_ab_cleanup, bb->index);
> +
> +  /* If we turned a not noreturn call into a noreturn one
> +     schedule it for fixup.  */
> +  if (!was_noreturn
> +      && is_gimple_call (stmt)
> +      && gimple_call_noreturn_p (stmt))
> +    stmts_to_fixup.safe_push (stmt);
> +
> +  if (gimple_assign_single_p (stmt))
> +    {
> +      tree rhs = gimple_assign_rhs1 (stmt);
> +
> +      if (TREE_CODE (rhs) == ADDR_EXPR)
> +	recompute_tree_invariant_for_addr_expr (rhs);
> +    }
> +
> +  update_stmt_if_modified (stmt);
> +}
> +
> > +/* Cleanup related to 'replace_use_by ()'.  In contrast to
> +   'cleanup_after_replace ()', this function needs to be called only at the
> +   end of the pass.  */
> +
> +void
> +cleanup_after_all_replaces_done (bitmap need_eh_cleanup, bitmap
> +				 need_ab_cleanup, vec<gimple *> stmts_to_fixup)
> +{
> +  if (!bitmap_empty_p (need_eh_cleanup))
> +    gimple_purge_all_dead_eh_edges (need_eh_cleanup);
> +  if (!bitmap_empty_p (need_ab_cleanup))
> +    gimple_purge_all_dead_abnormal_call_edges (need_ab_cleanup);
> +
> +  /* Fixup stmts that became noreturn calls.  This may require splitting
> +     blocks and thus isn't possible during the dominator walk.  Do this
> > +     in reverse order so we don't inadvertently remove a stmt we want to
> +     fixup by visiting a dominating now noreturn call first.  */
> +  while (!stmts_to_fixup.is_empty ())
> +    {
> +      gimple *stmt = stmts_to_fixup.pop ();
> +      fixup_noreturn_call (stmt);
> +    }
> +}
> +
> > +/* Replace each use of SSA name 'get_replaced' by value 'replace_by'.  */
> +
> +static void
> +replace_use_by (tree get_replaced, tree replace_by, bitmap need_eh_cleanup,
> +		bitmap need_ab_cleanup, vec<gimple *> stmts_to_fixup)
> +{

If I understand correctly sofar you'll never replace a SSA name by
a constant, so REPLACE_BY is always an SSA_NAME.

I wonder if you ever hit the need to do any of the fancy cleanups.

In fact I suggest you use the existing replace_uses_by API instead
of re-inventing it.
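Whether or not the fancy cleanups are ever needed, the net effect of the substitution step is just resolving copy-of chains to a single representative, which can be sketched standalone (the `copy_of` map and helper are hypothetical, not the replace_uses_by API):

```cpp
#include <map>
#include <string>

// Resolve a chain of copies (_3 -> _2 -> _1) to its final
// representative, compressing the chain so later lookups are cheap.
// The copy_of map is a stand-in for recorded "lhs is a copy of rhs"
// facts.
std::string resolve_copy (std::map<std::string, std::string> &copy_of,
                          const std::string &name)
{
  auto it = copy_of.find (name);
  if (it == copy_of.end ())
    return name;  // Not recorded as a copy; represents itself.
  std::string root = resolve_copy (copy_of, it->second);
  it->second = root;  // Path compression.
  return root;
}
```

Replacing every use of a name with its resolved representative then needs only one immediate-use walk per name.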

> +  use_operand_p use_p;
> +  imm_use_iterator iter;
> +  gimple *use_stmt;
> +  FOR_EACH_IMM_USE_STMT (use_stmt, iter, get_replaced)
> +    {
> +      bool was_noreturn = false;
> +      bool can_make_abnormal_goto = false;
> +      if (is_gimple_call (use_stmt))
> +	{
> +	  was_noreturn = gimple_call_noreturn_p (use_stmt);
> +	  can_make_abnormal_goto = stmt_can_make_abnormal_goto (use_stmt);
> +	}
> +
> +      FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> +	SET_USE (use_p, unshare_expr (replace_by));
> +
> +      /* Recompute tree invariant.  */
> +      if (gimple_assign_single_p (use_stmt))
> +	{
> +	  tree rhs = gimple_assign_rhs1 (use_stmt);
> +
> +	  if (TREE_CODE (rhs) == ADDR_EXPR)
> +	    recompute_tree_invariant_for_addr_expr (rhs);
> +	}
> +
> +      /* Cleanup.  */
> +      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> +      fold_stmt (&gsi);
> +      gimple_set_modified (gsi_stmt (gsi), true);
> +      cleanup_after_replace (use_stmt, gsi_stmt (gsi), need_eh_cleanup,
> +			     need_ab_cleanup, stmts_to_fixup,
> +			     can_make_abnormal_goto, was_noreturn);
> +    }
> +}
> +
> > +/* For each statement from given SCC, replace uses of its result by value
> > +   'replace_by'.  */
> +
> +static void
> +replace_scc_by_value (vec<gimple *> scc, tree replace_by, bitmap
> +		      need_eh_cleanup, bitmap need_ab_cleanup, vec<gimple *>
> +		      stmts_to_fixup)
> +{
> +  for (gimple *stmt : scc)
> +    {
> +      tree get_replaced = gimple_get_lhs (stmt);
> +      replace_use_by (get_replaced, replace_by, need_eh_cleanup,
> +		      need_ab_cleanup, stmts_to_fixup);
> +    }
> +
> +  if (dump_file)
> > +    fprintf (dump_file, "Replacing SCC of size %u\n", scc.length ());
> +}
> +
> +/* Part of 'sccopy_propagate ()'.  */
> +
> +static void
> +sccopy_visit_op (tree op, hash_set<tree> &outer_ops,
> +		 hash_set<gimple *> &scc_set, bool &is_inner,
> +		 tree &last_outer_op)
> +{
> +  bool op_in_scc = false;
> +
> +  if (TREE_CODE (op) == SSA_NAME)
> +    {
> +      gimple *op_stmt = SSA_NAME_DEF_STMT (op);
> +      if (scc_set.contains (op_stmt))
> +	op_in_scc = true;
> +    }
> +
> +  if (!op_in_scc)
> +    {
> +      outer_ops.add (op);
> +      last_outer_op = op;
> +      is_inner = false;
> +    }
> +}
> +
> +/* Main function of this pass.  Find and propagate all three types of copy
> +   statements (see pass description above).
> +
> +   This is an implementation of an algorithm from the paper Simple and
> > +   Efficient Construction of Static Single Assignment Form[1].  It is based
> > +   on strongly-connected components (SCCs) in the dataflow graph.  The original
> +   algorithm only considers PHI statements.  We extend it to also consider
> +   assignment statements of type _2 = _1;.
> +
> +   The algorithm is based on this definition of a set of redundant PHIs[1]:
> +
> +     A non-empty set P of PHI functions is redundant iff the PHI functions just
> +     reference each other or one other value
> +
> +   It uses this lemma[1]:
> +
> +     Let P be a redundant set of PHI functions.  Then there is a
> +     strongly-connected component S subset of P that is also redundant.
> +
> +   The algorithm works in this way:
> +
> +     1 Find SCCs
> +     2 For each SCC S in topological order:
> +     3   Construct set 'inner' of statements that only have other statements
> +	 from S on their right hand side
> +     4   Construct set 'outer' of values that originate outside S and appear on
> +	 right hand side of some statement from S
> +     5   If |outer| = 1, outer only contains a value v.  Statements in S only
> +	 refer to each other or to v -- they are redundant.  Propagate v.
> +	 Else, recurse on statements in inner.
> +
> +   The implementation is non-recursive.
> +
> +   References:
> +
> > +     [1] Simple and Efficient Construction of Static Single Assignment Form,
> +     Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
> +     Section 3.2.  */
> +
> +static void
> +sccopy_propagate ()
> +{
> +  auto_vec<gimple *> useful_stmts = get_all_stmt_may_generate_copy ();
> +
> +  auto_vec<vec<gimple *>> worklist;
> +  worklist = tarjan_compute_sccs (useful_stmts);
> +
> +  /* Prepare data structs for cleanup after stmt modification.  */
> +  bitmap need_eh_cleanup = BITMAP_ALLOC (NULL);
> +  bitmap need_ab_cleanup = BITMAP_ALLOC (NULL);
> +  vec<gimple *> stmts_to_fixup = vNULL;
> +
> +  while (!worklist.is_empty ())
> +    {
> +      vec<gimple *> scc = worklist.pop ();
> +
> +      auto_vec<gimple *> inner;
> +      hash_set<tree> outer_ops;
> +      tree last_outer_op = NULL_TREE;
> +
> +      /* Prepare hash set of PHIs in scc to query later.  */
> +      hash_set<gimple *> scc_set;
> +      for (gimple *stmt : scc)
> +	scc_set.add (stmt);
> +
> +      for (gimple *stmt : scc)
> +	{
> +	  bool is_inner = true;
> +
> +	  gphi *phi;
> +	  tree op;
> +
> +	  switch (gimple_code (stmt))
> +	    {
> +	      case GIMPLE_PHI:
> +		phi = as_a <gphi *> (stmt);
> +		unsigned j;
> +		for (j = 0; j < gimple_phi_num_args (phi); j++)
> +		  {
> +		    op = gimple_phi_arg_def (phi, j);
> +		    sccopy_visit_op (op, outer_ops, scc_set, is_inner,
> +				   last_outer_op);
> +		  }
> +		break;
> +	      case GIMPLE_ASSIGN:
> +		op = gimple_assign_rhs1 (stmt);
> +		sccopy_visit_op (op, outer_ops, scc_set, is_inner,
> +			       last_outer_op);
> +		break;
> +	      default:
> +		gcc_unreachable ();
> +	    }
> +
> +	  if (is_inner)
> +	    {
> +	      inner.safe_push (stmt);
> +	    }
> +	}
> +
> +      if (outer_ops.elements () == 1)
> +	{
> +	  /* The only operand in outer_ops.  */
> +	  tree outer_op = last_outer_op;
> +
> +	  replace_scc_by_value (scc, outer_op, need_eh_cleanup,
> +				need_ab_cleanup, stmts_to_fixup);
> +	  set_scc_dead (scc);
> +	}
> +      else if (outer_ops.elements () > 1)
> +	{
> +	  /* Add inner sccs to worklist.  */
> +	  auto_vec<vec<gimple *>> inner_sccs = tarjan_compute_sccs (inner);
> +	  for (vec<gimple *> inner_scc : inner_sccs)
> +	    worklist.safe_push (inner_scc);
> +	}
> +      else
> +	gcc_unreachable ();
> +
> +      scc.release ();
> +    }
> +
> +  cleanup_after_all_replaces_done (need_eh_cleanup, need_ab_cleanup,
> +				   stmts_to_fixup);
> +
> +  /* Remove data structs for cleanup after stmt modification.  */
> +  BITMAP_FREE (need_eh_cleanup);
> +  BITMAP_FREE (need_ab_cleanup);
> +  stmts_to_fixup.release ();
> +}
> +
> +/* Called when pass execution starts.  */
> +
> +static void
> +init_sccopy (void)
> +{
> +  /* Clear statement flags.  */
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      gimple_stmt_iterator gsi;
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +	{
> +	  gimple* stmt = gsi_stmt (gsi);
> +	  clear_stmt_dead (stmt);
> +	  tarjan_clear_using (stmt);
> +	}
> +
> +      gphi_iterator pi;
> +      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
> +	{
> +	  gimple *stmt = pi.phi ();
> +	  clear_stmt_dead (stmt);
> +	  tarjan_clear_using (stmt);
> +	}
> +    }
> +}
> +
> +/* Called before pass execution ends.  */
> +
> +static void
> +finalize_sccopy (void)
> +{
> +  basic_block bb;
> +
> +  /* Remove all propagated statements.  */
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      gimple_stmt_iterator gsi;
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)

You generally should remove stmts back-to-front to make
debug stmt generation work.  To not re-invent the wheel
it might be simplest to use simple_dce_from_worklist with
the dead stmts defs as seeds (just remove set/is_stmt_dead
and track dead SSA defs in a bitmap).

> +	{
> +	  gimple *stmt = gsi_stmt (gsi);
> +	  if (is_stmt_dead (stmt))
> +	    gsi_remove (&gsi, true);
> +	  else
> +	    gsi_next (&gsi);

this also fails to release the SSA def in the stmt.

  release_defs (stmt);

would do that (or simple_dce_from_worklist).

> +	}
> +
> +      gphi_iterator pi;
> +      for (pi = gsi_start_phis (bb); !gsi_end_p (pi);)
> +	{
> +	  gphi *stmt = pi.phi ();
> +
> +	  if (is_stmt_dead (stmt))
> +	    remove_phi_node (&pi, true);
> +	  else
> +	    gsi_next (&pi);
> +	}
> +    }
> +
> +  /* More cleanup.  */
> +  FOR_EACH_BB_FN (bb, cfun)
> +    gimple_purge_dead_eh_edges (bb);

You only removed copies, I'm sure this should never do anything?

As a general comment I wonder if we can simplify most of the
pass by starting SSA based SCC discovery from a copy stmt and
whether we can reduce the number of vec<>s that are allocated
and possibly re-allocated many times?

Thanks,
Richard.

> +}
> +
> +namespace {
> +
> +const pass_data pass_data_sccopy =
> +{
> +  GIMPLE_PASS, /* type */
> +  "sccopy", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  ( PROP_cfg | PROP_ssa ), /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
> +};
> +
> +class pass_sccopy : public gimple_opt_pass
> +{
> +public:
> +  pass_sccopy (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_sccopy, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return true; }
> +  virtual unsigned int execute (function *);
> +  opt_pass * clone () final override { return new pass_sccopy (m_ctxt); }
> +}; // class pass_sccopy
> +
> +unsigned
> +pass_sccopy::execute (function *)
> +{
> +  init_sccopy ();
> +  sccopy_propagate ();
> +  finalize_sccopy ();
> +  return 0;
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_sccopy (gcc::context *ctxt)
> +{
> +  return new pass_sccopy (ctxt);
> +}
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] A new copy propagation and PHI elimination pass
  2023-11-17 13:01 ` Richard Biener
@ 2023-11-22 14:20   ` Filip Kastl
  2023-11-30 13:04     ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Filip Kastl @ 2023-11-22 14:20 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, hubicka

Hi Richard,

> Can you name the new file gimple-ssa-sccopy.cc please?

Yes, no problem.

Btw, I thought that it is standard that gimple ssa passes have the tree-ssa-
prefix. Do I understand it correctly that this is not true and many
tree-ssa-*.cc passes should actually be named gimple-ssa-*.cc but remain
tree-ssa-*.cc for historical reasons?

>> +   3 A set of PHI statements that only refer to each other or to one other
>> +     value.
>> +
>> +   _8 = PHI <_9, _10>;
>> +   _9 = PHI <_8, _10>;
>> +   _10 = PHI <_8, _9, _1>;
> 
> this case necessarily involves a cyclic CFG, so maybe say
> 
> "This is a lightweight SSA copy propagation pass that is able to handle
> cycles optimistically, eliminating PHIs within those."
> 
> ?  Or is this a mis-characterization?

I'm not sure what you mean here. Yes, this case always involves a cyclic CFG.
Is it weird that a lightweight pass is able to handle cyclic CFG and therefore
you suggest to comment this fact and say that the pass handles cycles
optimistically?

I'm not sure if optimistic is a good word to characterize the pass. I'd expect
an "optimistic" pass to make assumptions which may not be true and therefore
not always remove all the redundancies it can. This pass however should achieve all that
it sets out to do.

> It might be nice to optimize SCCs of size 1 somehow, not sure how
> many times these appear - possibly prevent them from even entering
> the SCC discovery?

Maybe that could be done. I would have to think about it and make sure it
doesn't break anything. I'd prefer to get this version into upstream and then
possibly post this upgrade later.

Btw, SCCs of size 1 appear all the time. Those are the cases 1 and 2
described in the comment at the beginning of the file.

> I'll note that while you are working with stmts everywhere that
> you are really bound to using SSA defs and those would already
> nicely have numbers (the SSA_NAME_VERSION).  In principle the
> SCC lattice could be pre-allocated once, indexed by
> SSA_NAME_VERSION and you could keep a "generation" number
> indicating what SCC discovery round it belongs to (aka the
> set_using).

I see. I could allocate a vertex struct for each statement only once when the
pass is invoked instead of allocating the structs each time tarjan_compute_sccs
is called. Will do that.

I'm not sure if I want to use SSA_NAME_VERSION for indexing a vec/array with
all those vertex structs. Many SSA names will be defined neither by PHI nor by
a copy assignment statement. If I created a vertex struct for every SSA name I
would allocate a lot of extra memory.

> There's a old SCC finding algorithm working on the SSA graph
> in the old SCC based value-numbering, for example on the
> gcc 7 branch in tree-ssa-sccvn.c:DFS

> For reading it would be nice to put the SCC finding in its
> own class.

Okay, I'll do that.

> > +	}
> > +    }
> > +
> > +  if (!stack.is_empty ())
> > +    gcc_unreachable ();
> > +
> > +  /* Clear copy stmts' 'using' flags.  */
> > +  for (vertex v : vs)
> > +    {
> > +      gimple *s = v.stmt;
> > +      tarjan_clear_using (s);
> > +    }
> > +
> > +  return sccs;
> > +}
> > +
> > +/* Could this statement potentially be a copy statement?
> > +
> > +   This pass only considers statements for which this function returns 'true'.
> > +   Those are basically PHI functions and assignment statements similar to
> > +
> > +   _2 = _1;
> > +   or
> > +   _2 = 5;  */
> > +
> > +static bool
> > +stmt_may_generate_copy (gimple *stmt)
> > +{
> > +  if (gimple_code (stmt) == GIMPLE_PHI)
> > +    {
> > +      gphi *phi = as_a <gphi *> (stmt);
> > +
> > +      /* No OCCURS_IN_ABNORMAL_PHI SSA names in lhs nor rhs.  */
> > +      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (phi)))
> > +	return false;
> > +
> > +      unsigned i;
> > +      for (i = 0; i < gimple_phi_num_args (phi); i++)
> > +	{
> > +	  tree op = gimple_phi_arg_def (phi, i);
> > +	  if (TREE_CODE (op) == SSA_NAME
> > +	      && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
> > +	    return false;
> > +	}
> 
> When there's more than one non-SSA PHI argument and they are not
> the same then the stmt also cannot be a copy, right?
> 
> > +      return true;
> > +    }

Do I understand you correctly that you propose to put another check here?
Something like

unsigned nonssa_args_num = 0;
unsigned i;
for (i = 0; i < gimple_phi_num_args (phi); i++)
  {
    tree op = gimple_phi_arg_def (phi, i);
    if (TREE_CODE (op) != SSA_NAME)
      {
	nonssa_args_num++;
	if (nonssa_args_num >= 2)
	  return false;
      }
    else if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
      return false;
  }

> > +
> > +  if (gimple_code (stmt) != GIMPLE_ASSIGN)
> > +    return false;
> > +
> > +  /* If the statement has volatile operands, it won't generate a
> > +     useful copy.  */
> > +  if (gimple_has_volatile_ops (stmt))
> > +    return false;
> > +
> > +  /* Statements with loads and/or stores will never generate a useful copy.  */
> > +  if (gimple_store_p (stmt) || gimple_assign_load_p (stmt))
> > +    return false;
> > +
> > +  if (!gimple_assign_single_p (stmt))
> > +    return false;
> > +
> > +  tree lhs = gimple_assign_lhs (stmt);
> > +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
> > +    return false;
> > +
> > +  /* If the assignment is from a constant it generates a useful copy.  */
> > +  if (is_gimple_min_invariant (gimple_assign_rhs1 (stmt)))
> > +    return true;
> > +
> > +  tree rhs = single_ssa_tree_operand (stmt, SSA_OP_USE);
> > +
> > +  if (!is_gimple_val (gimple_assign_rhs1 (stmt)))
> > +    return false;
> > +
> > +  if (!rhs)
> > +    return false;
> > +
> > +  /* rhs shouldn't flow through any abnormal edges.  */
> > +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs))
> > +    return false;
> 
> so this means
> 
>  _1 = 0;
> 
> doesn't generate a copy?  But you allowed those in PHIs?
> 
> Generally I wanted to mention there's gimple_assign_ssa_name_copy_p
> which exactly matches _1 = _2 (and nothing more).  There cannot
> be volatiles or side-effects in those, no stores or loads
> (but SSA_NAME_OCCURS_IN_ABNORMAL_PHI can appear).
> 
> Can you use that and remove the unneeded checks above?

I'm pretty sure that _1 = 0; generates a copy. The lines

if (is_gimple_min_invariant (gimple_assign_rhs1 (stmt)))
  return true;

ensure that. I think I have even considered using gimple_assign_ssa_name_copy_p
at some point but decided to instead write more elementary checks. I think I
did that so that I am able to mark assignments of constants like _1 = 0; as
generating copy statements too.

> > +  /* If rhs and lhs are pointers, alignment of lhs and rhs should be the same.
> > +     Usage of __builtin_assume_aligned can cause alignment of lhs to be greater
> > +     than alignment of rhs.  In that case we don't want to propagate rhs since
> > +     we would lose the alignment information.  */
> 
> The same might be true for value-ranges.  I think it would be better to
> do it like copy-prop does and in cycles propagate the info to the
> copy-of variable we substitute with.  For non-cyclic cases we cannot
> generally copy flow-sensitive info.

This sounds complicated. I already thought about propagating the alignment
info. But detecting if it is part of a cycle or not... right now I'm not sure
how I would go about doing that. Thanks for the note that also value-range info
needs to be handled.

> A more conservative test would be to compare
> SSA_NAME_PTR_INFO for pointers and SSA_NAME_RANGE_INFO for non-pointers.

Let's do this more conservative check. Then later I could upgrade the pass to
do the alignment and value-range info propagation.

> > +static auto_vec<gimple *>
> > +get_all_stmt_may_generate_copy (void)
> > +{
> > +  auto_vec<gimple *> result;
> > +
> > +  basic_block bb;
> > +  FOR_EACH_BB_FN (bb, cfun)
> > +    {
> > +      gimple_stmt_iterator gsi;
> > +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > +	{
> > +	  gimple *s = gsi_stmt (gsi);
> > +	  if (stmt_may_generate_copy (s))
> > +	    result.safe_push (s);
> > +	}
> > +
> > +      gphi_iterator pi;
> > +      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
> > +	{
> > +	  gimple *s = pi.phi ();
> > +	  if (stmt_may_generate_copy (s))
> > +	    result.safe_push (s);
> > +	}
> > +    }
> 
> I wonder if this can do stmt-to-vertices on-the-fly, thus not
> build the vector of gimple *.  That is, you build very many
> vec<>s (probably not many in practice, but ...)

If I understand correctly, you're suggesting to check if a statement may
generate a copy during the run of the sccopy algorithm. I guess that could be
done but I think that it would make the code less readable. And it wouldn't
reduce the number of used vec<>s that much -- get_all_stmt_may_generate_copy is
run just once during the execution of the pass so it only creates one vec<>.
Correct me if I didn't get what you meant here.


> > If I understand correctly so far you'll never replace a SSA name by
> a constant, so REPLACE_BY is always an SSA_NAME.

No, I certainly do replace SSA names by constants.

> I wonder if you ever hit the need to do any of the fancy cleanups.

I originally didn't have them then I ran the GCC testsuite and some tests were
failing so I added the cleanups.

> In fact I suggest you use the existing replace_uses_by API instead
> of re-inventing it.

Oh nice. I didn't know that replace_uses_by API already exists in GCC. Will
look into that.

> > +  /* Remove all propagated statements.  */
> > +  FOR_EACH_BB_FN (bb, cfun)
> > +    {
> > +      gimple_stmt_iterator gsi;
> > +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)

> You generally should remove stmts back-to-front to make
> debug stmt generation work.  To not re-invent the wheel
> it might be simplest to use simple_dce_from_worklist with
> the dead stmts defs as seeds (just remove set/is_stmt_dead
> and track dead SSA defs in a bitmap).
> 
> > +	{
> > +	  gimple *stmt = gsi_stmt (gsi);
> > +	  if (is_stmt_dead (stmt))
> > +	    gsi_remove (&gsi, true);
> > +	  else
> > +	    gsi_next (&gsi);
> 
> this also fails to release the SSA def in the stmt.
> 
>   release_defs (stmt);
> 
> would do that (or simple_dce_from_worklist).

Interesting. I wouldn't have expected the order of removing statements to have
an influence on debug information.

Alright. Will rewrite this part of the pass to use simple_dce_from_worklist.

> > +	}
> > +
> > +      gphi_iterator pi;
> > +      for (pi = gsi_start_phis (bb); !gsi_end_p (pi);)
> > +	{
> > +	  gphi *stmt = pi.phi ();
> > +
> > +	  if (is_stmt_dead (stmt))
> > +	    remove_phi_node (&pi, true);
> > +	  else
> > +	    gsi_next (&pi);
> > +	}
> > +    }
> > +
> > +  /* More cleanup.  */
> > +  FOR_EACH_BB_FN (bb, cfun)
> > +    gimple_purge_dead_eh_edges (bb);
> 
> You only removed copies, I'm sure this should never do anything?

I think some testsuite testcases fail if I don't include this eh cleanup in my
pass. I don't understand eh edges very well but my explanation for this is that
I may replace an SSA name with a constant and ensure that a statement won't
throw any exceptions anymore. For example in a x = 10 / y; statement I may
replace y by 5 which ensures that we won't have to handle zero division
exception for this statement.
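A minimal sketch of that effect (hypothetical illustration; think -fnon-call-exceptions-style semantics, where a possibly-trapping division carries an EH edge — `before` and `after` are invented names, not GCC code):

```cpp
#include <cassert>

/* Before propagation the divisor is only known through a copy chain,
   so the division statement can trap and keeps its EH edge.  */
static int
before (int y)
{
  return 10 / y; /* May trap when y == 0.  */
}

/* After the pass substitutes the known constant 5 for y, the statement
   provably cannot trap, so its now-dead EH edge can be purged.  */
static int
after (void)
{
  return 10 / 5;
}
```

Both functions compute the same value for a nonzero divisor; the difference is only whether the compiler must still assume the statement can throw.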

> As a general comment I wonder if we can simplify most of the
> pass by starting SSA based SCC discovery from a copy stmt and
> whether we can reduce the number of vec<>s that are allocated
> and possibly re-allocated many times?

I'm not sure what you mean by "starting SCC discovery from a copy stmt" here.

Thanks for the review!
Filip


* Re: [PATCH v2] A new copy propagation and PHI elimination pass
  2023-11-22 14:20   ` Filip Kastl
@ 2023-11-30 13:04     ` Richard Biener
  2023-12-08 15:15       ` [PATCH v3] " Filip Kastl
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2023-11-30 13:04 UTC (permalink / raw)
  To: Filip Kastl; +Cc: gcc-patches, hubicka

On Wed, 22 Nov 2023, Filip Kastl wrote:

> Hi Richard,
> 
> > Can you name the new file gimple-ssa-sccopy.cc please?
> 
> Yes, no problem.
> 
> Btw, I thought that it is standard that gimple ssa passes have the tree-ssa-
> prefix. Do I understand it correctly that this is not true and many
> tree-ssa-*.cc passes should actually be named gimple-ssa-*.cc but remain
> tree-ssa-*.cc for historical reasons?

Yes.  Newer passes are gimple[-ssa]-xyz.cc, but it's just names ...

> >> +   3 A set of PHI statements that only refer to each other or to one other
> >> +     value.
> >> +
> >> +   _8 = PHI <_9, _10>;
> >> +   _9 = PHI <_8, _10>;
> >> +   _10 = PHI <_8, _9, _1>;
> > 
> > this case necessarily involves a cyclic CFG, so maybe say
> > 
> > "This is a lightweight SSA copy propagation pass that is able to handle
> > cycles optimistically, eliminating PHIs within those."
> > 
> > ?  Or is this a mis-characterization?
> 
> I'm not sure what you mean here. Yes, this case always involves a cyclic CFG.
> Is it weird that a lightweight pass is able to handle cyclic CFG and therefore
> you suggest to comment this fact and say that the pass handles cycles
> optimistically?
>
> I'm not sure if optimistic is a good word to characterize the pass. I'd expect
> an "optimistic" pass to make assumptions which may not be true and therefore
> not always remove all the redundancies it can. This pass however should achieve all that
> it sets out to do.

I thought "optimistic" refers to defering processing of the backedge
value, "optimistically" treating it equal until you prove otherwise
when visiting its definition.

Otherwise there would be no difference to the existing copy propagation
pass which handles copies not optimistically (well, for other reasons).
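For reference, the scheme under discussion can be sketched as a standalone model. This is a simplified, hypothetical C++ rendition of the SCC-based propagation on a toy use-def graph — not the GCC implementation — and the "inner" refinement step for SCCs with more than one outside value is omitted for brevity:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <vector>

/* Toy use-def graph: 'Defs' maps a copy/PHI result id to its operand ids.
   Ids without an entry are "outer" values (constants, non-copy results).  */
using Defs = std::map<int, std::vector<int>>;

/* Tarjan's algorithm over the use-def graph restricted to copy/PHI defs.
   SCCs are emitted with operand SCCs before the SCCs that use them.  */
struct SccFinder
{
  const Defs &defs;
  std::map<int, int> index, low;
  std::vector<int> stack;
  std::set<int> on_stack;
  int counter = 0;
  std::vector<std::vector<int>> sccs;

  SccFinder (const Defs &d) : defs (d)
  {
    for (const auto &kv : defs)
      if (!index.count (kv.first))
	dfs (kv.first);
  }

  void dfs (int v)
  {
    index[v] = low[v] = counter++;
    stack.push_back (v);
    on_stack.insert (v);
    for (int w : defs.at (v))
      {
	if (!defs.count (w))
	  continue; /* Outer value -- not part of the graph.  */
	if (!index.count (w))
	  {
	    dfs (w);
	    low[v] = std::min (low[v], low[w]);
	  }
	else if (on_stack.count (w))
	  low[v] = std::min (low[v], index[w]);
      }
    if (low[v] == index[v])
      {
	std::vector<int> scc;
	int w;
	do
	  {
	    w = stack.back ();
	    stack.pop_back ();
	    on_stack.erase (w);
	    scc.push_back (w);
	  }
	while (w != v);
	sccs.push_back (scc);
      }
  }
};

/* Process SCCs in topological order; when an SCC references exactly one
   value from outside itself, every def in it is a copy of that value.  */
std::map<int, int>
propagate (const Defs &defs)
{
  std::map<int, int> copy_of;
  auto resolve = [&copy_of] (int v) {
    while (copy_of.count (v))
      v = copy_of.at (v);
    return v;
  };
  SccFinder finder (defs);
  for (const std::vector<int> &scc : finder.sccs)
    {
      std::set<int> members (scc.begin (), scc.end ());
      std::set<int> outer;
      for (int v : scc)
	for (int w : defs.at (v))
	  {
	    int r = resolve (w);
	    if (!members.count (r))
	      outer.insert (r);
	  }
      if (outer.size () == 1)
	for (int v : scc)
	  copy_of[v] = *outer.begin ();
    }
  return copy_of;
}
```

On the three-PHI cycle from the patch comment (_8, _9, _10 referencing each other plus _1), the whole SCC resolves to _1 in one step, which is the "optimistic" backedge handling: members of a cycle are treated as equal until an SCC turns out to have more than one outside input.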
 
> > It might be nice to optimize SCCs of size 1 somehow, not sure how
> > many times these appear - possibly prevent them from even entering
> > the SCC discovery?
> 
> Maybe that could be done. I would have to think about it and make sure it
> doesn't break anything. I'd prefer to get this version into upstream and then
> possibly post this upgrade later.
> 
> Btw, SCCs of size 1 appear all the time. Those are the cases 1 and 2
> described in the comment at the beginning of the file.
> 
> > I'll note that while you are working with stmts everywhere that
> > you are really bound to using SSA defs and those would already
> > nicely have numbers (the SSA_NAME_VERSION).  In principle the
> > SCC lattice could be pre-allocated once, indexed by
> > SSA_NAME_VERSION and you could keep a "generation" number
> > indicating what SCC discovery round it belongs to (aka the
> > set_using).
> 
> I see. I could allocate a vertex struct for each statement only once when the
> pass is invoked instead of allocating the structs each time tarjan_compute_sccs
> is called. Will do that.
> 
> I'm not sure if I want to use SSA_NAME_VERSION for indexing a vec/array with
> all those vertex structs. Many SSA names will be defined neither by PHI nor by
> a copy assignment statement. If I created a vertex struct for every SSA name I
> would allocate a lot of extra memory.

Not sure I would characterize it as a lot.

> > There's a old SCC finding algorithm working on the SSA graph
> > in the old SCC based value-numbering, for example on the
> > gcc 7 branch in tree-ssa-sccvn.c:DFS
> 
> > For reading it would be nice to put the SCC finding in its
> > own class.
> 
> Okay, I'll do that.
> 
> > > +	}
> > > +    }
> > > +
> > > +  if (!stack.is_empty ())
> > > +    gcc_unreachable ();
> > > +
> > > +  /* Clear copy stmts' 'using' flags.  */
> > > +  for (vertex v : vs)
> > > +    {
> > > +      gimple *s = v.stmt;
> > > +      tarjan_clear_using (s);
> > > +    }
> > > +
> > > +  return sccs;
> > > +}
> > > +
> > > +/* Could this statement potentially be a copy statement?
> > > +
> > > +   This pass only considers statements for which this function returns 'true'.
> > > +   Those are basically PHI functions and assignment statements similar to
> > > +
> > > +   _2 = _1;
> > > +   or
> > > +   _2 = 5;  */
> > > +
> > > +static bool
> > > +stmt_may_generate_copy (gimple *stmt)
> > > +{
> > > +  if (gimple_code (stmt) == GIMPLE_PHI)
> > > +    {
> > > +      gphi *phi = as_a <gphi *> (stmt);
> > > +
> > > +      /* No OCCURS_IN_ABNORMAL_PHI SSA names in lhs nor rhs.  */
> > > +      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (phi)))
> > > +	return false;
> > > +
> > > +      unsigned i;
> > > +      for (i = 0; i < gimple_phi_num_args (phi); i++)
> > > +	{
> > > +	  tree op = gimple_phi_arg_def (phi, i);
> > > +	  if (TREE_CODE (op) == SSA_NAME
> > > +	      && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
> > > +	    return false;
> > > +	}
> > 
> > When there's more than one non-SSA PHI argument and they are not
> > the same then the stmt also cannot be a copy, right?
> > 
> > > +      return true;
> > > +    }
> 
> Do I understand you correctly that you propose to put another check here?
> Something like
> 
> unsigned nonssa_args_num = 0;
> unsigned i;
> for (i = 0; i < gimple_phi_num_args (phi); i++)
>   {
>     tree op = gimple_phi_arg_def (phi, i);
>     if (TREE_CODE (op) != SSA_NAME)
>       {
> 	nonssa_args_num++;
> 	if (nonssa_args_num >= 2)
> 	  return false;
>       }
>     else if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
>       return false;
>   }


If you do allow constants then maybe

  tree const_arg = NULL_TREE;
  for (...)
   {
     if (TREE_CODE (op) != SSA_NAME)
       {
          if (!const_arg)
            const_arg = op;
          else if (!operand_equal_p (const_arg, op))
            return false;
       }

instead?  You'd then optimistically assume all SSA name args are
equal to the constant by other means.
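That check can be modelled in isolation roughly as follows. This is a hypothetical standalone sketch — `phi_arg` and `phi_args_allow_copy` are invented for illustration; the real code would iterate a `gphi` and compare with `operand_equal_p`:

```cpp
#include <cassert>
#include <optional>
#include <vector>

/* Constant PHI arguments only disqualify the PHI when two of them
   differ; SSA-name arguments are optimistically assumed equal to the
   constant until SCC processing proves otherwise.  */

struct phi_arg
{
  bool is_ssa;
  long cst; /* Valid only when !is_ssa.  */
};

static bool
phi_args_allow_copy (const std::vector<phi_arg> &args)
{
  std::optional<long> const_arg;
  for (const phi_arg &a : args)
    {
      if (a.is_ssa)
	continue; /* Handled optimistically elsewhere.  */
      if (!const_arg)
	const_arg = a.cst;
      else if (*const_arg != a.cst)
	return false; /* Two distinct constants: cannot be a copy.  */
    }
  return true;
}
```

So `PHI <_1, 5, 5>` remains a candidate, while `PHI <1, 2>` is rejected up front.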

> > > +
> > > +  if (gimple_code (stmt) != GIMPLE_ASSIGN)
> > > +    return false;
> > > +
> > > +  /* If the statement has volatile operands, it won't generate a
> > > +     useful copy.  */
> > > +  if (gimple_has_volatile_ops (stmt))
> > > +    return false;
> > > +
> > > +  /* Statements with loads and/or stores will never generate a useful copy.  */
> > > +  if (gimple_store_p (stmt) || gimple_assign_load_p (stmt))
> > > +    return false;
> > > +
> > > +  if (!gimple_assign_single_p (stmt))
> > > +    return false;
> > > +
> > > +  tree lhs = gimple_assign_lhs (stmt);
> > > +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
> > > +    return false;
> > > +
> > > +  /* If the assignment is from a constant it generates a useful copy.  */
> > > +  if (is_gimple_min_invariant (gimple_assign_rhs1 (stmt)))
> > > +    return true;
> > > +
> > > +  tree rhs = single_ssa_tree_operand (stmt, SSA_OP_USE);
> > > +
> > > +  if (!is_gimple_val (gimple_assign_rhs1 (stmt)))
> > > +    return false;
> > > +
> > > +  if (!rhs)
> > > +    return false;
> > > +
> > > +  /* rhs shouldn't flow through any abnormal edges.  */
> > > +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs))
> > > +    return false;
> > 
> > so this means
> > 
> >  _1 = 0;
> > 
> > doesn't generate a copy?  But you allowed those in PHIs?
> > 
> > Generally I wanted to mention there's gimple_assign_ssa_name_copy_p
> > which exactly matches _1 = _2 (and nothing more).  There cannot
> > be volatiles or side-effects in those, no stores or loads
> > (but SSA_NAME_OCCURS_IN_ABNORMAL_PHI can appear).
> > 
> > Can you use that and remove the unneeded checks above?
> 
> I'm pretty sure that _1 = 0; generates a copy. The lines
> 
> if (is_gimple_min_invariant (gimple_assign_rhs1 (stmt)))
>   return true;
> 
> ensure that. I think I have even considered using gimple_assign_ssa_name_copy_p
> at some point but decided to instead write more elementary checks. I think I
> did that so that I am able to mark assignments of constants like _1 = 0; as
> generating copy statements too.

I see.  But then you simply want

     if (!(TREE_CODE (gimple_assign_rhs1 (stmt)) == SSA_NAME
           || is_gimple_min_invariant (gimple_assign_rhs1 (stmt))))
       return false;

and check the LHS to be a SSA_NAME.  

gimple_store_p (stmt) || gimple_assign_load_p (stmt)

isn't really "elementary" here.

> 
> > > +  /* If rhs and lhs are pointers, alignment of lhs and rhs should be the same.
> > > +     Usage of __builtin_assume_aligned can cause alignment of lhs to be greater
> > > +     than alignment of rhs.  In that case we don't want to propagate rhs since
> > > +     we would lose the alignment information.  */
> > 
> > The same might be true for value-ranges.  I think it would be better to
> > do it like copy-prop does and in cycles propagate the info to the
> > copy-of variable we substitute with.  For non-cyclic cases we cannot
> > generally copy flow-sensitive info.
> 
> This sounds complicated. I already thought about propagating the alignment
> info. But detecting if it is part of a cycle or not... right now I'm not sure
> how I would go about doing that. Thanks for the note that also value-range info
> needs to be handled.
> 
> > A more conservative test would be to compare
> > SSA_NAME_PTR_INFO for pointers and SSA_NAME_RANGE_INFO for non-pointers.
> 
> Let's do this more conservative check. Then later I could upgrade the pass to
> do the alignment and value-range info propagation.

OK.

> > > +static auto_vec<gimple *>
> > > +get_all_stmt_may_generate_copy (void)
> > > +{
> > > +  auto_vec<gimple *> result;
> > > +
> > > +  basic_block bb;
> > > +  FOR_EACH_BB_FN (bb, cfun)
> > > +    {
> > > +      gimple_stmt_iterator gsi;
> > > +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > +	{
> > > +	  gimple *s = gsi_stmt (gsi);
> > > +	  if (stmt_may_generate_copy (s))
> > > +	    result.safe_push (s);
> > > +	}
> > > +
> > > +      gphi_iterator pi;
> > > +      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
> > > +	{
> > > +	  gimple *s = pi.phi ();
> > > +	  if (stmt_may_generate_copy (s))
> > > +	    result.safe_push (s);
> > > +	}
> > > +    }
> > 
> > I wonder if this can do stmt-to-vertices on-the-fly, thus not
> > build the vector of gimple *.  That is, you build very many
> > vec<>s (probably not many in practice, but ...)
> 
> If I understand correctly, you're suggesting to check if a statement may
> generate copy during runtime of the sccopy algorithm. I guess that could be
> done but I think that it would make the code less readable. And it wouldn't
> reduce the number of used vec<>s that much -- get_all_stmt_may_generate_copy is
> run just once during the execution of the pass so it only creates one vec<>.
> Correct me if I didn't get what you meant here.

Yes, you'd simply only consider copy edges during SCC finding rather than
first collecting copy stmts.

> 
> > If I understand correctly so far you'll never replace a SSA name by
> > a constant, so REPLACE_BY is always an SSA_NAME.
> 
> No, I certainly do replace SSA names by constants.
> 
> > I wonder if you ever hit the need to do any of the fancy cleanups.
> 
> I originally didn't have them then I ran the GCC testsuite and some tests were
> failing so I added the cleanups.

I see.

> > In fact I suggest you use the existing replace_uses_by API instead
> > of re-inventing it.
> 
> Oh nice. I didn't know that replace_uses_by API already exists in GCC. Will
> look into that.
> 
> > > +  /* Remove all propagated statements.  */
> > > +  FOR_EACH_BB_FN (bb, cfun)
> > > +    {
> > > +      gimple_stmt_iterator gsi;
> > > +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
> 
> > You generally should remove stmts back-to-front to make
> > debug stmt generation work.  To not re-invent the wheel
> > it might be simplest to use simple_dce_from_worklist with
> > the dead stmts defs as seeds (just remove set/is_stmt_dead
> > and track dead SSA defs in a bitmap).
> > 
> > > +	{
> > > +	  gimple *stmt = gsi_stmt (gsi);
> > > +	  if (is_stmt_dead (stmt))
> > > +	    gsi_remove (&gsi, true);
> > > +	  else
> > > +	    gsi_next (&gsi);
> > 
> > this also fails to release the SSA def in the stmt.
> > 
> >   release_defs (stmt);
> > 
> > would do that (or simple_dce_from_worklist).
> 
> Interesting. I wouldn't have expected the order of removing statements to have
> an influence on debug information.
> 
> Alright. Will rewrite this part of the pass to use simple_dce_from_worklist.
> 
> > > +	}
> > > +
> > > +      gphi_iterator pi;
> > > +      for (pi = gsi_start_phis (bb); !gsi_end_p (pi);)
> > > +	{
> > > +	  gphi *stmt = pi.phi ();
> > > +
> > > +	  if (is_stmt_dead (stmt))
> > > +	    remove_phi_node (&pi, true);
> > > +	  else
> > > +	    gsi_next (&pi);
> > > +	}
> > > +    }
> > > +
> > > +  /* More cleanup.  */
> > > +  FOR_EACH_BB_FN (bb, cfun)
> > > +    gimple_purge_dead_eh_edges (bb);
> > 
> > You only removed copies, I'm sure this should never do anything?
> 
> I think some testsuite testcases fail if I don't include this eh cleanup in my
> pass. I don't understand eh edges very well but my explanation for this is that
> I may replace an SSA name with a constant and ensure that a statement won't
> throw any exceptions anymore. For example in a x = 10 / y; statement I may
> replace y by 5 which ensures that we won't have to handle zero division
> exception for this statement.

Yes, when you replace with constants that can happen.

> > As a general comment I wonder if we can simplify most of the
> > pass by starting SSA based SCC discovery from a copy stmt and
> > whether we can reduce the number of vec<>s that are allocated
> > and possibly re-allocated many times?
> 
> I'm not sure what you mean by "starting SCC discovery from a copy stmt" here.

When you currently collect copy stmts in the vec<> you could do SCC
discovery with that as the starting stmt instead (if SCC discovery
would handle identifying copy stmts itself).

Thanks and sorry for the late reply.
Richard.

> Thanks for the review!
> Filip
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* [PATCH v3] A new copy propagation and PHI elimination pass
  2023-11-30 13:04     ` Richard Biener
@ 2023-12-08 15:15       ` Filip Kastl
  2023-12-13 11:35         ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Filip Kastl @ 2023-12-08 15:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: hubicka, Richard Biener

> > Hi,
> > 
> > this is a patch that I submitted two months ago as an RFC. I added some polish
> > since.
> > 
> > It is a new lightweight pass that removes redundant PHI functions and as a
> > bonus does basic copy propagation. With Jan Hubička we measured that it is able
> > to remove usually more than 5% of all PHI functions when run among early passes
> > (sometimes even 13% or more). Those are mostly PHI functions that would be
> > later optimized away but with this pass it is possible to remove them early
> > enough so that they don't get streamed when running LTO (and also potentially
> > inlined at multiple places). It is also able to remove some redundant PHIs
> > that otherwise would still be present during RTL expansion.
> > 
> > Jakub Jelínek was concerned about debug info coverage so I compiled cc1plus
> > with and without this patch. These are the sizes of .debug_info and
> > .debug_loclists
> > 
> > .debug_info without patch 181694311
> > .debug_info    with patch 181692320
> > -0.0011% change
> > 
> > .debug_loclists without patch 47934753
> > .debug_loclists    with patch 47934966
> > +0.0004% change
> > 
> > I wanted to use dwlocstat to compare debug coverages but didn't manage to get
> > the program working on my machine sadly. Hope this suffices. Seems to me that
> > my patch doesn't have a significant impact on debug info.
> > 
> > Bootstrapped and tested* on x86_64-pc-linux-gnu.
> > 
> > * One testcase (pr79691.c) did regress. However that is because the test is
> > dependent on a certain variable not being copy propagated. I will go into more
> > detail about this in a reply to this mail.
> > 
> > Ok to commit?
> 
> This is a second version of the patch.  In this version, I modified the
> pr79691.c testcase so that it works as intended with other changes from the
> patch.
> 
> The pr79691.c testcase checks that we get constants from snprintf calls and
> that they simplify into a single constant.  The testcase doesn't account for
> the fact that this constant may be further copy propagated which is exactly
> what happens with this patch applied.
> 
> Bootstrapped and tested on x86_64-pc-linux-gnu.
> 
> Ok to commit?

This is the third version of the patch. In this version, I addressed most of
Richard's remarks about the second version. Here is a summary of the changes I made:

- Rename the pass from tree-ssa-sccopy.cc to gimple-ssa-sccopy.cc
- Use simple_dce_from_worklist to remove propagated statements
- Use existing replace_uses API instead of reinventing it
  - This allowed me to get rid of some now redundant cleanup code
- Encapsulate the SCC finding into a class
- Rework stmt_may_generate_copy to get rid of redundant checks
- Add check that PHI doesn't contain two non-SSA-name values to
  stmt_may_generate_copy
- Regarding alignment and value ranges in stmt_may_generate_copy: For now use
  the conservative check that Richard suggested
- Index array of vertices that SCC discovery uses by SSA name version numbers
  instead of numbering statements myself.


I didn't make any changes based on these remarks:

1 It might be nice to optimize SCCs of size 1 somehow, not sure how
  many times these appear - possibly prevent them from even entering
  the SCC discovery?

It would be nice. But the only way to do this that I see right now is to first
propagate SCCs of size 1 and then the rest. This would mean adding a new copy
propagation procedure, and it wouldn't be a trivial one. The efficiency of the
pass relies on having SCCs topologically sorted, so this procedure would have
to implement some topological sort algorithm.

This could be done. It could save allocating some vec<>s (right now, SCCs of
size 1 are represented by a vec<> with a single element). But is it worth it to
optimize the pass this way right now? If possible, I'd like to see that the
pass works and sort out any problems people encounter with it before I start
optimizing it.

2 Instead of collecting all stmts that may generate a copy at the beginning of
  the pass into a vec<>, let the SCC discovery check that statements may
  generate a copy on the fly.

This would be a big change to the pass and would require a lot of reworking.
I'm also not sure this would reduce the number of allocated vec<>s that much,
because I'll still want to represent SCCs by vec<>s.

Again, it's possible that I'll want to rework the pass in this way in the
future, but I'd like to leave it as it is for now.

3 Add a comment saying that the pass is doing optimistic copy propagation

I don't think the pass works in an optimistic way. It doesn't assume that all
variables are copies of each other at any point. Instead, it identifies copy
statements (or PHI SCCs that act as copy statements) and then propagates each
of them: if a copy statement says that _3 is a copy of _2, the pass replaces
all uses of _3 by _2.

But it's possible that I still misinterpret what 'optimistic' means. If
mentioning that the pass works in an optimistic way would help readability of
the pass, I'm happy to add that comment.


Richard, is the patch ok as it is now? Or do you think that making changes 1, 2
or 3 is necessary? Or are there any other issues which should be addressed?

Filip


-- >8 --

This patch adds the strongly-connected copy propagation (SCCOPY) pass.
It is a lightweight GIMPLE copy propagation pass that also removes some
redundant PHI statements. It handles degenerate PHIs, e.g.:

_5 = PHI <_1>;
_6 = PHI <_6, _6, _1, _1>;
_7 = PHI <16, _7>;
// Replaces occurrences of _5 and _6 by _1 and _7 by 16

It also handles more complicated situations, e.g.:

_8 = PHI <_9, _10>;
_9 = PHI <_8, _10>;
_10 = PHI <_8, _9, _1>;
// Replaces occurrences of _8, _9 and _10 by _1

gcc/ChangeLog:

	* Makefile.in: Added sccopy pass.
	* passes.def: Added sccopy pass before LTO streaming and before
	  RTL expansion.
	* tree-pass.h (make_pass_sccopy): Added sccopy pass.
	* gimple-ssa-sccopy.cc: New file.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/pr79691.c: Updated scan-tree-dump to account
	  for additional copy propagation this patch adds.
	* gcc.dg/sccopy-1.c: New test.

Signed-off-by: Filip Kastl <fkastl@suse.cz>
---
 gcc/Makefile.in                         |   1 +
 gcc/gimple-ssa-sccopy.cc                | 687 ++++++++++++++++++++++++
 gcc/passes.def                          |   3 +
 gcc/testsuite/gcc.dg/sccopy-1.c         |  78 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr79691.c |   2 +-
 gcc/tree-pass.h                         |   1 +
 6 files changed, 771 insertions(+), 1 deletion(-)
 create mode 100644 gcc/gimple-ssa-sccopy.cc
 create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 753f2f36618..2e6f2fef1ba 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1497,6 +1497,7 @@ OBJS = \
 	gimple-ssa-backprop.o \
 	gimple-ssa-isolate-paths.o \
 	gimple-ssa-nonnull-compare.o \
+	gimple-ssa-sccopy.o \
 	gimple-ssa-split-paths.o \
 	gimple-ssa-store-merging.o \
 	gimple-ssa-strength-reduction.o \
diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
new file mode 100644
index 00000000000..b74f7a62569
--- /dev/null
+++ b/gcc/gimple-ssa-sccopy.cc
@@ -0,0 +1,687 @@
+/* Strongly-connected copy propagation pass for the GNU compiler.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Filip Kastl <fkastl@suse.cz>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-iterator.h"
+#include "vec.h"
+#include "hash-set.h"
+#include <algorithm>
+#include "ssa-iterators.h"
+#include "gimple-fold.h"
+#include "gimplify.h"
+#include "tree-cfg.h"
+#include "tree-eh.h"
+#include "builtins.h"
+#include "tree-ssa-dce.h"
+#include "fold-const.h"
+
+/* Strongly connected copy propagation pass.
+
+   This is a lightweight copy propagation pass that is also able to eliminate
+   redundant PHI statements.  The pass considers the following types of copy
+   statements:
+
+   1 An assignment statement with a single argument.
+
+   _3 = _2;
+   _4 = 5;
+
+   2 A degenerate PHI statement.  A degenerate PHI is a PHI that only refers to
+     itself or one other value.
+
+   _5 = PHI <_1>;
+   _6 = PHI <_6, _6, _1, _1>;
+   _7 = PHI <16, _7>;
+
+   3 A set of PHI statements that only refer to each other or to one other
+     value.
+
+   _8 = PHI <_9, _10>;
+   _9 = PHI <_8, _10>;
+   _10 = PHI <_8, _9, _1>;
+
+   All of these statements produce copies and can be eliminated from the
+   program.  For a copy statement we identify the value it creates a copy of
+   and replace references to the statement with the value -- we propagate the
+   copy.
+
+   _3 = _2; // Replace all occurrences of _3 by _2
+
+   _8 = PHI <_9, _10>;
+   _9 = PHI <_8, _10>;
+   _10 = PHI <_8, _9, _1>; // Replace all occurrences of _8, _9 and _10 by _1
+
+   To find all three types of copy statements we use an algorithm based on
+   strongly-connected components (SCCs) in the dataflow graph.  The algorithm
+   was introduced in an article from 2013[1].  We describe it below.
+
+   To identify SCCs we implement Tarjan's SCC algorithm.  For the SCC
+   computation we wrap potential copy statements in the 'vertex' struct,
+   indexed by the SSA name version of the statement's result.  Since the
+   main algorithm has to be able to compute SCCs of subgraphs of the whole
+   dataflow graph, each vertex carries an 'active' flag that prevents
+   Tarjan's algorithm from leaving the subgraph.
+
+   References:
+
+     [1] Simple and Efficient Construction of Static Single Assignment Form,
+     Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
+     Section 3.2.  */
+
+/* Bitmap tracking statements which were propagated to be removed at the end of
+   the pass.  */
+
+static bitmap dead_stmts;
+
+/* State of vertex during SCC discovery.
+
+   unvisited  Vertex hasn't yet been popped from worklist.
+   vopen      DFS has visited vertex for the first time.  Vertex has been put
+	      on Tarjan stack.
+   closed     DFS has backtracked through vertex.  At this point, vertex
+	      doesn't have any unvisited neighbors.
+   in_scc     Vertex has been popped from Tarjan stack.  */
+
+enum vstate
+{
+  unvisited,
+  vopen,
+  closed,
+  in_scc
+};
+
+/* Information about a vertex.  Used by SCC discovery.  */
+
+struct vertex
+{
+  bool active; /* scc_discovery::compute_sccs () only considers a subgraph of
+		  the whole dataflow graph.  It uses this flag so that it knows
+		  which vertices are part of this subgraph.  */
+  vstate state;
+  unsigned index;
+  unsigned lowlink;
+};
+
+/* SCC discovery.
+
+   Used to find SCCs in a dataflow graph.  Implements Tarjan's SCC
+   algorithm.  */
+
+class scc_discovery
+{
+public:
+  scc_discovery ();
+  ~scc_discovery ();
+  auto_vec<vec<gimple *>> compute_sccs (auto_vec<gimple *> &stmts);
+
+private:
+  unsigned curr_generation = 0;
+  vertex* vertices; /* Indexed by SSA_NAME_VERSION.  */
+  auto_vec<unsigned> worklist; /* DFS stack.  */
+  auto_vec<unsigned> stack; /* Tarjan stack.  */
+
+  void visit_neighbor (tree neigh_tree, unsigned parent_version);
+};
+
+scc_discovery::scc_discovery ()
+{
+  /* Create vertex struct for each SSA name.  */
+  vertices = XNEWVEC (struct vertex, num_ssa_names);
+  unsigned i = 0;
+  for (i = 0; i < num_ssa_names; i++)
+    vertices[i].active = false;
+}
+
+scc_discovery::~scc_discovery ()
+{
+  XDELETEVEC (vertices);
+}
+
+/* Part of 'scc_discovery::compute_sccs()'.  */
+
+void
+scc_discovery::visit_neighbor (tree neigh_tree, unsigned parent_version)
+{
+  if (TREE_CODE (neigh_tree) != SSA_NAME)
+    return; /* Skip any neighbor that isn't an SSA name.  */
+  unsigned neigh_version = SSA_NAME_VERSION (neigh_tree);
+
+  /* Skip neighbors outside the subgraph that Tarjan currently works
+     with.  */
+  if (!vertices[neigh_version].active)
+    return;
+
+  vstate neigh_state = vertices[neigh_version].state;
+  vstate parent_state = vertices[parent_version].state;
+  if (parent_state == vopen) /* We're currently opening parent.  */
+    {
+      /* Put unvisited neighbors on worklist.  Update lowlink of parent
+	 vertex according to indices of neighbors present on stack.  */
+      switch (neigh_state)
+	{
+	case unvisited:
+	  worklist.safe_push (neigh_version);
+	  break;
+	case vopen:
+	case closed:
+	  vertices[parent_version].lowlink =
+	    std::min (vertices[parent_version].lowlink,
+		      vertices[neigh_version].index);
+	  break;
+	case in_scc:
+	  /* Ignore these edges.  */
+	  break;
+	}
+    }
+  else if (parent_state == closed) /* We're currently closing parent.  */
+    {
+      /* Update lowlink of parent vertex according to lowlinks of
+	 children of parent (in terms of DFS tree).  */
+      if (neigh_state == closed)
+	{
+	  vertices[parent_version].lowlink =
+	    std::min (vertices[parent_version].lowlink,
+		      vertices[neigh_version].lowlink);
+	}
+    }
+}
+
+/* Compute SCCs in dataflow graph on given statements 'stmts'.  Ignore
+   statements outside 'stmts'.  Return the SCCs in a reverse topological
+   order.
+
+   stmt_may_generate_copy () must be true for all statements from 'stmts'!  */
+
+auto_vec<vec<gimple *>>
+scc_discovery::compute_sccs (auto_vec<gimple *> &stmts)
+{
+  auto_vec<vec<gimple *>> sccs;
+
+  for (gimple *stmt : stmts)
+    {
+      unsigned i;
+      switch (gimple_code (stmt))
+	{
+	  case GIMPLE_ASSIGN:
+	    i = SSA_NAME_VERSION (gimple_assign_lhs (stmt));
+	    break;
+	  case GIMPLE_PHI:
+	    i = SSA_NAME_VERSION (gimple_phi_result (stmt));
+	    break;
+	  default:
+	    gcc_unreachable ();
+	}
+
+      vertices[i].index = 0;
+      vertices[i].lowlink = 0;
+      vertices[i].state = unvisited;
+      vertices[i].active = true; /* Mark the subgraph we'll be working on so
+				    that we don't leave it.  */
+
+      worklist.safe_push (i);
+    }
+
+  /* Worklist loop.  */
+  unsigned curr_index = 0;
+  while (!worklist.is_empty ())
+    {
+      unsigned i = worklist.pop ();
+      gimple *stmt = SSA_NAME_DEF_STMT (ssa_name (i));
+      vstate state = vertices[i].state;
+
+      if (state == unvisited)
+	{
+	  vertices[i].state = vopen;
+
+	  /* Assign index to this vertex.  */
+	  vertices[i].index = curr_index;
+	  vertices[i].lowlink = curr_index;
+	  curr_index++;
+
+	  /* Put vertex on stack and also on worklist to be closed later.  */
+	  stack.safe_push (i);
+	  worklist.safe_push (i);
+	}
+      else if (state == vopen)
+	vertices[i].state = closed;
+
+      /* Visit neighbors of this vertex.  */
+      tree op;
+      gphi *phi;
+      switch (gimple_code (stmt))
+	{
+	  case GIMPLE_PHI:
+	    phi = as_a <gphi *> (stmt);
+	    unsigned j;
+	    for (j = 0; j < gimple_phi_num_args (phi); j++)
+	      {
+		op = gimple_phi_arg_def (phi, j);
+		visit_neighbor (op, i);
+	      }
+	    break;
+	  case GIMPLE_ASSIGN:
+	    op = gimple_assign_rhs1 (stmt);
+	    visit_neighbor (op, i);
+	    break;
+	  default:
+	    gcc_unreachable ();
+	}
+
+      /* If we've just closed a root vertex of an SCC, pop the SCC off the
+	 Tarjan stack.  */
+      if (state == vopen && vertices[i].lowlink == vertices[i].index)
+	{
+	  vec<gimple *> scc = vNULL;
+
+	  unsigned j;
+	  do
+	    {
+	      j = stack.pop ();
+	      scc.safe_push (SSA_NAME_DEF_STMT (ssa_name (j)));
+	      vertices[j].state = in_scc;
+	    }
+	  while (j != i);
+
+	  sccs.safe_push (scc);
+	}
+    }
+
+  if (!stack.is_empty ())
+    gcc_unreachable ();
+
+  /* Clear 'active' flags.  */
+  for (gimple *stmt : stmts)
+    {
+      unsigned i;
+      switch (gimple_code (stmt))
+	{
+	  case GIMPLE_ASSIGN:
+	    i = SSA_NAME_VERSION (gimple_assign_lhs (stmt));
+	    break;
+	  case GIMPLE_PHI:
+	    i = SSA_NAME_VERSION (gimple_phi_result (stmt));
+	    break;
+	  default:
+	    gcc_unreachable ();
+	}
+
+      vertices[i].active = false;
+    }
+
+  return sccs;
+}
+
+/* Could this statement potentially be a copy statement?
+
+   This pass only considers statements for which this function returns 'true'.
+   Those are basically PHI functions and assignment statements similar to
+
+   _2 = _1;
+   or
+   _2 = 5;  */
+
+static bool
+stmt_may_generate_copy (gimple *stmt)
+{
+  /* A PHI may generate a copy.  */
+  if (gimple_code (stmt) == GIMPLE_PHI)
+    {
+      gphi *phi = as_a <gphi *> (stmt);
+
+      /* No OCCURS_IN_ABNORMAL_PHI SSA names in lhs nor rhs.  */
+      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (phi)))
+	return false;
+
+      unsigned i;
+      for (i = 0; i < gimple_phi_num_args (phi); i++)
+	{
+	  tree op = gimple_phi_arg_def (phi, i);
+	  if (TREE_CODE (op) == SSA_NAME
+	      && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
+	    return false;
+	}
+
+      /* If PHI has more than one unique non-SSA arguments, it won't generate a
+	 copy.  */
+      tree const_op = NULL_TREE;
+      for (i = 0; i < gimple_phi_num_args (phi); i++)
+	{
+	  tree op = gimple_phi_arg_def (phi, i);
+	  if (TREE_CODE (op) != SSA_NAME)
+	    {
+	      if (const_op && !operand_equal_p (op, const_op))
+		return false;
+	      const_op = op;
+	    }
+	}
+
+      return true;
+    }
+
+  /* Or a statement of type _2 = _1; OR _2 = 5; may generate a copy.  */
+
+  if (!gimple_assign_single_p (stmt))
+    return false;
+
+  tree lhs = gimple_assign_lhs (stmt);
+  tree rhs = gimple_assign_rhs1 (stmt);
+
+  if (TREE_CODE (lhs) != SSA_NAME)
+    return false;
+
+  /* lhs shouldn't flow through any abnormal edges.  */
+  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+    return false;
+
+  if (is_gimple_min_invariant (rhs))
+    return true;  /* A statement of type _2 = 5;.  */
+
+  if (TREE_CODE (rhs) != SSA_NAME)
+    return false;
+
+  /* rhs shouldn't flow through any abnormal edges.  */
+  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs))
+    return false;
+
+  /* It is possible that lhs has more alignment or value range information.  By
+     propagating we would lose this information.  So in the case that alignment
+     or value range information differs, we are conservative and do not
+     propagate.
+
+     FIXME: Propagate alignment and value range info the same way copy-prop
+     does.  */
+  if (POINTER_TYPE_P (TREE_TYPE (lhs))
+      && POINTER_TYPE_P (TREE_TYPE (rhs))
+      && SSA_NAME_PTR_INFO (lhs) != SSA_NAME_PTR_INFO (rhs))
+    return false;
+  if (!POINTER_TYPE_P (TREE_TYPE (lhs))
+      && !POINTER_TYPE_P (TREE_TYPE (rhs))
+      && SSA_NAME_RANGE_INFO (lhs) != SSA_NAME_RANGE_INFO (rhs))
+    return false;
+
+  return true;  /* A statement of type _2 = _1;.  */
+}
+
+/* Return all statements in cfun that could generate copies.  All statements
+   for which stmt_may_generate_copy returns 'true'.  */
+
+static auto_vec<gimple *>
+get_all_stmt_may_generate_copy (void)
+{
+  auto_vec<gimple *> result;
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      gimple_stmt_iterator gsi;
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple *s = gsi_stmt (gsi);
+	  if (stmt_may_generate_copy (s))
+	    {
+	      result.safe_push (s);
+	    }
+	}
+
+      gphi_iterator pi;
+      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
+	{
+	  gimple *s = pi.phi ();
+	  if (stmt_may_generate_copy (s))
+	    {
+	      result.safe_push (s);
+	    }
+	}
+    }
+
+  return result;
+}
+
+/* For each statement from given SCC, replace its usages by value
+   VAL.  */
+
+static void
+replace_scc_by_value (vec<gimple *> scc, tree val)
+{
+  for (gimple *stmt : scc)
+    {
+      tree name = gimple_get_lhs (stmt);
+      replace_uses_by (name, val);
+      bitmap_set_bit (dead_stmts, SSA_NAME_VERSION (name));
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "Replacing SCC of size %u\n", scc.length ());
+}
+
+/* Part of 'sccopy_propagate ()'.  */
+
+static void
+sccopy_visit_op (tree op, hash_set<tree> &outer_ops,
+		 hash_set<gimple *> &scc_set, bool &is_inner,
+		 tree &last_outer_op)
+{
+  bool op_in_scc = false;
+
+  if (TREE_CODE (op) == SSA_NAME)
+    {
+      gimple *op_stmt = SSA_NAME_DEF_STMT (op);
+      if (scc_set.contains (op_stmt))
+	op_in_scc = true;
+    }
+
+  if (!op_in_scc)
+    {
+      outer_ops.add (op);
+      last_outer_op = op;
+      is_inner = false;
+    }
+}
+
+/* Main function of this pass.  Find and propagate all three types of copy
+   statements (see pass description above).
+
+   This is an implementation of an algorithm from the paper Simple and
+   Efficient Construction of Static Single Assignmemnt Form[1].  It is based
+   on strongly-connected components (SCCs) in dataflow graph.  The original
+   algorithm only considers PHI statements.  We extend it to also consider
+   assignment statements of type _2 = _1;.
+
+   The algorithm is based on this definition of a set of redundant PHIs[1]:
+
+     A non-empty set P of PHI functions is redundant iff the PHI functions just
+     reference each other or one other value
+
+   It uses this lemma[1]:
+
+     Let P be a redundant set of PHI functions.  Then there is a
+     strongly-connected component S subset of P that is also redundant.
+
+   The algorithm works in this way:
+
+     1 Find SCCs
+     2 For each SCC S in topological order:
+     3   Construct set 'inner' of statements that only have other statements
+	 from S on their right hand side
+     4   Construct set 'outer' of values that originate outside S and appear on
+	 right hand side of some statement from S
+     5   If |outer| = 1, outer only contains a value v.  Statements in S only
+	 refer to each other or to v -- they are redundant.  Propagate v.
+	 Else, recurse on statements in inner.
+
+   The implementation is non-recursive.
+
+   References:
+
+     [1] Simple and Efficient Construction of Static Single Assignment Form,
+     Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
+     Section 3.2.  */
+
+static void
+sccopy_propagate ()
+{
+  auto_vec<gimple *> useful_stmts = get_all_stmt_may_generate_copy ();
+  scc_discovery discovery;
+
+  auto_vec<vec<gimple *>> worklist;
+  worklist = discovery.compute_sccs (useful_stmts);
+
+  while (!worklist.is_empty ())
+    {
+      vec<gimple *> scc = worklist.pop ();
+
+      auto_vec<gimple *> inner;
+      hash_set<tree> outer_ops;
+      tree last_outer_op = NULL_TREE;
+
+      /* Prepare hash set of PHIs in scc to query later.  */
+      hash_set<gimple *> scc_set;
+      for (gimple *stmt : scc)
+	scc_set.add (stmt);
+
+      for (gimple *stmt : scc)
+	{
+	  bool is_inner = true;
+
+	  gphi *phi;
+	  tree op;
+
+	  switch (gimple_code (stmt))
+	    {
+	      case GIMPLE_PHI:
+		phi = as_a <gphi *> (stmt);
+		unsigned j;
+		for (j = 0; j < gimple_phi_num_args (phi); j++)
+		  {
+		    op = gimple_phi_arg_def (phi, j);
+		    sccopy_visit_op (op, outer_ops, scc_set, is_inner,
+				   last_outer_op);
+		  }
+		break;
+	      case GIMPLE_ASSIGN:
+		op = gimple_assign_rhs1 (stmt);
+		sccopy_visit_op (op, outer_ops, scc_set, is_inner,
+			       last_outer_op);
+		break;
+	      default:
+		gcc_unreachable ();
+	    }
+
+	  if (is_inner)
+	    {
+	      inner.safe_push (stmt);
+	    }
+	}
+
+      if (outer_ops.elements () == 1)
+	{
+	  /* The only operand in outer_ops.  */
+	  tree outer_op = last_outer_op;
+	  replace_scc_by_value (scc, outer_op);
+	}
+      else if (outer_ops.elements () > 1)
+	{
+	  /* Add inner sccs to worklist.  */
+	  auto_vec<vec<gimple *>> inner_sccs =
+	    discovery.compute_sccs (inner);
+	  for (vec<gimple *> inner_scc : inner_sccs)
+	    worklist.safe_push (inner_scc);
+	}
+      else
+	gcc_unreachable ();
+
+      scc.release ();
+    }
+}
+
+/* Called when pass execution starts.  */
+
+static void
+init_sccopy (void)
+{
+  /* For propagated statements.  */
+  dead_stmts = BITMAP_ALLOC (NULL);
+}
+
+/* Called before pass execution ends.  */
+
+static void
+finalize_sccopy (void)
+{
+  /* Remove all propagated statements.  */
+  simple_dce_from_worklist (dead_stmts);
+  BITMAP_FREE (dead_stmts);
+
+  /* Propagating a constant may create dead eh edges.  */
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    gimple_purge_dead_eh_edges (bb);
+}
+
+namespace {
+
+const pass_data pass_data_sccopy =
+{
+  GIMPLE_PASS, /* type */
+  "sccopy", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
+};
+
+class pass_sccopy : public gimple_opt_pass
+{
+public:
+  pass_sccopy (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_sccopy, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return true; }
+  virtual unsigned int execute (function *);
+  opt_pass * clone () final override { return new pass_sccopy (m_ctxt); }
+}; // class pass_sccopy
+
+unsigned
+pass_sccopy::execute (function *)
+{
+  init_sccopy ();
+  sccopy_propagate ();
+  finalize_sccopy ();
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_sccopy (gcc::context *ctxt)
+{
+  return new pass_sccopy (ctxt);
+}
diff --git a/gcc/passes.def b/gcc/passes.def
index 1e1950bdb39..fa6c5a2c9fa 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -100,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_if_to_switch);
 	  NEXT_PASS (pass_convert_switch);
 	  NEXT_PASS (pass_cleanup_eh);
+	  NEXT_PASS (pass_sccopy);
 	  NEXT_PASS (pass_profile);
 	  NEXT_PASS (pass_local_pure_const);
 	  NEXT_PASS (pass_modref);
@@ -368,6 +369,7 @@ along with GCC; see the file COPYING3.  If not see
 	 However, this also causes us to misdiagnose cases that should be
 	 real warnings (e.g., testsuite/gcc.dg/pr18501.c).  */
       NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
+      NEXT_PASS (pass_sccopy);
       NEXT_PASS (pass_tail_calls);
       /* Split critical edges before late uninit warning to reduce the
          number of false positives from it.  */
@@ -409,6 +411,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_sancov);
       NEXT_PASS (pass_asan);
       NEXT_PASS (pass_tsan);
+      NEXT_PASS (pass_sccopy);
       /* ???  We do want some kind of loop invariant motion, but we possibly
          need to adjust LIM to be more friendly towards preserving accurate
 	 debug information here.  */
diff --git a/gcc/testsuite/gcc.dg/sccopy-1.c b/gcc/testsuite/gcc.dg/sccopy-1.c
new file mode 100644
index 00000000000..1e61a6b320e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/sccopy-1.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple -fdump-tree-sccopy -O2" } */
+/* { dg-final { scan-tree-dump "Replacing SCC of size 2" "sccopy1" } } */
+
+int __GIMPLE (ssa, startwith ("sccopy"))
+main ()
+{
+  int a;
+  int y;
+  int x;
+  int _1;
+  int _2;
+  int _13;
+
+  __BB(2):
+  if (x_7(D) == 5)
+    goto __BB3;
+  else
+    goto __BB4;
+
+  __BB(3):
+  a_10 = x_7(D);
+  goto __BB5;
+
+  __BB(4):
+  a_9 = y_8(D);
+  goto __BB5;
+
+  __BB(5):
+  a_3 = __PHI (__BB3: a_10, __BB4: a_9);
+  if (x_7(D) == y_8(D))
+    goto __BB6;
+  else
+    goto __BB11;
+
+  __BB(6):
+  a_11 = a_3 + 1;
+  goto __BB7;
+
+  __BB(7):
+  a_4 = __PHI (__BB6: a_11, __BB11: a_6);
+label1:
+  if (x_7(D) != y_8(D))
+    goto __BB8;
+  else
+    goto __BB10;
+
+  __BB(8):
+  goto __BB9;
+
+  __BB(9):
+  a_12 = __PHI (__BB8: a_4, __BB10: a_5);
+  goto __BB10;
+
+  __BB(10,loop_header(1)):
+  a_5 = __PHI (__BB7: a_4, __BB9: a_12);
+label2:
+  _1 = y_8(D) * 2;
+  if (x_7(D) == _1)
+    goto __BB9;
+  else
+    goto __BB11;
+
+  __BB(11):
+  a_6 = __PHI (__BB5: a_3, __BB10: a_5);
+  _2 = x_7(D) * 3;
+  if (y_8(D) == _2)
+    goto __BB7;
+  else
+    goto __BB12;
+
+  __BB(12):
+  _13 = 0;
+  return _13;
+
+}
+
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c b/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c
index bf889318c06..43770c95bca 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c
@@ -34,4 +34,4 @@ int f4 (int i)
 
 /* { dg-final { scan-tree-dump-times "sprintf" 1 "optimized" } }
    { dg-final { scan-tree-dump-times "snprintf" 1 "optimized" } }
-   { dg-final { scan-tree-dump " = 9;" "optimized" } } */
+   { dg-final { scan-tree-dump "return 9;" "optimized" } } */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 09e6ada5b2f..94a48461590 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -399,6 +399,7 @@ extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
-- 
2.42.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3] A new copy propagation and PHI elimination pass
  2023-12-08 15:15       ` [PATCH v3] " Filip Kastl
@ 2023-12-13 11:35         ` Richard Biener
  2023-12-13 16:12           ` [PATCH v4] " Filip Kastl
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2023-12-13 11:35 UTC (permalink / raw)
  To: Filip Kastl; +Cc: gcc-patches, hubicka

On Fri, 8 Dec 2023, Filip Kastl wrote:

> > > Hi,
> > > 
> > > this is a patch that I submitted two months ago as an RFC. I added some polish
> > > since.
> > > 
> > > It is a new lightweight pass that removes redundant PHI functions and as a
> > > bonus does basic copy propagation. With Jan Hubička we measured that it is able
> > > to remove usually more than 5% of all PHI functions when run among early passes
> > > (sometimes even 13% or more). Those are mostly PHI functions that would be
> > > later optimized away but with this pass it is possible to remove them early
> > > enough so that they don't get streamed when running LTO (and also potentially
> > > inlined at multiple places). It is also able to remove some redundant PHIs
> > > that otherwise would still be present during RTL expansion.
> > > 
> > > Jakub Jelínek was concerned about debug info coverage so I compiled cc1plus
> > > with and without this patch. These are the sizes of .debug_info and
> > > .debug_loclists
> > > 
> > > .debug_info without patch 181694311
> > > .debug_info    with patch 181692320
> > > -0.0011% change
> > > 
> > > .debug_loclists without patch 47934753
> > > .debug_loclists    with patch 47934966
> > > +0.0004% change
> > > 
> > > I wanted to use dwlocstat to compare debug coverages but didn't manage to get
> > > the program working on my machine sadly. Hope this suffices. Seems to me that
> > > my patch doesn't have a significant impact on debug info.
> > > 
> > > Bootstrapped and tested* on x86_64-pc-linux-gnu.
> > > 
> > > * One testcase (pr79691.c) did regress. However that is because the test is
> > > dependent on a certain variable not being copy propagated. I will go into more
> > > detail about this in a reply to this mail.
> > > 
> > > Ok to commit?
> > 
> > This is a second version of the patch.  In this version, I modified the
> > pr79691.c testcase so that it works as intended with other changes from the
> > patch.
> > 
> > The pr79691.c testcase checks that we get constants from snprintf calls and
> > that they simplify into a single constant.  The testcase doesn't account for
> > the fact that this constant may be further copy propagated which is exactly
> > what happens with this patch applied.
> > 
> > Bootstrapped and tested on x86_64-pc-linux-gnu.
> > 
> > Ok to commit?
> 
> This is the third version of the patch. In this version, I addressed most of
> Richard's remarks about the second version. Here is a summary of changes I made:
> 
> - Rename the pass from tree-ssa-sccopy.cc to gimple-ssa-sccopy.cc
> - Use simple_dce_from_worklist to remove propagated statements
> - Use existing replace_uses API instead of reinventing it
>   - This allowed me to get rid of some now redundant cleanup code
> - Encapsulate the SCC finding into a class
> - Rework stmt_may_generate_copy to get rid of redundant checks
> - Add check that PHI doesn't contain two non-SSA-name values to
>   stmt_may_generate_copy
> - Regarding alignment and value ranges in stmt_may_generate_copy: For now use
>   the conservative check that Richard suggested
> - Index array of vertices that SCC discovery uses by SSA name version numbers
>   instead of numbering statements myself.
> 
> 
> I didn't make any changes based on these remarks:
> 
> 1 It might be nice to optimize SCCs of size 1 somehow, not sure how
>   many times these appear - possibly prevent them from even entering
>   the SCC discovery?
> 
> It would be nice. But the only way to do this that I see right now is to first
> propagate SCCs of size 1 and then the rest. This would mean adding a new copy
> propagation procedure. It wouldn't be a trivial procedure. Efficiency of the
> pass relies on having SCCs topologically sorted so this procedure would have to
> implement some topological sort algorithm.
> 
> This could be done. It could save allocating some vec<>s (right now, SCCs of
> size 1 are represented by a vec<> with a single element). But is it worth it to
> optimize the pass this way right now? If possible, I'd like to see that the
> pass works and sort out any problems people encounter with it before I start
> optimizing it.
> 
> 2 Instead of collecting all stmts that may generate a copy at the beginning of
>   the pass into a vec<>, let the SCC discovery check that statements may
>   generate a copy on the fly.
> 
> This would be a big change to the pass, it would require a lot of reworking.
> I'm also not sure if this would help reduce the number of allocated vec<>s that
> much because I'll still want to represent SCCs by vec<>s.
> 
> Again - it's possible I'll want to rework the pass in this way in the future but
> I'd like to leave it as it is for now.
> 
> 3 Add a comment saying that the pass is doing optimistic copy propagation
> 
> I don't think the pass works in an optimistic way. It doesn't assume that all
> variables are copies of each other at any point. It instead identifies copy
> statements (or PHI SCCs that act as copy statements) and then for each of these
> it propagates: By that I mean if a copy statement says that _3 is a copy of _2,
> then it replaces all uses of _3 by _2.
> 
> But it's possible that I still misinterpret what 'optimistic' means. If
> mentioning that it works in an optimistic way would help readability of the
> pass, I'm happy to add that comment.

I understood it handles copy cycles?  Consider

 _1 = _2;

bb3:
 # _3 = PHI <_1, _4>
 _4 = _3;
 if (...)
   break;
 else
   goto bb3;

It should see that _4 and _3 are a copy of _1 (and thus _2)?  When
processing the cycle comprising the defs of _3 and _4 don't you
then optimistically assume that _1 == _4 when processing the PHI?
I realize this might be "trivial" given the SCCs are constructed
from copies.  But still with say

 _3 = PHI <_1, _4>

 _4 = PHI <_3, _2>


the cycle would have two entries (it's irreducible) and neither
_3 nor _4 are a copy of _1 (or _2) unless _1 is a copy of _2
(or vice versa).


> Richard, is the patch ok as it is now? Or do you think that making changes 1, 2
> or 3 is necessary? Or are there any other issues which should be addressed?

It's OK to not do the changes you skipped.

> Filip
> 
> 
> -- >8 --
> 
> This patch adds the strongly-connected copy propagation (SCCOPY) pass.
> It is a lightweight GIMPLE copy propagation pass that also removes some
> redundant PHI statements. It handles degenerate PHIs, e.g.:
> 
> _5 = PHI <_1>;
> _6 = PHI <_6, _6, _1, _1>;
> _7 = PHI <16, _7>;
> // Replaces occurrences of _5 and _6 by _1 and _7 by 16
> 
> It also handles more complicated situations, e.g.:
> 
> _8 = PHI <_9, _10>;
> _9 = PHI <_8, _10>;
> _10 = PHI <_8, _9, _1>;
> // Replaces occurrences of _8, _9 and _10 by _1
> 
> gcc/ChangeLog:
> 
> 	* Makefile.in: Added sccopy pass.
> 	* passes.def: Added sccopy pass before LTO streaming and before
> 	  RTL expansion.
> 	* tree-pass.h (make_pass_sccopy): Added sccopy pass.
> 	* gimple-ssa-sccopy.cc: New file.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/tree-ssa/pr79691.c: Updated scan-tree-dump to account
> 	  for additional copy propagation this patch adds.
> 	* gcc.dg/sccopy-1.c: New test.
> 
> Signed-off-by: Filip Kastl <fkastl@suse.cz>
> ---
>  gcc/Makefile.in                         |   1 +
>  gcc/gimple-ssa-sccopy.cc                | 687 ++++++++++++++++++++++++
>  gcc/passes.def                          |   3 +
>  gcc/testsuite/gcc.dg/sccopy-1.c         |  78 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr79691.c |   2 +-
>  gcc/tree-pass.h                         |   1 +
>  6 files changed, 771 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/gimple-ssa-sccopy.cc
>  create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 753f2f36618..2e6f2fef1ba 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1497,6 +1497,7 @@ OBJS = \
>  	gimple-ssa-backprop.o \
>  	gimple-ssa-isolate-paths.o \
>  	gimple-ssa-nonnull-compare.o \
> +	gimple-ssa-sccopy.o \
>  	gimple-ssa-split-paths.o \
>  	gimple-ssa-store-merging.o \
>  	gimple-ssa-strength-reduction.o \
> diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
> new file mode 100644
> index 00000000000..b74f7a62569
> --- /dev/null
> +++ b/gcc/gimple-ssa-sccopy.cc
> @@ -0,0 +1,687 @@
> +/* Strongly-connected copy propagation pass for the GNU compiler.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   Contributed by Filip Kastl <fkastl@suse.cz>
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "gimple-iterator.h"
> +#include "vec.h"
> +#include "hash-set.h"
> +#include <algorithm>
> +#include "ssa-iterators.h"
> +#include "gimple-fold.h"
> +#include "gimplify.h"
> +#include "tree-cfg.h"
> +#include "tree-eh.h"
> +#include "builtins.h"
> +#include "tree-ssa-dce.h"
> +#include "fold-const.h"
> +
> +/* Strongly connected copy propagation pass.
> +
> +   This is a lightweight copy propagation pass that is also able to eliminate
> +   redundant PHI statements.  The pass considers the following types of copy
> +   statements:
> +
> +   1 An assignment statement with a single argument.
> +
> +   _3 = _2;
> +   _4 = 5;
> +
> +   2 A degenerate PHI statement.  A degenerate PHI is a PHI that only refers to
> +     itself or one other value.
> +
> +   _5 = PHI <_1>;
> +   _6 = PHI <_6, _6, _1, _1>;
> +   _7 = PHI <16, _7>;
> +
> +   3 A set of PHI statements that only refer to each other or to one other
> +     value.
> +
> +   _8 = PHI <_9, _10>;
> +   _9 = PHI <_8, _10>;
> +   _10 = PHI <_8, _9, _1>;
> +
> +   All of these statements produce copies and can be eliminated from the
> +   program.  For a copy statement we identify the value it creates a copy of
> +   and replace references to the statement with the value -- we propagate the
> +   copy.
> +
> +   _3 = _2; // Replace all occurrences of _3 by _2
> +
> +   _8 = PHI <_9, _10>;
> +   _9 = PHI <_8, _10>;
> +   _10 = PHI <_8, _9, _1>; // Replace all occurrences of _8, _9 and _10 by _1
> +
> +   To find all three types of copy statements we use an algorithm based on
> +   strongly-connected components (SCCs) in the dataflow graph.  The algorithm
> +   was introduced in an article from 2013[1].  We describe the algorithm below.
> +
> +   To identify SCCs we implement Robert Tarjan's SCC algorithm.  For the
> +   SCC computation we wrap potential copy statements in the 'vertex' struct.
> +   To each of these statements we also assign a vertex number ('vxnum'). Since
> +   the main algorithm has to be able to compute SCCs of subgraphs of the whole
> +   dataflow graph we use GIMPLE stmt flags to prevent Tarjan's algorithm from
> +   leaving the subgraph.
> +
> +   References:
> +
> +     [1] Simple and Efficient Construction of Static Single Assignment Form,
> +     Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
> +     Section 3.2.  */
> +
> +/* Bitmap tracking statements which were propagated to be removed at the end of
> +   the pass.  */
> +
> +static bitmap dead_stmts;
> +
> +/* State of vertex during SCC discovery.
> +
> +   unvisited  Vertex hasn't yet been popped from worklist.
> +   vopen      DFS has visited vertex for the first time.  Vertex has been put
> +	      on Tarjan stack.
> +   closed     DFS has backtracked through vertex.  At this point, vertex
> +	      doesn't have any unvisited neighbors.
> +   in_scc     Vertex has been popped from Tarjan stack.  */
> +
> +enum vstate
> +{
> +  unvisited,
> +  vopen,
> +  closed,
> +  in_scc
> +};
> +
> +/* Information about a vertex.  Used by SCC discovery.  */
> +
> +struct vertex
> +{
> +  bool active; /* scc_discovery::compute_sccs () only considers a subgraph of
> +		  the whole dataflow graph.  It uses this flag so that it knows
> +		  which vertices are part of this subgraph.  */
> +  vstate state;
> +  unsigned index;
> +  unsigned lowlink;
> +};
> +
> +/* SCC discovery.
> +
> +   Used to find SCCs in a dataflow graph.  Implements Tarjan's SCC
> +   algorithm.  */
> +
> +class scc_discovery
> +{
> +public:
> +  scc_discovery ();
> +  ~scc_discovery ();
> +  auto_vec<vec<gimple *>> compute_sccs (auto_vec<gimple *> &stmts);
> +
> +private:
> +  unsigned curr_generation = 0;
> +  vertex* vertices; /* Indexed by SSA_NAME_VERSION.  */
> +  auto_vec<unsigned> worklist; /* DFS stack.  */
> +  auto_vec<unsigned> stack; /* Tarjan stack.  */
> +
> +  void visit_neighbor (tree neigh_tree, unsigned parent_vxnum);
> +};
> +
> +scc_discovery::scc_discovery ()
> +{
> +  /* Create vertex struct for each SSA name.  */
> +  vertices = XNEWVEC (struct vertex, num_ssa_names);
> +  unsigned i = 0;
> +  for (i = 0; i < num_ssa_names; i++)
> +    vertices[i].active = false;
> +}
> +
> +scc_discovery::~scc_discovery ()
> +{
> +  XDELETEVEC (vertices);
> +}
> +
> +/* Part of 'scc_discovery::compute_sccs()'.  */
> +
> +void
> +scc_discovery::visit_neighbor (tree neigh_tree, unsigned parent_version)
> +{
> +  if (TREE_CODE (neigh_tree) != SSA_NAME)
> +    return; /* Skip any neighbor that isn't an SSA name.  */
> +  unsigned neigh_version = SSA_NAME_VERSION (neigh_tree);
> +
> +  /* Skip neighbors outside the subgraph that Tarjan currently works
> +     with.  */
> +  if (!vertices[neigh_version].active)
> +    return;
> +
> +  vstate neigh_state = vertices[neigh_version].state;
> +  vstate parent_state = vertices[parent_version].state;
> +  if (parent_state == vopen) /* We're currently opening parent.  */
> +    {
> +      /* Put unvisited neighbors on worklist.  Update lowlink of parent
> +	 vertex according to indices of neighbors present on stack.  */
> +      switch (neigh_state)
> +	{
> +	case unvisited:
> +	  worklist.safe_push (neigh_version);
> +	  break;
> +	case vopen:
> +	case closed:
> +	  vertices[parent_version].lowlink =
> +	    std::min (vertices[parent_version].lowlink,
> +		      vertices[neigh_version].index);
> +	  break;
> +	case in_scc:
> +	  /* Ignore these edges.  */
> +	  break;
> +	}
> +    }
> +  else if (parent_state == closed) /* We're currently closing parent.  */
> +    {
> +      /* Update lowlink of parent vertex according to lowlinks of
> +	 children of parent (in terms of DFS tree).  */
> +      if (neigh_state == closed)
> +	{
> +	  vertices[parent_version].lowlink =
> +	    std::min (vertices[parent_version].lowlink,
> +		      vertices[neigh_version].lowlink);
> +	}
> +    }
> +}
> +
> +/* Compute SCCs in dataflow graph on given statements 'stmts'.  Ignore
> +   statements outside 'stmts'.  Return the SCCs in a reverse topological
> +   order.
> +
> +   stmt_may_generate_copy () must be true for all statements from 'stmts'!  */
> +
> +auto_vec<vec<gimple *>>
> +scc_discovery::compute_sccs (auto_vec<gimple *> &stmts)

Please use vec<gimple *> &stmts, not auto_vec<..>.

> +{
> +  auto_vec<vec<gimple *>> sccs;
> +
> +  for (gimple *stmt : stmts)
> +    {
> +      unsigned i;
> +      switch (gimple_code (stmt))
> +	{
> +	  case GIMPLE_ASSIGN:
> +	    i = SSA_NAME_VERSION (gimple_assign_lhs (stmt));
> +	    break;
> +	  case GIMPLE_PHI:
> +	    i = SSA_NAME_VERSION (gimple_phi_result (stmt));
> +	    break;
> +	  default:
> +	    gcc_unreachable ();
> +	}
> +
> +      vertices[i].index = 0;
> +      vertices[i].lowlink = 0;
> +      vertices[i].state = unvisited;
> +      vertices[i].active = true; /* Mark the subgraph we'll be working on so
> +				    that we don't leave it.  */
> +
> +      worklist.safe_push (i);
> +    }
> +
> +  /* Worklist loop.  */
> +  unsigned curr_index = 0;
> +  while (!worklist.is_empty ())
> +    {
> +      unsigned i = worklist.pop ();
> +      gimple *stmt = SSA_NAME_DEF_STMT (ssa_name (i));
> +      vstate state = vertices[i].state;
> +
> +      if (state == unvisited)
> +	{
> +	  vertices[i].state = vopen;
> +
> +	  /* Assign index to this vertex.  */
> +	  vertices[i].index = curr_index;
> +	  vertices[i].lowlink = curr_index;
> +	  curr_index++;
> +
> +	  /* Put vertex on stack and also on worklist to be closed later.  */
> +	  stack.safe_push (i);
> +	  worklist.safe_push (i);
> +	}
> +      else if (state == vopen)
> +	vertices[i].state = closed;
> +
> +      /* Visit neighbors of this vertex.  */
> +      tree op;
> +      gphi *phi;
> +      switch (gimple_code (stmt))
> +	{
> +	  case GIMPLE_PHI:
> +	    phi = as_a <gphi *> (stmt);
> +	    unsigned j;
> +	    for (j = 0; j < gimple_phi_num_args (phi); j++)
> +	      {
> +		op = gimple_phi_arg_def (phi, j);
> +		visit_neighbor (op, i);
> +	      }
> +	    break;
> +	  case GIMPLE_ASSIGN:
> +	    op = gimple_assign_rhs1 (stmt);
> +	    visit_neighbor (op, i);
> +	    break;
> +	  default:
> +	    gcc_unreachable ();
> +	}
> +
> +      /* If we've just closed a root vertex of an scc, pop scc from stack.  */
> +      if (state == vopen && vertices[i].lowlink == vertices[i].index)
> +	{
> +	  vec<gimple *> scc = vNULL;
> +
> +	  unsigned j;
> +	  do
> +	    {
> +	      j = stack.pop ();
> +	      scc.safe_push (SSA_NAME_DEF_STMT (ssa_name (j)));
> +	      vertices[j].state = in_scc;
> +	    }
> +	  while (j != i);
> +
> +	  sccs.safe_push (scc);
> +	}
> +    }
> +
> +  if (!stack.is_empty ())
> +    gcc_unreachable ();
> +
> +  /* Clear 'active' flags.  */
> +  for (gimple *stmt : stmts)
> +    {
> +      unsigned i;
> +      switch (gimple_code (stmt))
> +	{
> +	  case GIMPLE_ASSIGN:
> +	    i = SSA_NAME_VERSION (gimple_assign_lhs (stmt));
> +	    break;
> +	  case GIMPLE_PHI:
> +	    i = SSA_NAME_VERSION (gimple_phi_result (stmt));
> +	    break;
> +	  default:
> +	    gcc_unreachable ();
> +	}
> +
> +      vertices[i].active = false;
> +    }
> +
> +  return sccs;

I think you need to use

  return std::move (sccs);

here?  But maybe I'm confusing C++ here and this auto-magically works.

Please try to run valgrind leak check on a testcase you know the pass
performs some optimization.

> +}
> +
> +/* Could this statement potentially be a copy statement?
> +
> +   This pass only considers statements for which this function returns 'true'.
> +   Those are basically PHI functions and assignment statements similar to
> +
> +   _2 = _1;
> +   or
> +   _2 = 5;  */
> +
> +static bool
> +stmt_may_generate_copy (gimple *stmt)
> +{
> +  /* A PHI may generate a copy.  */
> +  if (gimple_code (stmt) == GIMPLE_PHI)
> +    {
> +      gphi *phi = as_a <gphi *> (stmt);
> +
> +      /* No OCCURS_IN_ABNORMAL_PHI SSA names in lhs nor rhs.  */
> +      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (phi)))
> +	return false;
> +
> +      unsigned i;
> +      for (i = 0; i < gimple_phi_num_args (phi); i++)
> +	{
> +	  tree op = gimple_phi_arg_def (phi, i);
> +	  if (TREE_CODE (op) == SSA_NAME
> +	      && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
> +	    return false;
> +	}
> +
> +      /* If PHI has more than one unique non-SSA arguments, it won't generate a
> +	 copy.  */
> +      tree const_op = NULL_TREE;
> +      for (i = 0; i < gimple_phi_num_args (phi); i++)
> +	{
> +	  tree op = gimple_phi_arg_def (phi, i);
> +	  if (TREE_CODE (op) != SSA_NAME)
> +	    {
> +	      if (const_op && !operand_equal_p (op, const_op))
> +		return false;
> +	      const_op = op;
> +	    }
> +	}
> +
> +      return true;
> +    }
> +
> +  /* Or a statement of type _2 = _1; OR _2 = 5; may generate a copy.  */
> +
> +  if (!gimple_assign_single_p (stmt))
> +    return false;
> +
> +  tree lhs = gimple_assign_lhs (stmt);
> +  tree rhs = gimple_assign_rhs1 (stmt);
> +
> +  if (TREE_CODE (lhs) != SSA_NAME)
> +    return false;
> +
> +  /* lhs shouldn't flow through any abnormal edges.  */
> +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
> +    return false;
> +
> +  if (is_gimple_min_invariant (rhs))
> +    return true;  /* A statement of type _2 = 5;.  */
> +
> +  if (TREE_CODE (rhs) != SSA_NAME)
> +    return false;
> +
> +  /* rhs shouldn't flow through any abnormal edges.  */
> +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs))
> +    return false;
> +
> +  /* It is possible that lhs has more alignment or value range information.  By
> +     propagating we would lose this information.  So in the case that alignment
> +     or value range information differs, we are conservative and do not
> +     propagate.
> +
> +     FIXME: Propagate alignment and value range info the same way copy-prop
> +     does.  */
> +  if (POINTER_TYPE_P (TREE_TYPE (lhs))
> +      && POINTER_TYPE_P (TREE_TYPE (rhs))
> +      && SSA_NAME_PTR_INFO (lhs) != SSA_NAME_PTR_INFO (rhs))
> +    return false;
> +  if (!POINTER_TYPE_P (TREE_TYPE (lhs))
> +      && !POINTER_TYPE_P (TREE_TYPE (rhs))
> +      && SSA_NAME_RANGE_INFO (lhs) != SSA_NAME_RANGE_INFO (rhs))
> +    return false;
> +
> +  return true;  /* A statement of type _2 = _1;.  */
> +}
> +
> +/* Return all statements in cfun that could generate copies.  All statements
> +   for which stmt_may_generate_copy returns 'true'.  */
> +
> +static auto_vec<gimple *>
> +get_all_stmt_may_generate_copy (void)
> +{
> +  auto_vec<gimple *> result;
> +
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      gimple_stmt_iterator gsi;
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +	{
> +	  gimple *s = gsi_stmt (gsi);
> +	  if (stmt_may_generate_copy (s))
> +	    {
> +	      result.safe_push (s);
> +	    }
> +	}
> +
> +      gphi_iterator pi;
> +      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
> +	{
> +	  gimple *s = pi.phi ();
> +	  if (stmt_may_generate_copy (s))
> +	    {
> +	      result.safe_push (s);
> +	    }
> +	}
> +    }
> +
> +  return result;
> +}
> +
> +/* For each statement from given SCC, replace its usages by value
> +   VAL.  */
> +
> +static void
> +replace_scc_by_value (vec<gimple *> scc, tree val)
> +{
> +  for (gimple *stmt : scc)
> +    {
> +      tree name = gimple_get_lhs (stmt);
> +      replace_uses_by (name, val);
> +      bitmap_set_bit (dead_stmts, SSA_NAME_VERSION (name));
> +    }
> +
> +  if (dump_file)
> +    fprintf (dump_file, "Replacing SCC of size %d\n", scc.length ());
> +}
> +
> +/* Part of 'sccopy_propagate ()'.  */
> +
> +static void
> +sccopy_visit_op (tree op, hash_set<tree> &outer_ops,
> +		 hash_set<gimple *> &scc_set, bool &is_inner,
> +		 tree &last_outer_op)
> +{
> +  bool op_in_scc = false;
> +
> +  if (TREE_CODE (op) == SSA_NAME)
> +    {
> +      gimple *op_stmt = SSA_NAME_DEF_STMT (op);
> +      if (scc_set.contains (op_stmt))
> +	op_in_scc = true;

As optimization this could be a [s]bitmap indexed by
SSA_NAME_VERSION (if you have many instances of scc_set
a bitmap is better, a sbitmap has O(n) initialization overhead).

> +    }
> +
> +  if (!op_in_scc)
> +    {
> +      outer_ops.add (op);

I do wonder if it's relevant to track non-SSA_NAME 'op' here?
Otherwise the same as above would apply.

This can be done as followup (if possible) if you like.
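
As a concrete illustration of the bitmap variant, the in-SCC membership test
could be rewritten roughly as follows (an untested sketch against the
internal bitmap API, not a drop-in change; names are illustrative):

```cpp
/* Sketch: record SCC membership by SSA name version, once per SCC.  */
bitmap scc_members = BITMAP_ALLOC (NULL);
for (gimple *stmt : scc)
  bitmap_set_bit (scc_members, SSA_NAME_VERSION (gimple_get_lhs (stmt)));

/* ... in sccopy_visit_op the hash_set query then becomes a bit test ...  */
if (TREE_CODE (op) == SSA_NAME
    && bitmap_bit_p (scc_members, SSA_NAME_VERSION (op)))
  op_in_scc = true;

/* Free once the SCC has been processed.  */
BITMAP_FREE (scc_members);
```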

> +      last_outer_op = op;
> +      is_inner = false;
> +    }
> +}
> +
> +/* Main function of this pass.  Find and propagate all three types of copy
> +   statements (see pass description above).
> +
> +   This is an implementation of an algorithm from the paper Simple and
> +   Efficient Construction of Static Single Assignment Form[1].  It is based
> +   on strongly-connected components (SCCs) in dataflow graph.  The original
> +   algorithm only considers PHI statements.  We extend it to also consider
> +   assignment statements of type _2 = _1;.
> +
> +   The algorithm is based on this definition of a set of redundant PHIs[1]:
> +
> +     A non-empty set P of PHI functions is redundant iff the PHI functions just
> +     reference each other or one other value
> +
> +   It uses this lemma[1]:
> +
> +     Let P be a redundant set of PHI functions.  Then there is a
> +     strongly-connected component S subset of P that is also redundant.
> +
> +   The algorithm works in this way:
> +
> +     1 Find SCCs
> +     2 For each SCC S in topological order:
> +     3   Construct set 'inner' of statements that only have other statements
> +	 from S on their right hand side
> +     4   Construct set 'outer' of values that originate outside S and appear on
> +	 right hand side of some statement from S
> +     5   If |outer| = 1, outer only contains a value v.  Statements in S only
> +	 refer to each other or to v -- they are redundant.  Propagate v.
> +	 Else, recurse on statements in inner.
> +
> +   The implementation is non-recursive.
> +
> +   References:
> +
> +     [1] Simple and Efficient Construction of Static Single Assignment Form,
> +     Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
> +     Section 3.2.  */
> +
> +static void
> +sccopy_propagate ()
> +{
> +  auto_vec<gimple *> useful_stmts = get_all_stmt_may_generate_copy ();
> +  scc_discovery discovery;
> +
> +  auto_vec<vec<gimple *>> worklist;
> +  worklist = discovery.compute_sccs (useful_stmts);

using initialization might be cheaper than first default-constructing

> +
> +  while (!worklist.is_empty ())
> +    {
> +      vec<gimple *> scc = worklist.pop ();
> +
> +      auto_vec<gimple *> inner;
> +      hash_set<tree> outer_ops;
> +      tree last_outer_op = NULL_TREE;
> +
> +      /* Prepare hash set of PHIs in scc to query later.  */

ah, that was to identify in-SCC vs. out-of-SCC stmts ...

> +      hash_set<gimple *> scc_set;
> +      for (gimple *stmt : scc)
> +	scc_set.add (stmt);
> +
> +      for (gimple *stmt : scc)
> +	{
> +	  bool is_inner = true;
> +
> +	  gphi *phi;
> +	  tree op;
> +
> +	  switch (gimple_code (stmt))
> +	    {
> +	      case GIMPLE_PHI:
> +		phi = as_a <gphi *> (stmt);
> +		unsigned j;
> +		for (j = 0; j < gimple_phi_num_args (phi); j++)
> +		  {
> +		    op = gimple_phi_arg_def (phi, j);
> +		    sccopy_visit_op (op, outer_ops, scc_set, is_inner,
> +				   last_outer_op);
> +		  }
> +		break;
> +	      case GIMPLE_ASSIGN:
> +		op = gimple_assign_rhs1 (stmt);
> +		sccopy_visit_op (op, outer_ops, scc_set, is_inner,
> +			       last_outer_op);
> +		break;
> +	      default:
> +		gcc_unreachable ();
> +	    }
> +
> +	  if (is_inner)
> +	    {
> +	      inner.safe_push (stmt);
> +	    }

general coding-style comment, we don't put braces around
single stmts.

> +	}
> +
> +      if (outer_ops.elements () == 1)
> +	{
> +	  /* The only operand in outer_ops.  */
> +	  tree outer_op = last_outer_op;
> +	  replace_scc_by_value (scc, outer_op);
> +	}
> +      else if (outer_ops.elements () > 1)
> +	{
> +	  /* Add inner sccs to worklist.  */
> +	  auto_vec<vec<gimple *>> inner_sccs =
> +	    discovery.compute_sccs (inner);
> +	  for (vec<gimple *> inner_scc : inner_sccs)
> +	    worklist.safe_push (inner_scc);
> +	}
> +      else
> +	gcc_unreachable ();
> +
> +      scc.release ();
> +    }
> +}
> +
> +/* Called when pass execution starts.  */
> +
> +static void
> +init_sccopy (void)
> +{
> +  /* For propagated statements.  */
> +  dead_stmts = BITMAP_ALLOC (NULL);
> +}
> +
> +/* Called before pass execution ends.  */
> +
> +static void
> +finalize_sccopy (void)
> +{
> +  /* Remove all propagated statements.  */
> +  simple_dce_from_worklist (dead_stmts);
> +  BITMAP_FREE (dead_stmts);
> +
> +  /* Propagating a constant may create dead eh edges.  */
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    gimple_purge_dead_eh_edges (bb);
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_sccopy =
> +{
> +  GIMPLE_PASS, /* type */
> +  "sccopy", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  ( PROP_cfg | PROP_ssa ), /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
> +};
> +
> +class pass_sccopy : public gimple_opt_pass
> +{
> +public:
> +  pass_sccopy (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_sccopy, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return true; }
> +  virtual unsigned int execute (function *);
> +  opt_pass * clone () final override { return new pass_sccopy (m_ctxt); }
> +}; // class pass_sccopy
> +
> +unsigned
> +pass_sccopy::execute (function *)
> +{
> +  init_sccopy ();
> +  sccopy_propagate ();
> +  finalize_sccopy ();
> +  return 0;
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_sccopy (gcc::context *ctxt)
> +{
> +  return new pass_sccopy (ctxt);
> +}
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 1e1950bdb39..fa6c5a2c9fa 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -100,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
>  	  NEXT_PASS (pass_if_to_switch);
>  	  NEXT_PASS (pass_convert_switch);
>  	  NEXT_PASS (pass_cleanup_eh);
> +	  NEXT_PASS (pass_sccopy);
>  	  NEXT_PASS (pass_profile);
>  	  NEXT_PASS (pass_local_pure_const);
>  	  NEXT_PASS (pass_modref);
> @@ -368,6 +369,7 @@ along with GCC; see the file COPYING3.  If not see
>  	 However, this also causes us to misdiagnose cases that should be
>  	 real warnings (e.g., testsuite/gcc.dg/pr18501.c).  */
>        NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
> +      NEXT_PASS (pass_sccopy);
>        NEXT_PASS (pass_tail_calls);
>        /* Split critical edges before late uninit warning to reduce the
>           number of false positives from it.  */
> @@ -409,6 +411,7 @@ along with GCC; see the file COPYING3.  If not see
>        NEXT_PASS (pass_sancov);
>        NEXT_PASS (pass_asan);
>        NEXT_PASS (pass_tsan);
> +      NEXT_PASS (pass_sccopy);

I don't think we want to run the pass again with -Og.

OK with those changes.

Thanks,
Richard.

>        /* ???  We do want some kind of loop invariant motion, but we possibly
>           need to adjust LIM to be more friendly towards preserving accurate
>  	 debug information here.  */
> diff --git a/gcc/testsuite/gcc.dg/sccopy-1.c b/gcc/testsuite/gcc.dg/sccopy-1.c
> new file mode 100644
> index 00000000000..1e61a6b320e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/sccopy-1.c
> @@ -0,0 +1,78 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fgimple -fdump-tree-sccopy -O2" } */
> +/* { dg-final { scan-tree-dump "Replacing SCC of size 2" "sccopy1" } } */
> +
> +int __GIMPLE (ssa, startwith ("sccopy"))
> +main ()
> +{
> +  int a;
> +  int y;
> +  int x;
> +  int _1;
> +  int _2;
> +  int _13;
> +
> +  __BB(2):
> +  if (x_7(D) == 5)
> +    goto __BB3;
> +  else
> +    goto __BB4;
> +
> +  __BB(3):
> +  a_10 = x_7(D);
> +  goto __BB5;
> +
> +  __BB(4):
> +  a_9 = y_8(D);
> +  goto __BB5;
> +
> +  __BB(5):
> +  a_3 = __PHI (__BB3: a_10, __BB4: a_9);
> +  if (x_7(D) == y_8(D))
> +    goto __BB6;
> +  else
> +    goto __BB11;
> +
> +  __BB(6):
> +  a_11 = a_3 + 1;
> +  goto __BB7;
> +
> +  __BB(7):
> +  a_4 = __PHI (__BB6: a_11, __BB11: a_6);
> +label1:
> +  if (x_7(D) != y_8(D))
> +    goto __BB8;
> +  else
> +    goto __BB10;
> +
> +  __BB(8):
> +  goto __BB9;
> +
> +  __BB(9):
> +  a_12 = __PHI (__BB8: a_4, __BB10: a_5);
> +  goto __BB10;
> +
> +  __BB(10,loop_header(1)):
> +  a_5 = __PHI (__BB7: a_4, __BB9: a_12);
> +label2:
> +  _1 = y_8(D) * 2;
> +  if (x_7(D) == _1)
> +    goto __BB9;
> +  else
> +    goto __BB11;
> +
> +  __BB(11):
> +  a_6 = __PHI (__BB5: a_3, __BB10: a_5);
> +  _2 = x_7(D) * 3;
> +  if (y_8(D) == _2)
> +    goto __BB7;
> +  else
> +    goto __BB12;
> +
> +  __BB(12):
> +  _13 = 0;
> +  return _13;
> +
> +}
> +
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c b/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c
> index bf889318c06..43770c95bca 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c
> @@ -34,4 +34,4 @@ int f4 (int i)
>  
>  /* { dg-final { scan-tree-dump-times "sprintf" 1 "optimized" } }
>     { dg-final { scan-tree-dump-times "snprintf" 1 "optimized" } }
> -   { dg-final { scan-tree-dump " = 9;" "optimized" } } */
> +   { dg-final { scan-tree-dump "return 9;" "optimized" } } */
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 09e6ada5b2f..94a48461590 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -399,6 +399,7 @@ extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v4] A new copy propagation and PHI elimination pass
  2023-12-13 11:35         ` Richard Biener
@ 2023-12-13 16:12           ` Filip Kastl
  2023-12-13 16:15             ` Richard Biener
  2023-12-14 13:22             ` In 'gcc/gimple-ssa-sccopy.cc', '#define INCLUDE_ALGORITHM' instead of '#include <algorithm>' (was: [PATCH v4] A new copy propagation and PHI elimination pass) Thomas Schwinge
  0 siblings, 2 replies; 10+ messages in thread
From: Filip Kastl @ 2023-12-13 16:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Biener, hubicka

> > > Hi,
> > > 
> > > this is a patch that I submitted two months ago as an RFC. I added some polish
> > > since.
> > > 
> > > It is a new lightweight pass that removes redundant PHI functions and as a
> > > bonus does basic copy propagation. With Jan Hubička we measured that it is able
> > > to remove usually more than 5% of all PHI functions when run among early passes
> > > (sometimes even 13% or more). Those are mostly PHI functions that would be
> > > later optimized away but with this pass it is possible to remove them early
> > > enough so that they don't get streamed when running LTO (and also potentially
> > > inlined at multiple places). It is also able to remove some redundant PHIs
> > > that otherwise would still be present during RTL expansion.
> > > 
> > > Jakub Jelínek was concerned about debug info coverage so I compiled cc1plus
> > > with and without this patch. These are the sizes of .debug_info and
> > > .debug_loclists
> > > 
> > > .debug_info without patch 181694311
> > > .debug_info    with patch 181692320
> > > -0.0011% change
> > > 
> > > .debug_loclists without patch 47934753
> > > .debug_loclists    with patch 47934966
> > > +0.0004% change
> > > 
> > > I wanted to use dwlocstat to compare debug coverages but didn't manage to get
> > > the program working on my machine sadly. Hope this suffices. Seems to me that
> > > my patch doesn't have a significant impact on debug info.
> > > 
> > > Bootstrapped and tested* on x86_64-pc-linux-gnu.
> > > 
> > > * One testcase (pr79691.c) did regress. However that is because the test is
> > > dependent on a certain variable not being copy propagated. I will go into more
> > > detail about this in a reply to this mail.
> > > 
> > > Ok to commit?
> > 
> > This is a second version of the patch.  In this version, I modified the
> > pr79691.c testcase so that it works as intended with other changes from the
> > patch.
> > 
> > The pr79691.c testcase checks that we get constants from snprintf calls and
> > that they simplify into a single constant.  The testcase doesn't account for
> > the fact that this constant may be further copy propagated which is exactly
> > what happens with this patch applied.
> > 
> > Bootstrapped and tested on x86_64-pc-linux-gnu.
> > 
> > Ok to commit?
> 
> This is the third version of the patch. In this version, I addressed most of
> Richard's remarks about the second version. Here is a summary of changes I made:
> 
> - Rename the pass from tree-ssa-sccopy.cc to gimple-ssa-sccopy.cc
> - Use simple_dce_from_worklist to remove propagated statements
> - Use existing replace_uses API instead of reinventing it
>   - This allowed me to get rid of some now redundant cleanup code
> - Encapsulate the SCC finding into a class
> - Rework stmt_may_generate_copy to get rid of redundant checks
> - Add check that PHI doesn't contain two non-SSA-name values to
>   stmt_may_generate_copy
> - Regarding alignment and value ranges in stmt_may_generate_copy: For now use
>   the conservative check that Richard suggested
> - Index array of vertices that SCC discovery uses by SSA name version numbers
>   instead of numbering statements myself.
> 
> 
> I didn't make any changes based on these remarks:
> 
> 1 It might be nice to optimize SCCs of size 1 somehow, not sure how
>   many times these appear - possibly prevent them from even entering
>   the SCC discovery?
> 
> It would be nice. But the only way to do this that I see right now is to first
> propagate SCCs of size 1 and then the rest. This would mean adding a new copy
> propagation procedure. It wouldn't be a trivial procedure. Efficiency of the
> pass relies on having SCCs topologically sorted so this procedure would have to
> implement some topological sort algorithm.
> 
> This could be done. It could save allocating some vec<>s (right now, SCCs of
> size 1 are represented by a vec<> with a single element). But is it worth it to
> optimize the pass this way right now? If possible, I'd like to see that the
> pass works and sort out any problems people encounter with it before I start
> optimizing it.
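The ordering property relied on here comes for free from Tarjan's algorithm: it emits SCCs in reverse topological order of the condensation graph, which is why a separate size-1 propagation step would need its own topological sort. A rough, self-contained Python sketch of that property (illustrative only, not the GCC implementation):

```python
def tarjan_sccs(graph):
    """Return the SCCs of a directed graph (dict: node -> successors)
    in the reverse topological order that Tarjan's algorithm naturally
    emits them in."""
    index = {}        # discovery index of each visited node
    lowlink = {}
    on_stack = set()
    stack = []
    sccs = []
    counter = [0]

    def strongconnect(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                strongconnect(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:  # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.remove(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs
```

Run on a PHI cycle like the one in the patch description (_8, _9, _10 feeding each other, _1 entering from outside), the SCC containing _1 is emitted before the SCC containing _8, _9 and _10, so each SCC's outside operands are already resolved by the time it is processed.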
> 
> 2 Instead of collecting all stmts that may generate a copy at the beginning of
>   the pass into a vec<>, let the SCC discovery check that statements may
>   generate a copy on the fly.
> 
> This would be a big change to the pass; it would require a lot of reworking.
> I'm also not sure if this would help reduce the number of allocated vec<>s that
> much because I'll still want to represent SCCs by vec<>s.
> 
> Again - it's possible I'll want to rework the pass in this way in the future but
> I'd like to leave it as it is for now.
> 
> 3 Add a comment saying that the pass is doing optimistic copy propagation
> 
> I don't think the pass works in an optimistic way. It doesn't assume that all
> variables are copies of each other at any point. It instead identifies copy
> statements (or PHI SCCs that act as copy statements) and then for each of these
> it propagates: By that I mean if a copy statement says that _3 is a copy of _2,
> then it replaces all uses of _3 by _2.
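That replacement step can be illustrated roughly like this (hypothetical Python, not the GCC sources): a copy table maps each eliminated name to the value it copies, and every use is rewritten through it.

```python
def propagate_copies(copy_of, uses):
    """Replace eliminated SSA names by the values they copy.

    copy_of: eliminated name -> the value it is a copy of (e.g. _3 -> _2).
    uses: statement label -> list of operand names.
    """
    def resolve(name):
        # Follow chains like _3 -> _2 -> _1 to the surviving value.
        while name in copy_of:
            name = copy_of[name]
        return name

    return {stmt: [resolve(op) for op in ops] for stmt, ops in uses.items()}
```

In the real pass the chain-following loop is unnecessary: SCCs are propagated in topological order, so an operand is already final when its SCC is processed; the loop here merely stands in for that ordering.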
> 
> But it's possible that I still misinterpret what 'optimistic' means. If
> mentioning that it works in an optimistic way would help readability of the
> pass, I'm happy to add that comment.
>
> Richard, is the patch ok as it is now? Or do you think that making changes 1, 2
> or 3 is necessary? Or are there any other issues which should be addressed?
>
> Filip

This is the fourth version of this patch. Changes in this version:
- Disabled the pass for -Og
  - This meant that I could revert my changes to gcc.dg/tree-ssa/pr79691.c
- Fixed formatting of the patch (like braces around a single-statement if body)
- compute_sccs (auto_vec<>...) -> compute_sccs (vec<> ...)
- auto_vec<gimple *> worklist; worklist = ...; ->
  auto_vec<gimple *> worklist = ...;

I've also checked that there are no memory leak issues. Valgrind doesn't detect
any additional leaks when running with sccopy vs running without sccopy:

[fkastl@scala playground]$ valgrind $HOME/gcc/inst/bin/gcc /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2
==27363== Memcheck, a memory error detector
==27363== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==27363== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==27363== Command: /home/fkastl/gcc/inst/bin/gcc /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2
==27363== 
==27363== 
==27363== HEAP SUMMARY:
==27363==     in use at exit: 174,234 bytes in 92 blocks
==27363==   total heap usage: 408 allocs, 316 frees, 228,050 bytes allocated
==27363== 
==27363== LEAK SUMMARY:
==27363==    definitely lost: 5,935 bytes in 23 blocks
==27363==    indirectly lost: 82 bytes in 5 blocks
==27363==      possibly lost: 0 bytes in 0 blocks
==27363==    still reachable: 168,217 bytes in 64 blocks
==27363==                       of which reachable via heuristic:
==27363==                         newarray           : 1,544 bytes in 1 blocks
==27363==         suppressed: 0 bytes in 0 blocks
==27363== Rerun with --leak-check=full to see details of leaked memory
==27363== 
==27363== For lists of detected and suppressed errors, rerun with: -s
==27363== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

[fkastl@scala playground]$ valgrind $HOME/gcc/inst/bin/gcc /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2 -fdisable-tree-sccopy1 -fdisable-tree-sccopy2
==27372== Memcheck, a memory error detector
==27372== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==27372== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==27372== Command: /home/fkastl/gcc/inst/bin/gcc /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2 -fdisable-tree-sccopy1 -fdisable-tree-sccopy2
==27372== 
cc1: note: disable pass tree-sccopy1 for functions in the range of [0, 4294967295]
cc1: note: disable pass tree-sccopy2 for functions in the range of [0, 4294967295]
==27372== 
==27372== HEAP SUMMARY:
==27372==     in use at exit: 174,282 bytes in 92 blocks
==27372==   total heap usage: 408 allocs, 316 frees, 228,674 bytes allocated
==27372== 
==27372== LEAK SUMMARY:
==27372==    definitely lost: 5,935 bytes in 23 blocks
==27372==    indirectly lost: 82 bytes in 5 blocks
==27372==      possibly lost: 0 bytes in 0 blocks
==27372==    still reachable: 168,265 bytes in 64 blocks
==27372==                       of which reachable via heuristic:
==27372==                         newarray           : 1,544 bytes in 1 blocks
==27372==         suppressed: 0 bytes in 0 blocks
==27372== Rerun with --leak-check=full to see details of leaked memory
==27372== 
==27372== For lists of detected and suppressed errors, rerun with: -s
==27372== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Richard, I wrote down the bitmap vs hash_set optimizations you suggested so
that I can work on the pass more later and optimize it.

I haven't bootstrapped this version yet, only regtested it. I will bootstrap and
regtest on the most current commit. Once that is successfully done, is the pass
OK to be pushed to main?

Filip

-- >8 --

This patch adds the strongly-connected copy propagation (SCCOPY) pass.
It is a lightweight GIMPLE copy propagation pass that also removes some
redundant PHI statements. It handles degenerate PHIs, e.g.:

_5 = PHI <_1>;
_6 = PHI <_6, _6, _1, _1>;
_7 = PHI <16, _7>;
// Replaces occurrences of _5 and _6 by _1 and _7 by 16

It also handles more complicated situations, e.g.:

_8 = PHI <_9, _10>;
_9 = PHI <_8, _10>;
_10 = PHI <_8, _9, _1>;
// Replaces occurrences of _8, _9 and _10 by _1
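The test that makes such a set of PHIs collapsible can be sketched as follows (an illustrative Python fragment, not code from this patch): collect every operand that originates outside the SCC; if exactly one such value remains, the SCC is redundant and all of its results can be replaced by that value.

```python
def scc_outer_value(scc_args):
    """scc_args: PHI result -> list of its arguments (names or constants).
    Return the single outside value if the SCC is redundant, else None."""
    members = set(scc_args)
    outer = set()
    for args in scc_args.values():
        for op in args:
            if op not in members:  # operand defined outside the SCC
                outer.add(op)
    return outer.pop() if len(outer) == 1 else None
```

On the example above this returns _1; with two or more distinct outside values it returns None, and the pass instead recurses on the inner statements of the SCC.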

gcc/ChangeLog:

	* Makefile.in: Added sccopy pass.
	* passes.def: Added sccopy pass before LTO streaming and before
	  RTL expansion.
	* tree-pass.h (make_pass_sccopy): Added sccopy pass.
	* gimple-ssa-sccopy.cc: New file.

gcc/testsuite/ChangeLog:

	* gcc.dg/sccopy-1.c: New test.

Signed-off-by: Filip Kastl <fkastl@suse.cz>
---
 gcc/Makefile.in                 |   1 +
 gcc/gimple-ssa-sccopy.cc        | 680 ++++++++++++++++++++++++++++++++
 gcc/passes.def                  |   2 +
 gcc/testsuite/gcc.dg/sccopy-1.c |  78 ++++
 gcc/tree-pass.h                 |   1 +
 5 files changed, 762 insertions(+)
 create mode 100644 gcc/gimple-ssa-sccopy.cc
 create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 753f2f36618..2e6f2fef1ba 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1497,6 +1497,7 @@ OBJS = \
 	gimple-ssa-backprop.o \
 	gimple-ssa-isolate-paths.o \
 	gimple-ssa-nonnull-compare.o \
+	gimple-ssa-sccopy.o \
 	gimple-ssa-split-paths.o \
 	gimple-ssa-store-merging.o \
 	gimple-ssa-strength-reduction.o \
diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
new file mode 100644
index 00000000000..ac5ec32eb32
--- /dev/null
+++ b/gcc/gimple-ssa-sccopy.cc
@@ -0,0 +1,680 @@
+/* Strongly-connected copy propagation pass for the GNU compiler.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by Filip Kastl <fkastl@suse.cz>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-iterator.h"
+#include "vec.h"
+#include "hash-set.h"
+#include <algorithm>
+#include "ssa-iterators.h"
+#include "gimple-fold.h"
+#include "gimplify.h"
+#include "tree-cfg.h"
+#include "tree-eh.h"
+#include "builtins.h"
+#include "tree-ssa-dce.h"
+#include "fold-const.h"
+
+/* Strongly connected copy propagation pass.
+
+   This is a lightweight copy propagation pass that is also able to eliminate
+   redundant PHI statements.  The pass considers the following types of copy
+   statements:
+
+   1 An assignment statement with a single argument.
+
+   _3 = _2;
+   _4 = 5;
+
+   2 A degenerate PHI statement.  A degenerate PHI is a PHI that only refers to
+     itself or one other value.
+
+   _5 = PHI <_1>;
+   _6 = PHI <_6, _6, _1, _1>;
+   _7 = PHI <16, _7>;
+
+   3 A set of PHI statements that only refer to each other or to one other
+     value.
+
+   _8 = PHI <_9, _10>;
+   _9 = PHI <_8, _10>;
+   _10 = PHI <_8, _9, _1>;
+
+   All of these statements produce copies and can be eliminated from the
+   program.  For a copy statement we identify the value it creates a copy of
+   and replace references to the statement with the value -- we propagate the
+   copy.
+
+   _3 = _2; // Replace all occurrences of _3 by _2
+
+   _8 = PHI <_9, _10>;
+   _9 = PHI <_8, _10>;
+   _10 = PHI <_8, _9, _1>; // Replace all occurrences of _8, _9 and _10 by _1
+
+   To find all three types of copy statements we use an algorithm based on
+   strongly-connected components (SCCs) in the dataflow graph.  The algorithm
+   was introduced in a 2013 article [1].  We describe the algorithm below.
+
+   To identify SCCs we implement Tarjan's SCC algorithm.  For the
+   SCC computation we wrap potential copy statements in the 'vertex' struct.
+   To each of these statements we also assign a vertex number ('vxnum'). Since
+   the main algorithm has to be able to compute SCCs of subgraphs of the whole
+   dataflow graph we use GIMPLE stmt flags to prevent Tarjan's algorithm from
+   leaving the subgraph.
+
+   References:
+
+     [1] Simple and Efficient Construction of Static Single Assignment Form,
+     Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
+     Section 3.2.  */
+
+/* Bitmap tracking statements which were propagated to be removed at the end of
+   the pass.  */
+
+static bitmap dead_stmts;
+
+/* State of vertex during SCC discovery.
+
+   unvisited  Vertex hasn't yet been popped from worklist.
+   vopen      DFS has visited vertex for the first time.  Vertex has been put
+	      on Tarjan stack.
+   closed     DFS has backtracked through vertex.  At this point, vertex
+	      doesn't have any unvisited neighbors.
+   in_scc     Vertex has been popped from Tarjan stack.  */
+
+enum vstate
+{
+  unvisited,
+  vopen,
+  closed,
+  in_scc
+};
+
+/* Information about a vertex.  Used by SCC discovery.  */
+
+struct vertex
+{
+  bool active; /* scc_discovery::compute_sccs () only considers a subgraph of
+		  the whole dataflow graph.  It uses this flag so that it knows
+		  which vertices are part of this subgraph.  */
+  vstate state;
+  unsigned index;
+  unsigned lowlink;
+};
+
+/* SCC discovery.
+
+   Used to find SCCs in a dataflow graph.  Implements Tarjan's SCC
+   algorithm.  */
+
+class scc_discovery
+{
+public:
+  scc_discovery ();
+  ~scc_discovery ();
+  auto_vec<vec<gimple *>> compute_sccs (vec<gimple *> &stmts);
+
+private:
+  unsigned curr_generation = 0;
+  vertex* vertices; /* Indexed by SSA_NAME_VERSION.  */
+  auto_vec<unsigned> worklist; /* DFS stack.  */
+  auto_vec<unsigned> stack; /* Tarjan stack.  */
+
+  void visit_neighbor (tree neigh_tree, unsigned parent_vxnum);
+};
+
+scc_discovery::scc_discovery ()
+{
+  /* Create vertex struct for each SSA name.  */
+  vertices = XNEWVEC (struct vertex, num_ssa_names);
+  unsigned i = 0;
+  for (i = 0; i < num_ssa_names; i++)
+    vertices[i].active = false;
+}
+
+scc_discovery::~scc_discovery ()
+{
+  XDELETEVEC (vertices);
+}
+
+/* Part of 'scc_discovery::compute_sccs ()'.  */
+
+void
+scc_discovery::visit_neighbor (tree neigh_tree, unsigned parent_version)
+{
+  if (TREE_CODE (neigh_tree) != SSA_NAME)
+    return; /* Skip any neighbor that isn't an SSA name.  */
+  unsigned neigh_version = SSA_NAME_VERSION (neigh_tree);
+
+  /* Skip neighbors outside the subgraph that Tarjan currently works
+     with.  */
+  if (!vertices[neigh_version].active)
+    return;
+
+  vstate neigh_state = vertices[neigh_version].state;
+  vstate parent_state = vertices[parent_version].state;
+  if (parent_state == vopen) /* We're currently opening parent.  */
+    {
+      /* Put unvisited neighbors on worklist.  Update lowlink of parent
+	 vertex according to indices of neighbors present on stack.  */
+      switch (neigh_state)
+	{
+	case unvisited:
+	  worklist.safe_push (neigh_version);
+	  break;
+	case vopen:
+	case closed:
+	  vertices[parent_version].lowlink
+	    = std::min (vertices[parent_version].lowlink,
+			vertices[neigh_version].index);
+	  break;
+	case in_scc:
+	  /* Ignore these edges.  */
+	  break;
+	}
+    }
+  else if (parent_state == closed) /* We're currently closing parent.  */
+    {
+      /* Update lowlink of parent vertex according to lowlinks of
+	 children of parent (in terms of DFS tree).  */
+      if (neigh_state == closed)
+	{
+	  vertices[parent_version].lowlink
+	    = std::min (vertices[parent_version].lowlink,
+			vertices[neigh_version].lowlink);
+	}
+    }
+}
+
+/* Compute SCCs in dataflow graph on given statements 'stmts'.  Ignore
+   statements outside 'stmts'.  Return the SCCs in a reverse topological
+   order.
+
+   stmt_may_generate_copy () must be true for all statements from 'stmts'!  */
+
+auto_vec<vec<gimple *>>
+scc_discovery::compute_sccs (vec<gimple *> &stmts)
+{
+  auto_vec<vec<gimple *>> sccs;
+
+  for (gimple *stmt : stmts)
+    {
+      unsigned i;
+      switch (gimple_code (stmt))
+	{
+	  case GIMPLE_ASSIGN:
+	    i = SSA_NAME_VERSION (gimple_assign_lhs (stmt));
+	    break;
+	  case GIMPLE_PHI:
+	    i = SSA_NAME_VERSION (gimple_phi_result (stmt));
+	    break;
+	  default:
+	    gcc_unreachable ();
+	}
+
+      vertices[i].index = 0;
+      vertices[i].lowlink = 0;
+      vertices[i].state = unvisited;
+      vertices[i].active = true; /* Mark the subgraph we'll be working on so
+				    that we don't leave it.  */
+
+      worklist.safe_push (i);
+    }
+
+  /* Worklist loop.  */
+  unsigned curr_index = 0;
+  while (!worklist.is_empty ())
+    {
+      unsigned i = worklist.pop ();
+      gimple *stmt = SSA_NAME_DEF_STMT (ssa_name (i));
+      vstate state = vertices[i].state;
+
+      if (state == unvisited)
+	{
+	  vertices[i].state = vopen;
+
+	  /* Assign index to this vertex.  */
+	  vertices[i].index = curr_index;
+	  vertices[i].lowlink = curr_index;
+	  curr_index++;
+
+	  /* Put vertex on stack and also on worklist to be closed later.  */
+	  stack.safe_push (i);
+	  worklist.safe_push (i);
+	}
+      else if (state == vopen)
+	vertices[i].state = closed;
+
+      /* Visit neighbors of this vertex.  */
+      tree op;
+      gphi *phi;
+      switch (gimple_code (stmt))
+	{
+	  case GIMPLE_PHI:
+	    phi = as_a <gphi *> (stmt);
+	    unsigned j;
+	    for (j = 0; j < gimple_phi_num_args (phi); j++)
+	      {
+		op = gimple_phi_arg_def (phi, j);
+		visit_neighbor (op, i);
+	      }
+	    break;
+	  case GIMPLE_ASSIGN:
+	    op = gimple_assign_rhs1 (stmt);
+	    visit_neighbor (op, i);
+	    break;
+	  default:
+	    gcc_unreachable ();
+	}
+
+      /* If we've just closed a root vertex of an SCC, pop the SCC.  */
+      if (state == vopen && vertices[i].lowlink == vertices[i].index)
+	{
+	  vec<gimple *> scc = vNULL;
+
+	  unsigned j;
+	  do
+	    {
+	      j = stack.pop ();
+	      scc.safe_push (SSA_NAME_DEF_STMT (ssa_name (j)));
+	      vertices[j].state = in_scc;
+	    }
+	  while (j != i);
+
+	  sccs.safe_push (scc);
+	}
+    }
+
+  if (!stack.is_empty ())
+    gcc_unreachable ();
+
+  /* Clear 'active' flags.  */
+  for (gimple *stmt : stmts)
+    {
+      unsigned i;
+      switch (gimple_code (stmt))
+	{
+	  case GIMPLE_ASSIGN:
+	    i = SSA_NAME_VERSION (gimple_assign_lhs (stmt));
+	    break;
+	  case GIMPLE_PHI:
+	    i = SSA_NAME_VERSION (gimple_phi_result (stmt));
+	    break;
+	  default:
+	    gcc_unreachable ();
+	}
+
+      vertices[i].active = false;
+    }
+
+  return sccs;
+}
+
+/* Could this statement potentially be a copy statement?
+
+   This pass only considers statements for which this function returns 'true'.
+   Those are basically PHI functions and assignment statements similar to
+
+   _2 = _1;
+   or
+   _2 = 5;  */
+
+static bool
+stmt_may_generate_copy (gimple *stmt)
+{
+  /* A PHI may generate a copy.  */
+  if (gimple_code (stmt) == GIMPLE_PHI)
+    {
+      gphi *phi = as_a <gphi *> (stmt);
+
+      /* No OCCURS_IN_ABNORMAL_PHI SSA names in lhs nor rhs.  */
+      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (phi)))
+	return false;
+
+      unsigned i;
+      for (i = 0; i < gimple_phi_num_args (phi); i++)
+	{
+	  tree op = gimple_phi_arg_def (phi, i);
+	  if (TREE_CODE (op) == SSA_NAME
+	      && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
+	    return false;
+	}
+
+      /* If PHI has more than one unique non-SSA argument, it won't generate a
+	 copy.  */
+      tree const_op = NULL_TREE;
+      for (i = 0; i < gimple_phi_num_args (phi); i++)
+	{
+	  tree op = gimple_phi_arg_def (phi, i);
+	  if (TREE_CODE (op) != SSA_NAME)
+	    {
+	      if (const_op && !operand_equal_p (op, const_op))
+		return false;
+	      const_op = op;
+	    }
+	}
+
+      return true;
+    }
+
+  /* Or a statement of type _2 = _1; OR _2 = 5; may generate a copy.  */
+
+  if (!gimple_assign_single_p (stmt))
+    return false;
+
+  tree lhs = gimple_assign_lhs (stmt);
+  tree rhs = gimple_assign_rhs1 (stmt);
+
+  if (TREE_CODE (lhs) != SSA_NAME)
+    return false;
+
+  /* lhs shouldn't flow through any abnormal edges.  */
+  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+    return false;
+
+  if (is_gimple_min_invariant (rhs))
+    return true;  /* A statement of type _2 = 5;.  */
+
+  if (TREE_CODE (rhs) != SSA_NAME)
+    return false;
+
+  /* rhs shouldn't flow through any abnormal edges.  */
+  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs))
+    return false;
+
+  /* It is possible that lhs has more alignment or value range information.  By
+     propagating we would lose this information.  So in the case that alignment
+     or value range information differs, we are conservative and do not
+     propagate.
+
+     FIXME: Propagate alignment and value range info the same way copy-prop
+     does.  */
+  if (POINTER_TYPE_P (TREE_TYPE (lhs))
+      && POINTER_TYPE_P (TREE_TYPE (rhs))
+      && SSA_NAME_PTR_INFO (lhs) != SSA_NAME_PTR_INFO (rhs))
+    return false;
+  if (!POINTER_TYPE_P (TREE_TYPE (lhs))
+      && !POINTER_TYPE_P (TREE_TYPE (rhs))
+      && SSA_NAME_RANGE_INFO (lhs) != SSA_NAME_RANGE_INFO (rhs))
+    return false;
+
+  return true;  /* A statement of type _2 = _1;.  */
+}
+
+/* Return all statements in cfun that could generate copies.  All statements
+   for which stmt_may_generate_copy returns 'true'.  */
+
+static auto_vec<gimple *>
+get_all_stmt_may_generate_copy (void)
+{
+  auto_vec<gimple *> result;
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      gimple_stmt_iterator gsi;
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple *s = gsi_stmt (gsi);
+	  if (stmt_may_generate_copy (s))
+	    result.safe_push (s);
+	}
+
+      gphi_iterator pi;
+      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
+	{
+	  gimple *s = pi.phi ();
+	  if (stmt_may_generate_copy (s))
+	    result.safe_push (s);
+	}
+    }
+
+  return result;
+}
+
+/* For each statement from given SCC, replace its usages by value
+   VAL.  */
+
+static void
+replace_scc_by_value (vec<gimple *> scc, tree val)
+{
+  for (gimple *stmt : scc)
+    {
+      tree name = gimple_get_lhs (stmt);
+      replace_uses_by (name, val);
+      bitmap_set_bit (dead_stmts, SSA_NAME_VERSION (name));
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "Replacing SCC of size %u\n", scc.length ());
+}
+
+/* Part of 'sccopy_propagate ()'.  */
+
+static void
+sccopy_visit_op (tree op, hash_set<tree> &outer_ops,
+		 hash_set<gimple *> &scc_set, bool &is_inner,
+		 tree &last_outer_op)
+{
+  bool op_in_scc = false;
+
+  if (TREE_CODE (op) == SSA_NAME)
+    {
+      gimple *op_stmt = SSA_NAME_DEF_STMT (op);
+      if (scc_set.contains (op_stmt))
+	op_in_scc = true;
+    }
+
+  if (!op_in_scc)
+    {
+      outer_ops.add (op);
+      last_outer_op = op;
+      is_inner = false;
+    }
+}
+
+/* Main function of this pass.  Find and propagate all three types of copy
+   statements (see pass description above).
+
+   This is an implementation of an algorithm from the paper Simple and
+   Efficient Construction of Static Single Assignment Form[1].  It is based
+   on strongly-connected components (SCCs) in the dataflow graph.  The original
+   algorithm only considers PHI statements.  We extend it to also consider
+   assignment statements of type _2 = _1;.
+
+   The algorithm is based on this definition of a set of redundant PHIs[1]:
+
+     A non-empty set P of PHI functions is redundant iff the PHI functions just
+     reference each other or one other value
+
+   It uses this lemma[1]:
+
+     Let P be a redundant set of PHI functions.  Then there is a
+     strongly-connected component S subset of P that is also redundant.
+
+   The algorithm works in this way:
+
+     1 Find SCCs
+     2 For each SCC S in topological order:
+     3   Construct set 'inner' of statements that only have other statements
+	 from S on their right hand side
+     4   Construct set 'outer' of values that originate outside S and appear on
+	 right hand side of some statement from S
+     5   If |outer| = 1, outer only contains a value v.  Statements in S only
+	 refer to each other or to v -- they are redundant.  Propagate v.
+	 Else, recurse on statements in inner.
+
+   The implementation is non-recursive.
+
+   References:
+
+     [1] Simple and Efficient Construction of Static Single Assignment Form,
+     Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
+     Section 3.2.  */
+
+static void
+sccopy_propagate ()
+{
+  auto_vec<gimple *> useful_stmts = get_all_stmt_may_generate_copy ();
+  scc_discovery discovery;
+
+  auto_vec<vec<gimple *>> worklist = discovery.compute_sccs (useful_stmts);
+
+  while (!worklist.is_empty ())
+    {
+      vec<gimple *> scc = worklist.pop ();
+
+      auto_vec<gimple *> inner;
+      hash_set<tree> outer_ops;
+      tree last_outer_op = NULL_TREE;
+
+      /* Prepare hash set of PHIs in scc to query later.  */
+      hash_set<gimple *> scc_set;
+      for (gimple *stmt : scc)
+	scc_set.add (stmt);
+
+      for (gimple *stmt : scc)
+	{
+	  bool is_inner = true;
+
+	  gphi *phi;
+	  tree op;
+
+	  switch (gimple_code (stmt))
+	    {
+	      case GIMPLE_PHI:
+		phi = as_a <gphi *> (stmt);
+		unsigned j;
+		for (j = 0; j < gimple_phi_num_args (phi); j++)
+		  {
+		    op = gimple_phi_arg_def (phi, j);
+		    sccopy_visit_op (op, outer_ops, scc_set, is_inner,
+				   last_outer_op);
+		  }
+		break;
+	      case GIMPLE_ASSIGN:
+		op = gimple_assign_rhs1 (stmt);
+		sccopy_visit_op (op, outer_ops, scc_set, is_inner,
+			       last_outer_op);
+		break;
+	      default:
+		gcc_unreachable ();
+	    }
+
+	  if (is_inner)
+	    inner.safe_push (stmt);
+	}
+
+      if (outer_ops.elements () == 1)
+	{
+	  /* The only operand in outer_ops.  */
+	  tree outer_op = last_outer_op;
+	  replace_scc_by_value (scc, outer_op);
+	}
+      else if (outer_ops.elements () > 1)
+	{
+	  /* Add inner sccs to worklist.  */
+	  auto_vec<vec<gimple *>> inner_sccs
+	    = discovery.compute_sccs (inner);
+	  for (vec<gimple *> inner_scc : inner_sccs)
+	    worklist.safe_push (inner_scc);
+	}
+      else
+	gcc_unreachable ();
+
+      scc.release ();
+    }
+}
+
+/* Called when pass execution starts.  */
+
+static void
+init_sccopy (void)
+{
+  /* For propagated statements.  */
+  dead_stmts = BITMAP_ALLOC (NULL);
+}
+
+/* Called before pass execution ends.  */
+
+static void
+finalize_sccopy (void)
+{
+  /* Remove all propagated statements.  */
+  simple_dce_from_worklist (dead_stmts);
+  BITMAP_FREE (dead_stmts);
+
+  /* Propagating a constant may create dead eh edges.  */
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    gimple_purge_dead_eh_edges (bb);
+}
+
+namespace {
+
+const pass_data pass_data_sccopy =
+{
+  GIMPLE_PASS, /* type */
+  "sccopy", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
+};
+
+class pass_sccopy : public gimple_opt_pass
+{
+public:
+  pass_sccopy (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_sccopy, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return true; }
+  virtual unsigned int execute (function *);
+  opt_pass * clone () final override { return new pass_sccopy (m_ctxt); }
+}; // class pass_sccopy
+
+unsigned
+pass_sccopy::execute (function *)
+{
+  init_sccopy ();
+  sccopy_propagate ();
+  finalize_sccopy ();
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_sccopy (gcc::context *ctxt)
+{
+  return new pass_sccopy (ctxt);
+}
diff --git a/gcc/passes.def b/gcc/passes.def
index 1e1950bdb39..048ad3ee7a5 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -100,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_if_to_switch);
 	  NEXT_PASS (pass_convert_switch);
 	  NEXT_PASS (pass_cleanup_eh);
+	  NEXT_PASS (pass_sccopy);
 	  NEXT_PASS (pass_profile);
 	  NEXT_PASS (pass_local_pure_const);
 	  NEXT_PASS (pass_modref);
@@ -368,6 +369,7 @@ along with GCC; see the file COPYING3.  If not see
 	 However, this also causes us to misdiagnose cases that should be
 	 real warnings (e.g., testsuite/gcc.dg/pr18501.c).  */
       NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
+      NEXT_PASS (pass_sccopy);
       NEXT_PASS (pass_tail_calls);
       /* Split critical edges before late uninit warning to reduce the
          number of false positives from it.  */
diff --git a/gcc/testsuite/gcc.dg/sccopy-1.c b/gcc/testsuite/gcc.dg/sccopy-1.c
new file mode 100644
index 00000000000..1e61a6b320e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/sccopy-1.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-options "-fgimple -fdump-tree-sccopy -O2" } */
+/* { dg-final { scan-tree-dump "Replacing SCC of size 2" "sccopy1" } } */
+
+int __GIMPLE (ssa, startwith ("sccopy"))
+main ()
+{
+  int a;
+  int y;
+  int x;
+  int _1;
+  int _2;
+  int _13;
+
+  __BB(2):
+  if (x_7(D) == 5)
+    goto __BB3;
+  else
+    goto __BB4;
+
+  __BB(3):
+  a_10 = x_7(D);
+  goto __BB5;
+
+  __BB(4):
+  a_9 = y_8(D);
+  goto __BB5;
+
+  __BB(5):
+  a_3 = __PHI (__BB3: a_10, __BB4: a_9);
+  if (x_7(D) == y_8(D))
+    goto __BB6;
+  else
+    goto __BB11;
+
+  __BB(6):
+  a_11 = a_3 + 1;
+  goto __BB7;
+
+  __BB(7):
+  a_4 = __PHI (__BB6: a_11, __BB11: a_6);
+label1:
+  if (x_7(D) != y_8(D))
+    goto __BB8;
+  else
+    goto __BB10;
+
+  __BB(8):
+  goto __BB9;
+
+  __BB(9):
+  a_12 = __PHI (__BB8: a_4, __BB10: a_5);
+  goto __BB10;
+
+  __BB(10,loop_header(1)):
+  a_5 = __PHI (__BB7: a_4, __BB9: a_12);
+label2:
+  _1 = y_8(D) * 2;
+  if (x_7(D) == _1)
+    goto __BB9;
+  else
+    goto __BB11;
+
+  __BB(11):
+  a_6 = __PHI (__BB5: a_3, __BB10: a_5);
+  _2 = x_7(D) * 3;
+  if (y_8(D) == _2)
+    goto __BB7;
+  else
+    goto __BB12;
+
+  __BB(12):
+  _13 = 0;
+  return _13;
+
+}
+
+
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 09e6ada5b2f..94a48461590 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -399,6 +399,7 @@ extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
-- 
2.43.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4] A new copy propagation and PHI elimination pass
  2023-12-13 16:12           ` [PATCH v4] " Filip Kastl
@ 2023-12-13 16:15             ` Richard Biener
  2023-12-14 10:26               ` Filip Kastl
  2023-12-14 13:22             ` In 'gcc/gimple-ssa-sccopy.cc', '#define INCLUDE_ALGORITHM' instead of '#include <algorithm>' (was: [PATCH v4] A new copy propagation and PHI elimination pass) Thomas Schwinge
  1 sibling, 1 reply; 10+ messages in thread
From: Richard Biener @ 2023-12-13 16:15 UTC (permalink / raw)
  To: Filip Kastl; +Cc: gcc-patches, hubicka



> On 13.12.2023 at 17:12, Filip Kastl <fkastl@suse.cz> wrote:
> 
> 
>> 
>>>> Hi,
>>>> 
>>>> this is a patch that I submitted two months ago as an RFC. I added some polish
>>>> since.
>>>> 
>>>> It is a new lightweight pass that removes redundant PHI functions and as a
>>>> bonus does basic copy propagation. With Jan Hubička we measured that it is able
>>>> to remove usually more than 5% of all PHI functions when run among early passes
>>>> (sometimes even 13% or more). Those are mostly PHI functions that would be
>>>> later optimized away but with this pass it is possible to remove them early
>>>> enough so that they don't get streamed when running LTO (and also potentially
>>>> inlined at multiple places). It is also able to remove some redundant PHIs
>>>> that otherwise would still be present during RTL expansion.
>>>> 
>>>> Jakub Jelínek was concerned about debug info coverage so I compiled cc1plus
>>>> with and without this patch. These are the sizes of .debug_info and
>>>> .debug_loclists
>>>> 
>>>> .debug_info without patch 181694311
>>>> .debug_info    with patch 181692320
>>>> -0.0011% change
>>>> 
>>>> .debug_loclists without patch 47934753
>>>> .debug_loclists    with patch 47934966
>>>> +0.0004% change
>>>> 
>>>> I wanted to use dwlocstat to compare debug coverages but didn't manage to get
>>>> the program working on my machine sadly. Hope this suffices. Seems to me that
>>>> my patch doesn't have a significant impact on debug info.
>>>> 
>>>> Bootstrapped and tested* on x86_64-pc-linux-gnu.
>>>> 
>>>> * One testcase (pr79691.c) did regress. However, that is because the test is
>>>> dependent on a certain variable not being copy propagated. I will go into more
>>>> detail about this in a reply to this mail.
>>>> 
>>>> Ok to commit?
>>> 
>>> This is a second version of the patch.  In this version, I modified the
>>> pr79691.c testcase so that it works as intended with other changes from the
>>> patch.
>>> 
>>> The pr79691.c testcase checks that we get constants from snprintf calls and
>>> that they simplify into a single constant.  The testcase doesn't account for
>>> the fact that this constant may be further copy propagated which is exactly
>>> what happens with this patch applied.
>>> 
>>> Bootstrapped and tested on x86_64-pc-linux-gnu.
>>> 
>>> Ok to commit?
>> 
>> This is the third version of the patch. In this version, I addressed most of
>> Richard's remarks about the second version. Here is a summary of changes I made:
>> 
>> - Rename the pass from tree-ssa-sccopy.cc to gimple-ssa-sccopy.cc
>> - Use simple_dce_from_worklist to remove propagated statements
>> - Use existing replace_uses API instead of reinventing it
>>  - This allowed me to get rid of some now redundant cleanup code
>> - Encapsulate the SCC finding into a class
>> - Rework stmt_may_generate_copy to get rid of redundant checks
>> - Add check that PHI doesn't contain two non-SSA-name values to
>>  stmt_may_generate_copy
>> - Regarding alignment and value ranges in stmt_may_generate_copy: For now use
>>  the conservative check that Richard suggested
>> - Index array of vertices that SCC discovery uses by SSA name version numbers
>>  instead of numbering statements myself.
>> 
>> 
>> I didn't make any changes based on these remarks:
>> 
>> 1 It might be nice to optimize SCCs of size 1 somehow, not sure how
>>  many times these appear - possibly prevent them from even entering
>>  the SCC discovery?
>> 
>> It would be nice. But the only way to do this that I see right now is to first
>> propagate SCCs of size 1 and then the rest. This would mean adding a new copy
>> propagation procedure. It wouldn't be a trivial procedure. Efficiency of the
>> pass relies on having SCCs topologically sorted so this procedure would have to
>> implement some topological sort algorithm.
>> 
>> This could be done. It could save allocating some vec<>s (right now, SCCs of
>> size 1 are represented by a vec<> with a single element). But is it worth it to
>> optimize the pass this way right now? If possible, I'd like to see that the
>> pass works and sort out any problems people encounter with it before I start
>> optimizing it.
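[Editorial illustration, not part of the original mail.] The reliance on topological order mentioned above can be illustrated with a toy recursive Tarjan over an adjacency-list graph (invented names; the actual pass uses an iterative variant over GIMPLE statements). Tarjan's algorithm pops an SCC only after every SCC reachable from it has been popped, so the SCCs naturally come out in reverse topological order, which is exactly the order the pass needs to propagate copies in a single sweep:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy recursive Tarjan over a graph given as adjacency lists (invented
// names; the pass itself uses an iterative variant over GIMPLE statements).
// An SCC is popped only after every SCC reachable from it has been popped,
// so the SCCs come out in reverse topological order.
struct toy_tarjan
{
  explicit toy_tarjan (const std::map<int, std::vector<int>> &a)
    : adj (a), counter (0) {}

  const std::map<int, std::vector<int>> &adj;
  std::map<int, int> index, lowlink;
  std::set<int> on_stack;
  std::vector<int> stack;
  std::vector<std::vector<int>> sccs;
  int counter;

  void dfs (int v)
  {
    index[v] = lowlink[v] = counter++;
    stack.push_back (v);
    on_stack.insert (v);
    auto it = adj.find (v);
    if (it != adj.end ())
      for (int w : it->second)
	{
	  if (!index.count (w))
	    {
	      dfs (w);
	      lowlink[v] = std::min (lowlink[v], lowlink[w]);
	    }
	  else if (on_stack.count (w))
	    lowlink[v] = std::min (lowlink[v], index[w]);
	}
    if (lowlink[v] == index[v])
      {
	// v is the root of an SCC; pop the whole SCC off the stack.
	std::vector<int> scc;
	int w;
	do
	  {
	    w = stack.back ();
	    stack.pop_back ();
	    on_stack.erase (w);
	    scc.push_back (w);
	  }
	while (w != v);
	sccs.push_back (scc);
      }
  }
};

std::vector<std::vector<int>>
compute_sccs (const std::map<int, std::vector<int>> &adj)
{
  toy_tarjan t (adj);
  for (const auto &p : adj)
    if (!t.index.count (p.first))
      t.dfs (p.first);
  return t.sccs;
}
```

With edges pointing from a use to its definition (1 and 2 reference each other, 3 references 1), the cycle {1, 2} is emitted before {3}, so by the time 3 is processed, the value replacing the cycle is already known.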
>> 
>> 2 Instead of collecting all stmts that may generate a copy at the beginning of
>>  the pass into a vec<>, let the SCC discovery check that statements may
>>  generate a copy on the fly.
>> 
>> This would be a big change to the pass, it would require a lot of reworking.
>> I'm also not sure if this would help reduce the number of allocated vec<>s that
>> much because I'll still want to represent SCCs by vec<>s.
>> 
>> Again - it's possible I'll want to rework the pass in this way in the future but
>> I'd like to leave it as it is for now.
>> 
>> 3 Add a comment saying that the pass is doing optimistic copy propagation
>> 
>> I don't think the pass works in an optimistic way. It doesn't assume that all
>> variables are copies of each other at any point. It instead identifies copy
>> statements (or PHI SCCs that act as copy statements) and then for each of these
>> it propagates: By that I mean if a copy statement says that _3 is a copy of _2,
>> then it replaces all uses of _3 by _2.
>> 
>> But it's possible that I still misinterpret what 'optimistic' means. If
>> mentioning that it works in an optimistic way would help readability of the
>> pass, I'm happy to add that comment.
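[Editorial illustration, not part of the original mail.] The non-optimistic behaviour described above can be made concrete with a toy model (not GCC internals; names invented): each copy statement "lhs = rhs;" records that lhs is a copy of rhs, and replacing every use of lhs by rhs collapses chains such as `_5 = _3; _3 = _2;` down to `_2`. A cyclic dependency (a PHI SCC) would make this simple walk loop forever, which is exactly the case the SCC machinery of the pass handles:

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model (not GCC internals; names invented).  'copy_of' maps the lhs
// of every copy statement to its rhs.  resolve_copy follows copy-of links
// until it reaches a value that is not itself a copy.  Cycles (PHI SCCs)
// are deliberately not handled here -- they are what the pass's SCC-based
// algorithm exists for.
std::string
resolve_copy (const std::map<std::string, std::string> &copy_of,
	      std::string name)
{
  auto it = copy_of.find (name);
  while (it != copy_of.end ())
    {
      name = it->second;
      it = copy_of.find (name);
    }
  return name;
}
```

For `{_3 -> _2, _5 -> _3}`, both `_5` and `_3` resolve to `_2`, matching the replace-all-uses semantics described in the mail.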
>> 
>> Richard, is the patch ok as it is now? Or do you think that making changes 1, 2
>> or 3 is necessary? Or are there any other issues which should be addressed?
>> 
>> Filip
> 
> This is the fourth version of this patch. Changes in this version:
> - Disabled the pass for -Og
>  - This meant that I could revert my changes to gcc.dg/tree-ssa/pr79691.c
> - Fixed formatting of the patch (like braces around a single-statement if body)
> - compute_sccs (auto_vec<>...) -> compute_sccs (vec<> ...)
> - auto_vec<gimple *> worklist; worklist = ...; ->
>  auto_vec<gimple *> worklist = ...;
> 
> I've also checked that there are no memory leak issues. Valgrind doesn't detect
> any additional leaks when running with sccopy vs running without sccopy:
> 
> [fkastl@scala playground]$ valgrind $HOME/gcc/inst/bin/gcc /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2
> ==27363== Memcheck, a memory error detector
> ==27363== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
> ==27363== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
> ==27363== Command: /home/fkastl/gcc/inst/bin/gcc /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2
> ==27363== 
> ==27363== 
> ==27363== HEAP SUMMARY:
> ==27363==     in use at exit: 174,234 bytes in 92 blocks
> ==27363==   total heap usage: 408 allocs, 316 frees, 228,050 bytes allocated
> ==27363== 
> ==27363== LEAK SUMMARY:
> ==27363==    definitely lost: 5,935 bytes in 23 blocks
> ==27363==    indirectly lost: 82 bytes in 5 blocks
> ==27363==      possibly lost: 0 bytes in 0 blocks
> ==27363==    still reachable: 168,217 bytes in 64 blocks
> ==27363==                       of which reachable via heuristic:
> ==27363==                         newarray           : 1,544 bytes in 1 blocks
> ==27363==         suppressed: 0 bytes in 0 blocks
> ==27363== Rerun with --leak-check=full to see details of leaked memory
> ==27363== 
> ==27363== For lists of detected and suppressed errors, rerun with: -s
> ==27363== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
> 
> [fkastl@scala playground]$ valgrind $HOME/gcc/inst/bin/gcc /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2 -fdisable-tree-sccopy1 -fdisable-tree-sccopy2
> ==27372== Memcheck, a memory error detector
> ==27372== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
> ==27372== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
> ==27372== Command: /home/fkastl/gcc/inst/bin/gcc /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2 -fdisable-tree-sccopy1 -fdisable-tree-sccopy2
> ==27372== 
> cc1: note: disable pass tree-sccopy1 for functions in the range of [0, 4294967295]
> cc1: note: disable pass tree-sccopy2 for functions in the range of [0, 4294967295]
> ==27372== 
> ==27372== HEAP SUMMARY:
> ==27372==     in use at exit: 174,282 bytes in 92 blocks
> ==27372==   total heap usage: 408 allocs, 316 frees, 228,674 bytes allocated
> ==27372== 
> ==27372== LEAK SUMMARY:
> ==27372==    definitely lost: 5,935 bytes in 23 blocks
> ==27372==    indirectly lost: 82 bytes in 5 blocks
> ==27372==      possibly lost: 0 bytes in 0 blocks
> ==27372==    still reachable: 168,265 bytes in 64 blocks
> ==27372==                       of which reachable via heuristic:
> ==27372==                         newarray           : 1,544 bytes in 1 blocks
> ==27372==         suppressed: 0 bytes in 0 blocks
> ==27372== Rerun with --leak-check=full to see details of leaked memory
> ==27372== 
> ==27372== For lists of detected and suppressed errors, rerun with: -s
> ==27372== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
> 
> Richard, I wrote down the bitmap vs hash_set optimizations you suggested so
> that I can work on the pass more later and optimize it.
> 
> I didn't yet bootstrap this version. I just regtested it. Will bootstrap and
> regtest on the most current commit. Once that is successfully done, is the pass
> OK to be pushed to main?

Yes.

Thanks,
Richard 


> Filip
> 
> -- >8 --
> 
> This patch adds the strongly-connected copy propagation (SCCOPY) pass.
> It is a lightweight GIMPLE copy propagation pass that also removes some
> redundant PHI statements. It handles degenerate PHIs, e.g.:
> 
> _5 = PHI <_1>;
> _6 = PHI <_6, _6, _1, _1>;
> _7 = PHI <16, _7>;
> +// Replaces occurrences of _5 and _6 by _1 and _7 by 16
> 
> It also handles more complicated situations, e.g.:
> 
> _8 = PHI <_9, _10>;
> _9 = PHI <_8, _10>;
> _10 = PHI <_8, _9, _1>;
> +// Replaces occurrences of _8, _9 and _10 by _1
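[Editorial illustration, not part of the original mail.] The redundancy criterion behind the example above can be sketched as follows (toy code, not GCC internals; all names invented): a set of PHIs, given as result -> arguments, is redundant exactly when its members only reference each other or one single outside value v, in which case every member can be replaced by v:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model of the redundancy criterion (not GCC internals; names
// invented).  'phis' maps each PHI result to its argument list.  Collect
// every argument defined outside the set; if exactly one such value v
// exists, the set is redundant and v is returned, otherwise "".
std::string
redundant_phi_set_value (const std::map<std::string,
				       std::vector<std::string>> &phis)
{
  std::set<std::string> outer;
  for (const auto &phi : phis)
    for (const std::string &arg : phi.second)
      if (phis.find (arg) == phis.end ())
	outer.insert (arg);  // Argument defined outside the set.
  return outer.size () == 1 ? *outer.begin () : std::string ();
}
```

For the example in the commit message, `{_8, _9, _10}` only references itself plus `_1`, so the whole set collapses to `_1`; a set with two distinct outside values is not redundant.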
> 
> gcc/ChangeLog:
> 
>    * Makefile.in: Add sccopy pass.
>    * passes.def: Add sccopy pass before LTO streaming and before
>      RTL expansion.
>    * tree-pass.h (make_pass_sccopy): Add sccopy pass.
>    * gimple-ssa-sccopy.cc: New file.
> 
> gcc/testsuite/ChangeLog:
> 
>    * gcc.dg/sccopy-1.c: New test.
> 
> Signed-off-by: Filip Kastl <fkastl@suse.cz>
> ---
> gcc/Makefile.in                 |   1 +
> gcc/gimple-ssa-sccopy.cc        | 680 ++++++++++++++++++++++++++++++++
> gcc/passes.def                  |   2 +
> gcc/testsuite/gcc.dg/sccopy-1.c |  78 ++++
> gcc/tree-pass.h                 |   1 +
> 5 files changed, 762 insertions(+)
> create mode 100644 gcc/gimple-ssa-sccopy.cc
> create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 753f2f36618..2e6f2fef1ba 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1497,6 +1497,7 @@ OBJS = \
>    gimple-ssa-backprop.o \
>    gimple-ssa-isolate-paths.o \
>    gimple-ssa-nonnull-compare.o \
> +    gimple-ssa-sccopy.o \
>    gimple-ssa-split-paths.o \
>    gimple-ssa-store-merging.o \
>    gimple-ssa-strength-reduction.o \
> diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
> new file mode 100644
> index 00000000000..ac5ec32eb32
> --- /dev/null
> +++ b/gcc/gimple-ssa-sccopy.cc
> @@ -0,0 +1,680 @@
> +/* Strongly-connected copy propagation pass for the GNU compiler.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   Contributed by Filip Kastl <fkastl@suse.cz>
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "gimple-iterator.h"
> +#include "vec.h"
> +#include "hash-set.h"
> +#include <algorithm>
> +#include "ssa-iterators.h"
> +#include "gimple-fold.h"
> +#include "gimplify.h"
> +#include "tree-cfg.h"
> +#include "tree-eh.h"
> +#include "builtins.h"
> +#include "tree-ssa-dce.h"
> +#include "fold-const.h"
> +
> +/* Strongly connected copy propagation pass.
> +
> +   This is a lightweight copy propagation pass that is also able to eliminate
> +   redundant PHI statements.  The pass considers the following types of copy
> +   statements:
> +
> +   1 An assignment statement with a single argument.
> +
> +   _3 = _2;
> +   _4 = 5;
> +
> +   2 A degenerate PHI statement.  A degenerate PHI is a PHI that only refers to
> +     itself or one other value.
> +
> +   _5 = PHI <_1>;
> +   _6 = PHI <_6, _6, _1, _1>;
> +   _7 = PHI <16, _7>;
> +
> +   3 A set of PHI statements that only refer to each other or to one other
> +     value.
> +
> +   _8 = PHI <_9, _10>;
> +   _9 = PHI <_8, _10>;
> +   _10 = PHI <_8, _9, _1>;
> +
> +   All of these statements produce copies and can be eliminated from the
> +   program.  For a copy statement we identify the value it creates a copy of
> +   and replace references to the statement with the value -- we propagate the
> +   copy.
> +
> +   _3 = _2; // Replace all occurrences of _3 by _2
> +
> +   _8 = PHI <_9, _10>;
> +   _9 = PHI <_8, _10>;
> +   _10 = PHI <_8, _9, _1>; // Replace all occurrences of _8, _9 and _10 by _1
> +
> +   To find all three types of copy statements we use an algorithm based on
> +   strongly-connected components (SCCs) in the dataflow graph.  The algorithm
> +   was introduced in an article from 2013[1].  We describe the algorithm below.
> +
> +   To identify SCCs we implement Robert Tarjan's SCC algorithm.  For the SCC
> +   computation we describe potential copy statements by the 'vertex' struct,
> +   indexed by the SSA name version of each statement's result.  Since the
> +   main algorithm has to be able to compute SCCs of subgraphs of the whole
> +   dataflow graph, we use the 'active' flag of each vertex to prevent
> +   Tarjan's algorithm from leaving the subgraph.
> +
> +   References:
> +
> +     [1] Simple and Efficient Construction of Static Single Assignment Form,
> +     Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
> +     Section 3.2.  */
> +
> +/* Bitmap tracking statements which were propagated to be removed at the end of
> +   the pass.  */
> +
> +static bitmap dead_stmts;
> +
> +/* State of vertex during SCC discovery.
> +
> +   unvisited  Vertex hasn't yet been popped from worklist.
> +   vopen      DFS has visited vertex for the first time.  Vertex has been put
> +          on Tarjan stack.
> +   closed     DFS has backtracked through vertex.  At this point, vertex
> +          doesn't have any unvisited neighbors.
> +   in_scc     Vertex has been popped from Tarjan stack.  */
> +
> +enum vstate
> +{
> +  unvisited,
> +  vopen,
> +  closed,
> +  in_scc
> +};
> +
> +/* Information about a vertex.  Used by SCC discovery.  */
> +
> +struct vertex
> +{
> +  bool active; /* scc_discovery::compute_sccs () only considers a subgraph of
> +          the whole dataflow graph.  It uses this flag so that it knows
> +          which vertices are part of this subgraph.  */
> +  vstate state;
> +  unsigned index;
> +  unsigned lowlink;
> +};
> +
> +/* SCC discovery.
> +
> +   Used to find SCCs in a dataflow graph.  Implements Tarjan's SCC
> +   algorithm.  */
> +
> +class scc_discovery
> +{
> +public:
> +  scc_discovery ();
> +  ~scc_discovery ();
> +  auto_vec<vec<gimple *>> compute_sccs (vec<gimple *> &stmts);
> +
> +private:
> +  unsigned curr_generation = 0;
> +  vertex* vertices; /* Indexed by SSA_NAME_VERSION.  */
> +  auto_vec<unsigned> worklist; /* DFS stack.  */
> +  auto_vec<unsigned> stack; /* Tarjan stack.  */
> +
> +  void visit_neighbor (tree neigh_tree, unsigned parent_version);
> +};
> +
> +scc_discovery::scc_discovery ()
> +{
> +  /* Create vertex struct for each SSA name.  */
> +  vertices = XNEWVEC (struct vertex, num_ssa_names);
> +  unsigned i = 0;
> +  for (i = 0; i < num_ssa_names; i++)
> +    vertices[i].active = false;
> +}
> +
> +scc_discovery::~scc_discovery ()
> +{
> +  XDELETEVEC (vertices);
> +}
> +
> +/* Part of 'scc_discovery::compute_sccs ()'.  */
> +
> +void
> +scc_discovery::visit_neighbor (tree neigh_tree, unsigned parent_version)
> +{
> +  if (TREE_CODE (neigh_tree) != SSA_NAME)
> +    return; /* Skip any neighbor that isn't an SSA name.  */
> +  unsigned neigh_version = SSA_NAME_VERSION (neigh_tree);
> +
> +  /* Skip neighbors outside the subgraph that Tarjan currently works
> +     with.  */
> +  if (!vertices[neigh_version].active)
> +    return;
> +
> +  vstate neigh_state = vertices[neigh_version].state;
> +  vstate parent_state = vertices[parent_version].state;
> +  if (parent_state == vopen) /* We're currently opening parent.  */
> +    {
> +      /* Put unvisited neighbors on worklist.  Update lowlink of parent
> +     vertex according to indices of neighbors present on stack.  */
> +      switch (neigh_state)
> +    {
> +    case unvisited:
> +      worklist.safe_push (neigh_version);
> +      break;
> +    case vopen:
> +    case closed:
> +      vertices[parent_version].lowlink
> +        = std::min (vertices[parent_version].lowlink,
> +            vertices[neigh_version].index);
> +      break;
> +    case in_scc:
> +      /* Ignore these edges.  */
> +      break;
> +    }
> +    }
> +  else if (parent_state == closed) /* We're currently closing parent.  */
> +    {
> +      /* Update lowlink of parent vertex according to lowlinks of
> +     children of parent (in terms of DFS tree).  */
> +      if (neigh_state == closed)
> +    {
> +      vertices[parent_version].lowlink
> +        = std::min (vertices[parent_version].lowlink,
> +            vertices[neigh_version].lowlink);
> +    }
> +    }
> +}
> +
> +/* Compute SCCs in the dataflow graph induced by the given statements
> +   'stmts'.  Ignore statements outside 'stmts'.  Return the SCCs in reverse
> +   topological order.
> +
> +   stmt_may_generate_copy () must be true for all statements from 'stmts'!  */
> +
> +auto_vec<vec<gimple *>>
> +scc_discovery::compute_sccs (vec<gimple *> &stmts)
> +{
> +  auto_vec<vec<gimple *>> sccs;
> +
> +  for (gimple *stmt : stmts)
> +    {
> +      unsigned i;
> +      switch (gimple_code (stmt))
> +    {
> +      case GIMPLE_ASSIGN:
> +        i = SSA_NAME_VERSION (gimple_assign_lhs (stmt));
> +        break;
> +      case GIMPLE_PHI:
> +        i = SSA_NAME_VERSION (gimple_phi_result (stmt));
> +        break;
> +      default:
> +        gcc_unreachable ();
> +    }
> +
> +      vertices[i].index = 0;
> +      vertices[i].lowlink = 0;
> +      vertices[i].state = unvisited;
> +      vertices[i].active = true; /* Mark the subgraph we'll be working on so
> +                    that we don't leave it.  */
> +
> +      worklist.safe_push (i);
> +    }
> +
> +  /* Worklist loop.  */
> +  unsigned curr_index = 0;
> +  while (!worklist.is_empty ())
> +    {
> +      unsigned i = worklist.pop ();
> +      gimple *stmt = SSA_NAME_DEF_STMT (ssa_name (i));
> +      vstate state = vertices[i].state;
> +
> +      if (state == unvisited)
> +    {
> +      vertices[i].state = vopen;
> +
> +      /* Assign index to this vertex.  */
> +      vertices[i].index = curr_index;
> +      vertices[i].lowlink = curr_index;
> +      curr_index++;
> +
> +      /* Put vertex on stack and also on worklist to be closed later.  */
> +      stack.safe_push (i);
> +      worklist.safe_push (i);
> +    }
> +      else if (state == vopen)
> +    vertices[i].state = closed;
> +
> +      /* Visit neighbors of this vertex.  */
> +      tree op;
> +      gphi *phi;
> +      switch (gimple_code (stmt))
> +    {
> +      case GIMPLE_PHI:
> +        phi = as_a <gphi *> (stmt);
> +        unsigned j;
> +        for (j = 0; j < gimple_phi_num_args (phi); j++)
> +          {
> +        op = gimple_phi_arg_def (phi, j);
> +        visit_neighbor (op, i);
> +          }
> +        break;
> +      case GIMPLE_ASSIGN:
> +        op = gimple_assign_rhs1 (stmt);
> +        visit_neighbor (op, i);
> +        break;
> +      default:
> +        gcc_unreachable ();
> +    }
> +
> +      /* If we've just closed a root vertex of an scc, pop scc from stack.  */
> +      if (state == vopen && vertices[i].lowlink == vertices[i].index)
> +    {
> +      vec<gimple *> scc = vNULL;
> +
> +      unsigned j;
> +      do
> +        {
> +          j = stack.pop ();
> +          scc.safe_push (SSA_NAME_DEF_STMT (ssa_name (j)));
> +          vertices[j].state = in_scc;
> +        }
> +      while (j != i);
> +
> +      sccs.safe_push (scc);
> +    }
> +    }
> +
> +  if (!stack.is_empty ())
> +    gcc_unreachable ();
> +
> +  /* Clear 'active' flags.  */
> +  for (gimple *stmt : stmts)
> +    {
> +      unsigned i;
> +      switch (gimple_code (stmt))
> +    {
> +      case GIMPLE_ASSIGN:
> +        i = SSA_NAME_VERSION (gimple_assign_lhs (stmt));
> +        break;
> +      case GIMPLE_PHI:
> +        i = SSA_NAME_VERSION (gimple_phi_result (stmt));
> +        break;
> +      default:
> +        gcc_unreachable ();
> +    }
> +
> +      vertices[i].active = false;
> +    }
> +
> +  return sccs;
> +}
> +
> +/* Could this statement potentially be a copy statement?
> +
> +   This pass only considers statements for which this function returns 'true'.
> +   Those are basically PHI functions and assignment statements similar to
> +
> +   _2 = _1;
> +   or
> +   _2 = 5;  */
> +
> +static bool
> +stmt_may_generate_copy (gimple *stmt)
> +{
> +  /* A PHI may generate a copy.  */
> +  if (gimple_code (stmt) == GIMPLE_PHI)
> +    {
> +      gphi *phi = as_a <gphi *> (stmt);
> +
> +      /* No OCCURS_IN_ABNORMAL_PHI SSA names in lhs nor rhs.  */
> +      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (phi)))
> +    return false;
> +
> +      unsigned i;
> +      for (i = 0; i < gimple_phi_num_args (phi); i++)
> +    {
> +      tree op = gimple_phi_arg_def (phi, i);
> +      if (TREE_CODE (op) == SSA_NAME
> +          && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
> +        return false;
> +    }
> +
> +      /* If PHI has more than one unique non-SSA argument, it won't generate a
> +     copy.  */
> +      tree const_op = NULL_TREE;
> +      for (i = 0; i < gimple_phi_num_args (phi); i++)
> +    {
> +      tree op = gimple_phi_arg_def (phi, i);
> +      if (TREE_CODE (op) != SSA_NAME)
> +        {
> +          if (const_op && !operand_equal_p (op, const_op))
> +        return false;
> +          const_op = op;
> +        }
> +    }
> +
> +      return true;
> +    }
> +
> +  /* A statement of the form _2 = _1; or _2 = 5; may also generate a copy.  */
> +
> +  if (!gimple_assign_single_p (stmt))
> +    return false;
> +
> +  tree lhs = gimple_assign_lhs (stmt);
> +  tree rhs = gimple_assign_rhs1 (stmt);
> +
> +  if (TREE_CODE (lhs) != SSA_NAME)
> +    return false;
> +
> +  /* lhs shouldn't flow through any abnormal edges.  */
> +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
> +    return false;
> +
> +  if (is_gimple_min_invariant (rhs))
> +    return true;  /* A statement of type _2 = 5;.  */
> +
> +  if (TREE_CODE (rhs) != SSA_NAME)
> +    return false;
> +
> +  /* rhs shouldn't flow through any abnormal edges.  */
> +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs))
> +    return false;
> +
> +  /* It is possible that lhs has more alignment or value range information.  By
> +     propagating we would lose this information.  So in the case that alignment
> +     or value range information differs, we are conservative and do not
> +     propagate.
> +
> +     FIXME: Propagate alignment and value range info the same way copy-prop
> +     does.  */
> +  if (POINTER_TYPE_P (TREE_TYPE (lhs))
> +      && POINTER_TYPE_P (TREE_TYPE (rhs))
> +      && SSA_NAME_PTR_INFO (lhs) != SSA_NAME_PTR_INFO (rhs))
> +    return false;
> +  if (!POINTER_TYPE_P (TREE_TYPE (lhs))
> +      && !POINTER_TYPE_P (TREE_TYPE (rhs))
> +      && SSA_NAME_RANGE_INFO (lhs) != SSA_NAME_RANGE_INFO (rhs))
> +    return false;
> +
> +  return true;  /* A statement of type _2 = _1;.  */
> +}
> +
> +/* Return all statements in cfun that could generate copies, i.e., all
> +   statements for which stmt_may_generate_copy returns 'true'.  */
> +
> +static auto_vec<gimple *>
> +get_all_stmt_may_generate_copy (void)
> +{
> +  auto_vec<gimple *> result;
> +
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      gimple_stmt_iterator gsi;
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gimple *s = gsi_stmt (gsi);
> +      if (stmt_may_generate_copy (s))
> +        result.safe_push (s);
> +    }
> +
> +      gphi_iterator pi;
> +      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
> +    {
> +      gimple *s = pi.phi ();
> +      if (stmt_may_generate_copy (s))
> +        result.safe_push (s);
> +    }
> +    }
> +
> +  return result;
> +}
> +
> +/* For each statement in the given SCC, replace all uses of its result by
> +   the value VAL.  */
> +
> +static void
> +replace_scc_by_value (vec<gimple *> scc, tree val)
> +{
> +  for (gimple *stmt : scc)
> +    {
> +      tree name = gimple_get_lhs (stmt);
> +      replace_uses_by (name, val);
> +      bitmap_set_bit (dead_stmts, SSA_NAME_VERSION (name));
> +    }
> +
> +  if (dump_file)
> +    fprintf (dump_file, "Replacing SCC of size %u\n", scc.length ());
> +}
> +
> +/* Part of 'sccopy_propagate ()'.  */
> +
> +static void
> +sccopy_visit_op (tree op, hash_set<tree> &outer_ops,
> +         hash_set<gimple *> &scc_set, bool &is_inner,
> +         tree &last_outer_op)
> +{
> +  bool op_in_scc = false;
> +
> +  if (TREE_CODE (op) == SSA_NAME)
> +    {
> +      gimple *op_stmt = SSA_NAME_DEF_STMT (op);
> +      if (scc_set.contains (op_stmt))
> +    op_in_scc = true;
> +    }
> +
> +  if (!op_in_scc)
> +    {
> +      outer_ops.add (op);
> +      last_outer_op = op;
> +      is_inner = false;
> +    }
> +}
> +
> +/* Main function of this pass.  Find and propagate all three types of copy
> +   statements (see pass description above).
> +
> +   This is an implementation of an algorithm from the paper Simple and
> +   Efficient Construction of Static Single Assignment Form[1].  It is based
> +   on strongly-connected components (SCCs) in the dataflow graph.  The original
> +   algorithm only considers PHI statements.  We extend it to also consider
> +   assignment statements of type _2 = _1;.
> +
> +   The algorithm is based on this definition of a set of redundant PHIs[1]:
> +
> +     A non-empty set P of PHI functions is redundant iff the PHI functions just
> +     reference each other or one other value
> +
> +   It uses this lemma[1]:
> +
> +     Let P be a redundant set of PHI functions.  Then there is a
> +     strongly-connected component S subset of P that is also redundant.
> +
> +   The algorithm works in this way:
> +
> +     1 Find SCCs
> +     2 For each SCC S in topological order:
> +     3   Construct set 'inner' of statements that only have other statements
> +     from S on their right hand side
> +     4   Construct set 'outer' of values that originate outside S and appear on
> +     right hand side of some statement from S
> +     5   If |outer| = 1, outer only contains a value v.  Statements in S only
> +     refer to each other or to v -- they are redundant.  Propagate v.
> +     Else, recurse on statements in inner.
> +
> +   The implementation is non-recursive.
> +
> +   References:
> +
> +     [1] Simple and Efficient Construction of Static Single Assignment Form,
> +     Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
> +     Section 3.2.  */
> +
> +static void
> +sccopy_propagate ()
> +{
> +  auto_vec<gimple *> useful_stmts = get_all_stmt_may_generate_copy ();
> +  scc_discovery discovery;
> +
> +  auto_vec<vec<gimple *>> worklist = discovery.compute_sccs (useful_stmts);
> +
> +  while (!worklist.is_empty ())
> +    {
> +      vec<gimple *> scc = worklist.pop ();
> +
> +      auto_vec<gimple *> inner;
> +      hash_set<tree> outer_ops;
> +      tree last_outer_op = NULL_TREE;
> +
> +      /* Prepare hash set of PHIs in scc to query later.  */
> +      hash_set<gimple *> scc_set;
> +      for (gimple *stmt : scc)
> +    scc_set.add (stmt);
> +
> +      for (gimple *stmt : scc)
> +    {
> +      bool is_inner = true;
> +
> +      gphi *phi;
> +      tree op;
> +
> +      switch (gimple_code (stmt))
> +        {
> +          case GIMPLE_PHI:
> +        phi = as_a <gphi *> (stmt);
> +        unsigned j;
> +        for (j = 0; j < gimple_phi_num_args (phi); j++)
> +          {
> +            op = gimple_phi_arg_def (phi, j);
> +            sccopy_visit_op (op, outer_ops, scc_set, is_inner,
> +                   last_outer_op);
> +          }
> +        break;
> +          case GIMPLE_ASSIGN:
> +        op = gimple_assign_rhs1 (stmt);
> +        sccopy_visit_op (op, outer_ops, scc_set, is_inner,
> +                   last_outer_op);
> +        break;
> +          default:
> +        gcc_unreachable ();
> +        }
> +
> +      if (is_inner)
> +        inner.safe_push (stmt);
> +    }
> +
> +      if (outer_ops.elements () == 1)
> +    {
> +      /* The only operand in outer_ops.  */
> +      tree outer_op = last_outer_op;
> +      replace_scc_by_value (scc, outer_op);
> +    }
> +      else if (outer_ops.elements () > 1)
> +    {
> +      /* Add inner sccs to worklist.  */
> +      auto_vec<vec<gimple *>> inner_sccs
> +        = discovery.compute_sccs (inner);
> +      for (vec<gimple *> inner_scc : inner_sccs)
> +        worklist.safe_push (inner_scc);
> +    }
> +      else
> +    gcc_unreachable ();
> +
> +      scc.release ();
> +    }
> +}
> +
> +/* Called when pass execution starts.  */
> +
> +static void
> +init_sccopy (void)
> +{
> +  /* For propagated statements.  */
> +  dead_stmts = BITMAP_ALLOC (NULL);
> +}
> +
> +/* Called before pass execution ends.  */
> +
> +static void
> +finalize_sccopy (void)
> +{
> +  /* Remove all propagated statements.  */
> +  simple_dce_from_worklist (dead_stmts);
> +  BITMAP_FREE (dead_stmts);
> +
> +  /* Propagating a constant may create dead eh edges.  */
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    gimple_purge_dead_eh_edges (bb);
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_sccopy =
> +{
> +  GIMPLE_PASS, /* type */
> +  "sccopy", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  ( PROP_cfg | PROP_ssa ), /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
> +};
> +
> +class pass_sccopy : public gimple_opt_pass
> +{
> +public:
> +  pass_sccopy (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_sccopy, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return true; }
> +  virtual unsigned int execute (function *);
> +  opt_pass * clone () final override { return new pass_sccopy (m_ctxt); }
> +}; // class pass_sccopy
> +
> +unsigned
> +pass_sccopy::execute (function *)
> +{
> +  init_sccopy ();
> +  sccopy_propagate ();
> +  finalize_sccopy ();
> +  return 0;
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_sccopy (gcc::context *ctxt)
> +{
> +  return new pass_sccopy (ctxt);
> +}
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 1e1950bdb39..048ad3ee7a5 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -100,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
>      NEXT_PASS (pass_if_to_switch);
>      NEXT_PASS (pass_convert_switch);
>      NEXT_PASS (pass_cleanup_eh);
> +      NEXT_PASS (pass_sccopy);
>      NEXT_PASS (pass_profile);
>      NEXT_PASS (pass_local_pure_const);
>      NEXT_PASS (pass_modref);
> @@ -368,6 +369,7 @@ along with GCC; see the file COPYING3.  If not see
>     However, this also causes us to misdiagnose cases that should be
>     real warnings (e.g., testsuite/gcc.dg/pr18501.c).  */
>       NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
> +      NEXT_PASS (pass_sccopy);
>       NEXT_PASS (pass_tail_calls);
>       /* Split critical edges before late uninit warning to reduce the
>          number of false positives from it.  */
> diff --git a/gcc/testsuite/gcc.dg/sccopy-1.c b/gcc/testsuite/gcc.dg/sccopy-1.c
> new file mode 100644
> index 00000000000..1e61a6b320e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/sccopy-1.c
> @@ -0,0 +1,78 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fgimple -fdump-tree-sccopy -O2" } */
> +/* { dg-final { scan-tree-dump "Replacing SCC of size 2" "sccopy1" } } */
> +
> +int __GIMPLE (ssa, startwith ("sccopy"))
> +main ()
> +{
> +  int a;
> +  int y;
> +  int x;
> +  int _1;
> +  int _2;
> +  int _13;
> +
> +  __BB(2):
> +  if (x_7(D) == 5)
> +    goto __BB3;
> +  else
> +    goto __BB4;
> +
> +  __BB(3):
> +  a_10 = x_7(D);
> +  goto __BB5;
> +
> +  __BB(4):
> +  a_9 = y_8(D);
> +  goto __BB5;
> +
> +  __BB(5):
> +  a_3 = __PHI (__BB3: a_10, __BB4: a_9);
> +  if (x_7(D) == y_8(D))
> +    goto __BB6;
> +  else
> +    goto __BB11;
> +
> +  __BB(6):
> +  a_11 = a_3 + 1;
> +  goto __BB7;
> +
> +  __BB(7):
> +  a_4 = __PHI (__BB6: a_11, __BB11: a_6);
> +label1:
> +  if (x_7(D) != y_8(D))
> +    goto __BB8;
> +  else
> +    goto __BB10;
> +
> +  __BB(8):
> +  goto __BB9;
> +
> +  __BB(9):
> +  a_12 = __PHI (__BB8: a_4, __BB10: a_5);
> +  goto __BB10;
> +
> +  __BB(10,loop_header(1)):
> +  a_5 = __PHI (__BB7: a_4, __BB9: a_12);
> +label2:
> +  _1 = y_8(D) * 2;
> +  if (x_7(D) == _1)
> +    goto __BB9;
> +  else
> +    goto __BB11;
> +
> +  __BB(11):
> +  a_6 = __PHI (__BB5: a_3, __BB10: a_5);
> +  _2 = x_7(D) * 3;
> +  if (y_8(D) == _2)
> +    goto __BB7;
> +  else
> +    goto __BB12;
> +
> +  __BB(12):
> +  _13 = 0;
> +  return _13;
> +
> +}
> +
> +
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 09e6ada5b2f..94a48461590 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -399,6 +399,7 @@ extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
> --
> 2.43.0
> 

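[Editorial note: the core idea of the quoted worklist loop can be modeled outside
GCC in a few dozen lines. The sketch below is hypothetical Python, not GCC
internals: each PHI or plain copy is a node whose edges point at its arguments,
strongly connected components are computed over that graph, and any component
whose arguments name exactly one value outside the component is replaced
wholesale by that value. Unlike the real pass, which pushes inner SCCs back onto
its worklist, this simplification just iterates to a fixed point over maximal
SCCs.]

```python
def sccs(graph):
    """Tarjan's SCC algorithm.  graph: node -> list of successor nodes;
    successors that are not keys of graph are ignored (they model
    operands that are not copy statements)."""
    index, low, onstack, stack, out = {}, {}, set(), [], []
    counter = [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        onstack.add(v)
        for w in graph[v]:
            if w not in graph:
                continue            # operand defined outside the copy graph
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in onstack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:      # v is the root of an SCC
            comp = set()
            while True:
                w = stack.pop()
                onstack.discard(w)
                comp.add(w)
                if w == v:
                    break
            out.append(comp)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return out

def sccopy(phis):
    """phis: SSA name -> list of argument names (a PHI, or a plain copy
    when the list has one element).  Returns a map from each removable
    name to the single value that replaces it."""
    subst = {}

    def resolve(n):                 # chase chains of substitutions
        while n in subst:
            n = subst[n]
        return n

    changed = True
    while changed:
        changed = False
        graph = {n: [resolve(a) for a in args]
                 for n, args in phis.items() if n not in subst}
        for comp in sccs(graph):
            # Arguments coming from outside the SCC; self-references and
            # references between SCC members don't count.
            outer = {a for n in comp for a in graph[n] if a not in comp}
            if len(outer) == 1:     # whole SCC is a copy of one value
                val = outer.pop()
                for n in comp:
                    subst[n] = val
                changed = True
    return {n: resolve(n) for n in phis if n in subst}
```

On the SCC that the sccopy-1.c testcase above checks for — `a_5 = __PHI (a_4,
a_12)` and `a_12 = __PHI (a_4, a_5)` — the only outside value is `a_4`, so
`sccopy({'a_5': ['a_4', 'a_12'], 'a_12': ['a_4', 'a_5']})` collapses both names
to `a_4`, mirroring the "Replacing SCC of size 2" dump line the test scans for.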
^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v4] A new copy propagation and PHI elimination pass
  2023-12-13 16:15             ` Richard Biener
@ 2023-12-14 10:26               ` Filip Kastl
  0 siblings, 0 replies; 10+ messages in thread
From: Filip Kastl @ 2023-12-14 10:26 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, hubicka

Successfully bootstrapped and regtested on x86_64-linux. Will push to master.

Filip


* In 'gcc/gimple-ssa-sccopy.cc', '#define INCLUDE_ALGORITHM' instead of '#include <algorithm>' (was: [PATCH v4] A new copy propagation and PHI elimination pass)
  2023-12-13 16:12           ` [PATCH v4] " Filip Kastl
  2023-12-13 16:15             ` Richard Biener
@ 2023-12-14 13:22             ` Thomas Schwinge
  1 sibling, 0 replies; 10+ messages in thread
From: Thomas Schwinge @ 2023-12-14 13:22 UTC (permalink / raw)
  To: Filip Kastl, gcc-patches; +Cc: Richard Biener, Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 634 bytes --]

Hi!

On 2023-12-13T17:12:11+0100, Filip Kastl <fkastl@suse.cz> wrote:
> --- /dev/null
> +++ b/gcc/gimple-ssa-sccopy.cc

> +#include <algorithm>

Pushed to master branch commit 65e41f4fbfc539c5cc429c684176f8ea39f4b8f2
"In 'gcc/gimple-ssa-sccopy.cc', '#define INCLUDE_ALGORITHM' instead of '#include <algorithm>'",
see attached.


Regards
 Thomas



[-- Attachment #2: 0001-In-gcc-gimple-ssa-sccopy.cc-define-INCLUDE_ALGORITHM.patch --]
[-- Type: text/x-diff, Size: 2111 bytes --]

From 65e41f4fbfc539c5cc429c684176f8ea39f4b8f2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 14 Dec 2023 14:12:45 +0100
Subject: [PATCH] In 'gcc/gimple-ssa-sccopy.cc', '#define INCLUDE_ALGORITHM'
 instead of '#include <algorithm>'

... to avoid issues such as:

    In file included from [...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/xmmintrin.h:34:0,
                     from [...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/x86intrin.h:31,
                     from [...]/i686-pc-linux-gnu/include/c++/5.2.0/i686-pc-linux-gnu/64/bits/opt_random.h:33,
                     from [...]/i686-pc-linux-gnu/include/c++/5.2.0/random:50,
                     from [...]/i686-pc-linux-gnu/include/c++/5.2.0/bits/stl_algo.h:66,
                     from [...]/i686-pc-linux-gnu/include/c++/5.2.0/algorithm:62,
                     from [...]/source-gcc/gcc/gimple-ssa-sccopy.cc:32:
    [...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/mm_malloc.h:42:12: error: attempt to use poisoned "malloc"
         return malloc (size);
                ^
    make[2]: *** [Makefile:1197: gimple-ssa-sccopy.o] Error 1

Minor fix-up for commit cd794c3961017703a4d2ca0e854ea23b3d4b6373
"A new copy propagation and PHI elimination pass".

	gcc/
	* gimple-ssa-sccopy.cc: '#define INCLUDE_ALGORITHM' instead of
	'#include <algorithm>'.
---
 gcc/gimple-ssa-sccopy.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
index ac5ec32eb32..7ebb6c05caf 100644
--- a/gcc/gimple-ssa-sccopy.cc
+++ b/gcc/gimple-ssa-sccopy.cc
@@ -18,6 +18,7 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+#define INCLUDE_ALGORITHM
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -29,7 +30,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "vec.h"
 #include "hash-set.h"
-#include <algorithm>
 #include "ssa-iterators.h"
 #include "gimple-fold.h"
 #include "gimplify.h"
-- 
2.34.1



end of thread, other threads:[~2023-12-14 13:22 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-02 13:00 [PATCH v2] A new copy propagation and PHI elimination pass Filip Kastl
2023-11-17 13:01 ` Richard Biener
2023-11-22 14:20   ` Filip Kastl
2023-11-30 13:04     ` Richard Biener
2023-12-08 15:15       ` [PATCH v3] " Filip Kastl
2023-12-13 11:35         ` Richard Biener
2023-12-13 16:12           ` [PATCH v4] " Filip Kastl
2023-12-13 16:15             ` Richard Biener
2023-12-14 10:26               ` Filip Kastl
2023-12-14 13:22             ` In 'gcc/gimple-ssa-sccopy.cc', '#define INCLUDE_ALGORITHM' instead of '#include <algorithm>' (was: [PATCH v4] A new copy propagation and PHI elimination pass) Thomas Schwinge
