public inbox for gcc-patches@gcc.gnu.org
* [RFC] gimple ssa: SCCP - A new PHI optimization pass
@ 2023-08-24 15:07 Filip Kastl
  2023-08-24 15:47 ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Filip Kastl @ 2023-08-24 15:07 UTC (permalink / raw)
  To: Gcc-patches; +Cc: mjambor, hubicka, rguenther

Hi,

As a part of my bachelor thesis under the supervision of Honza (Jan Hubicka), I
implemented a new PHI elimination algorithm in GCC. The algorithm is
described in the following article:

Braun, M., Buchwald, S., Hack, S., Leißa, R., Mallon, C., Zwinkau, A.
(2013). Simple and Efficient Construction of Static Single Assignment
Form. In: Jhala, R., De Bosschere, K. (eds) Compiler Construction. CC
2013. Lecture Notes in Computer Science, vol 7791. Springer, Berlin,
Heidelberg. https://doi.org/10.1007/978-3-642-37051-9_6

In the article, the PHI elimination algorithm is only considered a part of
another algorithm. However, Honza and I tried running it as a standalone pass
and found that it is able to remove a surprisingly large number of PHI
functions -- sometimes ~13% of them or more. This email contains a patch with
the pass and with the placement in passes.def we used to measure this.
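
To make the idea concrete: a set of PHIs is redundant when its members only
reference each other plus a single value from outside the set, so every member
can be replaced by that outside value. Here is a toy Python sketch of just
that criterion (the representation and names are made up for illustration;
this is not the GCC implementation):

```python
# Toy SSA fragment: map each PHI result to the names of its arguments.
# Anything not defined by a PHI in the set counts as an "outside" value.
def redundant_set_value(phis):
    """Return the single outside value the PHI set reduces to,
    or None if the set depends on more than one outside value."""
    outside = set()
    for args in phis.values():
        for arg in args:
            if arg not in phis:
                outside.add(arg)
    return outside.pop() if len(outside) == 1 else None

# b = PHI (a, c) and c = PHI (b, a) only reference each other and 'a',
# so every use of b or c can simply use 'a' instead.
print(redundant_set_value({"b": ["a", "c"], "c": ["b", "a"]}))  # a
```

(The actual pass finds such sets as strongly-connected components via
Tarjan's algorithm and, when a set sees more than one outside value,
recurses on the inner SCCs.)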

Do you think that the pass is worthy of inclusion into upstream GCC? What are
some things that I should change? Should I try to put the pass in different
places in passes.def?

Things I already know I'd like to change:
- Split the patch into two (one for sccp, one for the utility functions)
- Change the name SCCP to something else since there already is a pass with
  that name (any suggestions?)
- Add a comment into sccp.cc explaining the algorithm

I successfully bootstrapped and tested GCC with the patch applied (with the
commit 3b691e0190c6e7291f8a52e1e14d8293a28ff4ce checked out). 

Here are my measurements. I measured the number of PHIs before and after the
PHI elimination algorithm was run. I measured on the standard SPEC CPU 2017
benchmarks with -O3. Since the pass is present in passes.def twice, results of
the first run are marked (1) and results of the second are marked (2). Honza
also did measurements with profile feedback and saw even larger percentages.

500.perlbench_r
Started with (1) 30287
Ended with (1) 26188
Removed PHI % (1) 13.53
Started with (2) 38005
Ended with (2) 37897
Removed PHI % (2) 0.28

502.gcc_r
Started with (1) 148187
Ended with (1) 140292
Removed PHI % (1) 5.33
Started with (2) 211479
Ended with (2) 210635
Removed PHI % (2) 0.40

505.mcf_r
Started with (1) 341
Ended with (1) 303
Removed PHI % (1) 11.14
Started with (2) 430
Ended with (2) 426
Removed PHI % (2) 0.93

523.xalancbmk_r
Started with (1) 62514
Ended with (1) 57785
Removed PHI % (1) 7.56
Started with (2) 132561
Ended with (2) 131726
Removed PHI % (2) 0.63

531.deepsjeng_r
Started with (1) 1388
Ended with (1) 1250
Removed PHI % (1) 9.94
Started with (2) 1887
Ended with (2) 1879
Removed PHI % (2) 0.42

541.leela_r
Started with (1) 3332
Ended with (1) 2994
Removed PHI % (1) 10.14
Started with (2) 4372
Ended with (2) 4352
Removed PHI % (2) 0.46
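
(The "Removed PHI %" rows above are simply (started - ended) / started * 100;
for example, recomputed in Python:)

```python
def removed_phi_pct(started, ended):
    """Percentage of PHI functions removed by one run of the pass."""
    return (started - ended) / started * 100.0

# 500.perlbench_r, first run: 30287 PHIs down to 26188.
print(f"{removed_phi_pct(30287, 26188):.2f}")  # 13.53
```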

Here is the patch:

-- >8 --

This patch introduces two things:
- A new PHI elimination pass (major)
- New utility functions for passes that replace one ssa name with a
  different one (minor)

Strongly-connected copy propagation (SCCP) pass

The PHI elimination pass is a lightweight optimization pass based on
strongly-connected components. Some set of PHIs may be redundant because
the PHIs only refer to each other or to a single value from outside the
set. This pass finds and eliminates these sets. As a bonus the pass also
does some basic copy propagation because it considers a copy statement
to be a PHI with a single argument.

SCCP uses an algorithm from this article:
Braun, M., Buchwald, S., Hack, S., Leißa, R., Mallon, C., Zwinkau, A.
(2013). Simple and Efficient Construction of Static Single Assignment
Form. In: Jhala, R., De Bosschere, K. (eds) Compiler Construction. CC
2013. Lecture Notes in Computer Science, vol 7791. Springer, Berlin,
Heidelberg. https://doi.org/10.1007/978-3-642-37051-9_6

cleanup_after_replace and cleanup_after_all_replaces_done

Whenever you replace all uses of an ssa name by a different ssa name,
some GCC internal structures have to be updated. To streamline this
process, the patch adds the cleanup_after_replace function that should
be called after an ssa name is replaced by a different one and the
cleanup_after_all_replaces_done that should be called before a pass that
replaced one or more ssa names exits. The SCCP pass uses these
functions.

Signed-off-by: Filip Kastl <fkastl@suse.cz>

gcc/ChangeLog:

	* Makefile.in: Added sccp pass.
	* passes.def: Added sccp pass to early and late optimizations.
	* tree-pass.h (make_pass_sccp): Added sccp pass.
	* tree-ssa-propagate.cc (cleanup_after_replace): New function.
	(cleanup_after_all_replaces_done): New function.
	* tree-ssa-propagate.h (cleanup_after_replace): New function.
	(cleanup_after_all_replaces_done): New function.
	* sccp.cc: New file.
---
 gcc/Makefile.in           |   1 +
 gcc/passes.def            |   2 +
 gcc/sccp.cc               | 722 ++++++++++++++++++++++++++++++++++++++
 gcc/tree-pass.h           |   1 +
 gcc/tree-ssa-propagate.cc |  62 ++++
 gcc/tree-ssa-propagate.h  |   7 +
 6 files changed, 795 insertions(+)
 create mode 100644 gcc/sccp.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 78779546459..7c34eb6ceda 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1706,6 +1706,7 @@ OBJS = \
 	tree-ssa-forwprop.o \
 	tree-ssa-ifcombine.o \
 	tree-ssa-live.o \
+	sccp.o \
 	tree-ssa-loop-ch.o \
 	tree-ssa-loop-im.o \
 	tree-ssa-loop-ivcanon.o \
diff --git a/gcc/passes.def b/gcc/passes.def
index ef5a21afe49..4c94f9cec8b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -96,6 +96,7 @@ along with GCC; see the file COPYING3.  If not see
           NEXT_PASS (pass_dse);
 	  NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
 	  NEXT_PASS (pass_phiopt, true /* early_p */);
+	  NEXT_PASS (pass_sccp);
 	  NEXT_PASS (pass_tail_recursion);
 	  NEXT_PASS (pass_if_to_switch);
 	  NEXT_PASS (pass_convert_switch);
@@ -271,6 +272,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_tsan);
       NEXT_PASS (pass_dse, true /* use DR analysis */);
       NEXT_PASS (pass_dce);
+      NEXT_PASS (pass_sccp);
       /* Pass group that runs when 1) enabled, 2) there are loops
 	 in the function.  Make sure to run pass_fix_loops before
 	 to discover/remove loops before running the gate function
diff --git a/gcc/sccp.cc b/gcc/sccp.cc
new file mode 100644
index 00000000000..86a79303902
--- /dev/null
+++ b/gcc/sccp.cc
@@ -0,0 +1,722 @@
+/* Strongly connected copy propagation pass for the GNU compiler.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Filip Kastl <filip.kastl@gmail.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-iterator.h"
+#include "vec.h"
+#include "hash-set.h"
+#include <algorithm>
+#include "ssa-iterators.h"
+#include "gimple-fold.h"
+#include "gimplify.h"
+#include "tree-cfg.h"
+#include "tree-ssa-propagate.h"
+
+/* State of vertex during tarjan computation.
+
+   unvisited  Vertex hasn't yet been popped from worklist.
+   vopen      DFS has visited vertex for the first time. Vertex has been put on
+	      Tarjan stack.
+   closed     DFS has backtracked through vertex. At this point, vertex doesn't
+	      have any unvisited neighbors.
+   in_scc     Vertex has been popped from tarjan stack.  */
+
+enum vstate
+{
+  unvisited,
+  vopen,
+  closed,
+  in_scc
+};
+
+/* Information about a vertex used by tarjan.  */
+
+struct vertex
+{
+  vstate state;
+  unsigned index;
+  unsigned lowlink;
+  gimple* stmt;
+};
+
+/* Return true if STMT may generate a useful copy: a PHI whose result and
+   arguments are free of abnormal SSA names, or a single-rhs assignment
+   from a constant or another SSA name.  */
+
+static bool
+stmt_may_generate_copy (gimple *stmt)
+{
+  if (gimple_code (stmt) == GIMPLE_PHI)
+    {
+      gphi *phi = as_a <gphi *> (stmt);
+
+      /* No OCCURS_IN_ABNORMAL_PHI SSA names in lhs nor rhs.  */
+      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (phi)))
+	{
+	  return false;
+	}
+
+      unsigned i;
+      for (i = 0; i < gimple_phi_num_args (phi); i++)
+	{
+	  tree op = gimple_phi_arg_def (phi, i);
+	  if (TREE_CODE (op) == SSA_NAME &&
+	      SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
+	    {
+	      return false;
+	    }
+	}
+
+      return true;
+    }
+
+  if (gimple_code (stmt) != GIMPLE_ASSIGN)
+    {
+      return false;
+    }
+
+  /* If the statement has volatile operands, it won't generate a
+     useful copy.  */
+  if (gimple_has_volatile_ops (stmt))
+    {
+      return false;
+    }
+
+  /* Statements with loads and/or stores will never generate a useful copy.  */
+  if (gimple_store_p (stmt) || gimple_assign_load_p (stmt))
+    {
+      return false;
+    }
+
+  if (!gimple_assign_single_p (stmt))
+    {
+      return false;
+    }
+
+  tree lhs = gimple_assign_lhs (stmt);
+  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+    {
+      return false;
+    }
+
+  /* If the assignment is from a constant it generates a useful copy.  */
+  if (is_gimple_min_invariant (gimple_assign_rhs1 (stmt)))
+    {
+      return true;
+    }
+
+  /* Otherwise, the only statements that generate useful copies are
+     assignments whose single SSA use doesn't flow through abnormal
+     edges.  */
+  tree rhs = single_ssa_tree_operand (stmt, SSA_OP_USE);
+
+  if (!rhs)
+    {
+      return false;
+    }
+
+  if (!is_gimple_val (gimple_assign_rhs1 (stmt)))
+    {
+      return false;
+    }
+
+  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs))
+    {
+      return false;
+    }
+
+  return true;
+}
+
+/* Set 'using' flag of gimple statement to true.
+   If the flag isn't set, tarjan will ignore the statement.  */
+
+static void
+tarjan_set_using (gimple* stmt)
+{
+  gimple_set_plf (stmt, GF_PLF_1, true);
+}
+
+/* Clear 'using' flag of gimple statement to false.  */
+
+static void
+tarjan_clear_using (gimple* stmt)
+{
+  gimple_set_plf (stmt, GF_PLF_1, false);
+}
+
+/* Return value of 'using' flag of gimple statement.  */
+
+static bool
+tarjan_is_using (gimple* stmt)
+{
+  return gimple_plf (stmt, GF_PLF_1);
+}
+
+/* Set 'vxnum' (vertex number) of statement. Before computing SCCs, tarjan
+   assigns unique vxnums to statements that it will use.  */
+
+static void
+tarjan_set_vxnum (gimple* stmt, unsigned num)
+{
+  gimple_set_uid (stmt, num);
+}
+
+/* Return 'vxnum' (vertex number) of statement.  */
+
+static unsigned
+tarjan_vxnum (gimple* stmt)
+{
+  return gimple_uid (stmt);
+}
+
+/* Create and initialize vertex struct for each given statement.  */
+
+static auto_vec<vertex>
+tarjan_stmts_to_vertices (auto_vec<gimple *> &stmts)
+{
+  auto_vec<vertex> result;
+  for (gimple *stmt : stmts)
+    {
+      vertex v;
+      v.state = unvisited;
+      v.index = 0;
+      v.lowlink = 0;
+      v.stmt = stmt;
+
+      result.safe_push (v);
+    }
+  return result;
+}
+
+/* Part of 'tarjan_compute_sccs'.  Visit the neighbor defined by NEIGH_TREE
+   from the vertex PARENT_VXNUM, updating lowlinks and the DFS worklist.  */
+
+static void
+tarjan_visit_neighbor (tree neigh_tree, unsigned parent_vxnum,
+		       auto_vec<vertex> &vs, auto_vec<unsigned> &worklist)
+{
+  if (TREE_CODE (neigh_tree) != SSA_NAME)
+    return; /* Skip any neighbor that isn't an SSA name.  */
+
+  gimple *neigh_stmt = SSA_NAME_DEF_STMT (neigh_tree);
+
+  /* Skip neighbors outside the induced subgraph that Tarjan currently works
+     with. */
+  if (!tarjan_is_using (neigh_stmt))
+    return;
+  unsigned neigh_vxnum = tarjan_vxnum (neigh_stmt);
+
+  gcc_checking_assert (stmt_may_generate_copy (neigh_stmt));
+
+  vstate neigh_state = vs[neigh_vxnum].state;
+  vstate parent_state = vs[parent_vxnum].state;
+  if (parent_state == vopen) /* We're currently opening parent.  */
+    {
+      /* Put unvisited neighbors on worklist. Update lowlink of parent
+	 vertex according to indices of neighbors present on stack.  */
+      switch (neigh_state)
+	{
+	case unvisited:
+	  worklist.safe_push (neigh_vxnum);
+	  break;
+	case vopen:
+	case closed:
+	  vs[parent_vxnum].lowlink = std::min (vs[parent_vxnum].lowlink,
+					       vs[neigh_vxnum].index);
+	  break;
+	case in_scc:
+	  /* Ignore these edges.  */
+	  break;
+	}
+    }
+  else if (parent_state == closed) /* We're currently closing parent.  */
+    {
+      /* Update lowlink of parent vertex according to lowlinks of
+	 children of parent (in terms of DFS tree).  */
+      if (neigh_state == closed)
+	{
+	  vs[parent_vxnum].lowlink = std::min (vs[parent_vxnum].lowlink,
+					       vs[neigh_vxnum].lowlink);
+	}
+    }
+}
+
+/* Nonrecursive implementation of Tarjan's algorithm for computing strongly
+   connected components in a graph.  Statements are vertices.  An edge leads
+   from a copy stmt p to the copy stmt q it uses (p -> q when q defines an
+   operand of p).
+
+   Considers only the subgraph induced by the given statements.  */
+
+static auto_vec<vec<gimple *>>
+tarjan_compute_sccs (auto_vec<gimple *> &copy_stmts)
+{
+  auto_vec<vec<gimple *>> sccs;
+  auto_vec<unsigned> worklist; /* DFS stack.  */
+  auto_vec<unsigned> stack; /* Tarjan stack.  */
+  unsigned curr_index = 0;
+
+  auto_vec<vertex> vs = tarjan_stmts_to_vertices (copy_stmts);
+
+  /* Mark the subgraph induced by 'copy_stmts' and index it by vxnums. */
+  unsigned i;
+  for (i = 0; i < vs.length (); i++)
+    {
+      gimple *stmt = vs[i].stmt;
+      tarjan_set_using (stmt);
+      tarjan_set_vxnum (stmt, i);
+    }
+
+  /* Put all vertices on worklist.  */
+  for (i = 0; i < vs.length (); i++)
+    {
+      worklist.safe_push (i);
+    }
+
+  /* Worklist loop.  */
+  while (!worklist.is_empty ())
+    {
+      unsigned i = worklist.pop ();
+      gimple *stmt = vs[i].stmt;
+      vstate state = vs[i].state;
+
+      if (state == unvisited)
+	{
+	  vs[i].state = vopen;
+
+	  /* Assign index to this vertex.  */
+	  vs[i].index = curr_index;
+	  vs[i].lowlink = curr_index;
+	  curr_index++;
+
+	  /* Put vertex on stack and also on worklist to be closed later.  */
+	  stack.safe_push (i);
+	  worklist.safe_push (i);
+	}
+      else if (state == vopen)
+	{
+	  vs[i].state = closed;
+	}
+
+      /* Visit neighbors of this vertex.  */
+      tree op;
+      gphi *phi;
+      switch (gimple_code (stmt))
+	{
+	  case GIMPLE_PHI:
+	    phi = as_a <gphi *> (stmt);
+	    unsigned j;
+	    for (j = 0; j < gimple_phi_num_args (phi); j++)
+	      {
+		op = gimple_phi_arg_def (phi, j);
+		tarjan_visit_neighbor (op, i, vs, worklist);
+	      }
+	    break;
+	  case GIMPLE_ASSIGN:
+	    op = gimple_assign_rhs1 (stmt);
+	    tarjan_visit_neighbor (op, i, vs, worklist);
+	    break;
+	  default:
+	    gcc_unreachable ();
+	}
+
+      /* If we've just closed a root vertex of an scc, pop scc from stack.  */
+      if (state == vopen && vs[i].lowlink == vs[i].index)
+	{
+	  vec<gimple *> scc = vNULL;
+
+	  unsigned j;
+	  do
+	    {
+	      j = stack.pop ();
+	      scc.safe_push (vs[j].stmt);
+	      vs[j].state = in_scc;
+	    }
+	  while (j != i);
+
+	  sccs.safe_push (scc);
+	}
+    }
+
+  if (!stack.is_empty ())
+    gcc_unreachable ();
+
+  /* Clear copy stmts' 'using' flags.  */
+  for (vertex v : vs)
+    {
+      gimple *s = v.stmt;
+      tarjan_clear_using (s);
+    }
+
+  return sccs;
+}
+
+/* Replace each use of GET_REPLACED by REPLACE_BY and do the per-statement
+   cleanup, recording blocks that will need EH or abnormal edge cleanup.  */
+
+static void
+replace_use_by (tree get_replaced, tree replace_by, bitmap need_eh_cleanup,
+		bitmap need_ab_cleanup, vec<gimple *> stmts_to_fixup)
+{
+  /* Replace each occurrence of 'get_replaced' by 'replace_by'.  */
+  use_operand_p use_p;
+  imm_use_iterator iter;
+  gimple *use_stmt;
+  FOR_EACH_IMM_USE_STMT (use_stmt, iter, get_replaced)
+    {
+      bool was_noreturn = false;
+      bool can_make_abnormal_goto = false;
+      if (is_gimple_call (use_stmt))
+	{
+	  was_noreturn = gimple_call_noreturn_p (use_stmt);
+	  can_make_abnormal_goto = stmt_can_make_abnormal_goto (use_stmt);
+	}
+
+      FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
+	SET_USE (use_p, unshare_expr (replace_by));
+
+      /* Recompute tree invariant.  */
+      if (gimple_assign_single_p (use_stmt))
+	{
+	  tree rhs = gimple_assign_rhs1 (use_stmt);
+
+	  if (TREE_CODE (rhs) == ADDR_EXPR)
+	    recompute_tree_invariant_for_addr_expr (rhs);
+	}
+
+      /* Cleanup.  */
+      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+      fold_stmt (&gsi);
+      gimple_set_modified (gsi_stmt (gsi), true);
+      cleanup_after_replace (use_stmt, gsi_stmt (gsi), need_eh_cleanup,
+			     need_ab_cleanup, stmts_to_fixup,
+			     can_make_abnormal_goto, was_noreturn);
+    }
+}
+
+/* Modify all usages of PHIs in a given SCC to instead reference a given
+   variable.  */
+
+static void
+replace_scc_by_value (vec<gimple *> scc, tree replace_by, bitmap
+		      need_eh_cleanup, bitmap need_ab_cleanup, vec<gimple *>
+		      stmts_to_fixup)
+{
+  if (dump_file)
+    {
+      fprintf (dump_file, "Replacing SCC of length %u\n", scc.length ());
+    }
+
+  for (gimple *stmt : scc)
+    {
+      tree get_replaced = gimple_get_lhs (stmt);
+      replace_use_by (get_replaced, replace_by, need_eh_cleanup,
+		      need_ab_cleanup, stmts_to_fixup);
+    }
+}
+
+/* Remove all PHIs with zero uses.  */
+
+static void
+remove_zero_uses_phis ()
+{
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      gphi_iterator pi;
+      for (pi = gsi_start_phis (bb); !gsi_end_p (pi);)
+	{
+	  gphi *phi = pi.phi ();
+	  tree ssa_name = gimple_phi_result (phi);
+	  if (has_zero_uses (ssa_name))
+	    {
+	      /* Note that remove_phi_node() also frees SSA name.  */
+	      remove_phi_node (&pi, true);
+	    }
+	  else
+	    gsi_next (&pi);
+	}
+    }
+}
+
+/* Part of 'sccp_propagate'.  Visit operand OP of a statement from the
+   current SCC: collect operands defined outside the SCC in OUTER_OPS and
+   note whether the statement only references the SCC itself.  */
+
+static void
+sccp_visit_op (tree op, hash_set<tree> &outer_ops, hash_set<gimple *> &scc_set,
+	       bool &is_inner, tree &last_outer_op)
+{
+  bool op_in_scc = false;
+
+  if (TREE_CODE (op) == SSA_NAME)
+    {
+      gimple *op_stmt = SSA_NAME_DEF_STMT (op);
+      if (scc_set.contains (op_stmt))
+	op_in_scc = true;
+    }
+
+  if (!op_in_scc)
+    {
+      outer_ops.add (op);
+      last_outer_op = op;
+      is_inner = false;
+    }
+}
+
+/* Apply Braun et al.'s algorithm on a given set of statements. Treat copy
+   statements as PHI functions with a single argument.
+   Main function of this pass.  */
+
+static void
+sccp_propagate (auto_vec<gimple *> &copy_stmts)
+{
+  auto_vec<vec<gimple *>> worklist;
+  worklist = tarjan_compute_sccs (copy_stmts);
+
+  /* Prepare data structs for cleanup after stmt modification.  */
+  bitmap need_eh_cleanup = BITMAP_ALLOC (NULL);
+  bitmap need_ab_cleanup = BITMAP_ALLOC (NULL);
+  vec<gimple *> stmts_to_fixup = vNULL;
+
+  while (!worklist.is_empty ())
+    {
+      vec<gimple *> scc = worklist.pop ();
+
+      auto_vec<gimple *> inner;
+      hash_set<tree> outer_ops;
+      tree last_outer_op = NULL_TREE;
+
+      /* Prepare hash set of PHIs in scc to query later.  */
+      hash_set<gimple *> scc_set;
+      for (gimple *stmt : scc)
+	{
+	  scc_set.add (stmt);
+	}
+
+      for (gimple *stmt : scc)
+	{
+	  bool is_inner = true;
+
+	  gphi *phi;
+	  tree op;
+	  switch (gimple_code (stmt))
+	    {
+	      case GIMPLE_PHI:
+		phi = as_a <gphi *> (stmt);
+		unsigned j;
+		for (j = 0; j < gimple_phi_num_args (phi); j++)
+		  {
+		    op = gimple_phi_arg_def (phi, j);
+		    sccp_visit_op (op, outer_ops, scc_set, is_inner,
+				   last_outer_op);
+		  }
+		break;
+	      case GIMPLE_ASSIGN:
+		op = gimple_assign_rhs1 (stmt);
+		sccp_visit_op (op, outer_ops, scc_set, is_inner,
+			       last_outer_op);
+		break;
+	      default:
+		gcc_unreachable ();
+	    }
+
+	  if (is_inner)
+	    {
+	      inner.safe_push (stmt);
+	    }
+	}
+
+      if (outer_ops.elements () == 1)
+	{
+	  /* The only operand in outer_ops.  */
+	  tree outer_op = last_outer_op;
+
+	  replace_scc_by_value (scc, outer_op, need_eh_cleanup,
+				need_ab_cleanup, stmts_to_fixup);
+	}
+      else if (outer_ops.elements () > 1)
+	{
+	  /* Add inner sccs to worklist.  */
+	  auto_vec<vec<gimple *>> inner_sccs = tarjan_compute_sccs (inner);
+	  for (vec<gimple *> inner_scc : inner_sccs)
+	    {
+	      worklist.safe_push (inner_scc);
+	    }
+	}
+      else
+	{
+	  gcc_unreachable ();
+	}
+
+      scc.release ();
+    }
+
+  cleanup_after_all_replaces_done (need_eh_cleanup, need_ab_cleanup,
+				   stmts_to_fixup);
+
+  /* Remove data structs for cleanup after stmt modification.  */
+  BITMAP_FREE (need_eh_cleanup);
+  BITMAP_FREE (need_ab_cleanup);
+  stmts_to_fixup.release ();
+
+  /* We want to remove dead MEM PHIs because memory is in FUD SSA and the dead
+     PHIs would break the FUD property.  */
+  remove_zero_uses_phis ();
+}
+
+/* Return a vector of all statements in cfun that may generate a copy.  */
+
+static auto_vec<gimple *>
+get_all_stmt_may_generate_copy (void)
+{
+  auto_vec<gimple *> result;
+
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      gimple_stmt_iterator gsi;
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple *s = gsi_stmt (gsi);
+	  if (stmt_may_generate_copy (s))
+	    result.safe_push (s);
+	}
+
+      gphi_iterator pi;
+      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
+	{
+	  gimple *s = pi.phi ();
+	  if (stmt_may_generate_copy (s))
+	    result.safe_push (s);
+	}
+    }
+
+  return result;
+}
+
+/* Called when pass execution starts.  */
+
+static void
+init_sccp (void)
+{
+  /* Clear the 'using' flag tarjan relies on for all statements.  */
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      gimple_stmt_iterator gsi;
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple* stmt = gsi_stmt (gsi);
+	  tarjan_clear_using (stmt);
+	}
+
+      gphi_iterator pi;
+      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
+	{
+	  gimple *stmt = pi.phi ();
+	  tarjan_clear_using (stmt);
+	}
+    }
+}
+
+/* Called before pass execution ends.  */
+
+static void
+finalize_sccp (void)
+{
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      gimple_purge_dead_eh_edges (bb);
+    }
+}
+
+namespace {
+
+const pass_data pass_data_sccp =
+{
+  GIMPLE_PASS, /* type */
+  "sccp", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
+};
+
+class pass_sccp : public gimple_opt_pass
+{
+public:
+  pass_sccp (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_sccp, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *) { return true; }
+  virtual unsigned int execute (function *);
+  opt_pass * clone () final override { return new pass_sccp (m_ctxt); }
+}; // class pass_sccp
+
+unsigned
+pass_sccp::execute (function *)
+{
+  if (dump_file)
+    {
+      int phis = 0;
+      basic_block bb;
+      FOR_EACH_BB_FN (bb, cfun)
+	{
+	  gphi_iterator pi;
+	  for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
+	    phis++;
+	}
+      fprintf (dump_file, "Started with %d PHIs\n", phis);
+    }
+
+  init_sccp ();
+  auto_vec<gimple *> stmts = get_all_stmt_may_generate_copy ();
+  sccp_propagate (stmts);
+  finalize_sccp ();
+
+  if (dump_file)
+    {
+      int phis = 0;
+      basic_block bb;
+      FOR_EACH_BB_FN (bb, cfun)
+	{
+	  gphi_iterator pi;
+	  for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
+	    phis++;
+	}
+      fprintf (dump_file, "Ended with %d PHIs\n", phis);
+    }
+
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_sccp (gcc::context *ctxt)
+{
+  return new pass_sccp (ctxt);
+}
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 57865cdfc42..ac1d8848c25 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -398,6 +398,7 @@ extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_sccp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-propagate.cc b/gcc/tree-ssa-propagate.cc
index cb68b419b8c..f70240dcd2f 100644
--- a/gcc/tree-ssa-propagate.cc
+++ b/gcc/tree-ssa-propagate.cc
@@ -1297,3 +1297,65 @@ clean_up_loop_closed_phi (function *fun)
 
   return 0;
 }
+
+/* After replacing the uses in OLD_STMT, which has become STMT, update GCC's
+   internal data structures.  Record into NEED_EH_CLEANUP, NEED_AB_CLEANUP
+   and STMTS_TO_FIXUP the cleanup work left for the end of the pass.  */
+
+void
+cleanup_after_replace (gimple *old_stmt, gimple *stmt, bitmap need_eh_cleanup,
+		       bitmap need_ab_cleanup, vec<gimple *> stmts_to_fixup,
+		       bool can_make_abnormal_goto, bool was_noreturn)
+{
+  basic_block bb = gimple_bb (stmt);
+
+  /* If we cleaned up EH information from the statement,
+     remove EH edges.  */
+  if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
+    bitmap_set_bit (need_eh_cleanup, bb->index);
+
+  /* If we turned a call with possible abnormal control transfer
+     into one that doesn't, remove abnormal edges.  */
+  if (can_make_abnormal_goto
+      && !stmt_can_make_abnormal_goto (stmt))
+    bitmap_set_bit (need_ab_cleanup, bb->index);
+
+  /* If we turned a not noreturn call into a noreturn one
+     schedule it for fixup.  */
+  if (!was_noreturn
+      && is_gimple_call (stmt)
+      && gimple_call_noreturn_p (stmt))
+    stmts_to_fixup.safe_push (stmt);
+
+  if (gimple_assign_single_p (stmt))
+    {
+      tree rhs = gimple_assign_rhs1 (stmt);
+
+      if (TREE_CODE (rhs) == ADDR_EXPR)
+	recompute_tree_invariant_for_addr_expr (rhs);
+    }
+
+  update_stmt_if_modified (stmt);
+}
+
+/* Perform the EH and abnormal edge cleanup and the noreturn call fixups
+   recorded by calls to 'cleanup_after_replace'.  Call this before a pass
+   that replaced one or more SSA names exits.  */
+
+void
+cleanup_after_all_replaces_done (bitmap need_eh_cleanup, bitmap
+				 need_ab_cleanup, vec<gimple *> stmts_to_fixup)
+{
+  if (!bitmap_empty_p (need_eh_cleanup))
+    gimple_purge_all_dead_eh_edges (need_eh_cleanup);
+  if (!bitmap_empty_p (need_ab_cleanup))
+    gimple_purge_all_dead_abnormal_call_edges (need_ab_cleanup);
+
+  /* Fixup stmts that became noreturn calls.  This may require splitting
+     blocks and thus isn't possible during the dominator walk.  Do this
+     in reverse order so we don't inadvertently remove a stmt we want to
+     fixup by visiting a dominating now noreturn call first.  */
+  while (!stmts_to_fixup.is_empty ())
+    {
+      gimple *stmt = stmts_to_fixup.pop ();
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Fixing up noreturn call ");
+	  print_gimple_stmt (dump_file, stmt, 0);
+	  fprintf (dump_file, "\n");
+	}
+      fixup_noreturn_call (stmt);
+    }
+}
diff --git a/gcc/tree-ssa-propagate.h b/gcc/tree-ssa-propagate.h
index 29bde37add9..a645baeacf4 100644
--- a/gcc/tree-ssa-propagate.h
+++ b/gcc/tree-ssa-propagate.h
@@ -72,6 +72,13 @@ extern void propagate_value (use_operand_p, tree);
 extern void replace_exp (use_operand_p, tree);
 extern void propagate_tree_value (tree *, tree);
 extern void propagate_tree_value_into_stmt (gimple_stmt_iterator *, tree);
+extern void cleanup_after_replace (gimple *old_stmt, gimple *stmt, bitmap
+				   need_eh_cleanup, bitmap need_ab_cleanup,
+				   vec<gimple *> stmts_to_fixup, bool
+				   can_make_abnormal_goto, bool was_noreturn);
+extern void cleanup_after_all_replaces_done (bitmap need_eh_cleanup, bitmap
+					     need_ab_cleanup, vec<gimple *>
+					     stmts_to_fixup);
 
 /* Public interface into the SSA propagation engine.  Clients should inherit
    from this class and provide their own visitors.  */
-- 
2.41.0



* Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
  2023-08-24 15:07 [RFC] gimple ssa: SCCP - A new PHI optimization pass Filip Kastl
@ 2023-08-24 15:47 ` Richard Biener
  2023-08-24 15:54   ` Jakub Jelinek
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2023-08-24 15:47 UTC (permalink / raw)
  To: fkastl; +Cc: Gcc-patches, mjambor, hubicka



> On 24.08.2023 at 17:07, Filip Kastl <fkastl@suse.cz> wrote:
> 
> Hi,
> 
> As a part of my bachelor thesis under the supervision of Honza (Jan Hubicka), I
> implemented a new PHI elimination algorithm into GCC. The algorithm is
> described in the following article:
> 
> Braun, M., Buchwald, S., Hack, S., Leißa, R., Mallon, C., Zwinkau, A.
> (2013). Simple and Efficient Construction of Static Single Assignment
> Form. In: Jhala, R., De Bosschere, K. (eds) Compiler Construction. CC
> 2013. Lecture Notes in Computer Science, vol 7791. Springer, Berlin,
> Heidelberg. https://doi.org/10.1007/978-3-642-37051-9_6
> 
> In the article the PHI elimination algorithm is only considered a part of
> another algorithm. However, with Honza we tried running the algorithm as a
> standalone pass and found that there is a surprisingly big number of PHI
> functions it is able to remove -- sometimes even ~13% of PHI functions or more.
> This email contains a patch with the pass and with the placement in passes.def
> we used to measure this.
> 
> Do you think that the pass is worthy of inclusion into upstream GCC? What are
> some things that I should change? Should I try to put the pass in different
> places in passes.def?

The most obvious places would be right after SSA construction and before RTL expansion.  Can you provide measurements for those positions?  Can the pass somehow be used as part of propagations like during value numbering?

Richard 

> Things I already know I'd like to change:
> - Split the patch into two (one for sccp, one for the utility functions)
> - Change the name SCCP to something else since there already is a pass with
>  that name (any suggestions?)
> - Add a comment into sccp.cc explaining the algorithm
> 
> I successfully bootstrapped and tested GCC with the patch applied (with the
> commit 3b691e0190c6e7291f8a52e1e14d8293a28ff4ce checked out). 
> 
> Here are my measurements. I measured the number of PHIs before the PHI
> elimination algorithm was run and after it was run. I measured on the standard
> 2017 benchmarks with -O3. Since the pass is present in passes.def twice,
> results of the first run are marked (1) and results of the second are marked
> (2). Honza also did measurements with profile feedback and got even bigger
> percentages.
> 
> 500.perlbench_r
> Started with (1) 30287
> Ended with (1) 26188
> Removed PHI % (1) 13.53385941162875161000
> Started with (2) 38005
> Ended with (2) 37897
> Removed PHI % (2) .28417313511380081600
> 
> 502.gcc_r
> Started with (1) 148187
> Ended with (1) 140292
> Removed PHI % (1) 5.32772780338356266100
> Started with (2) 211479
> Ended with (2) 210635
> Removed PHI % (2) .39909399987705635100
> 
> 505.mcf_r
> Started with (1) 341
> Ended with (1) 303
> Removed PHI % (1) 11.14369501466275659900
> Started with (2) 430
> Ended with (2) 426
> Removed PHI % (2) .93023255813953488400
> 
> 523.xalancbmk_r
> Started with (1) 62514
> Ended with (1) 57785
> Removed PHI % (1) 7.56470550596666346800
> Started with (2) 132561
> Ended with (2) 131726
> Removed PHI % (2) .62989868815111533600
> 
> 531.deepsjeng_r
> Started with (1) 1388
> Ended with (1) 1250
> Removed PHI % (1) 9.94236311239193083600
> Started with (2) 1887
> Ended with (2) 1879
> Removed PHI % (2) .42395336512983571900
> 
> 541.leela_r
> Started with (1) 3332
> Ended with (1) 2994
> Removed PHI % (1) 10.14405762304921968800
> Started with (2) 4372
> Ended with (2) 4352
> Removed PHI % (2) .45745654162854528900
> 
> Here is the patch:
> 
> -- >8 --
> 
> This patch introduces two things:
> - A new PHI elimination pass (major)
> - New utility functions for passes that replace one ssa name with a
>  different one (minor)
> 
> Strongly-connected copy propagation (SCCP) pass
> 
> The PHI elimination pass is a lightweight optimization pass based on
> strongly-connected components. Some set of PHIs may be redundant because
> the PHIs only refer to each other or to a single value from outside the
> set. This pass finds and eliminates these sets. As a bonus the pass also
> does some basic copy propagation because it considers a copy statement
> to be a PHI with a single argument.
> 
> SCCP uses an algorithm from this article:
> Braun, M., Buchwald, S., Hack, S., Leißa, R., Mallon, C., Zwinkau, A.
> (2013). Simple and Efficient Construction of Static Single Assignment
> Form. In: Jhala, R., De Bosschere, K. (eds) Compiler Construction. CC
> 2013. Lecture Notes in Computer Science, vol 7791. Springer, Berlin,
> Heidelberg. https://doi.org/10.1007/978-3-642-37051-9_6
> 
> cleanup_after_replace and cleanup_after_all_replaces_done
> 
> Whenever you replace all uses of an ssa name by a different ssa name,
> some GCC internal structures have to be updated. To streamline this
> process, the patch adds the cleanup_after_replace function that should
> be called after an ssa name is replaced by a different one and the
> cleanup_after_all_replaces_done that should be called before a pass that
> replaced one or more ssa names exits. The SCCP pass uses these
> functions.
> 
> Signed-off-by: Filip Kastl <fkastl@suse.cz>
> 
> gcc/ChangeLog:
> 
>    * Makefile.in: Added sccp pass.
>    * passes.def: Added sccp pass to early and late optimizations.
>    * tree-pass.h (make_pass_sccp): Added sccp pass.
>    * tree-ssa-propagate.cc (cleanup_after_replace): New function.
>    (cleanup_after_all_replaces_done): New function.
>    * tree-ssa-propagate.h (cleanup_after_replace): New function.
>    (cleanup_after_all_replaces_done): New function.
>    * sccp.cc: New file.
> ---
> gcc/Makefile.in           |   1 +
> gcc/passes.def            |   2 +
> gcc/sccp.cc               | 722 ++++++++++++++++++++++++++++++++++++++
> gcc/tree-pass.h           |   1 +
> gcc/tree-ssa-propagate.cc |  62 ++++
> gcc/tree-ssa-propagate.h  |   7 +
> 6 files changed, 795 insertions(+)
> create mode 100644 gcc/sccp.cc
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 78779546459..7c34eb6ceda 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1706,6 +1706,7 @@ OBJS = \
>    tree-ssa-forwprop.o \
>    tree-ssa-ifcombine.o \
>    tree-ssa-live.o \
> +    sccp.o \
>    tree-ssa-loop-ch.o \
>    tree-ssa-loop-im.o \
>    tree-ssa-loop-ivcanon.o \
> diff --git a/gcc/passes.def b/gcc/passes.def
> index ef5a21afe49..4c94f9cec8b 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -96,6 +96,7 @@ along with GCC; see the file COPYING3.  If not see
>           NEXT_PASS (pass_dse);
>      NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
>      NEXT_PASS (pass_phiopt, true /* early_p */);
> +      NEXT_PASS (pass_sccp);
>      NEXT_PASS (pass_tail_recursion);
>      NEXT_PASS (pass_if_to_switch);
>      NEXT_PASS (pass_convert_switch);
> @@ -271,6 +272,7 @@ along with GCC; see the file COPYING3.  If not see
>       NEXT_PASS (pass_tsan);
>       NEXT_PASS (pass_dse, true /* use DR analysis */);
>       NEXT_PASS (pass_dce);
> +      NEXT_PASS (pass_sccp);
>       /* Pass group that runs when 1) enabled, 2) there are loops
>     in the function.  Make sure to run pass_fix_loops before
>     to discover/remove loops before running the gate function
> diff --git a/gcc/sccp.cc b/gcc/sccp.cc
> new file mode 100644
> index 00000000000..86a79303902
> --- /dev/null
> +++ b/gcc/sccp.cc
> @@ -0,0 +1,722 @@
> +/* Strongly connected copy propagation pass for the GNU compiler.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   Contributed by Filip Kastl <filip.kastl@gmail.com>
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "gimple-iterator.h"
> +#include "vec.h"
> +#include "hash-set.h"
> +#include <algorithm>
> +#include "ssa-iterators.h"
> +#include "gimple-fold.h"
> +#include "gimplify.h"
> +#include "tree-cfg.h"
> +#include "tree-ssa-propagate.h"
> +
> +/* State of vertex during tarjan computation.
> +
> +   unvisited  Vertex hasn't yet been popped from worklist.
> +   vopen      DFS has visited vertex for the first time. Vertex has been put on
> +          Tarjan stack.
> +   closed     DFS has backtracked through vertex. At this point, vertex doesn't
> +          have any unvisited neighbors.
> +   in_scc     Vertex has been popped from tarjan stack.  */
> +
> +enum vstate
> +{
> +  unvisited,
> +  vopen,
> +  closed,
> +  in_scc
> +};
> +
> +/* Information about a vertex used by tarjan.  */
> +
> +struct vertex
> +{
> +  vstate state;
> +  unsigned index;
> +  unsigned lowlink;
> +  gimple* stmt;
> +};
> +
> +static bool
> +stmt_may_generate_copy (gimple *stmt)
> +{
> +  if (gimple_code (stmt) == GIMPLE_PHI)
> +    {
> +      gphi *phi = as_a <gphi *> (stmt);
> +
> +      /* No OCCURS_IN_ABNORMAL_PHI SSA names in lhs nor rhs.  */
> +      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (phi)))
> +    {
> +      return false;
> +    }
> +
> +      unsigned i;
> +      for (i = 0; i < gimple_phi_num_args (phi); i++)
> +    {
> +      tree op = gimple_phi_arg_def (phi, i);
> +      if (TREE_CODE (op) == SSA_NAME
> +          && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
> +        {
> +          return false;
> +        }
> +    }
> +
> +      return true;
> +    }
> +
> +  if (gimple_code (stmt) != GIMPLE_ASSIGN)
> +    {
> +      return false;
> +    }
> +
> +  /* If the statement has volatile operands, it won't generate a
> +     useful copy.  */
> +  if (gimple_has_volatile_ops (stmt))
> +    {
> +      return false;
> +    }
> +
> +  /* Statements with loads and/or stores will never generate a useful copy.  */
> +  if (gimple_store_p (stmt) || gimple_assign_load_p (stmt))
> +    {
> +      return false;
> +    }
> +
> +  if (!gimple_assign_single_p (stmt))
> +    {
> +      return false;
> +    }
> +
> +  tree lhs = gimple_assign_lhs (stmt);
> +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
> +    {
> +      return false;
> +    }
> +
> +  /* If the assignment is from a constant it generates a useful copy.  */
> +  if (is_gimple_min_invariant (gimple_assign_rhs1 (stmt)))
> +    {
> +      return true;
> +    }
> +
> +  /* Otherwise, the only statements that generate useful copies are
> +     assignments whose single SSA use doesn't flow through abnormal
> +     edges.  */
> +  tree rhs = single_ssa_tree_operand (stmt, SSA_OP_USE);
> +
> +  if (!is_gimple_val (gimple_assign_rhs1 (stmt)))
> +    {
> +      return false;
> +    }
> +
> +  if (!rhs)
> +    {
> +      return false;
> +    }
> +
> +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs))
> +    {
> +      return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Set 'using' flag of gimple statement to true.
> +   If the flag isn't set, tarjan will ignore the statement.  */
> +
> +static void
> +tarjan_set_using (gimple* stmt)
> +{
> +  gimple_set_plf (stmt, GF_PLF_1, true);
> +}
> +
> +/* Clear 'using' flag of gimple statement to false.  */
> +
> +static void
> +tarjan_clear_using (gimple* stmt)
> +{
> +  gimple_set_plf (stmt, GF_PLF_1, false);
> +}
> +
> +/* Return value of 'using' flag of gimple statement.  */
> +
> +static bool
> +tarjan_is_using (gimple* stmt)
> +{
> +  return gimple_plf (stmt, GF_PLF_1);
> +}
> +
> +/* Set 'vxnum' (vertex number) of statement. Before computing SCCs, tarjan
> +   assigns unique vxnums to statements that it will use.  */
> +
> +static void
> +tarjan_set_vxnum (gimple* stmt, unsigned num)
> +{
> +  gimple_set_uid (stmt, num);
> +}
> +
> +/* Return 'vxnum' (vertex number) of statement.  */
> +
> +static unsigned
> +tarjan_vxnum (gimple* stmt)
> +{
> +  return gimple_uid (stmt);
> +}
> +
> +/* Create and initialize vertex struct for each given statement.  */
> +
> +static auto_vec<vertex>
> +tarjan_stmts_to_vertices (auto_vec<gimple *> &stmts)
> +{
> +  auto_vec<vertex> result;
> +  for (gimple *stmt : stmts)
> +    {
> +      vertex v;
> +      v.state = unvisited;
> +      v.index = 0;
> +      v.lowlink = 0;
> +      v.stmt = stmt;
> +
> +      result.safe_push (v);
> +    }
> +  return result;
> +}
> +
> +static void
> +tarjan_visit_neighbor (tree neigh_tree, unsigned parent_vxnum,
> +               auto_vec<vertex> &vs, auto_vec<unsigned> &worklist)
> +{
> +  if (TREE_CODE (neigh_tree) != SSA_NAME)
> +    return; /* Skip any neighbor that isn't an SSA name.  */
> +
> +  gimple *neigh_stmt = SSA_NAME_DEF_STMT (neigh_tree);
> +
> +  /* Skip neighbors outside the induced subgraph that Tarjan currently works
> +     with. */
> +  if (!tarjan_is_using (neigh_stmt))
> +    return;
> +  unsigned neigh_vxnum = tarjan_vxnum (neigh_stmt);
> +
> +  gcc_checking_assert (stmt_may_generate_copy (neigh_stmt));
> +
> +  vstate neigh_state = vs[neigh_vxnum].state;
> +  vstate parent_state = vs[parent_vxnum].state;
> +  if (parent_state == vopen) /* We're currently opening parent.  */
> +    {
> +      /* Put unvisited neighbors on worklist. Update lowlink of parent
> +     vertex according to indices of neighbors present on stack.  */
> +      switch (neigh_state)
> +    {
> +    case unvisited:
> +      worklist.safe_push (neigh_vxnum);
> +      break;
> +    case vopen:
> +    case closed:
> +      vs[parent_vxnum].lowlink = std::min (vs[parent_vxnum].lowlink,
> +                           vs[neigh_vxnum].index);
> +      break;
> +    case in_scc:
> +      /* Ignore these edges.  */
> +      break;
> +    }
> +    }
> +  else if (parent_state == closed) /* We're currently closing parent.  */
> +    {
> +      /* Update lowlink of parent vertex according to lowlinks of
> +     children of parent (in terms of DFS tree).  */
> +      if (neigh_state == closed)
> +    {
> +      vs[parent_vxnum].lowlink = std::min (vs[parent_vxnum].lowlink,
> +                           vs[neigh_vxnum].lowlink);
> +    }
> +    }
> +}
> +
> +/* Nonrecursive implementation of Tarjan's algorithm for computing strongly
> +   connected components in a graph. Statements are vertices. Edges lead from a
> +   copy stmt p using another copy stmt q to the stmt being used (p -> q when q
> +   is operand of p).
> +
> +   Considers only the subgraph induced by given statements. */
> +
> +static auto_vec<vec<gimple *>>
> +tarjan_compute_sccs (auto_vec<gimple *> &copy_stmts)
> +{
> +  auto_vec<vec<gimple *>> sccs;
> +  auto_vec<unsigned> worklist; /* DFS stack.  */
> +  auto_vec<unsigned> stack; /* Tarjan stack.  */
> +  unsigned curr_index = 0;
> +
> +  auto_vec<vertex> vs = tarjan_stmts_to_vertices (copy_stmts);
> +
> +  /* Mark the subgraph induced by 'copy_stmts' and index it by vxnums. */
> +  unsigned i;
> +  for (i = 0; i < vs.length (); i++)
> +    {
> +      gimple *stmt = vs[i].stmt;
> +      tarjan_set_using (stmt);
> +      tarjan_set_vxnum (stmt, i);
> +    }
> +
> +  /* Put all vertices on worklist.  */
> +  for (i = 0; i < vs.length (); i++)
> +    {
> +      worklist.safe_push (i);
> +    }
> +
> +  /* Worklist loop.  */
> +  while (!worklist.is_empty ())
> +    {
> +      unsigned i = worklist.pop();
> +      gimple *stmt = vs[i].stmt;
> +      vstate state = vs[i].state;
> +
> +      if (state == unvisited)
> +    {
> +      vs[i].state = vopen;
> +
> +      /* Assign index to this vertex.  */
> +      vs[i].index = curr_index;
> +      vs[i].lowlink = curr_index;
> +      curr_index++;
> +
> +      /* Put vertex on stack and also on worklist to be closed later.  */
> +      stack.safe_push (i);
> +      worklist.safe_push (i);
> +    }
> +      else if (state == vopen)
> +    {
> +      vs[i].state = closed;
> +    }
> +
> +      /* Visit neighbors of this vertex.  */
> +      tree op;
> +      gphi *phi;
> +      switch (gimple_code (stmt))
> +    {
> +      case GIMPLE_PHI:
> +        phi = as_a <gphi *> (stmt);
> +        unsigned j;
> +        for (j = 0; j < gimple_phi_num_args (phi); j++)
> +          {
> +        op = gimple_phi_arg_def (phi, j);
> +        tarjan_visit_neighbor (op, i, vs, worklist);
> +          }
> +        break;
> +      case GIMPLE_ASSIGN:
> +        op = gimple_assign_rhs1 (stmt);
> +        tarjan_visit_neighbor (op, i, vs, worklist);
> +        break;
> +      default:
> +        gcc_unreachable ();
> +    }
> +
> +      /* If we've just closed a root vertex of an scc, pop scc from stack.  */
> +      if (state == vopen && vs[i].lowlink == vs[i].index)
> +    {
> +      vec<gimple *> scc = vNULL;
> +
> +      unsigned j;
> +      do
> +        {
> +          j = stack.pop();
> +          scc.safe_push (vs[j].stmt);
> +          vs[j].state = in_scc;
> +        }
> +      while (j != i);
> +
> +      sccs.safe_push (scc);
> +    }
> +    }
> +
> +  if (!stack.is_empty ())
> +  {
> +    gcc_unreachable();
> +  }
> +
> +  /* Clear copy stmts' 'using' flags.  */
> +  for (vertex v : vs)
> +    {
> +      gimple *s = v.stmt;
> +      tarjan_clear_using (s);
> +    }
> +
> +  return sccs;
> +}
> +
> +static void
> +replace_use_by (tree get_replaced, tree replace_by, bitmap need_eh_cleanup,
> +        bitmap need_ab_cleanup, vec<gimple *> stmts_to_fixup)
> +{
> +  /* Replace each occurrence of 'get_replaced' by 'replace_by'.  */
> +  use_operand_p use_p;
> +  imm_use_iterator iter;
> +  gimple *use_stmt;
> +  FOR_EACH_IMM_USE_STMT (use_stmt, iter, get_replaced)
> +    {
> +      bool was_noreturn = false;
> +      bool can_make_abnormal_goto = false;
> +      if (is_gimple_call (use_stmt))
> +    {
> +      was_noreturn = gimple_call_noreturn_p (use_stmt);
> +      can_make_abnormal_goto = stmt_can_make_abnormal_goto (use_stmt);
> +    }
> +
> +      FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> +    SET_USE (use_p, unshare_expr (replace_by));
> +
> +      /* Recompute tree invariant.  */
> +      if (gimple_assign_single_p (use_stmt))
> +    {
> +      tree rhs = gimple_assign_rhs1 (use_stmt);
> +
> +      if (TREE_CODE (rhs) == ADDR_EXPR)
> +        recompute_tree_invariant_for_addr_expr (rhs);
> +    }
> +
> +      /* Cleanup.  */
> +      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> +      fold_stmt (&gsi);
> +      gimple_set_modified (gsi_stmt (gsi), true);
> +      cleanup_after_replace (use_stmt, gsi_stmt (gsi), need_eh_cleanup,
> +                 need_ab_cleanup, stmts_to_fixup,
> +                 can_make_abnormal_goto, was_noreturn);
> +    }
> +}
> +
> +/* Modify all usages of PHIs in a given SCC to instead reference a given
> +   variable.  */
> +
> +static void
> +replace_scc_by_value (vec<gimple *> scc, tree replace_by, bitmap
> +              need_eh_cleanup, bitmap need_ab_cleanup, vec<gimple *>
> +              stmts_to_fixup)
> +{
> +  // DEBUG
> +  if (dump_file)
> +    {
> +      fprintf (dump_file, "Replacing SCC of length %u\n", scc.length ());
> +    }
> +
> +  for (gimple *stmt : scc)
> +    {
> +      tree get_replaced = gimple_get_lhs (stmt);
> +      replace_use_by (get_replaced, replace_by, need_eh_cleanup,
> +              need_ab_cleanup, stmts_to_fixup);
> +    }
> +}
> +
> +/* Remove all PHIs with zero uses.  */
> +
> +static void
> +remove_zero_uses_phis ()
> +{
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      gphi_iterator pi;
> +      for (pi = gsi_start_phis (bb); !gsi_end_p (pi);)
> +    {
> +      gphi *phi = pi.phi ();
> +      tree ssa_name = gimple_phi_result (phi);
> +      if (has_zero_uses (ssa_name))
> +        {
> +          /* Note that remove_phi_node() also frees SSA name.  */
> +          remove_phi_node (&pi, true);
> +        }
> +      else
> +        gsi_next (&pi);
> +    }
> +    }
> +}
> +
> +static void
> +sccp_visit_op (tree op, hash_set<tree> &outer_ops, hash_set<gimple *> &scc_set,
> +           bool &is_inner, tree &last_outer_op)
> +{
> +  bool op_in_scc = false;
> +
> +  if (TREE_CODE (op) == SSA_NAME)
> +    {
> +      gimple *op_stmt = SSA_NAME_DEF_STMT (op);
> +      if (scc_set.contains (op_stmt))
> +    op_in_scc = true;
> +    }
> +
> +  if (!op_in_scc)
> +    {
> +      outer_ops.add (op);
> +      last_outer_op = op;
> +      is_inner = false;
> +    }
> +}
> +
> +/* Apply Braun et al.'s algorithm on a given set of statements. Treat copy
> +   statements as PHI functions with a single argument.
> +   Main function of this pass.  */
> +
> +static void
> +sccp_propagate (auto_vec<gimple *> &copy_stmts)
> +{
> +  auto_vec<vec<gimple *>> worklist;
> +  worklist = tarjan_compute_sccs (copy_stmts);
> +
> +  /* Prepare data structs for cleanup after stmt modification.  */
> +  bitmap need_eh_cleanup = BITMAP_ALLOC (NULL);
> +  bitmap need_ab_cleanup = BITMAP_ALLOC (NULL);
> +  vec<gimple *> stmts_to_fixup = vNULL;
> +
> +  while (!worklist.is_empty ())
> +    {
> +      vec<gimple *> scc = worklist.pop ();
> +
> +      auto_vec<gimple *> inner;
> +      hash_set<tree> outer_ops;
> +      tree last_outer_op = NULL_TREE;
> +
> +      /* Prepare hash set of PHIs in scc to query later.  */
> +      hash_set<gimple *> scc_set;
> +      for (gimple *stmt : scc)
> +    {
> +      scc_set.add (stmt);
> +    }
> +
> +      for (gimple *stmt : scc)
> +    {
> +      bool is_inner = true;
> +
> +      gphi *phi;
> +      tree op;
> +      switch (gimple_code (stmt))
> +        {
> +          case GIMPLE_PHI:
> +        phi = as_a <gphi *> (stmt);
> +        unsigned j;
> +        for (j = 0; j < gimple_phi_num_args (phi); j++)
> +          {
> +            op = gimple_phi_arg_def (phi, j);
> +            sccp_visit_op (op, outer_ops, scc_set, is_inner,
> +                   last_outer_op);
> +          }
> +        break;
> +          case GIMPLE_ASSIGN:
> +        op = gimple_assign_rhs1 (stmt);
> +        sccp_visit_op (op, outer_ops, scc_set, is_inner,
> +                   last_outer_op);
> +        break;
> +          default:
> +        gcc_unreachable ();
> +        }
> +
> +      if (is_inner)
> +        {
> +          inner.safe_push (stmt);
> +        }
> +    }
> +
> +      if (outer_ops.elements () == 1)
> +    {
> +      /* The only operand in outer_ops.  */
> +      tree outer_op = last_outer_op;
> +
> +      replace_scc_by_value (scc, outer_op, need_eh_cleanup,
> +                need_ab_cleanup, stmts_to_fixup);
> +    }
> +      else if (outer_ops.elements () > 1)
> +    {
> +      /* Add inner sccs to worklist.  */
> +      auto_vec<vec<gimple *>> inner_sccs = tarjan_compute_sccs (inner);
> +      for (vec<gimple *> inner_scc : inner_sccs)
> +        {
> +          worklist.safe_push (inner_scc);
> +        }
> +    }
> +      else
> +    {
> +      gcc_unreachable ();
> +    }
> +
> +      scc.release ();
> +    }
> +
> +  cleanup_after_all_replaces_done (need_eh_cleanup, need_ab_cleanup,
> +                   stmts_to_fixup);
> +
> +  /* Remove data structs for cleanup after stmt modification.  */
> +  BITMAP_FREE (need_eh_cleanup);
> +  BITMAP_FREE (need_ab_cleanup);
> +  stmts_to_fixup.release ();
> +
> +  /* We want to remove dead MEM PHIs because memory is in FUD SSA and the dead
> +     PHIs would break the FUD property.  */
> +  remove_zero_uses_phis ();
> +}
> +
> +/* Return all statements in cfun that may be useful.  */
> +
> +static auto_vec<gimple *>
> +get_all_stmt_may_generate_copy (void)
> +{
> +  auto_vec<gimple *> result;
> +
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      gimple_stmt_iterator gsi;
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gimple *s = gsi_stmt (gsi);
> +      if (stmt_may_generate_copy (s))
> +        result.safe_push (s);
> +    }
> +
> +      gphi_iterator pi;
> +      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
> +    {
> +      gimple *s = pi.phi ();
> +      if (stmt_may_generate_copy (s))
> +        result.safe_push (s);
> +    }
> +    }
> +
> +  return result;
> +}
> +
> +/* Called when pass execution starts.  */
> +
> +static void
> +init_sccp (void)
> +{
> +  // Clear 'tarjan using' flag on all statements
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      gimple_stmt_iterator gsi;
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gimple* stmt = gsi_stmt (gsi);
> +      tarjan_clear_using (stmt);
> +    }
> +
> +      gphi_iterator pi;
> +      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
> +    {
> +      gimple *stmt = pi.phi ();
> +      tarjan_clear_using (stmt);
> +    }
> +    }
> +}
> +
> +/* Called before pass execution ends.  */
> +
> +static void
> +finalize_sccp (void)
> +{
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      gimple_purge_dead_eh_edges (bb);
> +    }
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_sccp =
> +{
> +  GIMPLE_PASS, /* type */
> +  "sccp", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  ( PROP_cfg | PROP_ssa ), /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
> +};
> +
> +class pass_sccp : public gimple_opt_pass
> +{
> +public:
> +  pass_sccp (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_sccp, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return true; }
> +  virtual unsigned int execute (function *);
> +  opt_pass * clone () final override { return new pass_sccp (m_ctxt); }
> +}; // class pass_sccp
> +
> +unsigned
> +pass_sccp::execute (function *)
> +{
> +  // DEBUG
> +  if (dump_file)
> +    {
> +      int phis = 0;
> +      basic_block bb;
> +      FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      gphi_iterator pi;
> +      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
> +        phis++;
> +    }
> +      fprintf (dump_file, "Started with %d PHIs\n", phis);
> +    }
> +
> +  init_sccp ();
> +  auto_vec<gimple *> stmts = get_all_stmt_may_generate_copy ();
> +  sccp_propagate (stmts);
> +  finalize_sccp ();
> +
> +  // DEBUG
> +  if (dump_file)
> +    {
> +      int phis = 0;
> +      basic_block bb;
> +      FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      gphi_iterator pi;
> +      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
> +        phis++;
> +    }
> +      fprintf (dump_file, "Ended with %d PHIs\n", phis);
> +    }
> +
> +  return 0;
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_sccp (gcc::context *ctxt)
> +{
> +  return new pass_sccp (ctxt);
> +}
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 57865cdfc42..ac1d8848c25 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -398,6 +398,7 @@ extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_sccp (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
> diff --git a/gcc/tree-ssa-propagate.cc b/gcc/tree-ssa-propagate.cc
> index cb68b419b8c..f70240dcd2f 100644
> --- a/gcc/tree-ssa-propagate.cc
> +++ b/gcc/tree-ssa-propagate.cc
> @@ -1297,3 +1297,65 @@ clean_up_loop_closed_phi (function *fun)
> 
>   return 0;
> }
> +
> +void
> +cleanup_after_replace (gimple *old_stmt, gimple *stmt, bitmap need_eh_cleanup,
> +               bitmap need_ab_cleanup, vec<gimple *> stmts_to_fixup,
> +               bool can_make_abnormal_goto, bool was_noreturn)
> +{
> +  basic_block bb = stmt->bb;
> +
> +  /* If we cleaned up EH information from the statement,
> +     remove EH edges.  */
> +  if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
> +    bitmap_set_bit (need_eh_cleanup, bb->index);
> +
> +  /* If we turned a call with possible abnormal control transfer
> +     into one that doesn't, remove abnormal edges.  */
> +  if (can_make_abnormal_goto
> +      && !stmt_can_make_abnormal_goto (stmt))
> +    bitmap_set_bit (need_ab_cleanup, bb->index);
> +
> +  /* If we turned a not noreturn call into a noreturn one
> +     schedule it for fixup.  */
> +  if (!was_noreturn
> +      && is_gimple_call (stmt)
> +      && gimple_call_noreturn_p (stmt))
> +    stmts_to_fixup.safe_push (stmt);
> +
> +  if (gimple_assign_single_p (stmt))
> +    {
> +      tree rhs = gimple_assign_rhs1 (stmt);
> +
> +      if (TREE_CODE (rhs) == ADDR_EXPR)
> +    recompute_tree_invariant_for_addr_expr (rhs);
> +    }
> +
> +  update_stmt_if_modified (stmt);
> +}
> +
> +void
> +cleanup_after_all_replaces_done (bitmap need_eh_cleanup, bitmap
> +                 need_ab_cleanup, vec<gimple *> stmts_to_fixup)
> +{
> +  if (!bitmap_empty_p (need_eh_cleanup))
> +    gimple_purge_all_dead_eh_edges (need_eh_cleanup);
> +  if (!bitmap_empty_p (need_ab_cleanup))
> +    gimple_purge_all_dead_abnormal_call_edges (need_ab_cleanup);
> +
> +  /* Fixup stmts that became noreturn calls.  This may require splitting
> +     blocks and thus isn't possible during the dominator walk.  Do this
> +     in reverse order so we don't inadvertently remove a stmt we want to
> +     fixup by visiting a dominating now noreturn call first.  */
> +  while (!stmts_to_fixup.is_empty ())
> +    {
> +      gimple *stmt = stmts_to_fixup.pop ();
> +      if (dump_file && dump_flags & TDF_DETAILS)
> +    {
> +      fprintf (dump_file, "Fixing up noreturn call ");
> +      print_gimple_stmt (dump_file, stmt, 0);
> +      fprintf (dump_file, "\n");
> +    }
> +      fixup_noreturn_call (stmt);
> +    }
> +}
> diff --git a/gcc/tree-ssa-propagate.h b/gcc/tree-ssa-propagate.h
> index 29bde37add9..a645baeacf4 100644
> --- a/gcc/tree-ssa-propagate.h
> +++ b/gcc/tree-ssa-propagate.h
> @@ -72,6 +72,13 @@ extern void propagate_value (use_operand_p, tree);
> extern void replace_exp (use_operand_p, tree);
> extern void propagate_tree_value (tree *, tree);
> extern void propagate_tree_value_into_stmt (gimple_stmt_iterator *, tree);
> +extern void cleanup_after_replace (gimple *old_stmt, gimple *stmt, bitmap
> +                   need_eh_cleanup, bitmap need_ab_cleanup,
> +                   vec<gimple *> stmts_to_fixup, bool
> +                   can_make_abnormal_goto, bool was_noreturn);
> +extern void cleanup_after_all_replaces_done (bitmap need_eh_cleanup, bitmap
> +                         need_ab_cleanup, vec<gimple *>
> +                         stms_to_fixup);
> 
> /* Public interface into the SSA propagation engine.  Clients should inherit
>    from this class and provide their own visitors.  */
> -- 
> 2.41.0
> 


* Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
  2023-08-24 15:47 ` Richard Biener
@ 2023-08-24 15:54   ` Jakub Jelinek
  2023-08-31 11:26     ` Filip Kastl
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2023-08-24 15:54 UTC (permalink / raw)
  To: Richard Biener; +Cc: fkastl, Gcc-patches, mjambor, hubicka

On Thu, Aug 24, 2023 at 05:47:09PM +0200, Richard Biener via Gcc-patches wrote:
> > Do you think that the pass is worthy of inclusion into upstream GCC? What are
> > some things that I should change? Should I try to put the pass in different
> > places in passes.def?
> 
> The most obvious places would be right after SSA construction and before RTL expansion.
> Can you provide measurements for those positions?
> Can the pass somehow be used as part of propagations like during value numbering?

Could the new file be called gimple-ssa-sccp.cc or something similar?
Removing some PHIs is nice, but it would be also interesting to know what
are the effects on generated code size and/or performance.
And also if it has any effects on debug information coverage.

	Jakub



* Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
  2023-08-24 15:54   ` Jakub Jelinek
@ 2023-08-31 11:26     ` Filip Kastl
  2023-08-31 11:44       ` Jakub Jelinek
  2023-08-31 12:13       ` Richard Biener
  0 siblings, 2 replies; 10+ messages in thread
From: Filip Kastl @ 2023-08-31 11:26 UTC (permalink / raw)
  To: gcc-patches; +Cc: hubicka, rguenther, jakub

> The most obvious places would be right after SSA construction and before RTL expansion.
> Can you provide measurements for those positions?

The algorithm should only remove PHIs that break SSA form minimality. Since
GCC's SSA construction already produces minimal SSA form, the algorithm isn't
expected to remove any PHIs if run right after the construction. I even
measured it and indeed -- no PHIs got removed (except for 502.gcc_r, where the
algorithm managed to remove exactly 1 PHI, which is weird). 
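
To illustrate the criterion outside of GCC terms: the pass collapses every
strongly connected component of the copy/PHI "uses" graph whose operands from
outside the component all name a single value. Here is a minimal standalone
sketch of that test (a toy model only -- real GCC statements, SSA names and
the nonrecursive Tarjan from the patch are replaced by plain integers and
strings):

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

/* Toy statement: a PHI/copy whose operands are either other statements
   (by index) or values defined outside the region (by name).  */
struct Stmt
{
  std::vector<int> uses;           /* Operands that are other statements.  */
  std::vector<std::string> outer;  /* Operands defined outside.  */
};

/* For each statement, compute the single outside value it reduces to, if
   any.  Statements in an SCC with more than one distinct outside operand
   are not removable and get no entry.  Recursive Tarjan for brevity.  */
std::map<int, std::string>
collapse (std::vector<Stmt> stmts)
{
  int n = stmts.size (), counter = 0;
  std::vector<int> index (n, -1), low (n, 0);
  std::vector<bool> on_stack (n, false);
  std::vector<int> stack;
  std::map<int, std::string> result;

  std::function<void (int)> dfs = [&] (int v)
  {
    index[v] = low[v] = counter++;
    stack.push_back (v);
    on_stack[v] = true;
    for (int w : stmts[v].uses)
      {
	if (index[w] == -1)
	  {
	    dfs (w);
	    low[v] = std::min (low[v], low[w]);
	  }
	else if (on_stack[w])
	  low[v] = std::min (low[v], index[w]);
      }
    if (low[v] == index[v]) /* V roots an SCC; pop it.  */
      {
	std::vector<int> scc;
	int w;
	do
	  {
	    w = stack.back ();
	    stack.pop_back ();
	    on_stack[w] = false;
	    scc.push_back (w);
	  }
	while (w != v);

	/* Operand SCCs finish first, so uses leading outside this SCC are
	   already resolved in RESULT whenever they were removable.  */
	std::set<int> members (scc.begin (), scc.end ());
	std::set<std::string> outside;
	for (int s : scc)
	  {
	    for (const std::string &o : stmts[s].outer)
	      outside.insert (o);
	    for (int u : stmts[s].uses)
	      if (!members.count (u))
		outside.insert (result.count (u)
				? result[u]
				: "stmt" + std::to_string (u));
	  }
	if (outside.size () == 1)
	  for (int s : scc)
	    result[s] = *outside.begin ();
      }
  };

  for (int v = 0; v < n; v++)
    if (index[v] == -1)
      dfs (v);
  return result;
}
```

For example, the loop-header pair s0 = PHI <x, s1>; s1 = PHI <s0, s0> forms
one SCC whose only outside operand is x, so both names reduce to x, while
s0 = PHI <a, b> has two outside operands and stays.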

I tried putting the pass before pass_expand. There isn't a lot of PHIs to
remove at that point, but there still are some.

500.perlbench_r
Started with 43111
Ended with 42942
Removed PHI % .39201131961680313700

502.gcc_r
Started with 141392
Ended with 140455
Removed PHI % .66269661649881181400

505.mcf_r
Started with 482
Ended with 478
Removed PHI % .82987551867219917100

523.xalancbmk_r
Started with 136040
Ended with 135629
Removed PHI % .30211702440458688700

531.deepsjeng_r
Started with 2150
Ended with 2148
Removed PHI % .09302325581395348900

541.leela_r
Started with 4664
Ended with 4650
Removed PHI % .30017152658662092700

557.xz_r
Started with 43
Ended with 43
Removed PHI % 0

> Can the pass somehow be used as part of propagations like during value numbering?

I don't think that the pass could be used as a part of different optimizations
since it works on the whole CFG (except for copy propagation as I noted in the
RFC). I'm adding Honza into Cc. He'll have more insight into this.

> Could the new file be called gimple-ssa-sccp.cc or something similar?

Certainly. Though I'm not sure; wouldn't tree-ssa-sccp.cc be more
appropriate?

I'm thinking about naming the pass 'scc-copy' and the file
'tree-ssa-scc-copy.cc'.

> Removing some PHIs is nice, but it would be also interesting to know what
> are the effects on generated code size and/or performance.
> And also if it has any effects on debug information coverage.

Regarding performance: I ran some benchmarks on a Zen3 machine with -O3 with
and without the new pass. *I got ~2% speedup for 505.mcf_r and 541.leela_r.
Here are the full results. What do you think? Should I run more benchmarks? Or
benchmark multiple times? Or run the benchmarks on different machines?*

500.perlbench_r    
Without SCCP: 244.151807s    
With SCCP: 242.448438s    
-0.7025695913124297%    
    
502.gcc_r    
Without SCCP: 211.029606s    
With SCCP: 211.614523s    
+0.27640683243653763%    
    
505.mcf_r    
Without SCCP: 298.782621s    
With SCCP: 291.671468s    
-2.438069465197046%    
    
523.xalancbmk_r    
Without SCCP: 189.940639s    
With SCCP: 189.876261s    
-0.03390523894928332%    
    
531.deepsjeng_r    
Without SCCP: 250.63648s    
With SCCP: 250.988624s    
+0.1403027732444051%    
    
541.leela_r    
Without SCCP: 346.066278s    
With SCCP: 339.692987s    
-1.8761915152519792%

Regarding size: The pass doesn't seem to significantly reduce or increase the
size of the resulting binary. The differences were at most ~0.1%.

Regarding debug info coverage: I didn't notice any additional guality testcases
failing after I applied the patch. *Is there any other way I should check
debug info coverage?*


Filip K

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
  2023-08-31 11:26     ` Filip Kastl
@ 2023-08-31 11:44       ` Jakub Jelinek
  2023-08-31 12:13       ` Richard Biener
  1 sibling, 0 replies; 10+ messages in thread
From: Jakub Jelinek @ 2023-08-31 11:44 UTC (permalink / raw)
  To: Filip Kastl; +Cc: gcc-patches, hubicka, rguenther

On Thu, Aug 31, 2023 at 01:26:37PM +0200, Filip Kastl wrote:
> Regarding debug info coverage: I didn't notice any additional guality testcases
> failing after I applied the patch. *Is there any other way how I should check
> debug info coverage?*

I'm usually using https://github.com/pmachata/dwlocstat
for that, usually on cc1plus from the last stage of a GCC bootstrap.
Though of course, if one tree is unpatched and one patched, that results in
different code not just because the optimization did something, but because
the source itself is different.  So, for such purposes, after one of the
2 bootstraps I usually apply (resp. revert) the patch, make a copy of the
cc1plus binary and do make -jN cc1plus in the last stage directory (only
there, without attempting to rebootstrap).

	Jakub



* Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
  2023-08-31 11:26     ` Filip Kastl
  2023-08-31 11:44       ` Jakub Jelinek
@ 2023-08-31 12:13       ` Richard Biener
  2023-08-31 21:44         ` Andrew Pinski
  2023-09-01 10:10         ` Filip Kastl
  1 sibling, 2 replies; 10+ messages in thread
From: Richard Biener @ 2023-08-31 12:13 UTC (permalink / raw)
  To: Filip Kastl; +Cc: gcc-patches, hubicka, jakub

On Thu, 31 Aug 2023, Filip Kastl wrote:

> > The most obvious places would be right after SSA construction and before RTL expansion.
> > Can you provide measurements for those positions?
> 
> The algorithm should only remove PHIs that break SSA form minimality. Since
> GCC's SSA construction already produces minimal SSA form, the algorithm isn't
> expected to remove any PHIs if run right after the construction. I even
> measured it and indeed -- no PHIs got removed (except for 502.gcc_r, where the
> algorithm managed to remove exactly 1 PHI, which is weird). 
> 
> I tried putting the pass before pass_expand. There isn't a lot of PHIs to
> remove at that point, but there still are some.

That's interesting.  Your placement at

          NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
          NEXT_PASS (pass_phiopt, true /* early_p */);
+         NEXT_PASS (pass_sccp);

and

       NEXT_PASS (pass_tsan);
       NEXT_PASS (pass_dse, true /* use DR analysis */);
       NEXT_PASS (pass_dce);
+      NEXT_PASS (pass_sccp);

isn't immediately after the "best" existing pass we have for
removing dead PHIs, which is pass_cd_dce.  phiopt might leave
dead PHIs around, and the second instance runs long after the
last CD-DCE.

So I wonder if your pass just detects unnecessary PHIs we'd have
removed by other means and what survives until RTL expansion is
what we should count?

Can you adjust your original early placement to right after
the cd-dce pass and for the late placement turn the dce pass
before it into cd-dce and re-do your measurements?

> 500.perlbench_r
> Started with 43111
> Ended with 42942
> Removed PHI % .39201131961680313700
> 
> 502.gcc_r
> Started with 141392
> Ended with 140455
> Removed PHI % .66269661649881181400
> 
> 505.mcf_r
> Started with 482
> Ended with 478
> Removed PHI % .82987551867219917100
> 
> 523.xalancbmk_r
> Started with 136040
> Ended with 135629
> Removed PHI % .30211702440458688700
> 
> 531.deepsjeng_r
> Started with 2150
> Ended with 2148
> Removed PHI % .09302325581395348900
> 
> 541.leela_r
> Started with 4664
> Ended with 4650
> Removed PHI % .30017152658662092700
> 
> 557.xz_r
> Started with 43
> Ended with 43
> Removed PHI % 0
> 
> > Can the pass somehow be used as part of propagations like during value numbering?
> 
> I don't think that the pass could be used as a part of different optimizations
> since it works on the whole CFG (except for copy propagation as I noted in the
> RFC). I'm adding Honza into Cc. He'll have more insight into this.
> 
> > Could the new file be called gimple-ssa-sccp.cc or something similar?
> 
> Certainly. Though I'm not sure, but wouldn't tree-ssa-sccp.cc be more
> appropriate?
> 
> I'm thinking about naming the pass 'scc-copy' and the file
> 'tree-ssa-scc-copy.cc'.
> 
> > Removing some PHIs is nice, but it would be also interesting to know what
> > are the effects on generated code size and/or performance.
> > And also if it has any effects on debug information coverage.
> 
> Regarding performance: I ran some benchmarks on a Zen3 machine with -O3 with
> and without the new pass. *I got ~2% speedup for 505.mcf_r and 541.leela_r.
> Here are the full results. What do you think? Should I run more benchmarks? Or
> benchmark multiple times? Or run the benchmarks on different machines?*
> 
> 500.perlbench_r    
> Without SCCP: 244.151807s    
> With SCCP: 242.448438s    
> -0.7025695913124297%    
>     
> 502.gcc_r    
> Without SCCP: 211.029606s    
> With SCCP: 211.614523s    
> +0.27640683243653763%    
>     
> 505.mcf_r    
> Without SCCP: 298.782621s    
> With SCCP: 291.671468s    
> -2.438069465197046%    
>     
> 523.xalancbmk_r    
> Without SCCP: 189.940639s    
> With SCCP: 189.876261s    
> -0.03390523894928332%    
>     
> 531.deepsjeng_r    
> Without SCCP: 250.63648s    
> With SCCP: 250.988624s    
> +0.1403027732444051%    
>     
> 541.leela_r    
> Without SCCP: 346.066278s    
> With SCCP: 339.692987s    
> -1.8761915152519792%
> 
> Regarding size: The pass doesn't seem to significantly reduce or increase the
> size of the result binary. The differences were at most ~0.1%.
> 
> Regarding debug info coverage: I didn't notice any additional guality testcases
> failing after I applied the patch. *Is there any other way how I should check
> debug info coverage?*
> 
> 
> Filip K
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
  2023-08-31 12:13       ` Richard Biener
@ 2023-08-31 21:44         ` Andrew Pinski
  2023-09-01  6:34           ` Richard Biener
  2023-09-01 10:10         ` Filip Kastl
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Pinski @ 2023-08-31 21:44 UTC (permalink / raw)
  To: Richard Biener; +Cc: Filip Kastl, gcc-patches, hubicka, jakub

On Thu, Aug 31, 2023 at 5:15 AM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, 31 Aug 2023, Filip Kastl wrote:
>
> > > The most obvious places would be right after SSA construction and before RTL expansion.
> > > Can you provide measurements for those positions?
> >
> > The algorithm should only remove PHIs that break SSA form minimality. Since
> > GCC's SSA construction already produces minimal SSA form, the algorithm isn't
> > expected to remove any PHIs if run right after the construction. I even
> > measured it and indeed -- no PHIs got removed (except for 502.gcc_r, where the
> > algorithm managed to remove exactly 1 PHI, which is weird).
> >
> > I tried putting the pass before pass_expand. There isn't a lot of PHIs to
> > remove at that point, but there still are some.
>
> That's interesting.  Your placement at
>
>           NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
>           NEXT_PASS (pass_phiopt, true /* early_p */);
> +         NEXT_PASS (pass_sccp);
>
> and
>
>        NEXT_PASS (pass_tsan);
>        NEXT_PASS (pass_dse, true /* use DR analysis */);
>        NEXT_PASS (pass_dce);
> +      NEXT_PASS (pass_sccp);
>
> isn't immediately after the "best" existing pass we have to
> remove dead PHIs which is pass_cd_dce.  phiopt might leave
> dead PHIs around and the second instance runs long after the
> last CD-DCE.

Actually the last phiopt is run before the last pass_cd_dce:
      NEXT_PASS (pass_dce, true /* update_address_taken_p */);
      /* After late DCE we rewrite no longer addressed locals into SSA
         form if possible.  */
      NEXT_PASS (pass_forwprop);
      NEXT_PASS (pass_sink_code, true /* unsplit edges */);
      NEXT_PASS (pass_phiopt, false /* early_p */);
      NEXT_PASS (pass_fold_builtins);
      NEXT_PASS (pass_optimize_widening_mul);
      NEXT_PASS (pass_store_merging);
      /* If DCE is not run before checking for uninitialized uses,
         we may get false warnings (e.g., testsuite/gcc.dg/uninit-5.c).
         However, this also causes us to misdiagnose cases that should be
         real warnings (e.g., testsuite/gcc.dg/pr18501.c).  */
      NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);

Thanks,
Andrew Pinski


>
> So I wonder if your pass just detects unnecessary PHIs we'd have
> removed by other means and what survives until RTL expansion is
> what we should count?



>
> Can you adjust your original early placement to right after
> the cd-dce pass and for the late placement turn the dce pass
> before it into cd-dce and re-do your measurements?
>
> > 500.perlbench_r
> > Started with 43111
> > Ended with 42942
> > Removed PHI % .39201131961680313700
> >
> > 502.gcc_r
> > Started with 141392
> > Ended with 140455
> > Removed PHI % .66269661649881181400
> >
> > 505.mcf_r
> > Started with 482
> > Ended with 478
> > Removed PHI % .82987551867219917100
> >
> > 523.xalancbmk_r
> > Started with 136040
> > Ended with 135629
> > Removed PHI % .30211702440458688700
> >
> > 531.deepsjeng_r
> > Started with 2150
> > Ended with 2148
> > Removed PHI % .09302325581395348900
> >
> > 541.leela_r
> > Started with 4664
> > Ended with 4650
> > Removed PHI % .30017152658662092700
> >
> > 557.xz_r
> > Started with 43
> > Ended with 43
> > Removed PHI % 0
> >
> > > Can the pass somehow be used as part of propagations like during value numbering?
> >
> > I don't think that the pass could be used as a part of different optimizations
> > since it works on the whole CFG (except for copy propagation as I noted in the
> > RFC). I'm adding Honza into Cc. He'll have more insight into this.
> >
> > > Could the new file be called gimple-ssa-sccp.cc or something similar?
> >
> > Certainly. Though I'm not sure, but wouldn't tree-ssa-sccp.cc be more
> > appropriate?
> >
> > I'm thinking about naming the pass 'scc-copy' and the file
> > 'tree-ssa-scc-copy.cc'.
> >
> > > Removing some PHIs is nice, but it would be also interesting to know what
> > > are the effects on generated code size and/or performance.
> > > And also if it has any effects on debug information coverage.
> >
> > Regarding performance: I ran some benchmarks on a Zen3 machine with -O3 with
> > and without the new pass. *I got ~2% speedup for 505.mcf_r and 541.leela_r.
> > Here are the full results. What do you think? Should I run more benchmarks? Or
> > benchmark multiple times? Or run the benchmarks on different machines?*
> >
> > 500.perlbench_r
> > Without SCCP: 244.151807s
> > With SCCP: 242.448438s
> > -0.7025695913124297%
> >
> > 502.gcc_r
> > Without SCCP: 211.029606s
> > With SCCP: 211.614523s
> > +0.27640683243653763%
> >
> > 505.mcf_r
> > Without SCCP: 298.782621s
> > With SCCP: 291.671468s
> > -2.438069465197046%
> >
> > 523.xalancbmk_r
> > Without SCCP: 189.940639s
> > With SCCP: 189.876261s
> > -0.03390523894928332%
> >
> > 531.deepsjeng_r
> > Without SCCP: 250.63648s
> > With SCCP: 250.988624s
> > +0.1403027732444051%
> >
> > 541.leela_r
> > Without SCCP: 346.066278s
> > With SCCP: 339.692987s
> > -1.8761915152519792%
> >
> > Regarding size: The pass doesn't seem to significantly reduce or increase the
> > size of the result binary. The differences were at most ~0.1%.
> >
> > Regarding debug info coverage: I didn't notice any additional guality testcases
> > failing after I applied the patch. *Is there any other way how I should check
> > debug info coverage?*
> >
> >
> > Filip K
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


* Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
  2023-08-31 21:44         ` Andrew Pinski
@ 2023-09-01  6:34           ` Richard Biener
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Biener @ 2023-09-01  6:34 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Filip Kastl, gcc-patches, hubicka, jakub

On Thu, 31 Aug 2023, Andrew Pinski wrote:

> On Thu, Aug 31, 2023 at 5:15 AM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Thu, 31 Aug 2023, Filip Kastl wrote:
> >
> > > > The most obvious places would be right after SSA construction and before RTL expansion.
> > > > Can you provide measurements for those positions?
> > >
> > > The algorithm should only remove PHIs that break SSA form minimality. Since
> > > GCC's SSA construction already produces minimal SSA form, the algorithm isn't
> > > expected to remove any PHIs if run right after the construction. I even
> > > measured it and indeed -- no PHIs got removed (except for 502.gcc_r, where the
> > > algorithm managed to remove exactly 1 PHI, which is weird).
> > >
> > > I tried putting the pass before pass_expand. There isn't a lot of PHIs to
> > > remove at that point, but there still are some.
> >
> > That's interesting.  Your placement at
> >
> >           NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
> >           NEXT_PASS (pass_phiopt, true /* early_p */);
> > +         NEXT_PASS (pass_sccp);
> >
> > and
> >
> >        NEXT_PASS (pass_tsan);
> >        NEXT_PASS (pass_dse, true /* use DR analysis */);
> >        NEXT_PASS (pass_dce);
> > +      NEXT_PASS (pass_sccp);
> >
> > isn't immediately after the "best" existing pass we have to
> > remove dead PHIs which is pass_cd_dce.  phiopt might leave
> > dead PHIs around and the second instance runs long after the
> > last CD-DCE.
> 
> Actually the last phiopt is run before last pass_cd_dce:

I meant the second instance of pass_sccp, not phiopt.

Richard.


* Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
  2023-08-31 12:13       ` Richard Biener
  2023-08-31 21:44         ` Andrew Pinski
@ 2023-09-01 10:10         ` Filip Kastl
  2023-09-01 10:53           ` Richard Biener
  1 sibling, 1 reply; 10+ messages in thread
From: Filip Kastl @ 2023-09-01 10:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Biener, Jan Hubicka

> That's interesting.  Your placement at
> 
>           NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
>           NEXT_PASS (pass_phiopt, true /* early_p */);
> +         NEXT_PASS (pass_sccp);
> 
> and
> 
>        NEXT_PASS (pass_tsan);
>        NEXT_PASS (pass_dse, true /* use DR analysis */);
>        NEXT_PASS (pass_dce);
> +      NEXT_PASS (pass_sccp);
> 
> isn't immediately after the "best" existing pass we have to
> remove dead PHIs which is pass_cd_dce.  phiopt might leave
> dead PHIs around and the second instance runs long after the
> last CD-DCE.
> 
> So I wonder if your pass just detects unnecessary PHIs we'd have
> removed by other means and what survives until RTL expansion is
> what we should count?
> 
> Can you adjust your original early placement to right after
> the cd-dce pass and for the late placement turn the dce pass
> before it into cd-dce and re-do your measurements?

So I did this

          NEXT_PASS (pass_dse);    
      NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);    
      NEXT_PASS (pass_sccp);    
      NEXT_PASS (pass_phiopt, true /* early_p */);    
      NEXT_PASS (pass_tail_recursion);         

and this

      NEXT_PASS (pass_dse, true /* use DR analysis */);
      NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);    
      NEXT_PASS (pass_sccp);    
      /* Pass group that runs when 1) enabled, 2) there are loops

and got these results:

500.perlbench_r
Started with (1) 30318
Ended with (1) 26219
Removed PHI % (1) 13.52002110957187149600
Started with (2) 39043
Ended with (2) 38941
Removed PHI % (2) .26125041620777092000

502.gcc_r
Started with (1) 148361
Ended with (1) 140464
Removed PHI % (1) 5.32282742769326170700
Started with (2) 216209
Ended with (2) 215367
Removed PHI % (2) .38943799749316633500

505.mcf_r
Started with (1) 342
Ended with (1) 304
Removed PHI % (1) 11.11111111111111111200
Started with (2) 437    
Ended with (2) 433    
Removed PHI % (2) .91533180778032036700    
     
523.xalancbmk_r    
Started with (1) 62995    
Ended with (1) 58289     
Removed PHI % (1) 7.47043416144138423700    
Started with (2) 134026    
Ended with (2) 133193    
Removed PHI % (2) .62152119737961291100    
                      
531.deepsjeng_r    
Started with (1) 1402    
Ended with (1) 1264    
Removed PHI % (1) 9.84308131241084165500    
Started with (2) 1928    
Ended with (2) 1920    
Removed PHI % (2) .41493775933609958600    
    
541.leela_r    
Started with (1) 3398    
Ended with (1) 3060    
Removed PHI % (1) 9.94702766333137139500    
Started with (2) 4473    
Ended with (2) 4453    
Removed PHI % (2) .44712720769058797300    

557.xz_r
Started with (1) 47
Ended with (1) 44
Removed PHI % (1) 6.38297872340425532000
Started with (2) 43
Ended with (2) 43
Removed PHI % (2) 0

These measurements don't differ very much from the previous ones. It seems to
me that phiopt does output some redundant PHIs, but the vast majority of the
eliminated PHIs are generated in earlier passes and cd_dce isn't able to get
rid of them.

A noteworthy piece of information might be that most of the eliminated PHIs are
actually trivial PHIs. I consider a PHI to be trivial if it only references
itself or one other SSA name.

Here is a comparison of the newest measurements (sccp after cd_dce) with the
previous ones (sccp after phiopt and dce):

500.perlbench_r
 
Started with (1-PREV) 30287
Started with (1-NEW) 30318
 
Ended with (1-PREV) 26188
Ended with (1-NEW) 26219
 
Removed PHI % (1-PREV) 13.53385941162875161000
Removed PHI % (1-NEW) 13.52002110957187149600
 
Started with (2-PREV) 38005
Started with (2-NEW) 39043
 
Ended with (2-PREV) 37897
Ended with (2-NEW) 38941
 
Removed PHI % (2-PREV) .28417313511380081600
Removed PHI % (2-NEW) .26125041620777092000
 
502.gcc_r
 
Started with (1-PREV) 148187
Started with (1-NEW) 148361
 
Ended with (1-PREV) 140292
Ended with (1-NEW) 140464
 
Removed PHI % (1-PREV) 5.32772780338356266100
Removed PHI % (1-NEW) 5.32282742769326170700
                      
Started with (2-PREV) 211479
Started with (2-NEW) 216209
 
Ended with (2-PREV) 210635
Ended with (2-NEW) 215367
 
Removed PHI % (2-PREV) .39909399987705635100
Removed PHI % (2-NEW) .38943799749316633500


Filip K


P.S. I made a small mistake and didn't compute the benchmark speedup
percentages right in the previous email. Here are the corrected results. The
correct percentages are a little bit smaller but very similar. There is still a
~2% speedup with 505.mcf_r and 541.leela_r.

500.perlbench_r
Without SCCP: 244.151807s
With SCCP: 242.448438s
-0.6976679881791663%

502.gcc_r
Without SCCP: 211.029606s
With SCCP: 211.614523s
+0.27717295742853737%

505.mcf_r
Without SCCP: 298.782621s
With SCCP: 291.671468s
-2.380042378703145%

523.xalancbmk_r
Without SCCP: 189.940639s
With SCCP: 189.876261s
-0.03389374719330334%

531.deepsjeng_r
Without SCCP: 250.63648s
With SCCP: 250.988624s
+0.14049989849840747%

541.leela_r
Without SCCP: 346.066278s
With SCCP: 339.692987s
-1.8416388435281157%



* Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass
  2023-09-01 10:10         ` Filip Kastl
@ 2023-09-01 10:53           ` Richard Biener
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Biener @ 2023-09-01 10:53 UTC (permalink / raw)
  To: Filip Kastl; +Cc: gcc-patches, Jan Hubicka

On Fri, 1 Sep 2023, Filip Kastl wrote:

> > That's interesting.  Your placement at
> > 
> >           NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
> >           NEXT_PASS (pass_phiopt, true /* early_p */);
> > +         NEXT_PASS (pass_sccp);
> > 
> > and
> > 
> >        NEXT_PASS (pass_tsan);
> >        NEXT_PASS (pass_dse, true /* use DR analysis */);
> >        NEXT_PASS (pass_dce);
> > +      NEXT_PASS (pass_sccp);
> > 
> > isn't immediately after the "best" existing pass we have to
> > remove dead PHIs which is pass_cd_dce.  phiopt might leave
> > dead PHIs around and the second instance runs long after the
> > last CD-DCE.
> > 
> > So I wonder if your pass just detects unnecessary PHIs we'd have
> > removed by other means and what survives until RTL expansion is
> > what we should count?
> > 
> > Can you adjust your original early placement to right after
> > the cd-dce pass and for the late placement turn the dce pass
> > before it into cd-dce and re-do your measurements?
> 
> So I did this
> 
>           NEXT_PASS (pass_dse);    
>       NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);    
>       NEXT_PASS (pass_sccp);    
>       NEXT_PASS (pass_phiopt, true /* early_p */);    
>       NEXT_PASS (pass_tail_recursion);         
> 
> and this
> 
>       NEXT_PASS (pass_dse, true /* use DR analysis */);
>       NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);    
>       NEXT_PASS (pass_sccp);    
>       /* Pass group that runs when 1) enabled, 2) there are loops
> 
> and got these results:
> 
> 500.perlbench_r
> Started with (1) 30318
> Ended with (1) 26219
> Removed PHI % (1) 13.52002110957187149600
> Started with (2) 39043
> Ended with (2) 38941
> Removed PHI % (2) .26125041620777092000
> 
> 502.gcc_r
> Started with (1) 148361
> Ended with (1) 140464
> Removed PHI % (1) 5.32282742769326170700
> Started with (2) 216209
> Ended with (2) 215367
> Removed PHI % (2) .38943799749316633500
> 
> 505.mcf_r
> Started with (1) 342
> Ended with (1) 304
> Removed PHI % (1) 11.11111111111111111200
> Started with (2) 437    
> Ended with (2) 433    
> Removed PHI % (2) .91533180778032036700    
>      
> 523.xalancbmk_r    
> Started with (1) 62995    
> Ended with (1) 58289     
> Removed PHI % (1) 7.47043416144138423700    
> Started with (2) 134026    
> Ended with (2) 133193    
> Removed PHI % (2) .62152119737961291100    
>                       
> 531.deepsjeng_r    
> Started with (1) 1402    
> Ended with (1) 1264    
> Removed PHI % (1) 9.84308131241084165500    
> Started with (2) 1928    
> Ended with (2) 1920    
> Removed PHI % (2) .41493775933609958600    
>     
> 541.leela_r    
> Started with (1) 3398    
> Ended with (1) 3060    
> Removed PHI % (1) 9.94702766333137139500    
> Started with (2) 4473    
> Ended with (2) 4453    
> Removed PHI % (2) .44712720769058797300    
> 
> 557.xz_r
> Started with (1) 47
> Ended with (1) 44
> Removed PHI % (1) 6.38297872340425532000
> Started with (2) 43
> Ended with (2) 43
> Removed PHI % (2) 0
> 
> These measurements don't differ very much from the previous. It seems to me
> that phiopt does output some redundant PHIs but the vast majority of the
> eliminated PHIs are generated in earlier passes and cd_dce isn't able to get
> rid of them.
> 
> A noteworthy information might be that most of the eliminated PHIs are actually
> trivial PHIs. I consider a PHI to be trivial if it only references itself or
> one other SSA name.

Ah.  The early pass numbers are certainly interesting - can you elaborate
on the last bit?  We have for example loop-closed PHI nodes like

_1 = PHI <_2>

and there are non-trivial degenerate PHIs like

_1 = PHI <_2, _2>

those are generally removed by value-numbering (FRE, DOM and PRE) and SSA
propagation (CCP and copyprop); they are not "dead", so CD-DCE doesn't
remove them.

But we do have passes removing these kinds of PHIs.

The issue with the early pass is likely that we have

          NEXT_PASS (pass_fre, true /* may_iterate */);
^^
would eliminate these kinds of PHIs

          NEXT_PASS (pass_early_vrp);
^^
rewrites into loop-closed SSA, adding many such PHIs

          NEXT_PASS (pass_merge_phi);
          NEXT_PASS (pass_dse);
          NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);

and until here there's no pass eliding the LC SSA PHIs.

You could add a pass_copy_prop after early_vrp.  The later sccp
pass shouldn't run into this issue, I think, so it must be other
passes adding such kinds of PHIs.

Maybe you can count single-argument PHIs, degenerate multi-arg PHIs
and "other" PHIs separately as you remove them?


> Here is a comparison of the newest measurements (sccp after cd_dce) with the
> previous ones (sccp after phiopt and dce):
> 
> 500.perlbench_r
>  
> Started with (1-PREV) 30287
> Started with (1-NEW) 30318
>  
> Ended with (1-PREV) 26188
> Ended with (1-NEW) 26219
>  
> Removed PHI % (1-PREV) 13.53385941162875161000
> Removed PHI % (1-NEW) 13.52002110957187149600
>  
> Started with (2-PREV) 38005
> Started with (2-NEW) 39043
>  
> Ended with (2-PREV) 37897
> Ended with (2-NEW) 38941
>  
> Removed PHI % (2-PREV) .28417313511380081600
> Removed PHI % (2-NEW) .26125041620777092000
>  
> 502.gcc_r
>  
> Started with (1-PREV) 148187
> Started with (1-NEW) 148361
>  
> Ended with (1-PREV) 140292
> Ended with (1-NEW) 140464
>  
> Removed PHI % (1-PREV) 5.32772780338356266100
> Removed PHI % (1-NEW) 5.32282742769326170700
>                       
> Started with (2-PREV) 211479
> Started with (2-NEW) 216209
>  
> Ended with (2-PREV) 210635
> Ended with (2-NEW) 215367
>  
> Removed PHI % (2-PREV) .39909399987705635100
> Removed PHI % (2-NEW) .38943799749316633500
> 
> 
> Filip K
> 
> 
> P.S. I made a small mistake and didn't compute the benchmark speedup
> percentages right in the previous email. Here are the corrected results. The
> correct percentages are a little bit smaller but very similar. There is still a
> ~2% speedup with 505.mcf_r and 541.leela_r.
> 
> 500.perlbench_r
> Without SCCP: 244.151807s
> With SCCP: 242.448438s
> -0.6976679881791663%
> 
> 502.gcc_r
> Without SCCP: 211.029606s
> With SCCP: 211.614523s
> +0.27717295742853737%
> 
> 505.mcf_r
> Without SCCP: 298.782621s
> With SCCP: 291.671468s
> -2.380042378703145%
> 
> 523.xalancbmk_r
> Without SCCP: 189.940639s
> With SCCP: 189.876261s
> -0.03389374719330334%
> 
> 531.deepsjeng_r
> Without SCCP: 250.63648s
> With SCCP: 250.988624s
> +0.14049989849840747%
> 
> 541.leela_r
> Without SCCP: 346.066278s
> With SCCP: 339.692987s
> -1.8416388435281157%
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


end of thread, other threads:[~2023-09-01 10:53 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-24 15:07 [RFC] gimple ssa: SCCP - A new PHI optimization pass Filip Kastl
2023-08-24 15:47 ` Richard Biener
2023-08-24 15:54   ` Jakub Jelinek
2023-08-31 11:26     ` Filip Kastl
2023-08-31 11:44       ` Jakub Jelinek
2023-08-31 12:13       ` Richard Biener
2023-08-31 21:44         ` Andrew Pinski
2023-09-01  6:34           ` Richard Biener
2023-09-01 10:10         ` Filip Kastl
2023-09-01 10:53           ` Richard Biener
