public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH PR94274] fold phi whose incoming args are defined from binary
@ 2020-06-11 12:01 Zhanghaijian (A)
  2020-06-15 12:38 ` Richard Biener
  2020-11-23 22:37 ` Jeff Law
  0 siblings, 2 replies; 3+ messages in thread
From: Zhanghaijian (A) @ 2020-06-11 12:01 UTC (permalink / raw)
  To: gcc-patches; +Cc: rguenth

[-- Attachment #1: Type: text/plain, Size: 555 bytes --]

Hi,

This is a experimental fix for pr94274.
For if/else structure, when the expressions is binary operation and have a common subexpression, and the opcode is the same, we can
fold the merging phi node in tree_ssa_phiopt_worker (ssa-phiopt). It can be optimized to do csel first and then do binary operations.
This can eliminate one or more instructions. And this patch is effective for 500.perlbench_r in spec2017.
Bootstrap and tested on aarch64/x86_64 Linux platform. No new regression witnessed.

Any suggestion?  

Thanks,
Haijian Zhang

[-- Attachment #2: pr94274-v1.diff --]
[-- Type: application/octet-stream, Size: 21444 bytes --]

From 1f0f09ef3170569ef15791cf3e70de781a9a4cb0 Mon Sep 17 00:00:00 2001
From: Haijian Zhang <z.zhanghaijian@huawei.com>
Date: Thu, 11 Jun 2020 14:56:44 +0800
Subject: [PATCH] fold phi whose incoming args are defined from binary
 operations [PR94274]

For if/else structure, when the expressions is binary operation and
have a common subexpression, and the opcode is the same. We can fold
the merging phi node in tree_ssa_phiopt_worker (ssa-phiopt). It can
be optimized to do csel first and then do binary operations. This can
eliminate one or more instructions.

2020-06-11  Haijian Zhang  <z.zhanghaijian@huawei.com>

gcc/

	PR tree-optimization/94274
	* common.opt (ftree-combine): New option.
	* doc/invoke.texi (-ftree-combine): Document new option.
	* opts.c (default_options_table): Enable -ftree-combine at -O1.
	* passes.def: Add pass_adjust_alignment.
	* tree-pass.h (make_pass_tree_combine): New.
	* tree-ssa-phiopt.c (pass_data_tree_combine): New.
gcc/testsuite/
	PR tree-optimization/94274
	* gcc.dg/tree-ssa/pr94274-1.c: New test.
	* gcc.dg/tree-ssa/pr94274-2.c: Likewise.
	* gcc.dg/tree-ssa/pr94274-3.c: Likewise.
---
 gcc/ChangeLog                             |  10 +
 gcc/common.opt                            |   4 +
 gcc/doc/invoke.texi                       |  15 +-
 gcc/opts.c                                |   1 +
 gcc/passes.def                            |   1 +
 gcc/testsuite/ChangeLog                   |   7 +
 gcc/testsuite/gcc.dg/tree-ssa/pr94274-1.c |  15 +
 gcc/testsuite/gcc.dg/tree-ssa/pr94274-2.c |  26 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr94274-3.c |  16 +
 gcc/tree-pass.h                           |   1 +
 gcc/tree-ssa-phiopt.c                     | 352 +++++++++++++++++++++-
 11 files changed, 440 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr94274-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr94274-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr94274-3.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b0860738d04..c1a5bf87626 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2020-06-11  Haijian Zhang  <z.zhanghaijian@huawei.com>
+
+	PR tree-optimization/94274
+	* common.opt (ftree-combine): New option.
+	* doc/invoke.texi (-ftree-combine): Document new option.
+	* opts.c (default_options_table): Enable -ftree-combine at -O1.
+	* passes.def: Add pass_adjust_alignment.
+	* tree-pass.h (make_pass_tree_combine): New.
+	* tree-ssa-phiopt.c (pass_data_tree_combine): New.
+
 2020-06-10  Martin Sebor  <msebor@redhat.com>
 
 	PR middle-end/95353
diff --git a/gcc/common.opt b/gcc/common.opt
index df8af365d1b..f4441aa9ddd 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2721,6 +2721,10 @@ ftree-cselim
 Common Report Var(flag_tree_cselim) Init(2) Optimization
 Transform condition stores into unconditional ones.
 
+ftree-combine
+Common Report Var(flag_tree_combine) Optimization
+Combine binary operations using SSA PHI nodes.
+
 ftree-switch-conversion
 Common Report Var(flag_tree_switch_conversion) Optimization
 Perform conversions of switch initializations.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 06a04e3d7dd..921b505db6d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -529,9 +529,9 @@ Objective-C and Objective-C++ Dialects}.
 -fstdarg-opt  -fstore-merging  -fstrict-aliasing @gol
 -fthread-jumps  -ftracer  -ftree-bit-ccp @gol
 -ftree-builtin-call-dce  -ftree-ccp  -ftree-ch @gol
--ftree-coalesce-vars  -ftree-copy-prop  -ftree-dce  -ftree-dominator-opts @gol
--ftree-dse  -ftree-forwprop  -ftree-fre  -fcode-hoisting @gol
--ftree-loop-if-convert  -ftree-loop-im @gol
+-ftree-coalesce-vars  -ftree-combine  -ftree-copy-prop  -ftree-dce @gol
+-ftree-dominator-opts  -ftree-dse  -ftree-forwprop  -ftree-fre @gol
+-fcode-hoisting  -ftree-loop-if-convert  -ftree-loop-im @gol
 -ftree-phiprop  -ftree-loop-distribution  -ftree-loop-distribute-patterns @gol
 -ftree-loop-ivcanon  -ftree-loop-linear  -ftree-loop-optimize @gol
 -ftree-loop-vectorize @gol
@@ -9465,6 +9465,7 @@ compilation time.
 -ftree-ccp @gol
 -ftree-ch @gol
 -ftree-coalesce-vars @gol
+-ftree-combine @gol
 -ftree-copy-prop @gol
 -ftree-dce @gol
 -ftree-dominator-opts @gol
@@ -9598,7 +9599,7 @@ optimization flags except for those that may interfere with debugging:
 -fdse  -fif-conversion  -fif-conversion2  @gol
 -finline-functions-called-once @gol
 -fmove-loop-invariants  -fssa-phiopt @gol
--ftree-bit-ccp  -ftree-dse  -ftree-pta  -ftree-sra}
+-ftree-bit-ccp  -ftree-combine  -ftree-dse  -ftree-pta  -ftree-sra}
 
 @end table
 
@@ -10703,6 +10704,12 @@ Perform pattern matching on SSA PHI nodes to optimize conditional
 code.  This pass is enabled by default at @option{-O1} and higher,
 except for @option{-Og}.
 
+@item -ftree-combine
+@opindex ftree-combine
+Perform pattern matching on SSA PHI nodes to combine binary
+operations.  This pass is enabled by default at @option{-O1}
+and higher, except for @option{-Og}.
+
 @item -ftree-switch-conversion
 @opindex ftree-switch-conversion
 Perform conversion of simple initializations in a switch to
diff --git a/gcc/opts.c b/gcc/opts.c
index 340d99434b3..128d187df44 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -470,6 +470,7 @@ static const struct default_options default_options_table[] =
     { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_ftree_dse, NULL, 1 },
     { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_ftree_pta, NULL, 1 },
     { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_ftree_sra, NULL, 1 },
+    { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_ftree_combine, NULL, 1 },
 
     /* -O2 and -Os optimizations.  */
     { OPT_LEVELS_2_PLUS, OPT_fcaller_saves, NULL, 1 },
diff --git a/gcc/passes.def b/gcc/passes.def
index 56322025226..37fa919408f 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -332,6 +332,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_cd_dce);
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt, false /* early_p */);
+      NEXT_PASS (pass_tree_combine);
       NEXT_PASS (pass_fold_builtins);
       NEXT_PASS (pass_optimize_widening_mul);
       NEXT_PASS (pass_store_merging);
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 83751a117d9..e5d0c2fe053 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2020-06-11  Haijian Zhang  <z.zhanghaijian@huawei.com>
+
+	PR tree-optimization/94274
+	* gcc.dg/tree-ssa/pr94274-1.c: New test.
+	* gcc.dg/tree-ssa/pr94274-2.c: Likewise.
+	* gcc.dg/tree-ssa/pr94274-3.c: Likewise.
+
 2020-06-10  Alexandre Oliva  <oliva@adacore.com>
 
 	PR rtl-optimization/51447
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr94274-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr94274-1.c
new file mode 100644
index 00000000000..f883c6a3201
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr94274-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-combine" } */
+
+int foo (int cond, int a, int b, int c)
+{
+  int result = 0;
+
+  if (cond == 1)
+    result = b + a;
+  else
+    result = a + c;
+  return result;
+}
+
+/* { dg-final { scan-tree-dump-times " \\+ " 1 "combine" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr94274-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr94274-2.c
new file mode 100644
index 00000000000..682957e2190
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr94274-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-combine" } */
+
+int foo (int cond, int a, int b, int c, int d, int e)
+{
+  int result = 0;
+
+  switch (cond) 
+    {
+      case 1:
+	result = a + b;
+	break;
+      case 8:
+	result = a + c;
+	break;
+      case 6:
+	result = a + d;
+	break;
+      default:
+	result = a + e;
+	break;
+    }
+  return result;
+}
+
+/* { dg-final { scan-tree-dump-times " \\+ " 1 "combine"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr94274-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr94274-3.c
new file mode 100644
index 00000000000..3f500baf1d6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr94274-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-combine" } */
+
+int foo (int cond, int a, int b, int c, int d)
+{
+  int result = 0;
+
+  if (cond == 1)
+    result = a / b - d;
+  else
+    result = a / c - d;
+  return result;
+}
+
+/* { dg-final { scan-tree-dump-times " / " 1 "combine" } } */
+/* { dg-final { scan-tree-dump-times " \\- " 1 "combine" } } */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 396428f167f..03cce426c74 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -434,6 +434,7 @@ extern gimple_opt_pass *make_pass_warn_function_return (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_warn_function_noreturn (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_cselim (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_phiopt (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_tree_combine (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_forwprop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_phiprop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_ifcombine (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 5f283890998..9b09330c026 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -48,7 +48,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "gimple-fold.h"
 
-static unsigned int tree_ssa_phiopt_worker (bool, bool, bool);
+static unsigned int tree_ssa_phiopt_worker (bool, bool, bool, bool);
 static bool two_value_replacement (basic_block, basic_block, edge, gphi *,
 				   tree, tree);
 static bool conditional_replacement (basic_block, basic_block,
@@ -122,7 +122,7 @@ tree_ssa_cs_elim (void)
      An interfacing issue of find_data_references_in_bb.  */
   loop_optimizer_init (LOOPS_NORMAL);
   scev_initialize ();
-  todo = tree_ssa_phiopt_worker (true, false, false);
+  todo = tree_ssa_phiopt_worker (true, false, false, false);
   scev_finalize ();
   loop_optimizer_finalize ();
   return todo;
@@ -155,14 +155,262 @@ single_non_singleton_phi_for_edges (gimple_seq seq, edge e0, edge e1)
   return phi;
 }
 
+/* Compare the operands of stmts.  Return 1 if only rhs1 is different.
+   Return 2 if only rhs2 is different.  Otherwise return false.  */
+
+static unsigned
+compare_operands (gimple *first_stmt, tree rhs1, tree rhs2)
+{
+  /* Check if the type of operands are the same.  */
+  if (TREE_TYPE (gimple_assign_rhs1 (first_stmt)) != TREE_TYPE (rhs1)
+      || TREE_TYPE (gimple_assign_rhs2 (first_stmt)) != TREE_TYPE (rhs2))
+    return false;
+
+  /* Keep track of which operand needs a phi node.  */
+  if (gimple_assign_rhs1 (first_stmt) != rhs1
+      && gimple_assign_rhs2 (first_stmt) != rhs2)
+    return false;
+  else if (gimple_assign_rhs1 (first_stmt) != rhs1)
+    return 1;
+  else if (gimple_assign_rhs2 (first_stmt) != rhs2)
+    return 2;
+
+  return false;
+}
+
+/* Check all stmts for phis, return true if the stmts are binary operation,
+   and have the same opcode.  Otherwise return false.  */
+
+static bool
+check_stmt_for_phi (gimple *phi)
+{
+  gimple *first_stmt;
+  enum tree_code opcode;
+  tree arg;
+
+  /* First clear flag for stmts which used to track of which operand
+     need phi node.  */
+  gimple_set_plf (phi, GF_PLF_1, false);
+  gimple_set_plf (phi, GF_PLF_2, false);
+
+  if (gimple_code (phi) != GIMPLE_PHI)
+    return false;
+
+  arg = PHI_ARG_DEF (phi, 0);
+  if (TREE_CODE (arg) != SSA_NAME)
+    return false;
+
+  first_stmt = SSA_NAME_DEF_STMT (arg);
+
+  /* Check the stmt is GIMPLE_ASSIGN, and the stmt is binary operation.  */
+  if (!first_stmt || !is_gimple_assign (first_stmt)
+      || gimple_assign_rhs_class (first_stmt) != GIMPLE_BINARY_RHS
+      || !has_single_use (gimple_assign_lhs (first_stmt)))
+    return false;
+
+  opcode = gimple_assign_rhs_code (first_stmt);
+
+  /* Scan all args to find which can do the replacement.  */
+  for (unsigned i = 1; i != gimple_phi_num_args (phi); i++)
+    {
+      gimple *stmt;
+      arg = PHI_ARG_DEF (phi, i);
+
+      if (TREE_CODE (arg) != SSA_NAME)
+	return false;
+
+      stmt = SSA_NAME_DEF_STMT (arg);
+
+      /* Scan to see if all operands are the same opcode, and all have
+	 one use.  */
+      if (!stmt || !is_gimple_assign (stmt)
+	  || gimple_assign_rhs_class (stmt) != GIMPLE_BINARY_RHS
+	  || gimple_assign_rhs_code (stmt) != opcode
+	  || !has_single_use (gimple_assign_lhs (stmt)))
+	return false;
+    }
+  return true;
+}
+
+/* Check all operands of stmts, keep track of which operand needs a phi.
+   Return true if the phis can be replaced, otherwise return false.  */
+
+static bool
+check_operands_for_stmt (gimple *phi)
+{
+  tree arg = PHI_ARG_DEF (phi, 0);
+  gimple *first_stmt = SSA_NAME_DEF_STMT (arg);
+
+  enum tree_code opcode = gimple_assign_rhs_code (first_stmt);
+  tree rhs1 = gimple_assign_rhs1 (first_stmt);
+  tree rhs2 = gimple_assign_rhs2 (first_stmt);
+
+  /* Scan all args to find which can do the replacement.  */
+  for (unsigned i = 1; i != gimple_phi_num_args (phi); i++)
+    {
+      gimple *stmt = SSA_NAME_DEF_STMT (PHI_ARG_DEF (phi, i));
+      unsigned rhschanged = 0;
+
+      rhschanged = compare_operands (first_stmt,
+				     gimple_assign_rhs1 (stmt),
+				     gimple_assign_rhs2 (stmt));
+      /* If the operands are different and the opcode are commutative,
+	 then compare again after swapping.  */
+      if (rhschanged == 0 && commutative_tree_code (opcode))
+	{
+	  swap_ssa_operands (stmt,
+			     gimple_assign_rhs1_ptr (stmt),
+			     gimple_assign_rhs2_ptr (stmt));
+	  rhschanged = compare_operands (first_stmt,
+					 gimple_assign_rhs1 (stmt),
+					 gimple_assign_rhs2 (stmt));
+	  if (rhschanged == 0)
+	    return false;
+	}
+      /* Keep track of which operand needs a phi node.  */
+      if (rhschanged == 1
+	  /* If INTEGER_CST needs a phi node, need to mov it to register,
+	     which leads to higher register pressure.  */
+	  && TREE_CODE (gimple_assign_rhs1 (stmt)) != INTEGER_CST)
+	rhs1 = NULL;
+      else if (rhschanged == 2
+	       && TREE_CODE (gimple_assign_rhs2 (stmt)) != INTEGER_CST)
+	rhs2 = NULL;
+      else
+	return false;
+    }
+
+  /* If both rhs1 and rhs2 would need a PHI, don't do this
+     transformation.  because it would increase the number of phis
+     entering the block, which leads to higher register pressure.  */
+  if (!rhs1 && !rhs2)
+    return false;
+  else if (!rhs1)
+    gimple_set_plf (phi, GF_PLF_1, true);
+  else
+    gimple_set_plf (phi, GF_PLF_2, true);
+
+  return true;
+}
+
+/* Scan and check all phis.  Return true if the phis can all be replaced,
+   otherwise return false.  */
+
+static bool
+check_phis (gimple_seq phis)
+{
+  gimple_stmt_iterator gsi;
+
+  if (!phis)
+    return false;
+
+  for (gsi = gsi_last (phis); !gsi_end_p (gsi); gsi_prev (&gsi))
+    {
+      gimple *phi = as_a <gphi *> (gsi_stmt (gsi));
+      if (!check_stmt_for_phi (phi)
+	  || !check_operands_for_stmt (phi))
+	return false;
+    }
+  return true;
+}
+
+/* The function binary_rhs_replacement does the main work of doing the
+   replacement of binary operation.  */
+
+static void
+binary_rhs_replacement (gimple *phi)
+{
+  edge e;
+  gphi *newphi = NULL;
+  gimple *first_stmt = SSA_NAME_DEF_STMT (PHI_ARG_DEF (phi, 0));
+  tree rhs1 = gimple_assign_rhs1 (first_stmt);
+  tree rhs2 = gimple_assign_rhs2 (first_stmt);
+
+  /*  Create a new PHI for replacement.  */
+  if (gimple_plf (phi, GF_PLF_1))
+    {
+      tree temp = make_ssa_name (TREE_TYPE (rhs1), NULL);
+      newphi = create_phi_node (temp, gimple_bb (phi));
+      e = gimple_phi_arg_edge (newphi, 0);
+      add_phi_arg (newphi, gimple_assign_rhs1 (first_stmt), e,
+		   UNKNOWN_LOCATION);
+    }
+  else if (gimple_plf (phi, GF_PLF_2))
+    {
+      tree temp = make_ssa_name (TREE_TYPE (rhs2), NULL);
+      newphi = create_phi_node (temp, gimple_bb (phi));
+      e = gimple_phi_arg_edge (newphi, 0);
+      add_phi_arg (newphi, gimple_assign_rhs2 (first_stmt), e,
+		   UNKNOWN_LOCATION);
+    }
+
+  /* Add all operands to the new PHI.  */
+  if (newphi)
+    {
+      gimple *stmt = NULL;
+      gimple_stmt_iterator gsi_from;
+      enum tree_code opcode;
+
+      for (unsigned i = 1; i != gimple_phi_num_args (phi); i++)
+	{
+	  stmt = SSA_NAME_DEF_STMT (PHI_ARG_DEF (phi, i));
+	  e = gimple_phi_arg_edge (newphi, i);
+
+	  if (gimple_plf (phi, GF_PLF_1))
+	    add_phi_arg (newphi, gimple_assign_rhs1 (stmt), e,
+			 UNKNOWN_LOCATION);
+	  if (gimple_plf (phi, GF_PLF_2))
+	    add_phi_arg (newphi, gimple_assign_rhs2 (stmt), e,
+			 UNKNOWN_LOCATION);
+
+	  /* Remove the original stmt.  */
+	  gsi_from = gsi_for_stmt (stmt);
+	  gsi_remove (&gsi_from, true);
+	}
+
+      /* Remove the first stmt.  */
+      gsi_from = gsi_for_stmt (first_stmt);
+      gsi_remove (&gsi_from, true);
+
+      opcode = gimple_assign_rhs_code (first_stmt);
+
+      /* Create a new stmt to do the binary operator.  */
+      if (gimple_plf (phi, GF_PLF_1))
+	stmt = gimple_build_assign (PHI_RESULT (phi), opcode,
+				    PHI_RESULT (newphi), rhs2);
+      else if (gimple_plf (phi, GF_PLF_2))
+	stmt = gimple_build_assign (PHI_RESULT (phi), opcode,
+				    rhs1, PHI_RESULT (newphi));
+
+      /* Insert the PHI node.  */
+      gsi_from = gsi_after_labels (gimple_bb (phi));
+      gsi_insert_before (&gsi_from, stmt, GSI_NEW_STMT);
+
+      /* Remove the original PHI stmt.  */
+      gsi_from = gsi_for_stmt (phi);
+      gsi_remove (&gsi_from, true);
+
+      /* Do the replacement if the newphi can be folded.  */
+      if (check_stmt_for_phi (newphi)
+	  && check_operands_for_stmt (newphi))
+	binary_rhs_replacement (newphi);
+
+      gimple_set_plf (phi, GF_PLF_1, false);
+      gimple_set_plf (phi, GF_PLF_2, false);
+    }
+}
+
 /* The core routine of conditional store replacement and normal
    phi optimizations.  Both share much of the infrastructure in how
    to match applicable basic block patterns.  DO_STORE_ELIM is true
    when we want to do conditional store replacement, false otherwise.
    DO_HOIST_LOADS is true when we want to hoist adjacent loads out
-   of diamond control flow patterns, false otherwise.  */
+   of diamond control flow patterns, false otherwise.  DO_BINARY_COMBINE
+   is true when we want to do binary operation replacement, false
+   otherwise.  */
 static unsigned int
-tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads, bool early_p)
+tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads,
+			bool do_binary_combine, bool early_p)
 {
   basic_block bb;
   basic_block *bb_order;
@@ -258,6 +506,27 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads, bool early_p)
 	    hoist_adjacent_loads (bb, bb1, bb2, bb3);
 	  continue;
 	}
+      else if (do_binary_combine
+	       && EDGE_SUCC (bb1, 0)->dest == EDGE_SUCC (bb2, 0)->dest)
+	{
+	  basic_block bb3 = EDGE_SUCC (bb1, 0)->dest;
+	  gimple_seq phis = phi_nodes (bb3);
+
+	  /* Check all phis in bb3 to make sure they can all be folded.  */
+	  if (check_phis (phis))
+	    {
+	      gimple_stmt_iterator gsi;
+
+	      for (gsi = gsi_start (phis); !gsi_end_p (gsi); gsi_next (&gsi))
+		{
+		  gimple *vphi = as_a <gphi *> (gsi_stmt (gsi));
+		  if (gimple_plf (vphi, GF_PLF_1)
+		      || gimple_plf (vphi, GF_PLF_2))
+		    binary_rhs_replacement (vphi);
+		}
+	    }
+	  continue;
+	}
       else
 	continue;
 
@@ -3037,6 +3306,7 @@ public:
     {
       return tree_ssa_phiopt_worker (false,
 				     !early_p ? gate_hoist_loads () : false,
+				     false,
 				     early_p);
     }
 
@@ -3052,6 +3322,80 @@ make_pass_phiopt (gcc::context *ctxt)
   return new pass_phiopt (ctxt);
 }
 
+/* This pass tries to replaces an if-then-else block with some
+   assignments which have a same operand.
+
+   Binary RHS Replacement
+   ---------------
+
+   This transformation, implemented in binary_rhs_replacement, replaces
+
+     bb0:
+       if (cond) goto bb1; else goto bb2;
+     bb1:
+       x1 = a + b;
+     bb2:
+       x2 = a + c;
+     bb3:
+       x = PHI <x1 (bb1), x2 (bb2), ...>;
+
+   with
+
+     bb0:
+       if (cond) goto bb1; else goto bb2;
+     bb1:
+     bb2:
+     bb3:
+       x3 = PHI <b (bb1), c (bb2), ...>;
+       x = a + x3;  */
+
+namespace {
+
+const pass_data pass_data_tree_combine =
+{
+  GIMPLE_PASS, /* type */
+  "combine", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_TREE_PHIOPT, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_tree_combine : public gimple_opt_pass
+{
+public:
+  pass_tree_combine (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_tree_combine, ctxt), early_p (false)
+  {}
+
+  /* opt_pass methods: */
+  opt_pass * clone () { return new pass_tree_combine (m_ctxt); }
+  void set_pass_param (unsigned n, bool param)
+    {
+      gcc_assert (n == 0);
+      early_p = param;
+    }
+  virtual bool gate (function *) { return flag_tree_combine; }
+  virtual unsigned int execute (function *)
+    {
+      return tree_ssa_phiopt_worker (false, false, true, early_p);
+    }
+
+private:
+  bool early_p;
+}; // class pass_tree_combine
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_tree_combine (gcc::context *ctxt)
+{
+  return new pass_tree_combine (ctxt);
+}
+
 namespace {
 
 const pass_data pass_data_cselim =
-- 
2.19.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH PR94274] fold phi whose incoming args are defined from binary
  2020-06-11 12:01 [PATCH PR94274] fold phi whose incoming args are defined from binary Zhanghaijian (A)
@ 2020-06-15 12:38 ` Richard Biener
  2020-11-23 22:37 ` Jeff Law
  1 sibling, 0 replies; 3+ messages in thread
From: Richard Biener @ 2020-06-15 12:38 UTC (permalink / raw)
  To: Zhanghaijian (A); +Cc: gcc-patches, Martin Liska

On Thu, 11 Jun 2020, Zhanghaijian (A) wrote:

> Hi,
> 
> This is a experimental fix for pr94274. For if/else structure, when the 
> expressions is binary operation and have a common subexpression, and the 
> opcode is the same, we can fold the merging phi node in 
> tree_ssa_phiopt_worker (ssa-phiopt). It can be optimized to do csel 
> first and then do binary operations. This can eliminate one or more 
> instructions. And this patch is effective for 500.perlbench_r in 
> spec2017. Bootstrap and tested on aarch64/x86_64 Linux platform. No new 
> regression witnessed.
> 
> Any suggestion?  

First of all I would not implement this as separate pass, we do
have precedence for this kind of transform in phiopt itself
(factor_out_conditional_conversion).  There's also abs_replacement
which eventually sinks a negation operation.

Note that the equivalency code matches roughly what we do in
the tail-merging optimization (tree-ssa-tail-merge.c).  I guess
if you'd cleverly re-order stmts and split blocks you could see
tail-merging doing the desired transform.

Your approach hard-codes binary operations but we have other classes
of operations that would benefit from the transform (including
calls for example).

Your approach is greedy, for a larger expression you rely on
intermediate transforms to be profitable, that is you won't
sink

 _1 = a_2(D) + b_3(D);
 _2 = a_2(D) - b_3(D);
 _3 = _1 * _2;

 .. = PHI <_3, ...>

because sinking _1 * _2 requires two PHIs but after sinking
a + b and a - b both are gone.  Note that the number of PHIs
does not directly translate to register pressure.  Note
other opportunities you leave on the plate are when the above
also has a second PHI like

 .. = PHI <_2, ...>

because then _2 does not have a single use but if we sink
the second PHI vanishes as well.

So I do not think the greedy approach is sound given you
talk about examples with more than one sunk statement.

There is already two copies of code comparing statements,
one in the above mentioned tail-merging pass and one in
ICF - ipa-icf-gimple.[ch] may even have a usable interface
to compare sequences of stmts for equality and if not
Martin may be of help here.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH PR94274] fold phi whose incoming args are defined from binary
  2020-06-11 12:01 [PATCH PR94274] fold phi whose incoming args are defined from binary Zhanghaijian (A)
  2020-06-15 12:38 ` Richard Biener
@ 2020-11-23 22:37 ` Jeff Law
  1 sibling, 0 replies; 3+ messages in thread
From: Jeff Law @ 2020-11-23 22:37 UTC (permalink / raw)
  To: Zhanghaijian (A), gcc-patches; +Cc: rguenth



On 6/11/20 6:01 AM, Zhanghaijian (A) wrote:
> Hi,
>
> This is a experimental fix for pr94274.
> For if/else structure, when the expressions is binary operation and have a common subexpression, and the opcode is the same, we can
> fold the merging phi node in tree_ssa_phiopt_worker (ssa-phiopt). It can be optimized to do csel first and then do binary operations.
> This can eliminate one or more instructions. And this patch is effective for 500.perlbench_r in spec2017.
> Bootstrap and tested on aarch64/x86_64 Linux platform. No new regression witnessed.
>
> Any suggestion?  
>
> Thanks,
> Haijian Zhang
>
> pr94274-v1.diff
>
> From 1f0f09ef3170569ef15791cf3e70de781a9a4cb0 Mon Sep 17 00:00:00 2001
> From: Haijian Zhang <z.zhanghaijian@huawei.com>
> Date: Thu, 11 Jun 2020 14:56:44 +0800
> Subject: [PATCH] fold phi whose incoming args are defined from binary
>  operations [PR94274]
>
> For if/else structure, when the expressions is binary operation and
> have a common subexpression, and the opcode is the same. We can fold
> the merging phi node in tree_ssa_phiopt_worker (ssa-phiopt). It can
> be optimized to do csel first and then do binary operations. This can
> eliminate one or more instructions.
>
> 2020-06-11  Haijian Zhang  <z.zhanghaijian@huawei.com>
>
> gcc/
>
> 	PR tree-optimization/94274
> 	* common.opt (ftree-combine): New option.
> 	* doc/invoke.texi (-ftree-combine): Document new option.
> 	* opts.c (default_options_table): Enable -ftree-combine at -O1.
> 	* passes.def: Add pass_adjust_alignment.
> 	* tree-pass.h (make_pass_tree_combine): New.
> 	* tree-ssa-phiopt.c (pass_data_tree_combine): New.
> gcc/testsuite/
> 	PR tree-optimization/94274
> 	* gcc.dg/tree-ssa/pr94274-1.c: New test.
> 	* gcc.dg/tree-ssa/pr94274-2.c: Likewise.
> 	* gcc.dg/tree-ssa/pr94274-3.c: Likewise.
I like a lot of what I see in here.   Do you have a copyright assignment
and employer disclaimer on file with the FSF?  If not, then we can't
take the submission at this time.


> ---
>  gcc/ChangeLog                             |  10 +
>  gcc/common.opt                            |   4 +
>  gcc/doc/invoke.texi                       |  15 +-
>  gcc/opts.c                                |   1 +
>  gcc/passes.def                            |   1 +
>  gcc/testsuite/ChangeLog                   |   7 +
>  gcc/testsuite/gcc.dg/tree-ssa/pr94274-1.c |  15 +
>  gcc/testsuite/gcc.dg/tree-ssa/pr94274-2.c |  26 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr94274-3.c |  16 +
>  gcc/tree-pass.h                           |   1 +
>  gcc/tree-ssa-phiopt.c                     | 352 +++++++++++++++++++++-
>  11 files changed, 440 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr94274-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr94274-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr94274-3.c
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index b0860738d04..c1a5bf87626 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,13 @@
> +2020-06-11  Haijian Zhang  <z.zhanghaijian@huawei.com>
> +
> +	PR tree-optimization/94274
> +	* common.opt (ftree-combine): New option.
> +	* doc/invoke.texi (-ftree-combine): Document new option.
> +	* opts.c (default_options_table): Enable -ftree-combine at -O1.
> +	* passes.def: Add pass_adjust_alignment.
> +	* tree-pass.h (make_pass_tree_combine): New.
> +	* tree-ssa-phiopt.c (pass_data_tree_combine): New.
You should reference tree-optimization/64700 as well.   While your patch
doesn't address all the issues in 64700, it's an excellent start.

> +
>  2020-06-10  Martin Sebor  <msebor@redhat.com>
>  
>  	PR middle-end/95353
> diff --git a/gcc/common.opt b/gcc/common.opt
> index df8af365d1b..f4441aa9ddd 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2721,6 +2721,10 @@ ftree-cselim
>  Common Report Var(flag_tree_cselim) Init(2) Optimization
>  Transform condition stores into unconditional ones.
>  
> +ftree-combine
> +Common Report Var(flag_tree_combine) Optimization
> +Combine binary operations using SSA PHI nodes.
I probably would bother with a switch for this.  There's already a flag
to control phi-opt as a whole and that seems fine for this change as well.

> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr94274-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr94274-1.c
> new file mode 100644
> index 00000000000..f883c6a3201
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr94274-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-combine" } */
> +
> +int foo (int cond, int a, int b, int c)
> +{
> +  int result = 0;
> +
> +  if (cond == 1)
> +    result = b + a;
> +  else
> +    result = a + c;
> +  return result;
> +}
You should look at comment #3 in pr64700 which I think has a couple more
test cases.diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c


> index 5f283890998..9b09330c026 100644
> --- a/gcc/tree-ssa-phiopt.c
> +++ b/gcc/tree-ssa-phiopt.c
> @@ -48,7 +48,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-eh.h"
>  #include "gimple-fold.h"
>  
> -static unsigned int tree_ssa_phiopt_worker (bool, bool, bool);
> +static unsigned int tree_ssa_phiopt_worker (bool, bool, bool, bool);
>  static bool two_value_replacement (basic_block, basic_block, edge, gphi *,
>  				   tree, tree);
So if we don't make this a separate pass, but instead have it run as a
standard part of the phi-opt optimizations, then I don't think you need
an argument to control the behavior.

> @@ -155,14 +155,262 @@ single_non_singleton_phi_for_edges (gimple_seq seq, edge e0, edge e1)
>    return phi;
>  }
>  
> +/* Compare the operands of stmts.  Return 1 if only rhs1 is different.
> +   Return 2 if only rhs2 is different.  Otherwise return false.  */
> +
> +static unsigned
> +compare_operands (gimple *first_stmt, tree rhs1, tree rhs2)
So the comment here could be better.  "If only rhs1 is different." --
different from what?, similarly for the comment about rhs2.  I'd use "0"
instead of false -- we generally use true/false for booleans, mixing
them could confuse folks.  You could make an enum for the 3 return
states and use that instead.
> +/* Check all stmts for phis, return true if the stmts are binary operation,
> +   and have the same opcode.  Otherwise return false.  */
> +
> +static bool
> +check_stmt_for_phi (gimple *phi)
> +{
> +  gimple *first_stmt;
> +  enum tree_code opcode;
> +  tree arg;
> +
> +  /* First clear flag for stmts which used to track of which operand
> +     need phi node.  */
> +  gimple_set_plf (phi, GF_PLF_1, false);
> +  gimple_set_plf (phi, GF_PLF_2, false);
> +
> +  if (gimple_code (phi) != GIMPLE_PHI)
> +    return false;
So this function is called within a loop over the PHIs.  THere should be
no case when gimple_code (phi) != GIMPLE_PHI.  I'd just drop that test.

If you want checking, then change the argument to a gphi* and you'll get
static checking at the call site.


+ /* Check the stmt is GIMPLE_ASSIGN, and the stmt is binary operation. */
> +  if (!first_stmt || !is_gimple_assign (first_stmt)
> +      || gimple_assign_rhs_class (first_stmt) != GIMPLE_BINARY_RHS
> +      || !has_single_use (gimple_assign_lhs (first_stmt)))
> +    return false;
So I don't think it should be a requirement for this patch to move
forward, but you could easily handle unary operations here too.  I'm
thinking about stuff like casts and the like.  But it's also a step
towards handling stuff like INDIRECT_REF.

You may want to verify that the defining statements do not have any
virtual operands.   It's just the safe thing to do.


> +
> +/* Check all operands of stmts, keep track of which operand needs a phi.
> +   Return true if the phis can be replaced, otherwise return false.  */
> +
> +static bool
> +check_operands_for_stmt (gimple *phi)
> +{
> +  tree arg = PHI_ARG_DEF (phi, 0);
> +  gimple *first_stmt = SSA_NAME_DEF_STMT (arg);
> +
> +  enum tree_code opcode = gimple_assign_rhs_code (first_stmt);
> +  tree rhs1 = gimple_assign_rhs1 (first_stmt);
> +  tree rhs2 = gimple_assign_rhs2 (first_stmt);
> +
> +  /* Scan all args to find which can do the replacement.  */
> +  for (unsigned i = 1; i != gimple_phi_num_args (phi); i++)
> +    {
> +      gimple *stmt = SSA_NAME_DEF_STMT (PHI_ARG_DEF (phi, i));
> +      unsigned rhschanged = 0;
> +
> +      rhschanged = compare_operands (first_stmt,
> +				     gimple_assign_rhs1 (stmt),
> +				     gimple_assign_rhs2 (stmt));
So you earlier verified that FIRST_STMT is a GIMPLE_BINARY_RHS.  But you
don't do it for the defining statements of subsequent operands in the
PHI node.   You need to verify the basic form STMT to ensure it's the
same as FIRST_STMT before you call gimple_assign_rhs1 (stmt) and
gimple_assign_rhs2 (stmt).

+
> +/* The function binary_rhs_replacement does the main work of doing the
> +   replacement of binary operation.  */
> +
> +static void
> +binary_rhs_replacement (gimple *phi)
> +{
> +  edge e;
> +  gphi *newphi = NULL;
> +  gimple *first_stmt = SSA_NAME_DEF_STMT (PHI_ARG_DEF (phi, 0));
> +  tree rhs1 = gimple_assign_rhs1 (first_stmt);
> +  tree rhs2 = gimple_assign_rhs2 (first_stmt);
> +
> +  /*  Create a new PHI for replacement.  */
> +  if (gimple_plf (phi, GF_PLF_1))
> +    {
> +      tree temp = make_ssa_name (TREE_TYPE (rhs1), NULL);
> +      newphi = create_phi_node (temp, gimple_bb (phi));
> +      e = gimple_phi_arg_edge (newphi, 0);
> +      add_phi_arg (newphi, gimple_assign_rhs1 (first_stmt), e,
> +		   UNKNOWN_LOCATION);
> +    }
> +  else if (gimple_plf (phi, GF_PLF_2))
> +    {
> +      tree temp = make_ssa_name (TREE_TYPE (rhs2), NULL);
> +      newphi = create_phi_node (temp, gimple_bb (phi));
> +      e = gimple_phi_arg_edge (newphi, 0);
> +      add_phi_arg (newphi, gimple_assign_rhs2 (first_stmt), e,
> +		   UNKNOWN_LOCATION);
> +    }
I would think we could do better than UNKNOWN_LOCATION here.  We should
have locations in the old phi as well as the statements in bb1/bb2.  Any
would be better than UNKNOWN_LOCATION here.  Similarly for the rest of
the PHI arguments.

At a high level, does your patch handle the case where one of the PHI
arguments is defined by another PHI in the same block?  This can happen
in loops and the semantics are that all the reads of PHI arguments
happen at once and assignments to all the PHI results happen after that,
and effectively at the same time.  If this scenario is detected, I'd
just avoid trying to do the optimization.

And a nit.  Please check for consistent use of tabs and spaces. 
Formatting guidelines for GCC are to use tabs instead of 8 spaces.  I
think the new code you're adding is inconsistent.  As an example look at
the 3 lines that start with "/* Remove the first statement.  */ in
binary_rhs_replacement.


So I think this needs another iteration, but given it was submitted in
June, I think it should still be considered for gcc-11, particularly if
the copyright assignment and technical details can be worked out
relatively quickly.

Thanks!

jeff


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-11-23 22:37 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-11 12:01 [PATCH PR94274] fold phi whose incoming args are defined from binary Zhanghaijian (A)
2020-06-15 12:38 ` Richard Biener
2020-11-23 22:37 ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).