public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [GCC 9][RFC][PATCH] Optimize PHIs with constant arguments better
@ 2017-11-30 23:38 Jeff Law
  2017-12-04 21:25 ` Michael Matz
  2018-05-01 22:44 ` [RFA][PATCH] " Jeff Law
  0 siblings, 2 replies; 5+ messages in thread
From: Jeff Law @ 2017-11-30 23:38 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3259 bytes --]



This addresses some code quality regressions with some work scheduled
for gcc-9.  However, if anyone is aware of BZs that could be helped with
this patch, don't hesitate to pass them along.  Depending on their
severity we could always re-evaluate the plan for this patch.

The patch optimizes sequences like this:


 # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
 _30 = (unsigned char) iftmp.4_16;

ie, we have a PHI where all its arguments are known (differing)
constants.  The PHI feeds an operation which can be compile-time
evaluated for all the PHI's arguments.

This arises most often where the result of the PHI is type converted via
 NOP_EXPR.  But it also happens for a boolean comparisons, arithmetic
and logicals and other type conversions.

We can optimize away the statement by creating a new PHI node holding
the result of the statement applied to each of the original PHI's
constant operands.  So for the fragment above we'd generate:

 # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
 # _30 = PHI <0(7), 1(8), 0(9)>

These kind of backward propagations can cascade.  Here's an example I
saw during analysis of test results:

 # m_5 = PHI <10(5), 12(6), 14(7)>
<L13>:
  _2 = (long unsigned int) m_5;
  _3 = _2 * 32;
  goto <bb 12>; [100.00%]

After this patch the two statements have been eliminated in favor of
generating PHIs with constant arguments:

And after PHI propagation we have:
  # m_5 = PHI <10(5), 12(6), 14(7)>
  # _2 = PHI <10(5), 12(6), 14(7)>
  # _3 = PHI <320(5), 384(6), 448(7)>
<L13>:
  goto <bb 12>; [100.00%]

DCE will come along and wipe out m_5 and _2 if they are unused.

I initially covered just NOP_EXPR in the proof of concept.   But it's
trivial to extend to cover other unaries, and binary/comparisons where
one operand was already a constant, so I did that.  This applies
surprisingly often during a bootstrap.  An earlier version (which didn't
handle multiple uses of the result of the PHI) triggered over 6000 times
during a bootstrap:

NOP_EXPR    5161
LT_EXPR      658
GE_EXPR      504
NE_EXPR      310
BIT_NOT_EXPR 295
BIT_AND_EXPR  94
PLUS_EXPR     77
NEGATE_EXPR   48
LSHIFT_EXPR   40
EQ_EXPR       34
GT_EXPR       16
BIT_IOR_EXPR  10
MINUS_EXPR     8

There's actually other cases we could handle were

x = PHI (a, 0);
y = a & x;

Turns into

x = PHI (a, 0);
y = PHI (a, 0);


I'm not sure how often these non-constant cases happen -- I haven't
tried to support this or gain any data on how often this kidn of case
might happen.

FWIW, there are cases were the PHI arguments are constant, but we still
can't simplify.  For example:


x = PHI (0, 1, 2)

y = 1234 / x;


When we process this we'll try to simplify 1234 / 0 to a constant which
fails.  We have to gracefully remove the partially initialized PHI node
in that case.  This is tested by existing tests in the suite.

You'll also see that certain tests in the suite such as pr61839_3,
ssa-pre-4.c are twiddled.  I've suppressed phi backwards propagation in
the original test so that it's not compromised.  THere's also a variant
of each test which verifies that the transformation is handled by phi
back propagation.

Bootstrapped and regression tested on x86, both in isolation and in a
larger patch series.

Will ping when gcc-9 opens for development.


Jeff


[-- Attachment #2: P1 --]
[-- Type: text/plain, Size: 13224 bytes --]

	* tree-ssa-phiprop.c (propagate_with_phi_1): Renamed from
	propagate_with_phi.
	(stmt_likely_produces_constant_result): New function.
	(fold_use_into_phi, propagate_with_phi_2):  Likewise.
	(pass_phiprop::execute): Corresponding changes.  Call
	propagate_with_phi_2.

	* gcc.dg/tree-ssa/pr61743-1.c: Use -fno-tree-phiprop.
	* gcc.dg/tree-ssa/pr61839_1.c: Likewise.
	* gcc.dg/tree-ssa/pr61839_3.c: Likewise.
	* gcc.dg/tree-ssa/ssa-pre-4.c: Likewise.

	* gcc.dg/tree-ssa/pr61743-1a.c: New test.
	* gcc.dg/tree-ssa/pr61839_1a.c: Likewise.
	* gcc.dg/tree-ssa/pr61839_3a.c: Likewise.
	* gcc.dg/tree-ssa/ssa-pre-4a.c: New test.

diff --git a/gcc/tree-ssa-phiprop.c b/gcc/tree-ssa-phiprop.c
index 494158b..f2ec2b3 100644
--- a/gcc/tree-ssa-phiprop.c
+++ b/gcc/tree-ssa-phiprop.c
@@ -259,8 +259,8 @@ chk_uses (tree, tree *idx, void *data)
    with aliasing issues as we are moving memory reads.  */
 
 static bool
-propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
-		    size_t n)
+propagate_with_phi_1 (basic_block bb, gphi *phi, struct phiprop_d *phivn,
+		      size_t n)
 {
   tree ptr = PHI_RESULT (phi);
   gimple *use_stmt;
@@ -440,6 +440,207 @@ next:;
   return phi_inserted;
 }
 
+/* Return TRUE if USE_STMT, which uses PHI_RESULT will likely produce
+   a constant result if PHI_RESULT is itself a constant.  Else return
+   FALSE.
+
+   It is safe for this routine to always return TRUE.  If the result
+   is not a constant it will be detected and properly handled later.
+
+   This is overly conservative.  If USE_STMT were to produce an SSA_NAME,
+   then that would still work for our purposes.  */
+
+static bool
+stmt_likely_produces_constant_result (gimple *use_stmt, tree phi_result)
+{
+  enum tree_code code = gimple_assign_rhs_code (use_stmt);
+  switch (TREE_CODE_CLASS (code))
+    {
+    /* With few exceptions, a unary operator applied to a constant
+       will result in a constant.  So they are OK without digging
+       deeper into the operator/operands.  */
+    case tcc_unary:
+      return true;
+
+    /* For binary and comparisons we'll generally be able to generate
+       a constant if only one operand is an SSA_NAME.  */
+    case tcc_binary:
+    case tcc_comparison:
+      {
+	tree rhs1 = gimple_assign_rhs1 (use_stmt);
+	tree rhs2 = gimple_assign_rhs2 (use_stmt);
+
+	/* We need to check for RES op CONST and CONST op RES.  Consider
+	   MINUS_EXPR or MIN/MAX where RES could be the first or the second
+	   argument.  */
+	if (rhs1 == phi_result
+	    && TREE_CODE_CLASS (TREE_CODE (rhs2)) == tcc_constant)
+	  return true;
+
+	if (rhs2 == phi_result
+	    && TREE_CODE_CLASS (TREE_CODE (rhs1)) == tcc_constant)
+	  return true;
+
+	/* We do not try to handle two SSA_NAME operands.  */
+	return false;
+      }
+
+    /* There might be some tcc_reference or tcc_exceptional types we
+       could handle with deeper investigation.  COND_EXPR comes to mind.  */
+    default:
+      return false;
+    }
+}
+
+/* PHI in block BB produces RESULT.  PHI_RESULT is used in USE_STMT.
+
+   All of PHIs arguments are simple constants.  See if propagating
+   the PHI arguments into USE_STMT produces a constant.  If that is
+   the case for all the PHI arguments, then STMT is replaced with a
+   new PHI and we return TRUE.  Else make no changes to the IL and
+   return FALSE.
+
+   When applied this transformation reduces runtime computations and
+   replaces them with either constant initializations.  This also enables
+   jump threading to occur earlier in the optimization pipeline.  */
+
+static bool
+fold_use_into_phi (gphi *phi, gimple *use_stmt,
+		   basic_block bb, tree phi_result)
+{
+  /* Now verify the use is an assigment to an SSA_NAME.  */
+  if (!is_gimple_assign (use_stmt)
+      || TREE_CODE (gimple_assign_lhs (use_stmt)) != SSA_NAME)
+    return false;
+
+  /* If USE_STMT does not likely produce a constant result, then
+     do not try to fold this use.  Note this is overly conservative
+     as we could handle USE_STMT simplifying to an SSA_NAME rather than
+     just constants.  */
+  if (!stmt_likely_produces_constant_result (use_stmt, phi_result))
+    return false;
+
+  /* Build a new PHI node to replace the definition of the USE_STMT's LHS.  */
+  gphi *new_phi = create_phi_node (NULL_TREE, bb);
+
+  /* Now fill in its arguments by applying the operator to each
+     argument of the original PHI.  */
+  edge e;
+  edge_iterator ei;
+  tree use_result = gimple_assign_lhs (use_stmt);
+  tree type = TREE_TYPE (use_result);
+  enum tree_code code = gimple_assign_rhs_code (use_stmt);
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      tree old_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
+      tree new_arg;
+
+      switch (TREE_CODE_CLASS (code))
+	{
+	case tcc_unary:
+	  new_arg = fold_unary_to_constant (code, type, old_arg);
+	  break;
+
+	case tcc_binary:
+	case tcc_comparison:
+	  {
+	    tree rhs1 = gimple_assign_rhs1 (use_stmt);
+	    tree rhs2 = gimple_assign_rhs2 (use_stmt);
+	  if (rhs1 == phi_result)
+	    new_arg = fold_binary (code, type,
+				   old_arg, gimple_assign_rhs2 (use_stmt));
+	  else if (rhs2 == phi_result)
+	    new_arg = fold_binary (code, type,
+				   gimple_assign_rhs1 (use_stmt), old_arg);
+	  else
+	    gcc_unreachable ();
+	  break;
+	  }
+
+	default:
+	  gcc_unreachable ();
+	}
+
+      /* We did not get a constant.  This can happen (imagine division
+	 by zero).  We have to remove the PHI from the IL, as the PHI
+	 is only partially constructed.  */
+      if (!new_arg)
+	{
+	  /* Set the PHI_RESULT to a new, throw-away SSA_NAME.  This avoids
+	     problems unlinking the "uses" of a currently NULL PHI_RESULT.  */
+	  gimple_phi_set_result (new_phi, make_ssa_name (type));
+
+	  /* Actually remove the PHI from the IL.  It is safe to remove the
+	     PHI if some of its PHI arguments are still uninitialized.  */
+	  gimple_stmt_iterator gsi = gsi_for_stmt (new_phi);
+	  remove_phi_node (&gsi, true);
+	  return false;
+	}
+
+      source_location locus = gimple_phi_arg_location_from_edge (phi, e);
+      add_phi_arg (new_phi, new_arg, e, locus);
+    }
+
+  gimple_phi_set_result (new_phi, use_result);
+
+  /* At this point we have our new PHI installed where we want it.
+     Delete the old PHI and USE_STMT.  */
+  gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+  gsi_remove (&gsi, true);
+  return true;
+}
+
+/* Look for sequences like this:
+
+    # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
+    _30 = (unsigned char) iftmp.4_16;
+
+   We can "hoist" the conversion into the PHI and get it for free, generating:
+
+     # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
+     # _30 = PHI <0(7), 1(8), 0(9)>
+
+   DCE will eliminate the first PHI.  In addition to getting the conversion
+   for free, removal of the conversion improves backwards threading.  */
+
+static bool
+propagate_with_phi_2 (basic_block bb, gphi *phi)
+{
+  /* First verify that all the arguments of the PHI are constants.
+
+     This is overly conservative.  Consider:
+
+       y = PHI (x, -1, 0);
+       z = y & x;
+
+     We could transform that into:
+
+       y = PHI (x, -1, 0);
+       z = PHI (x, x, 0);
+
+     It's not clear how important those cases are and we punt them
+     for now.  */
+  use_operand_p arg_p;
+  ssa_op_iter i;
+  FOR_EACH_PHI_ARG (arg_p, phi, i, SSA_OP_USE)
+    {
+      tree arg = USE_FROM_PTR (arg_p);
+      if (TREE_CODE_CLASS (TREE_CODE (arg)) != tcc_constant)
+	return false;
+    }
+
+  /* Not iterate over each immediate use to see if the statement can
+     be implemented as a PHI rather than a real statement.  */
+  gimple *use_stmt;
+  imm_use_iterator ui;
+  bool retval = false;
+  tree phi_result = PHI_RESULT (phi);
+  FOR_EACH_IMM_USE_STMT (use_stmt, ui, phi_result)
+    retval |= fold_use_into_phi (phi, use_stmt, bb, phi_result);
+
+  return retval;
+}
+
 /* Main entry for phiprop pass.  */
 
 namespace {
@@ -492,7 +693,10 @@ pass_phiprop::execute (function *fun)
 				  single_succ (ENTRY_BLOCK_PTR_FOR_FN (fun)));
   FOR_EACH_VEC_ELT (bbs, i, bb)
     for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-      did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n);
+      {
+          did_something |= propagate_with_phi_1 (bb, gsi.phi (), phivn, n);
+	  did_something |= propagate_with_phi_2 (bb, gsi.phi ());
+      }
 
   if (did_something)
     gsi_commit_edge_inserts ();

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
index a5c83cf..7771044 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details -fdump-tree-cunrolli-details -fno-peel-loops" } */
+/* { dg-options "-O3 -fno-tree-phiprop -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details -fdump-tree-cunrolli-details -fno-peel-loops" } */

 #define N 8
 #define M 14
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1a.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1a.c
new file mode 100644
index 0000000..dd64472
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1a.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details -fdump-tree-cunrolli-details -fdump-tree-phiprop -fno-peel-loops" } */
+
+#include "pr61743-1.c"
+
+/* { dg-final { scan-tree-dump-times "PHI <(\?:320|384|448)\\(\[0-9\]+\\), (\?:320|384|448)\\(\[0-9\]+\\), (\?:320|384|448)\\(\[0-9\]+\\)>" 1 "phiprop"} } */
+
+/* The code with PHI back propagation is simpler and we unswitch more.  */
+/* { dg-final { scan-tree-dump-times "loop with 4 iterations completely unrolled" 10 "cunroll" } } */
+/* { dg-final { scan-tree-dump-times "loop with 9 iterations completely unrolled" 2 "cunrolli" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
index 9f8168a..5bf1982 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/61839.  */
 /* { dg-do run } */
-/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fno-tree-phiprop -fdump-tree-optimized" } */
 /* { dg-require-effective-target int32plus } */
 
 __attribute__ ((noinline))
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1a.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1a.c
new file mode 100644
index 0000000..99a9016
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1a.c
@@ -0,0 +1,11 @@
+/* PR tree-optimization/61839.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-phiprop" } */
+/* { dg-require-effective-target int32plus } */
+
+#include "pr61839_1.c"
+
+/* Scan for c = 972195717) >> [0, 1] in function foo.  */
+/* { dg-final { scan-tree-dump-times "486097858 : 972195717" 1  "vrp1" } } */
+/* Scan for c = 972195717) >> [2, 3] in function bar.  */
+/* { dg-final { scan-tree-dump-times "PHI <(?:243048929|121524464)\\(\[0-9\]+\\), (?:243048929|121524464)\\(\[0-9\]+\\)>" 1 "phiprop"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
index 5ceb073..ad908c0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/61839.  */
 /* { dg-do run } */
-/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fno-tree-phiprop -fdump-tree-vrp1 -fdump-tree-optimized" } */
 
 __attribute__ ((noinline))
 int foo (int a, unsigned b)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3a.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3a.c
new file mode 100644
index 0000000..3a4d800
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3a.c
@@ -0,0 +1,9 @@
+/* PR tree-optimization/61839.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-phiprop -fdump-tree-optimized" } */
+#include "pr61839_3.c"
+
+/* Scan for a PHI with (12, 13) << 8 in function foo.  */
+/* { dg-final { scan-tree-dump-times "PHI <\(3072|3328\)\\(\[0-9\]+\\), \(3072|3328\)\\(\[0-9\
+]+\\)>" 1 "phiprop"} } */
+/* { dg-final { scan-tree-dump-times "3072" 0  "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
index 5b4fd54..0bc5f73 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fno-tree-phiprop -fdump-tree-pre-stats" } */
 int foo(void)
 {
 	int x, c, y;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4a.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4a.c
new file mode 100644
index 0000000..70e8b16
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4a.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiprop" } */
+#include "ssa-pre-4.c"
+/* We should eliminate the x+1 computation from this routine, replacing
+   it with a phi of 3, 4 */
+/* { dg-final { scan-tree-dump-times "PHI <\[34\]\\(\[0-9\]+\\), \[34\]\\(\[0-9\]+\\)>" 1 "phiprop"} } */

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [GCC 9][RFC][PATCH] Optimize PHIs with constant arguments better
  2017-11-30 23:38 [GCC 9][RFC][PATCH] Optimize PHIs with constant arguments better Jeff Law
@ 2017-12-04 21:25 ` Michael Matz
  2017-12-05 17:03   ` Jeff Law
  2018-05-01 22:44 ` [RFA][PATCH] " Jeff Law
  1 sibling, 1 reply; 5+ messages in thread
From: Michael Matz @ 2017-12-04 21:25 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Hi,

On Thu, 30 Nov 2017, Jeff Law wrote:

> And after PHI propagation we have:
>   # m_5 = PHI <10(5), 12(6), 14(7)>
>   # _2 = PHI <10(5), 12(6), 14(7)>
>   # _3 = PHI <320(5), 384(6), 448(7)>
> <L13>:
>   goto <bb 12>; [100.00%]
> 
> DCE will come along and wipe out m_5 and _2 if they are unused.

When I did something along these lines a long time ago I had to be a bit 
careful in not regressing performance.  Every PHI node with constants will 
generate N instructions (with N the arity), there's no coalescing 
possible.  And if the feeding PHI nodes don't go away it increases 
register pressure by (at least) one.

In one situation at that time I basically replaced N PHI nodes and 
N*N instructions (each calculating x op y for each x,y \in PHI) by N*N PHI 
nodes, increasing register pressure unsensibly.  As the N*N instructions 
formed a few expr trees the register pressure before the transformation 
was merely at about N+3.  In the end the benchmark was 30% slower than 
faster :)  (spilling added)

(If I read the patch correctly you don't handle the situation of "x op y" 
where both arguments come from PHI nodes, so it's probably not an issue 
for you, but I thought I'd mention it)


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [GCC 9][RFC][PATCH] Optimize PHIs with constant arguments better
  2017-12-04 21:25 ` Michael Matz
@ 2017-12-05 17:03   ` Jeff Law
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff Law @ 2017-12-05 17:03 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches

On 12/04/2017 02:25 PM, Michael Matz wrote:
> Hi,
> 
> On Thu, 30 Nov 2017, Jeff Law wrote:
> 
>> And after PHI propagation we have:
>>   # m_5 = PHI <10(5), 12(6), 14(7)>
>>   # _2 = PHI <10(5), 12(6), 14(7)>
>>   # _3 = PHI <320(5), 384(6), 448(7)>
>> <L13>:
>>   goto <bb 12>; [100.00%]
>>
>> DCE will come along and wipe out m_5 and _2 if they are unused.
> 
> When I did something along these lines a long time ago I had to be a bit 
> careful in not regressing performance.  Every PHI node with constants will 
> generate N instructions (with N the arity), there's no coalescing 
> possible.  And if the feeding PHI nodes don't go away it increases 
> register pressure by (at least) one.
Yup.  I ran into similar problems with a different patch in this general
space.  It's actually fairly easy to restrict to single use cases if we
wanted which should result in never making things worse at the expense
of sometimes not making it better when it could.

> 
> (If I read the patch correctly you don't handle the situation of "x op y" 
> where both arguments come from PHI nodes, so it's probably not an issue 
> for you, but I thought I'd mention it)
True, I don't handle x op y where both come from PHIs.  It's just X OP
CONST where X comes from a PHI with all constant arguments.

jeff

ps.  Ironically I was wandering through the regression list yesterday
and stumbled on a BZ which *may* be helped by this code.  I still need
to dig deeper though.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFA][PATCH] Optimize PHIs with constant arguments better
  2017-11-30 23:38 [GCC 9][RFC][PATCH] Optimize PHIs with constant arguments better Jeff Law
  2017-12-04 21:25 ` Michael Matz
@ 2018-05-01 22:44 ` Jeff Law
  2018-05-02 11:23   ` Richard Biener
  1 sibling, 1 reply; 5+ messages in thread
From: Jeff Law @ 2018-05-01 22:44 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3642 bytes --]


Time to come back to this now that the trunk is open again...


-------- Forwarded Message --------
Subject: [GCC 9][RFC][PATCH] Optimize PHIs with constant arguments better
Date: Thu, 30 Nov 2017 16:24:33 -0700
From: Jeff Law <law@redhat.com>
To: gcc-patches <gcc-patches@gcc.gnu.org>



This addresses some code quality regressions with some work scheduled
for gcc-9.  However, if anyone is aware of BZs that could be helped with
this patch, don't hesitate to pass them along.  Depending on their
severity we could always re-evaluate the plan for this patch.

The patch optimizes sequences like this:


 # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
 _30 = (unsigned char) iftmp.4_16;

ie, we have a PHI where all its arguments are known (differing)
constants.  The PHI feeds an operation which can be compile-time
evaluated for all the PHI's arguments.

This arises most often where the result of the PHI is type converted via
 NOP_EXPR.  But it also happens for a boolean comparisons, arithmetic
and logicals and other type conversions.

We can optimize away the statement by creating a new PHI node holding
the result of the statement applied to each of the original PHI's
constant operands.  So for the fragment above we'd generate:

 # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
 # _30 = PHI <0(7), 1(8), 0(9)>

These kind of backward propagations can cascade.  Here's an example I
saw during analysis of test results:

 # m_5 = PHI <10(5), 12(6), 14(7)>
<L13>:
  _2 = (long unsigned int) m_5;
  _3 = _2 * 32;
  goto <bb 12>; [100.00%]

After this patch the two statements have been eliminated in favor of
generating PHIs with constant arguments:

And after PHI propagation we have:
  # m_5 = PHI <10(5), 12(6), 14(7)>
  # _2 = PHI <10(5), 12(6), 14(7)>
  # _3 = PHI <320(5), 384(6), 448(7)>
<L13>:
  goto <bb 12>; [100.00%]

DCE will come along and wipe out m_5 and _2 if they are unused.

I initially covered just NOP_EXPR in the proof of concept.   But it's
trivial to extend to cover other unaries, and binary/comparisons where
one operand was already a constant, so I did that.  This applies
surprisingly often during a bootstrap.  An earlier version (which didn't
handle multiple uses of the result of the PHI) triggered over 6000 times
during a bootstrap:

NOP_EXPR    5161
LT_EXPR      658
GE_EXPR      504
NE_EXPR      310
BIT_NOT_EXPR 295
BIT_AND_EXPR  94
PLUS_EXPR     77
NEGATE_EXPR   48
LSHIFT_EXPR   40
EQ_EXPR       34
GT_EXPR       16
BIT_IOR_EXPR  10
MINUS_EXPR     8

There's actually other cases we could handle were

x = PHI (a, 0);
y = a & x;

Turns into

x = PHI (a, 0);
y = PHI (a, 0);


I'm not sure how often these non-constant cases happen -- I haven't
tried to support this or gain any data on how often this kind of case
might happen.

FWIW, there are cases were the PHI arguments are constant, but we still
can't simplify.  For example:


x = PHI (0, 1, 2)

y = 1234 / x;


When we process this we'll try to simplify 1234 / 0 to a constant which
fails.  We have to gracefully remove the partially initialized PHI node
in that case.  This is tested by existing tests in the suite.

You'll also see that certain tests in the suite such as pr61839_3,
ssa-pre-4.c are twiddled.  I've suppressed phi backwards propagation in
the original test so that it's not compromised.  THere's also a variant
of each test which verifies that the transformation is handled by phi
back propagation.

Bootstrapped and regression tested on x86, both in isolation and in a
larger patch series.

I see there's a couple whitespace issues in the patch.  I'll fix those.
I'd like to get comments/review on the meat of the patch though.



Jeff



[-- Attachment #2: P1 --]
[-- Type: text/plain, Size: 13224 bytes --]

	* tree-ssa-phiprop.c (propagate_with_phi_1): Renamed from
	propagate_with_phi.
	(stmt_likely_produces_constant_result): New function.
	(fold_use_into_phi, propagate_with_phi_2):  Likewise.
	(pass_phiprop::execute): Corresponding changes.  Call
	propagate_with_phi_2.

	* gcc.dg/tree-ssa/pr61743-1.c: Use -fno-tree-phiprop.
	* gcc.dg/tree-ssa/pr61839_1.c: Likewise.
	* gcc.dg/tree-ssa/pr61839_3.c: Likewise.
	* gcc.dg/tree-ssa/ssa-pre-4.c: Likewise.

	* gcc.dg/tree-ssa/pr61743-1a.c: New test.
	* gcc.dg/tree-ssa/pr61839_1a.c: Likewise.
	* gcc.dg/tree-ssa/pr61839_3a.c: Likewise.
	* gcc.dg/tree-ssa/ssa-pre-4a.c: New test.

diff --git a/gcc/tree-ssa-phiprop.c b/gcc/tree-ssa-phiprop.c
index 494158b..f2ec2b3 100644
--- a/gcc/tree-ssa-phiprop.c
+++ b/gcc/tree-ssa-phiprop.c
@@ -259,8 +259,8 @@ chk_uses (tree, tree *idx, void *data)
    with aliasing issues as we are moving memory reads.  */
 
 static bool
-propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
-		    size_t n)
+propagate_with_phi_1 (basic_block bb, gphi *phi, struct phiprop_d *phivn,
+		      size_t n)
 {
   tree ptr = PHI_RESULT (phi);
   gimple *use_stmt;
@@ -440,6 +440,207 @@ next:;
   return phi_inserted;
 }
 
+/* Return TRUE if USE_STMT, which uses PHI_RESULT will likely produce
+   a constant result if PHI_RESULT is itself a constant.  Else return
+   FALSE.
+
+   It is safe for this routine to always return TRUE.  If the result
+   is not a constant it will be detected and properly handled later.
+
+   This is overly conservative.  If USE_STMT were to produce an SSA_NAME,
+   then that would still work for our purposes.  */
+
+static bool
+stmt_likely_produces_constant_result (gimple *use_stmt, tree phi_result)
+{
+  enum tree_code code = gimple_assign_rhs_code (use_stmt);
+  switch (TREE_CODE_CLASS (code))
+    {
+    /* With few exceptions, a unary operator applied to a constant
+       will result in a constant.  So they are OK without digging
+       deeper into the operator/operands.  */
+    case tcc_unary:
+      return true;
+
+    /* For binary and comparisons we'll generally be able to generate
+       a constant if only one operand is an SSA_NAME.  */
+    case tcc_binary:
+    case tcc_comparison:
+      {
+	tree rhs1 = gimple_assign_rhs1 (use_stmt);
+	tree rhs2 = gimple_assign_rhs2 (use_stmt);
+
+	/* We need to check for RES op CONST and CONST op RES.  Consider
+	   MINUS_EXPR or MIN/MAX where RES could be the first or the second
+	   argument.  */
+	if (rhs1 == phi_result
+	    && TREE_CODE_CLASS (TREE_CODE (rhs2)) == tcc_constant)
+	  return true;
+
+	if (rhs2 == phi_result
+	    && TREE_CODE_CLASS (TREE_CODE (rhs1)) == tcc_constant)
+	  return true;
+
+	/* We do not try to handle two SSA_NAME operands.  */
+	return false;
+      }
+
+    /* There might be some tcc_reference or tcc_exceptional types we
+       could handle with deeper investigation.  COND_EXPR comes to mind.  */
+    default:
+      return false;
+    }
+}
+
+/* PHI in block BB produces RESULT.  PHI_RESULT is used in USE_STMT.
+
+   All of PHIs arguments are simple constants.  See if propagating
+   the PHI arguments into USE_STMT produces a constant.  If that is
+   the case for all the PHI arguments, then STMT is replaced with a
+   new PHI and we return TRUE.  Else make no changes to the IL and
+   return FALSE.
+
+   When applied this transformation reduces runtime computations and
+   replaces them with either constant initializations.  This also enables
+   jump threading to occur earlier in the optimization pipeline.  */
+
+static bool
+fold_use_into_phi (gphi *phi, gimple *use_stmt,
+		   basic_block bb, tree phi_result)
+{
+  /* Now verify the use is an assigment to an SSA_NAME.  */
+  if (!is_gimple_assign (use_stmt)
+      || TREE_CODE (gimple_assign_lhs (use_stmt)) != SSA_NAME)
+    return false;
+
+  /* If USE_STMT does not likely produce a constant result, then
+     do not try to fold this use.  Note this is overly conservative
+     as we could handle USE_STMT simplifying to an SSA_NAME rather than
+     just constants.  */
+  if (!stmt_likely_produces_constant_result (use_stmt, phi_result))
+    return false;
+
+  /* Build a new PHI node to replace the definition of the USE_STMT's LHS.  */
+  gphi *new_phi = create_phi_node (NULL_TREE, bb);
+
+  /* Now fill in its arguments by applying the operator to each
+     argument of the original PHI.  */
+  edge e;
+  edge_iterator ei;
+  tree use_result = gimple_assign_lhs (use_stmt);
+  tree type = TREE_TYPE (use_result);
+  enum tree_code code = gimple_assign_rhs_code (use_stmt);
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      tree old_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
+      tree new_arg;
+
+      switch (TREE_CODE_CLASS (code))
+	{
+	case tcc_unary:
+	  new_arg = fold_unary_to_constant (code, type, old_arg);
+	  break;
+
+	case tcc_binary:
+	case tcc_comparison:
+	  {
+	    tree rhs1 = gimple_assign_rhs1 (use_stmt);
+	    tree rhs2 = gimple_assign_rhs2 (use_stmt);
+	  if (rhs1 == phi_result)
+	    new_arg = fold_binary (code, type,
+				   old_arg, gimple_assign_rhs2 (use_stmt));
+	  else if (rhs2 == phi_result)
+	    new_arg = fold_binary (code, type,
+				   gimple_assign_rhs1 (use_stmt), old_arg);
+	  else
+	    gcc_unreachable ();
+	  break;
+	  }
+
+	default:
+	  gcc_unreachable ();
+	}
+
+      /* We did not get a constant.  This can happen (imagine division
+	 by zero).  We have to remove the PHI from the IL, as the PHI
+	 is only partially constructed.  */
+      if (!new_arg)
+	{
+	  /* Set the PHI_RESULT to a new, throw-away SSA_NAME.  This avoids
+	     problems unlinking the "uses" of a currently NULL PHI_RESULT.  */
+	  gimple_phi_set_result (new_phi, make_ssa_name (type));
+
+	  /* Actually remove the PHI from the IL.  It is safe to remove the
+	     PHI if some of its PHI arguments are still uninitialized.  */
+	  gimple_stmt_iterator gsi = gsi_for_stmt (new_phi);
+	  remove_phi_node (&gsi, true);
+	  return false;
+	}
+
+      source_location locus = gimple_phi_arg_location_from_edge (phi, e);
+      add_phi_arg (new_phi, new_arg, e, locus);
+    }
+
+  gimple_phi_set_result (new_phi, use_result);
+
+  /* At this point we have our new PHI installed where we want it.
+     Delete the old PHI and USE_STMT.  */
+  gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+  gsi_remove (&gsi, true);
+  return true;
+}
+
+/* Look for sequences like this:
+
+    # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
+    _30 = (unsigned char) iftmp.4_16;
+
+   We can "hoist" the conversion into the PHI and get it for free, generating:
+
+     # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
+     # _30 = PHI <0(7), 1(8), 0(9)>
+
+   DCE will eliminate the first PHI.  In addition to getting the conversion
+   for free, removal of the conversion improves backwards threading.  */
+
+static bool
+propagate_with_phi_2 (basic_block bb, gphi *phi)
+{
+  /* First verify that all the arguments of the PHI are constants.
+
+     This is overly conservative.  Consider:
+
+       y = PHI (x, -1, 0);
+       z = y & x;
+
+     We could transform that into:
+
+       y = PHI (x, -1, 0);
+       z = PHI (x, x, 0);
+
+     It's not clear how important those cases are and we punt them
+     for now.  */
+  use_operand_p arg_p;
+  ssa_op_iter i;
+  FOR_EACH_PHI_ARG (arg_p, phi, i, SSA_OP_USE)
+    {
+      tree arg = USE_FROM_PTR (arg_p);
+      if (TREE_CODE_CLASS (TREE_CODE (arg)) != tcc_constant)
+	return false;
+    }
+
+  /* Not iterate over each immediate use to see if the statement can
+     be implemented as a PHI rather than a real statement.  */
+  gimple *use_stmt;
+  imm_use_iterator ui;
+  bool retval = false;
+  tree phi_result = PHI_RESULT (phi);
+  FOR_EACH_IMM_USE_STMT (use_stmt, ui, phi_result)
+    retval |= fold_use_into_phi (phi, use_stmt, bb, phi_result);
+
+  return retval;
+}
+
 /* Main entry for phiprop pass.  */
 
 namespace {
@@ -492,7 +693,10 @@ pass_phiprop::execute (function *fun)
 				  single_succ (ENTRY_BLOCK_PTR_FOR_FN (fun)));
   FOR_EACH_VEC_ELT (bbs, i, bb)
     for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-      did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n);
+      {
+          did_something |= propagate_with_phi_1 (bb, gsi.phi (), phivn, n);
+	  did_something |= propagate_with_phi_2 (bb, gsi.phi ());
+      }
 
   if (did_something)
     gsi_commit_edge_inserts ();

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
index a5c83cf..7771044 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details -fdump-tree-cunrolli-details -fno-peel-loops" } */
+/* { dg-options "-O3 -fno-tree-phiprop -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details -fdump-tree-cunrolli-details -fno-peel-loops" } */

 #define N 8
 #define M 14
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1a.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1a.c
new file mode 100644
index 0000000..dd64472
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1a.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details -fdump-tree-cunrolli-details -fdump-tree-phiprop -fno-peel-loops" } */
+
+#include "pr61743-1.c"
+
+/* { dg-final { scan-tree-dump-times "PHI <(\?:320|384|448)\\(\[0-9\]+\\), (\?:320|384|448)\\(\[0-9\]+\\), (\?:320|384|448)\\(\[0-9\]+\\)>" 1 "phiprop"} } */
+
+/* The code with PHI back propagation is simpler and we unswitch more.  */
+/* { dg-final { scan-tree-dump-times "loop with 4 iterations completely unrolled" 10 "cunroll" } } */
+/* { dg-final { scan-tree-dump-times "loop with 9 iterations completely unrolled" 2 "cunrolli" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
index 9f8168a..5bf1982 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/61839.  */
 /* { dg-do run } */
-/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fno-tree-phiprop -fdump-tree-optimized" } */
 /* { dg-require-effective-target int32plus } */
 
 __attribute__ ((noinline))
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1a.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1a.c
new file mode 100644
index 0000000..99a9016
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1a.c
@@ -0,0 +1,11 @@
+/* PR tree-optimization/61839.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-phiprop" } */
+/* { dg-require-effective-target int32plus } */
+
+#include "pr61839_1.c"
+
+/* Scan for c = 972195717) >> [0, 1] in function foo.  */
+/* { dg-final { scan-tree-dump-times "486097858 : 972195717" 1  "vrp1" } } */
+/* Scan for c = 972195717) >> [2, 3] in function bar.  */
+/* { dg-final { scan-tree-dump-times "PHI <(?:243048929|121524464)\\(\[0-9\]+\\), (?:243048929|121524464)\\(\[0-9\]+\\)>" 1 "phiprop"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
index 5ceb073..ad908c0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/61839.  */
 /* { dg-do run } */
-/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fno-tree-phiprop -fdump-tree-vrp1 -fdump-tree-optimized" } */
 
 __attribute__ ((noinline))
 int foo (int a, unsigned b)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3a.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3a.c
new file mode 100644
index 0000000..3a4d800
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3a.c
@@ -0,0 +1,9 @@
+/* PR tree-optimization/61839.  */
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-phiprop -fdump-tree-optimized" } */
+#include "pr61839_3.c"
+
+/* Scan for a PHI with (12, 13) << 8 in function foo.  */
+/* { dg-final { scan-tree-dump-times "PHI <\(3072|3328\)\\(\[0-9\]+\\), \(3072|3328\)\\(\[0-9\
+]+\\)>" 1 "phiprop"} } */
+/* { dg-final { scan-tree-dump-times "3072" 0  "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
index 5b4fd54..0bc5f73 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+/* { dg-options "-O2 -fno-tree-phiprop -fdump-tree-pre-stats" } */
 int foo(void)
 {
 	int x, c, y;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4a.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4a.c
new file mode 100644
index 0000000..70e8b16
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4a.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiprop" } */
+#include "ssa-pre-4.c"
+/* We should eliminate the x+1 computation from this routine, replacing
+   it with a phi of 3, 4 */
+/* { dg-final { scan-tree-dump-times "PHI <\[34\]\\(\[0-9\]+\\), \[34\]\\(\[0-9\]+\\)>" 1 "phiprop"} } */

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFA][PATCH] Optimize PHIs with constant arguments better
  2018-05-01 22:44 ` [RFA][PATCH] " Jeff Law
@ 2018-05-02 11:23   ` Richard Biener
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Biener @ 2018-05-02 11:23 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Wed, May 2, 2018 at 12:44 AM, Jeff Law <law@redhat.com> wrote:
>
> Time to come back to this now that the trunk is open again...
>
>
> -------- Forwarded Message --------
> Subject: [GCC 9][RFC][PATCH] Optimize PHIs with constant arguments better
> Date: Thu, 30 Nov 2017 16:24:33 -0700
> From: Jeff Law <law@redhat.com>
> To: gcc-patches <gcc-patches@gcc.gnu.org>
>
>
>
> This addresses some code quality regressions with some work scheduled
> for gcc-9.  However, if anyone is aware of BZs that could be helped with
> this patch, don't hesitate to pass them along.  Depending on their
> severity we could always re-evaluate the plan for this patch.
>
> The patch optimizes sequences like this:
>
>
>  # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
>  _30 = (unsigned char) iftmp.4_16;
>
> ie, we have a PHI where all its arguments are known (differing)
> constants.  The PHI feeds an operation which can be compile-time
> evaluated for all the PHI's arguments.
>
> This arises most often where the result of the PHI is type converted via
>  NOP_EXPR.  But it also happens for a boolean comparisons, arithmetic
> and logicals and other type conversions.
>
> We can optimize away the statement by creating a new PHI node holding
> the result of the statement applied to each of the original PHI's
> constant operands.  So for the fragment above we'd generate:
>
>  # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
>  # _30 = PHI <0(7), 1(8), 0(9)>
>
> These kind of backward propagations can cascade.  Here's an example I
> saw during analysis of test results:
>
>  # m_5 = PHI <10(5), 12(6), 14(7)>
> <L13>:
>   _2 = (long unsigned int) m_5;
>   _3 = _2 * 32;
>   goto <bb 12>; [100.00%]
>
> After this patch the two statements have been eliminated in favor of
> generating PHIs with constant arguments:
>
> And after PHI propagation we have:
>   # m_5 = PHI <10(5), 12(6), 14(7)>
>   # _2 = PHI <10(5), 12(6), 14(7)>
>   # _3 = PHI <320(5), 384(6), 448(7)>
> <L13>:
>   goto <bb 12>; [100.00%]
>
> DCE will come along and wipe out m_5 and _2 if they are unused.
>
> I initially covered just NOP_EXPR in the proof of concept.   But it's
> trivial to extend to cover other unaries, and binary/comparisons where
> one operand was already a constant, so I did that.  This applies
> surprisingly often during a bootstrap.  An earlier version (which didn't
> handle multiple uses of the result of the PHI) triggered over 6000 times
> during a bootstrap:
>
> NOP_EXPR    5161
> LT_EXPR      658
> GE_EXPR      504
> NE_EXPR      310
> BIT_NOT_EXPR 295
> BIT_AND_EXPR  94
> PLUS_EXPR     77
> NEGATE_EXPR   48
> LSHIFT_EXPR   40
> EQ_EXPR       34
> GT_EXPR       16
> BIT_IOR_EXPR  10
> MINUS_EXPR     8
>
> There's actually other cases we could handle were
>
> x = PHI (a, 0);
> y = a & x;
>
> Turns into
>
> x = PHI (a, 0);
> y = PHI (a, 0);
>
>
> I'm not sure how often these non-constant cases happen -- I haven't
> tried to support this or gain any data on how often this kind of case
> might happen.
>
> FWIW, there are cases were the PHI arguments are constant, but we still
> can't simplify.  For example:
>
>
> x = PHI (0, 1, 2)
>
> y = 1234 / x;
>
>
> When we process this we'll try to simplify 1234 / 0 to a constant which
> fails.  We have to gracefully remove the partially initialized PHI node
> in that case.  This is tested by existing tests in the suite.
>
> You'll also see that certain tests in the suite such as pr61839_3,
> ssa-pre-4.c are twiddled.  I've suppressed phi backwards propagation in
> the original test so that it's not compromised.  THere's also a variant
> of each test which verifies that the transformation is handled by phi
> back propagation.
>
> Bootstrapped and regression tested on x86, both in isolation and in a
> larger patch series.
>
> I see there's a couple whitespace issues in the patch.  I'll fix those.
> I'd like to get comments/review on the meat of the patch though.

As a general comment I'd note what this patch does is sth that
PRE does as well (and quite aggressively so).  So I'd say if we
kind-of duplicate this transform we should tame it down (see comments
below).

Given it duplicates functionality I guess that you placed it inside
phiprop was mostly due to pass ordering as you mention jump
threading can happen earlier?  phiopt would have been another
natural choice.

>
>
> Jeff
>
>
>
>         * tree-ssa-phiprop.c (propagate_with_phi_1): Renamed from
>         propagate_with_phi.
>         (stmt_likely_produces_constant_result): New function.
>         (fold_use_into_phi, propagate_with_phi_2):  Likewise.
>         (pass_phiprop::execute): Corresponding changes.  Call
>         propagate_with_phi_2.
>
>         * gcc.dg/tree-ssa/pr61743-1.c: Use -fno-tree-phiprop.
>         * gcc.dg/tree-ssa/pr61839_1.c: Likewise.
>         * gcc.dg/tree-ssa/pr61839_3.c: Likewise.
>         * gcc.dg/tree-ssa/ssa-pre-4.c: Likewise.
>
>         * gcc.dg/tree-ssa/pr61743-1a.c: New test.
>         * gcc.dg/tree-ssa/pr61839_1a.c: Likewise.
>         * gcc.dg/tree-ssa/pr61839_3a.c: Likewise.
>         * gcc.dg/tree-ssa/ssa-pre-4a.c: New test.
>
> diff --git a/gcc/tree-ssa-phiprop.c b/gcc/tree-ssa-phiprop.c
> index 494158b..f2ec2b3 100644
> --- a/gcc/tree-ssa-phiprop.c
> +++ b/gcc/tree-ssa-phiprop.c
> @@ -259,8 +259,8 @@ chk_uses (tree, tree *idx, void *data)
>     with aliasing issues as we are moving memory reads.  */
>
>  static bool
> -propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
> -                   size_t n)
> +propagate_with_phi_1 (basic_block bb, gphi *phi, struct phiprop_d *phivn,
> +                     size_t n)
>  {
>    tree ptr = PHI_RESULT (phi);
>    gimple *use_stmt;
> @@ -440,6 +440,207 @@ next:;
>    return phi_inserted;
>  }
>
> +/* Return TRUE if USE_STMT, which uses PHI_RESULT will likely produce
> +   a constant result if PHI_RESULT is itself a constant.  Else return
> +   FALSE.
> +
> +   It is safe for this routine to always return TRUE.  If the result
> +   is not a constant it will be detected and properly handled later.
> +
> +   This is overly conservative.  If USE_STMT were to produce an SSA_NAME,
> +   then that would still work for our purposes.  */
> +
> +static bool
> +stmt_likely_produces_constant_result (gimple *use_stmt, tree phi_result)
> +{
> +  enum tree_code code = gimple_assign_rhs_code (use_stmt);
> +  switch (TREE_CODE_CLASS (code))
> +    {
> +    /* With few exceptions, a unary operator applied to a constant
> +       will result in a constant.  So they are OK without digging
> +       deeper into the operator/operands.  */
> +    case tcc_unary:
> +      return true;
> +
> +    /* For binary and comparisons we'll generally be able to generate
> +       a constant if only one operand is an SSA_NAME.  */
> +    case tcc_binary:
> +    case tcc_comparison:
> +      {
> +       tree rhs1 = gimple_assign_rhs1 (use_stmt);
> +       tree rhs2 = gimple_assign_rhs2 (use_stmt);
> +
> +       /* We need to check for RES op CONST and CONST op RES.  Consider
> +          MINUS_EXPR or MIN/MAX where RES could be the first or the second
> +          argument.  */
> +       if (rhs1 == phi_result
> +           && TREE_CODE_CLASS (TREE_CODE (rhs2)) == tcc_constant)
> +         return true;
> +
> +       if (rhs2 == phi_result
> +           && TREE_CODE_CLASS (TREE_CODE (rhs1)) == tcc_constant)
> +         return true;
> +
> +       /* We do not try to handle two SSA_NAME operands.  */
> +       return false;
> +      }
> +
> +    /* There might be some tcc_reference or tcc_exceptional types we
> +       could handle with deeper investigation.  COND_EXPR comes to mind.  */
> +    default:
> +      return false;
> +    }
> +}
> +
> +/* PHI in block BB produces RESULT.  PHI_RESULT is used in USE_STMT.
> +
> +   All of PHIs arguments are simple constants.  See if propagating
> +   the PHI arguments into USE_STMT produces a constant.  If that is
> +   the case for all the PHI arguments, then STMT is replaced with a
> +   new PHI and we return TRUE.  Else make no changes to the IL and
> +   return FALSE.
> +
> +   When applied this transformation reduces runtime computations and
> +   replaces them with either constant initializations.  This also enables
> +   jump threading to occur earlier in the optimization pipeline.  */
> +
> +static bool
> +fold_use_into_phi (gphi *phi, gimple *use_stmt,
> +                  basic_block bb, tree phi_result)
> +{
> +  /* Now verify the use is an assigment to an SSA_NAME.  */
> +  if (!is_gimple_assign (use_stmt)
> +      || TREE_CODE (gimple_assign_lhs (use_stmt)) != SSA_NAME)
> +    return false;
> +
> +  /* If USE_STMT does not likely produce a constant result, then
> +     do not try to fold this use.  Note this is overly conservative
> +     as we could handle USE_STMT simplifying to an SSA_NAME rather than
> +     just constants.  */
> +  if (!stmt_likely_produces_constant_result (use_stmt, phi_result))
> +    return false;
> +
> +  /* Build a new PHI node to replace the definition of the USE_STMT's LHS.  */
> +  gphi *new_phi = create_phi_node (NULL_TREE, bb);
> +
> +  /* Now fill in its arguments by applying the operator to each
> +     argument of the original PHI.  */

So rather than creating the PHI immediately please do sth like

  auto_vec<tree, 8> new_args;
  new_args.reserve_exact (gimple_phi_num_args (phi));
  FOR_EACH_EDGE (...)
    push-to-new-args-vec-or bail out

  alloc PHI and add args in second sweep over edges.

> +  edge e;
> +  edge_iterator ei;
> +  tree use_result = gimple_assign_lhs (use_stmt);
> +  tree type = TREE_TYPE (use_result);
> +  enum tree_code code = gimple_assign_rhs_code (use_stmt);
> +  FOR_EACH_EDGE (e, ei, bb->preds)
> +    {
> +      tree old_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
> +      tree new_arg;
> +
> +      switch (TREE_CODE_CLASS (code))
> +       {
> +       case tcc_unary:
> +         new_arg = fold_unary_to_constant (code, type, old_arg);
> +         break;
> +
> +       case tcc_binary:
> +       case tcc_comparison:
> +         {
> +           tree rhs1 = gimple_assign_rhs1 (use_stmt);
> +           tree rhs2 = gimple_assign_rhs2 (use_stmt);
> +         if (rhs1 == phi_result)
> +           new_arg = fold_binary (code, type,
> +                                  old_arg, gimple_assign_rhs2 (use_stmt));
> +         else if (rhs2 == phi_result)
> +           new_arg = fold_binary (code, type,
> +                                  gimple_assign_rhs1 (use_stmt), old_arg);
> +         else
> +           gcc_unreachable ();
> +         break;
> +         }

So I wonder if we don't want to be more aggressive here and not care
about PHI arg constant-ness or other operator state of the operation.

You should be able to do sth like (with the single-use constraint!)

   use_operand_p use_p;
   gimple *use_stmt;
   single_imm_use (gimple_phi_result (phi), &use_p, &use_stmt);
   tree orig_arg = USE_FROM_PTR (use_p);

 in the loop then do

    *(use_p->use) = phi-arg;  // do not use SET_USE to keep immediate
uses intact
    tree res = gimple_fold_stmt_to_constant (USE_STMT (use_p), NULL);
    if (is_gimple_min_invariant (res))
      OK, record it;
    else
      {
         *(use_p->use) = orig_arg;
         return false;
      }

I suspect the cases where we do have the first N edges to optimize to a constant
but a later one will fail the optimization isn't common.  As a early
bail-out we probably
want to avoid this transform on loop headers (thus when backedges are involved).

Note that for the case where we have n > 1 edges with constant
propagation result
it might be profitable to create a forwarder for the constant or
non-constant parts
thus end up with

     cst_result_1 = PHI< .... >;
     goto merge;

     orig_phi_2 = PHI< edges with non-cst-result... >;
     noncst_result_3 = OP (orig_phi_2);

     res_4 = PHI <cst_result_1, noncst_result_3>
merge:

where of course whether we may hoist OP () needs to be checked.

> +       default:
> +         gcc_unreachable ();
> +       }
> +
> +      /* We did not get a constant.  This can happen (imagine division
> +        by zero).  We have to remove the PHI from the IL, as the PHI
> +        is only partially constructed.  */
> +      if (!new_arg)
> +       {
> +         /* Set the PHI_RESULT to a new, throw-away SSA_NAME.  This avoids
> +            problems unlinking the "uses" of a currently NULL PHI_RESULT.  */
> +         gimple_phi_set_result (new_phi, make_ssa_name (type));
> +
> +         /* Actually remove the PHI from the IL.  It is safe to remove the
> +            PHI if some of its PHI arguments are still uninitialized.  */
> +         gimple_stmt_iterator gsi = gsi_for_stmt (new_phi);
> +         remove_phi_node (&gsi, true);
> +         return false;
> +       }
> +
> +      source_location locus = gimple_phi_arg_location_from_edge (phi, e);
> +      add_phi_arg (new_phi, new_arg, e, locus);
> +    }
> +
> +  gimple_phi_set_result (new_phi, use_result);
> +
> +  /* At this point we have our new PHI installed where we want it.
> +     Delete the old PHI and USE_STMT.  */
> +  gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> +  gsi_remove (&gsi, true);
> +  return true;
> +}
> +
> +/* Look for sequences like this:
> +
> +    # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
> +    _30 = (unsigned char) iftmp.4_16;
> +
> +   We can "hoist" the conversion into the PHI and get it for free, generating:
> +
> +     # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
> +     # _30 = PHI <0(7), 1(8), 0(9)>
> +
> +   DCE will eliminate the first PHI.  In addition to getting the conversion
> +   for free, removal of the conversion improves backwards threading.  */
> +
> +static bool
> +propagate_with_phi_2 (basic_block bb, gphi *phi)
> +{
> +  /* First verify that all the arguments of the PHI are constants.
> +
> +     This is overly conservative.  Consider:
> +
> +       y = PHI (x, -1, 0);
> +       z = y & x;
> +
> +     We could transform that into:
> +
> +       y = PHI (x, -1, 0);
> +       z = PHI (x, x, 0);
> +
> +     It's not clear how important those cases are and we punt them
> +     for now.  */
> +  use_operand_p arg_p;
> +  ssa_op_iter i;
> +  FOR_EACH_PHI_ARG (arg_p, phi, i, SSA_OP_USE)
> +    {
> +      tree arg = USE_FROM_PTR (arg_p);
> +      if (TREE_CODE_CLASS (TREE_CODE (arg)) != tcc_constant)
> +       return false;
> +    }
> +
> +  /* Not iterate over each immediate use to see if the statement can
> +     be implemented as a PHI rather than a real statement.  */
> +  gimple *use_stmt;
> +  imm_use_iterator ui;
> +  bool retval = false;
> +  tree phi_result = PHI_RESULT (phi);
> +  FOR_EACH_IMM_USE_STMT (use_stmt, ui, phi_result)
> +    retval |= fold_use_into_phi (phi, use_stmt, bb, phi_result);

So you say DCE will remove the PHI but then you handle multiple
uses where some of them might fail to transform.

Thus should we restrict this to single-uses?  That way we also avoid
to end up with too many PHIs and we can remove the original PHI
immediately.

> +  return retval;
> +}
> +
>  /* Main entry for phiprop pass.  */
>
>  namespace {
> @@ -492,7 +693,10 @@ pass_phiprop::execute (function *fun)
>                                   single_succ (ENTRY_BLOCK_PTR_FOR_FN (fun)));
>    FOR_EACH_VEC_ELT (bbs, i, bb)
>      for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> -      did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n);
> +      {
> +          did_something |= propagate_with_phi_1 (bb, gsi.phi (), phivn, n);
> +         did_something |= propagate_with_phi_2 (bb, gsi.phi ());
> +      }
>
>    if (did_something)
>      gsi_commit_edge_inserts ();
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
> index a5c83cf..7771044 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details -fdump-tree-cunrolli-details -fno-peel-loops" } */
> +/* { dg-options "-O3 -fno-tree-phiprop -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details -fdump-tree-cunrolli-details -fno-peel-loops" } */
>
>  #define N 8
>  #define M 14
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1a.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1a.c
> new file mode 100644
> index 0000000..dd64472
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61743-1a.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -funroll-loops -fno-tree-vectorize -fdump-tree-cunroll-details -fdump-tree-cunrolli-details -fdump-tree-phiprop -fno-peel-loops" } */
> +
> +#include "pr61743-1.c"
> +
> +/* { dg-final { scan-tree-dump-times "PHI <(\?:320|384|448)\\(\[0-9\]+\\), (\?:320|384|448)\\(\[0-9\]+\\), (\?:320|384|448)\\(\[0-9\]+\\)>" 1 "phiprop"} } */
> +
> +/* The code with PHI back propagation is simpler and we unswitch more.  */
> +/* { dg-final { scan-tree-dump-times "loop with 4 iterations completely unrolled" 10 "cunroll" } } */
> +/* { dg-final { scan-tree-dump-times "loop with 9 iterations completely unrolled" 2 "cunrolli" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
> index 9f8168a..5bf1982 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1.c
> @@ -1,6 +1,6 @@
>  /* PR tree-optimization/61839.  */
>  /* { dg-do run } */
> -/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fdump-tree-vrp1 -fno-tree-phiprop -fdump-tree-optimized" } */
>  /* { dg-require-effective-target int32plus } */
>
>  __attribute__ ((noinline))
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1a.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1a.c
> new file mode 100644
> index 0000000..99a9016
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_1a.c
> @@ -0,0 +1,11 @@
> +/* PR tree-optimization/61839.  */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-phiprop" } */
> +/* { dg-require-effective-target int32plus } */
> +
> +#include "pr61839_1.c"
> +
> +/* Scan for c = 972195717) >> [0, 1] in function foo.  */
> +/* { dg-final { scan-tree-dump-times "486097858 : 972195717" 1  "vrp1" } } */
> +/* Scan for c = 972195717) >> [2, 3] in function bar.  */
> +/* { dg-final { scan-tree-dump-times "PHI <(?:243048929|121524464)\\(\[0-9\]+\\), (?:243048929|121524464)\\(\[0-9\]+\\)>" 1 "phiprop"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
> index 5ceb073..ad908c0 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3.c
> @@ -1,6 +1,6 @@
>  /* PR tree-optimization/61839.  */
>  /* { dg-do run } */
> -/* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fno-tree-phiprop -fdump-tree-vrp1 -fdump-tree-optimized" } */
>
>  __attribute__ ((noinline))
>  int foo (int a, unsigned b)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3a.c b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3a.c
> new file mode 100644
> index 0000000..3a4d800
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr61839_3a.c
> @@ -0,0 +1,9 @@
> +/* PR tree-optimization/61839.  */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-phiprop -fdump-tree-optimized" } */
> +#include "pr61839_3.c"
> +
> +/* Scan for a PHI with (12, 13) << 8 in function foo.  */
> +/* { dg-final { scan-tree-dump-times "PHI <\(3072|3328\)\\(\[0-9\]+\\), \(3072|3328\)\\(\[0-9\
> +]+\\)>" 1 "phiprop"} } */
> +/* { dg-final { scan-tree-dump-times "3072" 0  "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
> index 5b4fd54..0bc5f73 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-pre-stats" } */
> +/* { dg-options "-O2 -fno-tree-phiprop -fdump-tree-pre-stats" } */
>  int foo(void)
>  {
>         int x, c, y;
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4a.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4a.c
> new file mode 100644
> index 0000000..70e8b16
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-4a.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-phiprop" } */
> +#include "ssa-pre-4.c"
> +/* We should eliminate the x+1 computation from this routine, replacing
> +   it with a phi of 3, 4 */
> +/* { dg-final { scan-tree-dump-times "PHI <\[34\]\\(\[0-9\]+\\), \[34\]\\(\[0-9\]+\\)>" 1 "phiprop"} } */
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-05-02 11:23 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-30 23:38 [GCC 9][RFC][PATCH] Optimize PHIs with constant arguments better Jeff Law
2017-12-04 21:25 ` Michael Matz
2017-12-05 17:03   ` Jeff Law
2018-05-01 22:44 ` [RFA][PATCH] " Jeff Law
2018-05-02 11:23   ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).