public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][RFC] final-value replacement from DCE
@ 2019-05-29 13:39 Richard Biener
  2019-05-29 15:58 ` Jeff Law
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Biener @ 2019-05-29 13:39 UTC (permalink / raw)
  To: gcc-patches


The following tries to address PR90648 by performing final
value replacement from DCE when DCE knows the final value
computation is not used during loop iteration.  This fits
neatly enough into existing tricks performed by DCE like
removing unused malloc/free pairs.

There's a few complications, one is it fails to bootstrap
because it exposes a few uninit warning false positives,
another is that -fno-tree-sccp is no longer effective.
As written this turns gcc.dg/pr34027-1.c into a division
again (I did not copy the expression_expensive checking).
It seems to also need -ftrapv adjustements (gcc.dg/pr81661.c).

The goal of this patch is to remove the SCCP pass, or rather
us unconditionally replacing loop-closed PHIs with final
value computations which we've got complaints in the past
already that it duplicates computation that is readily
available.  I've not yet figured testsuite fallout from that
change.

For the -fno-tree-sccp I consider to simply honor that
flag in the DCE path, for the gcc.dg/pr34027-1.c I'll
re-install the expression_expensive checking.  I'll
also fix the -ftrapv issue.

Does this otherwise look a sensible way forward?

Thanks,
Richard.

FAIL: gcc.dg/builtin-object-size-1.c execution test
FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort
FAIL: gcc.dg/pr34027-1.c scan-tree-dump-times optimized " / " 0
FAIL: gcc.dg/pr81661.c (internal compiler error)
FAIL: gcc.dg/pr81661.c (test for excess errors)
XPASS: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized " \\\\+ " 0
FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "if " 1
FAIL: gcc.dg/tree-ssa/loop-26.c scan-tree-dump-times optimized "if" 2
FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized " / " 0
FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized "if" 6
FAIL: gcc.dg/tree-ssa/pr64183.c scan-tree-dump cunroll "Loop 2 iterates at most 3 times"
FAIL: gcc.dg/tree-ssa/ssa-pre-3.c scan-tree-dump-times pre "Eliminated: 2" 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-4.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-13.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-14.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-15.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-18.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-2.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
FAIL: gcc.dg/vect/no-scevccp-outer-20.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-6-global.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-6.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-7.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-8.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-1.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vectorized 1 loops" 1

Running target unix//-m32
FAIL: gcc.dg/builtin-object-size-1.c execution test
FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort
FAIL: gcc.dg/pr34027-1.c scan-tree-dump-times optimized " / " 0
FAIL: gcc.dg/pr81661.c (internal compiler error)
FAIL: gcc.dg/pr81661.c (test for excess errors)
XPASS: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized " \\\\+ " 0
FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "if " 1
FAIL: gcc.dg/tree-ssa/loop-26.c scan-tree-dump-times optimized "if" 2
FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized " / " 0
FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized "if" 6
FAIL: gcc.dg/tree-ssa/pr64183.c scan-tree-dump cunroll "Loop 2 iterates at most 3 times"
FAIL: gcc.dg/tree-ssa/ssa-pre-3.c scan-tree-dump-times pre "Eliminated: 2" 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-4.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-13.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-14.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-15.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-18.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-2.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
FAIL: gcc.dg/vect/no-scevccp-outer-20.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-6-global.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-6.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-7.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-8.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-1.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.target/i386/pr85073.c scan-assembler-times test 1

2019-05-29  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/90594
	* tree-scalar-evolution.h (final_value_replacement): Declare.
	* tree-scalar-evolution.c (final_value_replacement): Split out
	worker on one PHI from ...
	(final_value_replacement_loop): ... here.
	* gimple-fold.h (rewrite_seq_to_defined_overflow): Declare.
	* gimple-fold.c (rewrite_seq_to_defined_overflow): Wrap
	around rewrite_to_defined_overflow to handle a whole sequence.
	* tree-ssa-dce.c (lcphi_map): New.
	(mark_operands_necessary): New walk_tree worker.
	(propagate_necessity): Elide defs of LC PHIs we can perform value
	replacement on.
	(remove_dead_phis): Perform value replacement on LC PHIs that
	had their def removed.
	(perform_tree_ssa_dce): Go into LC SSA in aggressive mode.
	Precompute final value replacements.

	* gcc.dg/tree-ssa/loop-14.c: Adjust for new expected place
	for transform.

Index: gcc/tree-scalar-evolution.c
===================================================================
--- gcc/tree-scalar-evolution.c	(revision 271644)
+++ gcc/tree-scalar-evolution.c	(working copy)
@@ -3601,6 +3601,35 @@ expression_expensive_p (tree expr)
     }
 }
 
+tree
+final_value_replacement (gphi *phi, edge exit, bool *folded_casts)
+{
+  struct loop *ex_loop
+    = superloop_at_depth (exit->src->loop_father,
+			  loop_depth (exit->dest->loop_father) + 1);
+
+  tree def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
+  if (virtual_operand_p (def))
+    return NULL_TREE;
+
+  if (!POINTER_TYPE_P (TREE_TYPE (def))
+      && !INTEGRAL_TYPE_P (TREE_TYPE (def)))
+    return NULL_TREE;
+
+  *folded_casts = false;
+  def = analyze_scalar_evolution_in_loop (ex_loop, exit->src->loop_father, def,
+					  folded_casts);
+  def = compute_overall_effect_of_inner_loop (ex_loop, def);
+  if (!tree_does_not_contain_chrecs (def)
+      || chrec_contains_symbols_defined_in_loop (def, ex_loop->num)
+      /* Moving the computation from the loop may prolong life range
+	 of some ssa names, which may cause problems if they appear
+	 on abnormal edges.  */
+      || contains_abnormal_ssa_name_p (def))
+    return NULL_TREE;
+  return def;
+}
+
 /* Do final value replacement for LOOP, return true if we did anything.  */
 
 bool
Index: gcc/tree-scalar-evolution.h
===================================================================
--- gcc/tree-scalar-evolution.h	(revision 271644)
+++ gcc/tree-scalar-evolution.h	(working copy)
@@ -34,6 +34,7 @@ extern tree instantiate_scev (edge, stru
 extern tree resolve_mixers (struct loop *, tree, bool *);
 extern void gather_stats_on_scev_database (void);
 extern bool final_value_replacement_loop (struct loop *);
+extern tree final_value_replacement (gphi *, edge, bool *);
 extern unsigned int scev_const_prop (void);
 extern bool expression_expensive_p (tree);
 extern bool simple_iv_with_niters (struct loop *, struct loop *, tree,
Index: gcc/gimple-fold.c
===================================================================
--- gcc/gimple-fold.c	(revision 271644)
+++ gcc/gimple-fold.c	(working copy)
@@ -7381,6 +7381,29 @@ rewrite_to_defined_overflow (gimple *stm
   return stmts;
 }
 
+/* Wrapper around rewrite_to_defined_overflow rewriting STMTS in-place.  */
+
+void
+rewrite_seq_to_defined_overflow (gimple_seq *stmts)
+{
+  gimple_stmt_iterator gsi = gsi_start (*stmts);
+  while (!gsi_end_p (gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      if (is_gimple_assign (stmt)
+	  && arith_code_with_undefined_signed_overflow
+		(gimple_assign_rhs_code (stmt)))
+	{
+	  gsi_remove (&gsi, false);
+	  gsi_insert_seq_before (&gsi,
+				 rewrite_to_defined_overflow (stmt),
+				 GSI_SAME_STMT);
+	}
+      else
+	gsi_next (&gsi);
+    }
+}
+
 
 /* The valueization hook we use for the gimple_build API simplification.
    This makes us match fold_buildN behavior by only combining with
Index: gcc/gimple-fold.h
===================================================================
--- gcc/gimple-fold.h	(revision 271644)
+++ gcc/gimple-fold.h	(working copy)
@@ -60,6 +60,7 @@ extern bool gimple_fold_builtin_sprintf
 extern bool gimple_fold_builtin_snprintf (gimple_stmt_iterator *);
 extern bool arith_code_with_undefined_signed_overflow (tree_code);
 extern gimple_seq rewrite_to_defined_overflow (gimple *);
+extern void rewrite_seq_to_defined_overflow (gimple_seq *);
 extern void replace_call_with_value (gimple_stmt_iterator *, tree);
 extern tree tree_vec_extract (gimple_stmt_iterator *, tree, tree, tree, tree);
 
Index: gcc/tree-ssa-dce.c
===================================================================
--- gcc/tree-ssa-dce.c	(revision 271644)
+++ gcc/tree-ssa-dce.c	(working copy)
@@ -67,6 +67,8 @@ along with GCC; see the file COPYING3.
 #include "tree-scalar-evolution.h"
 #include "tree-ssa-propagate.h"
 #include "gimple-fold.h"
+#include "tree-ssa-loop-manip.h"
+#include "gimplify-me.h"
 
 static struct stmt_stats
 {
@@ -114,6 +116,9 @@ static bool cfg_altered;
 /* When non-NULL holds map from basic block index into the postorder.  */
 static int *bb_postorder;
 
+/* Map from loop-closed PHI argument to final value replacement tree.  */
+hash_map<tree, std::pair<tree, bool> > *lcphi_map;
+
 
 /* If STMT is not already marked necessary, mark it, and add it to the
    worklist if ADD_TO_WORKLIST is true.  */
@@ -181,6 +186,19 @@ mark_operand_necessary (tree op)
   worklist.safe_push (stmt);
 }
 
+/* Helper for walk_tree calling mark_operand_necessary.  */
+
+static tree
+mark_operands_necessary (tree *tp, int *walk_subtrees, void *)
+{
+  if (EXPR_P (*tp))
+    return NULL_TREE;
+  if (TREE_CODE (*tp) == SSA_NAME)
+    mark_operand_necessary (*tp);
+  *walk_subtrees = 0;
+  return NULL_TREE;
+}
+
 
 /* Mark STMT as necessary if it obviously is.  Add it to the worklist if
    it can make other statements necessary.
@@ -657,6 +675,22 @@ propagate_necessity (bool aggressive)
 	  gphi *phi = as_a <gphi *> (stmt);
 	  size_t k;
 
+	  /* There's the special exception of loop-closed PHI nodes
+	     which we might be able to elide.  Do not mark those.  */
+	  if (aggressive
+	      && gimple_phi_num_args (stmt) == 1)
+	    {
+	      tree def = gimple_phi_arg_def (stmt, 0);
+	      if (std::pair<tree, bool> *repl = lcphi_map->get (def))
+		{
+		  /* Now we can elide marking DEF as well as control
+		     dependences.  But we have to mark the replacement
+		     uses instead, else those may get eliminated.  */
+		  walk_tree (&repl->first, mark_operands_necessary, NULL, NULL);
+		  continue;
+		}
+	    }
+
 	  for (k = 0; k < gimple_phi_num_args (stmt); k++)
             {
 	      tree arg = PHI_ARG_DEF (stmt, k);
@@ -975,6 +1009,43 @@ remove_dead_phis (basic_block bb)
 	  continue;
 	}
 
+      /* If we have a necessary PHI node that got its single operand elided
+         perform final value replacement.  */
+      tree arg0 = gimple_phi_arg_def (phi, 0);
+      if (lcphi_map
+	  && gimple_phi_num_args (phi) == 1
+	  && TREE_CODE (arg0) == SSA_NAME
+	  && !SSA_NAME_IS_DEFAULT_DEF (arg0)
+	  && !gimple_plf (SSA_NAME_DEF_STMT (arg0), STMT_NECESSARY))
+	{
+	  something_changed = true;
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file, "Performing final value replacement: ");
+	      print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
+	      fprintf (dump_file, "\n");
+	    }
+
+	  tree rslt = PHI_RESULT (phi);
+	  std::pair<tree, bool> *repl = lcphi_map->get (arg0);
+	  gcc_assert (repl);
+
+	  remove_phi_node (&gsi, false);
+	  stats.removed_phis++;
+
+	  gimple_seq stmts = NULL;
+	  tree nw = force_gimple_operand (unshare_expr (repl->first),
+					  &stmts, false, NULL_TREE);
+	  if (repl->second)
+	    rewrite_seq_to_defined_overflow (&stmts);
+	  gassign *ass = gimple_build_assign (rslt, nw);
+	  gimple_set_plf (ass, STMT_NECESSARY, true);
+	  gimple_seq_add_stmt_without_update (&stmts, ass);
+	  gimple_stmt_iterator gsi2 = gsi_after_labels (bb);
+	  gsi_insert_seq_before (&gsi2, stmts, GSI_SAME_STMT);
+	  continue;
+	}
+
       gsi_next (&gsi);
     }
   return something_changed;
@@ -1559,6 +1630,7 @@ perform_tree_ssa_dce (bool aggressive)
       scev_initialize ();
       loop_optimizer_init (LOOPS_NORMAL
 			   | LOOPS_HAVE_RECORDED_EXITS);
+      rewrite_into_loop_closed_ssa (NULL, 0);
     }
 
   tree_dce_init (aggressive);
@@ -1574,6 +1646,29 @@ perform_tree_ssa_dce (bool aggressive)
       bitmap_clear (visited_control_parents);
 
       mark_dfs_back_edges ();
+
+      lcphi_map = new hash_map<tree, std::pair<tree, bool> > ();
+      struct loop *loop;
+      FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
+	{
+	  if (edge exit = single_exit (loop))
+	    /* When we currently handle a single PHI arg only, we'd
+	       also not be able to insert into the destination block
+	       and thus had to fixup loop-closed SSA form.  */
+	    if (single_pred_p (exit->dest))
+	      {
+		for (gphi_iterator psi = gsi_start_phis (exit->dest);
+		     !gsi_end_p (psi); gsi_next (&psi))
+		  {
+		    gphi *phi = psi.phi ();
+		    tree def = PHI_ARG_DEF_FROM_EDGE (phi, exit);
+		    bool folded_casts;
+		    if (tree repl = final_value_replacement (phi, exit,
+							     &folded_casts))
+		      lcphi_map->put (def, std::make_pair (repl, folded_casts));
+		  }
+	      }
+	}
     }
 
   find_obviously_necessary_stmts (aggressive);
@@ -1604,6 +1699,12 @@ perform_tree_ssa_dce (bool aggressive)
   if (cfg_altered)
     free_dominance_info (CDI_DOMINATORS);
 
+  if (aggressive)
+    {
+      delete lcphi_map;
+      lcphi_map = NULL;
+    }
+
   statistics_counter_event (cfun, "Statements deleted", stats.removed);
   statistics_counter_event (cfun, "PHI nodes deleted", stats.removed_phis);
 
Index: gcc/testsuite/gcc.dg/tree-ssa/loop-14.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/loop-14.c	(revision 271644)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-14.c	(working copy)
@@ -1,6 +1,6 @@
 /* A test for final value replacement.  */
 
-/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fdump-tree-cddce1" } */
 
 int foo(void);
 
@@ -15,4 +15,4 @@ int bla(void)
   return j;
 }
 
-/* { dg-final { scan-tree-dump-times "\\+ 100" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "\\+ 100" 1 "cddce1" } } */

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH][RFC] final-value replacement from DCE
  2019-05-29 13:39 [PATCH][RFC] final-value replacement from DCE Richard Biener
@ 2019-05-29 15:58 ` Jeff Law
  2019-05-29 16:02   ` Jakub Jelinek
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff Law @ 2019-05-29 15:58 UTC (permalink / raw)
  To: Richard Biener, gcc-patches

On 5/29/19 7:36 AM, Richard Biener wrote:
> 
> The following tries to address PR90648 by performing final
> value replacement from DCE when DCE knows the final value
> computation is not used during loop iteration.  This fits
> neatly enough into existing tricks performed by DCE like
> removing unused malloc/free pairs.
DO you have the right BZ #?  90648 is a ICE in tree checking and doesn't
have a loop :-)



> 
> There's a few complications, one is it fails to bootstrap
> because it exposes a few uninit warning false positives,
> another is that -fno-tree-sccp is no longer effective.
> As written this turns gcc.dg/pr34027-1.c into a division
> again (I did not copy the expression_expensive checking).
> It seems to also need -ftrapv adjustements (gcc.dg/pr81661.c).
> 
> The goal of this patch is to remove the SCCP pass, or rather
> us unconditionally replacing loop-closed PHIs with final
> value computations which we've got complaints in the past
> already that it duplicates computation that is readily
> available.  I've not yet figured testsuite fallout from that
> change.
> 
> For the -fno-tree-sccp I consider to simply honor that
> flag in the DCE path, for the gcc.dg/pr34027-1.c I'll
> re-install the expression_expensive checking.  I'll
> also fix the -ftrapv issue.
> 
> Does this otherwise look a sensible way forward?

> 
> Thanks,
> Richard.
> 
> FAIL: gcc.dg/builtin-object-size-1.c execution test
> FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort
> FAIL: gcc.dg/pr34027-1.c scan-tree-dump-times optimized " / " 0
> FAIL: gcc.dg/pr81661.c (internal compiler error)
> FAIL: gcc.dg/pr81661.c (test for excess errors)
> XPASS: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized " \\\\+ " 0
> FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "if " 1
> FAIL: gcc.dg/tree-ssa/loop-26.c scan-tree-dump-times optimized "if" 2
> FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized " / " 0
> FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized "if" 6
> FAIL: gcc.dg/tree-ssa/pr64183.c scan-tree-dump cunroll "Loop 2 iterates at most 3 times"
> FAIL: gcc.dg/tree-ssa/ssa-pre-3.c scan-tree-dump-times pre "Eliminated: 2" 1
> FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-4.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-13.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-14.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-15.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-18.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-2.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> FAIL: gcc.dg/vect/no-scevccp-outer-20.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-6-global.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-6.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-7.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-8.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-vect-iv-1.c scan-tree-dump-times vect "vectorized 1 loops" 1
> FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
> FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vectorized 1 loops" 1
> 
> Running target unix//-m32
> FAIL: gcc.dg/builtin-object-size-1.c execution test
> FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort
> FAIL: gcc.dg/pr34027-1.c scan-tree-dump-times optimized " / " 0
> FAIL: gcc.dg/pr81661.c (internal compiler error)
> FAIL: gcc.dg/pr81661.c (test for excess errors)
> XPASS: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized " \\\\+ " 0
> FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "if " 1
> FAIL: gcc.dg/tree-ssa/loop-26.c scan-tree-dump-times optimized "if" 2
> FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized " / " 0
> FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized "if" 6
> FAIL: gcc.dg/tree-ssa/pr64183.c scan-tree-dump cunroll "Loop 2 iterates at most 3 times"
> FAIL: gcc.dg/tree-ssa/ssa-pre-3.c scan-tree-dump-times pre "Eliminated: 2" 1
> FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-4.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-13.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-14.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-15.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-18.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-2.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> FAIL: gcc.dg/vect/no-scevccp-outer-20.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-6-global.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-6.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-7.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-8.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-vect-iv-1.c scan-tree-dump-times vect "vectorized 1 loops" 1
> FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
> FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vectorized 1 loops" 1
> FAIL: gcc.target/i386/pr85073.c scan-assembler-times test 1
> 
> 2019-05-29  Richard Biener  <rguenther@suse.de>
> 
> 	PR tree-optimization/90594
> 	* tree-scalar-evolution.h (final_value_replacement): Declare.
> 	* tree-scalar-evolution.c (final_value_replacement): Split out
> 	worker on one PHI from ...
> 	(final_value_replacement_loop): ... here.
> 	* gimple-fold.h (rewrite_seq_to_defined_overflow): Declare.
> 	* gimple-fold.c (rewrite_seq_to_defined_overflow): Wrap
> 	around rewrite_to_defined_overflow to handle a whole sequence.
> 	* tree-ssa-dce.c (lcphi_map): New.
> 	(mark_operands_necessary): New walk_tree worker.
> 	(propagate_necessity): Elide defs of LC PHIs we can perform value
> 	replacement on.
> 	(remove_dead_phis): Perform value replacement on LC PHIs that
> 	had their def removed.
> 	(perform_tree_ssa_dce): Go into LC SSA in aggressive mode.
> 	Precompute final value replacements.
> 
> 	* gcc.dg/tree-ssa/loop-14.c: Adjust for new expected place
> 	for transform.
The approach here looks quite reasonable to me, so it's just a matter of
the fallout, as are so many things these days...

jeff

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH][RFC] final-value replacement from DCE
  2019-05-29 15:58 ` Jeff Law
@ 2019-05-29 16:02   ` Jakub Jelinek
  2019-05-31 11:52     ` Richard Biener
  0 siblings, 1 reply; 4+ messages in thread
From: Jakub Jelinek @ 2019-05-29 16:02 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, gcc-patches

On Wed, May 29, 2019 at 09:57:50AM -0600, Jeff Law wrote:
> > FAIL: gcc.dg/builtin-object-size-1.c execution test
> > FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort

I admit I haven't looked at the details here, but wonder if the optimization
couldn't be done only in the DCE passes post IPA, otherwise we risk
behavior changes for __builtin_object_size.

	Jakub

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH][RFC] final-value replacement from DCE
  2019-05-29 16:02   ` Jakub Jelinek
@ 2019-05-31 11:52     ` Richard Biener
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Biener @ 2019-05-31 11:52 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jeff Law, gcc-patches

On Wed, 29 May 2019, Jakub Jelinek wrote:

> On Wed, May 29, 2019 at 09:57:50AM -0600, Jeff Law wrote:
> > > FAIL: gcc.dg/builtin-object-size-1.c execution test
> > > FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort
> 
> I admit I haven't looked at the details here, but wonder if the optimization
> couldn't be done only in the DCE passes post IPA, otherwise we risk
> behavior changes for __builtin_object_size.

We can do that - the first CD-DCE pass is in the loop pipeline though,
_after_ final value replacement.  Looking at the testsuite fallout
it's also clear that doing loop-header copying before final-value
replacement results in better code for some testcases.

So I'm trying turning the first DCE after loop-header copying into
a CD-DCE run, not doing final value replacement before IPA.

The following does that independently, bootstrapped & tested
on x86_64-unknown-linux-gnu.  It will leave

FAIL: gcc.dg/tree-ssa/pr68619-4.c scan-tree-dump optimized "PHI <.*, 39"

because the testcase is totally unclear on who is supposed to
propagate 39 and why.  With CD-DCE there's one PRE opportunity
less because, well, a value is no longer partially redundant.

I hope I catched all dce/cddce dump issues and it just seemed to
me that unifying dce and cd-dce may be a useful cleanup
and just have

  NEXT_PASS (pass_dce, true /* perform control-dependent DCE */)

but not for today...

Not going to apply this separately but only eventually together
with the rest.

Richard.

2019-05-31  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/68619
	* passes.def (pass_dce after CH): Turn into pass_cd_dce.

	* g++.dg/tree-ssa/copyprop-1.C: Adjust dump scanned.
	* gcc.dg/tree-ssa/20030709-2.c: Likewise.
	* gcc.dg/tree-ssa/20030808-1.c: Likewise.
	* gcc.dg/tree-ssa/20040729-1.c: Likewise.
	* gcc.dg/tree-ssa/loop-36.c: Likewise.
	* gcc.dg/tree-ssa/ssa-dce-1.c: Likewise.
	* gcc.dg/tree-ssa/ssa-dce-2.c:  Likewise.

Index: gcc/passes.def
===================================================================
--- gcc/passes.def	(revision 271802)
+++ gcc/passes.def	(working copy)
@@ -231,7 +231,7 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_isolate_erroneous_paths);
       NEXT_PASS (pass_dse);
       NEXT_PASS (pass_reassoc, true /* insert_powi_p */);
-      NEXT_PASS (pass_dce);
+      NEXT_PASS (pass_cd_dce);
       NEXT_PASS (pass_forwprop);
       NEXT_PASS (pass_phiopt, false /* early_p */);
       NEXT_PASS (pass_ccp, true /* nonzero_p */);
Index: gcc/testsuite/g++.dg/tree-ssa/copyprop-1.C
===================================================================
--- gcc/testsuite/g++.dg/tree-ssa/copyprop-1.C	(revision 271802)
+++ gcc/testsuite/g++.dg/tree-ssa/copyprop-1.C	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-dce3" } */
+/* { dg-options "-O -fdump-tree-cddce2" } */
 
 /* Verify that we can eliminate the useless conversions to/from
    const qualified pointer types
@@ -27,4 +27,4 @@ int foo(Object&o)
 
 /* Remaining should be two loads.  */
 
-/* { dg-final { scan-tree-dump-times " = \[^\n\]*;" 2 "dce3" } } */
+/* { dg-final { scan-tree-dump-times " = \[^\n\]*;" 2 "cddce2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c	(revision 271802)
+++ gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-dce3" } */
+/* { dg-options "-O -fdump-tree-cddce2" } */
   
 struct rtx_def;
 typedef struct rtx_def *rtx;
@@ -42,13 +42,13 @@ get_alias_set (t)
 
 /* There should be precisely one load of ->decl.rtl.  If there is
    more than, then the dominator optimizations failed.  */
-/* { dg-final { scan-tree-dump-times "->decl\\.rtl" 1 "dce3"} } */
+/* { dg-final { scan-tree-dump-times "->decl\\.rtl" 1 "cddce2"} } */
   
 /* There should be no loads of .rtmem since the complex return statement
    is just "return 0".  */
-/* { dg-final { scan-tree-dump-times ".rtmem" 0 "dce3"} } */
+/* { dg-final { scan-tree-dump-times ".rtmem" 0 "cddce2"} } */
   
 /* There should be one IF statement (the complex return statement should
    collapse down to a simple return 0 without any conditionals).  */
-/* { dg-final { scan-tree-dump-times "if " 1 "dce3"} } */
+/* { dg-final { scan-tree-dump-times "if " 1 "cddce2"} } */
 
Index: gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c	(revision 271802)
+++ gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-cddce3" } */
+/* { dg-options "-O1 -fdump-tree-cddce4" } */
       
 extern void abort (void);
 
@@ -33,8 +33,8 @@ delete_dead_jumptables ()
 /* There should be no loads of ->code.  If any exist, then we failed to
    optimize away all the IF statements and the statements feeding
    their conditions.  */
-/* { dg-final { scan-tree-dump-times "->code" 0 "cddce3"} } */
+/* { dg-final { scan-tree-dump-times "->code" 0 "cddce4"} } */
    
 /* There should be no IF statements.  */
-/* { dg-final { scan-tree-dump-times "if " 0 "cddce3"} } */
+/* { dg-final { scan-tree-dump-times "if " 0 "cddce4"} } */
 
Index: gcc/testsuite/gcc.dg/tree-ssa/20040729-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/20040729-1.c	(revision 271802)
+++ gcc/testsuite/gcc.dg/tree-ssa/20040729-1.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-dce3" } */
+/* { dg-options "-O1 -fdump-tree-cddce2" } */
 
 int
 foo ()
@@ -16,4 +16,4 @@ foo ()
    compiler was mistakenly thinking that the statement had volatile
    operands.  But 'p' itself is not volatile and taking the address of
    a volatile does not constitute a volatile operand.  */
-/* { dg-final { scan-tree-dump-times "&x" 0 "dce3"} } */
+/* { dg-final { scan-tree-dump-times "&x" 0 "cddce2"} } */
Index: gcc/testsuite/gcc.dg/tree-ssa/loop-36.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/loop-36.c	(revision 271802)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-36.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dce3" } */
+/* { dg-options "-O2 -fdump-tree-cddce2" } */
 
 struct X { float array[2]; };
 
@@ -18,4 +18,4 @@ float foobar () {
 
 /* The temporary structure should have been promoted to registers
    by FRE after the loops have been unrolled by the early unrolling pass.  */
-/* { dg-final { scan-tree-dump-not "c\.array" "dce3" } } */
+/* { dg-final { scan-tree-dump-not "c\.array" "cddce2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-1.c	(revision 271802)
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-1.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-dce3" } */
+/* { dg-options "-O1 -fdump-tree-cddce2" } */
 
 int t() __attribute__ ((const));
 void
@@ -10,4 +10,4 @@ q()
     i = t();
 }
 /* There should be no IF conditionals.  */
-/* { dg-final { scan-tree-dump-times "if " 0 "dce3"} } */
+/* { dg-final { scan-tree-dump-times "if " 0 "cddce2"} } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-2.c	(revision 271802)
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dce-2.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dce3" } */
+/* { dg-options "-O2 -fdump-tree-cddce2" } */
 
 /* We should notice constantness of this function. */
 static int __attribute__((noinline)) t(int a) 
@@ -13,4 +13,4 @@ void q(void)
     i = t(1);
 }
 /* There should be no IF conditionals.  */
-/* { dg-final { scan-tree-dump-times "if " 0 "dce3"} } */
+/* { dg-final { scan-tree-dump-times "if " 0 "cddce2"} } */

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-05-31 11:32 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-29 13:39 [PATCH][RFC] final-value replacement from DCE Richard Biener
2019-05-29 15:58 ` Jeff Law
2019-05-29 16:02   ` Jakub Jelinek
2019-05-31 11:52     ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).