From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 29340 invoked by alias); 29 May 2019 13:36:33 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 29311 invoked by uid 89); 29 May 2019 13:36:33 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-13.0 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_ASCII_DIVIDERS,SPF_PASS autolearn=ham version=3.3.1 spammy=complaints X-HELO: mx1.suse.de Received: from mx2.suse.de (HELO mx1.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 29 May 2019 13:36:30 +0000 Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 15E9FB022 for ; Wed, 29 May 2019 13:36:28 +0000 (UTC) Date: Wed, 29 May 2019 13:39:00 -0000 From: Richard Biener To: gcc-patches@gcc.gnu.org Subject: [PATCH][RFC] final-value replacement from DCE Message-ID: User-Agent: Alpine 2.20 (LSU 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-SW-Source: 2019-05/txt/msg01904.txt.bz2 The following tries to address PR90648 by performing final value replacement from DCE when DCE knows the final value computation is not used during loop iteration. This fits neatly enough into existing tricks performed by DCE like removing unused malloc/free pairs. There's a few complications, one is it fails to bootstrap because it exposes a few uninit warning false positives, another is that -fno-tree-sccp is no longer effective. As written this turns gcc.dg/pr34027-1.c into a division again (I did not copy the expression_expensive checking). It seems to also need -ftrapv adjustements (gcc.dg/pr81661.c). The goal of this patch is to remove the SCCP pass, or rather us unconditionally replacing loop-closed PHIs with final value computations which we've got complaints in the past already that it duplicates computation that is readily available. I've not yet figured testsuite fallout from that change. For the -fno-tree-sccp I consider to simply honor that flag in the DCE path, for the gcc.dg/pr34027-1.c I'll re-install the expression_expensive checking. I'll also fix the -ftrapv issue. Does this otherwise look a sensible way forward? Thanks, Richard. FAIL: gcc.dg/builtin-object-size-1.c execution test FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort FAIL: gcc.dg/pr34027-1.c scan-tree-dump-times optimized " / " 0 FAIL: gcc.dg/pr81661.c (internal compiler error) FAIL: gcc.dg/pr81661.c (test for excess errors) XPASS: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized " \\\\+ " 0 FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "if " 1 FAIL: gcc.dg/tree-ssa/loop-26.c scan-tree-dump-times optimized "if" 2 FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized " / " 0 FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized "if" 6 FAIL: gcc.dg/tree-ssa/pr64183.c scan-tree-dump cunroll "Loop 2 iterates at most 3 times" FAIL: gcc.dg/tree-ssa/ssa-pre-3.c scan-tree-dump-times pre "Eliminated: 2" 1 FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-4.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-13.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-14.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-15.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-18.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-2.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1 FAIL: gcc.dg/vect/no-scevccp-outer-20.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-6-global.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-6.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-7.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-8.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-vect-iv-1.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1 FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vectorized 1 loops" 1 Running target unix//-m32 FAIL: gcc.dg/builtin-object-size-1.c execution test FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort FAIL: gcc.dg/pr34027-1.c scan-tree-dump-times optimized " / " 0 FAIL: gcc.dg/pr81661.c (internal compiler error) FAIL: gcc.dg/pr81661.c (test for excess errors) XPASS: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized " \\\\+ " 0 FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "if " 1 FAIL: gcc.dg/tree-ssa/loop-26.c scan-tree-dump-times optimized "if" 2 FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized " / " 0 FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized "if" 6 FAIL: gcc.dg/tree-ssa/pr64183.c scan-tree-dump cunroll "Loop 2 iterates at most 3 times" FAIL: gcc.dg/tree-ssa/ssa-pre-3.c scan-tree-dump-times pre "Eliminated: 2" 1 FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-4.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-13.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-14.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-15.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-18.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-2.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1 FAIL: gcc.dg/vect/no-scevccp-outer-20.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-6-global.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-6.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-7.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-outer-8.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1 FAIL: gcc.dg/vect/no-scevccp-vect-iv-1.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1 FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vectorized 1 loops" 1 FAIL: gcc.target/i386/pr85073.c scan-assembler-times test 1 2019-05-29 Richard Biener PR tree-optimization/90594 * tree-scalar-evolution.h (final_value_replacement): Declare. * tree-scalar-evolution.c (final_value_replacement): Split out worker on one PHI from ... (final_value_replacement_loop): ... here. * gimple-fold.h (rewrite_seq_to_defined_overflow): Declare. * gimple-fold.c (rewrite_seq_to_defined_overflow): Wrap around rewrite_to_defined_overflow to handle a whole sequence. * tree-ssa-dce.c (lcphi_map): New. (mark_operands_necessary): New walk_tree worker. (propagate_necessity): Elide defs of LC PHIs we can perform value replacement on. (remove_dead_phis): Perform value replacement on LC PHIs that had their def removed. (perform_tree_ssa_dce): Go into LC SSA in aggressive mode. Precompute final value replacements. * gcc.dg/tree-ssa/loop-14.c: Adjust for new expected place for transform. Index: gcc/tree-scalar-evolution.c =================================================================== --- gcc/tree-scalar-evolution.c (revision 271644) +++ gcc/tree-scalar-evolution.c (working copy) @@ -3601,6 +3601,35 @@ expression_expensive_p (tree expr) } } +tree +final_value_replacement (gphi *phi, edge exit, bool *folded_casts) +{ + struct loop *ex_loop + = superloop_at_depth (exit->src->loop_father, + loop_depth (exit->dest->loop_father) + 1); + + tree def = PHI_ARG_DEF_FROM_EDGE (phi, exit); + if (virtual_operand_p (def)) + return NULL_TREE; + + if (!POINTER_TYPE_P (TREE_TYPE (def)) + && !INTEGRAL_TYPE_P (TREE_TYPE (def))) + return NULL_TREE; + + *folded_casts = false; + def = analyze_scalar_evolution_in_loop (ex_loop, exit->src->loop_father, def, + folded_casts); + def = compute_overall_effect_of_inner_loop (ex_loop, def); + if (!tree_does_not_contain_chrecs (def) + || chrec_contains_symbols_defined_in_loop (def, ex_loop->num) + /* Moving the computation from the loop may prolong life range + of some ssa names, which may cause problems if they appear + on abnormal edges. */ + || contains_abnormal_ssa_name_p (def)) + return NULL_TREE; + return def; +} + /* Do final value replacement for LOOP, return true if we did anything. */ bool Index: gcc/tree-scalar-evolution.h =================================================================== --- gcc/tree-scalar-evolution.h (revision 271644) +++ gcc/tree-scalar-evolution.h (working copy) @@ -34,6 +34,7 @@ extern tree instantiate_scev (edge, stru extern tree resolve_mixers (struct loop *, tree, bool *); extern void gather_stats_on_scev_database (void); extern bool final_value_replacement_loop (struct loop *); +extern tree final_value_replacement (gphi *, edge, bool *); extern unsigned int scev_const_prop (void); extern bool expression_expensive_p (tree); extern bool simple_iv_with_niters (struct loop *, struct loop *, tree, Index: gcc/gimple-fold.c =================================================================== --- gcc/gimple-fold.c (revision 271644) +++ gcc/gimple-fold.c (working copy) @@ -7381,6 +7381,29 @@ rewrite_to_defined_overflow (gimple *stm return stmts; } +/* Wrapper around rewrite_to_defined_overflow rewriting STMTS in-place. */ + +void +rewrite_seq_to_defined_overflow (gimple_seq *stmts) +{ + gimple_stmt_iterator gsi = gsi_start (*stmts); + while (!gsi_end_p (gsi)) + { + gimple *stmt = gsi_stmt (gsi); + if (is_gimple_assign (stmt) + && arith_code_with_undefined_signed_overflow + (gimple_assign_rhs_code (stmt))) + { + gsi_remove (&gsi, false); + gsi_insert_seq_before (&gsi, + rewrite_to_defined_overflow (stmt), + GSI_SAME_STMT); + } + else + gsi_next (&gsi); + } +} + /* The valueization hook we use for the gimple_build API simplification. This makes us match fold_buildN behavior by only combining with Index: gcc/gimple-fold.h =================================================================== --- gcc/gimple-fold.h (revision 271644) +++ gcc/gimple-fold.h (working copy) @@ -60,6 +60,7 @@ extern bool gimple_fold_builtin_sprintf extern bool gimple_fold_builtin_snprintf (gimple_stmt_iterator *); extern bool arith_code_with_undefined_signed_overflow (tree_code); extern gimple_seq rewrite_to_defined_overflow (gimple *); +extern void rewrite_seq_to_defined_overflow (gimple_seq *); extern void replace_call_with_value (gimple_stmt_iterator *, tree); extern tree tree_vec_extract (gimple_stmt_iterator *, tree, tree, tree, tree); Index: gcc/tree-ssa-dce.c =================================================================== --- gcc/tree-ssa-dce.c (revision 271644) +++ gcc/tree-ssa-dce.c (working copy) @@ -67,6 +67,8 @@ along with GCC; see the file COPYING3. #include "tree-scalar-evolution.h" #include "tree-ssa-propagate.h" #include "gimple-fold.h" +#include "tree-ssa-loop-manip.h" +#include "gimplify-me.h" static struct stmt_stats { @@ -114,6 +116,9 @@ static bool cfg_altered; /* When non-NULL holds map from basic block index into the postorder. */ static int *bb_postorder; +/* Map from loop-closed PHI argument to final value replacement tree. */ +hash_map > *lcphi_map; + /* If STMT is not already marked necessary, mark it, and add it to the worklist if ADD_TO_WORKLIST is true. */ @@ -181,6 +186,19 @@ mark_operand_necessary (tree op) worklist.safe_push (stmt); } +/* Helper for walk_tree calling mark_operand_necessary. */ + +static tree +mark_operands_necessary (tree *tp, int *walk_subtrees, void *) +{ + if (EXPR_P (*tp)) + return NULL_TREE; + if (TREE_CODE (*tp) == SSA_NAME) + mark_operand_necessary (*tp); + *walk_subtrees = 0; + return NULL_TREE; +} + /* Mark STMT as necessary if it obviously is. Add it to the worklist if it can make other statements necessary. @@ -657,6 +675,22 @@ propagate_necessity (bool aggressive) gphi *phi = as_a (stmt); size_t k; + /* There's the special exception of loop-closed PHI nodes + which we might be able to elide. Do not mark those. */ + if (aggressive + && gimple_phi_num_args (stmt) == 1) + { + tree def = gimple_phi_arg_def (stmt, 0); + if (std::pair *repl = lcphi_map->get (def)) + { + /* Now we can elide marking DEF as well as control + dependences. But we have to mark the replacement + uses instead, else those may get eliminated. */ + walk_tree (&repl->first, mark_operands_necessary, NULL, NULL); + continue; + } + } + for (k = 0; k < gimple_phi_num_args (stmt); k++) { tree arg = PHI_ARG_DEF (stmt, k); @@ -975,6 +1009,43 @@ remove_dead_phis (basic_block bb) continue; } + /* If we have a necessary PHI node that got its single operand elided + perform final value replacement. */ + tree arg0 = gimple_phi_arg_def (phi, 0); + if (lcphi_map + && gimple_phi_num_args (phi) == 1 + && TREE_CODE (arg0) == SSA_NAME + && !SSA_NAME_IS_DEFAULT_DEF (arg0) + && !gimple_plf (SSA_NAME_DEF_STMT (arg0), STMT_NECESSARY)) + { + something_changed = true; + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Performing final value replacement: "); + print_gimple_stmt (dump_file, phi, 0, TDF_SLIM); + fprintf (dump_file, "\n"); + } + + tree rslt = PHI_RESULT (phi); + std::pair *repl = lcphi_map->get (arg0); + gcc_assert (repl); + + remove_phi_node (&gsi, false); + stats.removed_phis++; + + gimple_seq stmts = NULL; + tree nw = force_gimple_operand (unshare_expr (repl->first), + &stmts, false, NULL_TREE); + if (repl->second) + rewrite_seq_to_defined_overflow (&stmts); + gassign *ass = gimple_build_assign (rslt, nw); + gimple_set_plf (ass, STMT_NECESSARY, true); + gimple_seq_add_stmt_without_update (&stmts, ass); + gimple_stmt_iterator gsi2 = gsi_after_labels (bb); + gsi_insert_seq_before (&gsi2, stmts, GSI_SAME_STMT); + continue; + } + gsi_next (&gsi); } return something_changed; @@ -1559,6 +1630,7 @@ perform_tree_ssa_dce (bool aggressive) scev_initialize (); loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS); + rewrite_into_loop_closed_ssa (NULL, 0); } tree_dce_init (aggressive); @@ -1574,6 +1646,29 @@ perform_tree_ssa_dce (bool aggressive) bitmap_clear (visited_control_parents); mark_dfs_back_edges (); + + lcphi_map = new hash_map > (); + struct loop *loop; + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST) + { + if (edge exit = single_exit (loop)) + /* When we currently handle a single PHI arg only, we'd + also not be able to insert into the destination block + and thus had to fixup loop-closed SSA form. */ + if (single_pred_p (exit->dest)) + { + for (gphi_iterator psi = gsi_start_phis (exit->dest); + !gsi_end_p (psi); gsi_next (&psi)) + { + gphi *phi = psi.phi (); + tree def = PHI_ARG_DEF_FROM_EDGE (phi, exit); + bool folded_casts; + if (tree repl = final_value_replacement (phi, exit, + &folded_casts)) + lcphi_map->put (def, std::make_pair (repl, folded_casts)); + } + } + } } find_obviously_necessary_stmts (aggressive); @@ -1604,6 +1699,12 @@ perform_tree_ssa_dce (bool aggressive) if (cfg_altered) free_dominance_info (CDI_DOMINATORS); + if (aggressive) + { + delete lcphi_map; + lcphi_map = NULL; + } + statistics_counter_event (cfun, "Statements deleted", stats.removed); statistics_counter_event (cfun, "PHI nodes deleted", stats.removed_phis); Index: gcc/testsuite/gcc.dg/tree-ssa/loop-14.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/loop-14.c (revision 271644) +++ gcc/testsuite/gcc.dg/tree-ssa/loop-14.c (working copy) @@ -1,6 +1,6 @@ /* A test for final value replacement. */ -/* { dg-options "-O2 -fdump-tree-optimized" } */ +/* { dg-options "-O2 -fdump-tree-cddce1" } */ int foo(void); @@ -15,4 +15,4 @@ int bla(void) return j; } -/* { dg-final { scan-tree-dump-times "\\+ 100" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "\\+ 100" 1 "cddce1" } } */