From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 30453 invoked by alias); 22 Jul 2011 15:36:32 -0000 Received: (qmail 30156 invoked by uid 22791); 22 Jul 2011 15:36:30 -0000 X-SWARE-Spam-Status: No, hits=-2.2 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,TW_TM X-Spam-Check-By: sourceware.org Received: from mail-yi0-f47.google.com (HELO mail-yi0-f47.google.com) (209.85.218.47) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 22 Jul 2011 15:36:11 +0000 Received: by yib18 with SMTP id 18so1494690yib.20 for ; Fri, 22 Jul 2011 08:36:11 -0700 (PDT) MIME-Version: 1.0 Received: by 10.151.43.19 with SMTP id v19mr1966645ybj.444.1311348970734; Fri, 22 Jul 2011 08:36:10 -0700 (PDT) Received: by 10.150.205.2 with HTTP; Fri, 22 Jul 2011 08:36:10 -0700 (PDT) In-Reply-To: <4E232ADD.3020803@codesourcery.com> References: <4DEF4408.4040001@codesourcery.com> <4DF24C2C.7080808@codesourcery.com> <4E1C3A19.8060706@codesourcery.com> <4E232ADD.3020803@codesourcery.com> Date: Fri, 22 Jul 2011 15:54:00 -0000 Message-ID: Subject: Re: [PATCH, PR43864] Gimple level duplicate block cleanup. From: Richard Guenther To: Tom de Vries Cc: Steven Bosscher , gcc-patches@gcc.gnu.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2011-07/txt/msg01975.txt.bz2 On Sun, Jul 17, 2011 at 8:33 PM, Tom de Vries wrot= e: > Bootstrapped and reg-tested on x86_64. =A0Ok for trunk (after ARM testing= )? +static int +same_succ_equal (const void *ve1, const void *ve2) +{ ... + if (bitmap_bit_p (e1->bbs, ENTRY_BLOCK) + || bitmap_bit_p (e1->bbs, EXIT_BLOCK) + || bitmap_bit_p (e2->bbs, ENTRY_BLOCK) + || bitmap_bit_p (e2->bbs, EXIT_BLOCK)) + return 0; that's odd - what are these checks for? + if (dump_file) + { + fprintf (dump_file, "initial worklist:\n"); with dump_flags & TDF_DETAILS I'm now at merge_calls and wondering about alias info again. We are probably safe for the per-pointer information because we are not operating flow-sensitive for memory and for merging require value-equivalen= ce for SSA names. For calls the same should be true - we are not flow- or context-sensitive, and even if we were context-sentitive we require equivalent arguments (for memory arguments we should be safe because of the non-flow-sensitivity). So, did you actually run into problems? If not then I suggest to remove merge_calls completely (and the related changes that it requires). +/* Create or update a vop phi in BB2. Use VUSE1 arguments for all the + REDIRECTED_EDGES, or if VUSE1 is NULL_TREE, use BB_VOP_AT_EXIT. If a n= ew + phis is created, use the phi instead of VUSE2 in BB2. */ + +static void +update_vuses (tree vuse1, tree vuse2, basic_block bb2, + VEC (edge,heap) *redirected_edges) ... + if (vuse2 =3D=3D NULL_TREE) + return; hm, that's the case when there is no VUSE that is dominated by BB2 (or is in BB2). Ok, might happen. + locus1 =3D gimple_location (SSA_NAME_DEF_STMT (arg)); + add_phi_arg (phi, arg, e, locus1); I don't think virtual operand PHIs should have locations, just use UNKNOWN_LOCATION here. + locus2 =3D gimple_location (def_stmt2); Likewise. + /* Create a phi, first with default argument vuse2 for all preds. */ + lhs =3D make_ssa_name (SSA_NAME_VAR (vuse2), NULL); + VN_INFO_GET (lhs); + phi =3D create_phi_node (lhs, bb2); + SSA_NAME_DEF_STMT (lhs) =3D phi; + FOR_EACH_EDGE (e, ei, bb2->preds) + add_phi_arg (phi, vuse2, e, locus2); + + /* Now overwrite the arguments associated with the redirected edges with + vuse1. */ + for (i =3D 0; i < EDGE_COUNT (redirected_edges); ++i) + { + e =3D VEC_index (edge, redirected_edges, i); + gcc_assert (PHI_ARG_DEF_FROM_EDGE (phi, e)); No need for this assert. + if (vuse1) + arg =3D vuse1; + else + arg =3D BB_VOP_AT_EXIT (e->src); + SET_PHI_ARG_DEF (phi, e->dest_idx, arg); + locus1 =3D gimple_location (SSA_NAME_DEF_STMT (arg)); See above. + gimple_phi_arg_set_location (phi, e->dest_idx, locus1); + } Can you maybe merge this with the update-existing-phi-case? They look all too similar. + /* Replace uses of vuse2 in bb2 with phi. */ + FOR_EACH_IMM_USE_STMT (stmt, iter, vuse2) + { + if (gimple_code (stmt) =3D=3D GIMPLE_PHI) Does FOR_EACH_IMM_USE_ON_STMT really not work for PHIs? Other code doesn't seem to care. + { + edge e; + if (stmt =3D=3D phi) + continue; + e =3D find_edge (bb2, gimple_bb (stmt)); + if (e =3D=3D NULL) + continue; + use_p =3D PHI_ARG_DEF_PTR_FROM_EDGE (stmt, e); + SET_USE (use_p, lhs); + update_stmt (stmt); + } + else if (gimple_bb (stmt) =3D=3D bb2) That check looks odd. A use can very well appear in a forwarder block. + { + FOR_EACH_IMM_USE_ON_STMT (use_p, iter) + SET_USE (use_p, lhs); + update_stmt (stmt); + } + } +/* Scans the vdefs and vuses of the insn of BB, and returns the vop at ent= ry in + VOP_AT_ENTRY, and the vop at exit in VOP_AT_EXIT. */ + +static void +insn_vops (basic_block bb, tree *vop_at_entry, tree *vop_at_exit) it's easier to start from the bb end and walk until you see the first vdef or vuse. Then you have *vop_at_exit. From there just walk the SSA_NAME_DEF_STMTs of the vuse until you hit one whose definition is not in BB - and you have *vop_at_entry. That way you can avoid scanning most of the stmts. The function also has an odd name ;) It should be something like vops_at_bb_entry_and_exit. +static void +vop_at_entry (basic_block bb1, basic_block bb2, tree *vop_at_entry1, + tree *vop_at_entry2) so you don't need the vop at exit at all? The function is a bit unclear to me given it does so much stuff other than just computing the BBs entry VOPs ... +static void +replace_block_by (basic_block bb1, basic_block bb2, bool update_vops) +{ can you add some comments before the different phases of update? I _think_ I understand what it does, but ... +/* Runs tail merge optimization. */ + +unsigned int +tail_merge_optimize (unsigned int todo) +{ + int nr_bbs_removed_total =3D 0; + int nr_bbs_removed; + bool loop_entered =3D false; + int iteration_nr =3D 0; + bool update_vops =3D ((todo & TODO_update_ssa_only_virtuals) =3D=3D 0 + || !symbol_marked_for_renaming (gimple_vop (cfun))); you need to simplify this to bool update_vops =3D !symbol_marked_for_renaming (gimple_vop (cfun)); + if (nr_bbs_removed =3D=3D 0) + break; + + free_dominance_info (CDI_DOMINATORS); we might want to limit the number of iterations we perform - especially as you are re-computing dominators on each iteration. What's the maximum number of iterations you see during a GCC bootstrap? + if (dump_file) + fprintf (dump_file, "htab collision / search: %f\n", + htab_collisions (same_succ_htab)); in general without dump_flags & TDF_DETAILS passes should print at most things when they did a transformation (some even don't do that). Please double-check all your dump-prints. + todo |=3D (TODO_verify_ssa | TODO_verify_stmts | TODO_verify_flow + | TODO_dump_func); should all be already set. @ -4945,6 +4944,9 @@ execute_pre (bool do_fre) scev_finalize (); fini_pre (do_fre); + todo |=3D tail_merge_optimize (todo); + free_scc_vn (); Please only run tail_merge_optimize once. As we are running through this code three times at -O2. I suggest to try it in the !do_fre case only which we only run once (as PRE). If that doesn't work out nicely we need to find other means (like introduce a pass_fre_and_tail_merge which passes down another flag and replace one FRE pass with that new combined pass). Index: gcc/tree-flow.h =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- gcc/tree-flow.h (revision 175801) +++ gcc/tree-flow.h (working copy) @@ -806,6 +806,9 @@ bool multiplier_allowed_in_address_p (HO unsigned multiply_by_cost (HOST_WIDE_INT, enum machine_mode, bool); bool may_be_nonaddressable_p (tree expr); +/* In tree-ssa-tail-merge.c. */ +extern unsigned int tail_merge_optimize (unsigned int); Eh, tree-flow.h kitchen-sink ;) Please put it into tree-pass.h instead. That said - I'm reasonably happy with the pass now, but it's rather large (this review took 40min again ...) so I appreciate a second look from somebody else. Btw, can you expand a bit on the amount of testcases? Thanks, Richard. > Thanks, > - Tom > > 2011-07-17 =A0Tom de Vries =A0 > > =A0 =A0 =A0 =A0PR middle-end/43864 > =A0 =A0 =A0 =A0* tree-ssa-tail-merge.c: New file. > =A0 =A0 =A0 =A0(struct same_succ): Define. > =A0 =A0 =A0 =A0(same_succ_t, const_same_succ_t): New typedef. > =A0 =A0 =A0 =A0(struct bb_cluster): Define. > =A0 =A0 =A0 =A0(bb_cluster_t, const_bb_cluster_t): New typedef. > =A0 =A0 =A0 =A0(struct aux_bb_info): Define. > =A0 =A0 =A0 =A0(BB_SIZE, BB_SAME_SUCC, BB_CLUSTER, BB_VOP_AT_EXIT): Defin= e. > =A0 =A0 =A0 =A0(gvn_uses_equal): New function. > =A0 =A0 =A0 =A0(same_succ_print, same_succ_print_traverse, same_succ_hash) > =A0 =A0 =A0 =A0(inverse_flags, same_succ_equal, same_succ_alloc, same_suc= c_delete) > =A0 =A0 =A0 =A0(same_succ_reset): New function. > =A0 =A0 =A0 =A0(same_succ_htab, same_succ_edge_flags) > =A0 =A0 =A0 =A0(deleted_bbs, deleted_bb_preds): New var. > =A0 =A0 =A0 =A0(debug_same_succ): New function. > =A0 =A0 =A0 =A0(worklist): New var. > =A0 =A0 =A0 =A0(print_worklist, add_to_worklist, find_same_succ_bb, find_= same_succ) > =A0 =A0 =A0 =A0(init_worklist, delete_worklist, delete_basic_block_same_s= ucc) > =A0 =A0 =A0 =A0(same_succ_flush_bbs, update_worklist): New function. > =A0 =A0 =A0 =A0(print_cluster, debug_cluster, same_predecessors) > =A0 =A0 =A0 =A0(add_bb_to_cluster, new_cluster, delete_cluster): New func= tion. > =A0 =A0 =A0 =A0(all_clusters): New var. > =A0 =A0 =A0 =A0(alloc_cluster_vectors, reset_cluster_vectors, delete_clus= ter_vectors) > =A0 =A0 =A0 =A0(merge_clusters, set_cluster): New function. > =A0 =A0 =A0 =A0(gimple_equal_p, find_duplicate, same_phi_alternatives_1) > =A0 =A0 =A0 =A0(same_phi_alternatives, bb_has_non_vop_phi, find_clusters_= 1) > =A0 =A0 =A0 =A0(find_clusters): New function. > =A0 =A0 =A0 =A0(merge_calls, update_vuses, vop_phi, insn_vops, vop_at_ent= ry) > =A0 =A0 =A0 =A0(replace_block_by): New function. > =A0 =A0 =A0 =A0(update_bbs): New var. > =A0 =A0 =A0 =A0(apply_clusters): New function. > =A0 =A0 =A0 =A0(update_debug_stmt, update_debug_stmts): New function. > =A0 =A0 =A0 =A0(tail_merge_optimize): New function. > =A0 =A0 =A0 =A0tree-flow.h (tail_merge_optimize): Declare. > =A0 =A0 =A0 =A0* tree-ssa-pre.c (execute_pre): Use tail_merge_optimize. > =A0 =A0 =A0 =A0* Makefile.in (OBJS-common): Add tree-ssa-tail-merge.o. > =A0 =A0 =A0 =A0(tree-ssa-tail-merge.o): New rule. > =A0 =A0 =A0 =A0* opts.c (default_options_table): Set OPT_ftree_tail_merge= by default at > =A0 =A0 =A0 =A0OPT_LEVELS_2_PLUS. > =A0 =A0 =A0 =A0* tree-ssa-sccvn.c (vn_valueize): Move to ... > =A0 =A0 =A0 =A0* tree-ssa-sccvn.h (vn_valueize): Here. > =A0 =A0 =A0 =A0* tree-ssa-alias.h (pt_solution_ior_into_shared): Declare. > =A0 =A0 =A0 =A0* tree-ssa-structalias.c (find_what_var_points_to): Factor= out and > =A0 =A0 =A0 =A0use ... > =A0 =A0 =A0 =A0(pt_solution_share): New function. > =A0 =A0 =A0 =A0(pt_solution_unshare, pt_solution_ior_into_shared): New fu= nction. > =A0 =A0 =A0 =A0(delete_points_to_sets): Nullify shared_bitmap_table after= deletion. > =A0 =A0 =A0 =A0* timevar.def (TV_TREE_TAIL_MERGE): New timevar. > =A0 =A0 =A0 =A0* common.opt (ftree-tail-merge): New switch. > =A0 =A0 =A0 =A0* params.def (PARAM_MAX_TAIL_MERGE_COMPARISONS): New param= eter. > =A0 =A0 =A0 =A0* doc/invoke.texi (Optimization Options, -O2): Add -ftree-= tail-merge. > =A0 =A0 =A0 =A0(-ftree-tail-merge, max-tail-merge-comparisons): New item. >