Hi,
This is the main patch improving the control flow graph for vectorized
loops.  It largely rewrites the loop peeling code in the vectorizer.  As
described in the patch, for a typical loop to be vectorized like:

       preheader:
     LOOP:
       header_bb:
         loop_body
         if (exit_loop_cond) goto exit_bb
         else                goto header_bb
       exit_bb:

this patch peels a prolog and an epilog from the loop and adds guards
that skip PROLOG and EPILOG under various conditions.  As a result, the
changed CFG looks like:

       guard_bb_1:
         if (prefer_scalar_loop) goto merge_bb_1
         else                    goto guard_bb_2

       guard_bb_2:
         if (skip_prolog) goto merge_bb_2
         else             goto prolog_preheader

       prolog_preheader:
     PROLOG:
       prolog_header_bb:
         prolog_body
         if (exit_prolog_cond) goto prolog_exit_bb
         else                  goto prolog_header_bb
       prolog_exit_bb:

       merge_bb_2:

       vector_preheader:
     VECTOR LOOP:
       vector_header_bb:
         vector_body
         if (exit_vector_cond) goto vector_exit_bb
         else                  goto vector_header_bb
       vector_exit_bb:

       guard_bb_3:
         if (skip_epilog) goto merge_bb_3
         else             goto epilog_preheader

       merge_bb_1:

       epilog_preheader:
     EPILOG:
       epilog_header_bb:
         epilog_body
         if (exit_epilog_cond) goto merge_bb_3
         else                  goto epilog_header_bb

       merge_bb_3:

Note that this patch peels the prolog and epilog only when necessary, and
likewise adds each guard condition/branch only when it is needed.  The
first guard/branch could be further improved by merging it with loop
versioning.  Before this patch, up to 4 branch instructions had to be
executed before the vectorized loop was reached in the worst case; with
this patch the number is reduced to 2.  The patch also does better
compile-time analysis to avoid unnecessary peeling and branching.

From the implementation's point of view, the vectorizer needs to update
induction variables and iteration bounds along with the control flow
changes.  Unfortunately, this code is also much harder to follow because
the slpeel_* functions update SSA by themselves rather than using the
update_ssa interface.  This patch tries to factor out SSA/IV/Niter_bound
changes from CFG changes.
This should make the implementation easier to read, and I think it may be
a step forward to replacing the slpeel_* functions with generic GIMPLE
loop copy interfaces, as Richard suggested.

Thanks,
bin

2016-09-01  Bin Cheng

	* tree-vect-loop-manip.c (adjust_vec_debug_stmts): Don't release
	adjust_vec automatically.
	(slpeel_add_loop_guard): Remove param cond_expr_stmt_list.
	Rename param exit_bb to guard_to.
	(slpeel_checking_verify_cfg_after_peeling):
	(set_prologue_iterations):
	(create_lcssa_for_virtual_phi): New func which is factored out
	from slpeel_tree_peel_loop_to_edge.
	(slpeel_tree_peel_loop_to_edge):
	(iv_phi_p): New func.
	(vect_can_advance_ivs_p): Call iv_phi_p.
	(vect_update_ivs_after_vectorizer): Call iv_phi_p.  Directly
	insert new gimple stmts in basic block.
	(vect_do_peeling_for_loop_bound):
	(vect_do_peeling_for_alignment):
	(vect_gen_niters_for_prolog_loop): Rename to...
	(vect_gen_prolog_loop_niters): ...Rename from.  Change parameters
	and adjust implementation.
	(vect_update_inits_of_drs): Fix code style issue.  Convert niters
	to sizetype if necessary.
	(vect_build_loop_niters): Move to here from tree-vect-loop.c.
	Change it to external function.
	(vect_gen_scalar_loop_niters, vect_gen_vector_loop_niters): New.
	(vect_gen_vector_loop_niters_mult_vf): New.
	(slpeel_update_phi_nodes_for_loops): New.
	(slpeel_update_phi_nodes_for_guard1): Reimplement.
	(find_guard_arg, slpeel_update_phi_nodes_for_guard2): Reimplement.
	(slpeel_update_phi_nodes_for_lcssa, vect_do_peeling): New.
	* tree-vect-loop.c (vect_build_loop_niters): Move to file
	tree-vect-loop-manip.c.
	(vect_generate_tmps_on_preheader): Delete.
	(vect_transform_loop): Rename vectorization_factor to vf.  Call
	vect_do_peeling instead of vect_do_peeling_* functions.
	* tree-vectorizer.h (vect_do_peeling): New decl.
	(vect_build_loop_niters, vect_gen_vector_loop_niters): New decls.
	(vect_do_peeling_for_loop_bound): Delete.
	(vect_do_peeling_for_alignment): Delete.
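
P.S. For readers who prefer a source-level picture of the new CFG, here is a rough hand-written C sketch of the same shape for a simple sum reduction. It is only an illustration and not code from the patch: the function names sum_peeled and prolog_niters, the vectorization factor VF, and the threshold MIN_PROFITABLE_NITERS are all made up, and the "vector" loop is emulated with VF scalar lanes. The comments name the basic blocks from the diagram that each piece roughly corresponds to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for illustration only; not taken from the patch.  */
#define VF 4                     /* vectorization factor */
#define MIN_PROFITABLE_NITERS 8  /* stand-in for the cost-model threshold */

/* Scalar iterations needed until &a[i] is aligned to VF * sizeof (int).
   Assumes A itself is aligned to sizeof (int).  The result is < VF.  */
static size_t
prolog_niters (const int *a)
{
  size_t misalign = ((uintptr_t) a / sizeof (int)) % VF;
  return (VF - misalign) % VF;
}

long
sum_peeled (const int *a, size_t n)
{
  long sum = 0;
  size_t i = 0;

  /* guard_bb_1: if (prefer_scalar_loop) goto merge_bb_1.  */
  if (n >= MIN_PROFITABLE_NITERS)
    {
      size_t peel = prolog_niters (a);

      /* guard_bb_2: if (skip_prolog) goto merge_bb_2.  */
      if (peel != 0)
        {
          do  /* PROLOG: bottom-tested scalar loop.  */
            sum += a[i++];
          while (i < peel);
        }

      /* merge_bb_2: the vector loop is bottom-tested, so at least one
         full vector iteration must remain.  This holds here because
         peel < VF and n >= MIN_PROFITABLE_NITERS >= 2 * VF.  */
      assert (n - i >= VF);

      long lane[VF] = { 0 };
      do  /* VECTOR LOOP: VF elements per iteration.  */
        {
          for (size_t l = 0; l < VF; l++)
            lane[l] += a[i + l];
          i += VF;
        }
      while (n - i >= VF);

      /* vector_exit_bb: reduce the lanes into the scalar result.  */
      for (size_t l = 0; l < VF; l++)
        sum += lane[l];

      /* guard_bb_3: if (skip_epilog) goto merge_bb_3.  */
      if (i == n)
        return sum;
    }

  /* merge_bb_1 / EPILOG: remaining scalar iterations.  A top-tested
     loop is used here so that the sketch is also safe for n == 0.  */
  while (i < n)
    sum += a[i++];

  /* merge_bb_3.  */
  return sum;
}
```

On the vectorized path only the first two guards are evaluated before the vector loop is entered, which matches the "reduced to 2" branch count above; the third guard runs only after the vector loop has finished.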