Attached below is the updated patch (part 1, updated to a more recent
snapshot).

Bootstrapped on powerpc64-linux, bootstrapped with vectorization enabled
on i386-linux, and tested on the vectorizer testcases.

dorit

(See attached file: updated-outerloop-patch1.txt)

> Hi,
>
> This patch is the first part of
> http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00461.html. It adds
> initial support for outer-loop vectorization. It basically brings over
> this patch: http://gcc.gnu.org/ml/gcc-patches/2007-04/msg00044.html,
> along with some fixes that went in later.
>
> This patch can vectorize outer-loops only if there are no
> memory-references in the inner-loop.
>
> The patch includes the following changes to the vectorizer:
>
> 1) So far we supported single-BB loops (+empty latch), so the order in
> which we traversed the loop BBs did not matter. Now it does - we sort
> the BBs in dfs order (since we don't allow if's in the loop, this
> should guarantee visiting defs before their uses).
>
> 2) vect_analyze_loop_form was extended to allow a restricted form of
> outer-loops. We currently support doubly-nested loops that consist of a
> header, a single inner(most)-loop, a tail, and an empty latch (5 BBs
> all together).
>
> 3) vect_analyze_loop_form calls a new function - vect_analyze_loop_1 -
> to do a few analyses on the inner-loop (currently only one analysis:
> analyze_loop_form), and to build a loop_info for the inner-loop. It is
> destroyed soon after, but w/o destroying the stmt_info's that were set
> up for the inner-loop stmts. Maybe later we'll keep the inner-loop_info
> around, if needed.
>
> 4) Support for outer-loops breaks the assumption that phi nodes are
> only in the loop-header, and represent a scalar-cycle (induction or
> reduction). In outer-loops we also have phi-nodes inside the loop -
> these are the loop-closed phis after the inner-loop.
> This required a way to distinguish between these two kinds of phis (we
> use 'is_loop_header_bb_p' for that), and a few small changes in several
> places:
> o new_stmt_vec_info: different def-type initialization for the two
>   kinds of phis.
> o vect_is_simple_reduction: the uses that are not the
>   reduction-variable can now be defined by a phi, though not a
>   loop-header phi.
> o vect_recog_dot_prod_pattern: a vect_loop_def might be a phi, and not
>   necessarily a gimple_modify_stmt.
> o vect_get_vec_def_for_operand: a vect_loop_def can be a phi node, and
>   not necessarily a gimple_modify_stmt.
>
> 5) The enum "relevant" has two new values -
> vect_used_in_outer[_by_reduction], which are propagated during the
> mark_relevant pass.
>
> 6) Since we don't yet support multiple-data-types in the inner-loop, we
> check in all relevant places that this is not the case.
>
> The more significant changes are to vectorization of reduction and
> induction. In both cases we need to be aware of whether the
> induction/reduction-phi that we are vectorizing is in the same nest
> that is being vectorized, or is 'nested_in_vect_loop' (is inside the
> inner-loop while vectorizing the outer-loop):
>
> 7) vectorization of induction: In get_initial_def_for_induction, if
> this is a 'nested_in_vect_loop' case, then:
> o the initialization vector can be obtained using
>   vect_get_vec_def_for_operand (does not need to be built from
>   scratch).
> o the vector that holds the step of the vectorized induction is
>   {S,S,S,S} rather than {VF*S,VF*S,VF*S,VF*S} (where S is the step of
>   the induction), because in the vectorized inner-loop we are advancing
>   sequentially (though in parallel for VF outer-loop iterations).
> o the final vector for inductions is recorded in the corresponding
>   loop-exit phi (of the inner-loop) so that we can easily obtain it
>   when we vectorize stmts in the outer-loop that use it.
>
> 8) vectorization of reduction: The main thing here is that we don't
> need to reduce the reduction to a single result; the final vector of
> partial results will feed the vector operations that may use it in the
> outer-loop. So:
> o In get_initial_def_for_reduction, we may return a vector for the
>   epilog adjustment, rather than a scalar.
> o epilog_for_reduction - skip the part that computes the final scalar
>   result in case this is a 'nested_in_vect_loop' case.
> o and in vectorizable_reduction, we don't check that the reduction is
>   LIVE_P anymore (used out of the loop), because it may not be used
>   outside the (outer) loop, but used inside the outer-loop (so as far
>   as the inner-loop reduction is concerned, it is used_in_outer_loop,
>   but not live).
>
> Bootstrapped on powerpc64-linux,
> bootstrapped with vectorization enabled on i386-linux,
> passed full regression testing on both platforms.
>
> I will wait at least a week to give people a chance to review and
> comment.
>
> thanks,
> dorit
>
> ChangeLog:
>
>         * tree-vectorizer.h (vect_is_simple_reduction): Takes a
>         loop_vec_info as argument instead of struct loop.
>         (nested_in_vect_loop_p): New function.
>         (vect_relevant): Add enum values vect_used_in_outer_by_reduction
>         and vect_used_in_outer.
>         (is_loop_header_bb_p): New. Used to differentiate loop-header
>         phis from other phis in the loop.
>         (destroy_loop_vec_info): Add additional argument to declaration.
>
>         * tree-vectorizer.c (supportable_widening_operation): Also check
>         if nested_in_vect_loop_p (don't allow changing the order in this
>         case).
>         (vect_is_simple_reduction): Takes a loop_vec_info as argument
>         instead of struct loop. Call nested_in_vect_loop_p and don't
>         require flag_unsafe_math_optimizations if it returns true.
>         * tree-vectorizer.c (new_stmt_vec_info): When setting def_type
>         for phis differentiate loop-header phis from other phis.
>         (bb_in_loop_p): New function.
>         (new_loop_vec_info): Inner-loop phis already have a stmt_vinfo,
>         so just update their loop_vinfo. Order of BB traversal now
>         matters - call dfs_enumerate_from with bb_in_loop_p.
>         (destroy_loop_vec_info): Takes additional argument to control
>         whether stmt_vinfo of the loop stmts should be destroyed as well.
>         (vect_is_simple_reduction): Allow the "non-reduction" use of a
>         reduction stmt to be defined by a non loop-header phi.
>         (vectorize_loops): Call destroy_loop_vec_info with additional
>         argument.
>         * tree-vect-transform.c (vectorizable_reduction): Call
>         nested_in_vect_loop_p. Check for multitypes in the inner-loop.
>         (vectorizable_call): Likewise.
>         (vectorizable_conversion): Likewise.
>         (vectorizable_operation): Likewise.
>         (vectorizable_type_promotion): Likewise.
>         (vectorizable_type_demotion): Likewise.
>         (vectorizable_store): Likewise.
>         (vectorizable_live_operation): Likewise.
>         (vectorizable_reduction): Likewise. Also pass loop_info to
>         vect_is_simple_reduction instead of loop.
>         (vect_init_vector): Call nested_in_vect_loop_p.
>         (get_initial_def_for_reduction): Likewise.
>         (vect_create_epilog_for_reduction): Likewise.
>         (vect_init_vector): Check which loop to work with, in case
>         there's an inner-loop.
>         (get_initial_def_for_induction): Extend to handle outer-loop
>         vectorization. Fix indentation.
>         (vect_get_vec_def_for_operand): Support phis in the case
>         vect_loop_def. In the case vect_induction_def get the vector def
>         from the induction phi node, instead of calling
>         get_initial_def_for_induction.
>         (get_initial_def_for_reduction): Extend to handle outer-loop
>         vectorization.
>         (vect_create_epilog_for_reduction): Extend to handle outer-loop
>         vectorization.
>         (vect_transform_loop): Change assert to just skip this case. Add
>         a dump printout.
>         (vect_finish_stmt_generation): Add a couple asserts.
>
>         (vect_estimate_min_profitable_iters): Multiply cost of inner-loop
>         stmts (in outer-loop vectorization) by estimated inner-loop
>         bound.
>         (vect_model_reduction_cost): Don't add reduction epilogue cost in
>         case this is an inner-loop reduction in outer-loop vectorization.
>
>         * tree-vect-analyze.c (vect_analyze_scalar_cycles_1): New
>         function. Same code as what used to be
>         vect_analyze_scalar_cycles, only with additional argument loop,
>         and loop_info passed to vect_is_simple_reduction instead of loop.
>         (vect_analyze_scalar_cycles): Code factored out into
>         vect_analyze_scalar_cycles_1. Call it for each relevant
>         loop-nest. Updated documentation.
>         (analyze_operations): Check for inner-loop loop-closed exit-phis
>         during outer-loop vectorization that are live or not used in the
>         outer-loop, because this requires special handling.
>         (vect_enhance_data_refs_alignment): Don't consider versioning for
>         nested-loops.
>         (vect_analyze_data_refs): Check that there are no datarefs in the
>         inner-loop.
>         (vect_mark_stmts_to_be_vectorized): Also consider
>         vect_used_in_outer and vect_used_in_outer_by_reduction cases.
>         (process_use): Also consider the case of outer-loop stmt defining
>         an inner-loop stmt and vice versa.
>         (vect_analyze_loop_1): New function.
>         (vect_analyze_loop_form): Extend, to allow a restricted form of
>         nested loops. Call vect_analyze_loop_1.
>         (vect_analyze_loop): Skip (inner-)loops within outer-loops that
>         have been vectorized. Call destroy_loop_vec_info with additional
>         argument.
>         * tree-vect-patterns.c (vect_recog_widen_sum_pattern): Don't
>         allow in the inner-loop when doing outer-loop vectorization. Add
>         documentation and printout.
>         (vect_recog_dot_prod_pattern): Likewise. Also add check for
>         GIMPLE_MODIFY_STMT (in case we encounter a phi in the loop).
>
> testsuite/ChangeLog:
>
>         * gcc.dg/vect/vect.exp: Compile tests with -fno-tree-scev-cprop
>         and -fno-tree-reassoc.
>         * gcc.dg/vect/no-tree-scev-cprop-vect-iv-1.c: Moved to...
>         * gcc.dg/vect/no-scevccp-vect-iv-1.c: New test.
>         * gcc.dg/vect/no-tree-scev-cprop-vect-iv-2.c: Moved to...
>         * gcc.dg/vect/no-scevccp-vect-iv-2.c: New test.
>         * gcc.dg/vect/no-scevccp-noreassoc-outer-1.c: New test.
>         * gcc.dg/vect/no-scevccp-noreassoc-outer-2.c: New test.
>         * gcc.dg/vect/no-scevccp-noreassoc-outer-3.c: New test.
>         * gcc.dg/vect/no-scevccp-noreassoc-outer-4.c: New test.
>         * gcc.dg/vect/no-scevccp-noreassoc-outer-5.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-1.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-2.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-3.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-4.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-5.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-6.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-7.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-8.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-9.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-9a.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-9b.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-10.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-10a.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-10b.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-11.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-12.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-13.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-14.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-15.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-16.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-17.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-18.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-19.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-20.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-21.c: New test.
>         * gcc.dg/vect/no-scevccp-outer-22.c: New test.
>
> (See attached file: mainlineouterloopdiff1t.txt)
>
> #### mainlineouterloopdiff1t.txt has been deleted (was already in
> repository MyAttachments Repository ->) from this note on 11 August
> 2007 by Dorit Nuzman
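
For readers who want a concrete picture of items 7) and 8) in the quoted
description, here is a minimal sketch in C. It is not taken from the patch
or from its testcases. The function names, the choice of VF = 4, and the
loop body are all assumptions made for illustration. The first function
is a doubly-nested loop with no memory references in the inner loop. The
second is a plain scalar, lane-by-lane model of what outer-loop
vectorization would compute for it; it is not the code the vectorizer
emits.

```c
#include <assert.h>

#define VF 4  /* hypothetical vectorization factor, chosen for illustration */

/* Scalar reference: a doubly-nested loop whose inner loop has no
   memory references (the only store is in the outer-loop tail).  */
void outer_scalar(int *out, int n, int m, int step)
{
  for (int i = 0; i < n; i++)
    {
      int sum = 0;   /* inner-loop reduction */
      int iv = i;    /* inner-loop induction; initial value defined in the outer loop */
      for (int j = 0; j < m; j++)
        {
          sum += iv;
          iv += step;
        }
      out[i] = sum;
    }
}

/* Lane-by-lane model of outer-loop vectorization: each "vector" is an
   array of VF lanes, one lane per outer-loop iteration.  */
void outer_vectorized_model(int *out, int n, int m, int step)
{
  assert(n % VF == 0);  /* keep the sketch simple: no epilog loop */
  for (int i = 0; i < n; i += VF)
    {
      int vsum[VF], viv[VF];
      for (int l = 0; l < VF; l++)
        {
          vsum[l] = 0;
          viv[l] = i + l;  /* init vector comes from the outer-loop def */
        }
      for (int j = 0; j < m; j++)
        for (int l = 0; l < VF; l++)
          {
            vsum[l] += viv[l];
            viv[l] += step;  /* step vector is {S,S,S,S}, not {VF*S,...} */
          }
      /* The partial results are not reduced to a scalar; the whole
         vector feeds the outer-loop statement that uses it.  */
      for (int l = 0; l < VF; l++)
        out[i + l] = vsum[l];
    }
}

/* Returns 1 if both versions produce the same n results (n <= 64).  */
int models_agree(int n, int m, int step)
{
  int a[64], b[64];
  assert(n <= 64);
  outer_scalar(a, n, m, step);
  outer_vectorized_model(b, n, m, step);
  for (int i = 0; i < n; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

Calling models_agree (8, 5, 3) compares the two versions over eight
outer-loop iterations. The point of the model is that the inner loop still
runs m times with step S per lane, while the VF lanes stand for VF
consecutive outer-loop iterations.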