Thanks a lot Richard for your review. I presented updated patch which is not gated by force_vectorize. I added test on outer-loop in vect_enhance_data_refs_alignment and it returns false for it because we can not improve dr alighment through outer-loop peeling in general. So I assume that only versioning for alignment can be applied for targets do not support unaligned memory access. I did not change tests for outer loops in slpeel_can_duplicate_loop_p as you proposed since it is not called outside vectorization. I also noticed one not-resolved issue with outer-loop peeling - we don't consider remainder for possible vectorization of inner-loop as we can see on the following example: for (i = 0; i < n; i++) { diff = 0; for (j = 0; j < M; j++) { diff += in[j+i]*coeff[j]; } out[i] = diff; } Is it worth to fix it? 2015-06-16 Yuri Rumyantsev * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument to allow renaming of PHI arguments on edges incoming from outer loop header, add corresponding check before start PHI iterator. (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool variable DUPLICATE_OUTER_LOOP and set it to true for outer loops with true force_vectorize. Set-up dominator for outer loop too. Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb. (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it was marked with force_vectorize and has restricted cfg. * tree-vect-data-refs.c (vector_alignment_reachable_p): Alignment can not be reachable for outer loops. (vect_enhance_data_refs_alignment): Add test on true value of do_peeling. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-outer-simd-2.c: New test. 2015-06-09 16:26 GMT+03:00 Richard Biener : > On Mon, Jun 8, 2015 at 12:27 PM, Yuri Rumyantsev wrote: >> Hi All, >> >> Here is a simple fix which allows duplication of outer loops to >> perform peeling for number of iterations if outer loop is marked with >> pragma omp simd. >> >> Bootstrap and regression testing did not show any new failures. >> Is it OK for trunk? > > Hmm, I don't remember needing to adjust rename_variables_in_bb > when teaching loop distibution to call slpeel_tree_duplicate_to_edge_cfg > on non-innermost loops... (I just copied, I never called > slpeel_can_duplicate_loop_p though). > > So - you should just remove the loop->inner condition from > slpeel_can_duplicate_loop_p as it is used by non-vectorizer > code as well (yeah, I never merged the nested loop support > for loop distribution...). > > Index: tree-vect-loop.c > =================================================================== > --- tree-vect-loop.c (revision 224100) > +++ tree-vect-loop.c (working copy) > @@ -1879,6 +1879,10 @@ > return false; > } > > + /* Peeling for alignment is not supported for outer-loop vectorization. */ > + if (LOOP_VINFO_LOOP (loop_vinfo)->inner) > + LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = 0; > > I think you can't simply do this - if vect_enhance_data_refs_alignment > decided to peel for alignment then it has adjusted the DRs alignment > info already. So instead of the above simply disallow peeling for > alignment in vect_enhance_data_refs_alignment? Thus add > || ->inner to > > /* Check if we can possibly peel the loop. */ > if (!vect_can_advance_ivs_p (loop_vinfo) > || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))) > do_peeling = false; > > ? > > I also can't see why the improvement has to be gated on force_vect, > it surely looks profitable to enable more outer loop vectorization in > general, no? > > How do the cost model calculations end up with peeling the outer loop > for niter? > > On targets which don't support unaligned accesses we're left with > versioning for alignment. Isn't peeling for alignment better there? > Thus only disallow peeling for alignment if there is no unhandled > alignment? > > Thanks, > Richard. > >> ChangeLog: >> >> 2015-06-08 Yuri Rumyantsev >> >> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument >> to allow renaming of PHI arguments on edges incoming from outer >> loop header, add corresponding check before start PHI iterator. >> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool >> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops >> with true force_vectorize. Set-up dominator for outer loop too. >> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb. >> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it >> was marked with force_vectorize and has restricted cfg. >> * tre-vect-loop.c (vect_analyze_loop_2): Prohibit alignment peeling >> for outer loops. >> >> gcc/testsuite/ChangeLog: >> * gcc.dg/vect/vect-outer-simd-2.c: New test.