Hi All, Here is updated patch containing missed change in slpeel_tree_peel_loop_to_edge which prevents renaming of exit PHI uses in inner loop. ChangeLog: 2015-06-29 Yuri Rumyantsev * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument to allow renaming of PHI arguments on edges incoming from outer loop header, add corresponding check before start PHI iterator. (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool variable DUPLICATE_OUTER_LOOP and set it to true for outer loops with true force_vectorize. Set-up dominator for outer loop too. Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb. (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it was marked with force_vectorize and has restricted cfg. (slpeel_tree_peel_loop_to_edge): Do not rename exit PHI uses in inner loop. * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Do not do peeling for outer loops. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-outer-simd-2.c: New test. 2015-06-17 19:35 GMT+03:00 Yuri Rumyantsev : > Richard, > > I attached updated patch. > You asked me to explain why I did changes for renaming. > If we do not change PHI arguments for inner loop header we can get the > following IR: > source outer loop: > : outer-loop header > # i_45 = PHI <0(4), i_18(9)> > # .MEM_17 = PHI <.MEM_4(D)(4), .MEM_44(9)> > > :inner-loop header > # .MEM_46 = PHI <.MEM_44(7), .MEM_17(5)> > > after duplication we have (without argument correction): > : > # i_74 = PHI > # .MEM_73 = PHI <.MEM_97(13), .MEM_4(D)(17)> > > : > # .MEM_63 = PHI <.MEM_17(12), .MEM_97(16)> > > and later we get verifier error: > test1.c:20:6: error: definition in block 6 does not dominate use in block 10 > void foo (int n) > ^ > for SSA_NAME: .MEM_17 in statement: > .MEM_63 = PHI <.MEM_17(10), .MEM_97(14)> > > and you can see that we need to rename MEM_17 argument for out-coming > edge to MEM_73 since > MEM_17 was converted to MEM_73 in outer-loop header. > > This explains my simple fix in rename_variables_in_bb. > Note also that loop distribution is not performed for outer loops. > > I also did a change in slpeel_can_duplicate_loop_p to simplify check. > > Any comments will be appreciated. > > Yuri. > > 2015-06-17 15:24 GMT+03:00 Richard Biener : >> On Tue, Jun 16, 2015 at 4:12 PM, Yuri Rumyantsev wrote: >>> Thanks a lot Richard for your review. >>> >>> I presented updated patch which is not gated by force_vectorize. >>> I added test on outer-loop in vect_enhance_data_refs_alignment >>> and it returns false for it because we can not improve dr alighment >>> through outer-loop peeling in general. So I assume that only >>> versioning for alignment can be applied for targets do not support >>> unaligned memory access. >> >> @@ -998,7 +998,12 @@ >> gimple stmt = DR_STMT (dr); >> stmt_vec_info stmt_info = vinfo_for_stmt (stmt); >> tree vectype = STMT_VINFO_VECTYPE (stmt_info); >> + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); >> >> + /* Peeling of outer loops can't improve alignment. */ >> + if (loop_vinfo && LOOP_VINFO_LOOP (loop_vinfo)->inner) >> + return false; >> + >> >> but this looks wrong. It depends on the context (DRs in the outer loop >> can improve alignment by peeling the outer loop and we can still >> peel the inner loop for alignment). >> >> So IMHO the correct place to amend is vect_enhance_data_refs_alignment >> (which it seems doesn't consider peeling the inner loop). >> >> I'd say you should simply add >> >> || loop->inner) >> >> to the >> >> /* Check if we can possibly peel the loop. */ >> if (!vect_can_advance_ivs_p (loop_vinfo) >> || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))) >> do_peeling = false; >> >> check. >> >>> I did not change tests for outer loops in slpeel_can_duplicate_loop_p >>> as you proposed since it is not called outside vectorization. >> >> There is still no reason for this complex condition, so please remove it. >> >> _Please_ also generate diffs with -p, it is very tedious to see patch hunks >> without a function name context. >> >> You didn't explain why you needed the renaming changes as I don't >> remember needing it when using the code from loop distribution. >> >>> I also noticed one not-resolved issue with outer-loop peeling - we don't >>> consider remainder for possible vectorization of inner-loop as we can see >>> on the following example: >>> >>> for (i = 0; i < n; i++) { >>> diff = 0; >>> for (j = 0; j < M; j++) { >>> diff += in[j+i]*coeff[j]; >>> } >>> out[i] = diff; >>> } >>> >>> Is it worth to fix it? >> >> You mean vectorizing the inner loop in the niter peel epilogue loop >> of the outer loop? Possibly yes. >> >> Thanks, >> Richard. >> >>> 2015-06-16 Yuri Rumyantsev >>> >>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument >>> to allow renaming of PHI arguments on edges incoming from outer >>> loop header, add corresponding check before start PHI iterator. >>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool >>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops >>> with true force_vectorize. Set-up dominator for outer loop too. >>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb. >>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it >>> was marked with force_vectorize and has restricted cfg. >>> * tree-vect-data-refs.c (vector_alignment_reachable_p): Alignment can >>> not be reachable for outer loops. >>> (vect_enhance_data_refs_alignment): Add test on true value of >>> do_peeling. >>> >>> gcc/testsuite/ChangeLog: >>> * gcc.dg/vect/vect-outer-simd-2.c: New test. >>> >>> 2015-06-09 16:26 GMT+03:00 Richard Biener : >>>> On Mon, Jun 8, 2015 at 12:27 PM, Yuri Rumyantsev wrote: >>>>> Hi All, >>>>> >>>>> Here is a simple fix which allows duplication of outer loops to >>>>> perform peeling for number of iterations if outer loop is marked with >>>>> pragma omp simd. >>>>> >>>>> Bootstrap and regression testing did not show any new failures. >>>>> Is it OK for trunk? >>>> >>>> Hmm, I don't remember needing to adjust rename_variables_in_bb >>>> when teaching loop distibution to call slpeel_tree_duplicate_to_edge_cfg >>>> on non-innermost loops... (I just copied, I never called >>>> slpeel_can_duplicate_loop_p though). >>>> >>>> So - you should just remove the loop->inner condition from >>>> slpeel_can_duplicate_loop_p as it is used by non-vectorizer >>>> code as well (yeah, I never merged the nested loop support >>>> for loop distribution...). >>>> >>>> Index: tree-vect-loop.c >>>> =================================================================== >>>> --- tree-vect-loop.c (revision 224100) >>>> +++ tree-vect-loop.c (working copy) >>>> @@ -1879,6 +1879,10 @@ >>>> return false; >>>> } >>>> >>>> + /* Peeling for alignment is not supported for outer-loop vectorization. */ >>>> + if (LOOP_VINFO_LOOP (loop_vinfo)->inner) >>>> + LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = 0; >>>> >>>> I think you can't simply do this - if vect_enhance_data_refs_alignment >>>> decided to peel for alignment then it has adjusted the DRs alignment >>>> info already. So instead of the above simply disallow peeling for >>>> alignment in vect_enhance_data_refs_alignment? Thus add >>>> || ->inner to >>>> >>>> /* Check if we can possibly peel the loop. */ >>>> if (!vect_can_advance_ivs_p (loop_vinfo) >>>> || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))) >>>> do_peeling = false; >>>> >>>> ? >>>> >>>> I also can't see why the improvement has to be gated on force_vect, >>>> it surely looks profitable to enable more outer loop vectorization in >>>> general, no? >>>> >>>> How do the cost model calculations end up with peeling the outer loop >>>> for niter? >>>> >>>> On targets which don't support unaligned accesses we're left with >>>> versioning for alignment. Isn't peeling for alignment better there? >>>> Thus only disallow peeling for alignment if there is no unhandled >>>> alignment? >>>> >>>> Thanks, >>>> Richard. >>>> >>>>> ChangeLog: >>>>> >>>>> 2015-06-08 Yuri Rumyantsev >>>>> >>>>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument >>>>> to allow renaming of PHI arguments on edges incoming from outer >>>>> loop header, add corresponding check before start PHI iterator. >>>>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool >>>>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops >>>>> with true force_vectorize. Set-up dominator for outer loop too. >>>>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb. >>>>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it >>>>> was marked with force_vectorize and has restricted cfg. >>>>> * tre-vect-loop.c (vect_analyze_loop_2): Prohibit alignment peeling >>>>> for outer loops. >>>>> >>>>> gcc/testsuite/ChangeLog: >>>>> * gcc.dg/vect/vect-outer-simd-2.c: New test.