Richard, I attached updated patch. You asked me to explain why I did changes for renaming. If we do not change PHI arguments for inner loop header we can get the following IR: source outer loop: : outer-loop header # i_45 = PHI <0(4), i_18(9)> # .MEM_17 = PHI <.MEM_4(D)(4), .MEM_44(9)> :inner-loop header # .MEM_46 = PHI <.MEM_44(7), .MEM_17(5)> after duplication we have (without argument correction): : # i_74 = PHI # .MEM_73 = PHI <.MEM_97(13), .MEM_4(D)(17)> : # .MEM_63 = PHI <.MEM_17(12), .MEM_97(16)> and later we get verifier error: test1.c:20:6: error: definition in block 6 does not dominate use in block 10 void foo (int n) ^ for SSA_NAME: .MEM_17 in statement: .MEM_63 = PHI <.MEM_17(10), .MEM_97(14)> and you can see that we need to rename MEM_17 argument for out-coming edge to MEM_73 since MEM_17 was converted to MEM_73 in outer-loop header. This explains my simple fix in rename_variables_in_bb. Note also that loop distribution is not performed for outer loops. I also did a change in slpeel_can_duplicate_loop_p to simplify check. Any comments will be appreciated. Yuri. 2015-06-17 15:24 GMT+03:00 Richard Biener : > On Tue, Jun 16, 2015 at 4:12 PM, Yuri Rumyantsev wrote: >> Thanks a lot Richard for your review. >> >> I presented updated patch which is not gated by force_vectorize. >> I added test on outer-loop in vect_enhance_data_refs_alignment >> and it returns false for it because we can not improve dr alighment >> through outer-loop peeling in general. So I assume that only >> versioning for alignment can be applied for targets do not support >> unaligned memory access. > > @@ -998,7 +998,12 @@ > gimple stmt = DR_STMT (dr); > stmt_vec_info stmt_info = vinfo_for_stmt (stmt); > tree vectype = STMT_VINFO_VECTYPE (stmt_info); > + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); > > + /* Peeling of outer loops can't improve alignment. */ > + if (loop_vinfo && LOOP_VINFO_LOOP (loop_vinfo)->inner) > + return false; > + > > but this looks wrong. It depends on the context (DRs in the outer loop > can improve alignment by peeling the outer loop and we can still > peel the inner loop for alignment). > > So IMHO the correct place to amend is vect_enhance_data_refs_alignment > (which it seems doesn't consider peeling the inner loop). > > I'd say you should simply add > > || loop->inner) > > to the > > /* Check if we can possibly peel the loop. */ > if (!vect_can_advance_ivs_p (loop_vinfo) > || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))) > do_peeling = false; > > check. > >> I did not change tests for outer loops in slpeel_can_duplicate_loop_p >> as you proposed since it is not called outside vectorization. > > There is still no reason for this complex condition, so please remove it. > > _Please_ also generate diffs with -p, it is very tedious to see patch hunks > without a function name context. > > You didn't explain why you needed the renaming changes as I don't > remember needing it when using the code from loop distribution. > >> I also noticed one not-resolved issue with outer-loop peeling - we don't >> consider remainder for possible vectorization of inner-loop as we can see >> on the following example: >> >> for (i = 0; i < n; i++) { >> diff = 0; >> for (j = 0; j < M; j++) { >> diff += in[j+i]*coeff[j]; >> } >> out[i] = diff; >> } >> >> Is it worth to fix it? > > You mean vectorizing the inner loop in the niter peel epilogue loop > of the outer loop? Possibly yes. > > Thanks, > Richard. > >> 2015-06-16 Yuri Rumyantsev >> >> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument >> to allow renaming of PHI arguments on edges incoming from outer >> loop header, add corresponding check before start PHI iterator. >> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool >> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops >> with true force_vectorize. Set-up dominator for outer loop too. >> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb. >> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it >> was marked with force_vectorize and has restricted cfg. >> * tree-vect-data-refs.c (vector_alignment_reachable_p): Alignment can >> not be reachable for outer loops. >> (vect_enhance_data_refs_alignment): Add test on true value of >> do_peeling. >> >> gcc/testsuite/ChangeLog: >> * gcc.dg/vect/vect-outer-simd-2.c: New test. >> >> 2015-06-09 16:26 GMT+03:00 Richard Biener : >>> On Mon, Jun 8, 2015 at 12:27 PM, Yuri Rumyantsev wrote: >>>> Hi All, >>>> >>>> Here is a simple fix which allows duplication of outer loops to >>>> perform peeling for number of iterations if outer loop is marked with >>>> pragma omp simd. >>>> >>>> Bootstrap and regression testing did not show any new failures. >>>> Is it OK for trunk? >>> >>> Hmm, I don't remember needing to adjust rename_variables_in_bb >>> when teaching loop distibution to call slpeel_tree_duplicate_to_edge_cfg >>> on non-innermost loops... (I just copied, I never called >>> slpeel_can_duplicate_loop_p though). >>> >>> So - you should just remove the loop->inner condition from >>> slpeel_can_duplicate_loop_p as it is used by non-vectorizer >>> code as well (yeah, I never merged the nested loop support >>> for loop distribution...). >>> >>> Index: tree-vect-loop.c >>> =================================================================== >>> --- tree-vect-loop.c (revision 224100) >>> +++ tree-vect-loop.c (working copy) >>> @@ -1879,6 +1879,10 @@ >>> return false; >>> } >>> >>> + /* Peeling for alignment is not supported for outer-loop vectorization. */ >>> + if (LOOP_VINFO_LOOP (loop_vinfo)->inner) >>> + LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = 0; >>> >>> I think you can't simply do this - if vect_enhance_data_refs_alignment >>> decided to peel for alignment then it has adjusted the DRs alignment >>> info already. So instead of the above simply disallow peeling for >>> alignment in vect_enhance_data_refs_alignment? Thus add >>> || ->inner to >>> >>> /* Check if we can possibly peel the loop. */ >>> if (!vect_can_advance_ivs_p (loop_vinfo) >>> || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))) >>> do_peeling = false; >>> >>> ? >>> >>> I also can't see why the improvement has to be gated on force_vect, >>> it surely looks profitable to enable more outer loop vectorization in >>> general, no? >>> >>> How do the cost model calculations end up with peeling the outer loop >>> for niter? >>> >>> On targets which don't support unaligned accesses we're left with >>> versioning for alignment. Isn't peeling for alignment better there? >>> Thus only disallow peeling for alignment if there is no unhandled >>> alignment? >>> >>> Thanks, >>> Richard. >>> >>>> ChangeLog: >>>> >>>> 2015-06-08 Yuri Rumyantsev >>>> >>>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument >>>> to allow renaming of PHI arguments on edges incoming from outer >>>> loop header, add corresponding check before start PHI iterator. >>>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool >>>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops >>>> with true force_vectorize. Set-up dominator for outer loop too. >>>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb. >>>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it >>>> was marked with force_vectorize and has restricted cfg. >>>> * tre-vect-loop.c (vect_analyze_loop_2): Prohibit alignment peeling >>>> for outer loops. >>>> >>>> gcc/testsuite/ChangeLog: >>>> * gcc.dg/vect/vect-outer-simd-2.c: New test.