From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 15389 invoked by alias); 14 Jul 2015 11:00:17 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 15333 invoked by uid 89); 14 Jul 2015 11:00:13 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-ob0-f177.google.com Received: from mail-ob0-f177.google.com (HELO mail-ob0-f177.google.com) (209.85.214.177) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 14 Jul 2015 11:00:10 +0000 Received: by obbgp5 with SMTP id gp5so3670583obb.0 for ; Tue, 14 Jul 2015 04:00:08 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.202.89.194 with SMTP id n185mr14001850oib.46.1436871608682; Tue, 14 Jul 2015 04:00:08 -0700 (PDT) Received: by 10.76.115.167 with HTTP; Tue, 14 Jul 2015 04:00:08 -0700 (PDT) In-Reply-To: References: Date: Tue, 14 Jul 2015 11:00:00 -0000 Message-ID: Subject: Re: [PATCH] Yet another simple fix to enhance outer-loop vectorization. From: Richard Biener To: Yuri Rumyantsev Cc: gcc-patches , Igor Zamyatin Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2015-07/txt/msg01131.txt.bz2 On Mon, Jun 29, 2015 at 6:15 PM, Yuri Rumyantsev wrote: > Hi All, > > Here is updated patch containing missed change in > slpeel_tree_peel_loop_to_edge which prevents renaming of exit PHI uses > in inner loop. Ok. Thanks, Richard. > ChangeLog: > 2015-06-29 Yuri Rumyantsev > > * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument > to allow renaming of PHI arguments on edges incoming from outer > loop header, add corresponding check before start PHI iterator. > (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool > variable DUPLICATE_OUTER_LOOP and set it to true for outer loops > with true force_vectorize. Set-up dominator for outer loop too. > Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb. > (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it > was marked with force_vectorize and has restricted cfg. > (slpeel_tree_peel_loop_to_edge): Do not rename exit PHI uses in > inner loop. > * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Do not > do peeling for outer loops. > > gcc/testsuite/ChangeLog: > * gcc.dg/vect/vect-outer-simd-2.c: New test. > > 2015-06-17 19:35 GMT+03:00 Yuri Rumyantsev : >> Richard, >> >> I attached updated patch. >> You asked me to explain why I did changes for renaming. >> If we do not change PHI arguments for inner loop header we can get the >> following IR: >> source outer loop: >> : outer-loop header >> # i_45 = PHI <0(4), i_18(9)> >> # .MEM_17 = PHI <.MEM_4(D)(4), .MEM_44(9)> >> >> :inner-loop header >> # .MEM_46 = PHI <.MEM_44(7), .MEM_17(5)> >> >> after duplication we have (without argument correction): >> : >> # i_74 = PHI >> # .MEM_73 = PHI <.MEM_97(13), .MEM_4(D)(17)> >> >> : >> # .MEM_63 = PHI <.MEM_17(12), .MEM_97(16)> >> >> and later we get verifier error: >> test1.c:20:6: error: definition in block 6 does not dominate use in block 10 >> void foo (int n) >> ^ >> for SSA_NAME: .MEM_17 in statement: >> .MEM_63 = PHI <.MEM_17(10), .MEM_97(14)> >> >> and you can see that we need to rename MEM_17 argument for out-coming >> edge to MEM_73 since >> MEM_17 was converted to MEM_73 in outer-loop header. >> >> This explains my simple fix in rename_variables_in_bb. >> Note also that loop distribution is not performed for outer loops. >> >> I also did a change in slpeel_can_duplicate_loop_p to simplify check. >> >> Any comments will be appreciated. >> >> Yuri. >> >> 2015-06-17 15:24 GMT+03:00 Richard Biener : >>> On Tue, Jun 16, 2015 at 4:12 PM, Yuri Rumyantsev wrote: >>>> Thanks a lot Richard for your review. >>>> >>>> I presented updated patch which is not gated by force_vectorize. >>>> I added test on outer-loop in vect_enhance_data_refs_alignment >>>> and it returns false for it because we can not improve dr alighment >>>> through outer-loop peeling in general. So I assume that only >>>> versioning for alignment can be applied for targets do not support >>>> unaligned memory access. >>> >>> @@ -998,7 +998,12 @@ >>> gimple stmt = DR_STMT (dr); >>> stmt_vec_info stmt_info = vinfo_for_stmt (stmt); >>> tree vectype = STMT_VINFO_VECTYPE (stmt_info); >>> + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); >>> >>> + /* Peeling of outer loops can't improve alignment. */ >>> + if (loop_vinfo && LOOP_VINFO_LOOP (loop_vinfo)->inner) >>> + return false; >>> + >>> >>> but this looks wrong. It depends on the context (DRs in the outer loop >>> can improve alignment by peeling the outer loop and we can still >>> peel the inner loop for alignment). >>> >>> So IMHO the correct place to amend is vect_enhance_data_refs_alignment >>> (which it seems doesn't consider peeling the inner loop). >>> >>> I'd say you should simply add >>> >>> || loop->inner) >>> >>> to the >>> >>> /* Check if we can possibly peel the loop. */ >>> if (!vect_can_advance_ivs_p (loop_vinfo) >>> || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))) >>> do_peeling = false; >>> >>> check. >>> >>>> I did not change tests for outer loops in slpeel_can_duplicate_loop_p >>>> as you proposed since it is not called outside vectorization. >>> >>> There is still no reason for this complex condition, so please remove it. >>> >>> _Please_ also generate diffs with -p, it is very tedious to see patch hunks >>> without a function name context. >>> >>> You didn't explain why you needed the renaming changes as I don't >>> remember needing it when using the code from loop distribution. >>> >>>> I also noticed one not-resolved issue with outer-loop peeling - we don't >>>> consider remainder for possible vectorization of inner-loop as we can see >>>> on the following example: >>>> >>>> for (i = 0; i < n; i++) { >>>> diff = 0; >>>> for (j = 0; j < M; j++) { >>>> diff += in[j+i]*coeff[j]; >>>> } >>>> out[i] = diff; >>>> } >>>> >>>> Is it worth to fix it? >>> >>> You mean vectorizing the inner loop in the niter peel epilogue loop >>> of the outer loop? Possibly yes. >>> >>> Thanks, >>> Richard. >>> >>>> 2015-06-16 Yuri Rumyantsev >>>> >>>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument >>>> to allow renaming of PHI arguments on edges incoming from outer >>>> loop header, add corresponding check before start PHI iterator. >>>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool >>>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops >>>> with true force_vectorize. Set-up dominator for outer loop too. >>>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb. >>>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it >>>> was marked with force_vectorize and has restricted cfg. >>>> * tree-vect-data-refs.c (vector_alignment_reachable_p): Alignment can >>>> not be reachable for outer loops. >>>> (vect_enhance_data_refs_alignment): Add test on true value of >>>> do_peeling. >>>> >>>> gcc/testsuite/ChangeLog: >>>> * gcc.dg/vect/vect-outer-simd-2.c: New test. >>>> >>>> 2015-06-09 16:26 GMT+03:00 Richard Biener : >>>>> On Mon, Jun 8, 2015 at 12:27 PM, Yuri Rumyantsev wrote: >>>>>> Hi All, >>>>>> >>>>>> Here is a simple fix which allows duplication of outer loops to >>>>>> perform peeling for number of iterations if outer loop is marked with >>>>>> pragma omp simd. >>>>>> >>>>>> Bootstrap and regression testing did not show any new failures. >>>>>> Is it OK for trunk? >>>>> >>>>> Hmm, I don't remember needing to adjust rename_variables_in_bb >>>>> when teaching loop distibution to call slpeel_tree_duplicate_to_edge_cfg >>>>> on non-innermost loops... (I just copied, I never called >>>>> slpeel_can_duplicate_loop_p though). >>>>> >>>>> So - you should just remove the loop->inner condition from >>>>> slpeel_can_duplicate_loop_p as it is used by non-vectorizer >>>>> code as well (yeah, I never merged the nested loop support >>>>> for loop distribution...). >>>>> >>>>> Index: tree-vect-loop.c >>>>> =================================================================== >>>>> --- tree-vect-loop.c (revision 224100) >>>>> +++ tree-vect-loop.c (working copy) >>>>> @@ -1879,6 +1879,10 @@ >>>>> return false; >>>>> } >>>>> >>>>> + /* Peeling for alignment is not supported for outer-loop vectorization. */ >>>>> + if (LOOP_VINFO_LOOP (loop_vinfo)->inner) >>>>> + LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = 0; >>>>> >>>>> I think you can't simply do this - if vect_enhance_data_refs_alignment >>>>> decided to peel for alignment then it has adjusted the DRs alignment >>>>> info already. So instead of the above simply disallow peeling for >>>>> alignment in vect_enhance_data_refs_alignment? Thus add >>>>> || ->inner to >>>>> >>>>> /* Check if we can possibly peel the loop. */ >>>>> if (!vect_can_advance_ivs_p (loop_vinfo) >>>>> || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))) >>>>> do_peeling = false; >>>>> >>>>> ? >>>>> >>>>> I also can't see why the improvement has to be gated on force_vect, >>>>> it surely looks profitable to enable more outer loop vectorization in >>>>> general, no? >>>>> >>>>> How do the cost model calculations end up with peeling the outer loop >>>>> for niter? >>>>> >>>>> On targets which don't support unaligned accesses we're left with >>>>> versioning for alignment. Isn't peeling for alignment better there? >>>>> Thus only disallow peeling for alignment if there is no unhandled >>>>> alignment? >>>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>>> ChangeLog: >>>>>> >>>>>> 2015-06-08 Yuri Rumyantsev >>>>>> >>>>>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument >>>>>> to allow renaming of PHI arguments on edges incoming from outer >>>>>> loop header, add corresponding check before start PHI iterator. >>>>>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool >>>>>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops >>>>>> with true force_vectorize. Set-up dominator for outer loop too. >>>>>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb. >>>>>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it >>>>>> was marked with force_vectorize and has restricted cfg. >>>>>> * tre-vect-loop.c (vect_analyze_loop_2): Prohibit alignment peeling >>>>>> for outer loops. >>>>>> >>>>>> gcc/testsuite/ChangeLog: >>>>>> * gcc.dg/vect/vect-outer-simd-2.c: New test.