From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 6109 invoked by alias); 8 Oct 2015 12:31:44 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 6087 invoked by uid 89); 8 Oct 2015 12:31:42 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-yk0-f181.google.com Received: from mail-yk0-f181.google.com (HELO mail-yk0-f181.google.com) (209.85.160.181) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Thu, 08 Oct 2015 12:31:38 +0000 Received: by ykft14 with SMTP id t14so46885239ykf.0 for ; Thu, 08 Oct 2015 05:31:36 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.129.49.149 with SMTP id x143mr5429791ywx.147.1444307496262; Thu, 08 Oct 2015 05:31:36 -0700 (PDT) Received: by 10.37.93.136 with HTTP; Thu, 8 Oct 2015 05:31:35 -0700 (PDT) In-Reply-To: References: Date: Thu, 08 Oct 2015 12:31:00 -0000 Message-ID: Subject: Re: [PATCH] Unswitching outer loops. From: Richard Biener To: Yuri Rumyantsev Cc: gcc-patches , Igor Zamyatin Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2015-10/txt/msg00823.txt.bz2 On Wed, Oct 7, 2015 at 5:26 PM, Yuri Rumyantsev wrote: > Richard, > > I noticed that 'gimple' type was changed and send you updated patch. Ok. Thanks, Richard. > Thanks. > Yuri. > > 2015-10-07 12:53 GMT+03:00 Yuri Rumyantsev : >> Richard, >> >> I've fixed adding virtual phi argument and add check on irreducible basic block. >> New patch is attached. >> >> I checked it for bootstrap and regression testing, no new failures. >> >> ChangeLog: >> 2015-10-07 Yuri Rumyantsev >> >> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and >> "cfghooks.h", add prototypes for introduced new functions. >> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all >> checks on ability of loop unswitching to tree_unswitch_single_loop; >> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending >> on innermost loop check. >> (tree_unswitch_single_loop): Add all required checks on ability of >> loop unswitching under zero recursive level guard. >> (tree_unswitch_outer_loop): New function. >> (find_loop_guard): Likewise. >> (empty_bb_without_guard_p): Likewise. >> (used_outside_loop_p): Likewise. >> (get_vop_from_header): Likewise. >> (hoist_guard): Likewise. >> (check_exit_phi): Likewise. >> >> gcc/testsuite/ChangeLog: >> * gcc.dg/loop-unswitch-2.c: New test. >> * gcc.dg/loop-unswitch-3.c: Likewise. >> * gcc.dg/loop-unswitch-4.c: Likewise. >> >> >> 2015-10-06 15:21 GMT+03:00 Richard Biener : >>> On Tue, Oct 6, 2015 at 1:41 PM, Yuri Rumyantsev wrote: >>>> Richard, >>>> >>>> Here is updated patch which reflects almost all your remarks: >>>> 1. Use ordinary get_loop_body. >>>> 2. Delete useless asserts. >>>> 3. Use check on iterated loop instead of finite_loop_p. >>>> 4. Do not update CFG by adjusting the CONDs condition to always true/false. >>>> 5. Add couple tests. >>> >>> + /* Add NEW_ADGE argument for all phi in post-header block. */ >>> + bb = exit->dest; >>> + for (gphi_iterator gsi = gsi_start_phis (bb); >>> + !gsi_end_p (gsi); gsi_next (&gsi)) >>> + { >>> + gphi *phi = gsi.phi (); >>> + /* edge_iterator ei; */ >>> + tree arg; >>> + if (virtual_operand_p (gimple_phi_result (phi))) >>> + { >>> + arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop)); >>> + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION); >>> >>> now I know what confused me - here you are looking at a loop exit PHI >>> but querying with the preheader edge index. I think you need to walk >>> the loop header PHIs to find the PHI for the virtual operand and use that >>> to get the PHI arg from? >>> >>> The side-effect / used-outside code is still the same. What matters >>> is side-effects outside of the loop-header protected code region, not >>> blocks excluding the inner loop. Say, >>> >>> for (;;) >>> { >>> if (invariant-guard) >>> { >>> printf ("Blah"); >>> for (;;) >>> ; >>> } >>> } >>> >>> would still ok to be unswitched. So instead of >>> >>> + if (body[i]->loop_father != loop) >>> + continue; >>> >>> it would be >>> >>> if (dominated_by_p (CDI_DOMINATORS, body[i], header) >>> && !dominated_by_p (CDI_DOMINATORS, body[i], fe->dest)) >>> >>> with the obvious improvement to the patch to not only consider header checks >>> in the outer loop header but in the pre-header block of the inner loop. >>> >>> And I still think you should walk the exit PHIs args to see whether they >>> are defined in the non-guarded region of the outer loop instead of walking >>> all uses of all defs. >>> >>> Note that I think you miss endless loops as side-effects if that endless >>> loop occurs through a irreducible region (thus not reflected in the >>> loop tree). Thus you should reject BB_IRREDUCIBLE_LOOP blocks >>> in the non-guarded region as well. >>> >>> It seems to me that protecting adjacent loops with a single guard is >>> also eligible for hoisting thus the restriction on loop->inner->next >>> should become a restriction on no loops (or irreducible regions) >>> in the non-guarded region. >>> >>> Most things can be improved as followup, but at least the >>> virtual PHI arg thing needs to be sorted out. >>> >>> Thanks, >>> Richard. >>> >>> >>>> ChangeLog: >>>> 2015-10-06 Yuri Rumyantsev >>>> >>>> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and >>>> "cfghooks.h", add prototypes for introduced new functions. >>>> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all >>>> checks on ability of loop unswitching to tree_unswitch_single_loop; >>>> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending >>>> on innermost loop check. >>>> (tree_unswitch_single_loop): Add all required checks on ability of >>>> loop unswitching under zero recursive level guard. >>>> (tree_unswitch_outer_loop): New function. >>>> (find_loop_guard): Likewise. >>>> (empty_bb_without_guard_p): Likewise. >>>> (used_outside_loop_p): Likewise. >>>> (hoist_guard): Likewise. >>>> (check_exit_phi): Likewise. >>>> >>>> gcc/testsuite/ChangeLog: >>>> * gcc.dg/loop-unswitch-2.c: New test. >>>> * gcc.dg/loop-unswitch-3.c: Likewise. >>>> * gcc.dg/loop-unswitch-4.c: Likewise. >>>> >>>> 2015-10-06 10:59 GMT+03:00 Richard Biener : >>>>> On Mon, Oct 5, 2015 at 3:13 PM, Yuri Rumyantsev wrote: >>>>>> Thanks Richard. >>>>>> I'd like to answer on your last comment related to using of exit edge >>>>>> argument for edge that skips loop. >>>>>> Let's consider the following test-case: >>>>>> >>>>>> #include >>>>>> #define N 32 >>>>>> float *foo(int ustride, int size, float *src) >>>>>> { >>>>>> float *buffer, *p; >>>>>> int i, k; >>>>>> >>>>>> if (!src) >>>>>> return NULL; >>>>>> >>>>>> buffer = (float *) malloc(N * size * sizeof(float)); >>>>>> >>>>>> if(buffer) >>>>>> for(i=0, p=buffer; i>>>>> for(k=0; k>>>>> *p++ = src[k]; >>>>>> >>>>>> return buffer; >>>>>> } >>>>>> >>>>>> Before adding new edge we have in post-header bb: >>>>>> : >>>>>> # _6 = PHI <0B(8), buffer_20(16)> >>>>>> return _6; >>>>>> >>>>>> It is clear that we must preserve function semantic and transform it to >>>>>> _6 = PHI <0B(12), buffer_19(9), buffer_19(4)> >>>>> >>>>> Ah, yeah. I was confusing the loop exit of the inner vs. the outer loop. >>>>> >>>>> Richard. >>>>> >>>>>> >>>>>> 2015-10-05 13:57 GMT+03:00 Richard Biener : >>>>>>> On Wed, Sep 30, 2015 at 12:46 PM, Yuri Rumyantsev wrote: >>>>>>>> Hi Richard, >>>>>>>> >>>>>>>> I re-designed outer loop unswitching using basic idea of 23855 patch - >>>>>>>> hoist invariant guard if loop is empty without guard. Note that this >>>>>>>> was added to loop unswitching pass with simple modifications - using >>>>>>>> another loop iterator etc. >>>>>>>> >>>>>>>> Bootstrap and regression testing did not show any new failures. >>>>>>>> What is your opinion? >>>>>>> >>>>>>> Overall it looks good. Some comments below - a few more testcases would >>>>>>> be nice as well. >>>>>>> >>>>>>> + /* Loop must not be infinite. */ >>>>>>> + if (!finite_loop_p (loop)) >>>>>>> + return false; >>>>>>> >>>>>>> why's that? >>>>>>> >>>>>>> + body = get_loop_body_in_dom_order (loop); >>>>>>> + for (i = 0; i < loop->num_nodes; i++) >>>>>>> + { >>>>>>> + if (body[i]->loop_father != loop) >>>>>>> + continue; >>>>>>> + if (!empty_bb_without_guard_p (loop, body[i])) >>>>>>> >>>>>>> I wonder if there is a better way to iterate over the interesting >>>>>>> blocks and PHIs >>>>>>> we need to check for side-effects (and thus we maybe can avoid gathering >>>>>>> the loop in DOM order). >>>>>>> >>>>>>> + FOR_EACH_SSA_TREE_OPERAND (name, stmt, op_iter, SSA_OP_DEF) >>>>>>> + { >>>>>>> + if (may_be_used_outside >>>>>>> >>>>>>> may_be_used_outside can be hoisted above the loop. I wonder if we can take >>>>>>> advantage of loop-closed SSA form here (and the fact we have a single exit >>>>>>> from the loop). Iterating over exit dest PHIs and determining whether the >>>>>>> exit edge DEF is inside the loop part it may not be should be enough. >>>>>>> >>>>>>> + gcc_assert (single_succ_p (pre_header)); >>>>>>> >>>>>>> that should be always true. >>>>>>> >>>>>>> + gsi_remove (&gsi, false); >>>>>>> + bb = guard->dest; >>>>>>> + remove_edge (guard); >>>>>>> + /* Update dominance for destination of GUARD. */ >>>>>>> + if (EDGE_COUNT (bb->preds) == 0) >>>>>>> + { >>>>>>> + basic_block s_bb; >>>>>>> + gcc_assert (single_succ_p (bb)); >>>>>>> + s_bb = single_succ (bb); >>>>>>> + delete_basic_block (bb); >>>>>>> + if (single_pred_p (s_bb)) >>>>>>> + set_immediate_dominator (CDI_DOMINATORS, s_bb, single_pred (s_bb)); >>>>>>> >>>>>>> all this massaging should be simplified by leaving it to CFG cleanup by >>>>>>> simply adjusting the CONDs condition to always true/false. There is >>>>>>> gimple_cond_make_{true,false} () for this (would be nice to have a variant >>>>>>> taking a bool). >>>>>>> >>>>>>> + new_edge = make_edge (pre_header, exit->dest, flags); >>>>>>> + if (fix_dom_of_exit) >>>>>>> + set_immediate_dominator (CDI_DOMINATORS, exit->dest, pre_header); >>>>>>> + update_stmt (gsi_stmt (gsi)); >>>>>>> >>>>>>> the update_stmt should be not necessary, it's done by gsi_insert_after already. >>>>>>> >>>>>>> + /* Add NEW_ADGE argument for all phi in post-header block. */ >>>>>>> + bb = exit->dest; >>>>>>> + for (gphi_iterator gsi = gsi_start_phis (bb); >>>>>>> + !gsi_end_p (gsi); gsi_next (&gsi)) >>>>>>> + { >>>>>>> + gphi *phi = gsi.phi (); >>>>>>> + /* edge_iterator ei; */ >>>>>>> + tree arg; >>>>>>> + if (virtual_operand_p (gimple_phi_result (phi))) >>>>>>> + { >>>>>>> + arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop)); >>>>>>> + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION); >>>>>>> + } >>>>>>> + else >>>>>>> + { >>>>>>> + /* Use exit edge argument. */ >>>>>>> + arg = PHI_ARG_DEF_FROM_EDGE (phi, exit); >>>>>>> + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION); >>>>>>> >>>>>>> Hum. How is it ok to use the exit edge argument for the edge that skips >>>>>>> the loop? Why can't you always use the pre-header edge value? >>>>>>> That is, if we have >>>>>>> >>>>>>> for(i=0;i>>>>>> { >>>>>>> if (n > 0) >>>>>>> { >>>>>>> for (;;) >>>>>>> { >>>>>>> } >>>>>>> } >>>>>>> } >>>>>>> ... = i; >>>>>>> >>>>>>> then i is used after the loop and the correct value to use if >>>>>>> n > 0 is false is '0'. Maybe this way we can also relax >>>>>>> what check_exit_phi does? IMHO the only restriction is >>>>>>> if sth defined inside the loop before the header check for >>>>>>> the inner loop is used after the loop. >>>>>>> >>>>>>> Thanks, >>>>>>> Richard. >>>>>>> >>>>>>>> Thanks. >>>>>>>> >>>>>>>> ChangeLog: >>>>>>>> 2015-09-30 Yuri Rumyantsev >>>>>>>> >>>>>>>> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and >>>>>>>> "cfghooks.h", add prototypes for introduced new functions. >>>>>>>> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all >>>>>>>> checks on ability of loop unswitching to tree_unswitch_single_loop; >>>>>>>> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending >>>>>>>> on innermost loop check. >>>>>>>> (tree_unswitch_single_loop): Add all required checks on ability of >>>>>>>> loop unswitching under zero recursive level guard. >>>>>>>> (tree_unswitch_outer_loop): New function. >>>>>>>> (find_loop_guard): Likewise. >>>>>>>> (empty_bb_without_guard_p): Likewise. >>>>>>>> (used_outside_loop_p): Likewise. >>>>>>>> (hoist_guard): Likewise. >>>>>>>> (check_exit_phi): Likewise. >>>>>>>> >>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>> * gcc.dg/loop-unswitch-2.c: New test. >>>>>>>> >>>>>>>> 2015-09-16 11:26 GMT+03:00 Richard Biener : >>>>>>>>> Yeah, as said, the patch wasn't fully ready and it also felt odd to do >>>>>>>>> this hoisting in loop header copying. Integrating it >>>>>>>>> with LIM would be a better fit eventually. >>>>>>>>> >>>>>>>>> Note that we did agree to go forward with your original patch just >>>>>>>>> making it more "generically" perform outer loop >>>>>>>>> unswitching. Did you explore that idea further? >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Sep 15, 2015 at 6:00 PM, Yuri Rumyantsev wrote: >>>>>>>>>> Thanks Richard. >>>>>>>>>> >>>>>>>>>> I found one more issue that could not be fixed simply. In 23855 you >>>>>>>>>> consider the following test-case: >>>>>>>>>> void foo(int *ie, int *je, double *x) >>>>>>>>>> { >>>>>>>>>> int i, j; >>>>>>>>>> for (j=0; j<*je; ++j) >>>>>>>>>> for (i=0; i<*ie; ++i) >>>>>>>>>> x[i+j] = 0.0; >>>>>>>>>> } >>>>>>>>>> and proposed to hoist up a check on *ie out of loop. It requires >>>>>>>>>> memref alias analysis since in general x and ie can alias (if their >>>>>>>>>> types are compatible - int *ie & int * x). Such analysis is performed >>>>>>>>>> by pre or lim passes. Without such analysis we can not hoist a test on >>>>>>>>>> non-zero for *ie out of loop using 238565 patch. >>>>>>>>>> The second concern is that proposed copy header algorithm changes >>>>>>>>>> loop structure significantly and it is not accepted by vectorizer >>>>>>>>>> since latch is not empty (such transformation assumes loop peeling for >>>>>>>>>> one iteration. So I can propose to implement simple guard hoisting >>>>>>>>>> without copying header and tail blocks (if it is possible). >>>>>>>>>> >>>>>>>>>> I will appreciate you for any advice or help since without such >>>>>>>>>> hoisting we are not able to perform outer loop vectorization for >>>>>>>>>> important benchmark. >>>>>>>>>> and >>>>>>>>>> >>>>>>>>>> 2015-09-15 14:22 GMT+03:00 Richard Biener : >>>>>>>>>>> On Thu, Sep 3, 2015 at 6:32 PM, Yuri Rumyantsev wrote: >>>>>>>>>>>> Hi Richard, >>>>>>>>>>>> >>>>>>>>>>>> I started learning, tuning and debugging patch proposed in 23855 and >>>>>>>>>>>> discovered thta it does not work properly. >>>>>>>>>>>> So I wonder is it tested patch and it should work? >>>>>>>>>>> >>>>>>>>>>> I don't remember, but as it wasn't committed it certainly wasn't ready. >>>>>>>>>>> >>>>>>>>>>>> Should it accept for hoisting the following loop nest >>>>>>>>>>>> for (i=0; i>>>>>>>>>>> s = 0; >>>>>>>>>>>> for (j=0; j>>>>>>>>>>> s += a[i] * b[j]; >>>>>>>>>>>> c[i] = s; >>>>>>>>>>>> } >>>>>>>>>>>> Note that i-loop will nit be empty if m is equal to 0. >>>>>>>>>>> >>>>>>>>>>> if m is equal to 0 then we still have the c[i] = s store, no? Of course >>>>>>>>>>> we could unswitch the outer loop on m == 0 but simple hoisting wouldn't work. >>>>>>>>>>> >>>>>>>>>>> Richard. >>>>>>>>>>> >>>>>>>>>>>> 2015-08-03 10:27 GMT+03:00 Richard Biener : >>>>>>>>>>>>> On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev wrote: >>>>>>>>>>>>>> Hi Richard, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I learned your updated patch for 23825 and it is more general in >>>>>>>>>>>>>> comparison with my. >>>>>>>>>>>>>> I'd like to propose you a compromise - let's consider my patch only >>>>>>>>>>>>>> for force-vectorize outer loop only to allow outer-loop >>>>>>>>>>>>>> vecctorization. >>>>>>>>>>>>> >>>>>>>>>>>>> I don't see why we should special-case that if the approach in 23825 >>>>>>>>>>>>> is sensible. >>>>>>>>>>>>> >>>>>>>>>>>>>> Note that your approach will not hoist invariant >>>>>>>>>>>>>> guards if loops contains something else except for inner-loop, i.e. it >>>>>>>>>>>>>> won't be empty for taken branch. >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, it does not perform unswitching but guard hoisting. Note that this >>>>>>>>>>>>> is originally Zdenek Dvoraks patch. >>>>>>>>>>>>> >>>>>>>>>>>>>> I also would like to answer on your last question - CFG cleanup is >>>>>>>>>>>>>> invoked to perform deletion of single-argument phi nodes from tail >>>>>>>>>>>>>> block through substitution - such phi's prevent outer-loop >>>>>>>>>>>>>> vectorization. But it is clear that such transformation can be done >>>>>>>>>>>>>> other pass. >>>>>>>>>>>>> >>>>>>>>>>>>> Hmm, I wonder why the copy_prop pass after unswitching does not >>>>>>>>>>>>> get rid of them? >>>>>>>>>>>>> >>>>>>>>>>>>>> What is your opinion? >>>>>>>>>>>>> >>>>>>>>>>>>> My opinion is that if we want to enhance unswitching to catch this >>>>>>>>>>>>> (or similar) cases then we should make it a lot more general than >>>>>>>>>>>>> your pattern-matching approach. I see nothing that should prevent >>>>>>>>>>>>> us from considering unswitching non-innermost loops in general. >>>>>>>>>>>>> It should be only a cost consideration to not do non-innermost loop >>>>>>>>>>>>> unswitching (in addition to maybe a --param specifying the maximum >>>>>>>>>>>>> depth of a loop nest to unswitch). >>>>>>>>>>>>> >>>>>>>>>>>>> So my first thought when seeing your patch still holds - the patch >>>>>>>>>>>>> looks very much too specific. >>>>>>>>>>>>> >>>>>>>>>>>>> Richard. >>>>>>>>>>>>> >>>>>>>>>>>>>> Yuri. >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2015-07-28 13:50 GMT+03:00 Richard Biener : >>>>>>>>>>>>>>> On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev wrote: >>>>>>>>>>>>>>>> Hi Richard, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I checked that both test-cases from 23855 are sucessfully unswitched >>>>>>>>>>>>>>>> by proposed patch. I understand that it does not catch deeper loop >>>>>>>>>>>>>>>> nest as >>>>>>>>>>>>>>>> for (i=0; i<10; i++) >>>>>>>>>>>>>>>> for (j=0;j>>>>>>>>>>>>>>> for (k=0;k<20;k++) >>>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>>> but duplication of middle-loop does not look reasonable. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Here is dump for your second test-case: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> void foo(int *ie, int *je, double *x) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> int i, j; >>>>>>>>>>>>>>>> for (j=0; j<*je; ++j) >>>>>>>>>>>>>>>> for (i=0; i<*ie; ++i) >>>>>>>>>>>>>>>> x[i+j] = 0.0; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> grep -i unswitch t6.c.119t.unswitch >>>>>>>>>>>>>>>> ;; Unswitching outer loop >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I was saying that why go with a limited approach when a patch (in >>>>>>>>>>>>>>> unknown state...) >>>>>>>>>>>>>>> is available that does it more generally? Also unswitching is quite >>>>>>>>>>>>>>> expensive compared >>>>>>>>>>>>>>> to "moving" the invariant condition. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In your patch: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> + if (!nloop->force_vectorize) >>>>>>>>>>>>>>> + nloop->force_vectorize = true; >>>>>>>>>>>>>>> + if (loop->safelen != 0) >>>>>>>>>>>>>>> + nloop->safelen = loop->safelen; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I see no guard on force_vectorize so = true looks bogus here. Please just use >>>>>>>>>>>>>>> copy_loop_info. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> + if (integer_nonzerop (cond_new)) >>>>>>>>>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node); >>>>>>>>>>>>>>> + else if (integer_zerop (cond_new)) >>>>>>>>>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_false_node); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> gimple_cond_make_true/false (cond_stmt); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> btw, seems odd that we have to recompute which loop is the true / false variant >>>>>>>>>>>>>>> when we just fed a guard condition to loop_version. Can't we statically >>>>>>>>>>>>>>> determine whether loop or nloop has the in-loop condition true or false? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> + /* Clean-up cfg to remove useless one-argument phi in exit block of >>>>>>>>>>>>>>> + outer-loop. */ >>>>>>>>>>>>>>> + cleanup_tree_cfg (); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I know unswitching is already O(number-of-unswitched-loops * size-of-function) >>>>>>>>>>>>>>> because it updates SSA form after each individual unswitching (and it does that >>>>>>>>>>>>>>> because it invokes itself recursively on unswitched loops). But do you really >>>>>>>>>>>>>>> need to invoke CFG cleanup here? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yuri. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2015-07-14 14:06 GMT+03:00 Richard Biener : >>>>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev wrote: >>>>>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Here is presented simple transformation which tries to hoist out of >>>>>>>>>>>>>>>>>> outer-loop a check on zero trip count for inner-loop. This is very >>>>>>>>>>>>>>>>>> restricted transformation since it accepts outer-loops with very >>>>>>>>>>>>>>>>>> simple cfg, as for example: >>>>>>>>>>>>>>>>>> acc = 0; >>>>>>>>>>>>>>>>>> for (i = 1; i <= m; i++) { >>>>>>>>>>>>>>>>>> for (j = 0; j < n; j++) >>>>>>>>>>>>>>>>>> if (l[j] == i) { v[j] = acc; acc++; }; >>>>>>>>>>>>>>>>>> acc <<= 1; >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Note that degenerative outer loop (without inner loop) will be >>>>>>>>>>>>>>>>>> completely deleted as dead code. >>>>>>>>>>>>>>>>>> The main goal of this transformation was to convert outer-loop to form >>>>>>>>>>>>>>>>>> accepted by outer-loop vectorization (such test-case is also included >>>>>>>>>>>>>>>>>> to patch). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Is it OK for trunk? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I think this is >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> as well. It has a patch adding a invariant loop guard hoisting >>>>>>>>>>>>>>>>> phase to loop-header copying. Yeah, it needs updating to >>>>>>>>>>>>>>>>> trunk again I suppose. It's always non-stage1 when I come >>>>>>>>>>>>>>>>> back to that patch. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Your patch seems to be very specific and only handles outer >>>>>>>>>>>>>>>>> loops of innermost loops. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ChangeLog: >>>>>>>>>>>>>>>>>> 2015-07-10 Yuri Rumyantsev >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and >>>>>>>>>>>>>>>>>> "gimple-iterator.h", add prototype for tree_unswitch_outer_loop. >>>>>>>>>>>>>>>>>> (tree_ssa_unswitch_loops): Add invoke of tree_unswitch_outer_loop. >>>>>>>>>>>>>>>>>> (tree_unswitch_outer_loop): New function. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>>>>>>>>>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test. >>>>>>>>>>>>>>>>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.