Richard, I noticed that 'gimple' type was changed and send you updated patch. Thanks. Yuri. 2015-10-07 12:53 GMT+03:00 Yuri Rumyantsev : > Richard, > > I've fixed adding virtual phi argument and add check on irreducible basic block. > New patch is attached. > > I checked it for bootstrap and regression testing, no new failures. > > ChangeLog: > 2015-10-07 Yuri Rumyantsev > > * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and > "cfghooks.h", add prototypes for introduced new functions. > (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all > checks on ability of loop unswitching to tree_unswitch_single_loop; > invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending > on innermost loop check. > (tree_unswitch_single_loop): Add all required checks on ability of > loop unswitching under zero recursive level guard. > (tree_unswitch_outer_loop): New function. > (find_loop_guard): Likewise. > (empty_bb_without_guard_p): Likewise. > (used_outside_loop_p): Likewise. > (get_vop_from_header): Likewise. > (hoist_guard): Likewise. > (check_exit_phi): Likewise. > > gcc/testsuite/ChangeLog: > * gcc.dg/loop-unswitch-2.c: New test. > * gcc.dg/loop-unswitch-3.c: Likewise. > * gcc.dg/loop-unswitch-4.c: Likewise. > > > 2015-10-06 15:21 GMT+03:00 Richard Biener : >> On Tue, Oct 6, 2015 at 1:41 PM, Yuri Rumyantsev wrote: >>> Richard, >>> >>> Here is updated patch which reflects almost all your remarks: >>> 1. Use ordinary get_loop_body. >>> 2. Delete useless asserts. >>> 3. Use check on iterated loop instead of finite_loop_p. >>> 4. Do not update CFG by adjusting the CONDs condition to always true/false. >>> 5. Add couple tests. >> >> + /* Add NEW_ADGE argument for all phi in post-header block. */ >> + bb = exit->dest; >> + for (gphi_iterator gsi = gsi_start_phis (bb); >> + !gsi_end_p (gsi); gsi_next (&gsi)) >> + { >> + gphi *phi = gsi.phi (); >> + /* edge_iterator ei; */ >> + tree arg; >> + if (virtual_operand_p (gimple_phi_result (phi))) >> + { >> + arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop)); >> + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION); >> >> now I know what confused me - here you are looking at a loop exit PHI >> but querying with the preheader edge index. I think you need to walk >> the loop header PHIs to find the PHI for the virtual operand and use that >> to get the PHI arg from? >> >> The side-effect / used-outside code is still the same. What matters >> is side-effects outside of the loop-header protected code region, not >> blocks excluding the inner loop. Say, >> >> for (;;) >> { >> if (invariant-guard) >> { >> printf ("Blah"); >> for (;;) >> ; >> } >> } >> >> would still ok to be unswitched. So instead of >> >> + if (body[i]->loop_father != loop) >> + continue; >> >> it would be >> >> if (dominated_by_p (CDI_DOMINATORS, body[i], header) >> && !dominated_by_p (CDI_DOMINATORS, body[i], fe->dest)) >> >> with the obvious improvement to the patch to not only consider header checks >> in the outer loop header but in the pre-header block of the inner loop. >> >> And I still think you should walk the exit PHIs args to see whether they >> are defined in the non-guarded region of the outer loop instead of walking >> all uses of all defs. >> >> Note that I think you miss endless loops as side-effects if that endless >> loop occurs through a irreducible region (thus not reflected in the >> loop tree). Thus you should reject BB_IRREDUCIBLE_LOOP blocks >> in the non-guarded region as well. >> >> It seems to me that protecting adjacent loops with a single guard is >> also eligible for hoisting thus the restriction on loop->inner->next >> should become a restriction on no loops (or irreducible regions) >> in the non-guarded region. >> >> Most things can be improved as followup, but at least the >> virtual PHI arg thing needs to be sorted out. >> >> Thanks, >> Richard. >> >> >>> ChangeLog: >>> 2015-10-06 Yuri Rumyantsev >>> >>> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and >>> "cfghooks.h", add prototypes for introduced new functions. >>> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all >>> checks on ability of loop unswitching to tree_unswitch_single_loop; >>> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending >>> on innermost loop check. >>> (tree_unswitch_single_loop): Add all required checks on ability of >>> loop unswitching under zero recursive level guard. >>> (tree_unswitch_outer_loop): New function. >>> (find_loop_guard): Likewise. >>> (empty_bb_without_guard_p): Likewise. >>> (used_outside_loop_p): Likewise. >>> (hoist_guard): Likewise. >>> (check_exit_phi): Likewise. >>> >>> gcc/testsuite/ChangeLog: >>> * gcc.dg/loop-unswitch-2.c: New test. >>> * gcc.dg/loop-unswitch-3.c: Likewise. >>> * gcc.dg/loop-unswitch-4.c: Likewise. >>> >>> 2015-10-06 10:59 GMT+03:00 Richard Biener : >>>> On Mon, Oct 5, 2015 at 3:13 PM, Yuri Rumyantsev wrote: >>>>> Thanks Richard. >>>>> I'd like to answer on your last comment related to using of exit edge >>>>> argument for edge that skips loop. >>>>> Let's consider the following test-case: >>>>> >>>>> #include >>>>> #define N 32 >>>>> float *foo(int ustride, int size, float *src) >>>>> { >>>>> float *buffer, *p; >>>>> int i, k; >>>>> >>>>> if (!src) >>>>> return NULL; >>>>> >>>>> buffer = (float *) malloc(N * size * sizeof(float)); >>>>> >>>>> if(buffer) >>>>> for(i=0, p=buffer; i>>>> for(k=0; k>>>> *p++ = src[k]; >>>>> >>>>> return buffer; >>>>> } >>>>> >>>>> Before adding new edge we have in post-header bb: >>>>> : >>>>> # _6 = PHI <0B(8), buffer_20(16)> >>>>> return _6; >>>>> >>>>> It is clear that we must preserve function semantic and transform it to >>>>> _6 = PHI <0B(12), buffer_19(9), buffer_19(4)> >>>> >>>> Ah, yeah. I was confusing the loop exit of the inner vs. the outer loop. >>>> >>>> Richard. >>>> >>>>> >>>>> 2015-10-05 13:57 GMT+03:00 Richard Biener : >>>>>> On Wed, Sep 30, 2015 at 12:46 PM, Yuri Rumyantsev wrote: >>>>>>> Hi Richard, >>>>>>> >>>>>>> I re-designed outer loop unswitching using basic idea of 23855 patch - >>>>>>> hoist invariant guard if loop is empty without guard. Note that this >>>>>>> was added to loop unswitching pass with simple modifications - using >>>>>>> another loop iterator etc. >>>>>>> >>>>>>> Bootstrap and regression testing did not show any new failures. >>>>>>> What is your opinion? >>>>>> >>>>>> Overall it looks good. Some comments below - a few more testcases would >>>>>> be nice as well. >>>>>> >>>>>> + /* Loop must not be infinite. */ >>>>>> + if (!finite_loop_p (loop)) >>>>>> + return false; >>>>>> >>>>>> why's that? >>>>>> >>>>>> + body = get_loop_body_in_dom_order (loop); >>>>>> + for (i = 0; i < loop->num_nodes; i++) >>>>>> + { >>>>>> + if (body[i]->loop_father != loop) >>>>>> + continue; >>>>>> + if (!empty_bb_without_guard_p (loop, body[i])) >>>>>> >>>>>> I wonder if there is a better way to iterate over the interesting >>>>>> blocks and PHIs >>>>>> we need to check for side-effects (and thus we maybe can avoid gathering >>>>>> the loop in DOM order). >>>>>> >>>>>> + FOR_EACH_SSA_TREE_OPERAND (name, stmt, op_iter, SSA_OP_DEF) >>>>>> + { >>>>>> + if (may_be_used_outside >>>>>> >>>>>> may_be_used_outside can be hoisted above the loop. I wonder if we can take >>>>>> advantage of loop-closed SSA form here (and the fact we have a single exit >>>>>> from the loop). Iterating over exit dest PHIs and determining whether the >>>>>> exit edge DEF is inside the loop part it may not be should be enough. >>>>>> >>>>>> + gcc_assert (single_succ_p (pre_header)); >>>>>> >>>>>> that should be always true. >>>>>> >>>>>> + gsi_remove (&gsi, false); >>>>>> + bb = guard->dest; >>>>>> + remove_edge (guard); >>>>>> + /* Update dominance for destination of GUARD. */ >>>>>> + if (EDGE_COUNT (bb->preds) == 0) >>>>>> + { >>>>>> + basic_block s_bb; >>>>>> + gcc_assert (single_succ_p (bb)); >>>>>> + s_bb = single_succ (bb); >>>>>> + delete_basic_block (bb); >>>>>> + if (single_pred_p (s_bb)) >>>>>> + set_immediate_dominator (CDI_DOMINATORS, s_bb, single_pred (s_bb)); >>>>>> >>>>>> all this massaging should be simplified by leaving it to CFG cleanup by >>>>>> simply adjusting the CONDs condition to always true/false. There is >>>>>> gimple_cond_make_{true,false} () for this (would be nice to have a variant >>>>>> taking a bool). >>>>>> >>>>>> + new_edge = make_edge (pre_header, exit->dest, flags); >>>>>> + if (fix_dom_of_exit) >>>>>> + set_immediate_dominator (CDI_DOMINATORS, exit->dest, pre_header); >>>>>> + update_stmt (gsi_stmt (gsi)); >>>>>> >>>>>> the update_stmt should be not necessary, it's done by gsi_insert_after already. >>>>>> >>>>>> + /* Add NEW_ADGE argument for all phi in post-header block. */ >>>>>> + bb = exit->dest; >>>>>> + for (gphi_iterator gsi = gsi_start_phis (bb); >>>>>> + !gsi_end_p (gsi); gsi_next (&gsi)) >>>>>> + { >>>>>> + gphi *phi = gsi.phi (); >>>>>> + /* edge_iterator ei; */ >>>>>> + tree arg; >>>>>> + if (virtual_operand_p (gimple_phi_result (phi))) >>>>>> + { >>>>>> + arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop)); >>>>>> + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION); >>>>>> + } >>>>>> + else >>>>>> + { >>>>>> + /* Use exit edge argument. */ >>>>>> + arg = PHI_ARG_DEF_FROM_EDGE (phi, exit); >>>>>> + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION); >>>>>> >>>>>> Hum. How is it ok to use the exit edge argument for the edge that skips >>>>>> the loop? Why can't you always use the pre-header edge value? >>>>>> That is, if we have >>>>>> >>>>>> for(i=0;i>>>>> { >>>>>> if (n > 0) >>>>>> { >>>>>> for (;;) >>>>>> { >>>>>> } >>>>>> } >>>>>> } >>>>>> ... = i; >>>>>> >>>>>> then i is used after the loop and the correct value to use if >>>>>> n > 0 is false is '0'. Maybe this way we can also relax >>>>>> what check_exit_phi does? IMHO the only restriction is >>>>>> if sth defined inside the loop before the header check for >>>>>> the inner loop is used after the loop. >>>>>> >>>>>> Thanks, >>>>>> Richard. >>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> ChangeLog: >>>>>>> 2015-09-30 Yuri Rumyantsev >>>>>>> >>>>>>> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and >>>>>>> "cfghooks.h", add prototypes for introduced new functions. >>>>>>> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all >>>>>>> checks on ability of loop unswitching to tree_unswitch_single_loop; >>>>>>> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending >>>>>>> on innermost loop check. >>>>>>> (tree_unswitch_single_loop): Add all required checks on ability of >>>>>>> loop unswitching under zero recursive level guard. >>>>>>> (tree_unswitch_outer_loop): New function. >>>>>>> (find_loop_guard): Likewise. >>>>>>> (empty_bb_without_guard_p): Likewise. >>>>>>> (used_outside_loop_p): Likewise. >>>>>>> (hoist_guard): Likewise. >>>>>>> (check_exit_phi): Likewise. >>>>>>> >>>>>>> gcc/testsuite/ChangeLog: >>>>>>> * gcc.dg/loop-unswitch-2.c: New test. >>>>>>> >>>>>>> 2015-09-16 11:26 GMT+03:00 Richard Biener : >>>>>>>> Yeah, as said, the patch wasn't fully ready and it also felt odd to do >>>>>>>> this hoisting in loop header copying. Integrating it >>>>>>>> with LIM would be a better fit eventually. >>>>>>>> >>>>>>>> Note that we did agree to go forward with your original patch just >>>>>>>> making it more "generically" perform outer loop >>>>>>>> unswitching. Did you explore that idea further? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Sep 15, 2015 at 6:00 PM, Yuri Rumyantsev wrote: >>>>>>>>> Thanks Richard. >>>>>>>>> >>>>>>>>> I found one more issue that could not be fixed simply. In 23855 you >>>>>>>>> consider the following test-case: >>>>>>>>> void foo(int *ie, int *je, double *x) >>>>>>>>> { >>>>>>>>> int i, j; >>>>>>>>> for (j=0; j<*je; ++j) >>>>>>>>> for (i=0; i<*ie; ++i) >>>>>>>>> x[i+j] = 0.0; >>>>>>>>> } >>>>>>>>> and proposed to hoist up a check on *ie out of loop. It requires >>>>>>>>> memref alias analysis since in general x and ie can alias (if their >>>>>>>>> types are compatible - int *ie & int * x). Such analysis is performed >>>>>>>>> by pre or lim passes. Without such analysis we can not hoist a test on >>>>>>>>> non-zero for *ie out of loop using 238565 patch. >>>>>>>>> The second concern is that proposed copy header algorithm changes >>>>>>>>> loop structure significantly and it is not accepted by vectorizer >>>>>>>>> since latch is not empty (such transformation assumes loop peeling for >>>>>>>>> one iteration. So I can propose to implement simple guard hoisting >>>>>>>>> without copying header and tail blocks (if it is possible). >>>>>>>>> >>>>>>>>> I will appreciate you for any advice or help since without such >>>>>>>>> hoisting we are not able to perform outer loop vectorization for >>>>>>>>> important benchmark. >>>>>>>>> and >>>>>>>>> >>>>>>>>> 2015-09-15 14:22 GMT+03:00 Richard Biener : >>>>>>>>>> On Thu, Sep 3, 2015 at 6:32 PM, Yuri Rumyantsev wrote: >>>>>>>>>>> Hi Richard, >>>>>>>>>>> >>>>>>>>>>> I started learning, tuning and debugging patch proposed in 23855 and >>>>>>>>>>> discovered thta it does not work properly. >>>>>>>>>>> So I wonder is it tested patch and it should work? >>>>>>>>>> >>>>>>>>>> I don't remember, but as it wasn't committed it certainly wasn't ready. >>>>>>>>>> >>>>>>>>>>> Should it accept for hoisting the following loop nest >>>>>>>>>>> for (i=0; i>>>>>>>>>> s = 0; >>>>>>>>>>> for (j=0; j>>>>>>>>>> s += a[i] * b[j]; >>>>>>>>>>> c[i] = s; >>>>>>>>>>> } >>>>>>>>>>> Note that i-loop will nit be empty if m is equal to 0. >>>>>>>>>> >>>>>>>>>> if m is equal to 0 then we still have the c[i] = s store, no? Of course >>>>>>>>>> we could unswitch the outer loop on m == 0 but simple hoisting wouldn't work. >>>>>>>>>> >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>>> 2015-08-03 10:27 GMT+03:00 Richard Biener : >>>>>>>>>>>> On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev wrote: >>>>>>>>>>>>> Hi Richard, >>>>>>>>>>>>> >>>>>>>>>>>>> I learned your updated patch for 23825 and it is more general in >>>>>>>>>>>>> comparison with my. >>>>>>>>>>>>> I'd like to propose you a compromise - let's consider my patch only >>>>>>>>>>>>> for force-vectorize outer loop only to allow outer-loop >>>>>>>>>>>>> vecctorization. >>>>>>>>>>>> >>>>>>>>>>>> I don't see why we should special-case that if the approach in 23825 >>>>>>>>>>>> is sensible. >>>>>>>>>>>> >>>>>>>>>>>>> Note that your approach will not hoist invariant >>>>>>>>>>>>> guards if loops contains something else except for inner-loop, i.e. it >>>>>>>>>>>>> won't be empty for taken branch. >>>>>>>>>>>> >>>>>>>>>>>> Yes, it does not perform unswitching but guard hoisting. Note that this >>>>>>>>>>>> is originally Zdenek Dvoraks patch. >>>>>>>>>>>> >>>>>>>>>>>>> I also would like to answer on your last question - CFG cleanup is >>>>>>>>>>>>> invoked to perform deletion of single-argument phi nodes from tail >>>>>>>>>>>>> block through substitution - such phi's prevent outer-loop >>>>>>>>>>>>> vectorization. But it is clear that such transformation can be done >>>>>>>>>>>>> other pass. >>>>>>>>>>>> >>>>>>>>>>>> Hmm, I wonder why the copy_prop pass after unswitching does not >>>>>>>>>>>> get rid of them? >>>>>>>>>>>> >>>>>>>>>>>>> What is your opinion? >>>>>>>>>>>> >>>>>>>>>>>> My opinion is that if we want to enhance unswitching to catch this >>>>>>>>>>>> (or similar) cases then we should make it a lot more general than >>>>>>>>>>>> your pattern-matching approach. I see nothing that should prevent >>>>>>>>>>>> us from considering unswitching non-innermost loops in general. >>>>>>>>>>>> It should be only a cost consideration to not do non-innermost loop >>>>>>>>>>>> unswitching (in addition to maybe a --param specifying the maximum >>>>>>>>>>>> depth of a loop nest to unswitch). >>>>>>>>>>>> >>>>>>>>>>>> So my first thought when seeing your patch still holds - the patch >>>>>>>>>>>> looks very much too specific. >>>>>>>>>>>> >>>>>>>>>>>> Richard. >>>>>>>>>>>> >>>>>>>>>>>>> Yuri. >>>>>>>>>>>>> >>>>>>>>>>>>> 2015-07-28 13:50 GMT+03:00 Richard Biener : >>>>>>>>>>>>>> On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev wrote: >>>>>>>>>>>>>>> Hi Richard, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I checked that both test-cases from 23855 are sucessfully unswitched >>>>>>>>>>>>>>> by proposed patch. I understand that it does not catch deeper loop >>>>>>>>>>>>>>> nest as >>>>>>>>>>>>>>> for (i=0; i<10; i++) >>>>>>>>>>>>>>> for (j=0;j>>>>>>>>>>>>>> for (k=0;k<20;k++) >>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>> but duplication of middle-loop does not look reasonable. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Here is dump for your second test-case: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> void foo(int *ie, int *je, double *x) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> int i, j; >>>>>>>>>>>>>>> for (j=0; j<*je; ++j) >>>>>>>>>>>>>>> for (i=0; i<*ie; ++i) >>>>>>>>>>>>>>> x[i+j] = 0.0; >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> grep -i unswitch t6.c.119t.unswitch >>>>>>>>>>>>>>> ;; Unswitching outer loop >>>>>>>>>>>>>> >>>>>>>>>>>>>> I was saying that why go with a limited approach when a patch (in >>>>>>>>>>>>>> unknown state...) >>>>>>>>>>>>>> is available that does it more generally? Also unswitching is quite >>>>>>>>>>>>>> expensive compared >>>>>>>>>>>>>> to "moving" the invariant condition. >>>>>>>>>>>>>> >>>>>>>>>>>>>> In your patch: >>>>>>>>>>>>>> >>>>>>>>>>>>>> + if (!nloop->force_vectorize) >>>>>>>>>>>>>> + nloop->force_vectorize = true; >>>>>>>>>>>>>> + if (loop->safelen != 0) >>>>>>>>>>>>>> + nloop->safelen = loop->safelen; >>>>>>>>>>>>>> >>>>>>>>>>>>>> I see no guard on force_vectorize so = true looks bogus here. Please just use >>>>>>>>>>>>>> copy_loop_info. >>>>>>>>>>>>>> >>>>>>>>>>>>>> + if (integer_nonzerop (cond_new)) >>>>>>>>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node); >>>>>>>>>>>>>> + else if (integer_zerop (cond_new)) >>>>>>>>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_false_node); >>>>>>>>>>>>>> >>>>>>>>>>>>>> gimple_cond_make_true/false (cond_stmt); >>>>>>>>>>>>>> >>>>>>>>>>>>>> btw, seems odd that we have to recompute which loop is the true / false variant >>>>>>>>>>>>>> when we just fed a guard condition to loop_version. Can't we statically >>>>>>>>>>>>>> determine whether loop or nloop has the in-loop condition true or false? >>>>>>>>>>>>>> >>>>>>>>>>>>>> + /* Clean-up cfg to remove useless one-argument phi in exit block of >>>>>>>>>>>>>> + outer-loop. */ >>>>>>>>>>>>>> + cleanup_tree_cfg (); >>>>>>>>>>>>>> >>>>>>>>>>>>>> I know unswitching is already O(number-of-unswitched-loops * size-of-function) >>>>>>>>>>>>>> because it updates SSA form after each individual unswitching (and it does that >>>>>>>>>>>>>> because it invokes itself recursively on unswitched loops). But do you really >>>>>>>>>>>>>> need to invoke CFG cleanup here? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yuri. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2015-07-14 14:06 GMT+03:00 Richard Biener : >>>>>>>>>>>>>>>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev wrote: >>>>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Here is presented simple transformation which tries to hoist out of >>>>>>>>>>>>>>>>> outer-loop a check on zero trip count for inner-loop. This is very >>>>>>>>>>>>>>>>> restricted transformation since it accepts outer-loops with very >>>>>>>>>>>>>>>>> simple cfg, as for example: >>>>>>>>>>>>>>>>> acc = 0; >>>>>>>>>>>>>>>>> for (i = 1; i <= m; i++) { >>>>>>>>>>>>>>>>> for (j = 0; j < n; j++) >>>>>>>>>>>>>>>>> if (l[j] == i) { v[j] = acc; acc++; }; >>>>>>>>>>>>>>>>> acc <<= 1; >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Note that degenerative outer loop (without inner loop) will be >>>>>>>>>>>>>>>>> completely deleted as dead code. >>>>>>>>>>>>>>>>> The main goal of this transformation was to convert outer-loop to form >>>>>>>>>>>>>>>>> accepted by outer-loop vectorization (such test-case is also included >>>>>>>>>>>>>>>>> to patch). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Is it OK for trunk? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think this is >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> as well. It has a patch adding a invariant loop guard hoisting >>>>>>>>>>>>>>>> phase to loop-header copying. Yeah, it needs updating to >>>>>>>>>>>>>>>> trunk again I suppose. It's always non-stage1 when I come >>>>>>>>>>>>>>>> back to that patch. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Your patch seems to be very specific and only handles outer >>>>>>>>>>>>>>>> loops of innermost loops. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ChangeLog: >>>>>>>>>>>>>>>>> 2015-07-10 Yuri Rumyantsev >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and >>>>>>>>>>>>>>>>> "gimple-iterator.h", add prototype for tree_unswitch_outer_loop. >>>>>>>>>>>>>>>>> (tree_ssa_unswitch_loops): Add invoke of tree_unswitch_outer_loop. >>>>>>>>>>>>>>>>> (tree_unswitch_outer_loop): New function. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>>>>>>>>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test. >>>>>>>>>>>>>>>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.