Richard, Here is updated patch which reflects almost all your remarks: 1. Use ordinary get_loop_body. 2. Delete useless asserts. 3. Use check on iterated loop instead of finite_loop_p. 4. Do not update CFG by adjusting the CONDs condition to always true/false. 5. Add couple tests. ChangeLog: 2015-10-06 Yuri Rumyantsev * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and "cfghooks.h", add prototypes for introduced new functions. (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all checks on ability of loop unswitching to tree_unswitch_single_loop; invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending on innermost loop check. (tree_unswitch_single_loop): Add all required checks on ability of loop unswitching under zero recursive level guard. (tree_unswitch_outer_loop): New function. (find_loop_guard): Likewise. (empty_bb_without_guard_p): Likewise. (used_outside_loop_p): Likewise. (hoist_guard): Likewise. (check_exit_phi): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/loop-unswitch-2.c: New test. * gcc.dg/loop-unswitch-3.c: Likewise. * gcc.dg/loop-unswitch-4.c: Likewise. 2015-10-06 10:59 GMT+03:00 Richard Biener : > On Mon, Oct 5, 2015 at 3:13 PM, Yuri Rumyantsev wrote: >> Thanks Richard. >> I'd like to answer on your last comment related to using of exit edge >> argument for edge that skips loop. >> Let's consider the following test-case: >> >> #include >> #define N 32 >> float *foo(int ustride, int size, float *src) >> { >> float *buffer, *p; >> int i, k; >> >> if (!src) >> return NULL; >> >> buffer = (float *) malloc(N * size * sizeof(float)); >> >> if(buffer) >> for(i=0, p=buffer; i> for(k=0; k> *p++ = src[k]; >> >> return buffer; >> } >> >> Before adding new edge we have in post-header bb: >> : >> # _6 = PHI <0B(8), buffer_20(16)> >> return _6; >> >> It is clear that we must preserve function semantic and transform it to >> _6 = PHI <0B(12), buffer_19(9), buffer_19(4)> > > Ah, yeah. I was confusing the loop exit of the inner vs. the outer loop. > > Richard. > >> >> 2015-10-05 13:57 GMT+03:00 Richard Biener : >>> On Wed, Sep 30, 2015 at 12:46 PM, Yuri Rumyantsev wrote: >>>> Hi Richard, >>>> >>>> I re-designed outer loop unswitching using basic idea of 23855 patch - >>>> hoist invariant guard if loop is empty without guard. Note that this >>>> was added to loop unswitching pass with simple modifications - using >>>> another loop iterator etc. >>>> >>>> Bootstrap and regression testing did not show any new failures. >>>> What is your opinion? >>> >>> Overall it looks good. Some comments below - a few more testcases would >>> be nice as well. >>> >>> + /* Loop must not be infinite. */ >>> + if (!finite_loop_p (loop)) >>> + return false; >>> >>> why's that? >>> >>> + body = get_loop_body_in_dom_order (loop); >>> + for (i = 0; i < loop->num_nodes; i++) >>> + { >>> + if (body[i]->loop_father != loop) >>> + continue; >>> + if (!empty_bb_without_guard_p (loop, body[i])) >>> >>> I wonder if there is a better way to iterate over the interesting >>> blocks and PHIs >>> we need to check for side-effects (and thus we maybe can avoid gathering >>> the loop in DOM order). >>> >>> + FOR_EACH_SSA_TREE_OPERAND (name, stmt, op_iter, SSA_OP_DEF) >>> + { >>> + if (may_be_used_outside >>> >>> may_be_used_outside can be hoisted above the loop. I wonder if we can take >>> advantage of loop-closed SSA form here (and the fact we have a single exit >>> from the loop). Iterating over exit dest PHIs and determining whether the >>> exit edge DEF is inside the loop part it may not be should be enough. >>> >>> + gcc_assert (single_succ_p (pre_header)); >>> >>> that should be always true. >>> >>> + gsi_remove (&gsi, false); >>> + bb = guard->dest; >>> + remove_edge (guard); >>> + /* Update dominance for destination of GUARD. */ >>> + if (EDGE_COUNT (bb->preds) == 0) >>> + { >>> + basic_block s_bb; >>> + gcc_assert (single_succ_p (bb)); >>> + s_bb = single_succ (bb); >>> + delete_basic_block (bb); >>> + if (single_pred_p (s_bb)) >>> + set_immediate_dominator (CDI_DOMINATORS, s_bb, single_pred (s_bb)); >>> >>> all this massaging should be simplified by leaving it to CFG cleanup by >>> simply adjusting the CONDs condition to always true/false. There is >>> gimple_cond_make_{true,false} () for this (would be nice to have a variant >>> taking a bool). >>> >>> + new_edge = make_edge (pre_header, exit->dest, flags); >>> + if (fix_dom_of_exit) >>> + set_immediate_dominator (CDI_DOMINATORS, exit->dest, pre_header); >>> + update_stmt (gsi_stmt (gsi)); >>> >>> the update_stmt should be not necessary, it's done by gsi_insert_after already. >>> >>> + /* Add NEW_ADGE argument for all phi in post-header block. */ >>> + bb = exit->dest; >>> + for (gphi_iterator gsi = gsi_start_phis (bb); >>> + !gsi_end_p (gsi); gsi_next (&gsi)) >>> + { >>> + gphi *phi = gsi.phi (); >>> + /* edge_iterator ei; */ >>> + tree arg; >>> + if (virtual_operand_p (gimple_phi_result (phi))) >>> + { >>> + arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop)); >>> + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION); >>> + } >>> + else >>> + { >>> + /* Use exit edge argument. */ >>> + arg = PHI_ARG_DEF_FROM_EDGE (phi, exit); >>> + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION); >>> >>> Hum. How is it ok to use the exit edge argument for the edge that skips >>> the loop? Why can't you always use the pre-header edge value? >>> That is, if we have >>> >>> for(i=0;i>> { >>> if (n > 0) >>> { >>> for (;;) >>> { >>> } >>> } >>> } >>> ... = i; >>> >>> then i is used after the loop and the correct value to use if >>> n > 0 is false is '0'. Maybe this way we can also relax >>> what check_exit_phi does? IMHO the only restriction is >>> if sth defined inside the loop before the header check for >>> the inner loop is used after the loop. >>> >>> Thanks, >>> Richard. >>> >>>> Thanks. >>>> >>>> ChangeLog: >>>> 2015-09-30 Yuri Rumyantsev >>>> >>>> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and >>>> "cfghooks.h", add prototypes for introduced new functions. >>>> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all >>>> checks on ability of loop unswitching to tree_unswitch_single_loop; >>>> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending >>>> on innermost loop check. >>>> (tree_unswitch_single_loop): Add all required checks on ability of >>>> loop unswitching under zero recursive level guard. >>>> (tree_unswitch_outer_loop): New function. >>>> (find_loop_guard): Likewise. >>>> (empty_bb_without_guard_p): Likewise. >>>> (used_outside_loop_p): Likewise. >>>> (hoist_guard): Likewise. >>>> (check_exit_phi): Likewise. >>>> >>>> gcc/testsuite/ChangeLog: >>>> * gcc.dg/loop-unswitch-2.c: New test. >>>> >>>> 2015-09-16 11:26 GMT+03:00 Richard Biener : >>>>> Yeah, as said, the patch wasn't fully ready and it also felt odd to do >>>>> this hoisting in loop header copying. Integrating it >>>>> with LIM would be a better fit eventually. >>>>> >>>>> Note that we did agree to go forward with your original patch just >>>>> making it more "generically" perform outer loop >>>>> unswitching. Did you explore that idea further? >>>>> >>>>> >>>>> >>>>> On Tue, Sep 15, 2015 at 6:00 PM, Yuri Rumyantsev wrote: >>>>>> Thanks Richard. >>>>>> >>>>>> I found one more issue that could not be fixed simply. In 23855 you >>>>>> consider the following test-case: >>>>>> void foo(int *ie, int *je, double *x) >>>>>> { >>>>>> int i, j; >>>>>> for (j=0; j<*je; ++j) >>>>>> for (i=0; i<*ie; ++i) >>>>>> x[i+j] = 0.0; >>>>>> } >>>>>> and proposed to hoist up a check on *ie out of loop. It requires >>>>>> memref alias analysis since in general x and ie can alias (if their >>>>>> types are compatible - int *ie & int * x). Such analysis is performed >>>>>> by pre or lim passes. Without such analysis we can not hoist a test on >>>>>> non-zero for *ie out of loop using 238565 patch. >>>>>> The second concern is that proposed copy header algorithm changes >>>>>> loop structure significantly and it is not accepted by vectorizer >>>>>> since latch is not empty (such transformation assumes loop peeling for >>>>>> one iteration. So I can propose to implement simple guard hoisting >>>>>> without copying header and tail blocks (if it is possible). >>>>>> >>>>>> I will appreciate you for any advice or help since without such >>>>>> hoisting we are not able to perform outer loop vectorization for >>>>>> important benchmark. >>>>>> and >>>>>> >>>>>> 2015-09-15 14:22 GMT+03:00 Richard Biener : >>>>>>> On Thu, Sep 3, 2015 at 6:32 PM, Yuri Rumyantsev wrote: >>>>>>>> Hi Richard, >>>>>>>> >>>>>>>> I started learning, tuning and debugging patch proposed in 23855 and >>>>>>>> discovered thta it does not work properly. >>>>>>>> So I wonder is it tested patch and it should work? >>>>>>> >>>>>>> I don't remember, but as it wasn't committed it certainly wasn't ready. >>>>>>> >>>>>>>> Should it accept for hoisting the following loop nest >>>>>>>> for (i=0; i>>>>>>> s = 0; >>>>>>>> for (j=0; j>>>>>>> s += a[i] * b[j]; >>>>>>>> c[i] = s; >>>>>>>> } >>>>>>>> Note that i-loop will nit be empty if m is equal to 0. >>>>>>> >>>>>>> if m is equal to 0 then we still have the c[i] = s store, no? Of course >>>>>>> we could unswitch the outer loop on m == 0 but simple hoisting wouldn't work. >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> 2015-08-03 10:27 GMT+03:00 Richard Biener : >>>>>>>>> On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev wrote: >>>>>>>>>> Hi Richard, >>>>>>>>>> >>>>>>>>>> I learned your updated patch for 23825 and it is more general in >>>>>>>>>> comparison with my. >>>>>>>>>> I'd like to propose you a compromise - let's consider my patch only >>>>>>>>>> for force-vectorize outer loop only to allow outer-loop >>>>>>>>>> vecctorization. >>>>>>>>> >>>>>>>>> I don't see why we should special-case that if the approach in 23825 >>>>>>>>> is sensible. >>>>>>>>> >>>>>>>>>> Note that your approach will not hoist invariant >>>>>>>>>> guards if loops contains something else except for inner-loop, i.e. it >>>>>>>>>> won't be empty for taken branch. >>>>>>>>> >>>>>>>>> Yes, it does not perform unswitching but guard hoisting. Note that this >>>>>>>>> is originally Zdenek Dvoraks patch. >>>>>>>>> >>>>>>>>>> I also would like to answer on your last question - CFG cleanup is >>>>>>>>>> invoked to perform deletion of single-argument phi nodes from tail >>>>>>>>>> block through substitution - such phi's prevent outer-loop >>>>>>>>>> vectorization. But it is clear that such transformation can be done >>>>>>>>>> other pass. >>>>>>>>> >>>>>>>>> Hmm, I wonder why the copy_prop pass after unswitching does not >>>>>>>>> get rid of them? >>>>>>>>> >>>>>>>>>> What is your opinion? >>>>>>>>> >>>>>>>>> My opinion is that if we want to enhance unswitching to catch this >>>>>>>>> (or similar) cases then we should make it a lot more general than >>>>>>>>> your pattern-matching approach. I see nothing that should prevent >>>>>>>>> us from considering unswitching non-innermost loops in general. >>>>>>>>> It should be only a cost consideration to not do non-innermost loop >>>>>>>>> unswitching (in addition to maybe a --param specifying the maximum >>>>>>>>> depth of a loop nest to unswitch). >>>>>>>>> >>>>>>>>> So my first thought when seeing your patch still holds - the patch >>>>>>>>> looks very much too specific. >>>>>>>>> >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>>> Yuri. >>>>>>>>>> >>>>>>>>>> 2015-07-28 13:50 GMT+03:00 Richard Biener : >>>>>>>>>>> On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev wrote: >>>>>>>>>>>> Hi Richard, >>>>>>>>>>>> >>>>>>>>>>>> I checked that both test-cases from 23855 are sucessfully unswitched >>>>>>>>>>>> by proposed patch. I understand that it does not catch deeper loop >>>>>>>>>>>> nest as >>>>>>>>>>>> for (i=0; i<10; i++) >>>>>>>>>>>> for (j=0;j>>>>>>>>>>> for (k=0;k<20;k++) >>>>>>>>>>>> ... >>>>>>>>>>>> but duplication of middle-loop does not look reasonable. >>>>>>>>>>>> >>>>>>>>>>>> Here is dump for your second test-case: >>>>>>>>>>>> >>>>>>>>>>>> void foo(int *ie, int *je, double *x) >>>>>>>>>>>> { >>>>>>>>>>>> int i, j; >>>>>>>>>>>> for (j=0; j<*je; ++j) >>>>>>>>>>>> for (i=0; i<*ie; ++i) >>>>>>>>>>>> x[i+j] = 0.0; >>>>>>>>>>>> } >>>>>>>>>>>> grep -i unswitch t6.c.119t.unswitch >>>>>>>>>>>> ;; Unswitching outer loop >>>>>>>>>>> >>>>>>>>>>> I was saying that why go with a limited approach when a patch (in >>>>>>>>>>> unknown state...) >>>>>>>>>>> is available that does it more generally? Also unswitching is quite >>>>>>>>>>> expensive compared >>>>>>>>>>> to "moving" the invariant condition. >>>>>>>>>>> >>>>>>>>>>> In your patch: >>>>>>>>>>> >>>>>>>>>>> + if (!nloop->force_vectorize) >>>>>>>>>>> + nloop->force_vectorize = true; >>>>>>>>>>> + if (loop->safelen != 0) >>>>>>>>>>> + nloop->safelen = loop->safelen; >>>>>>>>>>> >>>>>>>>>>> I see no guard on force_vectorize so = true looks bogus here. Please just use >>>>>>>>>>> copy_loop_info. >>>>>>>>>>> >>>>>>>>>>> + if (integer_nonzerop (cond_new)) >>>>>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node); >>>>>>>>>>> + else if (integer_zerop (cond_new)) >>>>>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_false_node); >>>>>>>>>>> >>>>>>>>>>> gimple_cond_make_true/false (cond_stmt); >>>>>>>>>>> >>>>>>>>>>> btw, seems odd that we have to recompute which loop is the true / false variant >>>>>>>>>>> when we just fed a guard condition to loop_version. Can't we statically >>>>>>>>>>> determine whether loop or nloop has the in-loop condition true or false? >>>>>>>>>>> >>>>>>>>>>> + /* Clean-up cfg to remove useless one-argument phi in exit block of >>>>>>>>>>> + outer-loop. */ >>>>>>>>>>> + cleanup_tree_cfg (); >>>>>>>>>>> >>>>>>>>>>> I know unswitching is already O(number-of-unswitched-loops * size-of-function) >>>>>>>>>>> because it updates SSA form after each individual unswitching (and it does that >>>>>>>>>>> because it invokes itself recursively on unswitched loops). But do you really >>>>>>>>>>> need to invoke CFG cleanup here? >>>>>>>>>>> >>>>>>>>>>> Richard. >>>>>>>>>>> >>>>>>>>>>>> Yuri. >>>>>>>>>>>> >>>>>>>>>>>> 2015-07-14 14:06 GMT+03:00 Richard Biener : >>>>>>>>>>>>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev wrote: >>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Here is presented simple transformation which tries to hoist out of >>>>>>>>>>>>>> outer-loop a check on zero trip count for inner-loop. This is very >>>>>>>>>>>>>> restricted transformation since it accepts outer-loops with very >>>>>>>>>>>>>> simple cfg, as for example: >>>>>>>>>>>>>> acc = 0; >>>>>>>>>>>>>> for (i = 1; i <= m; i++) { >>>>>>>>>>>>>> for (j = 0; j < n; j++) >>>>>>>>>>>>>> if (l[j] == i) { v[j] = acc; acc++; }; >>>>>>>>>>>>>> acc <<= 1; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> Note that degenerative outer loop (without inner loop) will be >>>>>>>>>>>>>> completely deleted as dead code. >>>>>>>>>>>>>> The main goal of this transformation was to convert outer-loop to form >>>>>>>>>>>>>> accepted by outer-loop vectorization (such test-case is also included >>>>>>>>>>>>>> to patch). >>>>>>>>>>>>>> >>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is it OK for trunk? >>>>>>>>>>>>> >>>>>>>>>>>>> I think this is >>>>>>>>>>>>> >>>>>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855 >>>>>>>>>>>>> >>>>>>>>>>>>> as well. It has a patch adding a invariant loop guard hoisting >>>>>>>>>>>>> phase to loop-header copying. Yeah, it needs updating to >>>>>>>>>>>>> trunk again I suppose. It's always non-stage1 when I come >>>>>>>>>>>>> back to that patch. >>>>>>>>>>>>> >>>>>>>>>>>>> Your patch seems to be very specific and only handles outer >>>>>>>>>>>>> loops of innermost loops. >>>>>>>>>>>>> >>>>>>>>>>>>> Richard. >>>>>>>>>>>>> >>>>>>>>>>>>>> ChangeLog: >>>>>>>>>>>>>> 2015-07-10 Yuri Rumyantsev >>>>>>>>>>>>>> >>>>>>>>>>>>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and >>>>>>>>>>>>>> "gimple-iterator.h", add prototype for tree_unswitch_outer_loop. >>>>>>>>>>>>>> (tree_ssa_unswitch_loops): Add invoke of tree_unswitch_outer_loop. >>>>>>>>>>>>>> (tree_unswitch_outer_loop): New function. >>>>>>>>>>>>>> >>>>>>>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>>>>>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test. >>>>>>>>>>>>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.