Hi Richard, I re-designed outer loop unswitching using basic idea of 23855 patch - hoist invariant guard if loop is empty without guard. Note that this was added to loop unswitching pass with simple modifications - using another loop iterator etc. Bootstrap and regression testing did not show any new failures. What is your opinion? Thanks. ChangeLog: 2015-09-30 Yuri Rumyantsev * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and "cfghooks.h", add prototypes for introduced new functions. (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all checks on ability of loop unswitching to tree_unswitch_single_loop; invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending on innermost loop check. (tree_unswitch_single_loop): Add all required checks on ability of loop unswitching under zero recursive level guard. (tree_unswitch_outer_loop): New function. (find_loop_guard): Likewise. (empty_bb_without_guard_p): Likewise. (used_outside_loop_p): Likewise. (hoist_guard): Likewise. (check_exit_phi): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/loop-unswitch-2.c: New test. 2015-09-16 11:26 GMT+03:00 Richard Biener : > Yeah, as said, the patch wasn't fully ready and it also felt odd to do > this hoisting in loop header copying. Integrating it > with LIM would be a better fit eventually. > > Note that we did agree to go forward with your original patch just > making it more "generically" perform outer loop > unswitching. Did you explore that idea further? > > > > On Tue, Sep 15, 2015 at 6:00 PM, Yuri Rumyantsev wrote: >> Thanks Richard. >> >> I found one more issue that could not be fixed simply. In 23855 you >> consider the following test-case: >> void foo(int *ie, int *je, double *x) >> { >> int i, j; >> for (j=0; j<*je; ++j) >> for (i=0; i<*ie; ++i) >> x[i+j] = 0.0; >> } >> and proposed to hoist up a check on *ie out of loop. It requires >> memref alias analysis since in general x and ie can alias (if their >> types are compatible - int *ie & int * x). Such analysis is performed >> by pre or lim passes. Without such analysis we can not hoist a test on >> non-zero for *ie out of loop using 238565 patch. >> The second concern is that proposed copy header algorithm changes >> loop structure significantly and it is not accepted by vectorizer >> since latch is not empty (such transformation assumes loop peeling for >> one iteration. So I can propose to implement simple guard hoisting >> without copying header and tail blocks (if it is possible). >> >> I will appreciate you for any advice or help since without such >> hoisting we are not able to perform outer loop vectorization for >> important benchmark. >> and >> >> 2015-09-15 14:22 GMT+03:00 Richard Biener : >>> On Thu, Sep 3, 2015 at 6:32 PM, Yuri Rumyantsev wrote: >>>> Hi Richard, >>>> >>>> I started learning, tuning and debugging patch proposed in 23855 and >>>> discovered thta it does not work properly. >>>> So I wonder is it tested patch and it should work? >>> >>> I don't remember, but as it wasn't committed it certainly wasn't ready. >>> >>>> Should it accept for hoisting the following loop nest >>>> for (i=0; i>>> s = 0; >>>> for (j=0; j>>> s += a[i] * b[j]; >>>> c[i] = s; >>>> } >>>> Note that i-loop will nit be empty if m is equal to 0. >>> >>> if m is equal to 0 then we still have the c[i] = s store, no? Of course >>> we could unswitch the outer loop on m == 0 but simple hoisting wouldn't work. >>> >>> Richard. >>> >>>> 2015-08-03 10:27 GMT+03:00 Richard Biener : >>>>> On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev wrote: >>>>>> Hi Richard, >>>>>> >>>>>> I learned your updated patch for 23825 and it is more general in >>>>>> comparison with my. >>>>>> I'd like to propose you a compromise - let's consider my patch only >>>>>> for force-vectorize outer loop only to allow outer-loop >>>>>> vecctorization. >>>>> >>>>> I don't see why we should special-case that if the approach in 23825 >>>>> is sensible. >>>>> >>>>>> Note that your approach will not hoist invariant >>>>>> guards if loops contains something else except for inner-loop, i.e. it >>>>>> won't be empty for taken branch. >>>>> >>>>> Yes, it does not perform unswitching but guard hoisting. Note that this >>>>> is originally Zdenek Dvoraks patch. >>>>> >>>>>> I also would like to answer on your last question - CFG cleanup is >>>>>> invoked to perform deletion of single-argument phi nodes from tail >>>>>> block through substitution - such phi's prevent outer-loop >>>>>> vectorization. But it is clear that such transformation can be done >>>>>> other pass. >>>>> >>>>> Hmm, I wonder why the copy_prop pass after unswitching does not >>>>> get rid of them? >>>>> >>>>>> What is your opinion? >>>>> >>>>> My opinion is that if we want to enhance unswitching to catch this >>>>> (or similar) cases then we should make it a lot more general than >>>>> your pattern-matching approach. I see nothing that should prevent >>>>> us from considering unswitching non-innermost loops in general. >>>>> It should be only a cost consideration to not do non-innermost loop >>>>> unswitching (in addition to maybe a --param specifying the maximum >>>>> depth of a loop nest to unswitch). >>>>> >>>>> So my first thought when seeing your patch still holds - the patch >>>>> looks very much too specific. >>>>> >>>>> Richard. >>>>> >>>>>> Yuri. >>>>>> >>>>>> 2015-07-28 13:50 GMT+03:00 Richard Biener : >>>>>>> On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev wrote: >>>>>>>> Hi Richard, >>>>>>>> >>>>>>>> I checked that both test-cases from 23855 are sucessfully unswitched >>>>>>>> by proposed patch. I understand that it does not catch deeper loop >>>>>>>> nest as >>>>>>>> for (i=0; i<10; i++) >>>>>>>> for (j=0;j>>>>>>> for (k=0;k<20;k++) >>>>>>>> ... >>>>>>>> but duplication of middle-loop does not look reasonable. >>>>>>>> >>>>>>>> Here is dump for your second test-case: >>>>>>>> >>>>>>>> void foo(int *ie, int *je, double *x) >>>>>>>> { >>>>>>>> int i, j; >>>>>>>> for (j=0; j<*je; ++j) >>>>>>>> for (i=0; i<*ie; ++i) >>>>>>>> x[i+j] = 0.0; >>>>>>>> } >>>>>>>> grep -i unswitch t6.c.119t.unswitch >>>>>>>> ;; Unswitching outer loop >>>>>>> >>>>>>> I was saying that why go with a limited approach when a patch (in >>>>>>> unknown state...) >>>>>>> is available that does it more generally? Also unswitching is quite >>>>>>> expensive compared >>>>>>> to "moving" the invariant condition. >>>>>>> >>>>>>> In your patch: >>>>>>> >>>>>>> + if (!nloop->force_vectorize) >>>>>>> + nloop->force_vectorize = true; >>>>>>> + if (loop->safelen != 0) >>>>>>> + nloop->safelen = loop->safelen; >>>>>>> >>>>>>> I see no guard on force_vectorize so = true looks bogus here. Please just use >>>>>>> copy_loop_info. >>>>>>> >>>>>>> + if (integer_nonzerop (cond_new)) >>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node); >>>>>>> + else if (integer_zerop (cond_new)) >>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_false_node); >>>>>>> >>>>>>> gimple_cond_make_true/false (cond_stmt); >>>>>>> >>>>>>> btw, seems odd that we have to recompute which loop is the true / false variant >>>>>>> when we just fed a guard condition to loop_version. Can't we statically >>>>>>> determine whether loop or nloop has the in-loop condition true or false? >>>>>>> >>>>>>> + /* Clean-up cfg to remove useless one-argument phi in exit block of >>>>>>> + outer-loop. */ >>>>>>> + cleanup_tree_cfg (); >>>>>>> >>>>>>> I know unswitching is already O(number-of-unswitched-loops * size-of-function) >>>>>>> because it updates SSA form after each individual unswitching (and it does that >>>>>>> because it invokes itself recursively on unswitched loops). But do you really >>>>>>> need to invoke CFG cleanup here? >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> Yuri. >>>>>>>> >>>>>>>> 2015-07-14 14:06 GMT+03:00 Richard Biener : >>>>>>>>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev wrote: >>>>>>>>>> Hi All, >>>>>>>>>> >>>>>>>>>> Here is presented simple transformation which tries to hoist out of >>>>>>>>>> outer-loop a check on zero trip count for inner-loop. This is very >>>>>>>>>> restricted transformation since it accepts outer-loops with very >>>>>>>>>> simple cfg, as for example: >>>>>>>>>> acc = 0; >>>>>>>>>> for (i = 1; i <= m; i++) { >>>>>>>>>> for (j = 0; j < n; j++) >>>>>>>>>> if (l[j] == i) { v[j] = acc; acc++; }; >>>>>>>>>> acc <<= 1; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> Note that degenerative outer loop (without inner loop) will be >>>>>>>>>> completely deleted as dead code. >>>>>>>>>> The main goal of this transformation was to convert outer-loop to form >>>>>>>>>> accepted by outer-loop vectorization (such test-case is also included >>>>>>>>>> to patch). >>>>>>>>>> >>>>>>>>>> Bootstrap and regression testing did not show any new failures. >>>>>>>>>> >>>>>>>>>> Is it OK for trunk? >>>>>>>>> >>>>>>>>> I think this is >>>>>>>>> >>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855 >>>>>>>>> >>>>>>>>> as well. It has a patch adding a invariant loop guard hoisting >>>>>>>>> phase to loop-header copying. Yeah, it needs updating to >>>>>>>>> trunk again I suppose. It's always non-stage1 when I come >>>>>>>>> back to that patch. >>>>>>>>> >>>>>>>>> Your patch seems to be very specific and only handles outer >>>>>>>>> loops of innermost loops. >>>>>>>>> >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>>> ChangeLog: >>>>>>>>>> 2015-07-10 Yuri Rumyantsev >>>>>>>>>> >>>>>>>>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and >>>>>>>>>> "gimple-iterator.h", add prototype for tree_unswitch_outer_loop. >>>>>>>>>> (tree_ssa_unswitch_loops): Add invoke of tree_unswitch_outer_loop. >>>>>>>>>> (tree_unswitch_outer_loop): New function. >>>>>>>>>> >>>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test. >>>>>>>>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.