From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 103739 invoked by alias); 5 Oct 2015 10:57:47 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 103726 invoked by uid 89); 5 Oct 2015 10:57:45 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-yk0-f178.google.com Received: from mail-yk0-f178.google.com (HELO mail-yk0-f178.google.com) (209.85.160.178) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Mon, 05 Oct 2015 10:57:43 +0000 Received: by ykft14 with SMTP id t14so166944016ykf.0 for ; Mon, 05 Oct 2015 03:57:41 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.170.69.66 with SMTP id l63mr24212507ykl.89.1444042661350; Mon, 05 Oct 2015 03:57:41 -0700 (PDT) Received: by 10.37.93.136 with HTTP; Mon, 5 Oct 2015 03:57:41 -0700 (PDT) In-Reply-To: References: Date: Mon, 05 Oct 2015 10:57:00 -0000 Message-ID: Subject: Re: [PATCH] Unswitching outer loops. From: Richard Biener To: Yuri Rumyantsev Cc: gcc-patches , Igor Zamyatin Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2015-10/txt/msg00387.txt.bz2 On Wed, Sep 30, 2015 at 12:46 PM, Yuri Rumyantsev wrote: > Hi Richard, > > I re-designed outer loop unswitching using basic idea of 23855 patch - > hoist invariant guard if loop is empty without guard. Note that this > was added to loop unswitching pass with simple modifications - using > another loop iterator etc. > > Bootstrap and regression testing did not show any new failures. > What is your opinion? Overall it looks good. Some comments below - a few more testcases would be nice as well. + /* Loop must not be infinite. */ + if (!finite_loop_p (loop)) + return false; why's that? + body = get_loop_body_in_dom_order (loop); + for (i = 0; i < loop->num_nodes; i++) + { + if (body[i]->loop_father != loop) + continue; + if (!empty_bb_without_guard_p (loop, body[i])) I wonder if there is a better way to iterate over the interesting blocks and PHIs we need to check for side-effects (and thus we maybe can avoid gathering the loop in DOM order). + FOR_EACH_SSA_TREE_OPERAND (name, stmt, op_iter, SSA_OP_DEF) + { + if (may_be_used_outside may_be_used_outside can be hoisted above the loop. I wonder if we can take advantage of loop-closed SSA form here (and the fact we have a single exit from the loop). Iterating over exit dest PHIs and determining whether the exit edge DEF is inside the loop part it may not be should be enough. + gcc_assert (single_succ_p (pre_header)); that should be always true. + gsi_remove (&gsi, false); + bb = guard->dest; + remove_edge (guard); + /* Update dominance for destination of GUARD. */ + if (EDGE_COUNT (bb->preds) == 0) + { + basic_block s_bb; + gcc_assert (single_succ_p (bb)); + s_bb = single_succ (bb); + delete_basic_block (bb); + if (single_pred_p (s_bb)) + set_immediate_dominator (CDI_DOMINATORS, s_bb, single_pred (s_bb)); all this massaging should be simplified by leaving it to CFG cleanup by simply adjusting the CONDs condition to always true/false. There is gimple_cond_make_{true,false} () for this (would be nice to have a variant taking a bool). + new_edge = make_edge (pre_header, exit->dest, flags); + if (fix_dom_of_exit) + set_immediate_dominator (CDI_DOMINATORS, exit->dest, pre_header); + update_stmt (gsi_stmt (gsi)); the update_stmt should be not necessary, it's done by gsi_insert_after already. + /* Add NEW_ADGE argument for all phi in post-header block. */ + bb = exit->dest; + for (gphi_iterator gsi = gsi_start_phis (bb); + !gsi_end_p (gsi); gsi_next (&gsi)) + { + gphi *phi = gsi.phi (); + /* edge_iterator ei; */ + tree arg; + if (virtual_operand_p (gimple_phi_result (phi))) + { + arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop)); + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION); + } + else + { + /* Use exit edge argument. */ + arg = PHI_ARG_DEF_FROM_EDGE (phi, exit); + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION); Hum. How is it ok to use the exit edge argument for the edge that skips the loop? Why can't you always use the pre-header edge value? That is, if we have for(i=0;i 0) { for (;;) { } } } ... = i; then i is used after the loop and the correct value to use if n > 0 is false is '0'. Maybe this way we can also relax what check_exit_phi does? IMHO the only restriction is if sth defined inside the loop before the header check for the inner loop is used after the loop. Thanks, Richard. > Thanks. > > ChangeLog: > 2015-09-30 Yuri Rumyantsev > > * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and > "cfghooks.h", add prototypes for introduced new functions. > (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all > checks on ability of loop unswitching to tree_unswitch_single_loop; > invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending > on innermost loop check. > (tree_unswitch_single_loop): Add all required checks on ability of > loop unswitching under zero recursive level guard. > (tree_unswitch_outer_loop): New function. > (find_loop_guard): Likewise. > (empty_bb_without_guard_p): Likewise. > (used_outside_loop_p): Likewise. > (hoist_guard): Likewise. > (check_exit_phi): Likewise. > > gcc/testsuite/ChangeLog: > * gcc.dg/loop-unswitch-2.c: New test. > > 2015-09-16 11:26 GMT+03:00 Richard Biener : >> Yeah, as said, the patch wasn't fully ready and it also felt odd to do >> this hoisting in loop header copying. Integrating it >> with LIM would be a better fit eventually. >> >> Note that we did agree to go forward with your original patch just >> making it more "generically" perform outer loop >> unswitching. Did you explore that idea further? >> >> >> >> On Tue, Sep 15, 2015 at 6:00 PM, Yuri Rumyantsev wrote: >>> Thanks Richard. >>> >>> I found one more issue that could not be fixed simply. In 23855 you >>> consider the following test-case: >>> void foo(int *ie, int *je, double *x) >>> { >>> int i, j; >>> for (j=0; j<*je; ++j) >>> for (i=0; i<*ie; ++i) >>> x[i+j] = 0.0; >>> } >>> and proposed to hoist up a check on *ie out of loop. It requires >>> memref alias analysis since in general x and ie can alias (if their >>> types are compatible - int *ie & int * x). Such analysis is performed >>> by pre or lim passes. Without such analysis we can not hoist a test on >>> non-zero for *ie out of loop using 238565 patch. >>> The second concern is that proposed copy header algorithm changes >>> loop structure significantly and it is not accepted by vectorizer >>> since latch is not empty (such transformation assumes loop peeling for >>> one iteration. So I can propose to implement simple guard hoisting >>> without copying header and tail blocks (if it is possible). >>> >>> I will appreciate you for any advice or help since without such >>> hoisting we are not able to perform outer loop vectorization for >>> important benchmark. >>> and >>> >>> 2015-09-15 14:22 GMT+03:00 Richard Biener : >>>> On Thu, Sep 3, 2015 at 6:32 PM, Yuri Rumyantsev wrote: >>>>> Hi Richard, >>>>> >>>>> I started learning, tuning and debugging patch proposed in 23855 and >>>>> discovered thta it does not work properly. >>>>> So I wonder is it tested patch and it should work? >>>> >>>> I don't remember, but as it wasn't committed it certainly wasn't ready. >>>> >>>>> Should it accept for hoisting the following loop nest >>>>> for (i=0; i>>>> s = 0; >>>>> for (j=0; j>>>> s += a[i] * b[j]; >>>>> c[i] = s; >>>>> } >>>>> Note that i-loop will nit be empty if m is equal to 0. >>>> >>>> if m is equal to 0 then we still have the c[i] = s store, no? Of course >>>> we could unswitch the outer loop on m == 0 but simple hoisting wouldn't work. >>>> >>>> Richard. >>>> >>>>> 2015-08-03 10:27 GMT+03:00 Richard Biener : >>>>>> On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev wrote: >>>>>>> Hi Richard, >>>>>>> >>>>>>> I learned your updated patch for 23825 and it is more general in >>>>>>> comparison with my. >>>>>>> I'd like to propose you a compromise - let's consider my patch only >>>>>>> for force-vectorize outer loop only to allow outer-loop >>>>>>> vecctorization. >>>>>> >>>>>> I don't see why we should special-case that if the approach in 23825 >>>>>> is sensible. >>>>>> >>>>>>> Note that your approach will not hoist invariant >>>>>>> guards if loops contains something else except for inner-loop, i.e. it >>>>>>> won't be empty for taken branch. >>>>>> >>>>>> Yes, it does not perform unswitching but guard hoisting. Note that this >>>>>> is originally Zdenek Dvoraks patch. >>>>>> >>>>>>> I also would like to answer on your last question - CFG cleanup is >>>>>>> invoked to perform deletion of single-argument phi nodes from tail >>>>>>> block through substitution - such phi's prevent outer-loop >>>>>>> vectorization. But it is clear that such transformation can be done >>>>>>> other pass. >>>>>> >>>>>> Hmm, I wonder why the copy_prop pass after unswitching does not >>>>>> get rid of them? >>>>>> >>>>>>> What is your opinion? >>>>>> >>>>>> My opinion is that if we want to enhance unswitching to catch this >>>>>> (or similar) cases then we should make it a lot more general than >>>>>> your pattern-matching approach. I see nothing that should prevent >>>>>> us from considering unswitching non-innermost loops in general. >>>>>> It should be only a cost consideration to not do non-innermost loop >>>>>> unswitching (in addition to maybe a --param specifying the maximum >>>>>> depth of a loop nest to unswitch). >>>>>> >>>>>> So my first thought when seeing your patch still holds - the patch >>>>>> looks very much too specific. >>>>>> >>>>>> Richard. >>>>>> >>>>>>> Yuri. >>>>>>> >>>>>>> 2015-07-28 13:50 GMT+03:00 Richard Biener : >>>>>>>> On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev wrote: >>>>>>>>> Hi Richard, >>>>>>>>> >>>>>>>>> I checked that both test-cases from 23855 are sucessfully unswitched >>>>>>>>> by proposed patch. I understand that it does not catch deeper loop >>>>>>>>> nest as >>>>>>>>> for (i=0; i<10; i++) >>>>>>>>> for (j=0;j>>>>>>>> for (k=0;k<20;k++) >>>>>>>>> ... >>>>>>>>> but duplication of middle-loop does not look reasonable. >>>>>>>>> >>>>>>>>> Here is dump for your second test-case: >>>>>>>>> >>>>>>>>> void foo(int *ie, int *je, double *x) >>>>>>>>> { >>>>>>>>> int i, j; >>>>>>>>> for (j=0; j<*je; ++j) >>>>>>>>> for (i=0; i<*ie; ++i) >>>>>>>>> x[i+j] = 0.0; >>>>>>>>> } >>>>>>>>> grep -i unswitch t6.c.119t.unswitch >>>>>>>>> ;; Unswitching outer loop >>>>>>>> >>>>>>>> I was saying that why go with a limited approach when a patch (in >>>>>>>> unknown state...) >>>>>>>> is available that does it more generally? Also unswitching is quite >>>>>>>> expensive compared >>>>>>>> to "moving" the invariant condition. >>>>>>>> >>>>>>>> In your patch: >>>>>>>> >>>>>>>> + if (!nloop->force_vectorize) >>>>>>>> + nloop->force_vectorize = true; >>>>>>>> + if (loop->safelen != 0) >>>>>>>> + nloop->safelen = loop->safelen; >>>>>>>> >>>>>>>> I see no guard on force_vectorize so = true looks bogus here. Please just use >>>>>>>> copy_loop_info. >>>>>>>> >>>>>>>> + if (integer_nonzerop (cond_new)) >>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node); >>>>>>>> + else if (integer_zerop (cond_new)) >>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_false_node); >>>>>>>> >>>>>>>> gimple_cond_make_true/false (cond_stmt); >>>>>>>> >>>>>>>> btw, seems odd that we have to recompute which loop is the true / false variant >>>>>>>> when we just fed a guard condition to loop_version. Can't we statically >>>>>>>> determine whether loop or nloop has the in-loop condition true or false? >>>>>>>> >>>>>>>> + /* Clean-up cfg to remove useless one-argument phi in exit block of >>>>>>>> + outer-loop. */ >>>>>>>> + cleanup_tree_cfg (); >>>>>>>> >>>>>>>> I know unswitching is already O(number-of-unswitched-loops * size-of-function) >>>>>>>> because it updates SSA form after each individual unswitching (and it does that >>>>>>>> because it invokes itself recursively on unswitched loops). But do you really >>>>>>>> need to invoke CFG cleanup here? >>>>>>>> >>>>>>>> Richard. >>>>>>>> >>>>>>>>> Yuri. >>>>>>>>> >>>>>>>>> 2015-07-14 14:06 GMT+03:00 Richard Biener : >>>>>>>>>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev wrote: >>>>>>>>>>> Hi All, >>>>>>>>>>> >>>>>>>>>>> Here is presented simple transformation which tries to hoist out of >>>>>>>>>>> outer-loop a check on zero trip count for inner-loop. This is very >>>>>>>>>>> restricted transformation since it accepts outer-loops with very >>>>>>>>>>> simple cfg, as for example: >>>>>>>>>>> acc = 0; >>>>>>>>>>> for (i = 1; i <= m; i++) { >>>>>>>>>>> for (j = 0; j < n; j++) >>>>>>>>>>> if (l[j] == i) { v[j] = acc; acc++; }; >>>>>>>>>>> acc <<= 1; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> Note that degenerative outer loop (without inner loop) will be >>>>>>>>>>> completely deleted as dead code. >>>>>>>>>>> The main goal of this transformation was to convert outer-loop to form >>>>>>>>>>> accepted by outer-loop vectorization (such test-case is also included >>>>>>>>>>> to patch). >>>>>>>>>>> >>>>>>>>>>> Bootstrap and regression testing did not show any new failures. >>>>>>>>>>> >>>>>>>>>>> Is it OK for trunk? >>>>>>>>>> >>>>>>>>>> I think this is >>>>>>>>>> >>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855 >>>>>>>>>> >>>>>>>>>> as well. It has a patch adding a invariant loop guard hoisting >>>>>>>>>> phase to loop-header copying. Yeah, it needs updating to >>>>>>>>>> trunk again I suppose. It's always non-stage1 when I come >>>>>>>>>> back to that patch. >>>>>>>>>> >>>>>>>>>> Your patch seems to be very specific and only handles outer >>>>>>>>>> loops of innermost loops. >>>>>>>>>> >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>>> ChangeLog: >>>>>>>>>>> 2015-07-10 Yuri Rumyantsev >>>>>>>>>>> >>>>>>>>>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and >>>>>>>>>>> "gimple-iterator.h", add prototype for tree_unswitch_outer_loop. >>>>>>>>>>> (tree_ssa_unswitch_loops): Add invoke of tree_unswitch_outer_loop. >>>>>>>>>>> (tree_unswitch_outer_loop): New function. >>>>>>>>>>> >>>>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test. >>>>>>>>>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.