From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 121510 invoked by alias); 5 Oct 2015 13:13:30 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 121497 invoked by uid 89); 5 Oct 2015 13:13:29 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-oi0-f52.google.com Received: from mail-oi0-f52.google.com (HELO mail-oi0-f52.google.com) (209.85.218.52) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Mon, 05 Oct 2015 13:13:25 +0000 Received: by oibi136 with SMTP id i136so90475915oib.3 for ; Mon, 05 Oct 2015 06:13:23 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.202.86.208 with SMTP id k199mr16456464oib.102.1444050802952; Mon, 05 Oct 2015 06:13:22 -0700 (PDT) Received: by 10.202.191.6 with HTTP; Mon, 5 Oct 2015 06:13:22 -0700 (PDT) In-Reply-To: References: Date: Mon, 05 Oct 2015 13:13:00 -0000 Message-ID: Subject: Re: [PATCH] Unswitching outer loops. From: Yuri Rumyantsev To: Richard Biener Cc: gcc-patches , Igor Zamyatin Content-Type: text/plain; charset=UTF-8 X-SW-Source: 2015-10/txt/msg00406.txt.bz2 Thanks Richard. I'd like to answer on your last comment related to using of exit edge argument for edge that skips loop. Let's consider the following test-case: #include #define N 32 float *foo(int ustride, int size, float *src) { float *buffer, *p; int i, k; if (!src) return NULL; buffer = (float *) malloc(N * size * sizeof(float)); if(buffer) for(i=0, p=buffer; i: # _6 = PHI <0B(8), buffer_20(16)> return _6; It is clear that we must preserve function semantic and transform it to _6 = PHI <0B(12), buffer_19(9), buffer_19(4)> 2015-10-05 13:57 GMT+03:00 Richard Biener : > On Wed, Sep 30, 2015 at 12:46 PM, Yuri Rumyantsev wrote: >> Hi Richard, >> >> I re-designed outer loop unswitching using basic idea of 23855 patch - >> hoist invariant guard if loop is empty without guard. Note that this >> was added to loop unswitching pass with simple modifications - using >> another loop iterator etc. >> >> Bootstrap and regression testing did not show any new failures. >> What is your opinion? > > Overall it looks good. Some comments below - a few more testcases would > be nice as well. > > + /* Loop must not be infinite. */ > + if (!finite_loop_p (loop)) > + return false; > > why's that? > > + body = get_loop_body_in_dom_order (loop); > + for (i = 0; i < loop->num_nodes; i++) > + { > + if (body[i]->loop_father != loop) > + continue; > + if (!empty_bb_without_guard_p (loop, body[i])) > > I wonder if there is a better way to iterate over the interesting > blocks and PHIs > we need to check for side-effects (and thus we maybe can avoid gathering > the loop in DOM order). > > + FOR_EACH_SSA_TREE_OPERAND (name, stmt, op_iter, SSA_OP_DEF) > + { > + if (may_be_used_outside > > may_be_used_outside can be hoisted above the loop. I wonder if we can take > advantage of loop-closed SSA form here (and the fact we have a single exit > from the loop). Iterating over exit dest PHIs and determining whether the > exit edge DEF is inside the loop part it may not be should be enough. > > + gcc_assert (single_succ_p (pre_header)); > > that should be always true. > > + gsi_remove (&gsi, false); > + bb = guard->dest; > + remove_edge (guard); > + /* Update dominance for destination of GUARD. */ > + if (EDGE_COUNT (bb->preds) == 0) > + { > + basic_block s_bb; > + gcc_assert (single_succ_p (bb)); > + s_bb = single_succ (bb); > + delete_basic_block (bb); > + if (single_pred_p (s_bb)) > + set_immediate_dominator (CDI_DOMINATORS, s_bb, single_pred (s_bb)); > > all this massaging should be simplified by leaving it to CFG cleanup by > simply adjusting the CONDs condition to always true/false. There is > gimple_cond_make_{true,false} () for this (would be nice to have a variant > taking a bool). > > + new_edge = make_edge (pre_header, exit->dest, flags); > + if (fix_dom_of_exit) > + set_immediate_dominator (CDI_DOMINATORS, exit->dest, pre_header); > + update_stmt (gsi_stmt (gsi)); > > the update_stmt should be not necessary, it's done by gsi_insert_after already. > > + /* Add NEW_ADGE argument for all phi in post-header block. */ > + bb = exit->dest; > + for (gphi_iterator gsi = gsi_start_phis (bb); > + !gsi_end_p (gsi); gsi_next (&gsi)) > + { > + gphi *phi = gsi.phi (); > + /* edge_iterator ei; */ > + tree arg; > + if (virtual_operand_p (gimple_phi_result (phi))) > + { > + arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop)); > + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION); > + } > + else > + { > + /* Use exit edge argument. */ > + arg = PHI_ARG_DEF_FROM_EDGE (phi, exit); > + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION); > > Hum. How is it ok to use the exit edge argument for the edge that skips > the loop? Why can't you always use the pre-header edge value? > That is, if we have > > for(i=0;i { > if (n > 0) > { > for (;;) > { > } > } > } > ... = i; > > then i is used after the loop and the correct value to use if > n > 0 is false is '0'. Maybe this way we can also relax > what check_exit_phi does? IMHO the only restriction is > if sth defined inside the loop before the header check for > the inner loop is used after the loop. > > Thanks, > Richard. > >> Thanks. >> >> ChangeLog: >> 2015-09-30 Yuri Rumyantsev >> >> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and >> "cfghooks.h", add prototypes for introduced new functions. >> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all >> checks on ability of loop unswitching to tree_unswitch_single_loop; >> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending >> on innermost loop check. >> (tree_unswitch_single_loop): Add all required checks on ability of >> loop unswitching under zero recursive level guard. >> (tree_unswitch_outer_loop): New function. >> (find_loop_guard): Likewise. >> (empty_bb_without_guard_p): Likewise. >> (used_outside_loop_p): Likewise. >> (hoist_guard): Likewise. >> (check_exit_phi): Likewise. >> >> gcc/testsuite/ChangeLog: >> * gcc.dg/loop-unswitch-2.c: New test. >> >> 2015-09-16 11:26 GMT+03:00 Richard Biener : >>> Yeah, as said, the patch wasn't fully ready and it also felt odd to do >>> this hoisting in loop header copying. Integrating it >>> with LIM would be a better fit eventually. >>> >>> Note that we did agree to go forward with your original patch just >>> making it more "generically" perform outer loop >>> unswitching. Did you explore that idea further? >>> >>> >>> >>> On Tue, Sep 15, 2015 at 6:00 PM, Yuri Rumyantsev wrote: >>>> Thanks Richard. >>>> >>>> I found one more issue that could not be fixed simply. In 23855 you >>>> consider the following test-case: >>>> void foo(int *ie, int *je, double *x) >>>> { >>>> int i, j; >>>> for (j=0; j<*je; ++j) >>>> for (i=0; i<*ie; ++i) >>>> x[i+j] = 0.0; >>>> } >>>> and proposed to hoist up a check on *ie out of loop. It requires >>>> memref alias analysis since in general x and ie can alias (if their >>>> types are compatible - int *ie & int * x). Such analysis is performed >>>> by pre or lim passes. Without such analysis we can not hoist a test on >>>> non-zero for *ie out of loop using 238565 patch. >>>> The second concern is that proposed copy header algorithm changes >>>> loop structure significantly and it is not accepted by vectorizer >>>> since latch is not empty (such transformation assumes loop peeling for >>>> one iteration. So I can propose to implement simple guard hoisting >>>> without copying header and tail blocks (if it is possible). >>>> >>>> I will appreciate you for any advice or help since without such >>>> hoisting we are not able to perform outer loop vectorization for >>>> important benchmark. >>>> and >>>> >>>> 2015-09-15 14:22 GMT+03:00 Richard Biener : >>>>> On Thu, Sep 3, 2015 at 6:32 PM, Yuri Rumyantsev wrote: >>>>>> Hi Richard, >>>>>> >>>>>> I started learning, tuning and debugging patch proposed in 23855 and >>>>>> discovered thta it does not work properly. >>>>>> So I wonder is it tested patch and it should work? >>>>> >>>>> I don't remember, but as it wasn't committed it certainly wasn't ready. >>>>> >>>>>> Should it accept for hoisting the following loop nest >>>>>> for (i=0; i>>>>> s = 0; >>>>>> for (j=0; j>>>>> s += a[i] * b[j]; >>>>>> c[i] = s; >>>>>> } >>>>>> Note that i-loop will nit be empty if m is equal to 0. >>>>> >>>>> if m is equal to 0 then we still have the c[i] = s store, no? Of course >>>>> we could unswitch the outer loop on m == 0 but simple hoisting wouldn't work. >>>>> >>>>> Richard. >>>>> >>>>>> 2015-08-03 10:27 GMT+03:00 Richard Biener : >>>>>>> On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev wrote: >>>>>>>> Hi Richard, >>>>>>>> >>>>>>>> I learned your updated patch for 23825 and it is more general in >>>>>>>> comparison with my. >>>>>>>> I'd like to propose you a compromise - let's consider my patch only >>>>>>>> for force-vectorize outer loop only to allow outer-loop >>>>>>>> vecctorization. >>>>>>> >>>>>>> I don't see why we should special-case that if the approach in 23825 >>>>>>> is sensible. >>>>>>> >>>>>>>> Note that your approach will not hoist invariant >>>>>>>> guards if loops contains something else except for inner-loop, i.e. it >>>>>>>> won't be empty for taken branch. >>>>>>> >>>>>>> Yes, it does not perform unswitching but guard hoisting. Note that this >>>>>>> is originally Zdenek Dvoraks patch. >>>>>>> >>>>>>>> I also would like to answer on your last question - CFG cleanup is >>>>>>>> invoked to perform deletion of single-argument phi nodes from tail >>>>>>>> block through substitution - such phi's prevent outer-loop >>>>>>>> vectorization. But it is clear that such transformation can be done >>>>>>>> other pass. >>>>>>> >>>>>>> Hmm, I wonder why the copy_prop pass after unswitching does not >>>>>>> get rid of them? >>>>>>> >>>>>>>> What is your opinion? >>>>>>> >>>>>>> My opinion is that if we want to enhance unswitching to catch this >>>>>>> (or similar) cases then we should make it a lot more general than >>>>>>> your pattern-matching approach. I see nothing that should prevent >>>>>>> us from considering unswitching non-innermost loops in general. >>>>>>> It should be only a cost consideration to not do non-innermost loop >>>>>>> unswitching (in addition to maybe a --param specifying the maximum >>>>>>> depth of a loop nest to unswitch). >>>>>>> >>>>>>> So my first thought when seeing your patch still holds - the patch >>>>>>> looks very much too specific. >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> Yuri. >>>>>>>> >>>>>>>> 2015-07-28 13:50 GMT+03:00 Richard Biener : >>>>>>>>> On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev wrote: >>>>>>>>>> Hi Richard, >>>>>>>>>> >>>>>>>>>> I checked that both test-cases from 23855 are sucessfully unswitched >>>>>>>>>> by proposed patch. I understand that it does not catch deeper loop >>>>>>>>>> nest as >>>>>>>>>> for (i=0; i<10; i++) >>>>>>>>>> for (j=0;j>>>>>>>>> for (k=0;k<20;k++) >>>>>>>>>> ... >>>>>>>>>> but duplication of middle-loop does not look reasonable. >>>>>>>>>> >>>>>>>>>> Here is dump for your second test-case: >>>>>>>>>> >>>>>>>>>> void foo(int *ie, int *je, double *x) >>>>>>>>>> { >>>>>>>>>> int i, j; >>>>>>>>>> for (j=0; j<*je; ++j) >>>>>>>>>> for (i=0; i<*ie; ++i) >>>>>>>>>> x[i+j] = 0.0; >>>>>>>>>> } >>>>>>>>>> grep -i unswitch t6.c.119t.unswitch >>>>>>>>>> ;; Unswitching outer loop >>>>>>>>> >>>>>>>>> I was saying that why go with a limited approach when a patch (in >>>>>>>>> unknown state...) >>>>>>>>> is available that does it more generally? Also unswitching is quite >>>>>>>>> expensive compared >>>>>>>>> to "moving" the invariant condition. >>>>>>>>> >>>>>>>>> In your patch: >>>>>>>>> >>>>>>>>> + if (!nloop->force_vectorize) >>>>>>>>> + nloop->force_vectorize = true; >>>>>>>>> + if (loop->safelen != 0) >>>>>>>>> + nloop->safelen = loop->safelen; >>>>>>>>> >>>>>>>>> I see no guard on force_vectorize so = true looks bogus here. Please just use >>>>>>>>> copy_loop_info. >>>>>>>>> >>>>>>>>> + if (integer_nonzerop (cond_new)) >>>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node); >>>>>>>>> + else if (integer_zerop (cond_new)) >>>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_false_node); >>>>>>>>> >>>>>>>>> gimple_cond_make_true/false (cond_stmt); >>>>>>>>> >>>>>>>>> btw, seems odd that we have to recompute which loop is the true / false variant >>>>>>>>> when we just fed a guard condition to loop_version. Can't we statically >>>>>>>>> determine whether loop or nloop has the in-loop condition true or false? >>>>>>>>> >>>>>>>>> + /* Clean-up cfg to remove useless one-argument phi in exit block of >>>>>>>>> + outer-loop. */ >>>>>>>>> + cleanup_tree_cfg (); >>>>>>>>> >>>>>>>>> I know unswitching is already O(number-of-unswitched-loops * size-of-function) >>>>>>>>> because it updates SSA form after each individual unswitching (and it does that >>>>>>>>> because it invokes itself recursively on unswitched loops). But do you really >>>>>>>>> need to invoke CFG cleanup here? >>>>>>>>> >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>>> Yuri. >>>>>>>>>> >>>>>>>>>> 2015-07-14 14:06 GMT+03:00 Richard Biener : >>>>>>>>>>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev wrote: >>>>>>>>>>>> Hi All, >>>>>>>>>>>> >>>>>>>>>>>> Here is presented simple transformation which tries to hoist out of >>>>>>>>>>>> outer-loop a check on zero trip count for inner-loop. This is very >>>>>>>>>>>> restricted transformation since it accepts outer-loops with very >>>>>>>>>>>> simple cfg, as for example: >>>>>>>>>>>> acc = 0; >>>>>>>>>>>> for (i = 1; i <= m; i++) { >>>>>>>>>>>> for (j = 0; j < n; j++) >>>>>>>>>>>> if (l[j] == i) { v[j] = acc; acc++; }; >>>>>>>>>>>> acc <<= 1; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> Note that degenerative outer loop (without inner loop) will be >>>>>>>>>>>> completely deleted as dead code. >>>>>>>>>>>> The main goal of this transformation was to convert outer-loop to form >>>>>>>>>>>> accepted by outer-loop vectorization (such test-case is also included >>>>>>>>>>>> to patch). >>>>>>>>>>>> >>>>>>>>>>>> Bootstrap and regression testing did not show any new failures. >>>>>>>>>>>> >>>>>>>>>>>> Is it OK for trunk? >>>>>>>>>>> >>>>>>>>>>> I think this is >>>>>>>>>>> >>>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855 >>>>>>>>>>> >>>>>>>>>>> as well. It has a patch adding a invariant loop guard hoisting >>>>>>>>>>> phase to loop-header copying. Yeah, it needs updating to >>>>>>>>>>> trunk again I suppose. It's always non-stage1 when I come >>>>>>>>>>> back to that patch. >>>>>>>>>>> >>>>>>>>>>> Your patch seems to be very specific and only handles outer >>>>>>>>>>> loops of innermost loops. >>>>>>>>>>> >>>>>>>>>>> Richard. >>>>>>>>>>> >>>>>>>>>>>> ChangeLog: >>>>>>>>>>>> 2015-07-10 Yuri Rumyantsev >>>>>>>>>>>> >>>>>>>>>>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and >>>>>>>>>>>> "gimple-iterator.h", add prototype for tree_unswitch_outer_loop. >>>>>>>>>>>> (tree_ssa_unswitch_loops): Add invoke of tree_unswitch_outer_loop. >>>>>>>>>>>> (tree_unswitch_outer_loop): New function. >>>>>>>>>>>> >>>>>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>>>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test. >>>>>>>>>>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.