From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 129896 invoked by alias); 6 Oct 2015 07:59:06 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 129809 invoked by uid 89); 6 Oct 2015 07:59:05 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-yk0-f181.google.com Received: from mail-yk0-f181.google.com (HELO mail-yk0-f181.google.com) (209.85.160.181) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 06 Oct 2015 07:59:02 +0000 Received: by ykdg206 with SMTP id g206so194072263ykd.1 for ; Tue, 06 Oct 2015 00:59:00 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.129.125.6 with SMTP id y6mr28922885ywc.5.1444118340555; Tue, 06 Oct 2015 00:59:00 -0700 (PDT) Received: by 10.37.93.136 with HTTP; Tue, 6 Oct 2015 00:59:00 -0700 (PDT) In-Reply-To: References: Date: Tue, 06 Oct 2015 07:59:00 -0000 Message-ID: Subject: Re: [PATCH] Unswitching outer loops. From: Richard Biener To: Yuri Rumyantsev Cc: gcc-patches , Igor Zamyatin Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2015-10/txt/msg00502.txt.bz2 On Mon, Oct 5, 2015 at 3:13 PM, Yuri Rumyantsev wrote: > Thanks Richard. > I'd like to answer on your last comment related to using of exit edge > argument for edge that skips loop. > Let's consider the following test-case: > > #include > #define N 32 > float *foo(int ustride, int size, float *src) > { > float *buffer, *p; > int i, k; > > if (!src) > return NULL; > > buffer = (float *) malloc(N * size * sizeof(float)); > > if(buffer) > for(i=0, p=buffer; i for(k=0; k *p++ = src[k]; > > return buffer; > } > > Before adding new edge we have in post-header bb: > : > # _6 = PHI <0B(8), buffer_20(16)> > return _6; > > It is clear that we must preserve function semantic and transform it to > _6 = PHI <0B(12), buffer_19(9), buffer_19(4)> Ah, yeah. I was confusing the loop exit of the inner vs. the outer loop. Richard. > > 2015-10-05 13:57 GMT+03:00 Richard Biener : >> On Wed, Sep 30, 2015 at 12:46 PM, Yuri Rumyantsev wrote: >>> Hi Richard, >>> >>> I re-designed outer loop unswitching using basic idea of 23855 patch - >>> hoist invariant guard if loop is empty without guard. Note that this >>> was added to loop unswitching pass with simple modifications - using >>> another loop iterator etc. >>> >>> Bootstrap and regression testing did not show any new failures. >>> What is your opinion? >> >> Overall it looks good. Some comments below - a few more testcases would >> be nice as well. >> >> + /* Loop must not be infinite. */ >> + if (!finite_loop_p (loop)) >> + return false; >> >> why's that? >> >> + body = get_loop_body_in_dom_order (loop); >> + for (i = 0; i < loop->num_nodes; i++) >> + { >> + if (body[i]->loop_father != loop) >> + continue; >> + if (!empty_bb_without_guard_p (loop, body[i])) >> >> I wonder if there is a better way to iterate over the interesting >> blocks and PHIs >> we need to check for side-effects (and thus we maybe can avoid gathering >> the loop in DOM order). >> >> + FOR_EACH_SSA_TREE_OPERAND (name, stmt, op_iter, SSA_OP_DEF) >> + { >> + if (may_be_used_outside >> >> may_be_used_outside can be hoisted above the loop. I wonder if we can take >> advantage of loop-closed SSA form here (and the fact we have a single exit >> from the loop). Iterating over exit dest PHIs and determining whether the >> exit edge DEF is inside the loop part it may not be should be enough. >> >> + gcc_assert (single_succ_p (pre_header)); >> >> that should be always true. >> >> + gsi_remove (&gsi, false); >> + bb = guard->dest; >> + remove_edge (guard); >> + /* Update dominance for destination of GUARD. */ >> + if (EDGE_COUNT (bb->preds) == 0) >> + { >> + basic_block s_bb; >> + gcc_assert (single_succ_p (bb)); >> + s_bb = single_succ (bb); >> + delete_basic_block (bb); >> + if (single_pred_p (s_bb)) >> + set_immediate_dominator (CDI_DOMINATORS, s_bb, single_pred (s_bb)); >> >> all this massaging should be simplified by leaving it to CFG cleanup by >> simply adjusting the CONDs condition to always true/false. There is >> gimple_cond_make_{true,false} () for this (would be nice to have a variant >> taking a bool). >> >> + new_edge = make_edge (pre_header, exit->dest, flags); >> + if (fix_dom_of_exit) >> + set_immediate_dominator (CDI_DOMINATORS, exit->dest, pre_header); >> + update_stmt (gsi_stmt (gsi)); >> >> the update_stmt should be not necessary, it's done by gsi_insert_after already. >> >> + /* Add NEW_ADGE argument for all phi in post-header block. */ >> + bb = exit->dest; >> + for (gphi_iterator gsi = gsi_start_phis (bb); >> + !gsi_end_p (gsi); gsi_next (&gsi)) >> + { >> + gphi *phi = gsi.phi (); >> + /* edge_iterator ei; */ >> + tree arg; >> + if (virtual_operand_p (gimple_phi_result (phi))) >> + { >> + arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop)); >> + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION); >> + } >> + else >> + { >> + /* Use exit edge argument. */ >> + arg = PHI_ARG_DEF_FROM_EDGE (phi, exit); >> + add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION); >> >> Hum. How is it ok to use the exit edge argument for the edge that skips >> the loop? Why can't you always use the pre-header edge value? >> That is, if we have >> >> for(i=0;i> { >> if (n > 0) >> { >> for (;;) >> { >> } >> } >> } >> ... = i; >> >> then i is used after the loop and the correct value to use if >> n > 0 is false is '0'. Maybe this way we can also relax >> what check_exit_phi does? IMHO the only restriction is >> if sth defined inside the loop before the header check for >> the inner loop is used after the loop. >> >> Thanks, >> Richard. >> >>> Thanks. >>> >>> ChangeLog: >>> 2015-09-30 Yuri Rumyantsev >>> >>> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and >>> "cfghooks.h", add prototypes for introduced new functions. >>> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all >>> checks on ability of loop unswitching to tree_unswitch_single_loop; >>> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending >>> on innermost loop check. >>> (tree_unswitch_single_loop): Add all required checks on ability of >>> loop unswitching under zero recursive level guard. >>> (tree_unswitch_outer_loop): New function. >>> (find_loop_guard): Likewise. >>> (empty_bb_without_guard_p): Likewise. >>> (used_outside_loop_p): Likewise. >>> (hoist_guard): Likewise. >>> (check_exit_phi): Likewise. >>> >>> gcc/testsuite/ChangeLog: >>> * gcc.dg/loop-unswitch-2.c: New test. >>> >>> 2015-09-16 11:26 GMT+03:00 Richard Biener : >>>> Yeah, as said, the patch wasn't fully ready and it also felt odd to do >>>> this hoisting in loop header copying. Integrating it >>>> with LIM would be a better fit eventually. >>>> >>>> Note that we did agree to go forward with your original patch just >>>> making it more "generically" perform outer loop >>>> unswitching. Did you explore that idea further? >>>> >>>> >>>> >>>> On Tue, Sep 15, 2015 at 6:00 PM, Yuri Rumyantsev wrote: >>>>> Thanks Richard. >>>>> >>>>> I found one more issue that could not be fixed simply. In 23855 you >>>>> consider the following test-case: >>>>> void foo(int *ie, int *je, double *x) >>>>> { >>>>> int i, j; >>>>> for (j=0; j<*je; ++j) >>>>> for (i=0; i<*ie; ++i) >>>>> x[i+j] = 0.0; >>>>> } >>>>> and proposed to hoist up a check on *ie out of loop. It requires >>>>> memref alias analysis since in general x and ie can alias (if their >>>>> types are compatible - int *ie & int * x). Such analysis is performed >>>>> by pre or lim passes. Without such analysis we can not hoist a test on >>>>> non-zero for *ie out of loop using 238565 patch. >>>>> The second concern is that proposed copy header algorithm changes >>>>> loop structure significantly and it is not accepted by vectorizer >>>>> since latch is not empty (such transformation assumes loop peeling for >>>>> one iteration. So I can propose to implement simple guard hoisting >>>>> without copying header and tail blocks (if it is possible). >>>>> >>>>> I will appreciate you for any advice or help since without such >>>>> hoisting we are not able to perform outer loop vectorization for >>>>> important benchmark. >>>>> and >>>>> >>>>> 2015-09-15 14:22 GMT+03:00 Richard Biener : >>>>>> On Thu, Sep 3, 2015 at 6:32 PM, Yuri Rumyantsev wrote: >>>>>>> Hi Richard, >>>>>>> >>>>>>> I started learning, tuning and debugging patch proposed in 23855 and >>>>>>> discovered thta it does not work properly. >>>>>>> So I wonder is it tested patch and it should work? >>>>>> >>>>>> I don't remember, but as it wasn't committed it certainly wasn't ready. >>>>>> >>>>>>> Should it accept for hoisting the following loop nest >>>>>>> for (i=0; i>>>>>> s = 0; >>>>>>> for (j=0; j>>>>>> s += a[i] * b[j]; >>>>>>> c[i] = s; >>>>>>> } >>>>>>> Note that i-loop will nit be empty if m is equal to 0. >>>>>> >>>>>> if m is equal to 0 then we still have the c[i] = s store, no? Of course >>>>>> we could unswitch the outer loop on m == 0 but simple hoisting wouldn't work. >>>>>> >>>>>> Richard. >>>>>> >>>>>>> 2015-08-03 10:27 GMT+03:00 Richard Biener : >>>>>>>> On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev wrote: >>>>>>>>> Hi Richard, >>>>>>>>> >>>>>>>>> I learned your updated patch for 23825 and it is more general in >>>>>>>>> comparison with my. >>>>>>>>> I'd like to propose you a compromise - let's consider my patch only >>>>>>>>> for force-vectorize outer loop only to allow outer-loop >>>>>>>>> vecctorization. >>>>>>>> >>>>>>>> I don't see why we should special-case that if the approach in 23825 >>>>>>>> is sensible. >>>>>>>> >>>>>>>>> Note that your approach will not hoist invariant >>>>>>>>> guards if loops contains something else except for inner-loop, i.e. it >>>>>>>>> won't be empty for taken branch. >>>>>>>> >>>>>>>> Yes, it does not perform unswitching but guard hoisting. Note that this >>>>>>>> is originally Zdenek Dvoraks patch. >>>>>>>> >>>>>>>>> I also would like to answer on your last question - CFG cleanup is >>>>>>>>> invoked to perform deletion of single-argument phi nodes from tail >>>>>>>>> block through substitution - such phi's prevent outer-loop >>>>>>>>> vectorization. But it is clear that such transformation can be done >>>>>>>>> other pass. >>>>>>>> >>>>>>>> Hmm, I wonder why the copy_prop pass after unswitching does not >>>>>>>> get rid of them? >>>>>>>> >>>>>>>>> What is your opinion? >>>>>>>> >>>>>>>> My opinion is that if we want to enhance unswitching to catch this >>>>>>>> (or similar) cases then we should make it a lot more general than >>>>>>>> your pattern-matching approach. I see nothing that should prevent >>>>>>>> us from considering unswitching non-innermost loops in general. >>>>>>>> It should be only a cost consideration to not do non-innermost loop >>>>>>>> unswitching (in addition to maybe a --param specifying the maximum >>>>>>>> depth of a loop nest to unswitch). >>>>>>>> >>>>>>>> So my first thought when seeing your patch still holds - the patch >>>>>>>> looks very much too specific. >>>>>>>> >>>>>>>> Richard. >>>>>>>> >>>>>>>>> Yuri. >>>>>>>>> >>>>>>>>> 2015-07-28 13:50 GMT+03:00 Richard Biener : >>>>>>>>>> On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev wrote: >>>>>>>>>>> Hi Richard, >>>>>>>>>>> >>>>>>>>>>> I checked that both test-cases from 23855 are sucessfully unswitched >>>>>>>>>>> by proposed patch. I understand that it does not catch deeper loop >>>>>>>>>>> nest as >>>>>>>>>>> for (i=0; i<10; i++) >>>>>>>>>>> for (j=0;j>>>>>>>>>> for (k=0;k<20;k++) >>>>>>>>>>> ... >>>>>>>>>>> but duplication of middle-loop does not look reasonable. >>>>>>>>>>> >>>>>>>>>>> Here is dump for your second test-case: >>>>>>>>>>> >>>>>>>>>>> void foo(int *ie, int *je, double *x) >>>>>>>>>>> { >>>>>>>>>>> int i, j; >>>>>>>>>>> for (j=0; j<*je; ++j) >>>>>>>>>>> for (i=0; i<*ie; ++i) >>>>>>>>>>> x[i+j] = 0.0; >>>>>>>>>>> } >>>>>>>>>>> grep -i unswitch t6.c.119t.unswitch >>>>>>>>>>> ;; Unswitching outer loop >>>>>>>>>> >>>>>>>>>> I was saying that why go with a limited approach when a patch (in >>>>>>>>>> unknown state...) >>>>>>>>>> is available that does it more generally? Also unswitching is quite >>>>>>>>>> expensive compared >>>>>>>>>> to "moving" the invariant condition. >>>>>>>>>> >>>>>>>>>> In your patch: >>>>>>>>>> >>>>>>>>>> + if (!nloop->force_vectorize) >>>>>>>>>> + nloop->force_vectorize = true; >>>>>>>>>> + if (loop->safelen != 0) >>>>>>>>>> + nloop->safelen = loop->safelen; >>>>>>>>>> >>>>>>>>>> I see no guard on force_vectorize so = true looks bogus here. Please just use >>>>>>>>>> copy_loop_info. >>>>>>>>>> >>>>>>>>>> + if (integer_nonzerop (cond_new)) >>>>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node); >>>>>>>>>> + else if (integer_zerop (cond_new)) >>>>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_false_node); >>>>>>>>>> >>>>>>>>>> gimple_cond_make_true/false (cond_stmt); >>>>>>>>>> >>>>>>>>>> btw, seems odd that we have to recompute which loop is the true / false variant >>>>>>>>>> when we just fed a guard condition to loop_version. Can't we statically >>>>>>>>>> determine whether loop or nloop has the in-loop condition true or false? >>>>>>>>>> >>>>>>>>>> + /* Clean-up cfg to remove useless one-argument phi in exit block of >>>>>>>>>> + outer-loop. */ >>>>>>>>>> + cleanup_tree_cfg (); >>>>>>>>>> >>>>>>>>>> I know unswitching is already O(number-of-unswitched-loops * size-of-function) >>>>>>>>>> because it updates SSA form after each individual unswitching (and it does that >>>>>>>>>> because it invokes itself recursively on unswitched loops). But do you really >>>>>>>>>> need to invoke CFG cleanup here? >>>>>>>>>> >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>>> Yuri. >>>>>>>>>>> >>>>>>>>>>> 2015-07-14 14:06 GMT+03:00 Richard Biener : >>>>>>>>>>>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev wrote: >>>>>>>>>>>>> Hi All, >>>>>>>>>>>>> >>>>>>>>>>>>> Here is presented simple transformation which tries to hoist out of >>>>>>>>>>>>> outer-loop a check on zero trip count for inner-loop. This is very >>>>>>>>>>>>> restricted transformation since it accepts outer-loops with very >>>>>>>>>>>>> simple cfg, as for example: >>>>>>>>>>>>> acc = 0; >>>>>>>>>>>>> for (i = 1; i <= m; i++) { >>>>>>>>>>>>> for (j = 0; j < n; j++) >>>>>>>>>>>>> if (l[j] == i) { v[j] = acc; acc++; }; >>>>>>>>>>>>> acc <<= 1; >>>>>>>>>>>>> } >>>>>>>>>>>>> >>>>>>>>>>>>> Note that degenerative outer loop (without inner loop) will be >>>>>>>>>>>>> completely deleted as dead code. >>>>>>>>>>>>> The main goal of this transformation was to convert outer-loop to form >>>>>>>>>>>>> accepted by outer-loop vectorization (such test-case is also included >>>>>>>>>>>>> to patch). >>>>>>>>>>>>> >>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures. >>>>>>>>>>>>> >>>>>>>>>>>>> Is it OK for trunk? >>>>>>>>>>>> >>>>>>>>>>>> I think this is >>>>>>>>>>>> >>>>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855 >>>>>>>>>>>> >>>>>>>>>>>> as well. It has a patch adding a invariant loop guard hoisting >>>>>>>>>>>> phase to loop-header copying. Yeah, it needs updating to >>>>>>>>>>>> trunk again I suppose. It's always non-stage1 when I come >>>>>>>>>>>> back to that patch. >>>>>>>>>>>> >>>>>>>>>>>> Your patch seems to be very specific and only handles outer >>>>>>>>>>>> loops of innermost loops. >>>>>>>>>>>> >>>>>>>>>>>> Richard. >>>>>>>>>>>> >>>>>>>>>>>>> ChangeLog: >>>>>>>>>>>>> 2015-07-10 Yuri Rumyantsev >>>>>>>>>>>>> >>>>>>>>>>>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and >>>>>>>>>>>>> "gimple-iterator.h", add prototype for tree_unswitch_outer_loop. >>>>>>>>>>>>> (tree_ssa_unswitch_loops): Add invoke of tree_unswitch_outer_loop. >>>>>>>>>>>>> (tree_unswitch_outer_loop): New function. >>>>>>>>>>>>> >>>>>>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>>>>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test. >>>>>>>>>>>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.