From: Richard Biener <richard.guenther@gmail.com>
To: Yuri Rumyantsev <ysrumyan@gmail.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,
Igor Zamyatin <izamyatin@gmail.com>
Subject: Re: [PATCH] Unswitching outer loops.
Date: Mon, 05 Oct 2015 10:57:00 -0000 [thread overview]
Message-ID: <CAFiYyc1RS-Yjr=J__J3u5=nmbD+d9vxJgc198um8ZA_EHn-v2g@mail.gmail.com> (raw)
In-Reply-To: <CAEoMCqQzyKtyT5N_LCsO3w9W65Tz4RcWXFp52ek=Z3r06AFdJQ@mail.gmail.com>
On Wed, Sep 30, 2015 at 12:46 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Hi Richard,
>
> I re-designed outer loop unswitching using basic idea of 23855 patch -
> hoist invariant guard if loop is empty without guard. Note that this
> was added to loop unswitching pass with simple modifications - using
> another loop iterator etc.
>
> Bootstrap and regression testing did not show any new failures.
> What is your opinion?
Overall it looks good. Some comments below - a few more testcases would
be nice as well.
+ /* Loop must not be infinite. */
+ if (!finite_loop_p (loop))
+ return false;
why's that?
+ body = get_loop_body_in_dom_order (loop);
+ for (i = 0; i < loop->num_nodes; i++)
+ {
+ if (body[i]->loop_father != loop)
+ continue;
+ if (!empty_bb_without_guard_p (loop, body[i]))
I wonder if there is a better way to iterate over the interesting
blocks and PHIs
we need to check for side-effects (and thus we maybe can avoid gathering
the loop in DOM order).
+ FOR_EACH_SSA_TREE_OPERAND (name, stmt, op_iter, SSA_OP_DEF)
+ {
+ if (may_be_used_outside
may_be_used_outside can be hoisted above the loop. I wonder if we can take
advantage of loop-closed SSA form here (and the fact we have a single exit
from the loop). Iterating over exit dest PHIs and determining whether the
exit edge DEF is inside the loop part it may not be should be enough.
+ gcc_assert (single_succ_p (pre_header));
that should be always true.
+ gsi_remove (&gsi, false);
+ bb = guard->dest;
+ remove_edge (guard);
+ /* Update dominance for destination of GUARD. */
+ if (EDGE_COUNT (bb->preds) == 0)
+ {
+ basic_block s_bb;
+ gcc_assert (single_succ_p (bb));
+ s_bb = single_succ (bb);
+ delete_basic_block (bb);
+ if (single_pred_p (s_bb))
+ set_immediate_dominator (CDI_DOMINATORS, s_bb, single_pred (s_bb));
all this massaging should be simplified by leaving it to CFG cleanup by
simply adjusting the CONDs condition to always true/false. There is
gimple_cond_make_{true,false} () for this (would be nice to have a variant
taking a bool).
+ new_edge = make_edge (pre_header, exit->dest, flags);
+ if (fix_dom_of_exit)
+ set_immediate_dominator (CDI_DOMINATORS, exit->dest, pre_header);
+ update_stmt (gsi_stmt (gsi));
the update_stmt should be not necessary, it's done by gsi_insert_after already.
+ /* Add NEW_ADGE argument for all phi in post-header block. */
+ bb = exit->dest;
+ for (gphi_iterator gsi = gsi_start_phis (bb);
+ !gsi_end_p (gsi); gsi_next (&gsi))
+ {
+ gphi *phi = gsi.phi ();
+ /* edge_iterator ei; */
+ tree arg;
+ if (virtual_operand_p (gimple_phi_result (phi)))
+ {
+ arg = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
+ add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
+ }
+ else
+ {
+ /* Use exit edge argument. */
+ arg = PHI_ARG_DEF_FROM_EDGE (phi, exit);
+ add_phi_arg (phi, arg, new_edge, UNKNOWN_LOCATION);
Hum. How is it ok to use the exit edge argument for the edge that skips
the loop? Why can't you always use the pre-header edge value?
That is, if we have
for(i=0;i<m;++i)
{
if (n > 0)
{
for (;;)
{
}
}
}
... = i;
then i is used after the loop and the correct value to use if
n > 0 is false is '0'. Maybe this way we can also relax
what check_exit_phi does? IMHO the only restriction is
if sth defined inside the loop before the header check for
the inner loop is used after the loop.
Thanks,
Richard.
> Thanks.
>
> ChangeLog:
> 2015-09-30 Yuri Rumyantsev <ysrumyan@gmail.com>
>
> * tree-ssa-loop-unswitch.c: Include "gimple-iterator.h" and
> "cfghooks.h", add prototypes for introduced new functions.
> (tree_ssa_unswitch_loops): Use from innermost loop iterator, move all
> checks on ability of loop unswitching to tree_unswitch_single_loop;
> invoke tree_unswitch_single_loop or tree_unswitch_outer_loop depending
> on innermost loop check.
> (tree_unswitch_single_loop): Add all required checks on ability of
> loop unswitching under zero recursive level guard.
> (tree_unswitch_outer_loop): New function.
> (find_loop_guard): Likewise.
> (empty_bb_without_guard_p): Likewise.
> (used_outside_loop_p): Likewise.
> (hoist_guard): Likewise.
> (check_exit_phi): Likewise.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/loop-unswitch-2.c: New test.
>
> 2015-09-16 11:26 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> Yeah, as said, the patch wasn't fully ready and it also felt odd to do
>> this hoisting in loop header copying. Integrating it
>> with LIM would be a better fit eventually.
>>
>> Note that we did agree to go forward with your original patch just
>> making it more "generically" perform outer loop
>> unswitching. Did you explore that idea further?
>>
>>
>>
>> On Tue, Sep 15, 2015 at 6:00 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Thanks Richard.
>>>
>>> I found one more issue that could not be fixed simply. In 23855 you
>>> consider the following test-case:
>>> void foo(int *ie, int *je, double *x)
>>> {
>>> int i, j;
>>> for (j=0; j<*je; ++j)
>>> for (i=0; i<*ie; ++i)
>>> x[i+j] = 0.0;
>>> }
>>> and proposed to hoist up a check on *ie out of loop. It requires
>>> memref alias analysis since in general x and ie can alias (if their
>>> types are compatible - int *ie & int * x). Such analysis is performed
>>> by pre or lim passes. Without such analysis we can not hoist a test on
>>> non-zero for *ie out of loop using 238565 patch.
>>> The second concern is that proposed copy header algorithm changes
>>> loop structure significantly and it is not accepted by vectorizer
>>> since latch is not empty (such transformation assumes loop peeling for
>>> one iteration. So I can propose to implement simple guard hoisting
>>> without copying header and tail blocks (if it is possible).
>>>
>>> I will appreciate you for any advice or help since without such
>>> hoisting we are not able to perform outer loop vectorization for
>>> important benchmark.
>>> and
>>>
>>> 2015-09-15 14:22 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Thu, Sep 3, 2015 at 6:32 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Hi Richard,
>>>>>
>>>>> I started learning, tuning and debugging patch proposed in 23855 and
>>>>> discovered thta it does not work properly.
>>>>> So I wonder is it tested patch and it should work?
>>>>
>>>> I don't remember, but as it wasn't committed it certainly wasn't ready.
>>>>
>>>>> Should it accept for hoisting the following loop nest
>>>>> for (i=0; i<n; i++) {
>>>>> s = 0;
>>>>> for (j=0; j<m; j++)
>>>>> s += a[i] * b[j];
>>>>> c[i] = s;
>>>>> }
>>>>> Note that i-loop will nit be empty if m is equal to 0.
>>>>
>>>> if m is equal to 0 then we still have the c[i] = s store, no? Of course
>>>> we could unswitch the outer loop on m == 0 but simple hoisting wouldn't work.
>>>>
>>>> Richard.
>>>>
>>>>> 2015-08-03 10:27 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>> On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>> Hi Richard,
>>>>>>>
>>>>>>> I learned your updated patch for 23825 and it is more general in
>>>>>>> comparison with my.
>>>>>>> I'd like to propose you a compromise - let's consider my patch only
>>>>>>> for force-vectorize outer loop only to allow outer-loop
>>>>>>> vecctorization.
>>>>>>
>>>>>> I don't see why we should special-case that if the approach in 23825
>>>>>> is sensible.
>>>>>>
>>>>>>> Note that your approach will not hoist invariant
>>>>>>> guards if loops contains something else except for inner-loop, i.e. it
>>>>>>> won't be empty for taken branch.
>>>>>>
>>>>>> Yes, it does not perform unswitching but guard hoisting. Note that this
>>>>>> is originally Zdenek Dvoraks patch.
>>>>>>
>>>>>>> I also would like to answer on your last question - CFG cleanup is
>>>>>>> invoked to perform deletion of single-argument phi nodes from tail
>>>>>>> block through substitution - such phi's prevent outer-loop
>>>>>>> vectorization. But it is clear that such transformation can be done
>>>>>>> other pass.
>>>>>>
>>>>>> Hmm, I wonder why the copy_prop pass after unswitching does not
>>>>>> get rid of them?
>>>>>>
>>>>>>> What is your opinion?
>>>>>>
>>>>>> My opinion is that if we want to enhance unswitching to catch this
>>>>>> (or similar) cases then we should make it a lot more general than
>>>>>> your pattern-matching approach. I see nothing that should prevent
>>>>>> us from considering unswitching non-innermost loops in general.
>>>>>> It should be only a cost consideration to not do non-innermost loop
>>>>>> unswitching (in addition to maybe a --param specifying the maximum
>>>>>> depth of a loop nest to unswitch).
>>>>>>
>>>>>> So my first thought when seeing your patch still holds - the patch
>>>>>> looks very much too specific.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Yuri.
>>>>>>>
>>>>>>> 2015-07-28 13:50 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>> On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>> Hi Richard,
>>>>>>>>>
>>>>>>>>> I checked that both test-cases from 23855 are sucessfully unswitched
>>>>>>>>> by proposed patch. I understand that it does not catch deeper loop
>>>>>>>>> nest as
>>>>>>>>> for (i=0; i<10; i++)
>>>>>>>>> for (j=0;j<n;j++)
>>>>>>>>> for (k=0;k<20;k++)
>>>>>>>>> ...
>>>>>>>>> but duplication of middle-loop does not look reasonable.
>>>>>>>>>
>>>>>>>>> Here is dump for your second test-case:
>>>>>>>>>
>>>>>>>>> void foo(int *ie, int *je, double *x)
>>>>>>>>> {
>>>>>>>>> int i, j;
>>>>>>>>> for (j=0; j<*je; ++j)
>>>>>>>>> for (i=0; i<*ie; ++i)
>>>>>>>>> x[i+j] = 0.0;
>>>>>>>>> }
>>>>>>>>> grep -i unswitch t6.c.119t.unswitch
>>>>>>>>> ;; Unswitching outer loop
>>>>>>>>
>>>>>>>> I was saying that why go with a limited approach when a patch (in
>>>>>>>> unknown state...)
>>>>>>>> is available that does it more generally? Also unswitching is quite
>>>>>>>> expensive compared
>>>>>>>> to "moving" the invariant condition.
>>>>>>>>
>>>>>>>> In your patch:
>>>>>>>>
>>>>>>>> + if (!nloop->force_vectorize)
>>>>>>>> + nloop->force_vectorize = true;
>>>>>>>> + if (loop->safelen != 0)
>>>>>>>> + nloop->safelen = loop->safelen;
>>>>>>>>
>>>>>>>> I see no guard on force_vectorize so = true looks bogus here. Please just use
>>>>>>>> copy_loop_info.
>>>>>>>>
>>>>>>>> + if (integer_nonzerop (cond_new))
>>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node);
>>>>>>>> + else if (integer_zerop (cond_new))
>>>>>>>> + gimple_cond_set_condition_from_tree (cond_stmt, boolean_false_node);
>>>>>>>>
>>>>>>>> gimple_cond_make_true/false (cond_stmt);
>>>>>>>>
>>>>>>>> btw, seems odd that we have to recompute which loop is the true / false variant
>>>>>>>> when we just fed a guard condition to loop_version. Can't we statically
>>>>>>>> determine whether loop or nloop has the in-loop condition true or false?
>>>>>>>>
>>>>>>>> + /* Clean-up cfg to remove useless one-argument phi in exit block of
>>>>>>>> + outer-loop. */
>>>>>>>> + cleanup_tree_cfg ();
>>>>>>>>
>>>>>>>> I know unswitching is already O(number-of-unswitched-loops * size-of-function)
>>>>>>>> because it updates SSA form after each individual unswitching (and it does that
>>>>>>>> because it invokes itself recursively on unswitched loops). But do you really
>>>>>>>> need to invoke CFG cleanup here?
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> Yuri.
>>>>>>>>>
>>>>>>>>> 2015-07-14 14:06 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>>>>>>> On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> Here is presented simple transformation which tries to hoist out of
>>>>>>>>>>> outer-loop a check on zero trip count for inner-loop. This is very
>>>>>>>>>>> restricted transformation since it accepts outer-loops with very
>>>>>>>>>>> simple cfg, as for example:
>>>>>>>>>>> acc = 0;
>>>>>>>>>>> for (i = 1; i <= m; i++) {
>>>>>>>>>>> for (j = 0; j < n; j++)
>>>>>>>>>>> if (l[j] == i) { v[j] = acc; acc++; };
>>>>>>>>>>> acc <<= 1;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Note that degenerative outer loop (without inner loop) will be
>>>>>>>>>>> completely deleted as dead code.
>>>>>>>>>>> The main goal of this transformation was to convert outer-loop to form
>>>>>>>>>>> accepted by outer-loop vectorization (such test-case is also included
>>>>>>>>>>> to patch).
>>>>>>>>>>>
>>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>>
>>>>>>>>>>> Is it OK for trunk?
>>>>>>>>>>
>>>>>>>>>> I think this is
>>>>>>>>>>
>>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855
>>>>>>>>>>
>>>>>>>>>> as well. It has a patch adding a invariant loop guard hoisting
>>>>>>>>>> phase to loop-header copying. Yeah, it needs updating to
>>>>>>>>>> trunk again I suppose. It's always non-stage1 when I come
>>>>>>>>>> back to that patch.
>>>>>>>>>>
>>>>>>>>>> Your patch seems to be very specific and only handles outer
>>>>>>>>>> loops of innermost loops.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> ChangeLog:
>>>>>>>>>>> 2015-07-10 Yuri Rumyantsev <ysrumyan@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> * tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and
>>>>>>>>>>> "gimple-iterator.h", add prototype for tree_unswitch_outer_loop.
>>>>>>>>>>> (tree_ssa_unswitch_loops): Add invoke of tree_unswitch_outer_loop.
>>>>>>>>>>> (tree_unswitch_outer_loop): New function.
>>>>>>>>>>>
>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>> * gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test.
>>>>>>>>>>> * gcc.dg/vect/vect-outer-simd-3.c: New test.
next prev parent reply other threads:[~2015-10-05 10:57 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-07-10 10:03 Yuri Rumyantsev
2015-07-14 11:07 ` Richard Biener
2015-07-23 15:21 ` Yuri Rumyantsev
2015-07-28 11:00 ` Richard Biener
2015-07-31 12:07 ` Yuri Rumyantsev
2015-07-31 15:54 ` Jeff Law
2015-08-03 7:27 ` Richard Biener
[not found] ` <CAEoMCqSorkh1WmFtVB_huC2hbcVr8uc1EYaRaNVe1g+5hVuzPw@mail.gmail.com>
[not found] ` <CAFiYyc1nCCyF-4BH2hPWkKpmXnaQFQ34RMM5TTuHjZxZ25crrA@mail.gmail.com>
[not found] ` <CAEoMCqSRsER9ZGgnX9eJgZJyN4EwkpxzWWk1FHRxWNiEW0HVCg@mail.gmail.com>
[not found] ` <CAFiYyc2O9i690A0LZ0+jEOP8nkyz8Btc0YAb469aMgnRaVsmsQ@mail.gmail.com>
2015-09-30 11:40 ` Yuri Rumyantsev
2015-10-05 10:57 ` Richard Biener [this message]
2015-10-05 13:13 ` Yuri Rumyantsev
2015-10-06 7:59 ` Richard Biener
2015-10-06 11:41 ` Yuri Rumyantsev
2015-10-06 12:21 ` Richard Biener
2015-10-07 9:53 ` Yuri Rumyantsev
2015-10-07 15:26 ` Yuri Rumyantsev
2015-10-08 12:31 ` Richard Biener
2015-10-09 19:05 ` H.J. Lu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAFiYyc1RS-Yjr=J__J3u5=nmbD+d9vxJgc198um8ZA_EHn-v2g@mail.gmail.com' \
--to=richard.guenther@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=izamyatin@gmail.com \
--cc=ysrumyan@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).