From: Richard Biener <rguenther@suse.de>
To: Xionghu Luo <luoxhu@linux.ibm.com>
Cc: segher@kernel.crashing.org, wschmidt@linux.ibm.com,
linkw@gcc.gnu.org, gcc-patches@gcc.gnu.org, hubicka@ucw.cz
Subject: Re: [PATCH] Fix loop split incorrect count and probability
Date: Tue, 26 Oct 2021 13:59:59 +0200 (CEST) [thread overview]
Message-ID: <npq92284-nns0-4057-n8s6-2qr240p44p0@fhfr.qr> (raw)
In-Reply-To: <3c9541c2-a9f5-3280-2b5c-6687889372fa@linux.ibm.com>
On Tue, 26 Oct 2021, Xionghu Luo wrote:
>
>
> On 2021/10/21 18:55, Richard Biener wrote:
> > On Thu, 21 Oct 2021, Xionghu Luo wrote:
> >
> >>
> >>
> >> On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote:
> >>>
> >>>
> >>> On 2021/9/23 20:17, Richard Biener wrote:
> >>>> On Wed, 22 Sep 2021, Xionghu Luo wrote:
> >>>>
> >>>>>
> >>>>>
> >>>>> On 2021/8/11 17:16, Richard Biener wrote:
> >>>>>> On Wed, 11 Aug 2021, Xionghu Luo wrote:
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 2021/8/10 22:47, Richard Biener wrote:
> >>>>>>>> On Mon, 9 Aug 2021, Xionghu Luo wrote:
> >>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> On 2021/8/6 19:46, Richard Biener wrote:
> >>>>>>>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
> >>>>>>>>>>
> >>>>>>>>>>> loop split condition is moved between loop1 and loop2, the split bb's
> >>>>>>>>>>> count and probability should also be duplicated instead of (100% vs
> >>>>>>>>>>> INV),
> >>>>>>>>>>> secondly, the original loop1 and loop2 count need be propotional from
> >>>>>>>>>>> the
> >>>>>>>>>>> original loop.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> diff base/loop-cond-split-1.c.151t.lsplit
> >>>>>>>>>>> patched/loop-cond-split-1.c.151t.lsplit:
> >>>>>>>>>>> ...
> >>>>>>>>>>> int prephitmp_16;
> >>>>>>>>>>> int prephitmp_25;
> >>>>>>>>>>>
> >>>>>>>>>>> <bb 2> [local count: 118111600]:
> >>>>>>>>>>> if (n_7(D) > 0)
> >>>>>>>>>>> goto <bb 4>; [89.00%]
> >>>>>>>>>>> else
> >>>>>>>>>>> goto <bb 3>; [11.00%]
> >>>>>>>>>>>
> >>>>>>>>>>> <bb 3> [local count: 118111600]:
> >>>>>>>>>>> return;
> >>>>>>>>>>>
> >>>>>>>>>>> <bb 4> [local count: 105119324]:
> >>>>>>>>>>> pretmp_3 = ga;
> >>>>>>>>>>>
> >>>>>>>>>>> - <bb 5> [local count: 955630225]:
> >>>>>>>>>>> + <bb 5> [local count: 315357973]:
> >>>>>>>>>>> # i_13 = PHI <i_10(20), 0(4)>
> >>>>>>>>>>> # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
> >>>>>>>>>>> if (prephitmp_12 != 0)
> >>>>>>>>>>> goto <bb 6>; [33.00%]
> >>>>>>>>>>> else
> >>>>>>>>>>> goto <bb 7>; [67.00%]
> >>>>>>>>>>>
> >>>>>>>>>>> - <bb 6> [local count: 315357972]:
> >>>>>>>>>>> + <bb 6> [local count: 104068130]:
> >>>>>>>>>>> _2 = do_something ();
> >>>>>>>>>>> ga = _2;
> >>>>>>>>>>>
> >>>>>>>>>>> - <bb 7> [local count: 955630225]:
> >>>>>>>>>>> + <bb 7> [local count: 315357973]:
> >>>>>>>>>>> # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
> >>>>>>>>>>> i_10 = inc (i_13);
> >>>>>>>>>>> if (n_7(D) > i_10)
> >>>>>>>>>>> goto <bb 21>; [89.00%]
> >>>>>>>>>>> else
> >>>>>>>>>>> goto <bb 11>; [11.00%]
> >>>>>>>>>>>
> >>>>>>>>>>> <bb 11> [local count: 105119324]:
> >>>>>>>>>>> goto <bb 3>; [100.00%]
> >>>>>>>>>>>
> >>>>>>>>>>> - <bb 21> [local count: 850510901]:
> >>>>>>>>>>> + <bb 21> [local count: 280668596]:
> >>>>>>>>>>> if (prephitmp_12 != 0)
> >>>>>>>>>>> - goto <bb 20>; [100.00%]
> >>>>>>>>>>> + goto <bb 20>; [33.00%]
> >>>>>>>>>>> else
> >>>>>>>>>>> - goto <bb 19>; [INV]
> >>>>>>>>>>> + goto <bb 19>; [67.00%]
> >>>>>>>>>>>
> >>>>>>>>>>> - <bb 20> [local count: 850510901]:
> >>>>>>>>>>> + <bb 20> [local count: 280668596]:
> >>>>>>>>>>> goto <bb 5>; [100.00%]
> >>>>>>>>>>>
> >>>>>>>>>>> - <bb 19> [count: 0]:
> >>>>>>>>>>> + <bb 19> [local count: 70429947]:
> >>>>>>>>>>> # i_23 = PHI <i_10(21)>
> >>>>>>>>>>> # prephitmp_25 = PHI <prephitmp_5(21)>
> >>>>>>>>>>>
> >>>>>>>>>>> - <bb 12> [local count: 955630225]:
> >>>>>>>>>>> + <bb 12> [local count: 640272252]:
> >>>>>>>>>>> # i_15 = PHI <i_23(19), i_22(16)>
> >>>>>>>>>>> # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
> >>>>>>>>>>> i_22 = inc (i_15);
> >>>>>>>>>>> if (n_7(D) > i_22)
> >>>>>>>>>>> goto <bb 16>; [89.00%]
> >>>>>>>>>>> else
> >>>>>>>>>>> goto <bb 11>; [11.00%]
> >>>>>>>>>>>
> >>>>>>>>>>> - <bb 16> [local count: 850510901]:
> >>>>>>>>>>> + <bb 16> [local count: 569842305]:
> >>>>>>>>>>> goto <bb 12>; [100.00%]
> >>>>>>>>>>>
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>> gcc/ChangeLog:
> >>>>>>>>>>>
> >>>>>>>>>>> * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> >>>>>>>>>>> (do_split_loop_on_cond): Likewise.
> >>>>>>>>>>> ---
> >>>>>>>>>>> gcc/tree-ssa-loop-split.c | 16 ++++++++--------
> >>>>>>>>>>> 1 file changed, 8 insertions(+), 8 deletions(-)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> >>>>>>>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644
> >>>>>>>>>>> --- a/gcc/tree-ssa-loop-split.c
> >>>>>>>>>>> +++ b/gcc/tree-ssa-loop-split.c
> >>>>>>>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
> >>>>>>>>>>> basic_block cond_bb;
> >>>>>>>>>
> >>>>>>>>> if (!initial_true)
> >>>>>>>>> - cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >>>>>>>>> + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >>>>>>>>> +
> >>>>>>>>> + edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> >>>>>>>>> + ? EDGE_SUCC (bbs[i], 0)
> >>>>>>>>> + : EDGE_SUCC (bbs[i], 1);
> >>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> >>>>>>>>>>> - profile_probability::always
> >>>>>>>>>>> (),
> >>>>>>>>>>> - profile_probability::always
> >>>>>>>>>>> (),
> >>>>>>>>>>> - profile_probability::always
> >>>>>>>>>>> (),
> >>>>>>>>>>> - profile_probability::always
> >>>>>>>>>>> (),
> >>>>>>>>>>> + true_edge->probability,
> >>>>>>>>>>> +
> >>>>>>>>>>> true_edge->probability.invert (),
> >>>>>>>>>>> + true_edge->probability,
> >>>>>>>>>>> +
> >>>>>>>>>>> true_edge->probability.invert (),
> >>>>>>>>>>> true);
> >>>>>>>>>>
> >>>>>>>>>> there is no 'true_edge' variable at this point.
> >>>>>>>>>
> >>>>>>>>> Sorry, missed the above hunk when split the patch.
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> gcc_assert (loop2);
> >>>>>>>>>>> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1,
> >>>>>>>>>>> edge invar_branch)
> >>>>>>>>>>> initialize_original_copy_tables ();
> >>>>>>>>>>>
> >>>>>>>>>>> struct loop *loop2 = loop_version (loop1, boolean_true_node,
> >>>>>>>>>>> NULL,
> >>>>>>>>>>> - profile_probability::always (),
> >>>>>>>>>>> - profile_probability::never (),
> >>>>>>>>>>> - profile_probability::always (),
> >>>>>>>>>>> - profile_probability::always (),
> >>>>>>>>>>> + invar_branch->probability.invert
> >>>>>>>>>>> (),
> >>>>>>>>>>> + invar_branch->probability,
> >>>>>>>>>>> + invar_branch->probability.invert
> >>>>>>>>>>> (),
> >>>>>>>>>>> + invar_branch->probability,
> >>>>>>>>>>> true);
> >>>>>>>>>>> if (!loop2)
> >>>>>>>>>>> {
> >>>>>>>>>>
> >>>>>>>>>> The patch introduction seems to talk about do_split_loop_on_cond only.
> >>>>>>>>>
> >>>>>>>>> split_loop faces similar issue though it sets the two branches to 100% vs
> >>>>>>>>> 100%
> >>>>>>>>> and no scaling which seems also incorrect.
> >>>>>>>>>
> >>>>>>>>>> Since loop versioning inserts a condition with the passed probabilities
> >>>>>>>>>> but in this case a 'boolean_true_node' condition the then and else
> >>>>>>>>>> probabilities passed look correct. It's just the scaling arguments
> >>>>>>>>>> that look wrong? This loop_version call should get a comment as to
> >>>>>>>>>> why we are passing probabilities the way we do.
> >>>>>>>>>
> >>>>>>>>> This optimization is transforming from:
> >>>>>>>>>
> >>>>>>>>> for (i = 0; i < n; i = inc (i))
> >>>>>>>>> {
> >>>>>>>>> if (ga)
> >>>>>>>>> ga = do_something ();
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> to:
> >>>>>>>>>
> >>>>>>>>> for (i = 0; i < x; i = inc (i))
> >>>>>>>>> {
> >>>>>>>>> if (true)
> >>>>>>>>> ga = do_something ();
> >>>>>>>>> if (!ga)
> >>>>>>>>> break;
> >>>>>>>>> }
> >>>>>>>>> for (; i < n; i = inc (i))
> >>>>>>>>> {
> >>>>>>>>> if (false)
> >>>>>>>>> ga = do_something ();
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> `boolean_true_node` is passed in to make the first loop's condition
> >>>>>>>>> statement to be 'true', after returning from loop_version, there is a
> >>>>>>>>> piece of code forcing the condition in second loop to be false,
> >>>>>>>>> and the original condition is moved from *in loop* to *exit edge*
> >>>>>>>>> between loop1 and loop2.
> >>>>>>>>
> >>>>>>>> Yes, one complication is that we use loop_version but do not retain
> >>>>>>>> the CFG it creates. Something like the vectorizers
> >>>>>>>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
> >>>>>>>> but then that code doesn't do any scaling yet. But then
> >>>>>>>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
> >>>>>>>> we could write a variant that simply doesn't mangle the CFG
> >>>>>>>> with a new condition switching between both loops but keeps them
> >>>>>>>> executed after each other ...
> >>>>>>>>
> >>>>>>>> As said, this adds to the confusion and some awkwardness.
> >>>>>>>
> >>>>>>> Then loop_version in loop split requires two types of variant, one
> >>>>>>> is to insert condition to loop preheader for 'split_loops' usage,
> >>>>>>> another is to insert condition to loop exit for 'do_split_loop_on_cond'
> >>>>>>> usage, it needs one extra function to encapsulate these cfg alterations
> >>>>>>> from outside to inside.
> >>>>>>>
> >>>>>>> unswitching only execute one loop as it only moves the invariant condition
> >>>>>>> to first loop's pre-header. While 'split_loops' and
> >>>>>>> 'do_split_loop_on_cond'
> >>>>>>> may execute both loops no matter the condition is moved to the first loop's
> >>>>>>> preheader or exit.
> >>>>>>>
> >>>>>>> The condition stmt in loop unswitching is invariant, but it is variant
> >>>>>>> in loop splitting, that's why loop unswitching execute only one loop
> >>>>>>> but loop splitting executes both loops.
> >>>>>>>
> >>>>>>> Seems we need two condition arguments for loop_version, one for connecting
> >>>>>>> loop1 preheader to loop2 preheader, another one for connecting loop1's exit
> >>>>>>> to loop2's header? Then it will be more generic for both unswitching pass
> >>>>>>> and splitting pass. Looks a bit complicated and conflicted with
> >>>>>>> loop_version's
> >>>>>>> comments:
> >>>>>>>
> >>>>>>> /* Main entry point for Loop Versioning transformation.
> >>>>>>>
> >>>>>>> This transformation given a condition and a loop, creates
> >>>>>>> -if (condition) { loop_copy1 } else { loop_copy2 }, ... */
> >>>>>>>
> >>>>>>>
> >>>>>>> And this only works for loop split usage, those many other places
> >>>>>>> doesn't use loop_version like this?
> >>>>>>
> >>>>>> Yes, as said if you don't want the above CFG then you probably
> >>>>>> shouldn't use loop_version but instead use its building blocks
> >>>>>> (and some refactoring of loop_version can make that easier).
> >>>>>>
> >>>>>> I think splitting wants
> >>>>>>
> >>>>>> loop_copy1
> >>>>>> if (condition)
> >>>>>> loop_copy2
> >>>>>>
> >>>>>> IMHO it would be good to split 'loopify' into the actual loopification
> >>>>>> and the scaling. Basically make the part of loop_version that
> >>>>>> copies the loop on the header edge and creates a loop structure for
> >>>>>> it separate.
> >>>>>>
> >>>>>> loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
> >>>>>> (copy_loop_before).
> >>>>>>
> >>>>>
> >>>>> Unfortunately slpeel_tree_duplicate_loop_to_edge_cfg only supports
> >>>>> copying loops with single exit, it would cause many ICEs in it even
> >>>>> building GCC stage1 (line 1065, line 1184 due to exit or new_exit
> >>>>> is NULL returning from single_exit (loop).). Seems loop_version is
> >>>>> more flexible for loop split.
> >>>>
> >>>> Hmm, sure - loop_version does not need to do anything special with
> >>>> exits since control flow either enters the original or the loop copy.
> >>>>
> >>>> But slpeel_tree_duplicate_loop_to_edge_cfg intends to create
> >>>> control flow that enters _both_ loops, so it needs to have
> >>>> an edge from the first loop exit to the second loop entry.
> >>>>
> >>>> One could extend this to a region copy, copying eventual CFG merges
> >>>> of exits and specifying which exit of a SEME region is supposed
> >>>> to be connected to the original region entry.
> >>>>
> >>>> After all that's what loop splitting needs in the end - though not
> >>>> sure what exactly it does with more than one exit.
> >>>
> >>> In tree-ssa-loop-split.c, split_loop only accepts single exit loop,
> >>> the recently added split_loop_on_cond could process multiple exits loop.
> >>>
> >>> For example, do some changes to the loop-cond-split-1.c,
> >>>
> >>> int ga;
> >>> extern int a;
> >>> extern int b;
> >>> extern int c;
> >>>
> >>> void test1 (int n) {
> >>> int i;
> >>> for (i = 0; i < n; i = inc (i)). {
> >>> if (a+3 > 0)
> >>> break;
> >>>
> >>> if (ga)
> >>> ga = do_something ();
> >>>
> >>> for (int j = 0; j < n; j++)
> >>> {
> >>> b+=5;
> >>> if (b > c) break;
> >>> }
> >>> }
> >>> }
> >>>
> >>> the "if (ga)" will be a new exit edge from loop_copy1 to loop_copy2.
> >>> I am not sure whether it is valuable to do semi-invariant loop split to such
> >>> cases with multiple exits, but obviously the split_loop_on_cond is a special
> >>> case from split_loop both duplicate loop to
> >>> if (condition1) {loop_copy1} if (condition2) {loop_copy2}
> >>> The only difference is condition1 is true for split_loop_on_cond.
> >>>
> >>>>
> >>>> So there's another set of "loop" copying, gimple_duplicate_sese_region,
> >>>> which doesn't actually require a single exit but a single "important"
> >>>> exit. That might match how you treat multiple exits.
> >>>
> >>> gimple_duplicate_sese_region doesn't handle subloops and latches. Finally,
> >>> I found that loop_version is still much better
> >>> than slpeel_tree_duplicate_loop_to_edge_cfg and gimple_duplicate_sese_region
> >>> since it could handle all cases like multiple exits/subloops, etc. I did some
> >>> refactor to the code to introduce loop_version2 to create duplicated loops
> >>> with two input conditions as attached patch, is this reasonable enough?
> >>>
> >>> I also tried to copy the code in loop_version out of it to don't call loop_version
> >>> in loop_version2, but it seems useless with many duplicate code and NOT get rid
> >>> of creating "if (condition1) {loop_copy1}" at first?
> >>>
> >>>
> >>
> >> The previous attached patch 0001-Add-loop_version2-to-support-loop-transformation-wit.patch
> >> had issues when connecting two loops, reason is split_loop connect_loops at *exit* edge,
> >> do_split_loop_on_cond connect_loops at latch_edge. So they have different CFGs even
> >> both with two conditions :(. It seems not so that useful to also define another new function
> >> 'connect_loops_at_latch' given the limited usage in loop split only? And the loop_version2
> >> also looks not so general again ...
> >
> > All I wanted to say is that none of the current high-level APIs we have
> > are a 1:1 match for loop splitting and that they in fact might do
> > stuff that's unnecessary or counter-productive. If inventing a new API
> > doesn't sound appealing then maybe refactoring an existing
> > (maybe loop_version) to expose re-usable pieces would make more sense.
> >
> > For example loop_version currently does lv_adjust_loop_entry_edge
> > before it loopifys the copy inserted on the header. I wonder if
> > the condition generation can be done later and thus we can
> > have three pieces - 1) duplicating the loop on the entry edge,
> > 2) adjusting the CFG to insert a condition branching to either loop,
> > 3) from loopify extract the scale_loop_frequencies bits (in fact
> > loopify is only used from within cfgloopmanip.c)
> >
> > That said, it shouldn't be a requirement to do all this to fix the
> > profile for loop splitting it's just that the current situation
> > does not help understanding of how the adjustment works.
>
> The attached patch 0002-Refactor-loop_version.patch is to refactor loop_version
> according to your above comments.
>
> loopify is moved before condition generation, scale_loop_frequencies is
> extracted out of loopify.
I think that's a good improvement (and if loopify is unused after the
patch it should be removed together with it). The loop copying part
could then be a new clone_loop_to_header_edge function, or loop_copy
(rather than loop_version).
The existing duplicate_loop_to_header_edge function is named badly
since it does duplicate_loop_body_to_header_edge ...
> 0001-Fix-loop-split-incorrect-count-and-probability.patch is still the initial
> version that fixes the probability and frequency issues in loop split, just a
> repeat, both passed regression test on P8LE, OK for master?
That's still the original patch, I don't see any of the comments addressed
and I do not have enough knowledge to approve the patch.
Sorry here, but I really hoped that Honza would eventually chime in.
Richard.
next prev parent reply other threads:[~2021-10-26 12:00 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-03 8:58 Xionghu Luo
2021-08-04 2:42 ` Xionghu Luo
2021-08-06 11:46 ` Richard Biener
2021-08-09 2:37 ` Xionghu Luo
2021-08-10 14:47 ` Richard Biener
2021-08-11 3:03 ` Feng Xue OS
2021-10-26 13:05 ` Jan Hubicka
2021-10-27 1:42 ` Xionghu Luo
2021-08-11 8:32 ` Xionghu Luo
2021-08-11 9:16 ` Richard Biener
2021-08-12 3:24 ` Xionghu Luo
2021-09-22 8:40 ` Xionghu Luo
2021-09-23 12:17 ` Richard Biener
2021-10-15 5:51 ` Xionghu Luo
2021-10-21 8:43 ` Xionghu Luo
2021-10-21 10:55 ` Richard Biener
2021-10-26 5:40 ` Xionghu Luo
2021-10-26 11:59 ` Richard Biener [this message]
2021-10-26 12:19 ` Jan Hubicka
2021-08-09 4:33 ` Feng Xue OS
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=npq92284-nns0-4057-n8s6-2qr240p44p0@fhfr.qr \
--to=rguenther@suse.de \
--cc=gcc-patches@gcc.gnu.org \
--cc=hubicka@ucw.cz \
--cc=linkw@gcc.gnu.org \
--cc=luoxhu@linux.ibm.com \
--cc=segher@kernel.crashing.org \
--cc=wschmidt@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).