From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id 732D6385F033; Wed, 11 Aug 2021 09:16:42 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 732D6385F033 Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 6CD082216A; Wed, 11 Aug 2021 09:16:41 +0000 (UTC) Received: from murzim.suse.de (murzim.suse.de [10.160.4.192]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 14E45A3BA7; Wed, 11 Aug 2021 09:16:41 +0000 (UTC) Date: Wed, 11 Aug 2021 11:16:40 +0200 (CEST) From: Richard Biener To: Xionghu Luo cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org, fxue@os.amperecomputing.com, wschmidt@linux.ibm.com, guojiufu@linux.ibm.com, linkw@gcc.gnu.org, hubicka@ucw.cz Subject: Re: [PATCH] Fix loop split incorrect count and probability In-Reply-To: Message-ID: References: <20210803085813.2217766-1-luoxhu@linux.ibm.com> <58f9e351-fc10-406e-5272-270ece73be71@linux.ibm.com> User-Agent: Alpine 2.21 (LSU 202 2017-01-01) MIME-Version: 1.0 X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8BIT X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Aug 2021 09:16:53 -0000 On Wed, 11 Aug 2021, Xionghu Luo wrote: > > > On 2021/8/10 22:47, Richard Biener wrote: > > On Mon, 9 Aug 2021, Xionghu Luo wrote: > > > >> Thanks, > >> > >> On 2021/8/6 19:46, Richard Biener wrote: > >>> On Tue, 3 Aug 2021, Xionghu Luo wrote: > >>> > >>>> loop split condition is moved between loop1 and loop2, the split bb's > >>>> count and probability should also be duplicated instead of (100% vs INV), > >>>> secondly, the original loop1 and loop2 count need be propotional from the > >>>> original loop. > >>>> > >>>> > >>>> diff base/loop-cond-split-1.c.151t.lsplit patched/loop-cond-split-1.c.151t.lsplit: > >>>> ... > >>>> int prephitmp_16; > >>>> int prephitmp_25; > >>>> > >>>> [local count: 118111600]: > >>>> if (n_7(D) > 0) > >>>> goto ; [89.00%] > >>>> else > >>>> goto ; [11.00%] > >>>> > >>>> [local count: 118111600]: > >>>> return; > >>>> > >>>> [local count: 105119324]: > >>>> pretmp_3 = ga; > >>>> > >>>> - [local count: 955630225]: > >>>> + [local count: 315357973]: > >>>> # i_13 = PHI > >>>> # prephitmp_12 = PHI > >>>> if (prephitmp_12 != 0) > >>>> goto ; [33.00%] > >>>> else > >>>> goto ; [67.00%] > >>>> > >>>> - [local count: 315357972]: > >>>> + [local count: 104068130]: > >>>> _2 = do_something (); > >>>> ga = _2; > >>>> > >>>> - [local count: 955630225]: > >>>> + [local count: 315357973]: > >>>> # prephitmp_5 = PHI > >>>> i_10 = inc (i_13); > >>>> if (n_7(D) > i_10) > >>>> goto ; [89.00%] > >>>> else > >>>> goto ; [11.00%] > >>>> > >>>> [local count: 105119324]: > >>>> goto ; [100.00%] > >>>> > >>>> - [local count: 850510901]: > >>>> + [local count: 280668596]: > >>>> if (prephitmp_12 != 0) > >>>> - goto ; [100.00%] > >>>> + goto ; [33.00%] > >>>> else > >>>> - goto ; [INV] > >>>> + goto ; [67.00%] > >>>> > >>>> - [local count: 850510901]: > >>>> + [local count: 280668596]: > >>>> goto ; [100.00%] > >>>> > >>>> - [count: 0]: > >>>> + [local count: 70429947]: > >>>> # i_23 = PHI > >>>> # prephitmp_25 = PHI > >>>> > >>>> - [local count: 955630225]: > >>>> + [local count: 640272252]: > >>>> # i_15 = PHI > >>>> # prephitmp_16 = PHI > >>>> i_22 = inc (i_15); > >>>> if (n_7(D) > i_22) > >>>> goto ; [89.00%] > >>>> else > >>>> goto ; [11.00%] > >>>> > >>>> - [local count: 850510901]: > >>>> + [local count: 569842305]: > >>>> goto ; [100.00%] > >>>> > >>>> } > >>>> > >>>> gcc/ChangeLog: > >>>> > >>>> * tree-ssa-loop-split.c (split_loop): Fix incorrect probability. > >>>> (do_split_loop_on_cond): Likewise. > >>>> --- > >>>> gcc/tree-ssa-loop-split.c | 16 ++++++++-------- > >>>> 1 file changed, 8 insertions(+), 8 deletions(-) > >>>> > >>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c > >>>> index 3a09bbc39e5..8e5a7ded0f7 100644 > >>>> --- a/gcc/tree-ssa-loop-split.c > >>>> +++ b/gcc/tree-ssa-loop-split.c > >>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1) > >>>> basic_block cond_bb; > >> > >> if (!initial_true) > >> - cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > >> + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); > >> + > >> + edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE > >> + ? EDGE_SUCC (bbs[i], 0) > >> + : EDGE_SUCC (bbs[i], 1); > >> > >>>> > >>>> class loop *loop2 = loop_version (loop1, cond, &cond_bb, > >>>> - profile_probability::always (), > >>>> - profile_probability::always (), > >>>> - profile_probability::always (), > >>>> - profile_probability::always (), > >>>> + true_edge->probability, > >>>> + true_edge->probability.invert (), > >>>> + true_edge->probability, > >>>> + true_edge->probability.invert (), > >>>> true); > >>> > >>> there is no 'true_edge' variable at this point. > >> > >> Sorry, missed the above hunk when split the patch. > >> > >>> > >>>> gcc_assert (loop2); > >>>> > >>>> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch) > >>>> initialize_original_copy_tables (); > >>>> > >>>> struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL, > >>>> - profile_probability::always (), > >>>> - profile_probability::never (), > >>>> - profile_probability::always (), > >>>> - profile_probability::always (), > >>>> + invar_branch->probability.invert (), > >>>> + invar_branch->probability, > >>>> + invar_branch->probability.invert (), > >>>> + invar_branch->probability, > >>>> true); > >>>> if (!loop2) > >>>> { > >>> > >>> The patch introduction seems to talk about do_split_loop_on_cond only. > >> > >> split_loop faces similar issue though it sets the two branches to 100% vs 100% > >> and no scaling which seems also incorrect. > >> > >>> Since loop versioning inserts a condition with the passed probabilities > >>> but in this case a 'boolean_true_node' condition the then and else > >>> probabilities passed look correct. It's just the scaling arguments > >>> that look wrong? This loop_version call should get a comment as to > >>> why we are passing probabilities the way we do. > >> > >> This optimization is transforming from: > >> > >> for (i = 0; i < n; i = inc (i)) > >> { > >> if (ga) > >> ga = do_something (); > >> } > >> > >> to: > >> > >> for (i = 0; i < x; i = inc (i)) > >> { > >> if (true) > >> ga = do_something (); > >> if (!ga) > >> break; > >> } > >> for (; i < n; i = inc (i)) > >> { > >> if (false) > >> ga = do_something (); > >> } > >> > >> > >> `boolean_true_node` is passed in to make the first loop's condition > >> statement to be 'true', after returning from loop_version, there is a > >> piece of code forcing the condition in second loop to be false, > >> and the original condition is moved from *in loop* to *exit edge* > >> between loop1 and loop2. > > > > Yes, one complication is that we use loop_version but do not retain > > the CFG it creates. Something like the vectorizers > > slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit" > > but then that code doesn't do any scaling yet. But then > > loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose > > we could write a variant that simply doesn't mangle the CFG > > with a new condition switching between both loops but keeps them > > executed after each other ... > > > > As said, this adds to the confusion and some awkwardness. > > Then loop_version in loop split requires two types of variant, one > is to insert condition to loop preheader for 'split_loops' usage, > another is to insert condition to loop exit for 'do_split_loop_on_cond' > usage, it needs one extra function to encapsulate these cfg alterations > from outside to inside. > > unswitching only execute one loop as it only moves the invariant condition > to first loop's pre-header. While 'split_loops' and 'do_split_loop_on_cond' > may execute both loops no matter the condition is moved to the first loop's > preheader or exit. > > The condition stmt in loop unswitching is invariant, but it is variant > in loop splitting, that's why loop unswitching execute only one loop > but loop splitting executes both loops. > > Seems we need two condition arguments for loop_version, one for connecting > loop1 preheader to loop2 preheader, another one for connecting loop1's exit > to loop2's header? Then it will be more generic for both unswitching pass > and splitting pass. Looks a bit complicated and conflicted with loop_version's > comments: > > /* Main entry point for Loop Versioning transformation. > > This transformation given a condition and a loop, creates > -if (condition) { loop_copy1 } else { loop_copy2 }, ... */ > > > And this only works for loop split usage, those many other places > doesn't use loop_version like this? Yes, as said if you don't want the above CFG then you probably shouldn't use loop_version but instead use its building blocks (and some refactoring of loop_version can make that easier). I think splitting wants loop_copy1 if (condition) loop_copy2 IMHO it would be good to split 'loopify' into the actual loopification and the scaling. Basically make the part of loop_version that copies the loop on the header edge and creates a loop structure for it separate. loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg (copy_loop_before). > grep "loop_version (" * -r > qgcc/tree-parloops.c: loop_version (loop, many_iterations_cond, NULL, > gcc/tree-vect-loop-manip.c: nloop = loop_version (loop_to_version, cond_expr, &condition_bb, > gcc/tree-loop-distribution.c: nloop = loop_version (loop, lhs, &cond_bb, prob, prob.invert (), > gcc/tree-if-conv.c: new_loop = loop_version (loop, cond, &cond_bb, > gcc/tree-ssa-loop-manip.c: new_loop = loop_version (loop, enter_main_cond, NULL, prob_entry, > gcc/cfgloopmanip.c:loop_version (class loop *loop, > gcc/tree-ssa-loop-split.c: class loop *loop2 = loop_version (loop1, cond, &cond_bb, > gcc/tree-ssa-loop-split.c: loop2 is duplicated using loop_version (), which corresponds to the part > gcc/tree-ssa-loop-split.c: struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL, > gcc/tree-ssa-loop-unswitch.c: return loop_version (loop, unshare_expr (cond), > gcc/modulo-sched.c: loop_version (loop, comp_rtx, &condition_bb, > gcc/cfgloopmanip.h:class loop * loop_version (class loop *, void *, > gcc/gimple-loop-versioning.cc: li.optimized_loop = loop_version (loop, cond, &cond_bb, > > > > >> @fxue may have inputs about this since he contributed it years ago. > >> > >>> > >>> It does seem that scaling the loop by the invar_branch probability > >>> is correct. Since this does similar things to unswitching, I see > >>> that unswitching does > >>> > >>> prob_true = edge_true->probability; > >>> loop_version (loop, unshare_expr (cond), > >>> NULL, prob_true, > >>> prob_true.invert (), > >>> prob_true, prob_true.invert (), > >>> false); > >>> > >>> which maybe suggests that your invar_branch based passing should > >>> depend on 'true_invar'? > >>> > >>> Also compared to unswitching the first loop is always entered, > >>> so I wonder if the scaling is correct with respect to that > >>> given unswitching where only ever one loop is entered? > >> > >> > >> Scaling is based on the probability calculated on "if (ga)", if it is > >> 33% vs 67% before split, then it is reasonable to scale loop1 to 33% > >> and loop2 to 67% suppose the branch probability is correct enough? > >> > >> unswitch also scaled the two loops based on prob_true, if prob_true > >> is 50%, it decreases X's count to HALF unexpectedly since it should be > >> executed on both branches? while loop split still kept execute both first > >> loop and second loop, it seems even more accurate than loop unswitching? > > > > I just was saying that both doing exactly the same looks wrong (on > > either side). > > > >> tree-ssa-loop-unswitch.c: > >> > >> while (A) > >> { > >> if (inv) > >> B; > >> > >> X; > >> > >> if (!inv) > >> C; > >> } > >> > >> where inv is the loop invariant, into > >> > >> if (inv) > >> { > >> while (A) > >> { > >> B; > >> X; > >> } > >> } > >> else > >> { > >> while (A) > >> { > >> X; > >> C; > >> } > >> } > > > > Yes, here scaling based on the if (inv) probability looks obviously > > 100% correct to me. But I might be wrong. And thus the > > splitting case must be still off (so I conclude). Hmm, in fact > > I think for the loop unswitching case the scaling of the B and > > C blocks is off? > > B, C and X are all scaled in tree-ssa-loop-unswitch.c: > > prob_true = edge_true->probability; > return loop_version (loop, unshare_expr (cond), > NULL, prob_true, > prob_true.invert (), > prob_true, prob_true.invert (), > false); > > > Those should be considered always executed. > > Note the loop unswitching pass is altering the conditions > > condition to static true/false but I don't think it performs > > further adjustments. > > loop unswitching calls tree_unswitch_single_loop recursively, > firstly, it calls loop_version to generate nloop, then it calls > itself again for nloop and loop with simplify_using_entry_checks > to cut their false branches respectively. > > /* Invoke itself on modified loops. */ > tree_unswitch_single_loop (nloop, num + 1); > tree_unswitch_single_loop (loop, num + 1); > > > > > That said, likely the profile update cannot be done uniformly > > for all blocks of a loop? > > > > Richard. > > > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)