From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 8EA08383A80E; Thu, 12 Aug 2021 03:26:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8EA08383A80E Received: from pps.filterd (m0098396.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 17C3CChB130840; Wed, 11 Aug 2021 23:26:56 -0400 Received: from ppma05fra.de.ibm.com (6c.4a.5195.ip4.static.sl-reverse.com [149.81.74.108]) by mx0a-001b2d01.pphosted.com with ESMTP id 3abr2w3ds4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 11 Aug 2021 23:26:56 -0400 Received: from pps.filterd (ppma05fra.de.ibm.com [127.0.0.1]) by ppma05fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 17C3DeqV017274; Thu, 12 Aug 2021 03:24:48 GMT Received: from b06avi18626390.portsmouth.uk.ibm.com (b06avi18626390.portsmouth.uk.ibm.com [9.149.26.192]) by ppma05fra.de.ibm.com with ESMTP id 3a9ht8yymd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 12 Aug 2021 03:24:48 +0000 Received: from d06av23.portsmouth.uk.ibm.com (d06av23.portsmouth.uk.ibm.com [9.149.105.59]) by b06avi18626390.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 17C3LV3R55837104 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 12 Aug 2021 03:21:31 GMT Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 79AE7A4053; Thu, 12 Aug 2021 03:24:44 +0000 (GMT) Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id BC1C8A4051; Thu, 12 Aug 2021 03:24:40 +0000 (GMT) Received: from luoxhus-MacBook-Pro.local (unknown [9.200.55.111]) by d06av23.portsmouth.uk.ibm.com (Postfix) with ESMTPS; Thu, 12 Aug 2021 03:24:40 +0000 (GMT) Subject: Re: [PATCH] Fix loop split incorrect count and probability To: Richard Biener Cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org, fxue@os.amperecomputing.com, wschmidt@linux.ibm.com, guojiufu@linux.ibm.com, linkw@gcc.gnu.org, hubicka@ucw.cz References: <20210803085813.2217766-1-luoxhu@linux.ibm.com> <58f9e351-fc10-406e-5272-270ece73be71@linux.ibm.com> From: Xionghu Luo Message-ID: <66755a87-9457-2090-95f1-ec9d3400b021@linux.ibm.com> Date: Thu, 12 Aug 2021 11:24:37 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.0; rv:68.0) Gecko/20100101 Thunderbird/68.12.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: 9BSqbp6IWD1EmcQ2NDbH61iQ8HxdIVcc X-Proofpoint-ORIG-GUID: 9BSqbp6IWD1EmcQ2NDbH61iQ8HxdIVcc X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-08-12_01:2021-08-11, 2021-08-12 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 malwarescore=0 spamscore=0 bulkscore=0 suspectscore=0 mlxlogscore=999 lowpriorityscore=0 phishscore=0 priorityscore=1501 clxscore=1015 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000 definitions=main-2108120018 X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Aug 2021 03:26:59 -0000 On 2021/8/11 17:16, Richard Biener wrote: > On Wed, 11 Aug 2021, Xionghu Luo wrote: > >> >> >> On 2021/8/10 22:47, Richard Biener wrote: >>> On Mon, 9 Aug 2021, Xionghu Luo wrote: >>> >>>> Thanks, >>>> >>>> On 2021/8/6 19:46, Richard Biener wrote: >>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote: >>>>> >>>>>> loop split condition is moved between loop1 and loop2, the split bb's >>>>>> count and probability should also be duplicated instead of (100% vs INV), >>>>>> secondly, the original loop1 and loop2 count need be propotional from the >>>>>> original loop. >>>>>> >>>>>> >>>>>> diff base/loop-cond-split-1.c.151t.lsplit patched/loop-cond-split-1.c.151t.lsplit: >>>>>> ... >>>>>> int prephitmp_16; >>>>>> int prephitmp_25; >>>>>> >>>>>> [local count: 118111600]: >>>>>> if (n_7(D) > 0) >>>>>> goto ; [89.00%] >>>>>> else >>>>>> goto ; [11.00%] >>>>>> >>>>>> [local count: 118111600]: >>>>>> return; >>>>>> >>>>>> [local count: 105119324]: >>>>>> pretmp_3 = ga; >>>>>> >>>>>> - [local count: 955630225]: >>>>>> + [local count: 315357973]: >>>>>> # i_13 = PHI >>>>>> # prephitmp_12 = PHI >>>>>> if (prephitmp_12 != 0) >>>>>> goto ; [33.00%] >>>>>> else >>>>>> goto ; [67.00%] >>>>>> >>>>>> - [local count: 315357972]: >>>>>> + [local count: 104068130]: >>>>>> _2 = do_something (); >>>>>> ga = _2; >>>>>> >>>>>> - [local count: 955630225]: >>>>>> + [local count: 315357973]: >>>>>> # prephitmp_5 = PHI >>>>>> i_10 = inc (i_13); >>>>>> if (n_7(D) > i_10) >>>>>> goto ; [89.00%] >>>>>> else >>>>>> goto ; [11.00%] >>>>>> >>>>>> [local count: 105119324]: >>>>>> goto ; [100.00%] >>>>>> >>>>>> - [local count: 850510901]: >>>>>> + [local count: 280668596]: >>>>>> if (prephitmp_12 != 0) >>>>>> - goto ; [100.00%] >>>>>> + goto ; [33.00%] >>>>>> else >>>>>> - goto ; [INV] >>>>>> + goto ; [67.00%] >>>>>> >>>>>> - [local count: 850510901]: >>>>>> + [local count: 280668596]: >>>>>> goto ; [100.00%] >>>>>> >>>>>> - [count: 0]: >>>>>> + [local count: 70429947]: >>>>>> # i_23 = PHI >>>>>> # prephitmp_25 = PHI >>>>>> >>>>>> - [local count: 955630225]: >>>>>> + [local count: 640272252]: >>>>>> # i_15 = PHI >>>>>> # prephitmp_16 = PHI >>>>>> i_22 = inc (i_15); >>>>>> if (n_7(D) > i_22) >>>>>> goto ; [89.00%] >>>>>> else >>>>>> goto ; [11.00%] >>>>>> >>>>>> - [local count: 850510901]: >>>>>> + [local count: 569842305]: >>>>>> goto ; [100.00%] >>>>>> >>>>>> } >>>>>> >>>>>> gcc/ChangeLog: >>>>>> >>>>>> * tree-ssa-loop-split.c (split_loop): Fix incorrect probability. >>>>>> (do_split_loop_on_cond): Likewise. >>>>>> --- >>>>>> gcc/tree-ssa-loop-split.c | 16 ++++++++-------- >>>>>> 1 file changed, 8 insertions(+), 8 deletions(-) >>>>>> >>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c >>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644 >>>>>> --- a/gcc/tree-ssa-loop-split.c >>>>>> +++ b/gcc/tree-ssa-loop-split.c >>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1) >>>>>> basic_block cond_bb; >>>> >>>> if (!initial_true) >>>> - cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); >>>> + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); >>>> + >>>> + edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE >>>> + ? EDGE_SUCC (bbs[i], 0) >>>> + : EDGE_SUCC (bbs[i], 1); >>>> >>>>>> >>>>>> class loop *loop2 = loop_version (loop1, cond, &cond_bb, >>>>>> - profile_probability::always (), >>>>>> - profile_probability::always (), >>>>>> - profile_probability::always (), >>>>>> - profile_probability::always (), >>>>>> + true_edge->probability, >>>>>> + true_edge->probability.invert (), >>>>>> + true_edge->probability, >>>>>> + true_edge->probability.invert (), >>>>>> true); >>>>> >>>>> there is no 'true_edge' variable at this point. >>>> >>>> Sorry, missed the above hunk when split the patch. >>>> >>>>> >>>>>> gcc_assert (loop2); >>>>>> >>>>>> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch) >>>>>> initialize_original_copy_tables (); >>>>>> >>>>>> struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL, >>>>>> - profile_probability::always (), >>>>>> - profile_probability::never (), >>>>>> - profile_probability::always (), >>>>>> - profile_probability::always (), >>>>>> + invar_branch->probability.invert (), >>>>>> + invar_branch->probability, >>>>>> + invar_branch->probability.invert (), >>>>>> + invar_branch->probability, >>>>>> true); >>>>>> if (!loop2) >>>>>> { >>>>> >>>>> The patch introduction seems to talk about do_split_loop_on_cond only. >>>> >>>> split_loop faces similar issue though it sets the two branches to 100% vs 100% >>>> and no scaling which seems also incorrect. >>>> >>>>> Since loop versioning inserts a condition with the passed probabilities >>>>> but in this case a 'boolean_true_node' condition the then and else >>>>> probabilities passed look correct. It's just the scaling arguments >>>>> that look wrong? This loop_version call should get a comment as to >>>>> why we are passing probabilities the way we do. >>>> >>>> This optimization is transforming from: >>>> >>>> for (i = 0; i < n; i = inc (i)) >>>> { >>>> if (ga) >>>> ga = do_something (); >>>> } >>>> >>>> to: >>>> >>>> for (i = 0; i < x; i = inc (i)) >>>> { >>>> if (true) >>>> ga = do_something (); >>>> if (!ga) >>>> break; >>>> } >>>> for (; i < n; i = inc (i)) >>>> { >>>> if (false) >>>> ga = do_something (); >>>> } >>>> >>>> >>>> `boolean_true_node` is passed in to make the first loop's condition >>>> statement to be 'true', after returning from loop_version, there is a >>>> piece of code forcing the condition in second loop to be false, >>>> and the original condition is moved from *in loop* to *exit edge* >>>> between loop1 and loop2. >>> >>> Yes, one complication is that we use loop_version but do not retain >>> the CFG it creates. Something like the vectorizers >>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit" >>> but then that code doesn't do any scaling yet. But then >>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose >>> we could write a variant that simply doesn't mangle the CFG >>> with a new condition switching between both loops but keeps them >>> executed after each other ... >>> >>> As said, this adds to the confusion and some awkwardness. >> >> Then loop_version in loop split requires two types of variant, one >> is to insert condition to loop preheader for 'split_loops' usage, >> another is to insert condition to loop exit for 'do_split_loop_on_cond' >> usage, it needs one extra function to encapsulate these cfg alterations >> from outside to inside. >> >> unswitching only execute one loop as it only moves the invariant condition >> to first loop's pre-header. While 'split_loops' and 'do_split_loop_on_cond' >> may execute both loops no matter the condition is moved to the first loop's >> preheader or exit. >> >> The condition stmt in loop unswitching is invariant, but it is variant >> in loop splitting, that's why loop unswitching execute only one loop >> but loop splitting executes both loops. >> >> Seems we need two condition arguments for loop_version, one for connecting >> loop1 preheader to loop2 preheader, another one for connecting loop1's exit >> to loop2's header? Then it will be more generic for both unswitching pass >> and splitting pass. Looks a bit complicated and conflicted with loop_version's >> comments: >> >> /* Main entry point for Loop Versioning transformation. >> >> This transformation given a condition and a loop, creates >> -if (condition) { loop_copy1 } else { loop_copy2 }, ... */ >> >> >> And this only works for loop split usage, those many other places >> doesn't use loop_version like this? > > Yes, as said if you don't want the above CFG then you probably > shouldn't use loop_version but instead use its building blocks > (and some refactoring of loop_version can make that easier). > > I think splitting wants > > loop_copy1 > if (condition) > loop_copy2 Maybe if (condition1) loop_copy1 if (!condition1 || condition2) loop_copy2 as loop_copy1 is not always executed in loop split. > > IMHO it would be good to split 'loopify' into the actual loopification > and the scaling. Basically make the part of loop_version that > copies the loop on the header edge and creates a loop structure for > it separate. > > loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg > (copy_loop_before). > >> grep "loop_version (" * -r >> qgcc/tree-parloops.c: loop_version (loop, many_iterations_cond, NULL, >> gcc/tree-vect-loop-manip.c: nloop = loop_version (loop_to_version, cond_expr, &condition_bb, >> gcc/tree-loop-distribution.c: nloop = loop_version (loop, lhs, &cond_bb, prob, prob.invert (), >> gcc/tree-if-conv.c: new_loop = loop_version (loop, cond, &cond_bb, >> gcc/tree-ssa-loop-manip.c: new_loop = loop_version (loop, enter_main_cond, NULL, prob_entry, >> gcc/cfgloopmanip.c:loop_version (class loop *loop, >> gcc/tree-ssa-loop-split.c: class loop *loop2 = loop_version (loop1, cond, &cond_bb, >> gcc/tree-ssa-loop-split.c: loop2 is duplicated using loop_version (), which corresponds to the part >> gcc/tree-ssa-loop-split.c: struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL, >> gcc/tree-ssa-loop-unswitch.c: return loop_version (loop, unshare_expr (cond), >> gcc/modulo-sched.c: loop_version (loop, comp_rtx, &condition_bb, >> gcc/cfgloopmanip.h:class loop * loop_version (class loop *, void *, >> gcc/gimple-loop-versioning.cc: li.optimized_loop = loop_version (loop, cond, &cond_bb, >> >>> >>>> @fxue may have inputs about this since he contributed it years ago. >>>> >>>>> >>>>> It does seem that scaling the loop by the invar_branch probability >>>>> is correct. Since this does similar things to unswitching, I see >>>>> that unswitching does >>>>> >>>>> prob_true = edge_true->probability; >>>>> loop_version (loop, unshare_expr (cond), >>>>> NULL, prob_true, >>>>> prob_true.invert (), >>>>> prob_true, prob_true.invert (), >>>>> false); >>>>> >>>>> which maybe suggests that your invar_branch based passing should >>>>> depend on 'true_invar'? >>>>> >>>>> Also compared to unswitching the first loop is always entered, >>>>> so I wonder if the scaling is correct with respect to that >>>>> given unswitching where only ever one loop is entered? >>>> >>>> >>>> Scaling is based on the probability calculated on "if (ga)", if it is >>>> 33% vs 67% before split, then it is reasonable to scale loop1 to 33% >>>> and loop2 to 67% suppose the branch probability is correct enough? >>>> >>>> unswitch also scaled the two loops based on prob_true, if prob_true >>>> is 50%, it decreases X's count to HALF unexpectedly since it should be >>>> executed on both branches? while loop split still kept execute both first >>>> loop and second loop, it seems even more accurate than loop unswitching? >>> >>> I just was saying that both doing exactly the same looks wrong (on >>> either side). >>> >>>> tree-ssa-loop-unswitch.c: >>>> >>>> while (A) >>>> { >>>> if (inv) >>>> B; >>>> >>>> X; >>>> >>>> if (!inv) >>>> C; >>>> } >>>> >>>> where inv is the loop invariant, into >>>> >>>> if (inv) >>>> { >>>> while (A) >>>> { >>>> B; >>>> X; >>>> } >>>> } >>>> else >>>> { >>>> while (A) >>>> { >>>> X; >>>> C; >>>> } >>>> } >>> >>> Yes, here scaling based on the if (inv) probability looks obviously >>> 100% correct to me. But I might be wrong. And thus the >>> splitting case must be still off (so I conclude). Hmm, in fact >>> I think for the loop unswitching case the scaling of the B and >>> C blocks is off? >> >> B, C and X are all scaled in tree-ssa-loop-unswitch.c: >> >> prob_true = edge_true->probability; >> return loop_version (loop, unshare_expr (cond), >> NULL, prob_true, >> prob_true.invert (), >> prob_true, prob_true.invert (), >> false); >> >>> Those should be considered always executed. >>> Note the loop unswitching pass is altering the conditions >>> condition to static true/false but I don't think it performs >>> further adjustments. >> >> loop unswitching calls tree_unswitch_single_loop recursively, >> firstly, it calls loop_version to generate nloop, then it calls >> itself again for nloop and loop with simplify_using_entry_checks >> to cut their false branches respectively. >> >> /* Invoke itself on modified loops. */ >> tree_unswitch_single_loop (nloop, num + 1); >> tree_unswitch_single_loop (loop, num + 1); >> >>> >>> That said, likely the profile update cannot be done uniformly >>> for all blocks of a loop? >>> >>> Richard. >>> >> >> > -- Thanks, Xionghu