From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id F393D3858D39; Wed, 22 Sep 2021 08:40:29 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org F393D3858D39 Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 18M8VJED016271; Wed, 22 Sep 2021 04:40:28 -0400 Received: from ppma03fra.de.ibm.com (6b.4a.5195.ip4.static.sl-reverse.com [149.81.74.107]) by mx0a-001b2d01.pphosted.com with ESMTP id 3b811u85h4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 22 Sep 2021 04:40:28 -0400 Received: from pps.filterd (ppma03fra.de.ibm.com [127.0.0.1]) by ppma03fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 18M8bo6r018170; Wed, 22 Sep 2021 08:40:26 GMT Received: from b06avi18878370.portsmouth.uk.ibm.com (b06avi18878370.portsmouth.uk.ibm.com [9.149.26.194]) by ppma03fra.de.ibm.com with ESMTP id 3b7q6jv3kt-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 22 Sep 2021 08:40:25 +0000 Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com [9.149.105.61]) by b06avi18878370.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 18M8ZWB555706110 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 22 Sep 2021 08:35:33 GMT Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 57E8811C05B; Wed, 22 Sep 2021 08:40:22 +0000 (GMT) Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0C7B411C058; Wed, 22 Sep 2021 08:40:20 +0000 (GMT) Received: from luoxhus-MacBook-Pro.local (unknown [9.200.155.166]) by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTPS; Wed, 22 Sep 2021 08:40:19 +0000 (GMT) Subject: Re: [PATCH] Fix loop split incorrect count and probability To: Richard Biener Cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org, fxue@os.amperecomputing.com, wschmidt@linux.ibm.com, guojiufu@linux.ibm.com, linkw@gcc.gnu.org, hubicka@ucw.cz References: <20210803085813.2217766-1-luoxhu@linux.ibm.com> <58f9e351-fc10-406e-5272-270ece73be71@linux.ibm.com> From: Xionghu Luo Message-ID: Date: Wed, 22 Sep 2021 16:40:17 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.14.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: e0MHV5bax8cvCj3FGSj_6yba_aKQJL1A X-Proofpoint-GUID: e0MHV5bax8cvCj3FGSj_6yba_aKQJL1A Content-Transfer-Encoding: 7bit X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.182.1,Aquarius:18.0.790,Hydra:6.0.391,FMLib:17.0.607.475 definitions=2021-09-22_02,2021-09-20_01,2020-04-07_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 spamscore=0 malwarescore=0 bulkscore=0 mlxscore=0 mlxlogscore=999 impostorscore=0 suspectscore=0 phishscore=0 lowpriorityscore=0 clxscore=1015 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2109200000 definitions=main-2109220058 X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, NICE_REPLY_A, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 22 Sep 2021 08:40:32 -0000 On 2021/8/11 17:16, Richard Biener wrote: > On Wed, 11 Aug 2021, Xionghu Luo wrote: > >> >> >> On 2021/8/10 22:47, Richard Biener wrote: >>> On Mon, 9 Aug 2021, Xionghu Luo wrote: >>> >>>> Thanks, >>>> >>>> On 2021/8/6 19:46, Richard Biener wrote: >>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote: >>>>> >>>>>> loop split condition is moved between loop1 and loop2, the split bb's >>>>>> count and probability should also be duplicated instead of (100% vs INV), >>>>>> secondly, the original loop1 and loop2 count need be propotional from the >>>>>> original loop. >>>>>> >>>>>> >>>>>> diff base/loop-cond-split-1.c.151t.lsplit patched/loop-cond-split-1.c.151t.lsplit: >>>>>> ... >>>>>> int prephitmp_16; >>>>>> int prephitmp_25; >>>>>> >>>>>> [local count: 118111600]: >>>>>> if (n_7(D) > 0) >>>>>> goto ; [89.00%] >>>>>> else >>>>>> goto ; [11.00%] >>>>>> >>>>>> [local count: 118111600]: >>>>>> return; >>>>>> >>>>>> [local count: 105119324]: >>>>>> pretmp_3 = ga; >>>>>> >>>>>> - [local count: 955630225]: >>>>>> + [local count: 315357973]: >>>>>> # i_13 = PHI >>>>>> # prephitmp_12 = PHI >>>>>> if (prephitmp_12 != 0) >>>>>> goto ; [33.00%] >>>>>> else >>>>>> goto ; [67.00%] >>>>>> >>>>>> - [local count: 315357972]: >>>>>> + [local count: 104068130]: >>>>>> _2 = do_something (); >>>>>> ga = _2; >>>>>> >>>>>> - [local count: 955630225]: >>>>>> + [local count: 315357973]: >>>>>> # prephitmp_5 = PHI >>>>>> i_10 = inc (i_13); >>>>>> if (n_7(D) > i_10) >>>>>> goto ; [89.00%] >>>>>> else >>>>>> goto ; [11.00%] >>>>>> >>>>>> [local count: 105119324]: >>>>>> goto ; [100.00%] >>>>>> >>>>>> - [local count: 850510901]: >>>>>> + [local count: 280668596]: >>>>>> if (prephitmp_12 != 0) >>>>>> - goto ; [100.00%] >>>>>> + goto ; [33.00%] >>>>>> else >>>>>> - goto ; [INV] >>>>>> + goto ; [67.00%] >>>>>> >>>>>> - [local count: 850510901]: >>>>>> + [local count: 280668596]: >>>>>> goto ; [100.00%] >>>>>> >>>>>> - [count: 0]: >>>>>> + [local count: 70429947]: >>>>>> # i_23 = PHI >>>>>> # prephitmp_25 = PHI >>>>>> >>>>>> - [local count: 955630225]: >>>>>> + [local count: 640272252]: >>>>>> # i_15 = PHI >>>>>> # prephitmp_16 = PHI >>>>>> i_22 = inc (i_15); >>>>>> if (n_7(D) > i_22) >>>>>> goto ; [89.00%] >>>>>> else >>>>>> goto ; [11.00%] >>>>>> >>>>>> - [local count: 850510901]: >>>>>> + [local count: 569842305]: >>>>>> goto ; [100.00%] >>>>>> >>>>>> } >>>>>> >>>>>> gcc/ChangeLog: >>>>>> >>>>>> * tree-ssa-loop-split.c (split_loop): Fix incorrect probability. >>>>>> (do_split_loop_on_cond): Likewise. >>>>>> --- >>>>>> gcc/tree-ssa-loop-split.c | 16 ++++++++-------- >>>>>> 1 file changed, 8 insertions(+), 8 deletions(-) >>>>>> >>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c >>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644 >>>>>> --- a/gcc/tree-ssa-loop-split.c >>>>>> +++ b/gcc/tree-ssa-loop-split.c >>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1) >>>>>> basic_block cond_bb; >>>> >>>> if (!initial_true) >>>> - cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); >>>> + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); >>>> + >>>> + edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE >>>> + ? EDGE_SUCC (bbs[i], 0) >>>> + : EDGE_SUCC (bbs[i], 1); >>>> >>>>>> >>>>>> class loop *loop2 = loop_version (loop1, cond, &cond_bb, >>>>>> - profile_probability::always (), >>>>>> - profile_probability::always (), >>>>>> - profile_probability::always (), >>>>>> - profile_probability::always (), >>>>>> + true_edge->probability, >>>>>> + true_edge->probability.invert (), >>>>>> + true_edge->probability, >>>>>> + true_edge->probability.invert (), >>>>>> true); >>>>> >>>>> there is no 'true_edge' variable at this point. >>>> >>>> Sorry, missed the above hunk when split the patch. >>>> >>>>> >>>>>> gcc_assert (loop2); >>>>>> >>>>>> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch) >>>>>> initialize_original_copy_tables (); >>>>>> >>>>>> struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL, >>>>>> - profile_probability::always (), >>>>>> - profile_probability::never (), >>>>>> - profile_probability::always (), >>>>>> - profile_probability::always (), >>>>>> + invar_branch->probability.invert (), >>>>>> + invar_branch->probability, >>>>>> + invar_branch->probability.invert (), >>>>>> + invar_branch->probability, >>>>>> true); >>>>>> if (!loop2) >>>>>> { >>>>> >>>>> The patch introduction seems to talk about do_split_loop_on_cond only. >>>> >>>> split_loop faces similar issue though it sets the two branches to 100% vs 100% >>>> and no scaling which seems also incorrect. >>>> >>>>> Since loop versioning inserts a condition with the passed probabilities >>>>> but in this case a 'boolean_true_node' condition the then and else >>>>> probabilities passed look correct. It's just the scaling arguments >>>>> that look wrong? This loop_version call should get a comment as to >>>>> why we are passing probabilities the way we do. >>>> >>>> This optimization is transforming from: >>>> >>>> for (i = 0; i < n; i = inc (i)) >>>> { >>>> if (ga) >>>> ga = do_something (); >>>> } >>>> >>>> to: >>>> >>>> for (i = 0; i < x; i = inc (i)) >>>> { >>>> if (true) >>>> ga = do_something (); >>>> if (!ga) >>>> break; >>>> } >>>> for (; i < n; i = inc (i)) >>>> { >>>> if (false) >>>> ga = do_something (); >>>> } >>>> >>>> >>>> `boolean_true_node` is passed in to make the first loop's condition >>>> statement to be 'true', after returning from loop_version, there is a >>>> piece of code forcing the condition in second loop to be false, >>>> and the original condition is moved from *in loop* to *exit edge* >>>> between loop1 and loop2. >>> >>> Yes, one complication is that we use loop_version but do not retain >>> the CFG it creates. Something like the vectorizers >>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit" >>> but then that code doesn't do any scaling yet. But then >>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose >>> we could write a variant that simply doesn't mangle the CFG >>> with a new condition switching between both loops but keeps them >>> executed after each other ... >>> >>> As said, this adds to the confusion and some awkwardness. >> >> Then loop_version in loop split requires two types of variant, one >> is to insert condition to loop preheader for 'split_loops' usage, >> another is to insert condition to loop exit for 'do_split_loop_on_cond' >> usage, it needs one extra function to encapsulate these cfg alterations >> from outside to inside. >> >> unswitching only execute one loop as it only moves the invariant condition >> to first loop's pre-header. While 'split_loops' and 'do_split_loop_on_cond' >> may execute both loops no matter the condition is moved to the first loop's >> preheader or exit. >> >> The condition stmt in loop unswitching is invariant, but it is variant >> in loop splitting, that's why loop unswitching execute only one loop >> but loop splitting executes both loops. >> >> Seems we need two condition arguments for loop_version, one for connecting >> loop1 preheader to loop2 preheader, another one for connecting loop1's exit >> to loop2's header? Then it will be more generic for both unswitching pass >> and splitting pass. Looks a bit complicated and conflicted with loop_version's >> comments: >> >> /* Main entry point for Loop Versioning transformation. >> >> This transformation given a condition and a loop, creates >> -if (condition) { loop_copy1 } else { loop_copy2 }, ... */ >> >> >> And this only works for loop split usage, those many other places >> doesn't use loop_version like this? > > Yes, as said if you don't want the above CFG then you probably > shouldn't use loop_version but instead use its building blocks > (and some refactoring of loop_version can make that easier). > > I think splitting wants > > loop_copy1 > if (condition) > loop_copy2 > > IMHO it would be good to split 'loopify' into the actual loopification > and the scaling. Basically make the part of loop_version that > copies the loop on the header edge and creates a loop structure for > it separate. > > loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg > (copy_loop_before). > Unfortunately slpeel_tree_duplicate_loop_to_edge_cfg only supports copying loops with single exit, it would cause many ICEs in it even building GCC stage1 (line 1065, line 1184 due to exit or new_exit is NULL returning from single_exit (loop).). Seems loop_version is more flexible for loop split. https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/tree-vect-loop-manip.c;h=4988c93fdb61507a26430651b416ae61b217793a;hb=HEAD -- Thanks, Xionghu