From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <luoxhu@linux.ibm.com>
Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com
 [148.163.156.1])
 by sourceware.org (Postfix) with ESMTPS id 8EA08383A80E;
 Thu, 12 Aug 2021 03:26:57 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8EA08383A80E
Received: from pps.filterd (m0098396.ppops.net [127.0.0.1])
 by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id
 17C3CChB130840; Wed, 11 Aug 2021 23:26:56 -0400
Received: from ppma05fra.de.ibm.com (6c.4a.5195.ip4.static.sl-reverse.com
 [149.81.74.108])
 by mx0a-001b2d01.pphosted.com with ESMTP id 3abr2w3ds4-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Wed, 11 Aug 2021 23:26:56 -0400
Received: from pps.filterd (ppma05fra.de.ibm.com [127.0.0.1])
 by ppma05fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 17C3DeqV017274;
 Thu, 12 Aug 2021 03:24:48 GMT
Received: from b06avi18626390.portsmouth.uk.ibm.com
 (b06avi18626390.portsmouth.uk.ibm.com [9.149.26.192])
 by ppma05fra.de.ibm.com with ESMTP id 3a9ht8yymd-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Thu, 12 Aug 2021 03:24:48 +0000
Received: from d06av23.portsmouth.uk.ibm.com (d06av23.portsmouth.uk.ibm.com
 [9.149.105.59])
 by b06avi18626390.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP
 id 17C3LV3R55837104
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Thu, 12 Aug 2021 03:21:31 GMT
Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 79AE7A4053;
 Thu, 12 Aug 2021 03:24:44 +0000 (GMT)
Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id BC1C8A4051;
 Thu, 12 Aug 2021 03:24:40 +0000 (GMT)
Received: from luoxhus-MacBook-Pro.local (unknown [9.200.55.111])
 by d06av23.portsmouth.uk.ibm.com (Postfix) with ESMTPS;
 Thu, 12 Aug 2021 03:24:40 +0000 (GMT)
Subject: Re: [PATCH] Fix loop split incorrect count and probability
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org,
 fxue@os.amperecomputing.com, wschmidt@linux.ibm.com,
 guojiufu@linux.ibm.com, linkw@gcc.gnu.org, hubicka@ucw.cz
References: <20210803085813.2217766-1-luoxhu@linux.ibm.com>
 <nycvar.YFH.7.76.2108061338260.11781@zhemvz.fhfr.qr>
 <58f9e351-fc10-406e-5272-270ece73be71@linux.ibm.com>
 <nycvar.YFH.7.76.2108101634250.11781@zhemvz.fhfr.qr>
 <a54fa5a6-e5f4-8358-9a00-a7a70ff7cbfc@linux.ibm.com>
 <nycvar.YFH.7.76.2108111108290.11781@zhemvz.fhfr.qr>
From: Xionghu Luo <luoxhu@linux.ibm.com>
Message-ID: <66755a87-9457-2090-95f1-ec9d3400b021@linux.ibm.com>
Date: Thu, 12 Aug 2021 11:24:37 +0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.0; rv:68.0)
 Gecko/20100101 Thunderbird/68.12.0
MIME-Version: 1.0
In-Reply-To: <nycvar.YFH.7.76.2108111108290.11781@zhemvz.fhfr.qr>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-TM-AS-GCONF: 00
X-Proofpoint-GUID: 9BSqbp6IWD1EmcQ2NDbH61iQ8HxdIVcc
X-Proofpoint-ORIG-GUID: 9BSqbp6IWD1EmcQ2NDbH61iQ8HxdIVcc
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790
 definitions=2021-08-12_01:2021-08-11,
 2021-08-12 signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 mlxscore=0 malwarescore=0
 spamscore=0 bulkscore=0 suspectscore=0 mlxlogscore=999 lowpriorityscore=0
 phishscore=0 priorityscore=1501 clxscore=1015 impostorscore=0 adultscore=0
 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000
 definitions=main-2108120018
X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_MSPIKE_H4,
 RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 12 Aug 2021 03:26:59 -0000


On 2021/8/11 17:16, Richard Biener wrote:
> On Wed, 11 Aug 2021, Xionghu Luo wrote:
> 
>>
>>
>> On 2021/8/10 22:47, Richard Biener wrote:
>>> On Mon, 9 Aug 2021, Xionghu Luo wrote:
>>>
>>>> Thanks,
>>>>
>>>> On 2021/8/6 19:46, Richard Biener wrote:
>>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
>>>>>
>>>>>> loop split condition is moved between loop1 and loop2, the split bb's
>>>>>> count and probability should also be duplicated instead of (100% vs INV),
>>>>>> secondly, the original loop1 and loop2 count need be propotional from the
>>>>>> original loop.
>>>>>>
>>>>>>
>>>>>> diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
>>>>>> ...
>>>>>>       int prephitmp_16;
>>>>>>       int prephitmp_25;
>>>>>>
>>>>>>       <bb 2> [local count: 118111600]:
>>>>>>       if (n_7(D) > 0)
>>>>>>         goto <bb 4>; [89.00%]
>>>>>>       else
>>>>>>         goto <bb 3>; [11.00%]
>>>>>>
>>>>>>       <bb 3> [local count: 118111600]:
>>>>>>       return;
>>>>>>
>>>>>>       <bb 4> [local count: 105119324]:
>>>>>>       pretmp_3 = ga;
>>>>>>
>>>>>> -  <bb 5> [local count: 955630225]:
>>>>>> +  <bb 5> [local count: 315357973]:
>>>>>>       # i_13 = PHI <i_10(20), 0(4)>
>>>>>>       # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
>>>>>>       if (prephitmp_12 != 0)
>>>>>>         goto <bb 6>; [33.00%]
>>>>>>       else
>>>>>>         goto <bb 7>; [67.00%]
>>>>>>
>>>>>> -  <bb 6> [local count: 315357972]:
>>>>>> +  <bb 6> [local count: 104068130]:
>>>>>>       _2 = do_something ();
>>>>>>       ga = _2;
>>>>>>
>>>>>> -  <bb 7> [local count: 955630225]:
>>>>>> +  <bb 7> [local count: 315357973]:
>>>>>>       # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
>>>>>>       i_10 = inc (i_13);
>>>>>>       if (n_7(D) > i_10)
>>>>>>         goto <bb 21>; [89.00%]
>>>>>>       else
>>>>>>         goto <bb 11>; [11.00%]
>>>>>>
>>>>>>       <bb 11> [local count: 105119324]:
>>>>>>       goto <bb 3>; [100.00%]
>>>>>>
>>>>>> -  <bb 21> [local count: 850510901]:
>>>>>> +  <bb 21> [local count: 280668596]:
>>>>>>       if (prephitmp_12 != 0)
>>>>>> -    goto <bb 20>; [100.00%]
>>>>>> +    goto <bb 20>; [33.00%]
>>>>>>       else
>>>>>> -    goto <bb 19>; [INV]
>>>>>> +    goto <bb 19>; [67.00%]
>>>>>>
>>>>>> -  <bb 20> [local count: 850510901]:
>>>>>> +  <bb 20> [local count: 280668596]:
>>>>>>       goto <bb 5>; [100.00%]
>>>>>>
>>>>>> -  <bb 19> [count: 0]:
>>>>>> +  <bb 19> [local count: 70429947]:
>>>>>>       # i_23 = PHI <i_10(21)>
>>>>>>       # prephitmp_25 = PHI <prephitmp_5(21)>
>>>>>>
>>>>>> -  <bb 12> [local count: 955630225]:
>>>>>> +  <bb 12> [local count: 640272252]:
>>>>>>       # i_15 = PHI <i_23(19), i_22(16)>
>>>>>>       # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
>>>>>>       i_22 = inc (i_15);
>>>>>>       if (n_7(D) > i_22)
>>>>>>         goto <bb 16>; [89.00%]
>>>>>>       else
>>>>>>         goto <bb 11>; [11.00%]
>>>>>>
>>>>>> -  <bb 16> [local count: 850510901]:
>>>>>> +  <bb 16> [local count: 569842305]:
>>>>>>       goto <bb 12>; [100.00%]
>>>>>>
>>>>>>     }
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>> 	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>>>>>> 	(do_split_loop_on_cond): Likewise.
>>>>>> ---
>>>>>>     gcc/tree-ssa-loop-split.c | 16 ++++++++--------
>>>>>>     1 file changed, 8 insertions(+), 8 deletions(-)
>>>>>>
>>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644
>>>>>> --- a/gcc/tree-ssa-loop-split.c
>>>>>> +++ b/gcc/tree-ssa-loop-split.c
>>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
>>>>>>     	basic_block cond_bb;
>>>>
>>>>    	if (!initial_true)
>>>> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>>>> +
>>>> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
>>>> +			   ? EDGE_SUCC (bbs[i], 0)
>>>> +			   : EDGE_SUCC (bbs[i], 1);
>>>>
>>>>>>     
>>>>>>     	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
>>>>>> -					   profile_probability::always (),
>>>>>> -					   profile_probability::always (),
>>>>>> -					   profile_probability::always (),
>>>>>> -					   profile_probability::always (),
>>>>>> +					   true_edge->probability,
>>>>>> +					   true_edge->probability.invert (),
>>>>>> +					   true_edge->probability,
>>>>>> +					   true_edge->probability.invert (),
>>>>>>     					   true);
>>>>>
>>>>> there is no 'true_edge' variable at this point.
>>>>
>>>> Sorry, missed the above hunk when split the patch.
>>>>
>>>>>
>>>>>>     	gcc_assert (loop2);
>>>>>>     
>>>>>> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
>>>>>>       initialize_original_copy_tables ();
>>>>>>     
>>>>>>       struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
>>>>>> -				     profile_probability::always (),
>>>>>> -				     profile_probability::never (),
>>>>>> -				     profile_probability::always (),
>>>>>> -				     profile_probability::always (),
>>>>>> +				     invar_branch->probability.invert (),
>>>>>> +				     invar_branch->probability,
>>>>>> +				     invar_branch->probability.invert (),
>>>>>> +				     invar_branch->probability,
>>>>>>     				     true);
>>>>>>       if (!loop2)
>>>>>>         {
>>>>>
>>>>> The patch introduction seems to talk about do_split_loop_on_cond only.
>>>>
>>>> split_loop faces similar issue though it sets the two branches to 100% vs 100%
>>>> and no scaling which seems also incorrect.
>>>>
>>>>> Since loop versioning inserts a condition with the passed probabilities
>>>>> but in this case a 'boolean_true_node' condition the then and else
>>>>> probabilities passed look correct.  It's just the scaling arguments
>>>>> that look wrong?  This loop_version call should get a comment as to
>>>>> why we are passing probabilities the way we do.
>>>>
>>>> This optimization is transforming from:
>>>>
>>>> for (i = 0; i < n; i = inc (i))
>>>>       {
>>>>         if (ga)
>>>>           ga = do_something ();
>>>> }
>>>>
>>>> to:
>>>>
>>>>     for (i = 0; i < x; i = inc (i))
>>>> {
>>>>       if (true)
>>>>            ga = do_something ();
>>>>           if (!ga)
>>>>             break;
>>>> }
>>>>     for (; i < n; i = inc (i))
>>>> {
>>>>       if (false)
>>>>            ga = do_something ();
>>>> }
>>>>
>>>>
>>>> `boolean_true_node` is passed in to make the first loop's condition
>>>> statement to be 'true', after returning from loop_version, there is a
>>>> piece of code forcing the condition in second loop to be false,
>>>> and the original condition is moved from *in loop* to *exit edge*
>>>> between loop1 and loop2.
>>>
>>> Yes, one complication is that we use loop_version but do not retain
>>> the CFG it creates.  Something like the vectorizers
>>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
>>> but then that code doesn't do any scaling yet.  But then
>>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
>>> we could write a variant that simply doesn't mangle the CFG
>>> with a new condition switching between both loops but keeps them
>>> executed after each other ...
>>>
>>> As said, this adds to the confusion and some awkwardness.
>>
>> Then loop_version in loop split requires two types of variant, one
>> is to insert condition to loop preheader for 'split_loops' usage,
>> another is to insert condition to loop exit for 'do_split_loop_on_cond'
>> usage, it needs one extra function to encapsulate these cfg alterations
>> from outside to inside.
>>
>> unswitching only execute one loop as it only moves the invariant condition
>> to first loop's pre-header.  While 'split_loops' and 'do_split_loop_on_cond'
>> may execute both loops no matter the condition is moved to the first loop's
>> preheader or exit.
>>
>> The condition stmt in loop unswitching is invariant, but it is variant
>> in loop splitting, that's why loop unswitching execute only one loop
>> but loop splitting executes both loops.
>>
>> Seems we need two condition arguments for loop_version, one for connecting
>> loop1 preheader to loop2 preheader, another one for connecting loop1's exit
>> to loop2's header?  Then it will be more generic for both unswitching pass
>> and splitting pass.  Looks a bit complicated and conflicted with loop_version's
>> comments:
>>
>> /* Main entry point for Loop Versioning transformation.
>>
>>     This transformation given a condition and a loop, creates
>>     -if (condition) { loop_copy1 } else { loop_copy2 }, ... */
>>
>>
>> And this only works for loop split usage, those many other places
>> doesn't use loop_version like this?
> 
> Yes, as said if you don't want the above CFG then you probably
> shouldn't use loop_version but instead use its building blocks
> (and some refactoring of loop_version can make that easier).
> 
> I think splitting wants
> 
>    loop_copy1
>    if (condition)
>      loop_copy2

Maybe

   if (condition1)
   loop_copy1
   if (!condition1 || condition2)
     loop_copy2

as loop_copy1 is not always executed in loop split.

> 
> IMHO it would be good to split 'loopify' into the actual loopification
> and the scaling.  Basically make the part of loop_version that
> copies the loop on the header edge and creates a loop structure for
> it separate.
> 
> loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
> (copy_loop_before).
> 
>> grep "loop_version (" * -r
>> qgcc/tree-parloops.c:      loop_version (loop, many_iterations_cond, NULL,
>> gcc/tree-vect-loop-manip.c:      nloop = loop_version (loop_to_version, cond_expr, &condition_bb,
>> gcc/tree-loop-distribution.c:  nloop = loop_version (loop, lhs, &cond_bb, prob, prob.invert (),
>> gcc/tree-if-conv.c:  new_loop = loop_version (loop, cond, &cond_bb,
>> gcc/tree-ssa-loop-manip.c:  new_loop = loop_version (loop, enter_main_cond, NULL, prob_entry,
>> gcc/cfgloopmanip.c:loop_version (class loop *loop,
>> gcc/tree-ssa-loop-split.c:      class loop *loop2 = loop_version (loop1, cond, &cond_bb,
>> gcc/tree-ssa-loop-split.c:   loop2 is duplicated using loop_version (), which corresponds to the part
>> gcc/tree-ssa-loop-split.c:  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
>> gcc/tree-ssa-loop-unswitch.c:  return loop_version (loop, unshare_expr (cond),
>> gcc/modulo-sched.c:           loop_version (loop, comp_rtx, &condition_bb,
>> gcc/cfgloopmanip.h:class loop * loop_version (class loop *, void *,
>> gcc/gimple-loop-versioning.cc:  li.optimized_loop = loop_version (loop, cond, &cond_bb,
>>
>>>
>>>> @fxue may have inputs about this since he contributed it years ago.
>>>>
>>>>>
>>>>> It does seem that scaling the loop by the invar_branch probability
>>>>> is correct.  Since this does similar things to unswitching, I see
>>>>> that unswitching does
>>>>>
>>>>> prob_true = edge_true->probability;
>>>>> loop_version (loop, unshare_expr (cond),
>>>>>                           NULL, prob_true,
>>>>>                           prob_true.invert (),
>>>>>                           prob_true, prob_true.invert (),
>>>>>                           false);
>>>>>
>>>>> which maybe suggests that your invar_branch based passing should
>>>>> depend on 'true_invar'?
>>>>>
>>>>> Also compared to unswitching the first loop is always entered,
>>>>> so I wonder if the scaling is correct with respect to that
>>>>> given unswitching where only ever one loop is entered?
>>>>
>>>>
>>>> Scaling is based on the probability calculated on "if (ga)", if it is
>>>> 33% vs 67% before split, then it is reasonable to scale loop1 to 33%
>>>> and loop2 to 67% suppose the branch probability is correct enough?
>>>>
>>>> unswitch also scaled the two loops based on prob_true, if prob_true
>>>> is 50%, it decreases X's count to HALF unexpectedly since it should be
>>>> executed on both branches?  while loop split still kept execute both first
>>>> loop and second loop, it seems even more accurate than loop unswitching?
>>>
>>> I just was saying that both doing exactly the same looks wrong (on
>>> either side).
>>>
>>>> tree-ssa-loop-unswitch.c:
>>>>
>>>>      while (A)
>>>>        {
>>>>          if (inv)
>>>>            B;
>>>>
>>>>          X;
>>>>
>>>>          if (!inv)
>>>> 	 C;
>>>>        }
>>>>
>>>>      where inv is the loop invariant, into
>>>>
>>>>      if (inv)
>>>>        {
>>>>          while (A)
>>>> 	 {
>>>>              B;
>>>> 	   X;
>>>> 	 }
>>>>        }
>>>>      else
>>>>        {
>>>>          while (A)
>>>> 	 {
>>>> 	   X;
>>>> 	   C;
>>>> 	 }
>>>>        }
>>>
>>> Yes, here scaling based on the if (inv) probability looks obviously
>>> 100% correct to me.  But I might be wrong.  And thus the
>>> splitting case must be still off (so I conclude).  Hmm, in fact
>>> I think for the loop unswitching case the scaling of the B and
>>> C blocks is off?
>>
>> B, C and X are all scaled in tree-ssa-loop-unswitch.c:
>>
>>    prob_true = edge_true->probability;
>>    return loop_version (loop, unshare_expr (cond),
>> 		       NULL, prob_true,
>> 		       prob_true.invert (),
>> 		       prob_true, prob_true.invert (),
>> 		       false);
>>
>>>   Those should be considered always executed.
>>> Note the loop unswitching pass is altering the conditions
>>> condition to static true/false but I don't think it performs
>>> further adjustments.
>>
>> loop unswitching calls tree_unswitch_single_loop recursively,
>> firstly, it calls loop_version to generate nloop, then it calls
>> itself again for nloop and loop with simplify_using_entry_checks
>> to cut their false branches respectively.
>>
>>    /* Invoke itself on modified loops.  */
>>    tree_unswitch_single_loop (nloop, num + 1);
>>    tree_unswitch_single_loop (loop, num + 1);
>>
>>>
>>> That said, likely the profile update cannot be done uniformly
>>> for all blocks of a loop?
>>>
>>> Richard.
>>>
>>
>>
> 

-- 
Thanks,
Xionghu