public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Xionghu Luo <luoxhu@linux.ibm.com>
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org,
	fxue@os.amperecomputing.com, wschmidt@linux.ibm.com,
	guojiufu@linux.ibm.com, linkw@gcc.gnu.org, hubicka@ucw.cz
Subject: Re: [PATCH] Fix loop split incorrect count and probability
Date: Wed, 11 Aug 2021 16:32:40 +0800	[thread overview]
Message-ID: <a54fa5a6-e5f4-8358-9a00-a7a70ff7cbfc@linux.ibm.com> (raw)
In-Reply-To: <nycvar.YFH.7.76.2108101634250.11781@zhemvz.fhfr.qr>



On 2021/8/10 22:47, Richard Biener wrote:
> On Mon, 9 Aug 2021, Xionghu Luo wrote:
> 
>> Thanks,
>>
>> On 2021/8/6 19:46, Richard Biener wrote:
>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
>>>
>>>> loop split condition is moved between loop1 and loop2, the split bb's
>>>> count and probability should also be duplicated instead of (100% vs INV),
>>>> secondly, the original loop1 and loop2 count need be propotional from the
>>>> original loop.
>>>>
>>>>
>>>> diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
>>>> ...
>>>>      int prephitmp_16;
>>>>      int prephitmp_25;
>>>>
>>>>      <bb 2> [local count: 118111600]:
>>>>      if (n_7(D) > 0)
>>>>        goto <bb 4>; [89.00%]
>>>>      else
>>>>        goto <bb 3>; [11.00%]
>>>>
>>>>      <bb 3> [local count: 118111600]:
>>>>      return;
>>>>
>>>>      <bb 4> [local count: 105119324]:
>>>>      pretmp_3 = ga;
>>>>
>>>> -  <bb 5> [local count: 955630225]:
>>>> +  <bb 5> [local count: 315357973]:
>>>>      # i_13 = PHI <i_10(20), 0(4)>
>>>>      # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
>>>>      if (prephitmp_12 != 0)
>>>>        goto <bb 6>; [33.00%]
>>>>      else
>>>>        goto <bb 7>; [67.00%]
>>>>
>>>> -  <bb 6> [local count: 315357972]:
>>>> +  <bb 6> [local count: 104068130]:
>>>>      _2 = do_something ();
>>>>      ga = _2;
>>>>
>>>> -  <bb 7> [local count: 955630225]:
>>>> +  <bb 7> [local count: 315357973]:
>>>>      # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
>>>>      i_10 = inc (i_13);
>>>>      if (n_7(D) > i_10)
>>>>        goto <bb 21>; [89.00%]
>>>>      else
>>>>        goto <bb 11>; [11.00%]
>>>>
>>>>      <bb 11> [local count: 105119324]:
>>>>      goto <bb 3>; [100.00%]
>>>>
>>>> -  <bb 21> [local count: 850510901]:
>>>> +  <bb 21> [local count: 280668596]:
>>>>      if (prephitmp_12 != 0)
>>>> -    goto <bb 20>; [100.00%]
>>>> +    goto <bb 20>; [33.00%]
>>>>      else
>>>> -    goto <bb 19>; [INV]
>>>> +    goto <bb 19>; [67.00%]
>>>>
>>>> -  <bb 20> [local count: 850510901]:
>>>> +  <bb 20> [local count: 280668596]:
>>>>      goto <bb 5>; [100.00%]
>>>>
>>>> -  <bb 19> [count: 0]:
>>>> +  <bb 19> [local count: 70429947]:
>>>>      # i_23 = PHI <i_10(21)>
>>>>      # prephitmp_25 = PHI <prephitmp_5(21)>
>>>>
>>>> -  <bb 12> [local count: 955630225]:
>>>> +  <bb 12> [local count: 640272252]:
>>>>      # i_15 = PHI <i_23(19), i_22(16)>
>>>>      # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
>>>>      i_22 = inc (i_15);
>>>>      if (n_7(D) > i_22)
>>>>        goto <bb 16>; [89.00%]
>>>>      else
>>>>        goto <bb 11>; [11.00%]
>>>>
>>>> -  <bb 16> [local count: 850510901]:
>>>> +  <bb 16> [local count: 569842305]:
>>>>      goto <bb 12>; [100.00%]
>>>>
>>>>    }
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
>>>> 	(do_split_loop_on_cond): Likewise.
>>>> ---
>>>>    gcc/tree-ssa-loop-split.c | 16 ++++++++--------
>>>>    1 file changed, 8 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>>>> index 3a09bbc39e5..8e5a7ded0f7 100644
>>>> --- a/gcc/tree-ssa-loop-split.c
>>>> +++ b/gcc/tree-ssa-loop-split.c
>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
>>>>    	basic_block cond_bb;
>>
>>   	if (!initial_true)
>> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>> +
>> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
>> +			   ? EDGE_SUCC (bbs[i], 0)
>> +			   : EDGE_SUCC (bbs[i], 1);
>>
>>>>    
>>>>    	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
>>>> -					   profile_probability::always (),
>>>> -					   profile_probability::always (),
>>>> -					   profile_probability::always (),
>>>> -					   profile_probability::always (),
>>>> +					   true_edge->probability,
>>>> +					   true_edge->probability.invert (),
>>>> +					   true_edge->probability,
>>>> +					   true_edge->probability.invert (),
>>>>    					   true);
>>>
>>> there is no 'true_edge' variable at this point.
>>
>> Sorry, missed the above hunk when split the patch.
>>
>>>
>>>>    	gcc_assert (loop2);
>>>>    
>>>> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
>>>>      initialize_original_copy_tables ();
>>>>    
>>>>      struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
>>>> -				     profile_probability::always (),
>>>> -				     profile_probability::never (),
>>>> -				     profile_probability::always (),
>>>> -				     profile_probability::always (),
>>>> +				     invar_branch->probability.invert (),
>>>> +				     invar_branch->probability,
>>>> +				     invar_branch->probability.invert (),
>>>> +				     invar_branch->probability,
>>>>    				     true);
>>>>      if (!loop2)
>>>>        {
>>>
>>> The patch introduction seems to talk about do_split_loop_on_cond only.
>>
>> split_loop faces similar issue though it sets the two branches to 100% vs 100%
>> and no scaling which seems also incorrect.
>>
>>> Since loop versioning inserts a condition with the passed probabilities
>>> but in this case a 'boolean_true_node' condition the then and else
>>> probabilities passed look correct.  It's just the scaling arguments
>>> that look wrong?  This loop_version call should get a comment as to
>>> why we are passing probabilities the way we do.
>>
>> This optimization is transforming from:
>>
>> for (i = 0; i < n; i = inc (i))
>>      {
>>        if (ga)
>>          ga = do_something ();
>> }
>>
>> to:
>>
>>    for (i = 0; i < x; i = inc (i))
>> {
>>      if (true)
>>           ga = do_something ();
>>          if (!ga)
>>            break;
>> }
>>    for (; i < n; i = inc (i))
>> {
>>      if (false)
>>           ga = do_something ();
>> }
>>
>>
>> `boolean_true_node` is passed in to make the first loop's condition
>> statement to be 'true', after returning from loop_version, there is a
>> piece of code forcing the condition in second loop to be false,
>> and the original condition is moved from *in loop* to *exit edge*
>> between loop1 and loop2.
> 
> Yes, one complication is that we use loop_version but do not retain
> the CFG it creates.  Something like the vectorizers
> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
> but then that code doesn't do any scaling yet.  But then
> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
> we could write a variant that simply doesn't mangle the CFG
> with a new condition switching between both loops but keeps them
> executed after each other ...
> 
> As said, this adds to the confusion and some awkwardness.

Then loop_version in loop split requires two types of variant, one
is to insert condition to loop preheader for 'split_loops' usage, 
another is to insert condition to loop exit for 'do_split_loop_on_cond'
usage, it needs one extra function to encapsulate these cfg alterations
from outside to inside.

unswitching only execute one loop as it only moves the invariant condition
to first loop's pre-header.  While 'split_loops' and 'do_split_loop_on_cond'
may execute both loops no matter the condition is moved to the first loop's
preheader or exit.

The condition stmt in loop unswitching is invariant, but it is variant
in loop splitting, that's why loop unswitching execute only one loop
but loop splitting executes both loops.

Seems we need two condition arguments for loop_version, one for connecting
loop1 preheader to loop2 preheader, another one for connecting loop1's exit
to loop2's header?  Then it will be more generic for both unswitching pass
and splitting pass.  Looks a bit complicated and conflicted with loop_version's
comments:

/* Main entry point for Loop Versioning transformation.

   This transformation given a condition and a loop, creates
   -if (condition) { loop_copy1 } else { loop_copy2 }, ... */


And this only works for loop split usage, those many other places
doesn't use loop_version like this?

grep "loop_version (" * -r
qgcc/tree-parloops.c:      loop_version (loop, many_iterations_cond, NULL,
gcc/tree-vect-loop-manip.c:      nloop = loop_version (loop_to_version, cond_expr, &condition_bb,
gcc/tree-loop-distribution.c:  nloop = loop_version (loop, lhs, &cond_bb, prob, prob.invert (),
gcc/tree-if-conv.c:  new_loop = loop_version (loop, cond, &cond_bb,
gcc/tree-ssa-loop-manip.c:  new_loop = loop_version (loop, enter_main_cond, NULL, prob_entry,
gcc/cfgloopmanip.c:loop_version (class loop *loop,
gcc/tree-ssa-loop-split.c:      class loop *loop2 = loop_version (loop1, cond, &cond_bb,
gcc/tree-ssa-loop-split.c:   loop2 is duplicated using loop_version (), which corresponds to the part
gcc/tree-ssa-loop-split.c:  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
gcc/tree-ssa-loop-unswitch.c:  return loop_version (loop, unshare_expr (cond),
gcc/modulo-sched.c:           loop_version (loop, comp_rtx, &condition_bb,
gcc/cfgloopmanip.h:class loop * loop_version (class loop *, void *,
gcc/gimple-loop-versioning.cc:  li.optimized_loop = loop_version (loop, cond, &cond_bb,  

> 
>> @fxue may have inputs about this since he contributed it years ago.
>>
>>>
>>> It does seem that scaling the loop by the invar_branch probability
>>> is correct.  Since this does similar things to unswitching, I see
>>> that unswitching does
>>>
>>> prob_true = edge_true->probability;
>>> loop_version (loop, unshare_expr (cond),
>>>                          NULL, prob_true,
>>>                          prob_true.invert (),
>>>                          prob_true, prob_true.invert (),
>>>                          false);
>>>
>>> which maybe suggests that your invar_branch based passing should
>>> depend on 'true_invar'?
>>>
>>> Also compared to unswitching the first loop is always entered,
>>> so I wonder if the scaling is correct with respect to that
>>> given unswitching where only ever one loop is entered?
>>
>>
>> Scaling is based on the probability calculated on "if (ga)", if it is
>> 33% vs 67% before split, then it is reasonable to scale loop1 to 33%
>> and loop2 to 67% suppose the branch probability is correct enough?
>>
>> unswitch also scaled the two loops based on prob_true, if prob_true
>> is 50%, it decreases X's count to HALF unexpectedly since it should be
>> executed on both branches?  while loop split still kept execute both first
>> loop and second loop, it seems even more accurate than loop unswitching?
> 
> I just was saying that both doing exactly the same looks wrong (on
> either side).
> 
>> tree-ssa-loop-unswitch.c:
>>
>>     while (A)
>>       {
>>         if (inv)
>>           B;
>>
>>         X;
>>
>>         if (!inv)
>> 	 C;
>>       }
>>
>>     where inv is the loop invariant, into
>>
>>     if (inv)
>>       {
>>         while (A)
>> 	 {
>>             B;
>> 	   X;
>> 	 }
>>       }
>>     else
>>       {
>>         while (A)
>> 	 {
>> 	   X;
>> 	   C;
>> 	 }
>>       }
> 
> Yes, here scaling based on the if (inv) probability looks obviously
> 100% correct to me.  But I might be wrong.  And thus the
> splitting case must be still off (so I conclude).  Hmm, in fact
> I think for the loop unswitching case the scaling of the B and
> C blocks is off?

B, C and X are all scaled in tree-ssa-loop-unswitch.c:

  prob_true = edge_true->probability;
  return loop_version (loop, unshare_expr (cond),
		       NULL, prob_true,
		       prob_true.invert (),
		       prob_true, prob_true.invert (),
		       false);

>  Those should be considered always executed.
> Note the loop unswitching pass is altering the conditions
> condition to static true/false but I don't think it performs
> further adjustments.

loop unswitching calls tree_unswitch_single_loop recursively, 
firstly, it calls loop_version to generate nloop, then it calls
itself again for nloop and loop with simplify_using_entry_checks
to cut their false branches respectively.

  /* Invoke itself on modified loops.  */
  tree_unswitch_single_loop (nloop, num + 1);
  tree_unswitch_single_loop (loop, num + 1);

> 
> That said, likely the profile update cannot be done uniformly
> for all blocks of a loop?
> 
> Richard.
> 

-- 
Thanks,
Xionghu

  parent reply	other threads:[~2021-08-11  8:32 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-03  8:58 Xionghu Luo
2021-08-04  2:42 ` Xionghu Luo
2021-08-06 11:46 ` Richard Biener
2021-08-09  2:37   ` Xionghu Luo
2021-08-10 14:47     ` Richard Biener
2021-08-11  3:03       ` Feng Xue OS
2021-10-26 13:05         ` Jan Hubicka
2021-10-27  1:42           ` Xionghu Luo
2021-08-11  8:32       ` Xionghu Luo [this message]
2021-08-11  9:16         ` Richard Biener
2021-08-12  3:24           ` Xionghu Luo
2021-09-22  8:40           ` Xionghu Luo
2021-09-23 12:17             ` Richard Biener
2021-10-15  5:51               ` Xionghu Luo
2021-10-21  8:43                 ` Xionghu Luo
2021-10-21 10:55                   ` Richard Biener
2021-10-26  5:40                     ` Xionghu Luo
2021-10-26 11:59                       ` Richard Biener
2021-10-26 12:19                         ` Jan Hubicka
2021-08-09  4:33   ` Feng Xue OS

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a54fa5a6-e5f4-8358-9a00-a7a70ff7cbfc@linux.ibm.com \
    --to=luoxhu@linux.ibm.com \
    --cc=fxue@os.amperecomputing.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=guojiufu@linux.ibm.com \
    --cc=hubicka@ucw.cz \
    --cc=linkw@gcc.gnu.org \
    --cc=rguenther@suse.de \
    --cc=segher@kernel.crashing.org \
    --cc=wschmidt@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).