From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rguenther@suse.de>
Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28])
 by sourceware.org (Postfix) with ESMTPS id 732D6385F033;
 Wed, 11 Aug 2021 09:16:42 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 732D6385F033
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
 by smtp-out1.suse.de (Postfix) with ESMTP id 6CD082216A;
 Wed, 11 Aug 2021 09:16:41 +0000 (UTC)
Received: from murzim.suse.de (murzim.suse.de [10.160.4.192])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by relay2.suse.de (Postfix) with ESMTPS id 14E45A3BA7;
 Wed, 11 Aug 2021 09:16:41 +0000 (UTC)
Date: Wed, 11 Aug 2021 11:16:40 +0200 (CEST)
From: Richard Biener <rguenther@suse.de>
To: Xionghu Luo <luoxhu@linux.ibm.com>
cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org, 
 fxue@os.amperecomputing.com, wschmidt@linux.ibm.com, 
 guojiufu@linux.ibm.com, linkw@gcc.gnu.org, hubicka@ucw.cz
Subject: Re: [PATCH] Fix loop split incorrect count and probability
In-Reply-To: <a54fa5a6-e5f4-8358-9a00-a7a70ff7cbfc@linux.ibm.com>
Message-ID: <nycvar.YFH.7.76.2108111108290.11781@zhemvz.fhfr.qr>
References: <20210803085813.2217766-1-luoxhu@linux.ibm.com>
 <nycvar.YFH.7.76.2108061338260.11781@zhemvz.fhfr.qr>
 <58f9e351-fc10-406e-5272-270ece73be71@linux.ibm.com>
 <nycvar.YFH.7.76.2108101634250.11781@zhemvz.fhfr.qr>
 <a54fa5a6-e5f4-8358-9a00-a7a70ff7cbfc@linux.ibm.com>
User-Agent: Alpine 2.21 (LSU 202 2017-01-01)
MIME-Version: 1.0
X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8BIT
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 11 Aug 2021 09:16:53 -0000

On Wed, 11 Aug 2021, Xionghu Luo wrote:

> 
> 
> On 2021/8/10 22:47, Richard Biener wrote:
> > On Mon, 9 Aug 2021, Xionghu Luo wrote:
> > 
> >> Thanks,
> >>
> >> On 2021/8/6 19:46, Richard Biener wrote:
> >>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
> >>>
> >>>> loop split condition is moved between loop1 and loop2, the split bb's
> >>>> count and probability should also be duplicated instead of (100% vs INV),
> >>>> secondly, the original loop1 and loop2 count need be propotional from the
> >>>> original loop.
> >>>>
> >>>>
> >>>> diff base/loop-cond-split-1.c.151t.lsplit  patched/loop-cond-split-1.c.151t.lsplit:
> >>>> ...
> >>>>      int prephitmp_16;
> >>>>      int prephitmp_25;
> >>>>
> >>>>      <bb 2> [local count: 118111600]:
> >>>>      if (n_7(D) > 0)
> >>>>        goto <bb 4>; [89.00%]
> >>>>      else
> >>>>        goto <bb 3>; [11.00%]
> >>>>
> >>>>      <bb 3> [local count: 118111600]:
> >>>>      return;
> >>>>
> >>>>      <bb 4> [local count: 105119324]:
> >>>>      pretmp_3 = ga;
> >>>>
> >>>> -  <bb 5> [local count: 955630225]:
> >>>> +  <bb 5> [local count: 315357973]:
> >>>>      # i_13 = PHI <i_10(20), 0(4)>
> >>>>      # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
> >>>>      if (prephitmp_12 != 0)
> >>>>        goto <bb 6>; [33.00%]
> >>>>      else
> >>>>        goto <bb 7>; [67.00%]
> >>>>
> >>>> -  <bb 6> [local count: 315357972]:
> >>>> +  <bb 6> [local count: 104068130]:
> >>>>      _2 = do_something ();
> >>>>      ga = _2;
> >>>>
> >>>> -  <bb 7> [local count: 955630225]:
> >>>> +  <bb 7> [local count: 315357973]:
> >>>>      # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
> >>>>      i_10 = inc (i_13);
> >>>>      if (n_7(D) > i_10)
> >>>>        goto <bb 21>; [89.00%]
> >>>>      else
> >>>>        goto <bb 11>; [11.00%]
> >>>>
> >>>>      <bb 11> [local count: 105119324]:
> >>>>      goto <bb 3>; [100.00%]
> >>>>
> >>>> -  <bb 21> [local count: 850510901]:
> >>>> +  <bb 21> [local count: 280668596]:
> >>>>      if (prephitmp_12 != 0)
> >>>> -    goto <bb 20>; [100.00%]
> >>>> +    goto <bb 20>; [33.00%]
> >>>>      else
> >>>> -    goto <bb 19>; [INV]
> >>>> +    goto <bb 19>; [67.00%]
> >>>>
> >>>> -  <bb 20> [local count: 850510901]:
> >>>> +  <bb 20> [local count: 280668596]:
> >>>>      goto <bb 5>; [100.00%]
> >>>>
> >>>> -  <bb 19> [count: 0]:
> >>>> +  <bb 19> [local count: 70429947]:
> >>>>      # i_23 = PHI <i_10(21)>
> >>>>      # prephitmp_25 = PHI <prephitmp_5(21)>
> >>>>
> >>>> -  <bb 12> [local count: 955630225]:
> >>>> +  <bb 12> [local count: 640272252]:
> >>>>      # i_15 = PHI <i_23(19), i_22(16)>
> >>>>      # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
> >>>>      i_22 = inc (i_15);
> >>>>      if (n_7(D) > i_22)
> >>>>        goto <bb 16>; [89.00%]
> >>>>      else
> >>>>        goto <bb 11>; [11.00%]
> >>>>
> >>>> -  <bb 16> [local count: 850510901]:
> >>>> +  <bb 16> [local count: 569842305]:
> >>>>      goto <bb 12>; [100.00%]
> >>>>
> >>>>    }
> >>>>
> >>>> gcc/ChangeLog:
> >>>>
> >>>> 	* tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> >>>> 	(do_split_loop_on_cond): Likewise.
> >>>> ---
> >>>>    gcc/tree-ssa-loop-split.c | 16 ++++++++--------
> >>>>    1 file changed, 8 insertions(+), 8 deletions(-)
> >>>>
> >>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> >>>> index 3a09bbc39e5..8e5a7ded0f7 100644
> >>>> --- a/gcc/tree-ssa-loop-split.c
> >>>> +++ b/gcc/tree-ssa-loop-split.c
> >>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
> >>>>    	basic_block cond_bb;
> >>
> >>   	if (!initial_true)
> >> -	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >> +	  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >> +
> >> +	edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> >> +			   ? EDGE_SUCC (bbs[i], 0)
> >> +			   : EDGE_SUCC (bbs[i], 1);
> >>
> >>>>    
> >>>>    	class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> >>>> -					   profile_probability::always (),
> >>>> -					   profile_probability::always (),
> >>>> -					   profile_probability::always (),
> >>>> -					   profile_probability::always (),
> >>>> +					   true_edge->probability,
> >>>> +					   true_edge->probability.invert (),
> >>>> +					   true_edge->probability,
> >>>> +					   true_edge->probability.invert (),
> >>>>    					   true);
> >>>
> >>> there is no 'true_edge' variable at this point.
> >>
> >> Sorry, missed the above hunk when split the patch.
> >>
> >>>
> >>>>    	gcc_assert (loop2);
> >>>>    
> >>>> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop *loop1, edge invar_branch)
> >>>>      initialize_original_copy_tables ();
> >>>>    
> >>>>      struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> >>>> -				     profile_probability::always (),
> >>>> -				     profile_probability::never (),
> >>>> -				     profile_probability::always (),
> >>>> -				     profile_probability::always (),
> >>>> +				     invar_branch->probability.invert (),
> >>>> +				     invar_branch->probability,
> >>>> +				     invar_branch->probability.invert (),
> >>>> +				     invar_branch->probability,
> >>>>    				     true);
> >>>>      if (!loop2)
> >>>>        {
> >>>
> >>> The patch introduction seems to talk about do_split_loop_on_cond only.
> >>
> >> split_loop faces similar issue though it sets the two branches to 100% vs 100%
> >> and no scaling which seems also incorrect.
> >>
> >>> Since loop versioning inserts a condition with the passed probabilities
> >>> but in this case a 'boolean_true_node' condition the then and else
> >>> probabilities passed look correct.  It's just the scaling arguments
> >>> that look wrong?  This loop_version call should get a comment as to
> >>> why we are passing probabilities the way we do.
> >>
> >> This optimization is transforming from:
> >>
> >> for (i = 0; i < n; i = inc (i))
> >>      {
> >>        if (ga)
> >>          ga = do_something ();
> >> }
> >>
> >> to:
> >>
> >>    for (i = 0; i < x; i = inc (i))
> >> {
> >>      if (true)
> >>           ga = do_something ();
> >>          if (!ga)
> >>            break;
> >> }
> >>    for (; i < n; i = inc (i))
> >> {
> >>      if (false)
> >>           ga = do_something ();
> >> }
> >>
> >>
> >> `boolean_true_node` is passed in to make the first loop's condition
> >> statement to be 'true', after returning from loop_version, there is a
> >> piece of code forcing the condition in second loop to be false,
> >> and the original condition is moved from *in loop* to *exit edge*
> >> between loop1 and loop2.
> > 
> > Yes, one complication is that we use loop_version but do not retain
> > the CFG it creates.  Something like the vectorizers
> > slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
> > but then that code doesn't do any scaling yet.  But then
> > loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
> > we could write a variant that simply doesn't mangle the CFG
> > with a new condition switching between both loops but keeps them
> > executed after each other ...
> > 
> > As said, this adds to the confusion and some awkwardness.
> 
> Then loop_version in loop split requires two types of variant, one
> is to insert condition to loop preheader for 'split_loops' usage, 
> another is to insert condition to loop exit for 'do_split_loop_on_cond'
> usage, it needs one extra function to encapsulate these cfg alterations
> from outside to inside.
> 
> unswitching only execute one loop as it only moves the invariant condition
> to first loop's pre-header.  While 'split_loops' and 'do_split_loop_on_cond'
> may execute both loops no matter the condition is moved to the first loop's
> preheader or exit.
> 
> The condition stmt in loop unswitching is invariant, but it is variant
> in loop splitting, that's why loop unswitching execute only one loop
> but loop splitting executes both loops.
> 
> Seems we need two condition arguments for loop_version, one for connecting
> loop1 preheader to loop2 preheader, another one for connecting loop1's exit
> to loop2's header?  Then it will be more generic for both unswitching pass
> and splitting pass.  Looks a bit complicated and conflicted with loop_version's
> comments:
> 
> /* Main entry point for Loop Versioning transformation.
> 
>    This transformation given a condition and a loop, creates
>    -if (condition) { loop_copy1 } else { loop_copy2 }, ... */
> 
> 
> And this only works for loop split usage, those many other places
> doesn't use loop_version like this?

Yes, as said if you don't want the above CFG then you probably
shouldn't use loop_version but instead use its building blocks
(and some refactoring of loop_version can make that easier).

I think splitting wants

  loop_copy1
  if (condition)
    loop_copy2

IMHO it would be good to split 'loopify' into the actual loopification
and the scaling.  Basically make the part of loop_version that
copies the loop on the header edge and creates a loop structure for
it separate.

loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
(copy_loop_before).

> grep "loop_version (" * -r
> qgcc/tree-parloops.c:      loop_version (loop, many_iterations_cond, NULL,
> gcc/tree-vect-loop-manip.c:      nloop = loop_version (loop_to_version, cond_expr, &condition_bb,
> gcc/tree-loop-distribution.c:  nloop = loop_version (loop, lhs, &cond_bb, prob, prob.invert (),
> gcc/tree-if-conv.c:  new_loop = loop_version (loop, cond, &cond_bb,
> gcc/tree-ssa-loop-manip.c:  new_loop = loop_version (loop, enter_main_cond, NULL, prob_entry,
> gcc/cfgloopmanip.c:loop_version (class loop *loop,
> gcc/tree-ssa-loop-split.c:      class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> gcc/tree-ssa-loop-split.c:   loop2 is duplicated using loop_version (), which corresponds to the part
> gcc/tree-ssa-loop-split.c:  struct loop *loop2 = loop_version (loop1, boolean_true_node, NULL,
> gcc/tree-ssa-loop-unswitch.c:  return loop_version (loop, unshare_expr (cond),
> gcc/modulo-sched.c:           loop_version (loop, comp_rtx, &condition_bb,
> gcc/cfgloopmanip.h:class loop * loop_version (class loop *, void *,
> gcc/gimple-loop-versioning.cc:  li.optimized_loop = loop_version (loop, cond, &cond_bb,  
> 
> > 
> >> @fxue may have inputs about this since he contributed it years ago.
> >>
> >>>
> >>> It does seem that scaling the loop by the invar_branch probability
> >>> is correct.  Since this does similar things to unswitching, I see
> >>> that unswitching does
> >>>
> >>> prob_true = edge_true->probability;
> >>> loop_version (loop, unshare_expr (cond),
> >>>                          NULL, prob_true,
> >>>                          prob_true.invert (),
> >>>                          prob_true, prob_true.invert (),
> >>>                          false);
> >>>
> >>> which maybe suggests that your invar_branch based passing should
> >>> depend on 'true_invar'?
> >>>
> >>> Also compared to unswitching the first loop is always entered,
> >>> so I wonder if the scaling is correct with respect to that
> >>> given unswitching where only ever one loop is entered?
> >>
> >>
> >> Scaling is based on the probability calculated on "if (ga)", if it is
> >> 33% vs 67% before split, then it is reasonable to scale loop1 to 33%
> >> and loop2 to 67% suppose the branch probability is correct enough?
> >>
> >> unswitch also scaled the two loops based on prob_true, if prob_true
> >> is 50%, it decreases X's count to HALF unexpectedly since it should be
> >> executed on both branches?  while loop split still kept execute both first
> >> loop and second loop, it seems even more accurate than loop unswitching?
> > 
> > I just was saying that both doing exactly the same looks wrong (on
> > either side).
> > 
> >> tree-ssa-loop-unswitch.c:
> >>
> >>     while (A)
> >>       {
> >>         if (inv)
> >>           B;
> >>
> >>         X;
> >>
> >>         if (!inv)
> >> 	 C;
> >>       }
> >>
> >>     where inv is the loop invariant, into
> >>
> >>     if (inv)
> >>       {
> >>         while (A)
> >> 	 {
> >>             B;
> >> 	   X;
> >> 	 }
> >>       }
> >>     else
> >>       {
> >>         while (A)
> >> 	 {
> >> 	   X;
> >> 	   C;
> >> 	 }
> >>       }
> > 
> > Yes, here scaling based on the if (inv) probability looks obviously
> > 100% correct to me.  But I might be wrong.  And thus the
> > splitting case must be still off (so I conclude).  Hmm, in fact
> > I think for the loop unswitching case the scaling of the B and
> > C blocks is off?
> 
> B, C and X are all scaled in tree-ssa-loop-unswitch.c:
> 
>   prob_true = edge_true->probability;
>   return loop_version (loop, unshare_expr (cond),
> 		       NULL, prob_true,
> 		       prob_true.invert (),
> 		       prob_true, prob_true.invert (),
> 		       false);
> 
> >  Those should be considered always executed.
> > Note the loop unswitching pass is altering the conditions
> > condition to static true/false but I don't think it performs
> > further adjustments.
> 
> loop unswitching calls tree_unswitch_single_loop recursively, 
> firstly, it calls loop_version to generate nloop, then it calls
> itself again for nloop and loop with simplify_using_entry_checks
> to cut their false branches respectively.
> 
>   /* Invoke itself on modified loops.  */
>   tree_unswitch_single_loop (nloop, num + 1);
>   tree_unswitch_single_loop (loop, num + 1);
> 
> > 
> > That said, likely the profile update cannot be done uniformly
> > for all blocks of a loop?
> > 
> > Richard.
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)