Re: [PATCH] split loop for NE condition.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: guojiufu <guojiufu@linux.ibm.com>
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches@gcc.gnu.org, wschmidt@linux.ibm.com,
	segher@kernel.crashing.org, dje.gcc@gmail.com, jlaw@tachyum.com
Subject: Re: [PATCH] split loop for NE condition.
Date: Fri, 07 May 2021 16:27:09 +0800	[thread overview]
Message-ID: <8335b451836fd14e60d18046ea76c06e@imap.linux.ibm.com> (raw)
In-Reply-To: <nycvar.YFH.7.76.2105061017520.9200@zhemvz.fhfr.qr>

On 2021-05-06 16:27, Richard Biener wrote:
> On Thu, 6 May 2021, guojiufu wrote:
> 
>> On 2021-05-03 20:18, Richard Biener wrote:
>> > On Thu, 29 Apr 2021, Jiufu Guo wrote:
>> >
>> >> When there is the possibility that overflow may happen on the loop index,
>> >> a few optimizations would not happen. For example code:
>> >>
>> >> foo (int *a, int *b, unsigned k, unsigned n)
>> >> {
>> >>   while (++k != n)
>> >>     a[k] = b[k]  + 1;
>> >> }
>> >>
>> >> For this code, if "l > n", overflow may happen.  if "l < n" at begining,
>> >> it could be optimized (e.g. vectorization).
>> >>
>> >> We can split the loop into two loops:
>> >>
>> >>   while (++k > n)
>> >>     a[k] = b[k]  + 1;
>> >>   while (l++ < n)
>> >>     a[k] = b[k]  + 1;
>> >>
>> >> then for the second loop, it could be optimized.
>> >>
>> >> This patch is splitting this kind of small loop to achieve better
>> >> performance.
>> >>
>> >> Bootstrap and regtest pass on ppc64le.  Is this ok for trunk?
>> >
>> > Do you have any statistics on how often this splits a loop during
>> > bootstrap (use --with-build-config=bootstrap-O3)?  Or alternatively
>> > on SPEC?
>> 
>> In SPEC2017, there are ~240 loops are split.  And I saw some 
>> performance
>> improvement on xz.
>> I would try bootstrap-O3 (encounter ICE).
Without this patch, the ICE is also there when building with 
bootstrap-O3 on ppc64le.

>> 
>> >
>> > Actual comments on the patch inline.
>> >
>> >> Thanks!
>> >>
>> >> Jiufu Guo.
>> >>
>> >> gcc/ChangeLog:
>> >>
>> >> 2021-04-29  Jiufu Guo  <guojiufu@linux.ibm.com>
>> >>
>> >>  * params.opt (max-insns-ne-cond-split): New.
>> >>  * tree-ssa-loop-split.c (connect_loop_phis): Add new param.
>> >>  (get_ne_cond_branch): New function.
>> >>  (split_ne_loop): New function.
>> >>  (split_loop_on_ne_cond): New function.
>> >>  (tree_ssa_split_loops): Use split_loop_on_ne_cond.
>> >>
>> >> gcc/testsuite/ChangeLog:
>> >> 2021-04-29  Jiufu Guo  <guojiufu@linux.ibm.com>
>> >>
>> >>  * gcc.dg/loop-split1.c: New test.
>> >>
>> >> ---
>> >>  gcc/params.opt                     |   4 +
>> >>  gcc/testsuite/gcc.dg/loop-split1.c |  28 ++++
>> >>  gcc/tree-ssa-loop-split.c          | 219
>> >> ++++++++++++++++++++++++++++-
>> >>  3 files changed, 247 insertions(+), 4 deletions(-)
>> >>  create mode 100644 gcc/testsuite/gcc.dg/loop-split1.c
>> >>
>> >> diff --git a/gcc/params.opt b/gcc/params.opt
>> >> index 2e4cbdd7a71..900b59b5136 100644
>> >> --- a/gcc/params.opt
>> >> +++ b/gcc/params.opt
>> >> @@ -766,6 +766,10 @@ Min. ratio of insns to prefetches to enable
>> >> prefetching for a loop with an unkno
>> >> Common Joined UInteger Var(param_min_loop_cond_split_prob) Init(30)
>> >> IntegerRange(0, 100) Param Optimization
>> >> The minimum threshold for probability of semi-invariant condition statement
>> >> to trigger loop split.
>> >>
>> >> +-param=max-insns-ne-cond-split=
>> >> +Common Joined UInteger Var(param_max_insn_ne_cond_split) Init(64) Param
>> >> Optimization
>> >> +The maximum threshold for insnstructions number of a loop with ne
>> >> condition to split.
>> >> +
>> >>  -param=min-nondebug-insn-uid=
>> >>  Common Joined UInteger Var(param_min_nondebug_insn_uid) Param
>> >>  The minimum UID to be used for a nondebug insn.
>> >> diff --git a/gcc/testsuite/gcc.dg/loop-split1.c
>> >> b/gcc/testsuite/gcc.dg/loop-split1.c
>> >> new file mode 100644
>> >> index 00000000000..4c466aa9f54
>> >> --- /dev/null
>> >> +++ b/gcc/testsuite/gcc.dg/loop-split1.c
>> >> @@ -0,0 +1,28 @@
>> >> +/* { dg-do compile } */
>> >> +/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
>> >> +
>> >> +void
>> >> +foo (int *a, int *b, unsigned l, unsigned n)
>> >> +{
>> >> +  while (++l != n)
>> >> +    a[l] = b[l]  + 1;
>> >> +}
>> >> +
>> >> +void
>> >> +foo1 (int *a, int *b, unsigned l, unsigned n)
>> >> +{
>> >> +  while (l++ != n)
>> >> +    a[l] = b[l]  + 1;
>> >> +}
>> >> +
>> >> +unsigned
>> >> +foo2 (char *a, char *b, unsigned l, unsigned n)
>> >> +{
>> >> +  while (++l != n)
>> >> +    if (a[l] != b[l])
>> >> +      break;
>> >> +
>> >> +  return l;
>> >> +}
>> >> +
>> >> +/* { dg-final { scan-tree-dump-times "Loop split" 3 "lsplit" } } */
>> >> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
>> >> index b80b6a75e62..a6d28078e5e 100644
>> >> --- a/gcc/tree-ssa-loop-split.c
>> >> +++ b/gcc/tree-ssa-loop-split.c
>> >> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
>> >>  #include "cfghooks.h"
>> >>  #include "gimple-fold.h"
>> >>  #include "gimplify-me.h"
>> >> +#include "tree-ssa-loop-ivopts.h"
>> >>
>> >>  /* This file implements two kinds of loop splitting.
>> >>
>> >> @@ -233,7 +234,8 @@ easy_exit_values (class loop *loop)
>> >>     this.  The loops need to fulfill easy_exit_values().  */
>> >>
>> >> static void
>> >> -connect_loop_phis (class loop *loop1, class loop *loop2, edge new_e)
>> >> +connect_loop_phis (class loop *loop1, class loop *loop2, edge new_e,
>> >> +		   bool use_prev = false)
>> >>  {
>> >>    basic_block rest = loop_preheader_edge (loop2)->src;
>> >>    gcc_assert (new_e->dest == rest);
>> >> @@ -248,13 +250,14 @@ connect_loop_phis (class loop *loop1, class loop
>> >> *loop2, edge new_e)
>> >>         !gsi_end_p (psi_first);
>> >>         gsi_next (&psi_first), gsi_next (&psi_second))
>> >>      {
>> >> -      tree init, next, new_init;
>> >> +      tree init, next, new_init, prev;
>> >>        use_operand_p op;
>> >>        gphi *phi_first = psi_first.phi ();
>> >>        gphi *phi_second = psi_second.phi ();
>> >>
>> >>        init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
>> >>        next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
>> >> +      prev = PHI_RESULT (phi_first);
>> >>        op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
>> >>        gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR
>> >> (op)));
>> >>
>> >> @@ -279,7 +282,7 @@ connect_loop_phis (class loop *loop1, class loop
>> >> *loop2, edge new_e)
>> >>
>> >>        gphi * newphi = create_phi_node (new_init, rest);
>> >>        add_phi_arg (newphi, init, skip_first, UNKNOWN_LOCATION);
>> >> -      add_phi_arg (newphi, next, new_e, UNKNOWN_LOCATION);
>> >> +      add_phi_arg (newphi, use_prev ? prev : next, new_e,
>> >> UNKNOWN_LOCATION);
>> >>        SET_USE (op, new_init);
>> >>      }
>> >> }
>> >> @@ -1599,6 +1602,213 @@ split_loop_on_cond (struct loop *loop)
>> >>    return do_split;
>> >>  }
>> >>
>> >> +/* Check if the LOOP exit branch likes "if (idx != bound)".
>> >> +   if INV is not NULL and the branch is "if (bound != idx)", set *INV to
>> >> true.
>> >
>> > If
>> Ok, I will update accordingly.
>> >
>> >> +   return the branch edge which exit loop.  */
>> >
>> > Return
>> Thanks.
>> 
>> >
>> >> +
>> >> +static edge
>> >> +get_ne_cond_branch (struct loop *loop, bool *inv)
>> >> +{
>> >> +  int i;
>> >> +  edge e;
>> >> +
>> >> +  auto_vec<edge> edges = get_loop_exit_edges (loop);
>> >> +  FOR_EACH_VEC_ELT (edges, i, e)
>> >> +    {
>> >> +      basic_block bb = e->src;
>> >> +
>> >> +      /* Check gcond.  */
>> >> +      gimple *last = last_stmt (bb);
>> >> +      if (!last || gimple_code (last) != GIMPLE_COND)
>> >> +	continue;
>> >> +      gcond *cond = as_a<gcond *> (last);
>> >> +      enum tree_code code = gimple_cond_code (cond);
>> >> +      if (code != NE_EXPR)
>> >> +	continue;
>> >
>> > I'm not sure we canonicalize the case with code == EQ_EXPR,
>> > at least for
>> >
>> > void bar();
>> > void foo(unsigned n)
>> > {
>> >   unsigned i = 0;
>> >   do
>> >     {
>> >       if (i == n)
>> >         return;
>> >       bar();
>> >       ++i;
>> >     }
>> >   while (1);
>> > }
>> >
>> > we don't.  Since you return the exit edge can this case be
>> > handled transparently?
>> 
>> Oh, this case was not handled, the patch can be enhanced to handle 
>> this kind
>> of case.
>> 
>> 
>> >
>> >> +
>> >> +      /* Make sure idx and bound.  */
>> >> +      tree idx = gimple_cond_lhs (cond);
>> >> +      tree bnd = gimple_cond_rhs (cond);
>> >> +      if (expr_invariant_in_loop_p (loop, idx))
>> >> +	{
>> >> +	  std::swap (idx, bnd);
>> >> +	  if (inv)
>> >> +	    *inv = true;
>> >> +	}
>> >> +      else if (!expr_invariant_in_loop_p (loop, bnd))
>> >> +	continue;
>> >
>> > We canonicalize i < UINT_MAX to i != UINT_MAX so you want to
>> > detect that and not split the loop if 'bnd' is the maximum
>> > or minimum value of the type I think.
>> Yeap!
>> 
>> >
>> >> +      /* Extract conversion.  */
>> >> +      if (TREE_CODE (idx) == SSA_NAME)
>> >> +	{
>> >> +	  gimple *stmt = SSA_NAME_DEF_STMT (idx);
>> >> +	  if (is_gimple_assign (stmt)
>> >> +	      && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt))
>> >> +	      && flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
>> >> +	    idx = gimple_assign_rhs1 (stmt);
>> >> +	}
>> >
>> > This skips arbitrary extensions and truncations - is that intended?
>> Yes. This could handle code like:
>> 
>> _6 = (sizetype) idx;
>> if (_6 != n)
> 
> That looks like widening, but the code above also does
> 
>   _6 = (char) idx;
>   if (_6 != n)
> 

Yes, This is also expected. Because truncation also needs to clear high 
bits.
Comparing a longer index with a shorter bound, wrap may happen.

>> 
>> >
>> >> +      /* Make sure idx is iv.  */
>> >> +      class loop *useloop = loop_containing_stmt (cond);
>> >> +      affine_iv iv;
>> >> +      if (!simple_iv (loop, useloop, idx, &iv, false))
>> >> +	continue;
>> >> +
>> >> +      /* No need to split loop, if base is know value.
>> >> +	 Or check range info.  */
>> >> +      if (TREE_CODE (iv.base) == INTEGER_CST)
>> >> +	continue;
>> >
>> > I think it would be better to check iv.no_overflow?  Also looking
>> > it might be possible to use simple_iv_with_niters with IV_NITERS
>> > not NULL for most of this analysis?
>> 
>> Yes, iv.no_overflow may better to use directly.
>> simple_iv invoke simple_iv_with_niters with IV_NITERS=null.
>> Because wrap may happen, IV_NITERS may not be calculated.
>> 
>> Thanks!
>> >
>> >> +      /* There is type conversion on idx(or rhs of idx's def).
>> >> +	 And there is converting shorter to longer type. */
>> >> +      tree type = TREE_TYPE (idx);
>> >> +      if (!INTEGRAL_TYPE_P (type) || TREE_CODE (idx) != SSA_NAME
>> >> +	  || !TYPE_UNSIGNED (type)
>> >> +	  || TYPE_PRECISION (type) == TYPE_PRECISION (sizetype))
>> >> +	continue;
>> >
>> > That check can be done before the (expensive) simple_iv check.
>> Ok, will update accordingly.
>> 
>> > I wonder what the 'sizetype' precision check is about?  The
>> > function level comment should probably clarify what kind of
>> > conversions we handle (and why).
>> Currently, this patch is more care about the conversions
>> which may generate ext/truck stmts.
>> I would try to remove this check, since we may split the loop
>> if there is a possible wrap/overflow.
> 
> As said, it should be documented what conversions we look through
> and why we can do that.  I know that eventually simple_iv does
> not return true on the converted IV but does on the conversion
> src - is that why you do this in the first place?

Thanks, sure.  The function comments will be updated.
The code is checking the conversions which may cause overflow/wrap.
Code would be like:

   /* Check if wrap/overflow may happen during type conversion.  */
   tree type = TREE_TYPE (idx);
   if (....)

I want to support the cases both converted iv and conversion src.

> 
> Note getting this and the "overflow" check correct is the most
> important piece of the transform since in the end we want to
> enable followup transforms on the split parts which otherwise
> run into this very problem, no?

Right, we need to check 'wrap/overflow' correctly to enable follow-up
optimizations for the split parts.  This is the basic intention of this 
patch.

> 
>> >
>> >> +      /* Check loop is simple to split.  */
>> >> +      gcc_assert (bb != loop->latch);
>> >> +
>> >> +      if (single_pred_p (loop->latch)
>> >> +	  && single_pred_edge (loop->latch)->src == bb
>> >> +	  && empty_block_p (loop->latch))
>> >> +	return e;
>> >> +
>> >> +      /* Splitting is cheap for idx increase header.  */
>> >> +      if (bb == loop->header)
>> >> +	{
>> >> +	  if (get_virtual_phi (loop->header))
>> >> +	    continue;
>> >> +
>> >> +	  /* In loop header: i++ or ++i.  */
>> >> +	  gimple_stmt_iterator gsi = gsi_start_bb (bb);
>> >> +	  if (gsi_end_p (gsi))
>> >> +	    return e;
>> >> +
>> >> +	  gimple *s1 = gsi_stmt (gsi);
>> >> +	  if (!(is_gimple_assign (s1)
>> >> +		&& (idx == gimple_assign_lhs (s1)
>> >> +		    || idx == gimple_assign_rhs1 (s1))))
>> >> +	    continue;
>> >> +
>> >> +	  gsi_next (&gsi);
>> >> +	  if (!gsi_end_p (gsi) && gsi_stmt (gsi) == cond)
>> >> +	    return e;
>> >> +	}
>> >
>> > I wonder if these "cheapness" heuristics should simply fold
>> > into the cost of the extra duplication of the header/tail
>> > in the overall stmt limit?
>> Without this heuristic, the loop can be split only if the branch to 
>> exit
>> locates at the end of the loop (just before the empty latch).
> 
> why?  Btw, to avoid code-generation differences with -g vs. -g0
> you have to skip debug insns, thus use gsi_start_nondebug_bb and
> gsi_next_nondebug
Thanks, bb would be treated as 'empty' if only debug stmts.

If the interesting "cond branch" is in the middle of the loop:


LH:
  B1
  B2
  if(X != N)
    goto LM
  else
    goto exit
LM:
  B3
  B4 (may also branch exit)
  latch
    goto LH:


When the first loop exit at "if (X > N)", B1 and B2 are already 
executed;
then after the exit of the first loop, we can not jump to the second 
split
loop header, which will re-run B1 and B2; so, we did not support these 
cases.

If the header is simple (e.g. i++,++i), we may support it without too 
much cost.

> 
>> With the heuristics, the loop can be split if the branch is at the 
>> simple
>> header.
>> And I feel the cost of small to rerun the duplicated simple header:
>> maybe just 'move' instructions or one add instruction.
>> 
>> >
>> >> +    }
>> >> +
>> >> +  return NULL;
>> >> +}
>> >> +
>> >> +/* Split the LOOP with NE_EXPR into two loops with GT_EXPR and LT_EXPR.
>> >> */
>> >> +
>> >> +static bool
>> >> +split_ne_loop (struct loop *loop, edge cond_e)
>> >> +{
>> >> +  initialize_original_copy_tables ();
>> >> +
>> >> +  struct loop *loop2 = loop_version (loop, boolean_true_node, NULL,
>> >> +				     profile_probability::always (),
>> >> +				     profile_probability::never (),
>> >> +				     profile_probability::always (),
>> >> +				     profile_probability::always (), true);
>> >> +
>> >> +  gcc_assert (loop2);
>> >> +  update_ssa (TODO_update_ssa);
>> >> +
>> >> +  free_original_copy_tables ();
>> >> +
>> >> +  /* Change if (i != n) to LOOP1:if (i > n) and LOOP2:if (i < n) */
>> >> +  bool inv = false;
>> >> +  edge dup_cond = get_ne_cond_branch (loop2, &inv);
>> >
>> > I don't think you should rely in pattern-matching to detect the same
>> > condition in the versioned loop - instead you can use the copy
>> > tables, do
>> >
>> >     2nd_loop_exit_block = get_bb_copy (cond_e->src);
>> >
>> > to get to the block with the COND_EXPR (before free_original_copy_tables
>> > obviously).
>> >
>> Thanks! Your suggestion is great, it would save a lot of time to get
>> the new exit branch.
>> 
>> >> +  enum tree_code up_code = inv ? LT_EXPR : GT_EXPR;
>> >> +  enum tree_code down_code = inv ? GT_EXPR : LT_EXPR;
>> >> +
>> >> +  gcond *gc = as_a<gcond *> (last_stmt (cond_e->src));
>> >> +  gimple_cond_set_code (gc, up_code);
>> >> +
>> >> +  gcond *dup_gc = as_a<gcond *> (last_stmt (dup_cond->src));
>> >> +  gimple_cond_set_code (dup_gc, down_code);
>> >> +
>> >> +  /* Link the exit cond edge to new loop.  */
>> >> +  gcond *break_cond = as_a<gcond *> (gimple_copy (gc));
>> >> +  edge pred_e = single_pred_edge (loop->latch);
>> >> +  gcc_assert (pred_e);
>> >> +  bool simple_loop = pred_e->src == cond_e->src && empty_block_p
>> >> (loop->latch);
>> >> +  if (simple_loop)
>> >> +    gimple_cond_set_code (break_cond, down_code);
>> >> +  else
>> >> +    gimple_cond_make_true (break_cond);
>> >> +
>> >> +  basic_block break_bb = split_edge (cond_e);
>> >> +  gimple_stmt_iterator gsi = gsi_last_bb (break_bb);
>> >> +  gsi_insert_after (&gsi, break_cond, GSI_NEW_STMT);
>> >> +
>> >> +  edge to_exit = single_succ_edge (break_bb);
>> >> +  edge to_new_loop = make_edge (break_bb, loop_preheader_edge
>> >> (loop2)->src, 0);
>> >> +  to_new_loop->flags |= EDGE_TRUE_VALUE;
>> >> +  to_exit->flags |= EDGE_FALSE_VALUE;
>> >> +  to_exit->flags &= ~EDGE_FALLTHRU;
>> >> +  to_exit->probability = cond_e->probability;
>> >> +  to_new_loop->probability = to_exit->probability.invert ();
>> >> +
>> >> +  update_ssa (TODO_update_ssa);
>> >> +
>> >> +  connect_loop_phis (loop, loop2, to_new_loop, !simple_loop);
>> >> +
>> >> +  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop);
>> >> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> >> +    fprintf (dump_file, ";; Loop split.\n");
>> >
>> > Maybe ";; Loop split on != condition.\n"?
>> Ok!
>> >
>> >> +
>> >> +  return true;
>> >> +}
>> >> +
>> >> +/* Checks if LOOP contains a suitable NE_EXPR conditional block to split.
>> >> +L_H:
>> >> + if (i!=N)
>> >> +   S;
>> >> + i++;
>> >> + goto L_H;
>> >> +
>> >> +The "i!=N" is like "i>N || i<N", then it could be transform to:
>> >> +
>> >> +L_H:
>> >> + if (i>N)
>> >> +   S;
>> >> + i++;
>> >> + goto L_H;
>> >> +L1_H:
>> >> + if (i<N)
>> >> +   S;
>> >> + i++;
>> >> + goto L1_H;
>> >> +
>> >> +The loop with "i<N" is in favor both GIMPLE and RTL passes.  */
>> >> +
>> >> +static bool
>> >> +split_loop_on_ne_cond (class loop *loop)
>> >> +{
>> >> +  if (!can_duplicate_loop_p (loop))
>> >> +    return false;
>> >> +
>> >> +  int num = 0;
>> >> +  basic_block *bbs = get_loop_body (loop);
>> >
>> > To avoid repeated DFS walks of the loop body do the can_duplicate_loop_p
>> > check here as
>> >
>> >   if (!can_copy_bbs_p (bbs, loop->num_nodes))
>> >     {
>> >       free (bbs);
>> >       return false;
>> >     }
>> >
>> > (see split_loop)
>> >
>> Thanks! This could also save gcc runtime.
>> 
>> >> +  for (unsigned i = 0; i < loop->num_nodes; i++)
>> >> +    num += estimate_num_insns_seq (bb_seq (bbs[i]), &eni_size_weights);
>> >> +  free (bbs);
>> >
>> > So with this and the suggestion above it is maybe possible to re-use
>> > compute_added_num_insns?  That already seems to handle splitting at
>> > aribtrary branches (but maybe not loops with multiple exits?).
>> > The code using this computation uses param_max_peeled_insns - is
>> > that limit not sufficient for your case (to avoid another param?)?
>> > It's default is 100, a bit higher than yours (64).
>> 
>> This patch uses this simple code to calculate and does not re-use
>> compute_added_num_insns,  because there would be no many stmts in 
>> loops are
>> deleted.
>> 
>> Introducing a new param, because I think can independently use it to 
>> control
>> different
>> optimization part.
>> And the name of param_max_peeled_insns seems designed for other 
>> behavior:)
>> Do we prefer to reuse exit params for new optimizations?   If so, we 
>> would
>> reuse it.
> 
> It depends.  values for new params tend to fall out of the air while
> olders might have been tuned carefully.  If they limit related things
> (code growth) then re-using older might be beneficial.
> 
>> >
>> > I'd really like to see some numbers on how much this triggers since
>> > the splitting itself is O (number-of-split-loops * function-size)
>> > complexity and thus worse than quadratic as we do update_ssa for each
>> > split loop.  You can experience this by simply concating one of your
>> > testcase loops N times in a function ...
>> 
>> If I duplicate the test case loop 15 times in the function,
>> it riggers 15 ";; Loop split".
>> 
>> Concatted times, compiling time, gap
>> 4: 0m0.106s
>> 8: 0m0.186s  0.080s
>> 12: 0m0.267s 0.082s
>> 16: 0m0.350s 0.083s
>> 20: 0m0.434s 0.084s
>> 24: 0m0.522s 0.088s
>> 28: 0m0.609s 0.087s
>> 32: 0m0.705s 0.086s
>> 
>> It seems near a linear complexity.
> 
> I see, though 32 is not a lot - I would suggest to try 1000s or even
> more ;)  update_ssa time is linear in the size of the function _at 
> least_.

Thanks for your suggestion!
It is interesting: when the numbers are bigger:(e.g. more than 1000s),
it shows non-linear complexity.

1000 56.57
2000 215.59
3000 486.92
4000 869.46

Without this patch:
1000 9.61
2000 25.54
3000 49.70
4000 80.31

Thanks again!
Jiufu Guo.

> 
> Richard.
> 
>> 
>> Thanks a lot for review!
>> 
>> Jiufu Guo.
>> 
>> >
>> >> +  if (num > param_max_insn_ne_cond_split)
>> >> +    return false;
>> >> +
>> >> +  edge branch_edge = get_ne_cond_branch (loop, NULL);
>> >> +  if (branch_edge && split_ne_loop (loop, branch_edge))
>> >> +    return true;
>> >> +
>> >> +  return false;
>> >> +}
>> >> +
>> >> /* Main entry point.  Perform loop splitting on all suitable loops.  */
>> >>
>> >> static unsigned int
>> >> @@ -1628,7 +1838,8 @@ tree_ssa_split_loops (void)
>> >>         if (optimize_loop_for_size_p (loop))
>> >>   continue;
>> >>
>> >> -      if (split_loop (loop) || split_loop_on_cond (loop))
>> >> +      if (split_loop (loop) || split_loop_on_cond (loop)
>> >> +	  || split_loop_on_ne_cond (loop))
>> >>   {
>> >> 	  /* Mark our containing loop as having had some split inner loops.
>> >> */
>> >>     loop_outer (loop)->aux = loop;
>> >>
>>

next prev parent reply	other threads:[~2021-05-07  8:27 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-29  9:50 Jiufu Guo
2021-04-30 16:27 ` Jeff Law
2021-05-06  1:05   ` guojiufu
2021-04-30 21:37 ` Segher Boessenkool
2021-05-06  1:37   ` guojiufu
2021-05-03 12:18 ` Richard Biener
2021-05-06  7:57   ` guojiufu
2021-05-06  8:27     ` Richard Biener
2021-05-07  8:27       ` guojiufu [this message]
2021-05-07  9:52         ` Richard Biener
2021-05-14 14:58           ` [RFC] " guojiufu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8335b451836fd14e60d18046ea76c06e@imap.linux.ibm.com \
    --to=guojiufu@linux.ibm.com \
    --cc=dje.gcc@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jlaw@tachyum.com \
    --cc=rguenther@suse.de \
    --cc=segher@kernel.crashing.org \
    --cc=wschmidt@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).