From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=HjHX=EH=suse.de=rguenther@sourceware.org>
Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2001:67c:2178:6::1d])
	by sourceware.org (Postfix) with ESMTPS id DAA103858D28
	for <gcc-patches@gcc.gnu.org>; Tue, 22 Aug 2023 07:53:38 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org DAA103858D28
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
	by smtp-out2.suse.de (Postfix) with ESMTP id 742881F85D;
	Tue, 22 Aug 2023 07:53:37 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa;
	t=1692690817; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=Ob3xTc8L+O487AeFrO05RF5bxdnUiatx5QKIt9IWCyc=;
	b=VVfDkznJpmNGoEpgrGiclbUTtscrER2ggkoU2JAUdJJSxStROoaCd0lA0MkTrTkK9nK5ja
	7OcXD7mPoNajz1bGP5PItJezNDJnXYPb8r/zxxtucjUxXe47JHb/iZefgLd6lCU9v7fQZX
	Ry257y+ZGCKLtOkTNBDh7nDer0JnXmc=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de;
	s=susede2_ed25519; t=1692690817;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=Ob3xTc8L+O487AeFrO05RF5bxdnUiatx5QKIt9IWCyc=;
	b=FR+fenSMhv+jeLilih7vrCqd+xR6SLfBmVqIymzbf3mBUPUGSgiDwAw9l4S4Jk1ZnEZcH4
	YgitbWuWF+2qw2Dg==
Received: from wotan.suse.de (wotan.suse.de [10.160.0.1])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by relay2.suse.de (Postfix) with ESMTPS id 69C2F2C143;
	Tue, 22 Aug 2023 07:53:37 +0000 (UTC)
Date: Tue, 22 Aug 2023 07:53:37 +0000 (UTC)
From: Richard Biener <rguenther@suse.de>
To: Hongtao Liu <crazylht@gmail.com>
cc: Jan Hubicka <hubicka@ucw.cz>, gcc-patches@gcc.gnu.org
Subject: Re: Loop-ch improvements, part 3
In-Reply-To: <CAMZc-bxpkOmDttrXG200g4jb1aRs6pMOhZrcnGRmgdE6yrc3_g@mail.gmail.com>
Message-ID: <nycvar.YFH.7.77.849.2308220745030.12935@jbgna.fhfr.qr>
References: <ZLE+G4KvU7loJtji@kam.mff.cuni.cz> <nycvar.YFH.7.77.849.2307170917330.4723@jbgna.fhfr.qr> <CAMZc-bxpkOmDttrXG200g4jb1aRs6pMOhZrcnGRmgdE6yrc3_g@mail.gmail.com>
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_NUMSUBJECT,KAM_SHORT,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Tue, 22 Aug 2023, Hongtao Liu wrote:

> On Mon, Jul 17, 2023 at 5:18?PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Fri, 14 Jul 2023, Jan Hubicka wrote:
> >
> > > Hi,
> > > loop-ch currently does analysis using ranger for all loops to identify
> > > candidates and then follows by phase where headers are duplicated (which
> > > breaks SSA and ranger).  The second stage does more analysis (to see how
> > > many BBs we want to duplicate) but can't use ranger and thus misses
> > > information about static conditionals.
> > >
> > > This patch pushes all analysis into the first stage. We record how many
> > > BBs to duplicate and the second stage just duplicats as it is told so.
> > > This makes it possible to also extend range query done also to basic
> > > blocks that are not headers.  This is easy to do, since we already do
> > > path specific query so we only need to extend the path by headers we
> > > decided to dulicate earlier.
> > >
> > > This makes it possible to track situations where exit that is always
> > > false in the first iteration for tests not in the original loop header.
> > > Doing so lets us to update profile better and do better heuristics.  In
> > > particular I changed logic as follows
> > >   1) should_duplicate_loop_header_p counts size of duplicated region.  When we
> > >      know that a given conditional will be constant true or constant false either
> > >      in the duplicated region, by range query, or in the loop body after
> > >      duplication (since it is loop invariant), we do not account it to code size
> > >      costs
> > >   2) don't need account loop invariant compuations that will be duplicated
> > >      as they will become fully invariant
> > >      (maybe we want to have some cap for register pressure eventually?)
> > >   3) optimize_size logic is now different.  Originally we started duplicating
> > >      iff the first conditional was known to be true by ranger query, but then
> > >      we used same limits as for -O2.
> > >
> > >      I now simply lower limits to 0. This means that every conditional
> > >      in duplicated sequence must be either loop invariant or constant when
> > >      duplicated and we only duplicate statements computing loop invariants
> > >      and those we account to 0 size anyway,
> > >
> > > This makes code IMO more streamlined (and hopefully will let us to merge
> > > ibts with loop peeling logic), but makes little difference in practice.
> > > The problem is that in loop:
> > >
> > > void test2();
> > > void test(int n)
> > > {
> > >   for (int i = 0; n && i < 10; i++)
> > >         test2();
> > > }
> > >
> > > We produce:
> > >   <bb 4> [local count: 1073741824 freq: 9.090909]:
> > >   # i_4 = PHI <0(2), i_9(3)>
> > >   _1 = n_7(D) != 0;
> > >   _2 = i_4 <= 9;
> > >   _3 = _1 & _2;
> > >   if (_3 != 0)
> > >     goto <bb 3>; [89.00%]
> > >   else
> > >     goto <bb 5>; [11.00%]
> > >
> > > and do not understand that the final conditional is a combination of a conditional
> > > that is always true in first iteration and a conditional that is loop invariant.
> > >
> > > This is also the case of
> > > void test2();
> > > void test(int n)
> > > {
> > >   for (int i = 0; n; i++)
> > >     {
> > >       if (i > 10)
> > >         break;
> > >       test2();
> > >     }
> > > }
> > > Which we turn to the earlier case in ifcombine.
> > >
> > > With disabled ifcombine things however works as exepcted.  This is something
> > > I plan to handle incrementally.  However extending loop-ch and peeling passes
> > > to understand such combined conditionals is still not good enough: at the time ifcombine
> > > merged the two conditionals we lost profile information on how often n is 0,
> > > so we can't recover correct profile or know what is expected number of iterations
> > > after the transofrm.
> This regressed
> FAIL: gcc.target/i386/pr93089-2.c scan-assembler vmulps[^\n\r]*zmm
> FAIL: gcc.target/i386/pr93089-3.c scan-assembler vmulps[^\n\r]*zmm
> The testcase is quite simple, not sure why it's regressed.
> 
>  1/* PR target/93089 */
>  2/* { dg-do compile } */
>  3/* { dg-options "-O2 -fopenmp-simd -mtune=znver1" } */
>  4/* { dg-final { scan-assembler "vmulps\[^\n\r]*zmm" } } */
>  5/* { dg-final { scan-assembler "vmulps\[^\n\r]*ymm" } } */
>  6
>  7#pragma omp declare simd notinbranch
>  8float
>  9foo (float x, float y)
> 10{
> 11  return x * y;
> 12}

We seem to peel one iteration for no good reason.  The loop is
a do-while loop already.  The key is we see the first iteration
exit condition is known not taken and then:

 Registering value_relation (path_oracle) (iter.24_6 > iter.24_5) (root: 
bb2)
    Stmt is static (constant in the first iteration)
  Analyzing: if (iter.24_6 != 16)
 Registering killing_def (path_oracle) iter.24_6
 Registering value_relation (path_oracle) (iter.24_6 > iter.24_5) (root: 
bb2)
    Will eliminate peeled conditional in bb 3.
    Duplicating bb 3 is a win; it has zero cost
  Not duplicating bb 5: it is single succ.
Copying headers of loop 1
    Will duplicate bb 3
Duplicating header of the loop 1 up to edge 3->4
Loop 1 is do-while loop
Loop 1 is now do-while loop.
Exit count: 0 (estimated locally)
Entry count: 10631108 (estimated locally)
Peeled all exits: decreased number of iterations of loop 1 by 1.

and that's because of

  /* If the static exit fully optimize out, it is win to "duplicate"
     it.

     TODO: Even if duplication costs some size we may opt to do so in case
     exit probability is significant enough (do partial peeling).  */
  if (static_exit)
    return code_size_cost ? ch_possible_zero_cost : ch_win;

IMHO we're over aggressively apply early peeling here.  That holds
generally, not only for OMP simd loops (which we could identify).

Why are we doing this game for single-block do-while loops?

Richard.


> > >
> > > Bootstrapped/regtested x86_64-linux, OK?
> >
> > OK.
> >
> > Thanks,
> > Richard.
> >
> > > Honza
> > >
> > >
> > > gcc/ChangeLog:
> > >
> > >       * tree-ssa-loop-ch.cc (edge_range_query): Take loop argument; be ready
> > >       for queries not in headers.
> > >       (static_loop_exit): Add basic blck parameter; update use of
> > >       edge_range_query
> > >       (should_duplicate_loop_header_p): Add ranger and static_exits
> > >       parameter.  Do not account statements that will be optimized
> > >       out after duplicaiton in overall size. Add ranger query to
> > >       find static exits.
> > >       (update_profile_after_ch):  Take static_exits has set instead of
> > >       single eliminated_edge.
> > >       (ch_base::copy_headers): Do all analysis in the first pass;
> > >       remember invariant_exits and static_exits.
> > >
> > > diff --git a/gcc/tree-ssa-loop-ch.cc b/gcc/tree-ssa-loop-ch.cc
> > > index 24e7fbc805a..e0139cb432c 100644
> > > --- a/gcc/tree-ssa-loop-ch.cc
> > > +++ b/gcc/tree-ssa-loop-ch.cc
> > > @@ -49,11 +49,13 @@ along with GCC; see the file COPYING3.  If not see
> > >     the range of the solved conditional in R.  */
> > >
> > >  static void
> > > -edge_range_query (irange &r, edge e, gcond *cond, gimple_ranger &ranger)
> > > +edge_range_query (irange &r, class loop *loop, gcond *cond, gimple_ranger &ranger)
> > >  {
> > > -  auto_vec<basic_block> path (2);
> > > -  path.safe_push (e->dest);
> > > -  path.safe_push (e->src);
> > > +  auto_vec<basic_block, 8> path;
> > > +  for (basic_block bb = gimple_bb (cond); bb != loop->header; bb = single_pred_edge (bb)->src)
> > > +    path.safe_push (bb);
> > > +  path.safe_push (loop->header);
> > > +  path.safe_push (loop_preheader_edge (loop)->src);
> > >    path_range_query query (ranger, path);
> > >    if (!query.range_of_stmt (r, cond))
> > >      r.set_varying (boolean_type_node);
> > > @@ -63,17 +65,16 @@ edge_range_query (irange &r, edge e, gcond *cond, gimple_ranger &ranger)
> > >     and NULL otherwise.  */
> > >
> > >  static edge
> > > -static_loop_exit (class loop *l, gimple_ranger *ranger)
> > > +static_loop_exit (class loop *l, basic_block bb, gimple_ranger *ranger)
> > >  {
> > > -  edge e = loop_preheader_edge (l);
> > > -  gcond *last = safe_dyn_cast <gcond *> (*gsi_last_bb (e->dest));
> > > +  gcond *last = safe_dyn_cast <gcond *> (*gsi_last_bb (bb));
> > >    edge ret_e;
> > >
> > >    if (!last)
> > >      return NULL;
> > >
> > >    edge true_e, false_e;
> > > -  extract_true_false_edges_from_block (e->dest, &true_e, &false_e);
> > > +  extract_true_false_edges_from_block (bb, &true_e, &false_e);
> > >
> > >    /* If neither edge is the exit edge, this is not a case we'd like to
> > >       special-case.  */
> > > @@ -93,7 +94,7 @@ static_loop_exit (class loop *l, gimple_ranger *ranger)
> > >    }
> > >
> > >    int_range<2> r;
> > > -  edge_range_query (r, e, last, *ranger);
> > > +  edge_range_query (r, l, last, *ranger);
> > >    return r == desired_static_range ? ret_e : NULL;
> > >  }
> > >
> > > @@ -131,8 +132,10 @@ loop_iv_derived_p (class loop *loop,
> > >
> > >  static bool
> > >  should_duplicate_loop_header_p (basic_block header, class loop *loop,
> > > +                             gimple_ranger *ranger,
> > >                               int *limit,
> > > -                             hash_set <edge> *invariant_exits)
> > > +                             hash_set <edge> *invariant_exits,
> > > +                             hash_set <edge> *static_exits)
> > >  {
> > >    gimple_stmt_iterator bsi;
> > >
> > > @@ -209,6 +212,9 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop,
> > >        if (is_gimple_debug (last))
> > >       continue;
> > >
> > > +      if (gimple_code (last) == GIMPLE_COND)
> > > +     break;
> > > +
> > >        if (gimple_code (last) == GIMPLE_CALL
> > >         && (!gimple_inexpensive_call_p (as_a <gcall *> (last))
> > >             /* IFN_LOOP_DIST_ALIAS means that inner loop is distributed
> > > @@ -222,16 +228,6 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop,
> > >         return false;
> > >       }
> > >
> > > -      *limit -= estimate_num_insns (last, &eni_size_weights);
> > > -      if (*limit < 0)
> > > -     {
> > > -       if (dump_file && (dump_flags & TDF_DETAILS))
> > > -         fprintf (dump_file,
> > > -                  "  Not duplicating bb %i contains too many insns.\n",
> > > -                  header->index);
> > > -       return false;
> > > -     }
> > > -
> > >        /* Classify the stmt based on whether its computation is based
> > >           on a IV or whether it is invariant in the loop.  */
> > >        gimple_set_uid (last, 0);
> > > @@ -257,9 +253,36 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop,
> > >                 inv = false;
> > >             }
> > >         gimple_set_uid (last, (iv ? 1 : 0) | (inv ? 2 : 0));
> > > +       /* Loop invariants will be optimized out in loop body after
> > > +          duplication; do not account invariant computation in code
> > > +          size costs.  */
> > > +       if (inv)
> > > +         continue;
> > > +     }
> > > +
> > > +      *limit -= estimate_num_insns (last, &eni_size_weights);
> > > +      if (*limit < 0)
> > > +     {
> > > +       if (dump_file && (dump_flags & TDF_DETAILS))
> > > +         fprintf (dump_file,
> > > +                  "  Not duplicating bb %i contains too many insns.\n",
> > > +                  header->index);
> > > +       return false;
> > >       }
> > >      }
> > >
> > > +  edge static_exit = static_loop_exit (loop, header, ranger);
> > > +
> > > +  if (static_exit && static_exits)
> > > +    {
> > > +      static_exits->add (static_exit);
> > > +      if (dump_file && (dump_flags & TDF_DETAILS))
> > > +     fprintf (dump_file,
> > > +              "    Will eliminate peeled conditional in bb %d.\n",
> > > +              static_exit->src->index);
> > > +      /* Still look for invariant exits; exit may be both.  */
> > > +    }
> > > +
> > >    /* If the condition tests a non-IV loop variant we do not want to rotate
> > >       the loop further.  Unless this is the original loop header.  */
> > >    tree lhs = gimple_cond_lhs (last);
> > > @@ -284,12 +307,28 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop,
> > >       }
> > >        return true;
> > >      }
> > > +
> > > +  if (static_exit)
> > > +    return true;
> > > +
> > > +  /* We was not able to prove that conditional will be eliminated.  */
> > > +  *limit -= estimate_num_insns (last, &eni_size_weights);
> > > +  if (*limit < 0)
> > > +    {
> > > +      if (dump_file && (dump_flags & TDF_DETAILS))
> > > +     fprintf (dump_file,
> > > +              "  Not duplicating bb %i contains too many insns.\n",
> > > +              header->index);
> > > +      return false;
> > > +    }
> > > +
> > >    /* TODO: This is heuristics that claims that IV based ocnditionals will
> > >       likely be optimized out in duplicated header.  We could use ranger
> > >       query instead to tell this more precisely.  */
> > > -  if (header != loop->header
> > > -      && ((!lhs_invariant && !loop_iv_derived_p (loop, lhs))
> > > -       || (!rhs_invariant && !loop_iv_derived_p (loop, rhs))))
> > > +  if ((lhs_invariant || loop_iv_derived_p (loop, lhs))
> > > +      && (rhs_invariant || loop_iv_derived_p (loop, rhs)))
> > > +    return true;
> > > +  if (header != loop->header)
> > >      {
> > >        if (dump_file && (dump_flags & TDF_DETAILS))
> > >       fprintf (dump_file,
> > > @@ -386,8 +425,9 @@ do_while_loop_p (class loop *loop)
> > >  static void
> > >  update_profile_after_ch (class loop *loop,
> > >                        basic_block *region, basic_block *region_copy,
> > > -                      unsigned n_region, edge eliminated_edge,
> > > +                      unsigned n_region,
> > >                        hash_set <edge> *invariant_exits,
> > > +                      hash_set <edge> *static_exits,
> > >                        profile_count entry_count)
> > >  {
> > >    for (unsigned int i = 0; i < n_region; i++)
> > > @@ -421,10 +461,13 @@ update_profile_after_ch (class loop *loop,
> > >                     && e_copy->dest == region_copy[i + 1]));
> > >        region_copy[i]->count = entry_count;
> > >        profile_count exit_e_count = exit_e->count ();
> > > -      if (eliminated_edge == exit_e)
> > > +      bool was_static = false;
> > > +      if (static_exits->contains (exit_e))
> > >       {
> > >         /* Update profile and the conditional.
> > >            CFG update is done by caller.  */
> > > +       static_exits->remove (exit_e);
> > > +       was_static = true;
> > >         e_copy->probability = profile_probability::always ();
> > >         exit_e_copy->probability = profile_probability::never ();
> > >         gcond *cond_stmt
> > > @@ -437,7 +480,6 @@ update_profile_after_ch (class loop *loop,
> > >         /* Header copying is a special case of jump threading, so use
> > >            common code to update loop body exit condition.  */
> > >         update_bb_profile_for_threading (region[i], entry_count, e);
> > > -       eliminated_edge = NULL;
> > >       }
> > >        else
> > >       region[i]->count -= region_copy[i]->count;
> > > @@ -448,7 +490,7 @@ update_profile_after_ch (class loop *loop,
> > >            loop, so increase probability accordingly.
> > >            If the edge is eliminated_edge we already corrected
> > >            profile above.  */
> > > -       if (entry_count.nonzero_p () && eliminated_edge != exit_e)
> > > +       if (entry_count.nonzero_p () && !was_static)
> > >           set_edge_probability_and_rescale_others
> > >                   (exit_e_copy, exit_e_count.probability_in
> > >                                                       (entry_count));
> > > @@ -465,7 +507,7 @@ update_profile_after_ch (class loop *loop,
> > >        entry_count = e_copy->count ();
> > >      }
> > >    /* Be sure that we seen all edges we are supposed to update.  */
> > > -  gcc_checking_assert (!eliminated_edge
> > > +  gcc_checking_assert (static_exits->is_empty ()
> > >                      && invariant_exits->is_empty ());
> > >  }
> > >
> > > @@ -564,10 +606,7 @@ protected:
> > >  unsigned int
> > >  ch_base::copy_headers (function *fun)
> > >  {
> > > -  basic_block header;
> > > -  edge exit, nonexit, entry;
> > >    basic_block *bbs, *copied_bbs;
> > > -  unsigned n_bbs;
> > >    unsigned bbs_size;
> > >    bool changed = false;
> > >
> > > @@ -578,7 +617,12 @@ ch_base::copy_headers (function *fun)
> > >    copied_bbs = XNEWVEC (basic_block, n_basic_blocks_for_fn (fun));
> > >    bbs_size = n_basic_blocks_for_fn (fun);
> > >
> > > -  struct candidate {class loop *loop; edge static_exit;};
> > > +  struct candidate
> > > +    {
> > > +      class loop *loop;
> > > +      unsigned int nheaders;
> > > +      hash_set <edge> *invariant_exits, *static_exits;
> > > +    };
> > >    auto_vec<struct candidate> candidates;
> > >    auto_vec<std::pair<edge, loop_p> > copied;
> > >    auto_vec<class loop *> loops_to_unloop;
> > > @@ -588,13 +632,14 @@ ch_base::copy_headers (function *fun)
> > >    gimple_ranger *ranger = new gimple_ranger;
> > >    for (auto loop : loops_list (cfun, 0))
> > >      {
> > > -      int initial_limit = param_max_loop_header_insns;
> > > +      int initial_limit = optimize_loop_for_speed_p (loop)
> > > +                       ? param_max_loop_header_insns : 0;
> > >        int remaining_limit = initial_limit;
> > >        if (dump_file && (dump_flags & TDF_DETAILS))
> > >       fprintf (dump_file,
> > >                "Analyzing loop %i\n", loop->num);
> > >
> > > -      header = loop->header;
> > > +      basic_block header = loop->header;
> > >        if (!get_max_loop_iterations_int (loop))
> > >       {
> > >         if (dump_file && (dump_flags & TDF_DETAILS))
> > > @@ -613,24 +658,38 @@ ch_base::copy_headers (function *fun)
> > >         || !process_loop_p (loop))
> > >       continue;
> > >
> > > -      edge static_exit = static_loop_exit (loop, ranger);
> > > -
> > > -      /* Avoid loop header copying when optimizing for size unless we can
> > > -      determine that the loop condition is static in the first
> > > -      iteration.  */
> > > -      if (optimize_loop_for_size_p (loop)
> > > -       && !loop->force_vectorize
> > > -       && !static_exit)
> > > +      /* Iterate the header copying up to limit; this takes care of the cases
> > > +      like while (a && b) {...}, where we want to have both of the conditions
> > > +      copied.  TODO -- handle while (a || b) - like cases, by not requiring
> > > +      the header to have just a single successor and copying up to
> > > +      postdominator.  */
> > > +      int nheaders = 0;
> > > +      hash_set <edge> *invariant_exits = new hash_set <edge>;
> > > +      hash_set <edge> *static_exits = new hash_set <edge>;
> > > +      while (should_duplicate_loop_header_p (header, loop, ranger,
> > > +                                          &remaining_limit,
> > > +                                          invariant_exits,
> > > +                                          static_exits))
> > >       {
> > > +       nheaders++;
> > >         if (dump_file && (dump_flags & TDF_DETAILS))
> > > -         fprintf (dump_file,
> > > -                  "  Not duplicating bb %i: optimizing for size.\n",
> > > -                  header->index);
> > > -       continue;
> > > +         fprintf (dump_file, "    Will duplicate bb %i\n", header->index);
> > > +
> > > +       /* Find a successor of header that is inside a loop; i.e. the new
> > > +          header after the condition is copied.  */
> > > +       if (flow_bb_inside_loop_p (loop, EDGE_SUCC (header, 0)->dest))
> > > +         header = EDGE_SUCC (header, 0)->dest;
> > > +       else
> > > +         header = EDGE_SUCC (header, 1)->dest;
> > >       }
> > >
> > > -      if (should_duplicate_loop_header_p (header, loop, &remaining_limit, NULL))
> > > -     candidates.safe_push ({loop, static_exit});
> > > +      if (nheaders)
> > > +     candidates.safe_push ({loop, nheaders, invariant_exits, static_exits});
> > > +      else
> > > +     {
> > > +       delete invariant_exits;
> > > +       delete static_exits;
> > > +     }
> > >      }
> > >    /* Do not use ranger after we change the IL and not have updated SSA.  */
> > >    delete ranger;
> > > @@ -638,37 +697,25 @@ ch_base::copy_headers (function *fun)
> > >    for (auto candidate : candidates)
> > >      {
> > >        class loop *loop = candidate.loop;
> > > -      int initial_limit = param_max_loop_header_insns;
> > > -      int remaining_limit = initial_limit;
> > >        if (dump_file && (dump_flags & TDF_DETAILS))
> > >       fprintf (dump_file,
> > >                "Copying headers of loop %i\n", loop->num);
> > >
> > > -      header = loop->header;
> > > -
> > > -      /* Iterate the header copying up to limit; this takes care of the cases
> > > -      like while (a && b) {...}, where we want to have both of the conditions
> > > -      copied.  TODO -- handle while (a || b) - like cases, by not requiring
> > > -      the header to have just a single successor and copying up to
> > > -      postdominator.  */
> > > -
> > > -      nonexit = NULL;
> > > -      n_bbs = 0;
> > > +      basic_block header = loop->header;
> > > +      edge nonexit = NULL;
> > > +      edge exit = NULL;
> > > +      unsigned int n_bbs = 0;
> > >        int nexits = 0;
> > >        profile_count exit_count = profile_count::zero ();
> > >        profile_count entry_count = profile_count::zero ();
> > >        edge e;
> > >        edge_iterator ei;
> > > -      hash_set <edge> invariant_exits;
> > > +
> > >        FOR_EACH_EDGE (e, ei, loop->header->preds)
> > >       if (e->src != loop->latch)
> > >         entry_count += e->count ();
> > > -      while (should_duplicate_loop_header_p (header, loop, &remaining_limit,
> > > -                                          &invariant_exits))
> > > +      while (n_bbs < candidate.nheaders)
> > >       {
> > > -       if (dump_file && (dump_flags & TDF_DETAILS))
> > > -         fprintf (dump_file, "    Will duplicate bb %i\n", header->index);
> > > -
> > >         /* Find a successor of header that is inside a loop; i.e. the new
> > >            header after the condition is copied.  */
> > >         if (flow_bb_inside_loop_p (loop, EDGE_SUCC (header, 0)->dest))
> > > @@ -688,21 +735,10 @@ ch_base::copy_headers (function *fun)
> > >         header = nonexit->dest;
> > >       }
> > >
> > > -      if (!nonexit)
> > > -     continue;
> > > -
> > >        if (dump_file && (dump_flags & TDF_DETAILS))
> > > -     {
> > > -       fprintf (dump_file,
> > > -                "Duplicating header of the loop %d up to edge %d->%d,"
> > > -                " %i insns.\n",
> > > -                loop->num, exit->src->index, exit->dest->index,
> > > -                initial_limit - remaining_limit);
> > > -       if (candidate.static_exit)
> > > -         fprintf (dump_file,
> > > -                  "  Will eliminate peeled conditional in bb %d.\n",
> > > -                  candidate.static_exit->src->index);
> > > -     }
> > > +     fprintf (dump_file,
> > > +              "Duplicating header of the loop %d up to edge %d->%d\n",
> > > +              loop->num, exit->src->index, exit->dest->index);
> > >
> > >        /* Ensure that the header will have just the latch as a predecessor
> > >        inside the loop.  */
> > > @@ -712,20 +748,25 @@ ch_base::copy_headers (function *fun)
> > >         exit = single_pred_edge (header);
> > >       }
> > >
> > > -      entry = loop_preheader_edge (loop);
> > > +      edge entry = loop_preheader_edge (loop);
> > >
> > >        propagate_threaded_block_debug_into (exit->dest, entry->dest);
> > >        if (!gimple_duplicate_seme_region (entry, exit, bbs, n_bbs, copied_bbs,
> > >                                        true))
> > >       {
> > > +       delete candidate.static_exits;
> > > +       delete candidate.invariant_exits;
> > >         if (dump_file && (dump_flags & TDF_DETAILS))
> > >           fprintf (dump_file, "Duplication failed.\n");
> > >         continue;
> > >       }
> > >        if (loop->header->count.initialized_p ())
> > >       update_profile_after_ch (loop, bbs, copied_bbs, n_bbs,
> > > -                              candidate.static_exit, &invariant_exits,
> > > +                              candidate.invariant_exits,
> > > +                              candidate.static_exits,
> > >                                entry_count);
> > > +      delete candidate.static_exits;
> > > +      delete candidate.invariant_exits;
> > >        copied.safe_push (std::make_pair (entry, loop));
> > >
> > >        /* If the loop has the form "for (i = j; i < j + 10; i++)" then
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> > Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> > HRB 36809 (AG Nuernberg)
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)