Re: Loop-ch improvements, part 3

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Biener <rguenther@suse.de>
To: Jan Hubicka <hubicka@ucw.cz>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: Loop-ch improvements, part 3
Date: Mon, 17 Jul 2023 09:17:45 +0000 (UTC)	[thread overview]
Message-ID: <nycvar.YFH.7.77.849.2307170917330.4723@jbgna.fhfr.qr> (raw)
In-Reply-To: <ZLE+G4KvU7loJtji@kam.mff.cuni.cz>

On Fri, 14 Jul 2023, Jan Hubicka wrote:

> Hi,
> loop-ch currently does analysis using ranger for all loops to identify
> candidates and then follows by phase where headers are duplicated (which
> breaks SSA and ranger).  The second stage does more analysis (to see how
> many BBs we want to duplicate) but can't use ranger and thus misses
> information about static conditionals.
> 
> This patch pushes all analysis into the first stage. We record how many
> BBs to duplicate and the second stage just duplicats as it is told so.
> This makes it possible to also extend range query done also to basic
> blocks that are not headers.  This is easy to do, since we already do
> path specific query so we only need to extend the path by headers we
> decided to dulicate earlier.
> 
> This makes it possible to track situations where exit that is always
> false in the first iteration for tests not in the original loop header.
> Doing so lets us to update profile better and do better heuristics.  In
> particular I changed logic as follows
>   1) should_duplicate_loop_header_p counts size of duplicated region.  When we
>      know that a given conditional will be constant true or constant false either
>      in the duplicated region, by range query, or in the loop body after
>      duplication (since it is loop invariant), we do not account it to code size
>      costs
>   2) don't need account loop invariant compuations that will be duplicated
>      as they will become fully invariant
>      (maybe we want to have some cap for register pressure eventually?)
>   3) optimize_size logic is now different.  Originally we started duplicating
>      iff the first conditional was known to be true by ranger query, but then
>      we used same limits as for -O2.
> 
>      I now simply lower limits to 0. This means that every conditional
>      in duplicated sequence must be either loop invariant or constant when
>      duplicated and we only duplicate statements computing loop invariants
>      and those we account to 0 size anyway,
> 
> This makes code IMO more streamlined (and hopefully will let us to merge
> ibts with loop peeling logic), but makes little difference in practice.
> The problem is that in loop:
> 
> void test2();
> void test(int n)
> {
>   for (int i = 0; n && i < 10; i++)
> 	  test2();
> }
> 
> We produce:
>   <bb 4> [local count: 1073741824 freq: 9.090909]:
>   # i_4 = PHI <0(2), i_9(3)>
>   _1 = n_7(D) != 0;
>   _2 = i_4 <= 9;
>   _3 = _1 & _2;
>   if (_3 != 0)
>     goto <bb 3>; [89.00%]
>   else
>     goto <bb 5>; [11.00%]
> 
> and do not understand that the final conditional is a combination of a conditional
> that is always true in first iteration and a conditional that is loop invariant.
> 
> This is also the case of
> void test2();
> void test(int n)
> {
>   for (int i = 0; n; i++)
>     {
>       if (i > 10)
>         break;
>       test2();
>     }
> }
> Which we turn to the earlier case in ifcombine.
> 
> With disabled ifcombine things however works as exepcted.  This is something
> I plan to handle incrementally.  However extending loop-ch and peeling passes
> to understand such combined conditionals is still not good enough: at the time ifcombine
> merged the two conditionals we lost profile information on how often n is 0,
> so we can't recover correct profile or know what is expected number of iterations
> after the transofrm.
> 
> Bootstrapped/regtested x86_64-linux, OK?

OK.

Thanks,
Richard.

> Honza
> 
> 
> gcc/ChangeLog:
> 
> 	* tree-ssa-loop-ch.cc (edge_range_query): Take loop argument; be ready
> 	for queries not in headers.
> 	(static_loop_exit): Add basic blck parameter; update use of
> 	edge_range_query
> 	(should_duplicate_loop_header_p): Add ranger and static_exits
> 	parameter.  Do not account statements that will be optimized
> 	out after duplicaiton in overall size. Add ranger query to
> 	find static exits.
> 	(update_profile_after_ch):  Take static_exits has set instead of
> 	single eliminated_edge.
> 	(ch_base::copy_headers): Do all analysis in the first pass;
> 	remember invariant_exits and static_exits.
> 
> diff --git a/gcc/tree-ssa-loop-ch.cc b/gcc/tree-ssa-loop-ch.cc
> index 24e7fbc805a..e0139cb432c 100644
> --- a/gcc/tree-ssa-loop-ch.cc
> +++ b/gcc/tree-ssa-loop-ch.cc
> @@ -49,11 +49,13 @@ along with GCC; see the file COPYING3.  If not see
>     the range of the solved conditional in R.  */
>  
>  static void
> -edge_range_query (irange &r, edge e, gcond *cond, gimple_ranger &ranger)
> +edge_range_query (irange &r, class loop *loop, gcond *cond, gimple_ranger &ranger)
>  {
> -  auto_vec<basic_block> path (2);
> -  path.safe_push (e->dest);
> -  path.safe_push (e->src);
> +  auto_vec<basic_block, 8> path;
> +  for (basic_block bb = gimple_bb (cond); bb != loop->header; bb = single_pred_edge (bb)->src)
> +    path.safe_push (bb);
> +  path.safe_push (loop->header);
> +  path.safe_push (loop_preheader_edge (loop)->src);
>    path_range_query query (ranger, path);
>    if (!query.range_of_stmt (r, cond))
>      r.set_varying (boolean_type_node);
> @@ -63,17 +65,16 @@ edge_range_query (irange &r, edge e, gcond *cond, gimple_ranger &ranger)
>     and NULL otherwise.  */
>  
>  static edge
> -static_loop_exit (class loop *l, gimple_ranger *ranger)
> +static_loop_exit (class loop *l, basic_block bb, gimple_ranger *ranger)
>  {
> -  edge e = loop_preheader_edge (l);
> -  gcond *last = safe_dyn_cast <gcond *> (*gsi_last_bb (e->dest));
> +  gcond *last = safe_dyn_cast <gcond *> (*gsi_last_bb (bb));
>    edge ret_e;
>  
>    if (!last)
>      return NULL;
>  
>    edge true_e, false_e;
> -  extract_true_false_edges_from_block (e->dest, &true_e, &false_e);
> +  extract_true_false_edges_from_block (bb, &true_e, &false_e);
>  
>    /* If neither edge is the exit edge, this is not a case we'd like to
>       special-case.  */
> @@ -93,7 +94,7 @@ static_loop_exit (class loop *l, gimple_ranger *ranger)
>    }
>  
>    int_range<2> r;
> -  edge_range_query (r, e, last, *ranger);
> +  edge_range_query (r, l, last, *ranger);
>    return r == desired_static_range ? ret_e : NULL;
>  }
>  
> @@ -131,8 +132,10 @@ loop_iv_derived_p (class loop *loop,
>  
>  static bool
>  should_duplicate_loop_header_p (basic_block header, class loop *loop,
> +				gimple_ranger *ranger,
>  				int *limit,
> -				hash_set <edge> *invariant_exits)
> +				hash_set <edge> *invariant_exits,
> +				hash_set <edge> *static_exits)
>  {
>    gimple_stmt_iterator bsi;
>  
> @@ -209,6 +212,9 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop,
>        if (is_gimple_debug (last))
>  	continue;
>  
> +      if (gimple_code (last) == GIMPLE_COND)
> +	break;
> +
>        if (gimple_code (last) == GIMPLE_CALL
>  	  && (!gimple_inexpensive_call_p (as_a <gcall *> (last))
>  	      /* IFN_LOOP_DIST_ALIAS means that inner loop is distributed
> @@ -222,16 +228,6 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop,
>  	  return false;
>  	}
>  
> -      *limit -= estimate_num_insns (last, &eni_size_weights);
> -      if (*limit < 0)
> -	{
> -	  if (dump_file && (dump_flags & TDF_DETAILS))
> -	    fprintf (dump_file,
> -		     "  Not duplicating bb %i contains too many insns.\n",
> -		     header->index);
> -	  return false;
> -	}
> -
>        /* Classify the stmt based on whether its computation is based
>           on a IV or whether it is invariant in the loop.  */
>        gimple_set_uid (last, 0);
> @@ -257,9 +253,36 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop,
>  		  inv = false;
>  	      }
>  	  gimple_set_uid (last, (iv ? 1 : 0) | (inv ? 2 : 0));
> +	  /* Loop invariants will be optimized out in loop body after
> +	     duplication; do not account invariant computation in code
> +	     size costs.  */
> +	  if (inv)
> +	    continue;
> +	}
> +
> +      *limit -= estimate_num_insns (last, &eni_size_weights);
> +      if (*limit < 0)
> +	{
> +	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	    fprintf (dump_file,
> +		     "  Not duplicating bb %i contains too many insns.\n",
> +		     header->index);
> +	  return false;
>  	}
>      }
>  
> +  edge static_exit = static_loop_exit (loop, header, ranger);
> +
> +  if (static_exit && static_exits)
> +    {
> +      static_exits->add (static_exit);
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +	fprintf (dump_file,
> +		 "    Will eliminate peeled conditional in bb %d.\n",
> +		 static_exit->src->index);
> +      /* Still look for invariant exits; exit may be both.  */
> +    }
> +
>    /* If the condition tests a non-IV loop variant we do not want to rotate
>       the loop further.  Unless this is the original loop header.  */
>    tree lhs = gimple_cond_lhs (last);
> @@ -284,12 +307,28 @@ should_duplicate_loop_header_p (basic_block header, class loop *loop,
>  	}
>        return true;
>      }
> +
> +  if (static_exit)
> +    return true;
> +
> +  /* We was not able to prove that conditional will be eliminated.  */
> +  *limit -= estimate_num_insns (last, &eni_size_weights);
> +  if (*limit < 0)
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +	fprintf (dump_file,
> +		 "  Not duplicating bb %i contains too many insns.\n",
> +		 header->index);
> +      return false;
> +    }
> +
>    /* TODO: This is heuristics that claims that IV based ocnditionals will
>       likely be optimized out in duplicated header.  We could use ranger
>       query instead to tell this more precisely.  */
> -  if (header != loop->header
> -      && ((!lhs_invariant && !loop_iv_derived_p (loop, lhs))
> -	  || (!rhs_invariant && !loop_iv_derived_p (loop, rhs))))
> +  if ((lhs_invariant || loop_iv_derived_p (loop, lhs))
> +      && (rhs_invariant || loop_iv_derived_p (loop, rhs)))
> +    return true;
> +  if (header != loop->header)
>      {
>        if (dump_file && (dump_flags & TDF_DETAILS))
>  	fprintf (dump_file,
> @@ -386,8 +425,9 @@ do_while_loop_p (class loop *loop)
>  static void
>  update_profile_after_ch (class loop *loop,
>  			 basic_block *region, basic_block *region_copy,
> -			 unsigned n_region, edge eliminated_edge,
> +			 unsigned n_region,
>  			 hash_set <edge> *invariant_exits,
> +			 hash_set <edge> *static_exits,
>  			 profile_count entry_count)
>  {
>    for (unsigned int i = 0; i < n_region; i++)
> @@ -421,10 +461,13 @@ update_profile_after_ch (class loop *loop,
>  		      && e_copy->dest == region_copy[i + 1]));
>        region_copy[i]->count = entry_count;
>        profile_count exit_e_count = exit_e->count ();
> -      if (eliminated_edge == exit_e)
> +      bool was_static = false;
> +      if (static_exits->contains (exit_e))
>  	{
>  	  /* Update profile and the conditional.
>  	     CFG update is done by caller.  */
> +	  static_exits->remove (exit_e);
> +	  was_static = true;
>  	  e_copy->probability = profile_probability::always ();
>  	  exit_e_copy->probability = profile_probability::never ();
>  	  gcond *cond_stmt
> @@ -437,7 +480,6 @@ update_profile_after_ch (class loop *loop,
>  	  /* Header copying is a special case of jump threading, so use
>  	     common code to update loop body exit condition.  */
>  	  update_bb_profile_for_threading (region[i], entry_count, e);
> -	  eliminated_edge = NULL;
>  	}
>        else
>  	region[i]->count -= region_copy[i]->count;
> @@ -448,7 +490,7 @@ update_profile_after_ch (class loop *loop,
>  	     loop, so increase probability accordingly.
>  	     If the edge is eliminated_edge we already corrected
>  	     profile above.  */
> -	  if (entry_count.nonzero_p () && eliminated_edge != exit_e)
> +	  if (entry_count.nonzero_p () && !was_static)
>  	    set_edge_probability_and_rescale_others
>  		    (exit_e_copy, exit_e_count.probability_in
>  							(entry_count));
> @@ -465,7 +507,7 @@ update_profile_after_ch (class loop *loop,
>        entry_count = e_copy->count ();
>      }
>    /* Be sure that we seen all edges we are supposed to update.  */
> -  gcc_checking_assert (!eliminated_edge
> +  gcc_checking_assert (static_exits->is_empty ()
>  		       && invariant_exits->is_empty ());
>  }
>  
> @@ -564,10 +606,7 @@ protected:
>  unsigned int
>  ch_base::copy_headers (function *fun)
>  {
> -  basic_block header;
> -  edge exit, nonexit, entry;
>    basic_block *bbs, *copied_bbs;
> -  unsigned n_bbs;
>    unsigned bbs_size;
>    bool changed = false;
>  
> @@ -578,7 +617,12 @@ ch_base::copy_headers (function *fun)
>    copied_bbs = XNEWVEC (basic_block, n_basic_blocks_for_fn (fun));
>    bbs_size = n_basic_blocks_for_fn (fun);
>  
> -  struct candidate {class loop *loop; edge static_exit;};
> +  struct candidate
> +    {
> +      class loop *loop;
> +      unsigned int nheaders;
> +      hash_set <edge> *invariant_exits, *static_exits;
> +    };
>    auto_vec<struct candidate> candidates;
>    auto_vec<std::pair<edge, loop_p> > copied;
>    auto_vec<class loop *> loops_to_unloop;
> @@ -588,13 +632,14 @@ ch_base::copy_headers (function *fun)
>    gimple_ranger *ranger = new gimple_ranger;
>    for (auto loop : loops_list (cfun, 0))
>      {
> -      int initial_limit = param_max_loop_header_insns;
> +      int initial_limit = optimize_loop_for_speed_p (loop)
> +			  ? param_max_loop_header_insns : 0;
>        int remaining_limit = initial_limit;
>        if (dump_file && (dump_flags & TDF_DETAILS))
>  	fprintf (dump_file,
>  		 "Analyzing loop %i\n", loop->num);
>  
> -      header = loop->header;
> +      basic_block header = loop->header;
>        if (!get_max_loop_iterations_int (loop))
>  	{
>  	  if (dump_file && (dump_flags & TDF_DETAILS))
> @@ -613,24 +658,38 @@ ch_base::copy_headers (function *fun)
>  	  || !process_loop_p (loop))
>  	continue;
>  
> -      edge static_exit = static_loop_exit (loop, ranger);
> -
> -      /* Avoid loop header copying when optimizing for size unless we can
> -	 determine that the loop condition is static in the first
> -	 iteration.  */
> -      if (optimize_loop_for_size_p (loop)
> -	  && !loop->force_vectorize
> -	  && !static_exit)
> +      /* Iterate the header copying up to limit; this takes care of the cases
> +	 like while (a && b) {...}, where we want to have both of the conditions
> +	 copied.  TODO -- handle while (a || b) - like cases, by not requiring
> +	 the header to have just a single successor and copying up to
> +	 postdominator.  */
> +      int nheaders = 0;
> +      hash_set <edge> *invariant_exits = new hash_set <edge>;
> +      hash_set <edge> *static_exits = new hash_set <edge>;
> +      while (should_duplicate_loop_header_p (header, loop, ranger,
> +					     &remaining_limit,
> +					     invariant_exits,
> +					     static_exits))
>  	{
> +	  nheaders++;
>  	  if (dump_file && (dump_flags & TDF_DETAILS))
> -	    fprintf (dump_file,
> -		     "  Not duplicating bb %i: optimizing for size.\n",
> -		     header->index);
> -	  continue;
> +	    fprintf (dump_file, "    Will duplicate bb %i\n", header->index);
> +
> +	  /* Find a successor of header that is inside a loop; i.e. the new
> +	     header after the condition is copied.  */
> +	  if (flow_bb_inside_loop_p (loop, EDGE_SUCC (header, 0)->dest))
> +	    header = EDGE_SUCC (header, 0)->dest;
> +	  else
> +	    header = EDGE_SUCC (header, 1)->dest;
>  	}
>  
> -      if (should_duplicate_loop_header_p (header, loop, &remaining_limit, NULL))
> -	candidates.safe_push ({loop, static_exit});
> +      if (nheaders)
> +	candidates.safe_push ({loop, nheaders, invariant_exits, static_exits});
> +      else
> +	{
> +	  delete invariant_exits;
> +	  delete static_exits;
> +	}
>      }
>    /* Do not use ranger after we change the IL and not have updated SSA.  */
>    delete ranger;
> @@ -638,37 +697,25 @@ ch_base::copy_headers (function *fun)
>    for (auto candidate : candidates)
>      {
>        class loop *loop = candidate.loop;
> -      int initial_limit = param_max_loop_header_insns;
> -      int remaining_limit = initial_limit;
>        if (dump_file && (dump_flags & TDF_DETAILS))
>  	fprintf (dump_file,
>  		 "Copying headers of loop %i\n", loop->num);
>  
> -      header = loop->header;
> -
> -      /* Iterate the header copying up to limit; this takes care of the cases
> -	 like while (a && b) {...}, where we want to have both of the conditions
> -	 copied.  TODO -- handle while (a || b) - like cases, by not requiring
> -	 the header to have just a single successor and copying up to
> -	 postdominator.  */
> -
> -      nonexit = NULL;
> -      n_bbs = 0;
> +      basic_block header = loop->header;
> +      edge nonexit = NULL;
> +      edge exit = NULL;
> +      unsigned int n_bbs = 0;
>        int nexits = 0;
>        profile_count exit_count = profile_count::zero ();
>        profile_count entry_count = profile_count::zero ();
>        edge e;
>        edge_iterator ei;
> -      hash_set <edge> invariant_exits;
> +
>        FOR_EACH_EDGE (e, ei, loop->header->preds)
>  	if (e->src != loop->latch)
>  	  entry_count += e->count ();
> -      while (should_duplicate_loop_header_p (header, loop, &remaining_limit,
> -					     &invariant_exits))
> +      while (n_bbs < candidate.nheaders)
>  	{
> -	  if (dump_file && (dump_flags & TDF_DETAILS))
> -	    fprintf (dump_file, "    Will duplicate bb %i\n", header->index);
> -
>  	  /* Find a successor of header that is inside a loop; i.e. the new
>  	     header after the condition is copied.  */
>  	  if (flow_bb_inside_loop_p (loop, EDGE_SUCC (header, 0)->dest))
> @@ -688,21 +735,10 @@ ch_base::copy_headers (function *fun)
>  	  header = nonexit->dest;
>  	}
>  
> -      if (!nonexit)
> -	continue;
> -
>        if (dump_file && (dump_flags & TDF_DETAILS))
> -	{
> -	  fprintf (dump_file,
> -		   "Duplicating header of the loop %d up to edge %d->%d,"
> -		   " %i insns.\n",
> -		   loop->num, exit->src->index, exit->dest->index,
> -		   initial_limit - remaining_limit);
> -	  if (candidate.static_exit)
> -	    fprintf (dump_file,
> -		     "  Will eliminate peeled conditional in bb %d.\n",
> -		     candidate.static_exit->src->index);
> -	}
> +	fprintf (dump_file,
> +		 "Duplicating header of the loop %d up to edge %d->%d\n",
> +		 loop->num, exit->src->index, exit->dest->index);
>  
>        /* Ensure that the header will have just the latch as a predecessor
>  	 inside the loop.  */
> @@ -712,20 +748,25 @@ ch_base::copy_headers (function *fun)
>  	  exit = single_pred_edge (header);
>  	}
>  
> -      entry = loop_preheader_edge (loop);
> +      edge entry = loop_preheader_edge (loop);
>  
>        propagate_threaded_block_debug_into (exit->dest, entry->dest);
>        if (!gimple_duplicate_seme_region (entry, exit, bbs, n_bbs, copied_bbs,
>  					 true))
>  	{
> +	  delete candidate.static_exits;
> +	  delete candidate.invariant_exits;
>  	  if (dump_file && (dump_flags & TDF_DETAILS))
>  	    fprintf (dump_file, "Duplication failed.\n");
>  	  continue;
>  	}
>        if (loop->header->count.initialized_p ())
>  	update_profile_after_ch (loop, bbs, copied_bbs, n_bbs,
> -				 candidate.static_exit, &invariant_exits,
> +				 candidate.invariant_exits,
> +				 candidate.static_exits,
>  				 entry_count);
> +      delete candidate.static_exits;
> +      delete candidate.invariant_exits;
>        copied.safe_push (std::make_pair (entry, loop));
>  
>        /* If the loop has the form "for (i = j; i < j + 10; i++)" then
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

next prev parent reply	other threads:[~2023-07-17  9:17 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-14 12:22 Jan Hubicka
2023-07-17  9:17 ` Richard Biener [this message]
2023-08-22  5:24   ` Hongtao Liu
2023-08-22  7:53     ` Richard Biener
2023-08-22 12:03       ` Jan Hubicka
2023-08-23  8:34       ` Jan Hubicka
2023-08-23  8:58         ` Richard Biener
2023-07-20  7:09 loop-ch " Jan Hubicka
2023-07-20 13:03 ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.YFH.7.77.849.2307170917330.4723@jbgna.fhfr.qr \
    --to=rguenther@suse.de \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=hubicka@ucw.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).