RE: [PATCH] tree-optimization/114589 - remove profile based sink heuristics

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Biener <rguenther@suse.de>
To: "Li, Pan2" <pan2.li@intel.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: RE: [PATCH] tree-optimization/114589 - remove profile based sink heuristics
Date: Fri, 14 Jun 2024 15:10:49 +0200 (CEST)	[thread overview]
Message-ID: <1727s277-781s-2574-2qo8-99n97951s107@fhfr.qr> (raw)
In-Reply-To: <MW5PR11MB5908B59C9C44C57950D94F61A9C22@MW5PR11MB5908.namprd11.prod.outlook.com>

On Fri, 14 Jun 2024, Li, Pan2 wrote:

> Hi Richard,
> 
> Here is one PR related to this patch (by git bisect), details as below.
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115458
> 
> I am still trying to narrow down which change caused this failure or any hints here?

It definitely looks like a latent issue being triggered.  Either in LRA
or in how the target presents itself.

Richard.

> Pan
> 
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de> 
> Sent: Wednesday, May 15, 2024 5:39 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH] tree-optimization/114589 - remove profile based sink heuristics
> 
> The following removes the profile based heuristic limiting sinking
> and instead uses post-dominators to avoid sinking to places that
> are executed under the same conditions as the earlier location which
> the profile based heuristic should have guaranteed as well.
> 
> To avoid regressing this moves the empty-latch check to cover all
> sink cases.
> 
> It also stream-lines the resulting select_best_block a bit but avoids
> adjusting heuristics more with this change.  gfortran.dg/streamio_9.f90
> starts execute failing with this on x86_64 with -m32 because the
> (float)i * 9.9999...e-7 compute is sunk across a STOP causing it
> to be no longer spilled and thus the compare failing due to excess
> precision.  The patch adds -ffloat-store to avoid this, following
> other similar testcases.
> 
> This change doesn't fix the testcase in the PR on itself.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, re-testing in progress.
> 
> 	PR tree-optimization/114589
> 	* tree-ssa-sink.cc (select_best_block): Remove profile-based
> 	heuristics.  Instead reject sink locations that sink
>         to post-dominators.  Move empty latch check here from
> 	statement_sink_location.  Also consider early_bb for the
> 	loop depth check.
> 	(statement_sink_location): Remove superfluous check.  Remove
> 	empty latch check.
> 	(pass_sink_code::execute): Compute/release post-dominators.
> 
> 	* gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid
> 	excess precision when not spilling.
> ---
>  gcc/testsuite/gfortran.dg/streamio_9.f90 |  1 +
>  gcc/tree-ssa-sink.cc                     | 62 ++++++++----------------
>  2 files changed, 20 insertions(+), 43 deletions(-)
> 
> diff --git a/gcc/testsuite/gfortran.dg/streamio_9.f90 b/gcc/testsuite/gfortran.dg/streamio_9.f90
> index b6bddb973f8..f29ded6ba54 100644
> --- a/gcc/testsuite/gfortran.dg/streamio_9.f90
> +++ b/gcc/testsuite/gfortran.dg/streamio_9.f90
> @@ -1,4 +1,5 @@
>  ! { dg-do run }
> +! { dg-options "-ffloat-store" }
>  ! PR29053 Stream IO test 9.
>  ! Contributed by Jerry DeLisle <jvdelisle@gcc.gnu.org>.
>  ! Test case derived from that given in PR by Steve Kargl.
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index 2f90acb7ef4..2188b7523c7 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -178,15 +178,7 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
>  
>     We want the most control dependent block in the shallowest loop nest.
>  
> -   If the resulting block is in a shallower loop nest, then use it.  Else
> -   only use the resulting block if it has significantly lower execution
> -   frequency than EARLY_BB to avoid gratuitous statement movement.  We
> -   consider statements with VOPS more desirable to move.
> -
> -   This pass would obviously benefit from PDO as it utilizes block
> -   frequencies.  It would also benefit from recomputing frequencies
> -   if profile data is not available since frequencies often get out
> -   of sync with reality.  */
> +   If the resulting block is in a shallower loop nest, then use it.  */
>  
>  static basic_block
>  select_best_block (basic_block early_bb,
> @@ -195,18 +187,17 @@ select_best_block (basic_block early_bb,
>  {
>    basic_block best_bb = late_bb;
>    basic_block temp_bb = late_bb;
> -  int threshold;
>  
>    while (temp_bb != early_bb)
>      {
> +      /* Walk up the dominator tree, hopefully we'll find a shallower
> +	 loop nest.  */
> +      temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
> +
>        /* If we've moved into a lower loop nest, then that becomes
>  	 our best block.  */
>        if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
>  	best_bb = temp_bb;
> -
> -      /* Walk up the dominator tree, hopefully we'll find a shallower
> - 	 loop nest.  */
> -      temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
>      }
>  
>    /* Placing a statement before a setjmp-like function would be invalid
> @@ -221,6 +212,16 @@ select_best_block (basic_block early_bb,
>    if (bb_loop_depth (best_bb) < bb_loop_depth (early_bb))
>      return best_bb;
>  
> +  /* Do not move stmts to post-dominating places on the same loop depth.  */
> +  if (dominated_by_p (CDI_POST_DOMINATORS, early_bb, best_bb))
> +    return early_bb;
> +
> +  /* If the latch block is empty, don't make it non-empty by sinking
> +     something into it.  */
> +  if (best_bb == early_bb->loop_father->latch
> +      && empty_block_p (best_bb))
> +    return early_bb;
> +
>    /* Avoid turning an unconditional read into a conditional one when we
>       still might want to perform vectorization.  */
>    if (best_bb->loop_father == early_bb->loop_father
> @@ -233,28 +234,7 @@ select_best_block (basic_block early_bb,
>        && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb))
>      return early_bb;
>  
> -  /* Get the sinking threshold.  If the statement to be moved has memory
> -     operands, then increase the threshold by 7% as those are even more
> -     profitable to avoid, clamping at 100%.  */
> -  threshold = param_sink_frequency_threshold;
> -  if (gimple_vuse (stmt) || gimple_vdef (stmt))
> -    {
> -      threshold += 7;
> -      if (threshold > 100)
> -	threshold = 100;
> -    }
> -
> -  /* If BEST_BB is at the same nesting level, then require it to have
> -     significantly lower execution frequency to avoid gratuitous movement.  */
> -  if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
> -      /* If result of comparsion is unknown, prefer EARLY_BB.
> -	 Thus use !(...>=..) rather than (...<...)  */
> -      && !(best_bb->count * 100 >= early_bb->count * threshold))
> -    return best_bb;
> -
> -  /* No better block found, so return EARLY_BB, which happens to be the
> -     statement's original block.  */
> -  return early_bb;
> +  return best_bb;
>  }
>  
>  /* Given a statement (STMT) and the basic block it is currently in (FROMBB),
> @@ -452,13 +432,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>      return false;
>    
>    sinkbb = select_best_block (frombb, sinkbb, stmt);
> -  if (!sinkbb || sinkbb == frombb)
> -    return false;
> -
> -  /* If the latch block is empty, don't make it non-empty by sinking
> -     something into it.  */
> -  if (sinkbb == frombb->loop_father->latch
> -      && empty_block_p (sinkbb))
> +  if (sinkbb == frombb)
>      return false;
>  
>    *togsi = gsi_after_labels (sinkbb);
> @@ -822,6 +796,7 @@ pass_sink_code::execute (function *fun)
>    mark_dfs_back_edges (fun);
>    memset (&sink_stats, 0, sizeof (sink_stats));
>    calculate_dominance_info (CDI_DOMINATORS);
> +  calculate_dominance_info (CDI_POST_DOMINATORS);
>  
>    virtual_operand_live vop_live;
>  
> @@ -833,6 +808,7 @@ pass_sink_code::execute (function *fun)
>  
>    statistics_counter_event (fun, "Sunk statements", sink_stats.sunk);
>    statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);
> +  free_dominance_info (CDI_POST_DOMINATORS);
>    remove_fake_exit_edges ();
>    loop_optimizer_finalize ();
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

next prev parent reply	other threads:[~2024-06-14 13:10 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-15  9:38 Richard Biener
2024-05-18  3:27 ` Hans-Peter Nilsson
2024-06-14 13:08 ` Li, Pan2
2024-06-14 13:10   ` Richard Biener [this message]
2024-06-14 13:23     ` Li, Pan2

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1727s277-781s-2574-2qo8-99n97951s107@fhfr.qr \
    --to=rguenther@suse.de \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=pan2.li@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).